[FEATURE REQUEST] - Turbo Quant #2075

@SabinStargem

It looks like a new method for handling the KV cache has arrived, one that reportedly improves on Q8 KV in both speed and memory footprint. IMO, it seems likely to become a feature in llama.cpp, as I have seen several people using Turbo Quant.

The thread below details the subject.

TurboQuant - Extreme KV Cache Quantization

I figured that I should draw attention to this, as it seems very promising.
