forked from ggml-org/llama.cpp
[FEATURE REQUEST] - Turbo Quant #2075
Open
Description
It looks like a new method for handling the KV cache has arrived, one that improves over Q8 KV caching in both speed and memory footprint. IMO, it is likely to become a feature in llama.cpp, as I have seen several people using TurboQuant.
The thread below covers the subject in detail:
TurboQuant - Extreme KV Cache Quantization
I figured I should draw attention to this, as it seems very promising.
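For context on the Q8 baseline being compared against: a minimal sketch of per-block absmax 8-bit quantization, which is the general idea behind Q8-style KV-cache storage. This is an illustrative example only, not llama.cpp's actual implementation and not TurboQuant's method (the linked thread has those details); the block size and function names here are assumptions.

```python
import numpy as np

def quantize_q8(x: np.ndarray, block: int = 32):
    """Per-block absmax 8-bit quantization (illustrative sketch).

    Each block of `block` floats is stored as int8 values plus one
    fp32 scale, roughly a 4x reduction versus fp32 storage.
    """
    x = x.reshape(-1, block)
    # One scale per block: largest magnitude maps to +/-127.
    scale = np.abs(x).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero
    q = np.round(x / scale).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize_q8(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Reconstruct approximate fp32 values from int8 + per-block scale."""
    return (q.astype(np.float32) * scale).reshape(-1)

# Round-trip a fake KV-cache slice and check the reconstruction error.
rng = np.random.default_rng(0)
x = rng.standard_normal(4096).astype(np.float32)
q, s = quantize_q8(x)
err = float(np.abs(dequantize_q8(q, s) - x).max())
```

The worst-case round-trip error is half a quantization step (scale / 2), which is why Q8 KV caches are usually close to lossless; sub-8-bit schemes like the one discussed in the thread have to work harder to keep that error down.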