4 Comments
Sandeep Devhare (Sandy)'s avatar

This is a great insight, thanks for sharing it.

Bhavishya Pandit's avatar

Thanks for reading and supporting :)

01¹¹a QQ q¹01's avatar

Is it possible to implement this in ANY framework? I know the plain API call implicitly does caching, but for some reason I want this in a framework like LangChain. NO, I don't want semantic caching. I want THIS. Does ANY framework offer this? DSPy does caching, but again, that is not KV caching.

Bhavishya Pandit's avatar

KV caching is not a framework thing; it's more of a model thing. It exists inside the model's attention layers and is implemented at the inference-engine level. If you want explicit, controllable KV caching, you'll need direct access to the model code or a server that exposes its caching behaviour.
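To make that concrete, here is a toy sketch in pure Python of why the cache lives inside the attention layer. It is not taken from any real inference engine: the weights `W_Q`, `W_K`, `W_V`, the `KVCache` class, and the two `decode_step_*` functions are all hypothetical names made up for illustration. It only shows that storing each token's key and value once gives the same attention output as recomputing them for the whole prefix at every step.

```python
import math

# Toy 2x2 projection weights (made up for illustration, not from any model).
W_Q = [[1.0, 0.0], [0.0, 1.0]]
W_K = [[0.5, -0.5], [0.25, 1.0]]
W_V = [[1.0, 2.0], [-1.0, 0.5]]

def project(x, w):
    """Multiply row vector x by matrix w (given as a list of rows)."""
    return [sum(xi * wi for xi, wi in zip(x, col)) for col in zip(*w)]

def attend(q, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]

class KVCache:
    """Hypothetical per-layer cache: one key and one value per seen token."""
    def __init__(self):
        self.keys = []
        self.values = []

def decode_step_cached(x, cache):
    """Process one new token: only its own K and V are computed and stored."""
    cache.keys.append(project(x, W_K))
    cache.values.append(project(x, W_V))
    return attend(project(x, W_Q), cache.keys, cache.values)

def decode_step_uncached(tokens):
    """Recompute K and V for the whole prefix, then attend from the last token."""
    keys = [project(t, W_K) for t in tokens]
    values = [project(t, W_V) for t in tokens]
    return attend(project(tokens[-1], W_Q), keys, values)

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy token embeddings

cache = KVCache()
cached_outputs = [decode_step_cached(t, cache) for t in tokens]
uncached_outputs = [decode_step_uncached(tokens[: i + 1])
                    for i in range(len(tokens))]
```

Both lists hold the same attention outputs, but the cached path projects each token to K and V exactly once. A framework that only sees prompts and completions never touches these tensors, which is why this has to happen in the model or the serving engine.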

I hope this helps :)
