is it possible to implement this in ANY framework? I know the plain API call implicitly does caching.. but for some reason i want this in a framework like langchain..NO i dont want semantic caching. i want THIS. does ANY framework offer this? dspy does caching. but again that is not kv caching..
KV caching is not a framework think, it's more a model thing, it exists inside the model’s attention layers, implemented at the inference engine level. If you want explicit, controllable KV caching, you’ll need direct access to the model code OR any server that exposes its caching behaviour.
This is great insight thanks for sharing it.
Thanks for reading and supporting :)
is it possible to implement this in ANY framework? I know the plain API call implicitly does caching.. but for some reason i want this in a framework like langchain..NO i dont want semantic caching. i want THIS. does ANY framework offer this? dspy does caching. but again that is not kv caching..
KV caching is not a framework think, it's more a model thing, it exists inside the model’s attention layers, implemented at the inference engine level. If you want explicit, controllable KV caching, you’ll need direct access to the model code OR any server that exposes its caching behaviour.
I hope this helps :)