Caching & best settings for using a knowledge base

Hi Team,

I want to use a large knowledge base for factual chatbot interactions, yet I want to keep costs down.
What do you suggest as the recommended token distribution settings? I read on your blog 750-800 for the memory buffer. Would that limit allow use of a 200+ page knowledge base?
- Maximum output length
- Maximum input length
- Memory buffer

I'm using GPT-4o, but I'm eyeing off Claude's caching and wondering whether it's integrated into Pickaxe yet, and whether that's the model I should be using instead:
Prompt Caching (beta) - Anthropic

Cheers.
BTW Loving the product!

There are no universal best settings. The ideal settings differ from tool to tool.

Reducing costs - If you want to keep costs down, the most productive change is switching to GPT-4o mini. It is a very cheap model that is still quite capable.
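As a rough back-of-the-envelope sketch of why the model choice dominates cost: the per-token prices below are assumptions for illustration only (check OpenAI's pricing page for current figures), but the relative gap is the point.

```python
# Rough per-query cost comparison. Prices are ASSUMED values in
# USD per 1M tokens, for illustration only -- check the provider's
# current pricing page before relying on them.
PRICES_PER_1M = {
    "gpt-4o":      {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}

def query_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one query given actual input/output token counts."""
    p = PRICES_PER_1M[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: each query sends ~3,000 tokens of knowledge-base context
# and the bot answers with ~500 tokens.
for model in PRICES_PER_1M:
    print(f"{model}: ${query_cost(model, 3000, 500):.5f} per query")
```

With these assumed prices, the same query is roughly 15x cheaper on the mini model, which compounds quickly across a chatbot's traffic.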

Memory Buffer - The memory buffer controls how much of the conversation history the bot remembers. It is largely irrelevant to the knowledge base.

Max Input/Output - These settings depend on your use case. Does the user need to enter really long inputs? Increase the maximum input length. Does your chatbot need to give really long answers that are getting cut off? Increase the maximum output length. These settings only reserve space for tokens, and that space often goes unused. If you set the maximum input length to 10,000 tokens but the end user only enters 10 tokens, then you only pay to process those 10 tokens.
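The distinction between a cap and a charge can be sketched in a few lines. This is an illustrative model, not Pickaxe's actual billing code: the maximum acts as a ceiling, while billing follows actual usage.

```python
# Sketch: "maximum input length" is a ceiling, not a charge.
# You are billed for the tokens actually sent, so a generous cap
# combined with a short user message still costs only the short message.
def billed_input_tokens(actual_tokens: int, max_input_tokens: int) -> int:
    if actual_tokens > max_input_tokens:
        # In practice the input would be rejected or truncated at the cap.
        raise ValueError("input exceeds the configured maximum")
    return actual_tokens  # billing follows actual usage, not the cap

print(billed_input_tokens(10, 10_000))  # prints 10, not 10,000
```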

Knowledge Base - An important setting is the amount setting under Knowledge Base Settings. I've included a screenshot below. This is the number of tokens pulled from the knowledge base as reference material for each query. Setting it higher makes the bot pull more information per request; setting it lower makes it pull less.
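One common way such a per-query token budget works (a sketch of the general retrieval pattern, not Pickaxe's internal implementation; the chunks and token counts below are made up) is to pack retrieved chunks, best match first, until the budget is spent:

```python
# Sketch of a per-query knowledge-base token budget: retrieved chunks,
# ordered best match first, are packed until the budget runs out. A
# higher budget therefore pulls more context into each request.
def pack_chunks(ranked_chunks: list[tuple[str, int]],
                token_budget: int) -> tuple[list[str], int]:
    """ranked_chunks: (text, token_count) pairs, best match first."""
    selected, used = [], 0
    for text, tokens in ranked_chunks:
        if used + tokens > token_budget:
            break  # budget exhausted; remaining chunks are dropped
        selected.append(text)
        used += tokens
    return selected, used

# Hypothetical retrieval results for one query:
chunks = [("Policy overview...", 400), ("Pricing details...", 350),
          ("Refund terms...", 500), ("Contact info...", 150)]
context, used = pack_chunks(chunks, token_budget=800)
print(len(context), used)  # 2 chunks fit, using 750 of the 800 tokens
```

This is why the setting trades answer quality against cost: a bigger budget means more grounding text per query, but every one of those tokens is processed (and billed) on every request.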