Hey Minchor, this is an excellent question from someone who is clearly a detail-oriented user.
So first let’s discuss the difference between putting information inside the prompt itself versus adding it as a knowledge base.
Prompt
The prompt contains the actual instructions that the model reads. It is the most valuable piece of real estate for controlling and influencing the model's outputs, because the model reads the entire prompt every single time it generates a new output. There are limits to how much you can put into the prompt, but newer models allow for over 120k tokens, which is quite a lot. Keep in mind, though, that the more tokens you ask the model to read on each run, the costlier the request becomes, because you're asking the model to do more work.
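If you want a rough sense of how prompt size drives cost, you can count the tokens yourself. Here's a minimal sketch using OpenAI's tiktoken library; the dollar rate is a placeholder assumption for illustration, not a real price, so check the actual pricing chart for your model.

```python
import tiktoken

# Placeholder rate for illustration only; check OpenAI's pricing chart
# for the real input-token price of your model.
PRICE_PER_MILLION_INPUT_TOKENS = 2.50  # hypothetical dollars

def estimate_request_cost(prompt: str) -> float:
    """Count the prompt's tokens and estimate what one request would cost."""
    enc = tiktoken.get_encoding("cl100k_base")
    n_tokens = len(enc.encode(prompt))
    return n_tokens / 1_000_000 * PRICE_PER_MILLION_INPUT_TOKENS

print(estimate_request_cost("You are a helpful assistant for a deli. " * 50))
```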
Knowledge Base
A Knowledge Base works differently. It takes a large document (hundreds of pages or millions of tokens long) and indexes it into a series of easily queryable chunks. It organizes them with a special indexing system so that it can always find the relevant chunks: if a user asks about "food", it can grab a chunk that's all about sandwiches but never mentions "food" directly. Then, whenever there is a request, the model grabs the 1,000 most relevant tokens' worth of material from the knowledge base and injects them directly into the prompt as additional context. Those 1,000 tokens are then read as thoroughly as if they were part of the original prompt. You can also have it inject 2,000, 3,000, or 4,000 tokens' worth; you can set this number in the Advanced Settings under "Tokens available for knowledge base".
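To make the chunk-and-retrieve mechanics concrete, here's a minimal sketch of the idea in Python. The embed() function below is a toy stand-in that only counts word overlap; a real knowledge base uses a learned embedding model, which is what lets a query about "food" match a chunk about sandwiches.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count. Real systems use learned vectors."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def chunk(document: str, words_per_chunk: int = 40) -> list[str]:
    """Index step: split a long document into queryable chunks."""
    words = document.split()
    return [" ".join(words[i:i + words_per_chunk])
            for i in range(0, len(words), words_per_chunk)]

def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Query step: score every chunk against the query, return the best ones."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]

document = "Our deli sells sandwiches and soup. " * 30 + "Shipping takes five days. " * 30
chunks = chunk(document)                     # done once, at upload time
context = "\n\n".join(retrieve("How long does shipping take?", chunks))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: How long does shipping take?"
```

The key design point is that the chunking and indexing happen once, at upload time, while the retrieval and prompt assembly happen fresh on every request.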
Prompt vs. Knowledge Base
The advantage of using a Knowledge Base over always cramming everything into the prompt is that the Knowledge Base lets you add 10 million tokens of knowledge and use only the pieces that are relevant to each request. This saves time, saves money, and makes for better requests, because the prompt isn't bogged down with extraneous information on every request.
Costs
To answer your final question about cost, I encourage you to look at OpenAI's token pricing chart. The models are fairly cheap (fractions of a cent) for most requests with normal prompts of 100-1,000 tokens. But if you make massive requests with 100k+ tokens in each one, the costs quickly add up.
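To put rough numbers on that, here's a worked comparison under an assumed rate (the $2.50 per million input tokens below is a placeholder for illustration; see the actual pricing chart for your model):

```python
RATE = 2.50 / 1_000_000  # hypothetical dollars per input token

normal  = 1_000 * RATE    # ~$0.0025 per request
massive = 120_000 * RATE  # ~$0.30 per request

# Over 10,000 requests a month the gap compounds:
print(f"normal prompts:  ${normal * 10_000:,.2f}/month")   # $25.00
print(f"massive prompts: ${massive * 10_000:,.2f}/month")  # $3,000.00
```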
I've now shifted 98% of the context into the Knowledge Base, but the output is still hallucinating things and not pulling from the Knowledge Base (despite me setting the relevance threshold to 0 and allowing 21,000 tokens from the Knowledge Base).
Is there something specific I can say in the initial Role Window that can direct the model to go to the knowledge base? I’ve tried repeatedly telling it in there to only use the Knowledge Base for replies, but should I refer to this in a specific way to get the desired outcome?
This is such an important topic. RAG is the most immediate opportunity for most corporate clients, and we need a way to do what the no-code platforms are doing well: adding knowledge as a Google Drive or OneDrive folder. What are the document upload limits on a Pickaxe? Are they the same as for the underlying engine? Does the knowledge get embedded (vectorised)? Or tokenised on every run (I think not, as per the explanation above)?
As this post attempts to explain (but perhaps it's not clear), all the Knowledge Base content gets vectorized. So it's free to upload files into the Knowledge Base.
Generating answers is actually different. For each interaction, your Pickaxe will summon forth the 750 most relevant tokens from the vectorized Knowledge Base. These 750 tokens are inserted into the context window, and are thus paid for. This '750 token' number is adjustable within the "Learn" tab of the builder.
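For anyone curious what an adjustable budget like that looks like mechanically, here's a hedged sketch of budget-limited packing: chunks are taken in relevance order until the token budget (750 by default here) would be exceeded. count_tokens() is a crude stand-in for a real tokenizer, and the function names are illustrative, not Pickaxe's actual internals.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer (e.g. tiktoken): ~1 token per word."""
    return len(text.split())

def pack_context(ranked_chunks: list[str], budget: int = 750) -> str:
    """Take chunks in relevance order until the token budget would be exceeded."""
    selected, used = [], 0
    for piece in ranked_chunks:
        n = count_tokens(piece)
        if used + n > budget:
            break
        selected.append(piece)
        used += n
    return "\n\n".join(selected)  # this packed context is what gets billed
```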