Any tips on best settings and prompts to get the most accurate info from a knowledge base? I have a PDF of podcast episodes with the title of each episode, a summary of what's in the episode, and the episode link. The assistant is a little inconsistent: sometimes it will pull info from the PDF, other times it will make answers up. Also, when it does pick up episode info from the PDF it might get the title and summary right but give the wrong link. Not sure if there is a better way to approach this problem or if I just have the settings wrong.
Thanks
Hey Nathan, there are two sides to this approach. For more detailed breakdowns, here’s a guide on how our document system works and here’s a guide on how to make a great document-based chatbot.
Number One
Optimizing the actual document retrieval part of your tool is important. There are a few levers to pull on here.
- Relevance threshold: When the document is uploaded, it is chunked into tons of small sections. Whenever you interact with the bot, it pulls the chunks scored as ‘most relevant’ to your query. Adjusting this up or down changes how much information gets used. For example, if you make it really high (.9 or higher), then only extremely relevant information will be used. If you set it really low (.4 or lower), then tons of information will be used.
- Tokens available for knowledge base: This controls the total number of tokens from your document that are fed to the bot when it answers a query. If you set it to 1000, the most relevant chunks will be shown to the bot until they reach 1000 tokens total. Even if you have the relevance threshold set super low (like .2), only the number of tokens specified here will be given to the bot.
- Knowledge base injection phrase: When the 1000 tokens of info from your document are shown to the bot, this phrase contextualizes them. For example, if your document is a bunch of quotes from a famous person, you might say that here. Or if it’s tax law, you may note that here. Since the bot does not look at the entire document every time, you can use this phrase to provide extra context. (See the sketch below for how these three settings fit together.)
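To make that concrete, here is a rough, purely illustrative sketch of how those three settings interact. The names (`select_chunks`, `relevance_threshold`, `token_budget`) and the token counting are made up for illustration, not the platform’s actual API, but the filtering idea is the same:

```python
# Hypothetical sketch of how the retrieval settings interact
# (names and token counting are placeholders, not the real API).

def select_chunks(scored_chunks, relevance_threshold=0.75, token_budget=1000):
    """scored_chunks: list of (score, text) pairs, scored against the user's query."""
    selected = []
    used_tokens = 0
    for score, text in sorted(scored_chunks, key=lambda c: c[0], reverse=True):
        if score < relevance_threshold:
            break  # everything below the threshold is ignored entirely
        tokens = len(text.split())  # crude stand-in for a real tokenizer
        if used_tokens + tokens > token_budget:
            break  # the token budget caps what gets injected, regardless of threshold
        selected.append(text)
        used_tokens += tokens
    return selected

# The injection phrase wraps whatever survives both filters above:
injection_phrase = "The following excerpts are episode listings from a podcast archive:"
chunks = [(0.92, "Episode A: <title, summary, link>"), (0.55, "Episode B: <title, summary, link>")]
prompt_context = injection_phrase + "\n\n" + "\n---\n".join(select_chunks(chunks))
```

The key point is that the threshold and the token budget are two separate gates: a chunk has to clear both before the injection phrase wraps it and it reaches the bot.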
Number Two
Format the document in a more friendly way. This is more time-intensive but can have great returns. Often, poor performance comes down to a document that is not formatted in an LLM-friendly way: lots of pictures, tables, columns, and other things that make it harder to chunk the text cleanly. More headings and labels help the system understand the context of the document. It’s important to understand that the bot does not look at the entire document every time; it chunks the document into smaller entries that it calls up when it considers them relevant. Adding additional labeling throughout the document helps it summon the relevant chunks more accurately.
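For your specific case, the thing I’d focus on is keeping each episode’s title, summary, and link together in one clearly labeled block, so the chunker can’t split a title from its link (which would explain the wrong-link answers). Here is a hypothetical sketch of that kind of restructuring; the field names, file name, and URL are placeholders, not anything from your actual PDF:

```python
# Hypothetical example of reshaping episode data into self-contained, labeled
# entries before uploading, so title, summary, and link land in the same chunk.

episodes = [
    {
        "title": "Episode 1: <episode title>",
        "summary": "<one-paragraph summary of the episode>",
        "link": "https://example.com/episode-1",  # placeholder URL
    },
    # ...one dict per episode
]

entries = []
for ep in episodes:
    entries.append(
        f"## {ep['title']}\n"
        f"Summary: {ep['summary']}\n"
        f"Episode link: {ep['link']}\n"
    )

# A separator between entries makes it less likely the chunker
# splits an episode's title from its link.
with open("episodes_formatted.txt", "w") as f:
    f.write("\n---\n\n".join(entries))
```

However you end up doing it, the goal is the same: every chunk that mentions an episode should already contain that episode’s link.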
Thank you for the response. I have tried some of these, made all the context windows larger and set the relevance thresholds very high. Not sure if others have the same problem, but the assistants still seem to just make up information even when you ask specific questions about the knowledge base and give the assistant pre-defined instructions to get answers from there. Are there any best practices on how to reference the knowledge base in the instructions (i.e. should you say ‘reference your knowledge base’, or ‘reference xyz.pdf attached’)? I give the same instructions and document to a GPT assistant in the GPT Plus environment and it works fine (even in the OpenAI developer environment with the same instructions it works better).