My bots that use knowledge base files have been failing to make use of relevant chunks.
This is the typical behavior:
– I ask a question that is definitely covered in the knowledge base.
– The pickaxe responds saying it finds no information in its knowledge base.
– I check the Message Insights screen. It shows that several chunks were used, but NONE of the chunks listed actually contain the requested information. (I can’t see all their content, but I know from the filenames and chunk numbers that these are NOT relevant chunks.)
– I then proceed to the “Explore” screen. That shows several chunks above the green cutoff line, but they are NOT the same chunks as listed on the Message Insights screen. The first chunk typically has a rating of 70 or better, and it DOES actually contain the exact information requested.
Conclusion: It seems that the bot finds the relevant information, but is perhaps only “using” the irrelevant chunks listed on the Message Insights screen.
I manually recreated one of the bots that uses 12 files. Initially the chunks listed on the Message Insights screen were matching the ones on the Explore screen, and the bot was using information in the relevant chunks in its responses. But before long, it reverted to the same (very frustrating) behavior of claiming it found no relevant information even though the Chunk Explorer feature proved otherwise. This behavior continues.
Well, after a very frustrating two weeks where I kept seeing apparent failure in KB retrieval, I have now manually recreated a bot for the fourth time – and this time it appears to be working. I’ve tested it with many queries this evening and it’s consistently finding the relevant KB information. Let’s hope it keeps working.
The recreated bot uses the same prompt and files as the previous version. It’s a mystery.
Has anyone else seen the same symptoms I was seeing over the past two weeks? Such as:
– Pickaxe responses indicate it can find no relevant information in the knowledge base.
– The files and chunk numbers listed on the “Message Insights” screen are completely different from the ones on the “Explore” screen (and I’m not referring to the minor discrepancy apparently due to the “Message Insights” screen treating chunk #1 as chunk #0).
– The chunks listed above the green line on the “Explore” screen do contain highly relevant information, despite the bot’s claim that the knowledge base has no such information.
Thanks so much for reaching out! The issues I’m seeing with KB chunk retrieval have been really frustrating. In addition to this “KB Retrieval bug” post, see also a related one on KB file size: “Impact of file size on knowledge base behavior?”.
I’ve seen similar issues in both of my studios that use KB files, but focus on this one for now:
MennoChat – Pickaxe
Focus on the “Chat with Menno” pickaxe. This is designed to allow users to ask Menno Simons, an Anabaptist leader during the Protestant Reformation, any questions about his life and his theology.
In brief, I often see instances where the bot either reports that the KB has no related information, or gives a very vague, generalized response that doesn’t use KB information, even though the requested information clearly exists in the KB. In such cases, you’ll always see that one or more chunks shown in the “Message Insights” or “Explore” screens contain the information that the response failed to use. A further sign of a clear issue: in such instances, the chunks listed on the “Message Insights” screen are NOT THE SAME as the ones on the “Explore” screen.
The second bug post mentioned above reports how the issue appeared to be somehow related to file size. Most of the files in this KB are small, but 2 of them were 1.5 or 2 MB. So I SPLIT both of those in half, so that all files were about 1 MB or smaller.
In an hour of testing last evening, the problems disappeared when I used the smaller files, but repeatedly returned when I switched the large files back on again. I repeated this many times, so I’m sure that’s what was happening.
HOWEVER, to my chagrin, this afternoon I was again seeing the same behavior even when using the smaller, split files. And even with the smaller split files switched off (at roughly 1 MB each, they’re still bigger than any of the other files), the problem persisted.
The best way to test this is to ask a very specific question, such as:
– “Tell me about Gertrude” (Gertrude was Menno Simons’ wife, and a proper response should clearly state that relationship.), or
– “Who was Sicke Snyder?” (He was a religious man who was executed for his beliefs.)
The two original “large” files that seemed to be triggering the issue are “Menno_Simons_Works_V2_ed.txt” and “Menno_Simons_Works_V1_ed.txt”.
Right now both of those are switched off.
You’ll see four files with similar names – these are the smaller files created by splitting those larger ones in half. Those are currently turned on, but this evening I was even getting bad responses with those – and could get good responses if I disconnected them.
This is all a bit hard to explain, and I would be more than happy to do a Zoom with you and share my screen as I demonstrate this issue. Just let me know!
It’s not entirely consistent, but it seems like it has something to do with file size.
I love Pickaxe, but struggling with KB retrieval issues for the past several weeks has me pulling my hair out!
Again, thanks for reaching out!
Gene
Thank you for the detailed response! We discovered an issue where the ‘Model Reminder’ was being included during the document retrieval, which caused an inconsistency between the message insights and the chunk explorer. This should now be fixed!
As for the tool not using the retrieved documents, it’s most likely a prompt/model issue, because having chunks in Message Insights indicates that the document contents were indeed appended to the user message sent to the LLM/model provider. A slight modification to the knowledge context or role might do the trick! On the backend, we currently structure the user message more or less like this:
...
<user input>
## KNOWLEDGE BASE ##
<knowledge context from knowledge settings>
<retrieved knowledge chunks>
## END OF KNOWLEDGE BASE ##
...
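If it helps to picture the assembly, here’s a minimal Python sketch of that structure (illustrative only; the function and variable names, and the example knowledge context, are hypothetical, not our actual backend code):

```python
def build_user_message(user_input: str, knowledge_context: str, chunks: list[str]) -> str:
    """Append the retrieved knowledge section to the user's message (simplified sketch)."""
    kb_lines = [
        "## KNOWLEDGE BASE ##",
        knowledge_context,            # the knowledge context from knowledge settings
        *chunks,                      # the retrieved knowledge chunks
        "## END OF KNOWLEDGE BASE ##",
    ]
    return user_input + "\n" + "\n".join(kb_lines)

# Example: a user question plus two retrieved chunks
message = build_user_message(
    "Tell me about Gertrude",
    "Answer as Menno Simons, using the knowledge base where possible.",
    ["<chunk 1 text>", "<chunk 2 text>"],
)
```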
@stephenasuncion Thank you so much for working on this! Much appreciated!! Some quick initial testing shows an improvement in the issues I reported. There is no longer a discrepancy between the chunks listed in Message Insights and those listed in the Chunk Explorer, and the bot’s responses appear to be better.
However, the significant issues that are documented in the “Chunking Issues” Word doc that I provided have NOT been eliminated. In short, the content from my KB files has not always been placed in chunks representing the expected sequential order, and within an individual chunk, big blocks of text are sometimes missing. If you didn’t see that document, check your Chat in this forum or the inbox for info@pickaxeproject.com. Or send me an email at gene.kraybill@gmail.com.
@stephenasuncion Please note that the chunk numbers I cite in that “Chunking Issues” doc will need to be reduced by 1 (e.g., chunk #3 becomes #2), since your programming changes this week have shifted the “Explore” numbers by 1 to match how they’re handled in “Message Insights”.
Hello @stephenasuncion … Just tested it on an 89 KB file. The chunk order seems right at first glance. However, how many tokens or words are you aiming for per chunk? The previous chunking process generated 91 chunks for this file; the new process created 534. Average chunk size is only 25 or 30 words.
We’re currently splitting content into chunks with a maximum of 250 tokens each (though we might increase that or make it controllable). If the content is valid JSON, we increase the limit to 4,000 tokens per chunk.
When splitting, we start by breaking the raw content by paragraphs first. If a chunk still exceeds the max token limit, we continue splitting it further—first by lines, then by spaces, and finally character by character if necessary. This ensures each chunk stays within the size limit while preserving as much structure and meaning as possible.
Each chunk includes a 10% overlap at its beginning, based on the max token size (e.g., 25 tokens for 250-token chunks). This overlap ensures important context isn’t lost between chunks.
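If it helps, here’s a rough Python sketch of that split-then-overlap logic (illustrative only: the token counter is a crude word-count stand-in, the names are made up, and details such as whether small adjacent pieces get merged back together are simplifications rather than our exact production code):

```python
import json

MAX_TOKENS = 250         # default per-chunk token limit
JSON_MAX_TOKENS = 4000   # higher limit when the content is valid JSON
OVERLAP_RATIO = 0.10     # overlap is 10% of the max token size

def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer (word count only).
    return len(text.split())

def split_text(text: str, limit: int, seps=("\n\n", "\n", " ")) -> list[str]:
    """Recursively split: paragraphs -> lines -> spaces -> characters.
    This sketch never merges small pieces back together, so chunks can
    come out much smaller than the limit."""
    if count_tokens(text) <= limit:
        return [text]
    if not seps:
        # Last resort: fixed-size character slices (~4 chars per token).
        size = limit * 4
        return [text[i:i + size] for i in range(0, len(text), size)]
    pieces = [p for p in text.split(seps[0]) if p.strip()]
    if len(pieces) <= 1:
        return split_text(text, limit, seps[1:])
    chunks: list[str] = []
    for piece in pieces:
        chunks.extend(split_text(piece, limit, seps[1:]))
    return chunks

def is_valid_json(text: str) -> bool:
    try:
        json.loads(text)
        return True
    except ValueError:
        return False

def chunk_with_overlap(content: str) -> list[str]:
    limit = JSON_MAX_TOKENS if is_valid_json(content) else MAX_TOKENS
    overlap = int(limit * OVERLAP_RATIO)  # e.g., 25 tokens for 250-token chunks
    base = split_text(content, limit)
    # Prepend the tail of the previous chunk to each subsequent chunk.
    out = base[:1]
    for prev, cur in zip(base, base[1:]):
        tail = " ".join(prev.split()[-overlap:])
        out.append(tail + " " + cur)
    return out
```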
@stephenasuncion, I just did another upload test. The actual chunk size in the two files I’ve tested averages only 30 words, or roughly 35 tokens – much smaller than the target size you mentioned. I do love the idea of the overlap you’ve implemented. The two files I tested are simple .txt files, and the resulting number of chunks in each is roughly 6 times the count under the old chunking process. Is that what you intended?
Glad you like the overlap! We do need to keep an eye on the increased number of chunk entries. We might end up increasing the max token limit as a result, but ultimately we care more about how it’s impacting your results in your Pickaxes.
@stephenasuncion Thanks for your work on this! Together with the new OpenAI models, improved KB processes will really make pickaxes rock!
I won’t be able to get a good feel for performance until I upload all the documents again and do a good bit of testing. I’m a newbie on the technical side of RAG chunking, but as I’m sure you know, small chunks apparently expedite finding very specific facts, while larger chunks may be needed for good responses when interactions require answers that are more broadly analytical in nature. I do wonder how well chunks averaging only 30 or so tokens, with just a few words of overlap, will work with certain kinds of interaction. I’m still not clear on why the chunks are so small when the aim was apparently more like 200-250 tokens.
I love your idea of possibly providing a way for users to control the size of the chunks.
By the way, does the overlap make it possible for the LLM to actually piece together contiguous relevant chunks, so that the context it sees spans multiple chunks? Or is the purpose simply to provide slightly more contextual information in case the content at the start and end of a chunk needs it?
@stephenasuncion Problem!! Late yesterday, I re-uploaded 18 relatively small text files containing a total of 6 MB of content. This generated 12,834 chunks, averaging roughly 30 words per chunk, with some containing only a single line of text. Today, even with simple queries, the Message Insights screen is reporting “Knowledge Base not used”, and when I advance to the Chunk Explorer, I get a popup error that says, “Cannot read properties of undefined (reading ‘filter’)”. I’ve made no changes to the prompt.