Hello @stephenasuncion
Thanks so much for reaching out! The issues I’m seeing with KB chunk retrieval have been really frustrating. In addition to this “KB Retrieval bug” post, see also my related post on KB file size, “Impact of file size on knowledge base behavior?”
I’ve seen similar issues in both of my studios that use KB files, but let’s focus on this one for now:
MennoChat – Pickaxe
Focus on the “Chat with Menno” pickaxe. This is designed to allow users to ask Menno Simons, an Anabaptist leader during the Protestant Reformation, any questions about his life and his theology.
In brief, I often see the bot either report that the KB has no related information or give a vague, generalized response that doesn’t use KB information, even though the requested information clearly exists in the KB. In such cases, one or more of the chunks shown on the “Message Insights” or “Explore” screens always contain the information that the response failed to use. A further sign that something is wrong: in these instances, the chunks listed on the “Message Insights” screen are NOT THE SAME as the ones on the “Explore” screen.
The second bug post mentioned above describes how the issue appeared to be related to file size. Most of the files in this KB are small, but 2 of them were 1.5 or 2 MB. So I SPLIT both of those in half, so that all files were about 1 MB or smaller.
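For anyone who wants to reproduce the split, here is a minimal Python sketch of one way to halve a text file at a line boundary. It is illustrative only: the function name `split_text_file` and the `_partN` output naming are hypothetical, nothing here is a Pickaxe feature, and it assumes the KB files are plain UTF-8 text.

```python
from pathlib import Path


def split_text_file(path, parts=2):
    """Split a UTF-8 text file into `parts` files of similar size.

    Cuts are made only at line boundaries, so no line is broken in two.
    Returns the list of output paths (hypothetical `_partN` naming).
    """
    src = Path(path)
    lines = src.read_text(encoding="utf-8").splitlines(keepends=True)
    total_bytes = sum(len(line.encode("utf-8")) for line in lines)
    target = total_bytes / parts  # approximate size of each piece

    chunks, current, size = [], [], 0
    for line in lines:
        current.append(line)
        size += len(line.encode("utf-8"))
        # Close the current piece once it reaches the target size,
        # but always leave the remainder for the final piece.
        if size >= target and len(chunks) < parts - 1:
            chunks.append(current)
            current, size = [], 0
    if current:
        chunks.append(current)

    outputs = []
    for i, chunk in enumerate(chunks, start=1):
        out = src.with_name(f"{src.stem}_part{i}{src.suffix}")
        out.write_text("".join(chunk), encoding="utf-8")
        outputs.append(out)
    return outputs
```

Calling `split_text_file("Menno_Simons_Works_V1_ed.txt")` would write `Menno_Simons_Works_V1_ed_part1.txt` and `Menno_Simons_Works_V1_ed_part2.txt` next to the original, each roughly half its size.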
In an hour of testing last evening, the problems disappeared when I used the smaller files, but repeatedly returned when I switched the large files back on again. I repeated this many times, so I’m sure that’s what was happening.
HOWEVER, to my chagrin, this afternoon I was again seeing the same behavior even when using the smaller, split files. And if I switched off the smaller split files (which, at roughly 1 MB each, are still bigger than any of the other files), the problem went away.
The best way to test this is to ask a very specific question, such as:
– “Tell me about Gertrude” (Gertrude was Menno Simons’ wife, and a proper response should clearly state that relationship), or
– “Who was Sicke Snyder?” (He was a religious man who was executed for his beliefs.)
The two original “large” files that seemed to be triggering the issue are “Menno_Simons_Works_V2_ed.txt” and “Menno_Simons_Works_V1_ed.txt”.
Right now both of those are switched off.
You’ll see four files with similar names – these are the smaller files created by splitting those larger ones in half. Those are currently turned on, but this evening I was even getting bad responses with those – and could get good responses if I disconnected them.
This is all a bit hard to explain, and I would be more than happy to do a Zoom with you and share my screen as I demonstrate this issue. Just let me know!
It’s not entirely consistent, but it seems like it has something to do with file size.
I love Pickaxe, but struggling with KB retrieval issues for the past several weeks has me pulling my hair out!
Again, thanks for reaching out!
Gene