Knowledge Base retrieval bug

nathaniel · April 18, 2025, 9:09pm

Hi guys,

We had some instability with our database provider last night, from about 2am to about 10am PST. I have been up most of the night working to fix things, and as of about 10am PST today most of the issues should be resolved. @Pristine can you let me know if you or your team are still encountering issues?

Gene · April 18, 2025, 9:25pm

@nathaniel … I’m still definitely seeing major issues here. None of my test queries are returning responses that refer to the knowledge base files. The Message Insights and Explore screens show a list of chunks supposedly used, but none of the chunks are actually relevant and they’re all from a single file that contains no info relevant to the queries.

alexandrek · April 18, 2025, 10:43pm

Same for me, for the past few days…

Gene · April 19, 2025, 1:46am

@nathaniel … Update as of 9:45 pm ET Friday: All bot responses are still coming back reflecting no knowledge base information, despite the fact that clear info does exist there. Message Insights screen says knowledge base wasn’t used. Explore screen shows “0 relevant knowledge chunks”. This bot has the relevance cutoff set at 60. The explore screen displays at least 20 chunks, but the highest score is typically in the 50s and none of the chunks on that screen actually contain the requested relevant info. Knowledge base use appears to be entirely broken. @stephenasuncion

Pristine · April 19, 2025, 12:27pm

Still not working at 8:25am EST Saturday.

I’ve duplicated the bot, removed the entire knowledge base, added a different knowledge base, and tried using no knowledge base at all; nothing fixes it. The bot is somehow using an old knowledge base that was deleted weeks ago, providing information and answers that should be impossible for it to provide since they are not in its knowledge base, including links to 404 Error pages.

stephenasuncion · April 21, 2025, 4:43pm

Hi @Gene,

We’ve updated the knowledge base again to apply some fixes. If you re-upload your documents, everything should work correctly now.

Gene · April 21, 2025, 8:00pm

Thanks for the update, @stephenasuncion. The Pickaxe chunk size used to be 200-250 words, and the explanation was that tests showed that was a good size. Now, the average chunk size in files I’ve re-uploaded since the new chunking process is just 30 words – three lines in the Chunk Explorer, and some have as few as 1 or 2 lines. Can you please share the rationale behind this change?
I’m working with documents to which markdown has been added to define a hierarchy of header sections, and I’m doing some experimentation using CSV files to see whether longer chunks whose boundaries are defined partly by the heading-defined sections result in a higher level of accuracy.

stephenasuncion · April 22, 2025, 1:51am

No worries! It turns out there was an issue with token counting. We’ve increased the max token limit, so you should notice an increase in chunk size.

I recommend splitting sections using double new lines (\n\n). As I mentioned earlier, we start by breaking the raw content into paragraphs based on double new lines. If a chunk still exceeds the max token limit, we continue splitting it further—first by lines, then by spaces, and finally character by character if necessary.
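
For anyone who wants to see that fallback order in code, here is a minimal Python sketch. It is not Pickaxe's implementation: the function names, the `max_tokens` parameter, and the "about 4 characters per token" estimate are all assumptions made for illustration, and a real system would use an actual tokenizer. The sketch also does not merge small pieces back together, since the post does not say whether that happens.

```python
import math

# Separators tried in order: paragraphs, lines, words, then single characters.
SEPARATORS = ["\n\n", "\n", " ", ""]


def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer (assumption: ~4 characters per token)."""
    return math.ceil(len(text) / 4)


def split_text(text: str, max_tokens: int, level: int = 0) -> list[str]:
    """Split text into chunks that each fit within max_tokens (must be >= 1)."""
    text = text.strip()
    if not text:
        return []
    if count_tokens(text) <= max_tokens:
        return [text]  # small enough, keep it whole

    separator = SEPARATORS[level]
    if separator == "":
        # Last resort: fixed-width character windows.
        width = max_tokens * 4
        return [text[i:i + width] for i in range(0, len(text), width)]

    chunks = []
    for piece in text.split(separator):
        # Only pieces that are still too large get split at the next, finer level.
        chunks.extend(split_text(piece, max_tokens, level + 1))
    return chunks


sample = "Short paragraph.\n\nLine one is here\nLine two is here"
for chunk in split_text(sample, max_tokens=5):
    print(repr(chunk))
```

In this sketch a paragraph that fits under the limit is never broken up, which is why separating sections with blank lines gives you the most control over where chunk boundaries fall.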

Gene · April 22, 2025, 2:52am

Thanks, @stephenasuncion! With the generous help of ChatGPT, I’ve just developed a Python script that chunks my documents with close attention to topic boundaries, and saves the output as a CSV file. I manually applied hierarchical topic headings to all my documents using markdown’s #-##### heading symbols. The Python script treats the content between the # headings as segments, and then combines (or in some cases, splits) the segments to intelligently allocate content to the chunks. It’s set up for a default minimum length of 80 words and a default normal maximum of 250 (with a resulting average of around 200 words), but will permit a chunk to be as large as 400 words if that helps keep related information together. The script supports passing parameters to alter those three values if desired. I’m willing to share the script if anyone has documents with markdown headings and would like to try this approach. I plan to do some testing to compare this topic-based chunking with the inbuilt length-based chunking, and will pass along the results when available.
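
The script itself is not posted in this thread, so the following is only a rough sketch of the general approach described above, not the actual script. The function names, the greedy merge rule, and the CSV column names are all assumptions. It splits a markdown document at `#` through `#####` headings, merges neighbouring segments up to the normal maximum (or up to the hard maximum while the current chunk is still below the minimum), and splits any single segment that exceeds the hard maximum by word count.

```python
import csv
import io
import re

HEADING = re.compile(r"^#{1,5} ")  # matches "# " through "##### " headings


def word_count(text: str) -> int:
    return len(text.split())


def split_segments(markdown: str) -> list[str]:
    """Return one segment per heading, each starting with its heading line."""
    segments, current = [], []
    for line in markdown.splitlines():
        if HEADING.match(line) and current:
            segments.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        segments.append("\n".join(current).strip())
    return [s for s in segments if s]


def chunk_segments(segments, min_words=80, max_words=250, hard_max=400):
    """Greedily combine segments into chunks of at most hard_max words."""
    chunks, current = [], ""
    for seg in segments:
        if word_count(seg) > hard_max:
            # Oversized segment: flush, then split by words (line breaks are lost).
            if current:
                chunks.append(current)
                current = ""
            tokens = seg.split()
            for i in range(0, len(tokens), max_words):
                chunks.append(" ".join(tokens[i:i + max_words]))
            continue
        if not current:
            current = seg
            continue
        # A chunk still under the minimum may grow up to the hard maximum.
        limit = hard_max if word_count(current) < min_words else max_words
        if word_count(current) + word_count(seg) <= limit:
            current = current + "\n\n" + seg
        else:
            chunks.append(current)
            current = seg
    if current:
        chunks.append(current)
    return chunks


def to_csv(chunks) -> str:
    """Serialize chunks as CSV text (column names are hypothetical)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["chunk_id", "word_count", "text"])
    for i, chunk in enumerate(chunks, start=1):
        writer.writerow([i, word_count(chunk), chunk])
    return buffer.getvalue()
```

A typical call would be `to_csv(chunk_segments(split_segments(document_text)))`, with the three word limits passed as keyword arguments when the defaults do not suit the material.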

sensei · May 10, 2025, 3:32pm

Hi @Gene
Any update on the KB retrieval inconsistencies? I am also having similar problems.

If you have found a solution, I would appreciate it if you could share it with us.
Thx in advance

Gene · May 10, 2025, 3:57pm

Hello @sensei
The issues I was experiencing with KB retrieval appear to have all been resolved with the programming changes made by the Pickaxe team. What specific issues are you seeing?

sensei · May 17, 2025, 12:58pm

Hi @Gene
Thank you for your reply.
Although the file (a plain-text e-book, very well structured with ## main headings and # subheadings) was very neatly chunked and the relevance score was 70%, the bot could not find the very obvious answers to simple questions. I am talking in the past tense, since after that day I have not encountered similar problems on my other pickaxes. When I next visit this pickaxe (a bot for chatting about an e-book), I will try again.
Best,
