Putting a large spreadsheet (csv) in the Knowledge Base

mseaworthy · October 31, 2024, 5:17am

If I want to link users of a Pickaxe (as prospective customers) to a vendor in a geographic area of America, it seems that assigning each vendor to n ZIP codes might work if the user enters their ZIP code. Would this work currently, simply by uploading a .CSV? If so, what are the size limits? There are about 42,000 ZIP codes in America, and storing the needed information would likely require 150,000 cells. This is trivial for an online database, but I have no idea how LLMs handle it. Has this kind of load been tested?
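For scale, the lookup described here is an exact match on one key column, which plain code handles easily. The sketch below is a minimal illustration, not anything Pickaxe runs: it assumes a CSV with hypothetical `zip` and `vendor` columns and builds an in-memory index from it. One real-world wrinkle: spreadsheet tools often treat ZIP codes as numbers and drop leading zeros (02134 becomes 2134), so the sketch pads them back to five digits and ignores any ZIP+4 suffix.

```python
import csv
import io

# Hypothetical vendor table: one row per ZIP code. Note "2134" has lost its
# leading zero, as often happens when a spreadsheet stores ZIPs as numbers.
SAMPLE_CSV = """zip,vendor
2134,Acme Plumbing
10001,Empire Repairs
59101,Big Sky Services
"""


def normalize_zip(raw: str) -> str:
    """Return a 5-digit ZIP string: drop any ZIP+4 suffix, then restore
    leading zeros (e.g. "2134" -> "02134")."""
    base = raw.strip().split("-")[0]
    return base.zfill(5)


def load_zip_index(csv_text: str) -> dict[str, str]:
    """Build an exact-match index mapping ZIP code -> vendor name."""
    return {
        normalize_zip(row["zip"]): row["vendor"]
        for row in csv.DictReader(io.StringIO(csv_text))
    }


index = load_zip_index(SAMPLE_CSV)
print(index.get(normalize_zip("02134-1234")))  # Acme Plumbing
print(index.get(normalize_zip("99999")))       # None (no vendor assigned)
```

A dictionary with 42,000 entries like this is small by ordinary programming standards; the open question in this thread is how Knowledge Base retrieval behaves at that size, which this sketch does not address.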

zerodot · October 31, 2024, 6:51am

I think analyzing an Excel file is not going to give good results; LLMs frequently make errors. A better way would be to convert it to a database table and connect that to an LLM, so that when a user asks a question in natural language, the LLM converts it into an SQL query and fetches the right answer from the table. That way you’d get greater accuracy.
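A minimal sketch of this approach using Python's built-in `sqlite3` module. The natural-language-to-SQL step (the LLM turning a question into a query) is not shown; the `vendors` table, its columns, and its rows are hypothetical, and the query is simply the kind such a step might produce. Storing `zip` as `TEXT` keeps leading zeros intact.

```python
import sqlite3
from typing import Optional

# In-memory database standing in for a real vendor table (hypothetical rows).
# zip is TEXT so leading zeros such as "02134" survive.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vendors (zip TEXT PRIMARY KEY, vendor TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO vendors (zip, vendor) VALUES (?, ?)",
    [("02134", "Acme Plumbing"), ("10001", "Empire Repairs"), ("59101", "Big Sky Services")],
)


def vendor_for_zip(zip_code: str) -> Optional[str]:
    """The kind of query an LLM might generate for "Which vendor serves
    ZIP 59101?", with the ZIP bound as a parameter instead of pasted
    into the SQL text."""
    row = conn.execute(
        "SELECT vendor FROM vendors WHERE zip = ?", (zip_code,)
    ).fetchone()
    return row[0] if row else None


print(vendor_for_zip("59101"))  # Big Sky Services
print(vendor_for_zip("00000"))  # None
```

The design benefit is that the vendor name comes from an exact database match rather than from the model's reading of retrieved text, and binding the ZIP as a parameter avoids splicing user input into SQL.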

mseaworthy · October 31, 2024, 4:39pm

Thanks for the thoughts @zerodot. In my case, the spreadsheet is nothing but a table. No calculations, no queries or lookups.

My question was just about the size and performance of a large table in the context of a pickaxe. Any experience trying that?

admin_mike · October 31, 2024, 11:56pm

If you have a large CSV that is uniformly formatted and clearly lists the ZIP code for each vendor, it could probably work.

When you add a CSV file to the Knowledge Base, it goes through a custom, specialized system we built just for CSVs, which handles them differently from other types of documents.

Here’s how it works.

EXAMPLE CSV
Let’s take the example of this spreadsheet of Taco Bell franchise owners.

HOW IT GETS CHUNKED
The very top row (with headers) is turned into properties. Then each subsequent row is turned into an individual chunk with the headers as properties. For example, row 4 would be turned into:

{
"Owner name": "Charlie Puck",
"Location": "Billings, MT",
"Annual Revenue": "$72,000",
"Top-selling Item": "Doritos Locos Taco"
}
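The chunking just described can be sketched with Python's `csv.DictReader`, which already treats the first row as field names. This illustrates the described behaviour and is not Pickaxe's actual code; apart from the Charlie Puck row taken from the example above, the rows are invented placeholders.

```python
import csv
import io
import json


def csv_to_chunks(csv_text: str) -> list[dict[str, str]]:
    """Turn the header row into property names and every following row
    into one chunk (a dict of header -> cell value)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


# Hypothetical spreadsheet in the shape of the franchise-owner example.
# Row 1 is the header; row 4 (the third data row) is Charlie Puck.
SAMPLE = """Owner name,Location,Annual Revenue,Top-selling Item
Dana Reyes,"Austin, TX","$95,000",Crunchwrap Supreme
Lee Park,"Fresno, CA","$64,000",Cheesy Gordita Crunch
Charlie Puck,"Billings, MT","$72,000",Doritos Locos Taco
"""

chunks = csv_to_chunks(SAMPLE)
print(len(chunks))  # 3: one chunk per data row
print(json.dumps(chunks[2], indent=2))
```

If every data row becomes one chunk as described, a ZIP-code table with about 42,000 rows would become roughly 42,000 chunks.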

So this CSV would be turned into 3 separate chunks, and then entered into the Knowledge Base. Here’s what it looks like after you add this CSV to the Knowledge Base.

HOW IT GETS PULLED UP

Then, when people ask questions, it can pull up the relevant chunks. For example, when we ask the question everyone wants answered, “Who sells the most Doritos Locos Tacos?”, it’s able to pull up the relevant row of the CSV.
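The post doesn't say how relevant chunks are ranked; retrieval systems commonly use embedding similarity. The sketch below swaps in a much cruder stand-in, counting words shared between the question and each chunk, purely to show the retrieve-then-answer flow. The function names and sample chunks are hypothetical.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word tokens, used here as a crude relevance signal."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_chunks(question: str, chunks: list[dict[str, str]], k: int = 1) -> list[dict[str, str]]:
    """Rank chunks by how many question words appear in their values.
    This overlap score is only a stand-in for real (e.g. embedding-based)
    retrieval."""
    q = tokens(question)
    ranked = sorted(
        chunks,
        key=lambda c: len(q & tokens(" ".join(c.values()))),
        reverse=True,
    )
    return ranked[:k]


chunks = [
    {"Owner name": "Dana Reyes", "Location": "Austin, TX", "Top-selling Item": "Crunchwrap Supreme"},
    {"Owner name": "Charlie Puck", "Location": "Billings, MT", "Top-selling Item": "Doritos Locos Taco"},
]
best = top_chunks("Who sells the most Doritos Locos Tacos?", chunks)
print(best[0]["Owner name"])  # Charlie Puck
```

Because each chunk carries its values (such as a ZIP code) as text, a question that contains the same value has something concrete to match on.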

zerodot · November 1, 2024, 6:04am

I remember reading your post on putting a Google Doc in the Knowledge Base (by making it public) and setting the Pickaxe to refresh every day, letting it answer questions from the updated doc.

Would the same also be possible with Google Sheets, or is it necessary to upload it to the Knowledge Base?

admin_mike · November 1, 2024, 3:45pm

I’m not sure where you read that post, but that is inaccurate. The web scraper cannot scrape Google Docs. Google Docs are cloud-based document editors, not web pages. The web scraper can only access information on real web pages.

zerodot · November 1, 2024, 5:23pm

OK. Maybe I am getting something wrong. I was referring to this post:

admin_mike · November 1, 2024, 9:03pm

Ah, that is no longer the case. You should get an error message when you try to upload one. I’ll delete that out-of-date post.

suprime · January 22, 2025, 10:17am

Mike, how can we force ChatGPT via Pickaxe to work with the whole file?

For example, when I gave the file to ChatGPT in the macOS app, it worked with the whole file and modified 1,600 rows.

When I try the same thing with Pickaxe, it only works on 16 or 20 rows and then stops. Is there any way to let it process all of them and save the result as a CSV?

sumergoconicio · January 22, 2025, 7:07pm

This is VERY GOOD TO KNOW for a lot of production use cases. Amazing.

admin_mike · January 23, 2025, 8:43pm

I knew you would like it! You’re a Knowledge Base super user!
