If I want to link a user of a pickaxe (as a prospective customer) to a vendor in a geographic area of America, it seems that assigning each vendor to n zip codes and having the user enter their zip might work. Would this work currently simply by uploading a .CSV? If so, what are the size limits? There are about 42,000 zip codes in America, and storing the needed information would likely require about 150,000 cells. This is trivial for an online database, but I have no idea how LLMs do with this. Has this kind of load been tested?
I think analyzing an Excel file is not going to give good results. LLMs frequently make errors. A better way would be to convert it to a database table and connect that to an LLM: when a user asks a question in natural language, the LLM converts it to an SQL query and fetches the right answer from the table. That way you'd have greater accuracy.
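To make the database idea concrete, here is a minimal sketch of the lookup side only, using Python's built-in `sqlite3`. The table name `vendor_zips`, the column names, and the sample rows are all hypothetical, and the step where an LLM turns the user's question into SQL is left out. This is not a Pickaxe feature, just an illustration of an exact-match zip lookup.

```python
import sqlite3

def build_db(rows):
    """Create an in-memory table mapping zip codes to vendors (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # Zip codes are stored as text so leading zeros (e.g. "02134") are preserved.
    conn.execute("CREATE TABLE vendor_zips (zip TEXT NOT NULL, vendor TEXT NOT NULL)")
    conn.execute("CREATE INDEX idx_vendor_zips_zip ON vendor_zips (zip)")
    conn.executemany("INSERT INTO vendor_zips (zip, vendor) VALUES (?, ?)", rows)
    conn.commit()
    return conn

def vendors_for_zip(conn, zip_code):
    """Return the vendors assigned to a zip code, using a parameterized query."""
    cur = conn.execute(
        "SELECT vendor FROM vendor_zips WHERE zip = ? ORDER BY vendor",
        (zip_code,),
    )
    return [row[0] for row in cur.fetchall()]

# Made-up sample data: one vendor can cover several zip codes.
conn = build_db([
    ("59101", "Vendor A"),
    ("59102", "Vendor A"),
    ("10001", "Vendor B"),
])
print(vendors_for_zip(conn, "59101"))
```

Because the match is an exact key lookup, the answer does not depend on the model recalling the right row from a large file.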
Thanks for the thoughts @zerodot. In my case, the spreadsheet is nothing but a table. No calculations, no queries or lookups.
My question was just about the size and performance of a large table in the context of a pickaxe. Any experience trying that?
If you have a large CSV that is uniformly formatted and clearly lists the zip code for each vendor, it could probably work.
When you add a CSV file to the Knowledge base, we actually built a custom, specialized system just for processing them that handles CSVs differently than other types of documents.
Here’s how it works.
EXAMPLE CSV
Let’s take the example of this spreadsheet of Taco Bell franchise owners.
HOW IT GETS CHUNKED
The very top row (with headers) is turned into properties. Then each subsequent row is turned into an individual chunk with the headers as properties. For example, row 4 would be turned into:
{
Owner name: "Charlie Puck",
Location: "Billings, MT",
Annual Revenue: "$72,000",
Top-selling Item: "Doritos Locos Taco"
}
So this CSV would be turned into 3 separate chunks, and then entered into the Knowledge Base. Here’s what it looks like after you add this CSV to the Knowledge Base.
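For anyone who wants to picture the row-to-chunk step described above, here is a rough sketch in Python. It is not our actual processing code, only an illustration under the assumption that each row simply becomes a dictionary keyed by the header row. The function name `csv_to_chunks` is made up, and the second sample row is a placeholder rather than real data.

```python
import csv
import io

def csv_to_chunks(csv_text):
    """Turn each data row into one chunk whose keys are the header names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]

# First data row is taken from the example above; the second is a placeholder.
sample = (
    "Owner name,Location,Annual Revenue,Top-selling Item\n"
    'Charlie Puck,"Billings, MT","$72,000",Doritos Locos Taco\n'
    'Example Owner,"Example City, ST","$0",Example Item\n'
)

chunks = csv_to_chunks(sample)
for chunk in chunks:
    print(chunk)
```

Note that values containing commas, such as the location and revenue, need to be quoted in the CSV so they stay in a single column.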
HOW IT GETS PULLED UP
Then, when people ask questions, it can pull up the relevant chunks. For example, when we ask the question everyone wants to know, “Who sells the most Doritos Locos Tacos?”, it’s able to pull the relevant row of the CSV.
I remember reading your post on putting a Google Doc in the knowledge base (by making it public) and setting the pickaxe to refresh every day, letting it answer questions from the updated doc.
Would the same also be possible with Google Sheets, or is it necessary to upload the file to the knowledge base?
I’m not sure where you read that post, but that is inaccurate. The web scraping cannot scrape Google Docs. Google Docs are cloud-based document editors, not web pages. The web scraping can only access information on real web pages.