How to train ChatGPT on my own data?

Hi. I want to train ChatGPT on my own data, specifically my documents and emails. Can I basically make my own ChatGPT with Pickaxe?

I have maybe a dozen business strategy and business documentation PDFs to put in. It’s very important that ChatGPT understands the information and learns it. Ideally, it would use these documents to answer questions about the company. I want to use it internally, so not for customers but for me and a few others.

This is a good question. Pickaxe lets you train multiple AI models (ChatGPT, Claude, Mistral) on your own data and documents. This recent video shows you how to train a GPT on your own data. I’ll also explain below.

ADDING DATA
Our Knowledge Base system is basically an easy, fast way to train a model on your own data. You can upload your data as files (PDF, TXT, CSV, etc.), or you can add domains and scrape webpages from them.

WHAT DOES IT MEAN TO “TRAIN CHATGPT”?
Now, the process of adding your data described above shouldn’t be confused with training a foundation model. That’s what companies like OpenAI and Anthropic do, and it requires billions of words of training data to do properly. Our training system organizes all your data semantically, then shows the most relevant pieces of that data to your chatbot at the right moments. For example, right before your chatbot answers a question about your security protocol, it will read your privacy & security policy page and then answer the question.
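To make the idea concrete, here is a minimal sketch of that "find the most relevant piece, then show it to the chatbot" flow. This is not Pickaxe's actual implementation. The function names, the sample chunks, and the simple word-overlap scoring are all assumptions made for illustration; real systems typically use learned embeddings rather than raw word counts.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn text into a bag-of-words vector (a toy stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, chunks, top_k=1):
    """Return the top_k chunks most similar to the question."""
    q = vectorize(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, vectorize(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(question, chunks, top_k=1):
    """Put the retrieved chunks in front of the question before asking the model."""
    context = "\n".join(retrieve(question, chunks, top_k))
    return f"Use this context to answer.\n\nContext:\n{context}\n\nQuestion: {question}"


# Hypothetical knowledge base: each string stands for one piece of an uploaded document.
chunks = [
    "Security policy: all laptops use full disk encryption and two-factor login.",
    "Pricing: the starter plan costs 20 dollars per month.",
    "Holiday schedule: the office is closed on national holidays.",
]
question = "What is our security policy for laptops?"
prompt = build_prompt(question, chunks)
print(prompt)
```

The model itself is never retrained here. Only the prompt changes, which is why a handful of documents is enough to get started.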

DATA LIMITATIONS ON PICKAXE
Our Knowledge Base system lets you start training with as few as 1 document. You can upload up to 50 files per tool on the Gold tier, and unlimited files on the Pro tier. These files can be documents, webpages, even videos.

Here’s a page that explains more about how to train ChatGPT on your own data.

Ok. Do I need to train it with a specific kind of file, or is it ok to just upload the data that I have? My data isn’t exactly a well-formatted dataset like in your video.

And to train ChatGPT on my data, like my brand’s services, do I have to give it tons of examples, or will just a few work? And can I upload a Google Doc as a CSV to train it?

Regarding file types, you can train the models with most file types (PDF, TXT, DOC, webpages, even YouTube videos).

Regarding the amount of training data, there’s no single right answer. You should upload all the information you want to train it on. But as a general rule of thumb, less is often more. Don’t give it too many empty calories. Instead of uploading 50 blog posts that cover similar material, I would recommend uploading the dozen most information-dense blog posts. As an example of why less can be more, imagine trying to answer a question by drawing from 12 overlapping sources instead of just one clear one.
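One possible way to picture the "less is more" advice, assuming a retrieval setup that hands the chatbot only a fixed number of best-matching pieces: near-duplicate documents can take up all of those slots and push out the one piece that holds the other half of the answer. The sketch below is a toy illustration with made-up documents and a simple word-overlap score, not a description of how Pickaxe ranks content.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words vector (a toy stand-in for a semantic embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, docs, top_k=2):
    """Return the top_k documents most similar to the question."""
    q = vectorize(question)
    return sorted(docs, key=lambda d: cosine(q, vectorize(d)), reverse=True)[:top_k]


question = "How much does onboarding cost and how long does onboarding take?"

# Hypothetical upload with three posts that repeat the same fact.
bloated = [
    "Onboarding cost: 500 dollars per client.",
    "The onboarding cost is 500 dollars per client.",
    "Our onboarding cost is 500 dollars.",
    "Onboarding will take two weeks for most clients.",
]

# Hypothetical curated upload: one post per fact.
lean = [
    "Onboarding cost: 500 dollars per client.",
    "Onboarding will take two weeks for most clients.",
]

bloated_hits = retrieve(question, bloated)
lean_hits = retrieve(question, lean)

print("Bloated:", bloated_hits)  # both slots go to posts about cost
print("Lean:", lean_hits)        # one post about cost, one about duration
```

With the repeated posts included, both retrieved slots are filled by the cost fact and the duration never reaches the chatbot. With the curated set, both facts fit.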

Regarding uploading CSVs, yes, you can upload CSVs into the knowledge base. We actually handle CSVs in a special way.