How to make a Pickaxe that can "see" (vision chatbot walkthrough)

A lot of Pickaxe users don’t realize that they can make vision chatbots that can look at and “see” images. This format lets end-users upload photos, screenshots, and other images as input to the chatbot.

We just published a tutorial video on how to make a vision-enabled chatbot on Pickaxe (and no, neither Jennifer nor Salma has gotten back to me yet). I thought it would be instructive to write a quick breakdown of the process here as well.

WHAT IS A VISION CHATBOT?
That’s just my made-up term for a chatbot that can ‘see’ images. There are many use cases for this type of chatbot. For example, here is a website critic that looks at screenshots of website landing pages and writes a critique of them. Other users have made chatbots that look at your LinkedIn profile and give feedback, write outreach messages based on Instagram grids, give tips on YouTube thumbnails, and more!

HOW TO ENABLE VISION
This is very simple. There are two steps.

  • Step 1: In the Builder, go to Configure and click “allow users to upload…”. The label is a bit misleading as of this writing: users can upload images as well as text documents.

  • Step 2: In the model dropdown, select GPT-4o or GPT-4o mini. This is important: vision will only work with these two language models. (See the sketch below for what this looks like under the hood.)
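
For context on Step 2: image input is a capability of the model itself, not of the chat interface, which is why only vision-capable models work. Here is a minimal, illustrative sketch using the OpenAI Python SDK; this is not Pickaxe’s internal code, and the filename is made up:

```python
# Illustrative sketch (not Pickaxe's internals): an image is sent as an
# "image_url" content part in an ordinary chat completion. Only
# vision-capable models such as GPT-4o / GPT-4o mini accept such parts;
# text-only models reject them with an API error.
# Assumes the `openai` package and an OPENAI_API_KEY in the environment.
import base64
from openai import OpenAI

client = OpenAI()

# Hypothetical local file standing in for the end-user's upload.
with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o-mini",  # must be vision-capable
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this photo?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```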


HOW DOES IT WORK?
Vision-enabled chatbots are a fairly simple concept as well. The chatbot “looks” at an image and writes a very detailed text description of it. That description is then fed to the language model powering your Pickaxe.

These descriptions have a surprising amount of detail. Things like color, mood, atmosphere, and even written text are all captured in them.
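
If you’re curious what that two-stage flow looks like in code, here is a rough reconstruction using the OpenAI Python SDK. This is my own sketch of the concept, not Pickaxe’s actual implementation; the prompts, filename, and role prompt are illustrative:

```python
# Hypothetical "describe, then chat" pipeline -- a sketch of the concept
# above, not Pickaxe's real code. Assumes the `openai` package and an
# OPENAI_API_KEY in the environment.
import base64
from openai import OpenAI

client = OpenAI()

with open("screenshot.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# Stage 1: a vision-capable model turns the image into a detailed text
# description (color, mood, atmosphere, even written text).
described = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Describe this image in exhaustive detail: colors, mood, "
                     "layout, and any visible text."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
description = described.choices[0].message.content

# Stage 2: the plain-text description is fed to the language model powering
# the bot, alongside its role prompt.
answer = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a website critic."},  # stand-in role prompt
        {"role": "user",
         "content": "Here is a description of my landing-page screenshot:\n"
                    f"{description}\n\nPlease write a critique."},
    ],
)
print(answer.choices[0].message.content)
```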


Thanks for putting this guide together. I was a bit excited to see “vision enabled,” thinking that you guys had somehow figured out a way to connect a Pickaxe to a webcam.

Admittedly there’d still be latency, but that would basically mean you could create a Pickaxe Robotaxi.

The image analysis feature is pretty cool, though. I’ve used it for a lot of things where I couldn’t figure out what I was looking at but the LLM did.

I’ve yet to apply it to Pickaxe specifically; some fun ideas: an art-critic Pickaxe that lets you take a picture of street art and get a critique.

A fashionista Pickaxe that lets you take a picture of your outfit and helps you pick out accessories or layering items.

Other ideas that might or might not work include a plant identifier; if that worked, you could niche it down into edible plants, native plants, or poisonous plants…

It might be interesting to see how well the LLM can identify car mods: take a picture of a car and it tells you what aesthetic mods have been applied. These last two might not work, but I think the first two would.


How can we get the next (image processing) action to happen immediately?
Currently, we have a prompt: “Please upload your room image and type ‘continue’ to proceed,” which feels lame to me.
Is there some obvious prompt modification or setting we have missed?

For now, this is a limitation of the architecture. The user must say something to trigger the AI’s response. It cannot prompt itself.


OK.
First, couldn’t uploading an image count as “saying something”?
Second, could there be some kind of macro action using your Actions framework, e.g. a text field in the Editor labeled “Text to Send After File Upload”?
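
For what it’s worth, here is a hypothetical sketch of what such a “Text to Send After File Upload” setting could do under the hood: since the model only generates a reply in response to a submitted message, the client would append a canned user turn as soon as the upload completes. All names here are made up; this illustrates the suggestion, it is not an existing Pickaxe feature:

```python
# Hypothetical illustration of the suggested "Text to Send After File Upload"
# setting -- not an existing Pickaxe feature. The model cannot prompt itself,
# so the client sends a synthetic user message once an upload finishes.
from openai import OpenAI

client = OpenAI()
AUTO_MESSAGE = "I've uploaded the image. Please proceed."  # the configurable text

def on_file_uploaded(history: list[dict]) -> str:
    """Fires when an upload completes; auto-sends the canned user turn."""
    history.append({"role": "user", "content": AUTO_MESSAGE})
    reply = client.chat.completions.create(model="gpt-4o", messages=history)
    content = reply.choices[0].message.content
    history.append({"role": "assistant", "content": content})
    return content
```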

Thank you for this, Mike 🙂. Would you mind adding an option to change or remove the placeholder text? It currently says “paste a website/video link or drag a file”, but we only want the user to upload an image. I know we can set the title, but the placeholder text is a little confusing in this case. Thank you again.
