How to make a Pickaxe that can "see" (vision chatbot walkthrough)

A lot of Pickaxe users don’t realize that they can make vision chatbots that can look at and “see” images. This format lets end-users upload photos, screenshots, and other images as input to the chatbot.

We just published a tutorial video on how to make a vision-enabled chatbot on Pickaxe (and no, neither Jennifer nor Salma has gotten back to me yet). I thought it would be instructive to write a quick breakdown of the process here as well.

WHAT IS A VISION CHATBOT?
That’s just my made-up term for a chatbot that can ‘see’ images. There are many use cases for this type of chatbot. For example, here is a website critic that looks at screenshots of website landing pages and writes a critique of them. Other users have made chatbots that look at your LinkedIn profile and give feedback, write outreach messages based on Instagram grids, give tips on YouTube thumbnails, and more!

HOW TO ENABLE VISION
This is very simple. There are two steps.

  • Step 1: In the Builder, go to Configure and click “allow users to upload…”. The label is a bit misleading as of this writing: users can upload images as well as text documents.

  • Step 2: In the model dropdown, select GPT-4o or GPT-4o mini. This is important: vision will only work with these two language models. (See the sketch below for what this looks like under the hood.)
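
For context on Step 2: image input is a capability of the model itself, not of the chat interface, which is why only vision-capable models work. Here is a minimal, illustrative sketch using the OpenAI Python SDK; this is not Pickaxe’s internal code, and the filename is made up:

```python
# Illustrative sketch (not Pickaxe's internals): an image is sent as an
# "image_url" content part in an ordinary chat completion. Only
# vision-capable models such as GPT-4o / GPT-4o mini accept such parts;
# text-only models reject them with an API error.
# Assumes the `openai` package and an OPENAI_API_KEY in the environment.
import base64
from openai import OpenAI

client = OpenAI()

# Hypothetical local file standing in for the end-user's upload.
with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o-mini",  # must be vision-capable
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this photo?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```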


HOW DOES IT WORK?
Vision-enabled chatbots are a fairly simple concept as well. The chatbot “looks” at an image and writes a very detailed text description of it. That description is then fed to the language model powering your Pickaxe.

These descriptions have a surprising amount of detail. Things like color, mood, atmosphere, and even written text are all captured in them.
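
If you’re curious what that two-stage flow looks like in code, here is a rough reconstruction using the OpenAI Python SDK. This is my own sketch of the concept, not Pickaxe’s actual implementation; the prompts, filename, and role prompt are illustrative:

```python
# Hypothetical "describe, then chat" pipeline -- a sketch of the concept
# above, not Pickaxe's real code. Assumes the `openai` package and an
# OPENAI_API_KEY in the environment.
import base64
from openai import OpenAI

client = OpenAI()

with open("screenshot.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# Stage 1: a vision-capable model turns the image into a detailed text
# description (color, mood, atmosphere, even written text).
described = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Describe this image in exhaustive detail: colors, mood, "
                     "layout, and any visible text."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
description = described.choices[0].message.content

# Stage 2: the plain-text description is fed to the language model powering
# the bot, alongside its role prompt.
answer = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a website critic."},  # stand-in role prompt
        {"role": "user",
         "content": "Here is a description of my landing-page screenshot:\n"
                    f"{description}\n\nPlease write a critique."},
    ],
)
print(answer.choices[0].message.content)
```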


Thanks for putting this guide together. I was a bit excited to see “vision enabled,” thinking that you guys had somehow figured out a way to connect a Pickaxe to a webcam.

Admittedly there’d still be latency, but that would basically mean you could create a Pickaxe Robotaxi.

The image analysis feature is pretty cool, though. I’ve used it for a lot of things where I couldn’t figure out what I was looking at but the LLM did.

I’ve yet to apply it to Pickaxe specifically; some fun ideas: an art-critic Pickaxe that lets you take a picture of street art and get a critique.

A fashionista Pickaxe that lets you take a picture of your outfit and helps you pick out accessories or layering items.

Other ideas that might or might not work include a plant identifier; if that worked, you could niche it down into edible plants, native plants, or poisonous plants…

It might be interesting to see how well the LLM can identify car mods: take a picture of a car and it tells you what aesthetic mods have been applied. These last two might not work, but I think the first two would.


How can we get the next (image processing) action to happen immediately?
Currently, we have a prompt: “Please upload your room image and type ‘continue’ to proceed,” which feels lame to me.
Is there some obvious prompt modification or setting we have missed?

For now, this is a limitation of the architecture. The user must say something to trigger the AI’s response. It cannot prompt itself.


OK.
First, couldn’t uploading an image count as “saying something”?
Second, could there be some kind of macro action using your Actions framework, e.g. a text field in the Editor labeled “Text to Send After File Upload”?
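
For what it’s worth, here is a hypothetical sketch of what such a “Text to Send After File Upload” setting could do under the hood: since the model only generates a reply in response to a submitted message, the client would append a canned user turn as soon as the upload completes. All names here are made up; this illustrates the suggestion, it is not an existing Pickaxe feature:

```python
# Hypothetical illustration of the suggested "Text to Send After File Upload"
# setting -- not an existing Pickaxe feature. The model cannot prompt itself,
# so the client sends a synthetic user message once an upload finishes.
from openai import OpenAI

client = OpenAI()
AUTO_MESSAGE = "I've uploaded the image. Please proceed."  # the configurable text

def on_file_uploaded(history: list[dict]) -> str:
    """Fires when an upload completes; auto-sends the canned user turn."""
    history.append({"role": "user", "content": AUTO_MESSAGE})
    reply = client.chat.completions.create(model="gpt-4o", messages=history)
    content = reply.choices[0].message.content
    history.append({"role": "assistant", "content": content})
    return content
```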

Thank you for this, Mike 🙂. Would you mind adding an option to change or remove the placeholder text? It currently says “paste a website/video link or drag a file”, but we only want the user to upload an image. I know we can set the title, but the placeholder text is a little confusing in this case. Thank you again.
