Hi there, I have asked that any response include a link to a source. However, I’m getting quite a few 404 errors from the links Pickaxe provides. Is there a way to not show a link if it returns a 404 error? I had one response with about 8 links, and 5 of them were 404 errors.
I’m guessing this comes from URLs either being dynamically generated by the website’s platform, or the articles having since been moved or deleted.
Hey David, your reasoning is correct. Some of the links the models saw in their training data have since been moved or removed, which is why they now lead to dead pages. Additionally, the models sometimes hallucinate links that never existed.
Currently there is not a way to verify links in the outputs. I was experimenting with this myself over the weekend with a subreddit suggester tool that recommends subreddits to post in. When we find a solution we will advertise it to all users. If you find a solution let us know as well.
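In the meantime, for anyone who can post-process responses outside of Pickaxe, here is a minimal sketch of the "don't show a link if it 404s" idea. It is not a Pickaxe feature; the function names `http_status` and `drop_dead_links` and the `[link removed]` placeholder are made up for illustration, and it assumes the response is available to you as plain text.

```python
import http.client
import re
import urllib.error
import urllib.request

# Rough URL matcher: stops at whitespace and a few common closing characters.
URL_PATTERN = re.compile(r'https?://[^\s<>")\]]+')


def http_status(url, timeout=5.0):
    """Return the HTTP status code for url, or 0 if the request fails outright."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "link-check-sketch"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions; the code is still useful.
        return err.code
    except (OSError, ValueError, http.client.HTTPException):
        # DNS failures, timeouts, malformed URLs, broken responses.
        return 0


def drop_dead_links(text, get_status=http_status):
    """Replace every URL that does not return a 2xx/3xx status with a placeholder."""

    def replace(match):
        raw = match.group(0)
        url = raw.rstrip(".,;:!?")      # strip sentence punctuation from the URL
        trailing = raw[len(url):]       # and put it back afterwards
        if 200 <= get_status(url) < 400:
            return url + trailing
        return "[link removed]" + trailing

    return URL_PATTERN.sub(replace, text)
```

Calling `drop_dead_links(response_text)` makes one request per link, so it adds latency. Some servers reject `HEAD` requests even for live pages, so a real version would probably fall back to `GET` before declaring a link dead.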
If the universe for your tool is relatively limited, you may be able to upload a document listing all of the links your Pickaxe is allowed to share.
In my experience, this seems to be the easiest way to prevent hallucinated links.
In ChatGPT, I have occasionally had success by telling it to visit and verify each link before sharing it, but that approach doesn’t seem to work consistently or transfer to the Pickaxe environment.
Love the idea of uploading a list of URLs. Is the root domain sufficient, or do you need to add the actual URL for each piece of content in the list of links?
I’ll try the visit-and-verify approach. Thanks. You might just have solved the problem for me.
The root domain wouldn’t be sufficient. If you want it to draw solely from one root domain, I’d download the site map for the domain and upload it into the knowledge base and reference it accordingly in the prompt.
Has anyone successfully done this? What are the exact steps for "if you want it to draw solely from one root domain, I’d download the site map for the domain and upload it into the knowledge base and reference it accordingly in the prompt"?
I’ve downloaded my XML sitemap but I can’t upload it to the knowledge base. It won’t let me select the file. Does it need to be a file type other than XML?
I would put the Sitemap JSON into the actual prompt so it will ALWAYS be in the context window of the Pickaxe.
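For anyone starting from an XML sitemap, a minimal sketch of turning it into a JSON list you can paste into the prompt might look like this. It assumes the file follows the standard sitemaps.org format; the function names and the `allowed_urls` key are illustrative, not anything Pickaxe requires.

```python
import json
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_data):
    """Return every <loc> URL found in a sitemap given as str or bytes."""
    root = ET.fromstring(xml_data)
    urls = []
    for loc in root.findall(".//sm:loc", SITEMAP_NS):
        if loc.text and loc.text.strip():
            urls.append(loc.text.strip())
    return urls


def urls_as_prompt_json(urls):
    """Format the URL list as a small JSON document to paste into the prompt."""
    return json.dumps({"allowed_urls": urls}, indent=2)
```

To use it, read your downloaded sitemap file, pass its contents to `sitemap_urls`, and print the result of `urls_as_prompt_json`. If your sitemap is actually a sitemap index, the URLs you get back will point at other sitemap files rather than pages, so you would need to repeat the step for each of them.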
Your Pickaxe is not always looking at the full Knowledge Base, only the most relevant parts of the document. It does ALWAYS look at the full Role, though. The new message insights feature in the builder should provide considerable visibility into this.
Should I put the contents of the Sitemap JSON file into the prompt, or reference the file name in the prompt? And how would I write this out as an instruction?
I don’t fully understand what this means: “Your Pickaxe is not always looking at the full Knowledge Base, only the most relevant parts of the document. It does ALWAYS look at the full Role, though.”
If you’re having trouble with the JSON file in the Knowledge Base, I would try adding it into the Role (the place where the prompt goes).
The Knowledge Base is a huge store of information. It can hold multiple thousand-page books, and it can vastly exceed the context window of AI models. For example, GPT-4o has a context window (the number of tokens it can process at once) of only 128,000 tokens. A Knowledge Base with many books in it will have millions, or even hundreds of millions, of tokens.
Each time your Pickaxe generates a response, it is unable to look at the entirety of the Knowledge Base. Instead it looks at the most relevant pieces of the knowledge base given the user’s most recent message.
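To make that concrete, here is a toy sketch of "look only at the most relevant pieces". The thread does not describe how Pickaxe actually scores relevance, so this uses simple word overlap as a stand-in; the function names and the choice of `k` are assumptions for illustration only.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_chunks(chunks, message, k=2):
    """Return the k chunks that share the most words with the user's message."""
    query = tokenize(message)
    # sorted() is stable, so chunks with equal scores keep their original order.
    ranked = sorted(chunks, key=lambda c: len(query & tokenize(c)), reverse=True)
    return ranked[:k]
```

The point is that anything outside the top few chunks never reaches the model for that message. A URL list stored in the Knowledge Base can therefore be missing from a given response, while text in the Role is always included.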
OK, I’ve added a list of URLs in the Role prompt box and adjusted the language in the instructions. I also added instructions in the prompt injection box to only use the list of URLs. The AI bot continues to make up URLs during follow-up questions.
This is very frustrating!
Any other ideas on how to prevent it from making up bad URLs? Is there an AI model that works best for this?
I’m having this exact same issue. I’ve uploaded a CSV file of all my site’s articles and links, but the bot continues to make up URLs that don’t exist in the answers it gives back.
@admin_mike or @timothy, could you share what was changed in the prompt to get it to give out accurate links consistently?
I’ve tried adding explicit instructions to only use the links provided, and adjusting settings, to no avail. Would love to know what you changed to fix it.
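While waiting for an answer, one workaround that does not rely on the model obeying instructions is to check each response against the same list after the fact. This is only a sketch, and it assumes you can run responses through your own code outside of Pickaxe. The `url` column name and both function names are hypothetical, so adjust them to match your CSV.

```python
import csv
import io
import re

# Rough URL matcher: stops at whitespace and a few common closing characters.
URL_PATTERN = re.compile(r'https?://[^\s<>")\]]+')


def load_allowed_urls(csv_text, column="url"):
    """Read the allowed URLs from CSV text, ignoring trailing slashes."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[column].strip().rstrip("/") for row in reader if row.get(column)}


def find_unlisted_urls(response, allowed):
    """Return the URLs in the response that are not in the allowed set."""
    found = (m.group(0).rstrip(".,;:!?") for m in URL_PATTERN.finditer(response))
    return [url for url in found if url.rstrip("/") not in allowed]
```

If `find_unlisted_urls` returns anything, the response contains a link that is not in your list, and you can strip it or regenerate the answer. The comparison here is an exact match apart from trailing slashes, so differences such as `http` versus `https` or tracking parameters would need extra normalization.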