404 errors on lots of cited sources

DavidOswald · July 20, 2024, 11:30am

Hi there, I have asked that every response include a link to a source. However, I’m getting quite a few 404 errors in the links Pickaxe provides. Is there a way to not show a link if it comes up with a 404 error? I had one response with about 8 links, and 5 of them were 404 errors.

I’m guessing this comes from URLs either being dynamically generated by the website’s platform or the articles having since been moved or deleted.

admin_mike · July 22, 2024, 6:24pm

Hey David, your reasoning is correct. Sometimes links the models saw in their training data have since been moved or taken down, which is why some of them lead to dead pages. Additionally, the models sometimes hallucinate links that never existed.

Currently there is not a way to verify links in the outputs. I was experimenting with this myself over the weekend with a subreddit suggester tool that recommends subreddits to post in. When we find a solution we will advertise it to all users. If you find a solution let us know as well.
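For anyone post-processing responses outside of Pickaxe in the meantime, here is a minimal sketch of what a link check could look like. It is not a Pickaxe feature; the function names (`extract_urls`, `is_reachable`, `split_links`) are made up for illustration, and it assumes you can get the response text into your own Python script.

```python
import re
import urllib.request
from typing import Callable

# Rough URL matcher; good enough for a sketch, not a full URL grammar.
URL_PATTERN = re.compile(r"https?://[^\s<>()]+")


def extract_urls(text: str) -> list[str]:
    """Return the URLs found in text, in order, without trailing punctuation."""
    return [match.rstrip(".,;:!?") for match in URL_PATTERN.findall(text)]


def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if a HEAD request to url answers with a status below 400."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        # Covers HTTP errors such as 404, network failures, and malformed URLs.
        return False


def split_links(
    text: str, check: Callable[[str], bool] = is_reachable
) -> tuple[list[str], list[str]]:
    """Split the URLs in text into (reachable, broken) using the given checker."""
    reachable, broken = [], []
    for url in extract_urls(text):
        (reachable if check(url) else broken).append(url)
    return reachable, broken
```

Calling `split_links(response_text)` returns the links that answered and the ones that did not, so the broken ones can be dropped before the text is shown. Some servers reject HEAD requests, so a real version would probably need a GET fallback.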

DavidOswald · July 22, 2024, 6:55pm

Thanks Mike

I’ll keep plugging away at it to see if I can come up with a solution in the prompt in the meantime.

Absolutely loving my Pickaxe projects!!

intellibotique · July 23, 2024, 2:44pm

If the universe for your tool is relatively limited, you may be able to upload a document with all of the appropriate links allowed by your Pickaxe to share.

In my experience, this seems to be the easiest way to prevent hallucinated links.

In ChatGPT, I have occasionally had success by telling it to visit and verify each link before sharing it, but that approach doesn’t seem to work consistently or transfer to the Pickaxe environment.

DavidOswald · July 23, 2024, 2:57pm

Hi Intellibotique,

Love the idea of uploading a list of URLs. Is the root domain sufficient, or do you need to add the actual URL for each piece of content in the list of links?

I’ll try the visit-and-verify approach. Thanks. You might just have solved the problem for me.

intellibotique · July 23, 2024, 4:17pm

The root domain wouldn’t be sufficient — if you want it to draw solely from one root domain, I’d download the site map for the domain and upload it into the knowledge base and reference it accordingly in the prompt.
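As a rough sketch of the "download the site map" step: a standard sitemap is an XML file with one `<loc>` entry per page, and it can be flattened into a plain list of URLs with Python's standard library. The sample sitemap and the `sitemap_urls` name are illustrative only, and which file types the knowledge base uploader accepts is something to check for yourself.

```python
import json
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files (sitemaps.org protocol).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> list[str]:
    """Return every <loc> value from a sitemap document, in file order."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]


# Made-up sample standing in for a downloaded sitemap.xml.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/blog/first-post</loc></url>
</urlset>"""

urls = sitemap_urls(sample)
as_json = json.dumps(urls, indent=2)  # one option for an uploadable file
as_text = "\n".join(urls)             # one URL per line, easy to paste into a prompt
print(as_text)
```

For a real file, read it with `open("sitemap.xml", encoding="utf-8").read()` and pass the result to `sitemap_urls`. Large sites often use a sitemap index that points to several child sitemaps, which this sketch does not follow.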

DavidOswald · July 29, 2024, 7:25am

Ah, using the site map is a clever idea! I’ll try that. Thanks for your advice!

timothy · October 8, 2024, 3:04pm

Has anyone successfully done this? What are the exact steps for “if you want it to draw solely from one root domain, I’d download the site map for the domain and upload it into the knowledge base and reference it accordingly in the prompt”?

I’ve downloaded my XML sitemap, but I can’t upload it to the knowledge base: it won’t let me select the file. Does it need to be a file type other than XML?

timothy · October 10, 2024, 3:49pm

Ok, I converted my xml sitemap to JSON and got it uploaded.

I’ve included this instruction:

Only provide URLs contained in the knowledge base documents.

The responses are still making up URLs that return 404 errors. What else can I do to fix this??

admin_mike · October 10, 2024, 4:45pm

I would put the Sitemap JSON into the actual prompt so it will ALWAYS be in the context window of the Pickaxe.

Your Pickaxe is not always looking at the full Knowledge Base, only at the most relevant parts of the documents. It does ALWAYS look at the full Role, though. The new message insights feature in the builder should provide considerable visibility into this.

timothy · October 10, 2024, 6:10pm

Ok, thanks but I’m not 100% clear.

Should I put the contents of the Sitemap JSON file into the prompt, or reference the file name in the prompt? And how would I write this out as an instruction?

I don’t fully understand what this means: “Your Pickaxe is not always looking at the full Knowledge Base. Only the most relevant parts of the document. It does ALWAYS at the full role though.”

admin_mike · October 10, 2024, 7:04pm

If you’re having trouble with the JSON file in the Knowledge Base, I would try adding it into the Role (the place where you write the prompt).

The Knowledge Base is a huge store of information. It can hold multiple thousand-page books, so it can vastly exceed the context window of AI models. For example, GPT-4o has a context window (the number of tokens it can process at once) of only 128,000 tokens. A Knowledge Base with many books in it will contain millions, or even hundreds of millions, of tokens.

Each time your Pickaxe generates a response, it is unable to look at the entirety of the Knowledge Base. Instead it looks at the most relevant pieces of the knowledge base given the user’s most recent message.

You can read more about the process in this forum discussion.
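To make the idea concrete, here is a toy illustration of "only the most relevant pieces". It is not how Pickaxe actually retrieves (that is not described in this thread, and real systems are likely embedding-based); it scores chunks by simple word overlap, and the chunks and the `top_chunks` name are invented for the example.

```python
import re


def words(text: str) -> set[str]:
    """Lowercase word set used for a crude relevance score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_chunks(chunks: list[str], message: str, k: int = 2) -> list[str]:
    """Return the k chunks that share the most words with the message."""
    query = words(message)
    ranked = sorted(chunks, key=lambda chunk: len(words(chunk) & query), reverse=True)
    return ranked[:k]


# Made-up knowledge base: each chunk carries one URL.
knowledge_base = [
    "Pricing plans and billing: https://example.com/pricing",
    "How to reset your password: https://example.com/help/reset-password",
    "Shipping times and returns: https://example.com/help/shipping",
]

retrieved = top_chunks(knowledge_base, "How do I reset my password?", k=1)
context = "\n".join(retrieved)

# Only the password chunk reaches the model; the other URLs are invisible
# for this message, so the model cannot copy them from the knowledge base.
print(context)
print("https://example.com/pricing" in context)
```

The point of the sketch is the last line: a URL that is not in a retrieved chunk is simply not available to the model for that message, whereas anything in the Role is present every time.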

timothy · October 11, 2024, 6:19pm

Ok, I’ve added a list of URLs in the Role prompt box and adjusted the language in the instructions. I also added instructions to only use the list of URLs in the prompt injection box. The AI bot continues to make up URLs in response to follow-up questions.
This is very frustrating!
Any other ideas on how to prevent it from making up bad URLs? Is there an AI model that would be best to use for this?

admin_mike · October 11, 2024, 6:27pm

Can I have permission to go into your tool?

timothy · October 11, 2024, 7:15pm

Yes that is fine. Let me know your thoughts

admin_mike · October 11, 2024, 7:19pm

Okay. I re-structured your prompt and a couple settings. I tried it out and got 5/5 valid, correct links.

See if it works now.

timothy · October 12, 2024, 12:00am

Awesome, thank you, it is working much better now!

admin_mike · October 12, 2024, 12:32am

Glad to help! It was just a matter of prompt engineering.

apwp · October 23, 2024, 7:34pm

I’m having this exact same issue. I’ve uploaded a CSV file of all my site’s articles and links, but the bot continues to make up URLs that don’t exist in the answers it gives back.

@admin_mike or @timothy, could you share what was changed in the prompt to get it to give out accurate links consistently?

I’ve tried adding explicit instructions to only use the links provided, and adjusting settings, to no avail. Would love to know what you changed to fix it.

timothy · October 24, 2024, 12:41am

You will need to add a list of your URLs in the Role prompt box. I’ve put this at the end of my prompts.
Above this list, you need an instruction to only use URLs from the list below.
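
If you keep your URL list in a file, the layout described above (instruction first, list last) can be generated rather than pasted by hand. This is only a sketch: the rule wording is an assumption and not the exact text used in the fixed Pickaxe, and `build_role_prompt` and `unlisted_urls` are hypothetical helper names.

```python
import re

# Rough URL matcher; good enough for a sketch, not a full URL grammar.
URL_PATTERN = re.compile(r"https?://[^\s<>()]+")


def build_role_prompt(instructions: str, allowed_urls: list[str]) -> str:
    """Put the URL rule directly above the list, with the list at the end."""
    # Assumed wording; adjust to suit your own tool.
    rule = (
        "Only share URLs that appear in the list below, copied character for character. "
        "If no listed URL fits, say so and share no link."
    )
    url_lines = "\n".join(f"- {url}" for url in allowed_urls)
    return f"{instructions.strip()}\n\n{rule}\n\nAllowed URLs:\n{url_lines}"


def unlisted_urls(response: str, allowed_urls: list[str]) -> list[str]:
    """Return URLs in a response that are not in the allowed list."""
    allowed = set(allowed_urls)
    found = [match.rstrip(".,;:!?") for match in URL_PATTERN.findall(response)]
    return [url for url in found if url not in allowed]


allowed = ["https://example.com/pricing", "https://example.com/help/shipping"]
prompt = build_role_prompt("You are a support assistant.", allowed)
print(prompt)

reply = "Try https://example.com/pricing or https://example.com/made-up."
print(unlisted_urls(reply, allowed))
```

The second helper is a way to spot-check test conversations: paste a reply in and see whether any link falls outside your list.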
