404 errors on lots of cited sources

Hi there, I have asked that every response include a link to a source. However, I’m getting quite a few 404 errors from the links Pickaxe provides. Is there a way to not show a link if it returns a 404 error? One response had about 8 links, and 5 of them were 404 errors.

I’m guessing this happens because the URLs are dynamically generated by the website’s platform, or because the articles have since been moved or deleted.

Hey David, your reasoning is correct. Some links the models saw in their training data have since been moved or deleted, which is why they lead to dead pages. Additionally, the models sometimes hallucinate links that never existed.

Currently there is not a way to verify links in the outputs. I was experimenting with this myself over the weekend with a subreddit suggester tool that recommends subreddits to post in. When we find a solution we will advertise it to all users. If you find a solution let us know as well.
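
Until link checking exists inside the tool, one workaround is to check a response's links outside of Pickaxe before publishing or sharing it. The sketch below is only an illustration, not a Pickaxe feature: the names extract_urls, http_status and split_links are made up for this example, and it assumes you can copy the response text into a script of your own.

```python
import re
import urllib.error
import urllib.request

# Rough URL matcher; stops at whitespace, quotes, and brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>\"'()\[\]]+")


def extract_urls(text):
    """Return URLs found in text, in order, with trailing punctuation stripped."""
    return [u.rstrip(".,;:!?") for u in URL_PATTERN.findall(text)]


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the request fails outright."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "link-check-sketch"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 for a missing page
    except (OSError, ValueError):
        return None  # DNS failure, timeout, malformed URL, etc.


def split_links(text, status_of=http_status):
    """Split the URLs in text into (working, broken) lists."""
    working, broken = [], []
    for url in extract_urls(text):
        status = status_of(url)
        if status is not None and status < 400:
            working.append(url)
        else:
            broken.append(url)
    return working, broken
```

Some sites reject HEAD requests or block scripted traffic, so a link reported as broken here is worth opening by hand before you discard it.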

Thanks Mike

I’ll keep plugging away at it to see if I can come up with a solution in the prompt in the meantime.

Absolutely loving my Pickaxe projects!!

If the universe for your tool is relatively limited, you may be able to upload a document listing all of the links your Pickaxe is allowed to share.

In my experience, this seems to be the easiest way to prevent hallucinated fake links.

In ChatGPT, I have occasionally had success by telling it to visit and verify each link before sharing it, but that approach doesn’t seem to work consistently or transfer to the Pickaxe environment.

Hi Intellibotique,

Love the idea of uploading a list of URLs. Is the root domain sufficient, or do you need to add the actual URL for each piece of content in the list of links?

I’ll try the visit-and-verify approach. Thanks. You might just have solved the problem for me.

The root domain wouldn’t be sufficient — if you want it to draw solely from one root domain, I’d download the site map for the domain and upload it into the knowledge base and reference it accordingly in the prompt.

Ah, using the site map is a clever idea! I’ll try that. Thanks for your advice!

Has anyone successfully done this? What are the exact steps for "if you want it to draw solely from one root domain, I’d download the site map for the domain and upload it into the knowledge base and reference it accordingly in the prompt"?

I’ve downloaded my XML sitemap but I can’t upload it to the knowledge base. It won’t let me select the file. Does it need to be a file type other than XML?

Ok, I converted my xml sitemap to JSON and got it uploaded.
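
For anyone wanting to do the same conversion, here is a minimal sketch using only the Python standard library. It assumes a standard sitemap.xml that uses the sitemaps.org namespace; the function names and the "allowed_urls" key are invented for this example and are not anything Pickaxe requires.

```python
import json
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap.xml files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(xml_text):
    """Return every <loc> value from a sitemap document, in file order."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


def sitemap_to_json(xml_text):
    """Serialize the sitemap's URLs as a small JSON document."""
    return json.dumps({"allowed_urls": sitemap_urls(xml_text)}, indent=2)
```

To use it, read your downloaded sitemap file into a string, pass it to sitemap_to_json, and save the result as a .json file. A sitemap index file (one that only points to other sitemaps) would return those sitemap addresses and not page URLs, so each child sitemap would need converting separately.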

I’ve included this instruction:

  • Only provide URLs contained in the knowledge base documents.

The responses are still making up URLs that return 404 errors. What else can I do to fix this??

I would put the Sitemap JSON into the actual prompt so it will ALWAYS be in the context window of the Pickaxe.

Your Pickaxe is not always looking at the full Knowledge Base. Only the most relevant parts of the document. It does ALWAYS look at the full role though. The new message insights feature in the builder should provide considerable visibility into this.

Ok, thanks but I’m not 100% clear.

Should I put the contents of the Sitemap JSON file into the prompt, or reference the file name in the prompt? And how would I write this out as an instruction?

I don’t fully understand what this means: “Your Pickaxe is not always looking at the full Knowledge Base. Only the most relevant parts of the document. It does ALWAYS look at the full role though.”

If you’re having trouble with the JSON file in the Knowledge Base, I would try adding it into the Role (the place where you write the prompt).

The Knowledge Base is a huge store of information. It can hold multiple thousand-page books, so it can vastly exceed the context window of AI models. For example, GPT-4o has a context window (the number of tokens it can process at once) of only 128,000 tokens. A Knowledge Base with many books in it can contain millions or even hundreds of millions of tokens.

Each time your Pickaxe generates a response, it is unable to look at the entirety of the Knowledge Base. Instead it looks at the most relevant pieces of the knowledge base given the user’s most recent message.
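
To make that concrete, here is a toy illustration of "only the most relevant pieces" retrieval. It is not how Pickaxe actually works: real systems typically rank chunks with embeddings, and the simple word-overlap scoring, the function names, and the sample chunks below are all assumptions made for the example.

```python
import re


def words(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_chunks(chunks, message, k=2):
    """Rank chunks by how many words they share with the message; keep the top k."""
    query = words(message)
    ranked = sorted(chunks, key=lambda c: len(words(c) & query), reverse=True)
    return ranked[:k]


# A tiny stand-in for a Knowledge Base split into chunks.
chunks = [
    "How to prune apple trees in winter",
    "Watering schedule for tomato seedlings",
    "https://example.com/guides/pruning-apple-trees",
    "https://example.com/guides/tomato-watering",
]

# Only these k chunks would be placed in front of the model for this message.
context = top_chunks(chunks, "When should I prune my apple trees?", k=2)
```

Here the tomato URL never reaches the model for this message. Scale that up to a sitemap with hundreds of entries and most of the allowed URLs are invisible on any given turn, which is why a list kept in the Role behaves differently from one kept in the Knowledge Base.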

You can read more about the process in this forum discussion.

Ok, I’ve added a list of URLs in the Role prompt box and adjusted the language in the instructions. I also added instructions in the prompt injection box to only use URLs from that list. The AI bot continues to make up URLs during follow-up questions.
This is very frustrating!
Any other ideas on how to prevent it from making up bad URLs? Is there an AI model that would be best to use for this?

Can I have permission to go into your tool?

Yes that is fine. Let me know your thoughts

Okay. I re-structured your prompt and a couple settings. I tried it out and got 5/5 valid, correct links.

See if it works now.
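
A spot check like the "5/5 valid links" test above can be repeated after each prompt change by comparing the links in a response with the list of allowed URLs. This is only a sketch for testing by hand outside Pickaxe; score_links is a made-up helper name, and the trailing-slash normalization is an assumption about how you want near-matches treated.

```python
import re

# Rough URL matcher; stops at whitespace, quotes, and brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>\"'()\[\]]+")


def score_links(response_text, allowed_urls):
    """Return (valid, invented): response URLs that are or are not in the allowed list."""
    # Ignore a trailing slash so ".../page" and ".../page/" count as the same URL.
    allowed = {u.rstrip("/") for u in allowed_urls}
    found = [u.rstrip(".,;:!?") for u in URL_PATTERN.findall(response_text)]
    valid = [u for u in found if u.rstrip("/") in allowed]
    invented = [u for u in found if u.rstrip("/") not in allowed]
    return valid, invented
```

Paste a response in as response_text and pass the same URL list you put in the Role. Anything in the invented list is a link the model did not copy from your list.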

Awesome, thank you, it is working much better now!

Glad to help! It was just a matter of prompt engineering.

I’m having this exact same issue. I’ve uploaded a CSV file of all my site’s articles and links, but the bot continues to make up URLs that don’t exist in the answers it gives back.

@admin_mike or @timothy, could you share what was changed in the prompt to get it to give out accurate links consistently?

I’ve tried adding explicit instructions to only use the links provided, and adjusting settings, to no avail. Would love to know what you changed to fix it.

  1. You will need to add a list of your URLs in the Role prompt box. I’ve put this at the end of my prompts.
  2. Above this list, you need an instruction to only use URLs from the list below.
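
The two steps above can be sketched as a small helper that assembles the Role text: instructions first, then the restriction, then the URL list at the very end. The exact wording that was used in the fixed tool was not shared in this thread, so the instruction sentence, the "ALLOWED URLS" heading, and the function name below are assumptions to adapt, not the original prompt.

```python
def build_role_prompt(instructions, allowed_urls):
    """Return Role text with the URL restriction placed directly above the URL list."""
    url_lines = "\n".join(f"- {url}" for url in allowed_urls)
    return (
        f"{instructions.strip()}\n\n"
        "Only share URLs that appear in the list below, copied exactly. "
        "If no listed URL fits, say so and do not share a link.\n\n"
        "ALLOWED URLS:\n"
        f"{url_lines}"
    )
```

Calling build_role_prompt("You are a gardening assistant.", ["https://example.com/a"]) returns text you can paste into the Role box. Keeping the list in the Role means it is in the context window on every turn, at the cost of using up part of that window.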