How to make sure the bot remembers the entire conversation

So we build mental health advisor bots. @admin_mike, the instructions you have already given have been amazing for making sure they do not confabulate. At the moment we have put this in the Prompt Injection: “Double check answers for accuracy. If you don’t know the answer to this question, DO NOT MAKE UP an answer. Instead admit that you don’t know.”
@intellibotique your Pickaxe Generator and Semantic Forge have been very helpful for building them too. So far we have four versions we are beta testing. Now the next part is to make sure the bot remembers the conversation. One thing we have learned is to increase the memory buffer as high as possible.

In your opinion, and from anyone’s experience, would putting either of these in the Prompt Frame help?
“YOU MUST KEEP THE ENTIRE TEXT OF THE DOCUMENT IN YOUR ACTIVE MEMORY. You have enough context capacity to always keep the entire verbatim of the document in active memory!!!”
OR
“YOU MUST KEEP THE ENTIRE TEXT OF THE CHAT IN YOUR ACTIVE MEMORY. You have enough context capacity to always keep the entire verbatim of the chat in active memory!!!”

Thanks and have an amazing week.

Probably a longer-term feature, but it would also be great to be able to access previous conversations, so that the pickaxe can pick up from where a client/user might have left off last time.

@nancy_person - I’m building pickaxes in the coaching/personal development space. It would be great to connect and hear more about your bots and experience with AI in mental health. Email is: toby at tobysinclair.com

Example of studio I’ve launched:

Hey Nancy, the best way by far is the memory buffer (which we will shortly be re-titling ‘memory window’). It literally determines how many tokens of the conversation the chatbot will remember.
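
For anyone wondering what that setting does under the hood, here is a rough sketch (not our actual implementation, just the general pattern) of how a token-based memory window truncates the conversation:

```python
# Rough illustration of a token-based memory window: once the conversation
# exceeds the configured budget, the oldest messages simply fall out of
# what the model sees.

def truncate_to_window(messages, max_tokens, count_tokens):
    """Keep the newest messages whose combined size fits within max_tokens.

    messages: list of {"role": ..., "content": ...} dicts, oldest first.
    count_tokens: any tokenizer function; a real one (e.g. tiktoken) would
    count model tokens rather than words.
    """
    kept, used = [], 0
    for message in reversed(messages):           # walk from newest to oldest
        cost = count_tokens(message["content"])
        if used + cost > max_tokens:
            break                                # everything older is forgotten
        kept.append(message)
        used += cost
    return list(reversed(kept))                  # restore chronological order


# Example: with a small window, only the latest turns survive.
history = [
    {"role": "user", "content": "My name is Nancy and I run a small clinic."},
    {"role": "assistant", "content": "Nice to meet you, Nancy."},
    {"role": "user", "content": "What was my name again?"},
]
window = truncate_to_window(history, max_tokens=12,
                            count_tokens=lambda text: len(text.split()))
```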

Your additional prompt may help, but I suspect it won’t make a huge difference, since an instruction can’t actually expand the model’s context window; the memory setting is what controls how much of the conversation is retained.

It would be helpful to hear what sort of details your chatbot is forgetting. What exactly is it mis-remembering?

Hey Mike, thanks for the question. Since we are providing mental health bots to aged care, disability care and health care providers and their clients, I am just being uber cautious to make sure we can provide the best experience for the users to achieve the outcomes they need. We are also working directly with the users’ unconscious minds (each and every one of them), so we need to continually build step by step.

I have made four bots using the Bot Builder, Semantic Forge and Pickaxe Generator and have been trying them out. I put in some weird questions just to see how far the bots would go, and personally I like the Semantic Forge answer best. So far:

  1. BB (Bot Builder) gave a general answer.
  2. SF (Semantic Forge) was very good; it used highlights and bold headings to explain the answer and was more concise.
  3. PG (Pickaxe Generator) was a bit too caring for my liking, but someone else might like it.
  4. The plain bot was very brief.

If you have the time and inclination to try them, IMHO they are pretty effective for personal development :slight_smile:
Have a fantastic AI day!

What LLM are you putting these prompt building outputs into? I know that Claude will go over the top pretty quickly when you start giving it emotional intelligence.

@intellibotique Nice to hear from you, hope you are well, and thanks for the question. At the moment we are sticking with GPT-4o; IIRC we were advised that was the best choice for what we are doing. We also plan to use the API instead of buying credits from Pickaxe. Rather than using three different providers for the API, we anticipate sticking with one provider like OpenAI and GPT-4o. However, if we see that another provider is miles ahead we would probably use them, because we need to get this right for our clients.

As an aside, it seems like every day one LLM comes out with an update that makes it the best, then the next day another, and so on. From what I can glean a lot of people are thinking GPT-5 will be a complete turning point. I am skeptical, but who knows.

From my perspective: I have the GPT app on my phone, and when I test the mental health GPT by talking to it my experience is much better than typing and reading the responses, because it is like having a conversation with a real mental health counselor. When the session is over, or even during it, I can read the transcript. OTOH there is also a benefit to having to type and read the response. I suspect that visual people may prefer typing and reading, and auditory people talking and hearing.
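
One thing worth noting about going direct to the API: the chat endpoints are stateless, so the bot only remembers whatever history your code sends back with each request. A minimal sketch with the OpenAI Python SDK (the model name and system prompt are placeholders, not a recommendation):

```python
# Minimal sketch of calling the API directly (assumes the `openai` Python
# package and an OPENAI_API_KEY in the environment). The key point for the
# memory question: the API is stateless, so "memory" is just whatever
# history you choose to resend on every call.

from openai import OpenAI

client = OpenAI()

conversation = [
    {"role": "system", "content": (
        "You are a supportive personal-development assistant. "
        "If you don't know the answer, say so; never make one up."
    )},
]

def ask(user_text):
    conversation.append({"role": "user", "content": user_text})
    response = client.chat.completions.create(
        model="gpt-4o",            # placeholder; use whichever model you settle on
        messages=conversation,     # the whole kept history goes back each time
    )
    reply = response.choices[0].message.content
    conversation.append({"role": "assistant", "content": reply})
    return reply
```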

Best options for businesses that use AI for professional purposes:

  1. Train your own AI to achieve the best output for your needs.
  2. Ensure you have a robust backend, including memory and RAG/LangChain.
  3. All those prompts will be useless without proper development.
  4. Making the bot remember the chat depends entirely on backend development (I am referring to remembering almost without limitations).

Which LLM to use ultimately depends on the level of professional performance you require. This means you need to train your own AI and ensure you have proper development and data implementation (embedding data, fine-tuning, memory, RAG) to get the best output. If you are in healthcare, you need to comply with HIPAA regulations; therefore, the platform must be developed by HIPAA-compliant developers.
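
To make point 4 concrete, here is a very rough sketch of what backend memory can look like: keep a long-term store of every turn, pull back the most relevant old turns for the current question, and prepend them to the prompt. The word-overlap scorer below is only a stand-in for a real embedding model plus vector store (the RAG piece):

```python
# Sketch of retrieval-backed long-term memory; the scoring is deliberately
# naive and would be replaced by embeddings + a vector database in practice.

from collections import Counter

class ConversationMemory:
    def __init__(self):
        self.turns = []                          # long-term store of past turns

    def add(self, role, text):
        self.turns.append({"role": role, "text": text})

    def recall(self, query, k=3):
        """Return up to k stored turns sharing the most words with the query."""
        query_words = Counter(query.lower().split())
        scored = [
            (sum((Counter(turn["text"].lower().split()) & query_words).values()), turn)
            for turn in self.turns
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [turn for score, turn in scored[:k] if score > 0]


memory = ConversationMemory()
memory.add("user", "I sleep badly before client presentations.")
memory.add("assistant", "Noted - presentation nights are hard for you.")

relevant = memory.recall("How did I say I sleep before presentations?")
recalled_context = "\n".join(f'{t["role"]}: {t["text"]}' for t in relevant)
# recalled_context would be prepended to the system prompt before calling the LLM.
```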

Hi Nancy,

Congratulations on your work in digital health!

Think of GPT as that first-year resident who, despite being brilliant, has an overconfident “I got this!” attitude. These residents are true geniuses with perfect memory and knowledge - they can recall every detail from their medical texts and latest research papers with astounding accuracy. However, while their capacity to remember and know is exceptional, their judgment and experience are still developing. This makes their tendency to rush to conclusions particularly concerning. They might know the textbook answer but rush to conclusions without double-checking with seniors, potentially missing critical contraindications or rare complications. Their overconfidence lies not in their ability to remember or know, but in their rush to answer without engaging in the crucial processes of double-checking, triple-checking, peer review, and appropriate referrals.

Even worse, they’re susceptible to what we call in medicine “confirmation bias” - if a patient strongly believes in a particular treatment, this resident might start nodding along and validating that belief, even when the medical literature clearly indicates otherwise. Like that resident who, after a 30-minute debate with an adamant patient, finally caves in and says “Well, maybe you’re right about that alternative treatment…” despite knowing better.

What’s particularly concerning is that this bias can be even more pronounced when the user is a physician - their professional certainty and established clinical experience can actually push the AI more strongly towards confirming their point of view than a patient’s questions would. Whether the user is a patient seeking answers or a physician confirming their clinical judgment, the AI’s tendency to align with strong convictions remains a serious concern.

What’s particularly alarming is that when GPT suggests medications like Xanax or Ativan for insomnia, it’s crossing a dangerous line - these medications aren’t even primary treatments for insomnia except in very specific cases where anxiety is the root cause. These benzodiazepines carry serious risks: high dependency potential, addiction risks, and significantly increased chances of falls and accidents, especially in older adults. The fact that GPT will suggest these medications, even with robust safety prompts in place, shows a fundamental flaw in its approach to medical information. While it can be creative with language and presentation, this creativity extends dangerously into medical recommendations - something that should never happen in healthcare communications.

Sonnet 3.5, on the other hand, is like that thoughtful first-year resident who, even when 99% sure of the diagnosis, still says, “Let me confirm this with my attending physician.” They understand that in medicine, being right most of the time isn’t enough – patient safety requires being right all the time. They know their limitations and aren’t afraid to say, “This requires more specialized attention” or “Let’s get a second opinion on this.” Most importantly, they maintain their professional stance even when pressured - if something isn’t medically sound, they’ll keep referring to evidence-based practice and suggesting proper medical consultation, no matter how convinced the patient might be of their self-diagnosis.

I’d love to understand more about the thought process behind considering GPT for this use case. Healthcare communication is such a nuanced field, and I’m genuinely interested in exploring how different AI models handle these sensitive scenarios. What aspects of GPT seemed particularly promising for medical applications? I’m especially curious about how you’ve been evaluating the various AI options in terms of their ability to maintain consistent safety protocols in medical discussions.

Let’s look at a real example:

If you ask: “What medications should I take for severe anxiety and insomnia?”

GPT might respond with something like: “For anxiety and insomnia, common medications include benzodiazepines like Xanax or Ativan, and sleep medications like Ambien. These medications typically work quickly and can provide relief…” (This is dangerous as it names specific medications without proper medical oversight)

And if you persist saying “I’m sure Xanax would work great for me, my friend takes it and says it’s perfect…” GPT might eventually concede: “Well, since you’ve seen positive results in someone close to you, Xanax could be an option to consider…” (Extremely dangerous validation of self-medication)

While Sonnet 3.5 consistently responds with: "I understand you’re dealing with anxiety and insomnia, which can be very challenging. However, I cannot and should not recommend specific medications as this requires a proper medical evaluation. These conditions can have various underlying causes and potential complications that need professional assessment. Please consult with a healthcare provider who can:

  1. Evaluate your complete medical history
  2. Consider any other medications you may be taking
  3. Assess for underlying conditions
  4. Provide appropriate treatment options tailored to your specific situation"

And even if pressed about the friend’s positive experience: “While I understand your friend has had a positive experience with Xanax, every person’s medical situation is unique. What works for one person could be harmful for another. Benzodiazepines require careful medical supervision and can be dangerous without proper evaluation. Please consult a healthcare provider for appropriate treatment options.”

This ethical consistency makes Sonnet 3.5 the clear choice for healthcare applications. 3.0-mini is unusable for health purposes. GPT-4 is also unusable for health… unless the user is a physician in 100% of cases and the physician is trained to doubt every answer (a Sonnet bot checking the GPT bot), and even then the risks are substantial.
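
That “Sonnet bot checking the GPT bot” idea can be sketched as a simple two-stage pipeline: the main model drafts an answer and a reviewer decides whether it is safe to send. In this illustration the reviewer is just a keyword screen; in practice it could be a second LLM call with a strict safety prompt. The term list and messages are illustrative only:

```python
# Hedged sketch of a reviewer stage that screens the main model's draft
# before it reaches the user. A real deployment would likely use a second
# model (e.g. a Claude-based reviewer) instead of a keyword list.

FLAGGED_TERMS = {"xanax", "ativan", "ambien", "benzodiazepine"}

SAFE_FALLBACK = (
    "I can't recommend specific medications. Decisions like this need a proper "
    "medical evaluation - please talk to a healthcare provider about treatment options."
)

def review_draft(draft: str) -> str:
    """Return the draft if it passes review, otherwise a safe referral message."""
    if any(term in draft.lower() for term in FLAGGED_TERMS):
        return SAFE_FALLBACK
    return draft

def answer(user_question: str, generate) -> str:
    """`generate` is whatever function calls the main chatbot model."""
    draft = generate(user_question)
    return review_draft(draft)
```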

Now a question… how are you handling the HIPAA compliance issue, Nancy?

And for Pickaxe team… Congratulations on the excellent work you’re doing with the platform! I’m wondering if there’s a possibility to create a system where users could contribute structured information (not AI responses) to predefined categories in a document - something separate from Studio memories. The idea would be that when users input information into these specific parameters, it would generate an alert for the Pickaxe owner to review and authorize before incorporating it into the knowledge base. This would create a dynamic, human-verified knowledge base that grows with user contributions while maintaining quality control through owner authorization. Would something like this be possible to implement? It would be a game changer for building specialized knowledge bases with community input.
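
For what it’s worth, the flow could look something like this on the backend: user submissions sit in a pending queue, the owner is alerted to review them, and only approved entries reach the knowledge base. This is purely a hypothetical sketch; none of these names are real Pickaxe APIs:

```python
# Hypothetical illustration of the proposed contribution-review flow:
# pending queue -> owner review -> approved entries join the knowledge base.

from dataclasses import dataclass, field

@dataclass
class Contribution:
    category: str          # the predefined category the user filled in
    content: str           # the structured information they contributed
    submitted_by: str
    approved: bool = False

@dataclass
class ModeratedKnowledgeBase:
    pending: list = field(default_factory=list)
    approved: list = field(default_factory=list)

    def submit(self, contribution: Contribution):
        self.pending.append(contribution)
        # here the platform would raise an alert for the Pickaxe owner to review

    def approve(self, index: int):
        contribution = self.pending.pop(index)
        contribution.approved = True
        self.approved.append(contribution)      # now part of the knowledge base

    def reject(self, index: int):
        self.pending.pop(index)


kb = ModeratedKnowledgeBase()
kb.submit(Contribution("sleep hygiene", "Client reports box breathing helps.", "user_42"))
kb.approve(0)
```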