Issue with Prompt Adherence & Memory - Bot Not Following Instructions Despite Correct KB Retrieval

Hi Pickaxe Community,

I’m working on a bilingual (English/Turkish) chatbot designed to help beginner-level language learners discuss a mini e-book about AI. I’m using GPT-4o-mini as the underlying model.

The Setup:

  • Knowledge Base: A single .txt file (AI-BEGINNER.txt) containing the e-book content. Each section (marked with ## Heading) explicitly includes both English and Turkish text, clearly demarcated with English: and Türkçe: labels on new lines.

  • Main Prompt: Carefully structured to instruct the bot on:

    • Its persona (PODdy, a friendly learning buddy).

    • Strictly limiting knowledge to the .txt file and a separate PratikOnlineDil KB.

    • Mandatory bilingual responses: English first, then Turkish translation on a new line (with Turkish in italics).

    • Using extremely simple A1-A2 level English (very short sentences, basic vocabulary).

    • Crucial first step: Always asking the user if they have read the e-book before discussing any AI topics.

    • Adapting conversation based on whether the user has read the book.

    • Handling off-topic questions.

  • Model Reminder: Reinforces critical rules (bilingual format, simple English, knowledge limits, book check first).

  • Context for KB: Explains the bilingual structure of the AI-BEGINNER.txt file.

  • Settings: Creativity/Randomness is set to the lowest available value (0.5), no training dialogues are currently loaded, and word biases are cleared.
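For reference, here is a minimal sketch of what one section of the KB file looks like under the structure described above (the heading and sentences are invented placeholders, not the actual e-book content):

```text
## What is AI?
English: AI means artificial intelligence. It is a computer program that can learn.
Türkçe: *AI, yapay zeka demektir. Öğrenebilen bir bilgisayar programıdır.*
```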

The Problem:

When I test the exact same prompt and .txt content directly in a separate environment (e.g., with a comparable LLM like GPT-4 via an API or playground with file upload), the bot behaves almost perfectly according to the instructions.

However, within Pickaxe, while the “Message Insights” show that the bot is correctly retrieving the relevant chunks from AI-BEGINNER.txt (both English and Turkish parts of the section), its actual responses often:

  1. Fail to follow the primary instruction of asking if the user has read the book first. It sometimes jumps directly into answering questions or makes assumptions.

  2. Struggle to maintain the A1-A2 level English simplicity consistently, despite explicit instructions and the Model Reminder.

  3. Incorrectly handle off-topic scenarios or misinterpret user input, leading to irrelevant or “hallucinated” responses (e.g., referencing “neural networks” when the user asked a very basic “What is AI?” and the Beginner e-book doesn’t cover neural networks). This happened even when it was looking at the correct “What is AI?” chunk.

  4. It feels like there’s an issue with prompt adherence over a longer conversation or with how memory/context is being passed/prioritized within the Pickaxe environment. The bot seems to “forget” or deprioritize initial core instructions, especially the “book check” step.
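If the platform trims older turns to fit the context window without re-pinning the system prompt, the "forgetting" in point 4 is exactly what you would expect. A minimal sketch of the mitigation I would test, in pure Python with invented names (this is not a Pickaxe API, just the general pattern of re-injecting the core rules on every turn instead of relying on the initial message surviving the buffer):

```python
# Sketch: keep the system prompt pinned while trimming old turns, and
# re-inject the critical rules just before the newest user message.
# All function and variable names here are illustrative.

CORE_RULES = (
    "You are PODdy. Reply in A1-A2 English, then Turkish in italics. "
    "Before discussing any AI topic, ask if the user has read the e-book."
)

def build_messages(history, max_turns=6):
    """Return the message list for the next LLM call.

    The system prompt is always first, only the last `max_turns`
    user/assistant turns are kept, and the core rules are repeated
    right before the newest user message so they cannot be trimmed away.
    """
    recent = history[-max_turns:]
    reminder = {"role": "system", "content": "Reminder: " + CORE_RULES}
    return [{"role": "system", "content": CORE_RULES},
            *recent[:-1], reminder, recent[-1]]

history = [{"role": "user", "content": f"turn {i}"} for i in range(20)]
msgs = build_messages(history)
```

Even when the conversation grows past the buffer, the rules stay adjacent to the latest user turn, which is where models weight instructions most heavily.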

My Question:

Given that the prompt and data seem sound when tested externally, are there any Pickaxe-specific settings, best practices, or known behaviors related to:

  • How Pickaxe processes or potentially modifies long/detailed prompts before sending them to the LLM?

  • The interplay between the main prompt, model reminder, and context for KB fields?

  • How “memory buffer” or conversation history might be influencing adherence to initial instructions?

  • Chunk retrieval mechanisms that might be providing context in a way that confuses the LLM despite the prompt?

  • Any advice on how to make the “book check first” step absolutely non-negotiable for the bot within Pickaxe?
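On that last point: since prompt-level instructions can clearly get deprioritized, the only way I can see to make the book check truly non-negotiable is to enforce it outside the prompt, in whatever layer calls the model. A toy sketch of that idea (the class and flag names are mine; Pickaxe does not expose such a hook as far as I know):

```python
# Sketch: a deterministic gate around the LLM call. Until the user has
# answered the book-check question, the model is never even consulted.
# Names are illustrative, not Pickaxe features.

BOOK_CHECK = "Have you read the e-book? / *E-kitabı okudun mu?*"

class BookCheckGate:
    def __init__(self, llm):
        self.llm = llm          # callable: user_text -> reply string
        self.asked = False
        self.answered = False

    def respond(self, user_text):
        if not self.asked:
            self.asked = True
            return BOOK_CHECK   # always ask first, whatever the user said
        if not self.answered:
            self.answered = True  # record yes/no; branch the persona here
        return self.llm(user_text)

gate = BookCheckGate(lambda text: f"(model reply to: {text})")
first = gate.respond("What is AI?")   # gets the book check, not an answer
second = gate.respond("Yes")          # now the model is consulted
```

If anything like this is possible in Pickaxe (e.g., via an action or a pre-response hook), I would love to know.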

I’m trying to understand why there’s a discrepancy in behavior between a direct LLM interaction and the Pickaxe environment with the same core components. Any insights or suggestions would be greatly appreciated!

Thanks,
Sensei


Conversation context is a serious issue in Pickaxe. And the lack of replies is concerning.