Hey @user30
You’re on the right track using Pickaxe’s RAG system. When you upload a document (like a PDF), it doesn’t just “read” the whole thing every time. Pickaxe slices your document into chunks, turns those into vector embeddings, and then, when a user asks a question, it fetches only the most relevant pieces. That’s why the retrieved text doesn’t look like the original PDF format: the response is built from those extracted, semantically relevant snippets.
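To make that pipeline a bit more concrete, here’s a minimal sketch of the ingest half in Python. This is not Pickaxe’s actual code; the chunk size, overlap, embedding model, and file name are all illustrative assumptions:

```python
# Ingest sketch: split a document into overlapping chunks and embed each one.
# Chunk size, overlap, and the embedding model are illustrative assumptions,
# not Pickaxe's real settings.
from sentence_transformers import SentenceTransformer

def chunk_text(text, chunk_size=800, overlap=100):
    """Split raw text into overlapping character windows."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model works here
document_text = open("doctrine.txt", encoding="utf-8").read()
chunks = chunk_text(document_text)
embeddings = embedder.encode(chunks)  # one vector per chunk, stored for later retrieval
```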
- Chunking is automatic: The system, not the AI model itself, does the chunking and embedding. The model only gets the relevant bits at runtime, so it’s not memorizing your entire PDFs.
- Source specificity: Because Pickaxe’s chunking is semantic, it can zoom in on highly relevant sections based on each user query (see the retrieval sketch after this list). That’s the backbone of why you’re seeing detailed, on-point responses when it works well.
- Model loyalty: Deepseek is currently performing best for you because it’s staying truest to the retrieved document chunks. That’s exactly what you want in a minimal-hallucination RAG workflow.
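Here’s the query-time half of the same sketch, reusing `chunks`, `embeddings`, and `embedder` from above: the question is embedded the same way, compared against every chunk vector, and only the top few chunks are handed to the model. Again, this is just an illustration of the idea, not Pickaxe’s internals:

```python
import numpy as np

def retrieve(query, chunks, embeddings, top_k=3):
    """Return the top_k chunks most semantically similar to the query."""
    q = embedder.encode([query])[0]
    # Cosine similarity between the query vector and every chunk vector.
    sims = embeddings @ q / (np.linalg.norm(embeddings, axis=1) * np.linalg.norm(q))
    best = np.argsort(sims)[::-1][:top_k]
    return [chunks[i] for i in best]

context = retrieve("What is the primary precedent in Section 5?", chunks, embeddings)
# Only `context` -- a handful of paragraphs -- is sent to the model, never the whole PDF.
```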
Addressing Your Observations
- Response Variability: You’re noticing inconsistent detail levels in Deepseek’s answers. That’s pretty normal: sometimes the model interprets the context slightly differently, or the chunk retrieval pulls in more or less surrounding text based on subtle query changes.
- Smaller Models in RAG: There’s a theory, backed by some real-world results, that smaller models can “hallucinate” less in RAG because they rely more heavily on the retrieved chunks. But as you’ve seen, the real-world value is in what works for your data and your users. If Deepseek’s current setup is giving you the best faithfulness, you’re already ahead of the curve.
- Format of Retrieved Chunks: It’s by design that you’re not seeing PDF-style formatting in outputs. The engine is pulling the actual text, not the layout or formatting.
- Model Instability: If Deepseek is unstable or inconsistent, try slight tweaks to your prompt engineering, or experiment with temperature settings (if available) to nudge it toward more consistent outputs; there’s a rough sketch of that right after this list.
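On the temperature point: if you ever test outside Pickaxe’s own settings, temperature is just one parameter on the completion call. Below is a rough sketch using Deepseek’s OpenAI-compatible API, with `context` being the retrieved chunks from the earlier sketch; the endpoint, model name, and whether Pickaxe exposes this knob at all are assumptions on my part:

```python
from openai import OpenAI

# Assumption: calling Deepseek directly through its OpenAI-compatible API.
# Pickaxe may or may not surface a temperature setting in its own UI.
client = OpenAI(api_key="YOUR_KEY", base_url="https://api.deepseek.com")

question = "What is the primary precedent in Section 5?"
response = client.chat.completions.create(
    model="deepseek-chat",
    temperature=0.2,  # lower = more deterministic, sticks closer to the retrieved chunks
    messages=[
        {"role": "system", "content": "Answer only from the provided context."},
        {"role": "user", "content": "Context:\n" + "\n\n".join(context) + "\n\nQuestion: " + question},
    ],
)
print(response.choices[0].message.content)
```

Lower temperatures generally trade creativity for consistency, which is usually the right trade for doctrine-faithful answers.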
My Recommendation
Stick with what’s working, but keep an eye on new model releases in Pickaxe. If you spot smaller Deepseek variants or models marketed for low-hallucination RAG, give them a test run with your workflow. Meanwhile, don’t sweat the technical details: the magic is happening under the hood, and Pickaxe is handling the heavy lifting for you.
If you ever need to “see” exactly what chunks are getting retrieved for a given query, check if Pickaxe’s debug/logging features can show this. That can help you fine-tune your documents or prompts for even better responses.
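I’m not sure exactly what Pickaxe’s debug view exposes, but if you ever prototype the retrieval yourself, making the chunks visible is just a matter of logging each selected chunk alongside its similarity score, e.g. a variant of the earlier retrieval sketch:

```python
import logging

import numpy as np

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("rag")

def retrieve_logged(query, chunks, embeddings, top_k=3):
    """Same retrieval as before, but log each selected chunk and its score."""
    q = embedder.encode([query])[0]
    sims = embeddings @ q / (np.linalg.norm(embeddings, axis=1) * np.linalg.norm(q))
    best = np.argsort(sims)[::-1][:top_k]
    for i in best:
        log.info("score=%.3f  chunk=%r", sims[i], chunks[i][:120])  # preview first 120 chars
    return [chunks[i] for i in best]
```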
Real-World Example
Let’s say you uploaded a 200-page legal doctrine PDF. When a user asks, “What is the primary precedent in Section 5?”, Pickaxe’s RAG engine quickly finds the 2–3 relevant paragraphs in Section 5, feeds them to Deepseek, and generates a focused answer. That’s why the output stays on-topic even though the original file is huge.
Bottom Line
You’re already making smart choices. Pickaxe’s RAG plus a document-faithful model like Deepseek is a powerful combo for minimal-hallucination, doctrine-preserving chatbots. Keep building, and if you see new model options pop up, don’t hesitate to run an A/B test. If you hit any roadblocks or want to go even deeper on prompt tweaking, just ask; I’m always happy to help you get more out of your setup!
Let me know if you want to troubleshoot any specific use case or get tips on optimizing your knowledge base further!