Thanks for fixing the context. Definitely curious what happened. I am getting ready to introduce Finance Sector people to some use cases and I know they are going to ask about security.
We’ve made some changes to our submission API endpoint; it now uses a single stream, similar to how other AI providers do it.
To handle lots of users at once, our system allows multiple requests to be processed at the same time (concurrency). This is expected behavior for public-facing endpoints.
We have a class responsible for handling all requests to the LLM provider. Think of it as a worker that prepares the messages and sends them to the AI provider.
The issue was that the request payload sent to the LLM provider (e.g., OpenAI, Anthropic) was being shared between concurrent calls to the endpoint. We assumed that initializing the worker for each request would give it a fresh state, but the payload state was actually shared across requests, so concurrent calls could overwrite each other’s data.
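For anyone curious, here is a minimal Python sketch of how this kind of bug can happen. The class and method names here are hypothetical (not our actual code): the key point is that if the payload lives in state shared across worker instances, creating a new worker per request does not actually start fresh, and one request’s data can leak into another’s.

```python
class BuggyWorker:
    # Hypothetical sketch: the payload is a CLASS attribute,
    # so every instance shares the same dict.
    payload = {}

    def prepare(self, user, message):
        # self.payload resolves to the class-level dict;
        # item assignment mutates shared state.
        self.payload["user"] = user
        self.payload["messages"] = [message]

    def send(self):
        # Snapshot of whatever the shared payload holds right now.
        return dict(self.payload)


class FixedWorker:
    def __init__(self):
        # Instance attribute: each worker gets its own payload,
        # so a new worker per request really does start fresh.
        self.payload = {}

    def prepare(self, user, message):
        self.payload = {"user": user, "messages": [message]}

    def send(self):
        return dict(self.payload)
```

Even without true concurrency, the shared-state version misbehaves as soon as two requests interleave: if request B prepares its payload after request A but before A sends, A ends up sending B’s data. The fixed version keeps each request’s payload isolated.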
Thank you, I appreciate you breaking this down. I understand large changes cause unexpected behaviors at times. Letting us know and providing this level of transparency is really helpful!