Has anyone found GPT 4 to be super unreliable

Has anyone found GPT-4 to be super unreliable once your prompts pass 5000 characters?

Generally, the longer the prompt, the less precisely the model tends to follow it. As you can imagine, it’s harder to follow 5000 characters of instructions than, say, 50 characters.

I’m curious, can you characterize the unreliability you’re experiencing? Speed? Quality? Accuracy? Errors?

My prompt is pretty structured. I give it a series of tasks to complete in order, marked ##Task 1, ##Task 2, etc., so I can see at which task it goes wrong.

I have since improved this method by adding numbered steps for it to follow within each task. That has brought the accuracy and repeatability up to nearly 100% across a longer prompt (5000-6000 characters), whereas beforehand it produced the desired output only inconsistently.

e.g.

##Task 1

  1. ask user to say hello
  2. if response=hello
    Then say “Hi! nice to meet you”

Super basic, but it conveys the point.
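If you build the prompt in code, the same task-and-numbered-step layout can be generated from a plain list, which keeps the numbering consistent as the prompt grows. This is only a rough sketch: `build_prompt` and the task title are made-up names for illustration, and it only assembles the prompt text without calling any model.

```python
def build_prompt(tasks):
    """Render tasks as '##Task N' headers, each followed by numbered steps.

    `tasks` is a list of (title, steps) pairs. Both the function name and
    the input shape are hypothetical, chosen just for this sketch.
    """
    lines = []
    for i, (title, steps) in enumerate(tasks, start=1):
        lines.append(f"##Task {i}: {title}")
        for j, step in enumerate(steps, start=1):
            lines.append(f"  {j}. {step}")
        lines.append("")  # blank line between tasks
    return "\n".join(lines).rstrip()


tasks = [
    ("Greet the user", [
        "Ask the user to say hello.",
        'If the response is "hello", reply "Hi! Nice to meet you".',
    ]),
]

prompt = build_prompt(tasks)
print(prompt)
print(len(prompt), "characters")  # handy for watching the 5000-character mark
```

Printing the length alongside the prompt makes it easy to notice when you cross the size where things started getting inconsistent.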