This seems tied to an issue I've had when using LLMs: they spit out what they think might work, not what is best. Frequently I get suggestions that I need to clean up, or I have to ask follow-up guiding questions.
If I had to guess, it's because nothing enforces quality on the training data or generated text, so the model tends toward the most frequent approaches rather than the best ones.