Spot on here! I've only been using Stable Diffusion for a couple of weeks now, to help me visualise characters and locations for my worldbuilding. It's such a great tool when you have no artistic skill. But the limitations soon become apparent, and a lot of problem solving goes into trying to get even the simplest things right.
You can generate a near-perfect image in a minute if the prompts are right and you get lucky. But the details take hours, whereas an artist could simply visualise them and draw them in.
I think the key is to develop enough basic skill to draw a really shit mockup of what you want, then run img2img on it... I'll get there, maybe.