There simply isn't enough entropy in prose to accurately detect the use of language models if either
- the text is short enough, like a school essay (see the toy sketch after this list), or
- there has been enough human-AI collaboration.
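To make the short-text point concrete, here's a minimal toy sketch, not a real detector. It assumes, purely for illustration and with made-up numbers, that per-token "surprisal" scores for human and AI prose come from overlapping distributions, and that a detector flags any text whose average score falls below a threshold. The only thing it shows is how the false-positive rate grows as the text gets shorter:

```python
import random
import statistics

# Made-up parameters, not measurements of any real model: per-token
# surprisal for human and AI prose drawn from overlapping normal
# distributions, threshold halfway between the two means.
HUMAN_MEAN, AI_MEAN, STDEV = 5.0, 4.5, 2.5
THRESHOLD = (HUMAN_MEAN + AI_MEAN) / 2  # flag as "AI" below this

def mean_surprisal(mean: float, n_tokens: int) -> float:
    """Average surprisal over a text of n_tokens tokens."""
    return statistics.fmean(random.gauss(mean, STDEV) for _ in range(n_tokens))

def false_positive_rate(n_tokens: int, trials: int = 10_000) -> float:
    """Fraction of purely human-written texts flagged as AI at this length."""
    flagged = sum(mean_surprisal(HUMAN_MEAN, n_tokens) < THRESHOLD
                  for _ in range(trials))
    return flagged / trials

for n in (100, 250, 500, 1000):
    print(f"{n:>4} tokens: ~{false_positive_rate(n):.1%} of human texts flagged as AI")
```

With these toy numbers a sizeable fraction of genuinely human essay-length texts gets flagged, and that's before accounting for humans who edit the output or feed the model their own style, which is the collaboration case above.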
I'm sure many of us are familiar with ChatGPT's style when it's asked to write text on its own or with very little prompting. However, that's just the raw default style, and ChatGPT is only one of many language models (though it's clearly the most accessible). If you provide example prose for the AI to imitate, for example with a simple tool like Sudowrite (the use of which tends to be the subject of many accusations), you will not pick out those segments from human-written prose unless the human using it is lazy or stupid enough to leave in obvious tells.
The sooner we let go of this comfortable fantasy that AI somehow leaves easy-to-isolate markers that enable a different (and vastly inferior) AI model to tell whether text was AI-generated, the better. The simple truth is that if that were the case, AI companies would use the same isolation strategies to teach their models to imitate human prose better, thereby breaking detection.
And with a high rate of false positives we're just going to recreate a cyberpunk version of the Salem witch trials, because we simply have no proof: if you don't trust ChatGPT with anything important, why would you trust a vastly less sophisticated AI, or something that amounts to a gut feeling, to condemn people?