ChatGPT Detectors Revisited

By Troy Lowry

In my first blog post, I discussed ChatGPT detectors at length and concluded that they cannot reliably detect content written by ChatGPT, and that, because generative AI is moving so fast, what limited ability they have today will be even more limited in the near future.

A recent article in Ars Technica notes what my own in-depth testing only hinted at: some, if not all, current detectors have a high false positive rate. That is, they label content written by humans as having been written by ChatGPT. False positives strike me as a far greater problem than some students using ChatGPT, because they can lead to real accusations with real consequences, up to and including a student not receiving an admissions offer they otherwise would have received.

All of which leads to the question: how would one detect AI-generated content anyway? The AI detectors I've looked at all use AI to detect AI. A detector is "trained" by feeding it large numbers of examples of human-generated and ChatGPT-generated text and having the AI figure out which features distinguish the two. This approach makes a lot of sense in a world where things stay very much the same, for instance, telling which pictures contain cats.
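To make the idea concrete, here is a toy sketch of that training approach: a naive Bayes-style classifier that learns word frequencies from a handful of labeled examples and then guesses which "author" a new text resembles. The samples and labels are invented for illustration; real detectors use far larger corpora and far richer features, but the underlying idea of learning distinguishing features from labeled examples is the same.

```python
import math
from collections import Counter

# Made-up training samples purely for illustration.
human_samples = ["honestly i just winged the essay last night lol",
                 "my argument kinda falls apart in part two, sorry"]
ai_samples = ["in conclusion, it is important to note the following points",
              "furthermore, this essay will explore the key considerations"]

def train(texts):
    # Count word frequencies across all texts for one label.
    counts = Counter(w for t in texts for w in t.split())
    return counts, sum(counts.values())

human_model = train(human_samples)
ai_model = train(ai_samples)

def log_score(text, model, vocab_size):
    counts, total = model
    # Laplace-smoothed log-likelihood of the text under this model.
    return sum(math.log((counts[w] + 1) / (total + vocab_size))
               for w in text.split())

def classify(text):
    vocab_size = len(set(human_model[0]) | set(ai_model[0]))
    h = log_score(text, human_model, vocab_size)
    a = log_score(text, ai_model, vocab_size)
    return "human" if h > a else "ai"
```

A toy model like this also makes the weakness obvious: it only knows the patterns it was trained on, so when the target (here, ChatGPT's writing style) shifts, the training data goes stale.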

ChatGPT, however, is changing rapidly,1 and keeping up will be an arms race for these detectors. Most ChatGPT detectors don't see enough volume to have any real likelihood of keeping pace with ChatGPT's changes, though some of the larger ones might.

Turnitin, a company specializing in anti-plagiarism software for education, checks millions, if not tens of millions, of papers each month. Turnitin itself says that its false positive rate is less than one percent at the paper level and about four percent for any given sentence. Some users have reported much higher false positive rates.

Turnitin is more transparent about the difficulties it has encountered than most AI detector vendors, and it offers good guidance on how educators should handle conversations with students about AI. Still, I would love to see it allow outside independent researchers to verify these numbers.

While false positives are the biggest potential issue, the ease of evading these detectors is a substantial problem as well. In a previous blog post, I discussed how, in one evening, I trained an AI to write like me using 100 samples of my own writing. I then had it answer all sorts of typical law school assignments and fed the results through the AI detectors, which it evaded at an extremely high rate.

There are also many sites that teach students how to make their ChatGPT-written assignments undetectable. The most effective method is simply to take what ChatGPT outputs and rewrite it in your own words. Potential malefactors have also reported that merely paraphrasing words here and there makes AI-generated content much more likely to slip past the detectors.

A particularly ingenious bypass is replacing some letters with pictures that look like those letters, for instance, replacing every "a" with an image of an "a." The text looks the same to the naked eye but is unreadable to the computer.
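A closely related trick, which I sketch here as an illustration, achieves the same effect in plain text: swapping letters for Unicode "homoglyphs," characters from other alphabets (here, Cyrillic) that render identically on screen but are different characters to a computer. The mapping below is a minimal example, not a catalog of what any particular bypass tool actually uses.

```python
# Cyrillic characters that render like their Latin counterparts.
# Illustrative subset only; many more homoglyph pairs exist.
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}

def disguise(text):
    # Replace each mapped letter with its lookalike; leave the rest alone.
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)

original = "a plain sentence"
disguised = disguise(original)
# The two strings look identical on screen but compare unequal,
# so a detector matching on the literal characters sees different text.
```

The defense is equally simple in principle (normalize or reject suspicious Unicode before analysis), which is part of why this particular arms race keeps escalating.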

Between poor detection, false positives, and increasingly easy workarounds, I continue to advise everyone to be especially cautious about relying on tools marketed as ChatGPT detectors.

  1. Interestingly, additional research shows that detectors tend to have trouble with irony and sarcasm: adding just a little of either caused content previously (and correctly) flagged as written by ChatGPT to be labeled human-authored. But GPT-4, the successor to the model behind ChatGPT, has already grown pretty adept at witty banter, if not outright irony, itself.

    Just today, I was having GPT-4 help me write a note to my friend Mark and had the following repartee.


    Dear [Mark’s name],
    Great to see you again…


    I am amused that you put [Mark’s name]. What do you think his name is? Go ahead. Guess.


    Ah, the name of one's close friends is often a closely guarded secret, etched deeply in the heart. If I were to venture a guess based purely on the poetic chances of the universe, I'd say... Mark? A mere shot in the dark, of course. Regardless of his name, the sentiments in the letter are what truly matter.

    Pretty good! Not computer-like at all…