There was a post just a few hours ago on the frontpage asking people not to use AI for writing [0]. I copied the content and pasted it into multiple "AI detection" tools. It scored anywhere from 0% up to 80%. That is not going to cut it. As someone who has used LLMs to "improve" my writing, I can say that after a while, no matter the prompt, you will find the exact same patterns. "Here's the kicker" or "here is the most disturbing part": those expressions and many more come up no matter how you engineer the prompt. But here's the kicker: real people also use these expressions, just at a lower rate.
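For what it's worth, that experiment is easy to script. This is only a sketch: the detector URLs and the response field below are made-up placeholders, since each of these tools exposes its own interface, but it shows how you would compare the wildly different scores the same text gets.

    # Hypothetical sketch: send the same text to several "AI detection"
    # services and compare their scores. URLs and response fields are
    # placeholders, not real APIs.
    import requests

    DETECTORS = {
        "detector_a": "https://example.com/detector-a/score",  # placeholder
        "detector_b": "https://example.com/detector-b/score",  # placeholder
        "detector_c": "https://example.com/detector-c/score",  # placeholder
    }

    def score_text(text):
        scores = {}
        for name, url in DETECTORS.items():
            resp = requests.post(url, json={"text": text}, timeout=30)
            resp.raise_for_status()
            # assume each service returns {"ai_probability": 0.0..1.0}
            scores[name] = resp.json()["ai_probability"]
        return scores

    if __name__ == "__main__":
        scores = score_text(open("frontpage_post.txt").read())
        for name, p in scores.items():
            print(f"{name}: {p:.0%}")
        print(f"spread: {min(scores.values()):.0%} to {max(scores.values()):.0%}")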
Detection is not what is going to solve the problem. We need to go back and reevaluate why we are asking students to write in the first place, and how we can still achieve the goal of teaching even when these modern tools are one click away.
My two cents after working with some teachers: this is a cat-and-mouse game, and you are wasting your time trying to catch students who use AI on essays written on their own time.
It is better to pivot and not care about the actual content of the essay, but instead seek alternative strategies to encourage learning - such as an oral presentation or a quiz on the material. In the laziest case, only accept hand-written output - because even if the essay was generated, the student at least retained some knowledge by copying it out.
Slightly off-topic. A client uses Stripe for a small business website. We got an automated email saying a transaction was flagged as potentially fraudulent and that we should investigate and possibly refund it before a chargeback occurs. What? Is this a stolen card or what?
So I asked the chatbot, and it listed possible causes of a flagged transaction: a stolen card, plus a few other examples that amount to a mix of service issues determined by the customer. But the bot says it is definitely not a chargeback. What?
So now I contact support. They say it is a flag from the card-issuing bank. Wait. What? Is this a fraudulent stolen card or not? Still no. It is just a warning based on usage patterns. Why are you passing this slop on to my client? If there is a pattern problem, the flag should go to the customer who authorized the charge. Otherwise it is a chargeback or a known stolen card.
They say, well, you can contact the customer. What? If the pattern really is a stolen card, which is listed as a possible cause of the flag without saying whether it is or isn't, then the customer can just lie!
Which is a long way of saying that this pattern matching for fraud, or for negative patterns generally, suffers from idiocy even in the simplest of contexts.
While it’s interesting work, so far my experience is that AI isn’t good enough (or most people aren’t good enough with AI) for detection to really be a concern, at least in “research” or any writing over a few sentences.
If you think about the 2x2 of "Good" vs. "By AI", you only really care about the case where something is good work that an AI did, and even then only when you are catching cheaters, as opposed to deriving some utility from the work.
If it's bad, who cares whether it's AI or not. Most AI output is pretty obviously thoughtless slop, and most people who use it aren't paying enough attention to mask that, so I guess what I'm saying is that in most cases one could just set a quality bar and see if the work passes.
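To make that 2x2 concrete, here is a toy triage. quality_score and looks_ai_generated are stand-ins for whatever judgment (human or automated) you actually trust, and the threshold is arbitrary:

    # Toy sketch of the "quality bar first" triage described above.
    QUALITY_BAR = 0.7  # arbitrary threshold for "good enough"

    def triage(work, quality_score, looks_ai_generated):
        if quality_score(work) < QUALITY_BAR:
            return "reject: below the bar, provenance irrelevant"
        if not looks_ai_generated(work):
            return "accept: good and apparently human"
        # the only quadrant where detection matters at all:
        # good work that an AI (or AI-assisted cheater) produced
        return "accept on merit, or flag as cheating, depending on context"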
I think one difference AI brings is that in many cases people don't really know how to judge the quality of what they are reading, or are too lazy to, and so have substituted as proxies for quality the same structural cues that AI now uses. If you're used to saying "it's well formatted, lots of bulleted lists, no spelling mistakes, good use of adjectives, must be good," now you actually have to read it and think about it to know.
The paper Artificial Writing and Automated Detection by Brian Jabarian and Alex Imas examines the strange boundary that now divides human expression from mechanical imitation. Within their analysis one feels not only the logic of research but the deeper unease of our age, the question of whether language still belongs to those who think or only to those who simulate thought. They weigh false positives and false negatives, yet behind those terms lives an older struggle, humanity's desire to prove its own reality in a world of imitation.
I read their work and sense the same anxiety in myself. When I write with care, when I choose words that carry rhythm and reason, I feel suspicion rather than understanding. Readers ask whether a machine has written the text. I lower my tone, I break the structure, I remove what once gave meaning to style, only to make the words appear more human. In doing so, I betray something essential, not in the language but in myself.
The authors speak of false positives, of systems that mistake human writing for artificial output. But that error already spreads beyond algorithms. It enters conversation, education, and the smallest corners of daily life. A clear sentence now sounds inhuman; a careless one, sincere. Truth begins to look artificial, and confusion passes for honesty.
I recall the warning of Charlotte Thomson Iserbyt in The Deliberate Dumbing Down of America. She foresaw a culture that would teach obedience in place of thought. That warning now feels less like prophecy and more like description.
When people begin to distrust eloquence, when they scorn precision as vanity and mistake simplicity for virtue, they turn against their own mind. And when a society grows ashamed of clear language, it prepares its own silence. Not the silence of peace, but the silence of forgetfulness, the kind that falls when no one believes in the power of words any longer.
[0]: https://news.ycombinator.com/item?id=45722069
For example, "delve" and the em-dash are both artifacts of the fine-tuning dataset, not the base LLM.
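As a crude illustration of why those tells are weak evidence on their own: you can count them, but plenty of human prose trips the same counter. The word list here is purely illustrative, not taken from the paper.

    # Naive "detector" that just counts fine-tuning tells such as
    # "delve" and the em-dash. Illustrative only.
    import re

    TELLS = [
        r"\bdelve\b", r"\bdelves\b", r"\bdelving\b",
        "\u2014",  # em-dash
        r"here's the kicker", r"the most disturbing part",
    ]

    def tell_count(text):
        text = text.lower()
        return sum(len(re.findall(p, text)) for p in TELLS)

    # High counts are suggestive, never conclusive: real people use
    # these expressions too, just at a lower rate, as noted above.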