Work You Wrote Yourself Can Still Be Wrongly Judged as AI-Written
Turnitin, the leading AI writing detector, has acknowledged that its "false positive" rate (human writing incorrectly flagged as AI-generated) turned out higher in real-world use than expected. The "technology of trust" is showing gaps, and that calls for reconsidering where trust in classroom assessment should rest.
In June 2023, Turnitin officially acknowledged "higher-than-expected false positive rates" for its AI writing detection tool: the "under 1% document-level false positive rate" advertised at launch can no longer be guaranteed in actual educational settings. Turnitin stated: "False positive rates appeared higher due to differences between the laboratory and real world." False positives were noticeably more frequent for text scored as less than 20% AI-written, so an asterisk will be added to such results to signal that they should be read with skepticism. At the sentence level, the false positive rate is approximately 4%, a non-trivial probability that a sentence a student wrote is identified as AI-written. At the document level, no specific figure was disclosed, but the original "under 1%" promise no longer holds.
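To see why a sentence-level rate of about 4% is not trivial, a rough back-of-the-envelope sketch helps. The Python below estimates how often an entirely human-written essay would have at least one sentence flagged. It assumes that errors on different sentences are independent, which is a simplification made only for illustration and not something Turnitin has stated; the function names are hypothetical as well.

```python
# Rough illustration only. ASSUMPTION: each human-written sentence is
# wrongly flagged independently with the same probability. Real detector
# errors are likely correlated, so treat the results as order-of-magnitude.

SENTENCE_FP_RATE = 0.04  # approximate sentence-level figure cited in the article


def prob_at_least_one_flag(n_sentences: int, fp_rate: float = SENTENCE_FP_RATE) -> float:
    """Probability that at least one of n human-written sentences is flagged."""
    if n_sentences < 0:
        raise ValueError("n_sentences must be non-negative")
    return 1.0 - (1.0 - fp_rate) ** n_sentences


def expected_false_flags(n_sentences: int, fp_rate: float = SENTENCE_FP_RATE) -> float:
    """Expected number of wrongly flagged sentences among n human-written ones."""
    if n_sentences < 0:
        raise ValueError("n_sentences must be non-negative")
    return n_sentences * fp_rate


if __name__ == "__main__":
    for n in (10, 25, 50):
        p = prob_at_least_one_flag(n)
        e = expected_false_flags(n)
        print(f"{n} sentences: P(at least one flag) = {p:.2f}, expected flags = {e:.1f}")
```

Under that independence assumption, a 25-sentence essay written entirely by a student has well over an even chance of containing at least one flagged sentence. A stray flagged sentence is therefore weak evidence on its own.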
Among the sentences that were falsely flagged, 54% sat immediately next to a sentence that really was AI-written, and 26% were two sentences away. In other words, detectors cannot reliably locate the boundary in "mixed" text, and assignments that combine AI and human writing are increasingly common in real educational settings. The false positive problem goes beyond a simple technical failure and creates serious confusion in the classroom. For students, having original work wrongly judged as AI-generated breeds distrust in evaluation results and disputes over fairness, and even a single false positive can cause lasting disadvantage and loss. For teachers, depending entirely on AI detection results cannot guarantee objective, trustworthy evaluation of students. Cases of students whose grades were invalidated or whose graduation was postponed on the basis of Turnitin AI detection results are emerging in the US and other countries, spreading distrust and confusion across the education sector.
Policy recommendations: (1) AI detection results must be used only as "reference material," not as a tool for absolute judgment; (2) assessment methods that emphasize sufficient teacher-student communication and the learning context must be expanded; (3) teachers and students must learn together about the limitations and dangers of AI; (4) clear guidelines must be established for evaluating "mixed" assignments, which are becoming more common as AI and human text are increasingly combined. "The essence of evaluation is not technical judgment but how to restore and expand 'trust.'" That is the lesson of Turnitin's case. AI detection technology can certainly be a useful "tool" in assessment, but it must never be treated as an absolute standard. The case also exposes the ethical responsibility and caution that education in the AI era requires.


