Academic Honesty and Integrity

Comparing AI Detection Tools: One Instructor’s Experience

Comparison of different programs that claim to detect AI-generated text

The post below was written by Dr. Ellie Andrews, an instructor in the Department of Anthropology and Geography. In this piece, Dr. Andrews shares her experience trying to verify authentic student writing in the two large-section courses she taught in the spring of 2023. She describes the pedagogical challenges she faced creating AI-resistant assignments and the varying results she got from the AI detection tools available to instructors on the internet. Thank you to Dr. Andrews for sharing this work. Her bio can be found at the bottom of this post.

 

| Student paper | Sapling | Copyleaks | ZeroGPT | OpenAI Text Classifier | CrossPlag | Notes |
|---|---|---|---|---|---|---|
| 1 | 100% | 89% | 87% | OK | 100% | admitted to using “paraphrasing tools” |
| 2 | 100% | 92–95% | 96% | OK | OK | admitted using AI |
| 3 | 100% | 92% | 11% | “unclear” | 98% | did not respond to requests to meet |
| 4 | 100% | 87–90% | 7% | OK | 100% | admitted to “copy and pasting” |
| 5 | 77% | 99% | 53% | OK | 100% | admitted using AI |
| 6 | 75% | 80–93% | 89% | OK | OK | boyfriend wrote paper; he denied using AI |
| 7 | 74% | 99% | 92% | OK | OK | did not respond to requests to meet |
| 8 | 89% | OK | 33% | OK | OK | denied completely |
| 9 | 76% | OK | 7% | OK | OK | I did not contact |
| 10 | 71% | OK | 18% | OK | OK | admitted to using “paraphrasing tools” |
| 11 | 25% | 63–86% | 23% | OK | OK | used outside sources for “connective phrases” |
| 12 | 24% | 82% | 22% | OK | 12% | denied using AI; used Reverso “rephraser” |
| 13 | 24% | OK | 23% | OK | OK | I did not contact |
| 14a | 12% | OK | 11% | OK | OK | admitted to using AI; 14a is entire essay, 14b is AI-generated paragraphs |
| 14b | 100% | 98% | 46% | “unclear” | 100% | |

Takeaways

  • I looked for signs of AI-generated text in student essays; most of the essays that failed the AI-detection programs had one or more of the following:
    • a lack of quotation marks
    • formulaic or bland concluding paragraphs
    • bland, overly generic statements
    • information that did not appear in the article students were asked to analyze
    • different formatting or tone in different parts of the text
    • Note: I formulated these signs after spending 1–2 hours experimenting with ChatGPT.
  • Programs are not very consistent with one another in detecting AI-generated text, meaning that instructors may want to use more than one.
  • Some programs are typically more “suspicious” (Sapling, Copyleaks) and others more “generous” (OpenAI Text Classifier).
  • All programs have trouble identifying shorter AI-generated passages within longer essays, probably because of word limits (14a shows the results for the entire essay, while 14b is only the two AI-generated paragraphs from that essay). It is probably best to copy and paste only the text that seems problematic; a small scripted sketch of this multi-checker workflow follows this list.
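For instructors comfortable with a little scripting, the “check the suspect passage in more than one program” step can in principle be automated. The sketch below is purely illustrative: the endpoint URLs and the “ai_probability” response field are hypothetical placeholders, not the real APIs of Sapling, Copyleaks, ZeroGPT, or any other service, each of which has its own documented API, account requirements, and response format.

```python
# A minimal sketch of sending one suspect passage to several detectors and
# collecting their scores side by side. The endpoints and the response field
# are HYPOTHETICAL placeholders; consult each vendor's documentation for the
# real API, authentication, and output format.
import json
import urllib.request

DETECTORS = {
    "detector_a": "https://example.com/detector-a/score",  # placeholder URL
    "detector_b": "https://example.com/detector-b/score",  # placeholder URL
}


def score_passage(text: str) -> dict:
    """Return {detector name: reported score, or an error message}."""
    results = {}
    for name, url in DETECTORS.items():
        payload = json.dumps({"text": text}).encode("utf-8")
        request = urllib.request.Request(
            url, data=payload, headers={"Content-Type": "application/json"}
        )
        try:
            with urllib.request.urlopen(request, timeout=30) as response:
                body = json.loads(response.read().decode("utf-8"))
            results[name] = body.get("ai_probability")  # hypothetical field
        except Exception as exc:  # network errors, missing keys, rate limits
            results[name] = f"error: {exc}"
    return results


if __name__ == "__main__":
    suspect_text = "In conclusion, X, Y, and Z ..."  # paste the flagged passage here
    print(score_passage(suspect_text))
```

However the scores are gathered, each program’s result is best treated as one data point among several, as described in the conversations below.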

I teach two sections of Introduction to Geography with the help of two TAs (thank you, Tom Chittenden and Tanmoy Malaker!). For the latest writing assignment, I revised a prompt from the previous semester, asking students to apply three political economic terms/theories to a contemporary situation described in one of four articles of their choice. (The full assignment description is at the end of this article; many thanks to Dr. Heidi Hausermann for the original assignment.) I knew the prompt wasn’t AI-proof, in part because ChatGPT may have access to the articles: all of them were published before 2021, the cutoff for ChatGPT’s training data, although they may or may not have actually been included in the data it was trained on.

[Note]
Other AI text-generation programs include Bard and Bing, but it’s my understanding that ChatGPT is the most widely used. I do not know what data those programs were trained on.

In class, in order to circumvent the use of AI, I stressed the need for specific quotes and details from the selected article.

We received 156 papers. Before my TAs began to grade them, I skimmed all of them to search for signs of AI-generated text (based on having played around a bit with ChatGPT previously):

  • a lack of quotation marks, indicating that a student had not quoted from the article as recommended
  • formulaic or bland concluding paragraphs that began, “In conclusion…, X, Y, and Z” (often three concepts, in the overall form of a five-paragraph essay)
  • bland qualifying statements such as “It is important to note…”—student writing may use these phrases, too, but such phrasing prompted me to look further
  • information that did not appear in the original article
    • one paper included the passages: “coal companies have been able to avoid taking responsibility for the disease by using legal and medical tactics to deny workers’ compensation claims…. workers often lack access to adequate healthcare, workers’ compensation, and legal resources.” But workers’ comp and legal tactics are not discussed in the original NPR article.
    • another paper drew a comparison to a similar mining disaster, the Upper Big Branch mine explosion, which was not discussed in the article either.
  • different formatting or tone in different parts of the text
    • see the image below; the second two paragraphs are formatted differently and are free of the kind of errors found in the first paragraph.
[Image: example of a student paper in which the second and third paragraphs are formatted differently and are free of errors, compared to the first paragraph]
  • not meeting length requirements (1000 words, plus or minus 10%)
  • other signs of being rushed or cutting corners
    • one student had titled their paper, “I This [sic] Was Due Next Week For Some Reason and Started Doing it at 10:30 PM,” which may be disarmingly honest, but also merited a closer look.

In the end, these last two criteria were not as useful as the others, or at least corresponded less closely to the programs’ results and to my conversations with students.

Based on these criteria, the TAs and I flagged sixteen papers (I found eight with “red” flags and seven with “orange” flags, meaning I was more uncertain about whether there was an AI problem); the TAs, while reading the papers more closely for grading, flagged another one. (On closer inspection, two of these seemed likely to have been written by the students themselves and were not run through the programs; that’s why there are only fourteen entries in the table above.)

I decided to run these papers through multiple AI-detection programs after seeing a post on the Facebook group “Teaching with a Sociological Lens” by Dr. Nikki Civettini from Winona State University in Minnesota; she had compared four different programs (GPTZero, Sapling, Copyleaks, and CrossPlag) for 97 short essays. Twelve of those essays were rated at 95% or higher (whether as a percentage of the text that was AI-generated or as a likelihood that it was AI-generated, I’m not sure); two were flagged as AI by all four programs. Unlike my findings here, she did not find any patterns of specific checkers being more or less likely to flag an essay as all or partly AI-generated; she wrote, “they seemed similarly likely to be the ‘dissenter’ when the detectors disagreed.”

Conversations with Students

I contacted twelve students and had meetings with ten. In those conversations, I framed the programs’ results as one data point and the conversation as another data point, meaning that I was transparent about being concerned about AI-generated text but not wholly trusting the results of the AI-detection programs.

Most of the students I spoke to admitted to using tools to help them write, naming a variety of such tools. One international student whose first language is not English said he used “paraphrasing tools,” which he apparently does often; he also uses Grammarly. Another student also said he used “paraphrasing tools”: he says he wrote the essay himself, then ran it through such a tool a few sentences at a time in order to make it sound more sophisticated, improve the vocabulary, and fix mistakes. Another said he used Reverso, which has a “rephraser” tool (e.g., “We should ask a question” → “We ought to ask a question,” “Let’s ask a question,” “Let’s get a question in.”). He had been shown the tool in his first year in college by an instructor who encouraged him to use it. Another said she used outside articles for “connective phrases” (she appeared to be referring more to plagiarism than to AI-generated text, but Turnitin did not flag anything). Another said her boyfriend had written the paper (he is not taking the class), but that he had denied using AI. I list all these descriptions to make the point that there are many ways that students may cheat, and some of them are decidedly not black and white. For instance, I am comfortable with students using Grammarly; I do not know enough about the various paraphrasing tools to know how comfortable I am with them.

Specific Programs

All of the AI-detection programs are easy to use. I copied and pasted text from the essays directly into the programs’ textboxes, deleted the students’ names and other information at the top, deleted any extraneous line breaks in text copied from PDFs (time-consuming, and not necessarily helpful, but I wasn’t sure), and submitted them.
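The line-break cleanup in particular is the kind of tedium a short script can handle. The snippet below is just a minimal sketch of that one step (it is not part of any of the detection tools): it assumes the text copied from a PDF separates paragraphs with blank lines and that single line breaks inside a paragraph are the artifacts to remove, and it can optionally drop a few header lines (for example, the student’s name and date) from the top.

```python
import re


def clean_pdf_text(raw: str, drop_leading_lines: int = 0) -> str:
    """Collapse single line breaks inside paragraphs (a common artifact of
    text copied from PDFs) while keeping blank lines as paragraph breaks.
    Optionally drop the first few lines (e.g., a name/date header)."""
    text = "\n".join(raw.splitlines()[drop_leading_lines:])
    # Split on blank lines, then rejoin each paragraph onto a single line,
    # collapsing any run of whitespace to a single space.
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = [" ".join(p.split()) for p in paragraphs if p.strip()]
    return "\n\n".join(cleaned)


# Example: drop a two-line name/date header, then collapse stray line breaks.
# print(clean_pdf_text(open("essay.txt").read(), drop_leading_lines=2))
```

The cleaned text can then be pasted into each program’s textbox as usual.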

I only used the simplest, free versions of the programs described below. OpenAI Text Classifier requires signing up (it uses the same account that is required for ChatGPT). So does CrossPlag. Sapling offers the option of a free account with more features and a free trial of the premium version.

As shown below, the programs describe their results in slightly different ways: as X% of the total being AI generated or as X% “likely” to be AI generated.

Below, I show some of the programs’ different results for the essay from which the screenshot above is taken. The first part of the essay was all or mostly written by the student; three out of the final four paragraphs were almost certainly AI generated (based on a conversation with the student and the formatting).

Sapling truncates the text after 2000 characters, or about 400–450 words. For this reason, it did not flag essay #14 in its entirety as problematic because only the last few paragraphs were AI-generated. I copied in the entire essay, which was then truncated, so the program gave me a result of 0% AI-generated, or “fake.”

[Image: Sapling message showing “Query has been truncated” and a result of 0 percent fake]

I then copied in the final few paragraphs (475 words), but again the most egregious text at the end was cut off, so Sapling only flagged it as 11.7% “fake.” I then copied in only the passage screenshotted above, but again some of the most egregious text at the end was cut off, and the result was 25.2% “fake.”

Finally, I just copied the two final paragraphs; they came back as 100% “fake.”

[Image: Sapling output showing 100% fake]

Sapling gives a disclaimer: “No current AI content detector (including Sapling’s) should be used as a standalone check to determine whether text is AI-generated or written by a human. False positives and false negatives will regularly occur.” It also says it uses different techniques to evaluate entire passages versus individual sentences, so they should be evaluated together (sentence-level detection not shown here).
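Given that 2,000-character cutoff, one way to avoid silently losing the end of a long essay is to split the text into pieces below the limit and check each piece separately. The sketch below is a rough illustration of that idea; the 2,000-character figure is simply the limit I ran into here, and other tools truncate at different lengths.

```python
def chunk_text(text: str, max_chars: int = 2000) -> list[str]:
    """Split text into pieces no longer than max_chars, breaking at paragraph
    boundaries where possible, so nothing is silently truncated."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single paragraph longer than the limit still has to be hard-cut.
        while len(para) > max_chars:
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        current = para
    if current:
        chunks.append(current)
    return chunks


# Each piece can then be pasted into the detector separately, which also makes
# it easier to see which paragraphs, specifically, are being flagged.
```

Copying in only the problematic passages, as noted in the takeaways above, accomplishes the same thing manually.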

Copyleaks has a daily limit of approximately twenty submissions. It claims 99.12% accuracy and highlights different passages as “X% probability for AI.”

ZeroGPT claims to have higher than 98% accuracy. Results are given in qualitative terms, e.g.,

  • Your text is Human written
  • Your text is AI/GPT Generated
  • Most of Your text is AI/GPT Generated
  • Your text is Most Likely AI/GPT generated
  • Your text is Likely generated by AI/GPT
  • Your text contains mixed signals, with some parts generated by AI/GPT
  • Your text is Likely Human written, may include parts generated by AI/GPT
  • Etc.

It also estimates the total percentage of the text that is AI generated.

Here is ZeroGPT’s assessment of the two paragraphs from the excerpt pictured above (14b).

The first paragraph is very likely AI generated, so it is puzzling that it is not highlighted here.

[Image: example of ZeroGPT output showing 45% fake]

OpenAI Text Classifier was disappointingly “generous,” with false negatives. It only flagged two essays as “unclear” regarding whether or not they contained AI-generated text (one of which was the essay screenshotted above).

CrossPlag requires signing up for an account (free) with your name and email address.

CrossPlag also has a word limit of 1000 words, which did not affect these results, as most essays were approximately 1000 words.

Colorado State is not (yet?) using Turnitin’s AI-detection function.

Rubric and Writing Assignment

Length: 1000 words plus or minus 10% (so, 900–1100 words including everything)

Due: April 6, 11:59pm via Canvas > Assignments or Canvas > Module 10

A key component of critical thinking and writing is the ability to take concepts and apply them to other situations. This is what you are asked to do in this assignment. Chapter 7 of the textbook by Robbins et al. (2022) introduces various concepts and terms related to political economy. This assignment asks you to apply those ideas to one of the photojournalism articles below.

Step 1: Review each of the following photojournalism pieces, then pick one for analysis.

  1. Berkes, H., Jinghan, H., & Benincasa, R. (2018, December 18). An epidemic is killing thousands of coal miners. Regulators could have stopped it. National Public Radio. npr.org/2018/12/18/675253856/an-epidemic-is-killing-thousands-of-coal-miners-regulators-could-have-stopped-it
  2. Paddock, R.C. (2019, November 14). To make this tofu, start by burning plastic. The New York Times. nytimes.com/2019/11/14/world/asia/indonesia-tofu-dioxin-plastic.html
  3. Stewart, N., Jones, R.C., Peçanha, S., Furticella, J., & Williams, J. (2019, October 23). Underground lives: The sunless world of immigrants in Queens. The New York Times. nytimes.com/interactive/2019/10/23/nyregion/basements-queens-immigrants.html
  4. Marosi, R., & Bartletti, D. (2014, December 7). Hardship on Mexico’s farms, a bounty for U.S. tables. LA Times. http://graphics.latimes.com/product-of-mexico-camps

Note: the New York Times is available to all CSU students; go to the following website to set it up: https://source.colostate.edu/nytimes-com-free-to-csu-students-employees. The LA Times allows a certain number of free articles per week and NPR is always freely accessible. There is no need for sources other than those listed above and the textbook.

Step 2: Study the piece carefully, including all of the photographs and/or short videos, in order to fully understand the issue.

Step 3: In 3–4 paragraphs, analyze the issue presented in the piece using three of the following theories and concepts from the lectures on political economy and chapter 7 of the textbook by Robbins et al. (2022). (Most of the chapters in Part II of the textbook have sections on political economy as well: chapters 11, 12, and 15–19, which may be useful.)

  • capital
  • class (capitalists and laborers)
  • commodity
  • commodity fetishism
  • commodity frontiers
  • conditions of production
  • dependency theory
  • deregulation
  • exploitation (technical definition)
  • first contradiction of capitalism
  • free trade
  • globalization
  • means of production
  • neoliberalism
  • offshoring
  • overaccumulation
  • political ecology
  • primitive accumulation
  • privatization
  • second contradiction of capitalism
  • spatial fix
  • the “free market”
  • the invisible hand
  • uneven development
  • unions
  • zones
  • Demonstrate your understanding of the theories and concepts you are applying. This requires defining and explaining these terms in your own words.
  • Provide specific evidence from the photojournalism article to illustrate or back up your claims, whether in the form of direct quotes or non-quoted descriptions. The particulars are very important! You may also connect the article to other cases discussed in class. Some summary may be necessary, but should not be more than a few sentences long.
  • You should not give your personal opinion (don’t use the word “I”) except in the final paragraph (see Step 4).
  • These are short papers; there’s no need for a repetitive introductory or concluding paragraph. Just get to the heart of the issue!

Step 4: Write a final paragraph reflecting on the process of analyzing the article and writing. Was anything confusing about the political economic terms that you chose? Did writing about them help you understand them better? Did you use other tools to better understand them? What strategies did you use to conduct an analysis? What strategies might be useful in the future? You do not have to answer all of these questions; they are just meant to prompt a thoughtful, useful reflection about the material itself and your process of writing about it.

Step 5: Provide an APA-style citation of the article and, if you used it, the textbook (hint: the citations above are in APA style). Any URLs should not include colostate.

Step 6: Give your paper a title and submit it!

This assignment will be evaluated based on the criteria below. Grammar and syntax, etc., will not be graded, as they are not taught in this class and there are many reasons why one student may have better grammar than another; still, presentation matters, so run the paper through spellcheck, check for typos, and read it out loud to find spots that sound weird before submitting it. Assignments will be run through Turnitin, a program that detects plagiarism, so be sure to use your own words or quote appropriately (with quotation marks and a citation).

| Criterion | Possible points (14.5 total) | A+ (100%) or A (95%) | B (85%) | C/D (70%) | (0) |
|---|---|---|---|---|---|
| Terms or concepts (3) | 3 terms × 2 points = 6 | Term is clearly defined in the student’s own words. | Term is included and explained accurately, albeit imperfectly. | Term is included but explained confusingly or wrongly. | Term is missing. |
| Application of terms to specific issues from the article | 3 issues × 2 points = 6 | Examples from the articles are specific, e.g., quotes or details. Connections to terms are clearly made and demonstrate a solid understanding and thoughtfulness. | Examples are nonspecific. | Student summarizes but does not engage in analysis, or examples are generic. | The writing relies on personal experience or opinion. |
| References | 1 | Citations are provided in APA style. | APA-style citations are provided, with a few mistakes (e.g., bad URL). | Citations are provided, but not in APA style or with a lot of mistakes. | No citations are provided. |
| Quality of the writing | 1.5 | Writing is clear, concise, and logically organized. | Some small errors or wordiness. | Writing is occasionally unclear or poorly organized. | Poor quality of writing takes away from the meaning throughout. |

For every 1–100 words that the paper is short of the required word count, it will lose 20%, the equivalent of two letter grades.

More about Dr. Andrews:

Ellie Andrews is an adjunct instructor in the Department of Anthropology and Geography. She holds a PhD from Cornell University (Development Sociology) and a master’s degree from the Pennsylvania State University (Geography). She has taught peace and justice to high schoolers, sociology to incarcerated students, and writing to undergraduates, but is especially drawn to teaching anyone, anywhere, about the environment: “In the end, we will conserve only what we love, we will love only what we understand, and we will understand only what we are taught” (Baba Dioum).