New OpenAI Classifier for Detecting AI-Written Text

Written on Apr 18, 2023. Posted in Negative AI.

This article was initialized by a human, created by AI, updated by a human, copy edited by AI, final copy edited by a human, and posted by a human. For human benefit. Enjoy!

OpenAI has trained a classifier to distinguish between text written by a human and text written by AI systems from a variety of providers. Although it is not possible to identify all AI-generated text with certainty, classifiers can help counter false claims that AI-written text was created by humans. This is particularly important when AI-generated text is used to spread fake news or enable academic dishonesty, or when an AI chatbot is passed off as a human.

OpenAI's classifier is not fully reliable. In evaluations on a set of English texts, the classifier correctly identifies 26% of AI-written text as "likely AI-written" (true positives), while incorrectly labeling human-written text as AI-written 9% of the time (false positives). The classifier's reliability usually improves as the length of the input text increases. Compared to the previously released classifier, the new classifier is much more reliable on text from recent AI systems.
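To see what these two rates mean in practice, it helps to ask what share of flagged texts would actually be AI-written. The sketch below applies Bayes' rule to the 26% and 9% figures quoted above. The base rates (the share of submitted texts that are AI-written) are assumptions chosen purely for illustration and do not come from OpenAI's evaluation.

```python
def flag_precision(tpr: float, fpr: float, base_rate: float) -> float:
    """Share of flagged texts that are truly AI-written (Bayes' rule)."""
    flagged_ai = tpr * base_rate              # AI-written and flagged
    flagged_human = fpr * (1.0 - base_rate)   # human-written but flagged
    return flagged_ai / (flagged_ai + flagged_human)


TPR = 0.26  # true positive rate quoted in the article
FPR = 0.09  # false positive rate quoted in the article

# Assumed base rates, for illustration only.
for base_rate in (0.05, 0.20, 0.50):
    share = flag_precision(TPR, FPR, base_rate)
    print(f"base rate {base_rate:.0%}: {share:.1%} of flags are correct")
```

Under these assumptions, even when half of all texts are AI-written only about three in four flags are correct, and the share drops sharply as AI-written text becomes rarer.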

OpenAI offers a free work-in-progress classifier that you can try out yourself.

Try the Classifier

Limitations

OpenAI's classifier has a number of important limitations. It should not be used as the sole basis for any important decision. Instead, it should be combined with other methods of determining the origin of a piece of text.

The OpenAI classifier is not reliable on short texts (below 1,000 characters). It can also mislabel longer texts.
The classifier may wrongly label human-written text as AI-written.
The classifier works best with English text and may not work well with other languages or with code.
If the text is highly predictable, it may be impossible to tell whether it was written by an AI or a human. For example, a list of the first 1,000 prime numbers is the same no matter who writes it, because there is only one correct answer.
AI-written text can be edited to avoid detection by the classifier. While the classifier can be updated to address new attacks, it is unclear how effective this approach will be in the long term.
Classifiers based on neural networks, like the OpenAI classifier, may not work well on inputs that are very different from the training data. In these cases, the classifier can be very confident in a wrong prediction.
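One way to respect the length limitation is to refuse to score short inputs at all. The sketch below is a minimal illustration, not OpenAI's implementation: `score_fn` is a hypothetical stand-in for whatever detector you call, and only the 1,000-character floor comes from the list above.

```python
from typing import Callable, Dict, Optional, Union

MIN_CHARS = 1_000  # length floor quoted in the limitations above


def screen(
    text: str,
    score_fn: Callable[[str], float],  # hypothetical detector, returns a score
    min_chars: int = MIN_CHARS,
) -> Dict[str, Union[str, Optional[float]]]:
    """Score a text only if it is long enough to be worth scoring."""
    if len(text) < min_chars:
        # Too short for a meaningful result, so no score is reported.
        return {"status": "too_short", "score": None}
    # The score is one signal among several, never a final verdict.
    return {"status": "scored", "score": score_fn(text)}
```

Returning an explicit "too_short" status, rather than a low score, keeps a short text from being mistaken for one the detector judged to be human-written.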

Training the classifier

OpenAI's classifier is a language model fine-tuned to distinguish between text written by humans and text written by AI. It was trained on a dataset of pairs of human-written and AI-written text on the same topic. The human-written text was collected from various sources, including pretraining data and human demonstrations on prompts submitted to InstructGPT. Each text was divided into a prompt and a response, and AI-written responses were then generated for each prompt using various language models from OpenAI and other organizations.
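The pairing scheme described above can be sketched roughly as follows. This is an illustration under stated assumptions, not OpenAI's pipeline: the article does not say how texts were split, so the fixed character offset (`prompt_chars`) is invented for the example, and each entry in `generators` is a hypothetical stand-in for a language model.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Example:
    prompt: str
    response: str
    label: int  # 0 = human-written, 1 = AI-written


def split_prompt_response(text: str, prompt_chars: int) -> Tuple[str, str]:
    """Split a text at a fixed offset (an assumption made for this sketch)."""
    return text[:prompt_chars], text[prompt_chars:]


def build_pairs(
    human_texts: Iterable[str],
    generators: Iterable[Callable[[str], str]],  # hypothetical model stand-ins
    prompt_chars: int = 200,
) -> List[Example]:
    """Pair each human response with AI responses to the same prompt."""
    generators = list(generators)
    examples: List[Example] = []
    for text in human_texts:
        prompt, response = split_prompt_response(text, prompt_chars)
        if not response:
            continue  # text too short to yield a response part
        examples.append(Example(prompt, response, label=0))
        for generate in generators:
            examples.append(Example(prompt, generate(prompt), label=1))
    return examples
```

Because every AI-written response shares its prompt with a human-written one, the two classes stay matched on topic.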

For the web app, the confidence threshold is adjusted to keep the false positive rate low, so the app marks text as likely AI-written only when the classifier is very confident. The limitations described earlier still apply: the classifier is less reliable on short texts, can sometimes mislabel human-written text as AI-written, and is recommended only for English text. AI-generated text can also be edited to avoid detection. Finally, because the classifier is a language model fine-tuned on a dataset, its confidence may be poorly calibrated on inputs that differ from its training data.
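Choosing a threshold to cap the false positive rate can be sketched as follows. This is one common approach, shown under assumptions: the article does not describe OpenAI's actual procedure, and the scores and the 10% target below are made up for the example. The idea is to score a held-out set of human-written texts and pick the cutoff that flags no more than the target share of them.

```python
from typing import List


def threshold_for_fpr(human_scores: List[float], target_fpr: float) -> float:
    """Return a cutoff t such that flagging `score > t` keeps the
    false positive rate on `human_scores` at or below `target_fpr`."""
    if not human_scores:
        raise ValueError("need at least one human-written score")
    if not 0.0 <= target_fpr < 1.0:
        raise ValueError("target_fpr must be in [0, 1)")
    ranked = sorted(human_scores, reverse=True)
    allowed = int(target_fpr * len(ranked))  # max human texts we may flag
    # Scores strictly above ranked[allowed] number at most `allowed`.
    return ranked[allowed]


def false_positive_rate(human_scores: List[float], threshold: float) -> float:
    """Share of human-written texts that would be flagged."""
    return sum(s > threshold for s in human_scores) / len(human_scores)


# Made-up scores for held-out human-written texts (illustration only).
scores = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 0.95]
cutoff = threshold_for_fpr(scores, target_fpr=0.10)
print(cutoff, false_positive_rate(scores, cutoff))
```

Raising the cutoff this way lowers false positives at the cost of missing more AI-written text, which is consistent with the low true positive rate reported for the classifier.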

Impact on educators and call for input

OpenAI acknowledges that detecting AI-written text is an important topic for educators to discuss. It is equally important to recognize the limits and impacts of AI-generated text classifiers in the classroom. OpenAI has developed a preliminary resource for educators on the use of ChatGPT, which outlines some uses, limitations, and considerations. Although this resource is aimed at educators, OpenAI expects that its classifier and associated tools will also affect journalists, misinformation and disinformation researchers, and other groups.

OpenAI is reaching out to educators in the United States to learn about their experiences with ChatGPT in the classroom and to discuss the capabilities and limitations of its classifier. As part of its mission to deploy large language models safely, OpenAI believes it is important to engage directly with affected communities. It plans to expand this outreach as it continues to learn from these conversations.

If you are affected by the issues related to ChatGPT's capabilities and limitations (including teachers, administrators, parents, students, and education service providers), OpenAI would appreciate your feedback. You can share your thoughts by filling out this form. OpenAI values direct feedback on its preliminary resource and also welcomes any resources that you have developed or found useful, such as course guidelines, honor code and policy updates, interactive tools, and AI literacy programs.

