ChatGPT fails to pass accounting exams in human capabilities study

The students scored an average of 76.7% on the exams, while ChatGPT scored only 47.4%.

A keyboard is seen reflected on a computer screen displaying the website of ChatGPT, an AI chatbot from OpenAI, in this illustration picture taken Feb. 8, 2023. (photo credit: REUTERS/FLORENCE LO/ILLUSTRATION/FILE PHOTO)

Researchers from Brigham Young University (BYU) in Utah collaborated with academics from higher education institutions around the world to see whether ChatGPT could outperform actual university students on accounting exams, which the AI chatbot failed to do.

The students scored an average of 76.7% on the exams, while ChatGPT averaged only 47.4%.

The peer-reviewed study was published in the journal Issues in Accounting Education; 327 co-authors from more than 180 institutions in 14 countries took part.

That large pool of collaborators was assembled by BYU professor David Wood, the study's lead author, who recruited the academics through social media.

The failings of ChatGPT

The study compared the students' performance on the exams with that of the AI chatbot. Researchers contributed more than 25,000 exam questions, covering topics including accounting information systems, auditing, tax and financial accounting, in formats such as true/false and multiple choice.

A smartphone with a displayed ChatGPT logo is placed on a computer motherboard in this illustration taken February 23, 2023. (credit: DADO RUVIC/REUTERS)

The AI struggled in particular with questions on tax, financial and managerial accounting, likely because of the mathematical processes those questions require. At times it used addition where subtraction was needed, showing that it could not reliably follow even basic mathematical steps.

ChatGPT also struggled with short-answer questions, scoring between 28.7% and 39.1%. It would also sometimes give authoritative-sounding explanations for incorrect answers, answer the same question in inconsistent ways, or simply make up a reference to defend its answers.

The technology fared better on true/false questions (68.7% correct) and multiple-choice questions (59.5% correct) than on other question formats, though still below the students' overall average.

“When this technology first came out, everyone was worried that students could now use it to cheat,” said Wood. “But opportunities to cheat have always existed. So for us, we’re trying to focus on what we can do with this technology now that we couldn’t do before to improve the teaching process for faculty and the learning process for students.”

"It’s not perfect; you’re not going to be using it for everything,” said Jessica Wood, currently a freshman at BYU. “Trying to learn solely by using ChatGPT is a fool’s errand.”