Washington State University professor Mesut Cicek and his research team fed ChatGPT hypotheses drawn from scientific papers and ran repeated tests. The goal was to see whether the AI could correctly determine whether each claim was supported by the research, that is, whether it was true or false.
In total, the team evaluated more than 700 hypotheses and asked the same question 10 times for each hypothesis to measure consistency.
Accuracy results and AI performance limits
When first tested in 2024, ChatGPT got it right 76.5% of the time. A follow-up test in 2025 saw a slight increase in accuracy, to 80%. But when the researchers adjusted for random guessing, the results became less impressive: the AI performed only about 60% better than chance, roughly the equivalent of a D grade and well short of strong reliability.
The system had the hardest time identifying false statements, correctly labeling them only 16.4% of the time. It also showed notable inconsistency: even when given the exact same prompt 10 times, ChatGPT produced a consistent answer only 73% of the time.
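The consistency check described above can be sketched in a few lines. This is a hypothetical illustration, not the study's actual code, and it assumes "consistent" means all 10 repetitions of a prompt returned the same verdict:

```python
def is_consistent(answers):
    # True if every repetition of the prompt gave the same verdict
    return len(set(answers)) == 1

def consistency_rate(runs):
    # Fraction of hypotheses whose repeated answers all agree
    return sum(is_consistent(a) for a in runs) / len(runs)

# Toy data (invented for illustration, not from the study):
runs = [
    ["true"] * 10,           # fully consistent
    ["true", "false"] * 5,   # flip-flops between verdicts
    ["false"] * 10,          # fully consistent
]
print(consistency_rate(runs))  # 2 of 3 hypotheses are consistent
```

Under this definition, a 73% consistency rate means roughly a quarter of the hypotheses got at least one contradictory verdict across the 10 repetitions.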
Inconsistent answers raise concerns
“We’re not just talking about accuracy, we’re talking about inconsistency, because if you ask the same question over and over again, you’re going to get different answers,” said Cicek, associate professor in the Department of Marketing and International Business in the WSU Carson College of Business and lead author of the new study.
“We used 10 prompts with the exact same question, all the same. The answer is true. Then it says false. True, false, false, true. There were some cases where there were five true and five false.”
AI fluency and real understanding
The findings, published in Rutgers Business Review, highlight the need for caution when relying on AI for important decisions, especially those that require nuanced or complex reasoning. Although generative AI can produce smooth and convincing language, it has not yet demonstrated a comparable level of conceptual understanding.
According to Cicek, these results suggest that artificial general intelligence that can truly “think” may still be further away than many expect.
“Current AI tools can’t understand the world the way we do. They don’t have a ‘brain’,” Cicek says. “They just memorize it and can give you some insight, but they don’t understand what they’re saying.”
Research design and methods
Cicek collaborated with co-authors Sevincgul Ulu of Southern Illinois University, Can Uslay of Rutgers University, and Kate Karniouchina of Northeastern University.
The team used 719 hypotheses from scientific studies published in business journals since 2021. These types of questions are often nuanced, and multiple factors influence whether a hypothesis is supported. Reducing such complexity to simple true-or-false judgments requires careful reasoning.
Researchers tested the free version of ChatGPT-3.5 in 2024 and the updated ChatGPT-5 mini in 2025. Performance was similar across both versions: after adjusting for the 50% accuracy expected from random guessing, the AI scored only about 60% above chance in both years.
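The chance adjustment mentioned above can be illustrated with a standard rescaling, where raw accuracy is mapped so that 0 corresponds to guessing and 1 to perfect performance (the same form as Cohen's kappa with a fixed 50% baseline). This is a plausible sketch of the calculation, not necessarily the exact formula the authors used:

```python
def chance_corrected(accuracy, chance=0.5):
    # Rescale raw accuracy: 0.0 = no better than guessing, 1.0 = perfect
    return (accuracy - chance) / (1 - chance)

# Applying it to the reported raw accuracies:
print(round(chance_corrected(0.765), 3))  # 2024 run -> 0.53
print(round(chance_corrected(0.80), 3))   # 2025 run -> 0.6
```

Under this adjustment, raw accuracies of 76.5% and 80% correspond to roughly 53% and 60% above chance, which matches the article's "about 60%" characterization for the later run.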
Key weaknesses of AI inference
This result points to a fundamental limitation of AI systems built on large language models. Although they can produce fluent and persuasive responses, they often struggle to reason logically through complex questions. This can lead to answers that seem convincing but are actually wrong, Cicek said.
Why experts caution about AI
Based on these findings, the researchers recommend that business leaders examine AI-generated information and approach it with a degree of skepticism. They also highlight the need for training to better understand what AI systems can and cannot do effectively.
Although the study focused specifically on ChatGPT, Cicek noted that similar experiments using other AI tools have yielded comparable results. The study also builds on previous research urging caution about AI hype: a 2024 national study found that consumers are less likely to purchase a product if it is marketed with an AI focus.
“Always be skeptical,” he said. “I’m not against AI. I’m using AI. But we have to be very careful.”

