Today, large language models (LLMs) such as OpenAI's ChatGPT and Google's Gemini can offer relationship advice, write text, and even draft scientific articles. But can they solve Sudoku puzzles and provide reasonable explanations? A team of computer scientists at the University of Colorado Boulder set out to investigate. The team designed nearly 2,300 original Sudoku puzzles and tested multiple AI tools on them.

The results show that AI performance on Sudoku puzzles is highly inconsistent. While some models can solve easy puzzles, even the best-performing ones struggle to clearly explain their reasoning. Their descriptions are often confused, inaccurate, or outright bizarre. Co-author Maria Pacheco notes that these findings raise serious questions about the trustworthiness of information generated by AI. "For certain types of Sudoku puzzles, most AI models still fall short when it comes to providing human-understandable explanations," Pacheco said. The study was published in the Transactions of the Association for Computational Linguistics.
The research was not intended to "cheat" at puzzles but to explore how AI "thinks" through logical exercises. Co-author Professor Fabio Somenzi believes the results could help inspire the creation of more reliable and trustworthy computer programs. "Puzzles are fun, but they are also miniature models of machine learning decision-making processes," he said.

Most current AI models struggle to develop human-like logical reasoning, largely because of how they are trained. ChatGPT, for example, answers questions by predicting the next word, essentially a computerized form of rote memorization. In the tests, OpenAI's o1-preview model solved about 65% of the Sudoku puzzles correctly but frequently fabricated facts or went completely off-topic when explaining its solution process.

Pacheco, Somenzi, and their colleagues are working to combine AI's memory capabilities with human-like logical reasoning, an approach known as "neuro-symbolic" AI. Their goal is to build AI systems that can both solve complex puzzles and clearly explain their reasoning. They are now beginning similar research with other logic puzzles, such as "Hitori."
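The study itself does not include code, but the contrast the researchers draw is between statistical next-word prediction and explicit, checkable logic. As a rough illustration of what "symbolic" reasoning means here, the sketch below (all names hypothetical) encodes Sudoku's three rules as constraints a program can verify directly, rule by rule, rather than guess at:

```python
# Hypothetical sketch: Sudoku's rules written as explicit symbolic constraints.
# A grid is a 9x9 list of ints; digits are 1-9 and 0 marks an empty cell.

def units(grid):
    """Yield every row, column, and 3x3 box as a list of nine cells."""
    for i in range(9):
        yield grid[i]                               # row i
        yield [grid[r][i] for r in range(9)]        # column i
    for br in range(0, 9, 3):                       # top row of each box
        for bc in range(0, 9, 3):                   # left column of each box
            yield [grid[br + dr][bc + dc]
                   for dr in range(3) for dc in range(3)]

def is_valid(grid):
    """True if no row, column, or box repeats a filled-in digit."""
    for unit in units(grid):
        filled = [v for v in unit if v != 0]
        if len(filled) != len(set(filled)):         # duplicate digit found
            return False
    return True
```

A solver built on checks like these can point to the exact rule a candidate digit violates, which is the kind of human-understandable explanation the researchers found current language models struggle to produce.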











