April 22, 2025

NEMOTRON-CROSSTHINK Improves Reasoning Abilities of Large Language Models


New Advances in Self-Learning Reasoning with NEMOTRON-CROSSTHINK

Large language models (LLMs) have shown impressive progress in logical reasoning in recent years, particularly through the use of Reinforcement Learning (RL). Previous approaches that used RL for mathematical reasoning, where rules and correctness are clearly defined, ran into difficulties when transferred to more general domains: data scarcity, a lack of verifiable reward structures, and heterogeneous task formats. A new research paper now introduces NEMOTRON-CROSSTHINK, a framework that systematically integrates multi-domain corpora, including both synthetic and real-world question-answer pairs, into RL training to improve generalization across diverse reasoning tasks.

NEMOTRON-CROSSTHINK addresses these central challenges through four main strategies: First, it assembles data from diverse sources spanning STEM fields, the humanities, and the social sciences. Second, it applies structured templates (e.g., multiple-choice and open-ended questions) to control the complexity of the answer space. Third, it filters for samples whose answers can be automatically verified. And fourth, it optimizes data-blending strategies to make effective use of data from multiple sources.
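To make the four strategies concrete, the following is a minimal sketch of such a data pipeline. All function and field names here are illustrative assumptions, not the paper's actual implementation:

```python
# Hypothetical sketch of the strategies described above: structured
# templating, verifiable-answer filtering, and multi-source blending.
# Names and heuristics are illustrative, not from the paper.

def to_multiple_choice(sample):
    """Apply a structured template: render the question with lettered
    options so the answer space collapses to a single letter, which a
    rule-based reward can check exactly."""
    options = "\n".join(
        f"{chr(65 + i)}. {opt}" for i, opt in enumerate(sample["options"])
    )
    prompt = f"{sample['question']}\n{options}\nAnswer with the option letter."
    return {"prompt": prompt, "answer": sample["answer_letter"]}

def is_verifiable(sample):
    """Keep only samples whose gold answer is short and unambiguous
    (a crude proxy for 'automatically checkable')."""
    return len(sample["answer"].strip().split()) <= 3

def blend(domains, weights):
    """Mix filtered data from multiple domains according to
    per-domain blending ratios."""
    mixed = []
    for name, ratio in weights.items():
        pool = [s for s in domains[name] if is_verifiable(s)]
        mixed.extend(pool[: int(len(pool) * ratio)])
    return mixed
```

In an RL setup, the verifiability filter is what makes the reward scalable: a string comparison against the templated gold answer replaces a learned reward model.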

This approach enables scalable and verifiable reward modeling that goes beyond mathematics. The results show improved accuracy on both mathematical benchmarks (MATH-500: +30.1%, AMC23: +27.5%) and non-mathematical reasoning benchmarks (MMLU-PRO: +12.8%, GPQA-DIAMOND: +11.3%, AGIEVAL: +15.1%, SUPERGPQA: +3.8%).

Increased Efficiency Through More Focused Reasoning

Another notable aspect of NEMOTRON-CROSSTHINK is the significantly improved response efficiency. The model requires 28% fewer tokens for correct answers, suggesting more focused and effective reasoning. This is a significant advancement, as the efficiency of LLMs plays a crucial role in their practical applicability.

The research results suggest that integrating multi-domain data in varied formats into RL training leads to more accurate, efficient, and generalizable LLMs. This opens up new possibilities for deploying LLMs in applications that go beyond mathematical reasoning.

Mindverse, as a provider of AI-powered content tools, is following these developments with great interest. The improvement of LLMs' capabilities in the area of logical reasoning is an important step towards more powerful and versatile AI systems. The development of customized solutions such as chatbots, voicebots, AI search engines, and knowledge systems benefits directly from these advances and enables the development of innovative applications in various industries.

Bibliography:

Akter, S. N., Prabhumoye, S., Novikov, M., Han, S., Lin, Y., Bakhturi, E., Nyberg, E., Choi, Y., Patwary, M., Shoeybi, M., & Catanzaro, B. (2025). NEMOTRON-CROSSTHINK: Scaling Self-Learning beyond Math Reasoning. arXiv preprint arXiv:2504.13941.