The Chinese company Moonshot AI has introduced a new multimodal AI model, Kimi k1.5, which is showing promising results in the field of complex reasoning and is positioned as a competitor to established models like OpenAI's o1. Kimi k1.5 follows the release of DeepSeek-R1 and underscores China's growing influence in the field of AI research.
Moonshot AI offers Kimi k1.5 in two versions: a "long-CoT" version for detailed, step-by-step reasoning and a "short-CoT" version for concise answers. Both versions achieve or exceed the performance of leading models like o1 and DeepSeek-R1 in various benchmarks, according to the company's technical report. The long-CoT version is characterized by its transparent thinking process, while the short-CoT version aims for short, precise answers and even surpasses models like Claude 3.5 Sonnet in some benchmarks.
A key difference from DeepSeek-R1 is the multimodal capability of Kimi k1.5. The model can process both text and images, drawing conclusions from various input data. Kimi k1.5 achieves high scores, particularly in multimodal benchmarks like MathVista and MMMU. However, it's important to note that benchmark results don't always reflect real-world performance.
The development process began with standard pre-training on a large dataset of text and images to build a basic understanding of language and imagery. The model was then fine-tuned (SFT) on a carefully selected, smaller dataset. For tasks with clear solutions, such as mathematical problems, rejection sampling was used: the model generates multiple candidate answers, and only the correct ones are kept for training. Additionally, training data containing detailed, step-by-step reasoning was created.
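The rejection-sampling step can be sketched as follows. This is a minimal illustration, not Moonshot AI's actual pipeline: the toy verifier, the `generate` callable, and the sample count `k` are all hypothetical stand-ins for the real model and grader.

```python
def is_correct(answer: str, reference: str) -> bool:
    # Verifier for tasks with clear solutions (e.g. math).
    # Here: a simple exact match against the reference answer.
    return answer.strip() == reference.strip()

def rejection_sampling(generate, prompt: str, reference: str, k: int = 8) -> list[str]:
    # Sample k candidate answers, then keep only the verified-correct
    # ones as new SFT training examples; wrong answers are discarded.
    candidates = [generate(prompt) for _ in range(k)]
    return [c for c in candidates if is_correct(c, reference)]

# Toy stand-in for a language model that always answers correctly.
kept = rejection_sampling(lambda prompt: "4", "What is 2 + 2?", "4", k=3)
assert kept == ["4", "4", "4"]
```

The key property is that every sample entering the training set has been checked against a ground-truth answer, so the fine-tuning data stays clean even though the generator is imperfect.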
In the final phase, reinforcement learning was used, but with a key difference from conventional approaches: instead of scoring intermediate reasoning steps, the team rewarded only the final result. This gives the model more freedom to explore different solution paths. To keep answers efficient, a penalty for excessively long responses was introduced. This approach differs significantly from the DeepSeek-R1 and R1-Zero models: R1 uses simpler reinforcement learning with rule-based feedback, while R1-Zero was trained exclusively with reinforcement learning and without additional supervised data.
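An outcome-only reward with a length penalty might look like the sketch below. The exact formula, the length budget, and the penalty weight are assumptions for illustration; the published report does not specify them in this form.

```python
def outcome_reward(answer: str, reference: str, length: int,
                   max_len: int = 1024, penalty_weight: float = 0.5) -> float:
    # Reward depends only on whether the final answer is correct --
    # no per-step scoring of the reasoning chain.
    correct = 1.0 if answer.strip() == reference.strip() else 0.0
    # Hypothetical length penalty: proportional overshoot past a budget.
    overlong = max(0, length - max_len) / max_len
    return correct - penalty_weight * overlong

# A correct but far-too-long answer earns less than a concise correct one.
assert outcome_reward("4", "4", length=100) > outcome_reward("4", "4", length=2048)
```

Because only the end result is graded, the policy is free to try unusual intermediate steps, while the penalty term discourages it from padding its chain of thought.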
Since detailed reasoning (long-CoT) delivers good results but requires more computing power, the team developed methods to transfer this knowledge to models that generate shorter answers. Various techniques were combined, including model merging and "Shortest Rejection Sampling," which selects the most concise correct answer from multiple attempts.
The researchers found that increasing the context length (up to 128,000 tokens) consistently improves performance by enabling more complex reasoning. As with DeepSeek-R1, the results suggest that effective reasoning models don't require complicated components like Monte Carlo Tree Search.
The success in transferring knowledge from longer to shorter models reflects a general trend in the industry. Anthropic has also presumably used similar knowledge distillation techniques for its smaller but powerful Claude 3.5 Sonnet.
Founded in 2023, Moonshot AI secured over $1 billion in funding led by Alibaba in February 2024, reaching a valuation of $2.5 billion. In August, the value increased to $3.3 billion after further investments from Tencent and Gaorong Capital. Kimi k1.5 is intended to form the basis for the company's ChatGPT competitor. However, the models are not yet publicly available.
Bibliography:
- https://the-decoder.com/moonshot-ai-unveils-kimi-k1-5-chinas-next-o1-competitor/
- https://github.com/MoonshotAI/Kimi-k1.5
- https://www.globaltimes.cn/page/202411/1323248.shtml
- https://www.threads.net/@luokai/post/DFDZpiNT3mn?hl=de
- https://ground.news/article/kimi-k15-the-first-non-openai-model-to-match-full-powered-o1-performance
- https://www.therundown.ai/p/deepseek-releases-open-source-r1
- https://cheatsheet.md/ai-news/kimi-ai
- https://www.threads.net/@testingcatalog/post/DFDmZyRtlaL
- https://datainnovation.org/2025/01/moonshot-ai-betting-big-on-long-context-confronting-the-challenges-of-scale-and-reliability/