December 31, 2024

Chain-of-Thought Improves Problem-Solving in OpenAI's O1 Language Models

The Influence of "Chain-of-Thought" on the Reasoning Processes of o1-like LLMs

Recent large language models (LLMs) are increasingly able to solve complex tasks. OpenAI's o1 model family, consisting of o1-preview and o1-mini, uses the so-called "Chain-of-Thought" (CoT) technique to emulate reasoning: a series of intermediate steps, written in natural language, that lead to the final answer. OpenAI emphasizes that through this technique, o1 learns to break complex problems down into simpler steps and to try alternative approaches when the current one fails.

An example illustrates how CoT works: suppose a cafeteria has 23 apples; 20 apples are used for lunch and 6 more are bought. How many apples are there? (23 − 20 + 6 = 9.) Older models like GPT-3 struggled to solve such word problems reliably. Current models like GPT-4o, which already exhibit some "reasoning ability," can solve the task and present the solution step by step.
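To make the technique concrete, the following sketch contrasts a direct prompt with a chain-of-thought-style prompt for the apple problem. The prompt wording and the sample reasoning trace are illustrative assumptions, not OpenAI's actual training or prompting setup.

```python
# Minimal sketch: how a chain-of-thought prompt differs from a direct
# prompt. The wording and the sample trace are illustrative assumptions,
# not OpenAI's actual setup.

QUESTION = (
    "The cafeteria has 23 apples. If 20 are used for lunch "
    "and 6 more are bought, how many apples are there?"
)

direct_prompt = QUESTION + "\nAnswer with a single number."
cot_prompt = QUESTION + "\nLet's think step by step."

# A chain-of-thought response writes out each intermediate step:
#   1. Start with 23 apples.
#   2. After lunch: 23 - 20 = 3 apples remain.
#   3. After buying more: 3 + 6 = 9 apples.
# Final answer: 9

print(direct_prompt)
print(cot_prompt)
```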

OpenAI demonstrates o1's capabilities with more complex examples, such as solving crossword puzzles. Where GPT-4o fails, o1-preview solves the puzzle and explains how: it first analyzes the grid and the clues, then walks through the individual steps to the solution. By writing out the steps to take, the model finds a solution far more effectively than if it predicted the answer directly from the question. This is not genuine thinking or reasoning, but an emulation of that process: writing down steps that lead to an answer.

OpenAI sees two advantages in CoT: First, it offers new possibilities for the alignment and safety of AI models. By observing the model's "thought steps," human values and principles can be conveyed more effectively. Second, o1 significantly outperforms its predecessors in OpenAI's own benchmarks.

Technical Background and Implications

o1 is trained using Reinforcement Learning (RL) to "think" before it answers. The longer the thought process, the better the results on reasoning tasks. This opens a new dimension for scaling LLMs: performance is no longer limited solely by pre-training but can also be improved with computing power in the inference phase, the so-called "test-time compute." This is a positive development for hardware manufacturers like Nvidia and for the cloud providers that supply this compute.
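A simple, well-known way to turn extra test-time compute into accuracy is self-consistency: sample several independent reasoning chains and take a majority vote over their final answers. The sketch below uses a hypothetical sample_answer() as a stand-in for one sampled chain; it illustrates the scaling idea, not OpenAI's undisclosed o1 procedure.

```python
import random
from collections import Counter

def sample_answer(question: str) -> str:
    """Hypothetical stand-in for one sampled reasoning chain.

    In practice this would call an LLM at a nonzero temperature and
    extract the final answer from the generated chain of thought.
    """
    # Simulated distribution: mostly correct, occasionally wrong.
    return random.choice(["9", "9", "9", "7"])

def majority_vote(question: str, n_samples: int = 16) -> str:
    """Spend more test-time compute: sample n chains, vote on the answer."""
    answers = [sample_answer(question) for _ in range(n_samples)]
    best, _count = Counter(answers).most_common(1)[0]
    return best

print(majority_vote("23 - 20 + 6 = ?"))  # usually "9"
```

The more chains are sampled, the more reliable the vote becomes, which is why inference compute turns into a scaling axis of its own.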

The cost of using the model, however, is a significant factor. OpenAI does not disclose how much test-time compute was needed to reach the accuracies reported in its benchmarks, but it is safe to assume the costs can be substantial. Noam Brown, a research scientist at OpenAI, mentions the possibility that future models could compute for hours, days, or even weeks. He compares the costs to investments in new drugs or technologies and emphasizes the potential of AI beyond chatbots.
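A back-of-the-envelope calculation shows why this matters. The token counts and per-token prices below are assumed, illustrative values, not OpenAI's actual pricing; the point is only that long hidden reasoning traces multiply the cost of an otherwise short answer.

```python
# Back-of-the-envelope cost comparison. All prices and token counts are
# assumed, illustrative values, not actual OpenAI pricing.

PRICE_PER_1M_OUTPUT_TOKENS = {"gpt-4o-like": 10.0, "o1-like": 60.0}  # USD, assumed

def answer_cost(model: str, visible_tokens: int, reasoning_tokens: int = 0) -> float:
    """Cost of one answer; hidden reasoning tokens are billed like output tokens."""
    total_tokens = visible_tokens + reasoning_tokens
    return total_tokens / 1_000_000 * PRICE_PER_1M_OUTPUT_TOKENS[model]

# Same short visible answer, but an o1-style model may also generate
# thousands of hidden reasoning tokens:
fast = answer_cost("gpt-4o-like", visible_tokens=300)
slow = answer_cost("o1-like", visible_tokens=300, reasoning_tokens=5_000)
print(f"fast: ${fast:.4f}  slow: ${slow:.4f}  ratio: ~{slow / fast:.0f}x")
```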

Limitations and Challenges of "Chain-of-Thought"

Although o1 delivers impressive results, it is not always the best choice. Many tasks do not require "thinking," and a quick response from GPT-4o can be more efficient. With the release of o1-preview, OpenAI wants to find out which use cases take hold and where improvement is needed.

Whether o1 deserves the label "reasoning model" is also under debate. Experts like Daniel Kang of the University of Illinois Urbana-Champaign see it more as a semantic question: o1 applies a kind of test-time scaling, similar to AlphaGo, which can be interpreted as "reasoning" but does not necessarily correspond to human thinking.

Alon Yamin, CEO of Copyleaks, sees o1 as an approximation of human thinking in dealing with complex problems. However, he emphasizes that these are analogies and not an exact replica of human thought.

Brown points out that o1 is not always better than GPT-4o. Many tasks do not require complex reasoning, and the longer computation time of o1 can be a disadvantage. OpenAI emphasizes the improved coding ability of o1 and refers to positive experiences with GitHub Copilot. Access to o1-preview and o1-mini in GitHub Copilot currently requires signing up for Azure AI.

The safety assessment in the OpenAI System Card classifies o1 as a "medium" risk in the categories "persuasion" and "CBRN" (chemical, biological, radiological, and nuclear). Although o1 can help experts implement plans to reproduce known biological threats, the risk is classified as medium because the model does not enable inexperienced users to do so. Its "inconsistent rejection of requests to synthesize nerve agents" is assessed as not a significant risk.

In summary, o1 and the CoT technique represent an important step in the development of LLMs. The ability to solve complex problems step by step opens up new possibilities, but also presents challenges in terms of cost and efficiency. The further development and application of o1 will show the true potential of this technology.
