January 3, 2025

ProgCo: Program-Driven Self-Correction Improves Large Language Model Accuracy


Improved Self-Correction in Large Language Models: ProgCo Relies on Program-Driven Refinement

Large language models (LLMs) have made impressive progress in recent years and are used in areas such as text generation, translation, and question answering. Despite these capabilities, LLMs remain prone to errors and hallucinations, especially on complex tasks. A promising approach to improving their accuracy is self-correction, in which the model checks its own answers and revises them if necessary. A new method called ProgCo (Program-driven Self-Correction) uses the power of programming to improve the self-correction of LLMs.

The Challenges of Self-Correction

Previous self-correction methods often struggle because LLMs are frequently unable to reliably identify and correct their own errors. In complex reasoning tasks such as mathematical problems in particular, this leads to incorrect feedback that misguides the subsequent refinement of the answer and causes self-correction to fail.

ProgCo: A Two-Stage Approach

ProgCo consists of two main components: program-driven verification (ProgVe) and program-driven refinement (ProgRe). ProgVe has the LLM independently generate verification programs in the form of pseudocode and execute them itself; these programs check the LLM's original answer with more complex logic and more thorough validation than a plain natural-language critique. ProgRe then uses the feedback from ProgVe to reflect on and refine both the answer and the verification program. This dual reflection makes self-correction more robust, especially on complex tasks, because it reduces the risk that erroneous feedback steers the refinement in the wrong direction.
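To make the loop concrete, here is a minimal Python sketch of this two-stage process. The prompt wording, the `llm` callable, the PASS/FAIL convention, and the round limit are illustrative assumptions, not the authors' exact implementation.

```python
from typing import Callable


def progco(task: str, llm: Callable[[str], str], max_rounds: int = 3) -> str:
    """Program-driven self-correction loop: ProgVe (verify) + ProgRe (refine).

    `llm` is any callable that sends a prompt to a language model and returns text.
    """
    # Initial answer to the task.
    answer = llm(f"Solve the following task:\n{task}")

    # ProgVe: the model writes its own verification program as pseudocode.
    verifier = llm(
        "Write a pseudocode verification function that checks whether an answer "
        f"to this task is correct:\n{task}"
    )

    for _ in range(max_rounds):
        # ProgVe: the model "executes" the verification program on its own answer.
        feedback = llm(
            "Execute the verification program below step by step on the answer "
            "and report PASS or FAIL with reasons.\n"
            f"Program:\n{verifier}\nAnswer:\n{answer}"
        )
        if "PASS" in feedback:
            return answer

        # ProgRe: refine the answer in light of the feedback...
        answer = llm(
            f"Task:\n{task}\nCurrent answer:\n{answer}\n"
            f"Verification feedback:\n{feedback}\n"
            "Produce an improved answer."
        )
        # ...and also refine the verification program, in case the feedback
        # itself was wrong (this is what makes the reflection "dual").
        verifier = llm(
            f"Task:\n{task}\nVerification program:\n{verifier}\n"
            f"Feedback it produced:\n{feedback}\n"
            "Revise the program if it is flawed; otherwise return it unchanged."
        )
    return answer
```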

The Advantages of ProgCo

By integrating programmatic elements into the self-correction process, ProgCo offers several advantages. The generation of verification programs allows for a more precise and comprehensive check of the answers compared to conventional methods. The dual reflection and refinement of answer and program helps to identify and correct errors in the feedback process, which improves the quality of the final answer. Experiments with ProgCo on various benchmarks focusing on instruction following and mathematical skills show promising results. Combining ProgCo with real programming tools can further enhance performance.
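As an illustration of combining ProgCo with real programming tools, the sketch below executes a generated verifier with the actual Python interpreter whenever it happens to be valid code, rather than having the LLM simulate its execution. The `run_verifier` helper and the `verify(answer) -> bool` convention are hypothetical assumptions for this example.

```python
def run_verifier(verifier_code: str, answer: str) -> str:
    """Execute a generated `verify(answer) -> bool` function with the Python runtime."""
    namespace: dict = {}
    try:
        exec(verifier_code, namespace)       # define verify()
        ok = namespace["verify"](answer)     # run the actual check
        return "PASS" if ok else "FAIL: verify() returned False"
    except Exception as exc:                 # invalid code: fall back to LLM-simulated execution
        return f"FAIL: verifier raised {exc!r}"


# Example: a verifier the model might generate for the task "What is 17 * 24?"
generated = """
def verify(answer):
    return int(answer) == 17 * 24
"""
print(run_verifier(generated, "408"))  # PASS
print(run_verifier(generated, "398"))  # FAIL: verify() returned False
```

Running the check in a real interpreter removes the risk that the model mis-simulates its own verification logic, at the cost of only applying to tasks whose checks can be expressed as executable code.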

Outlook and Potential

ProgCo represents an innovative approach to self-correction in LLMs that has the potential to significantly improve the accuracy and reliability of these models. Program-driven verification and refinement enable more precise error detection and correction, especially in complex tasks. Future research could focus on extending ProgCo to other application areas and on optimizing the generation and execution of verification programs. The development of efficient and robust self-correction mechanisms is an important step toward realizing the full potential of LLMs and making them usable for critical applications.

Bibliography

- Song, X., Wu, Y., Wang, W., Liu, J., Su, W., & Zheng, B. (2025). ProgCo: Program Helps Self-Correction of Large Language Models. arXiv preprint arXiv:2501.01264.
- Wu, Z., Zeng, Q., Zhang, Z., Tan, Z., Shen, C., & Jiang, M. (2024). Large Language Models Can Self-Correct with Key Condition Verification. arXiv preprint arXiv:2405.14092.
- Quoc, T. T., Ha, D. M., Thanh, T. Q., & Nguyen-Duc, A. (2024). An Empirical Study on Self-correcting Large Language Models for Data Science Code Generation. arXiv preprint arXiv:2408.15658.
- Pan, L., Saxon, M., Xu, W., Nathani, D., Wang, X., & Wang, W. Y. (2024). Automatically Correcting Large Language Models: Surveying the Landscape of Diverse Self-Correction Strategies.
- Kumar, A., Zhuang, V., Agarwal, R., Su, Y., Co-Reyes, J. D., Singh, A., ... & Faust, A. (2024). Training Language Models to Self-Correct via Reinforcement Learning. arXiv preprint arXiv:2409.12917.
- Ganguli, D., Askell, A., Bai, Y., Chen, A., Chen, C., ... & Kaplan, J. (2023). The Capacity for Moral Self-Correction in Large Language Models. arXiv preprint arXiv:2302.07459.
- Warepam, R. (2024, September 30). How Self-Correction in Large Language Models (LLMs) Can Be Improved. Towards AI.
- Letitia Parcalabescu. (2023). Moral Self-Correction in Large Language Models | paper explained [Video]. YouTube.
- Hugging Face Papers. https://huggingface.co/papers