DeepSeek, an emerging company in the field of Artificial Intelligence, has made significant progress in the development of large language models (LLMs) in recent months. From the release of DeepSeek-V2 in May 2024 to the current version V2.5 and the announcement of the R1-Lite-Preview model, the company demonstrates a strong commitment to innovation and open source.
DeepSeek-V2 laid the foundation for the subsequent models. With innovations such as Multi-head Latent Attention (MLA) and DeepSeekMoE, the model aimed at economical training and efficient inference: MLA compresses the key-value cache into a compact latent representation, cutting memory requirements during inference, while DeepSeekMoE lowers training costs through sparse computation, activating only a subset of experts per token. The model was released in several sizes, including a 236B-parameter version with roughly 21B parameters activated per token, alongside a chat model.
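The core idea of MLA can be illustrated in a few lines of code: instead of caching full per-head keys and values, the hidden state is projected into a small latent vector from which keys and values are reconstructed at attention time. The following is a simplified, self-contained sketch of that idea, not DeepSeek's actual implementation; class name and dimensions are illustrative, and details such as rotary embeddings and causal masking are omitted.

```python
import torch
import torch.nn as nn


class LowRankKVAttention(nn.Module):
    """Simplified illustration of latent key-value compression (the idea behind MLA).

    The hidden state is first projected into a small latent vector; keys and values
    are reconstructed from that latent at attention time, so only the latent needs
    to be cached. Dimensions are illustrative; RoPE and causal masking are omitted.
    """

    def __init__(self, d_model: int = 512, n_heads: int = 8, d_latent: int = 64):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.kv_down = nn.Linear(d_model, d_latent)   # compress hidden state to latent
        self.k_up = nn.Linear(d_latent, d_model)      # reconstruct keys from latent
        self.v_up = nn.Linear(d_latent, d_model)      # reconstruct values from latent
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor, latent_cache: torch.Tensor | None = None):
        b, t, d = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)

        latent = self.kv_down(x)                       # (batch, t, d_latent)
        if latent_cache is not None:                   # extend the cached latents
            latent = torch.cat([latent_cache, latent], dim=1)

        k = self.k_up(latent).view(b, -1, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(latent).view(b, -1, self.n_heads, self.d_head).transpose(1, 2)

        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, d)
        return self.out_proj(out), latent              # cache the latent, not k and v


# Only d_latent values per token are cached instead of full keys and values:
attn_layer = LowRankKVAttention()
y, cache = attn_layer(torch.randn(1, 4, 512))          # prefill 4 tokens
y, cache = attn_layer(torch.randn(1, 1, 512), cache)   # decode 1 more token
```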
DeepSeek-V2.5 builds on the strengths of its predecessors by combining the capabilities of DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct. The result is a model that offers both general language skills and specialized coding abilities. Improvements are evident across benchmarks such as AlpacaEval 2.0, ArenaHard, AlignBench, and MT-Bench, and code-generation performance, measured by HumanEval and LiveCodeBench, has also improved.
Running DeepSeek-V2.5 locally requires substantial resources, depending on the hardware and precision used: inference in BF16 format, for example, calls for eight GPUs with 80 GB of memory each. DeepSeek provides instructions for inference with Hugging Face Transformers and recommends vLLM for optimized performance. In addition, features such as Function Calling, JSON Output, and FIM (Fill-In-the-Middle) completion are supported.
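For orientation, a minimal vLLM sketch along the lines of those recommendations might look as follows; the tensor-parallel size, context length, sampling settings, and prompt are assumptions to be adapted to the available hardware and vLLM version, not official defaults.

```python
# Minimal vLLM inference sketch for DeepSeek-V2.5 (illustrative settings, not
# DeepSeek's official configuration).
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-V2.5",
    trust_remote_code=True,     # the repo ships custom MLA/MoE model code
    tensor_parallel_size=8,     # spread the BF16 weights across 8 GPUs
    max_model_len=8192,         # below the full context length to save memory
)

sampling = SamplingParams(temperature=0.3, max_tokens=256)
outputs = llm.generate(["Write a short Python function that reverses a string."], sampling)
print(outputs[0].outputs[0].text)
```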
With DeepSeek-V2-Lite, DeepSeek addresses the need for capable models that also run on more modest hardware. With 16B total parameters and 2.4B active parameters, the model fits on a single GPU with 40 GB of memory. Despite its smaller size, DeepSeek-V2-Lite, according to DeepSeek, outperforms 7B dense and 16B MoE models on various English and Chinese benchmarks. It is available as both a base model and a chat model and offers a context length of 32k tokens.
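A minimal Hugging Face Transformers sketch for running the chat variant on a single GPU could look like the following; the repository ID of the chat model, the prompt, and the generation settings are assumptions based on DeepSeek's Hugging Face releases, not verified defaults.

```python
# Sketch: loading DeepSeek-V2-Lite (chat variant) on a single GPU in BF16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V2-Lite-Chat"  # assumed repo ID of the chat model

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # BF16 weights of the 16B MoE fit on a 40 GB GPU
    trust_remote_code=True,       # custom MLA/MoE model code from the repo
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize what a mixture-of-experts model is."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```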
The recent announcement of DeepSeek-R1-Lite-Preview indicates a further step towards advanced AI capabilities. The model promises improved reasoning performance, comparable to OpenAI's o1-preview model, particularly in benchmarks such as AIME and MATH. A transparent, real-time thought process is intended to give users insights into how the model works. Open-source models and an API are planned.
The developments at DeepSeek are particularly relevant for companies like Mindverse, which offer AI-powered content tools and customized solutions. The availability of powerful and efficient open-source models opens up new possibilities for integration into existing and future products. From chatbots and voicebots to AI search engines and knowledge systems, the advancements at DeepSeek could significantly influence the development of innovative AI solutions at Mindverse and other companies.
Bibliography:
https://www.reddit.com/r/LocalLLaMA/comments/1gvnhob/deepseekr1lite_preview_version_officially_released/
https://www.deepseek.com/
https://huggingface.co/deepseek-ai/DeepSeek-V2.5
https://api-docs.deepseek.com/news/news0905
https://twitter.com/deepseek_ai
https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite
https://www.youtube.com/watch?v=dT8thuqHN2g
https://www.reddit.com/r/LocalLLaMA/comments/1g3odpf/am_i_doing_something_wrong_trying_to_use_deepseek/?tl=de