Alibaba Cloud has introduced a new framework called Qwen-Agent, designed to simplify the creation of AI agents for developers. The framework builds upon the existing Qwen language models and expands their capabilities. Qwen-Agent allows agents to follow detailed instructions, utilize tools, plan tasks, and maintain conversational context.
Advanced features include RAG (Retrieval-Augmented Generation), a code interpreter, and specialized mathematical capabilities powered by Qwen2.5-Math. Qwen-Agent employs a two-tiered approach to agent development. The base layer provides language models and fundamental tools, while the upper layer contains ready-to-use agent components. Developers can combine these components to create agents capable of performing complex tasks – from reading PDFs and working with existing tools to executing user-defined functions.
An example application of Qwen-Agent is BrowserQwen, a web browser agent that demonstrates the framework's capabilities. BrowserQwen can independently research information on the internet and summarize it. For implementation, developers can either utilize Alibaba's DashScope cloud service or run Qwen models on their own hardware. Alibaba recently reduced the pricing for its API AI services. The framework also includes a graphical user interface (GUI) that simplifies the creation of interactive web demos using the Gradio framework.
Alibaba is continuously expanding the technology behind these agents. Recently, QVQ-72B-Preview for visual tasks, as well as specialized Qwen2.5 models for programming and mathematics, were released. Developers should, however, be aware of two things: Like other Chinese LLMs, these agents may have limitations regarding political content. Furthermore, it's advisable to consider simpler solutions before opting for agent-based approaches. The framework is open-source and available to developers on platforms like Hugging Face and ModelScope. Alibaba's goal is to democratize the development of generative AI and make it accessible to a wider audience.
Qwen is not just a single model, but a whole family of language models. These models have been pre-trained on data from a variety of domains and languages and support a context length of up to 32768 tokens. They can create content, summarize and translate text, write and interpret code, solve mathematical problems, use tools, and function as agents. In addition to Qwen, there is also Qwen-VL, a large vision-language model, and Qwen-Audio, a large audio-language model. Qwen-VL can generate content based on images, text, and bounding boxes, while Qwen-Audio accepts text and various audio files as input and provides text-based outputs.
The Qwen models offer a wide range of application possibilities. They can be used to develop chatbots that understand multimodal data to interact intelligently and comprehensively with users. Qwen-Agent can summarize content and interpret code to process data and provide analysis results in the form of rich text, graphs, and code. Qwen-VL can generate images in various styles and genres and analyze objects and text within images to create new content. Qwen-Audio can understand and analyze various types of audio, summarize information such as music genres and speaker emotions, and even use tools to edit audio files.
Bibliographie: - https://the-decoder.com/alibabas-qwen-ai-lab-launches-framework-for-building-ai-agents/ - https://medium.com/@mcunningham1440/china-catches-up-in-ai-agents-a61d57518139 - https://www.alibabacloud.com/en/solutions/generative-ai/qwen?_p_lc=1 - https://www.buildingaiagents.ai/p/china-catches-up - https://www.alibabacloud.com/en/solutions/generative-ai/build?_p_lc=1 - https://www.linkedin.com/pulse/daily-news-ai-agents-key-updates-1231-huatuaogpt-o1-schwoebel-6gghe - https://aiagentsdirectory.com/agent/qwen-agent - https://www.linkedin.com/pulse/alibaba-challenges-openai-anthropic-redefines-ai-anyaegbu-m-s--stzhe - https://github.com/QwenLM/Qwen - https://itbrief.com.au/story/alibaba-cloud-open-sources-two-large-language-models-for-ai-community ```