Introducing Qwen2: An Open-Source Language Model Series with Up to 72 Billion Parameters, Built for Coding and Mathematics
Qwen2, a new open-source language model series with up to 72 billion parameters, has been trained on data in 27 languages in addition to English and Chinese and performs strongly on coding and mathematics. It marks a significant evolution from its predecessor, Qwen1.5. The release includes pretrained and instruction-tuned models in five sizes: Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B, Qwen2-57B-A14B, and the largest, Qwen2-72B.
Advancements in Language Models
After months of work by the Qwen Team, the move from Qwen1.5 to Qwen2 brings several key improvements. One notable change is that the training data now covers 27 languages in addition to English and Chinese. This broader multilingual training is meant to improve the models' performance on text in those languages, not only in English and Chinese.
Model Capabilities
The Qwen2 series achieves state-of-the-art results on a large number of benchmark evaluations, with notably stronger performance on coding and mathematics tasks. Some variants, such as Qwen2-7B-Instruct and Qwen2-72B-Instruct, support context lengths of up to 128K tokens, which makes it practical to process long documents in a single prompt.
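To make the context-length figures concrete, here is a minimal sketch that checks whether a request fits in a model's window. It assumes "32K" and "128K" mean 32 × 1024 and 128 × 1024 tokens, that the prompt and the generated tokens share one window, and that token counts come from the Qwen2 tokenizer. The `CONTEXT_LIMITS` table and `fits_in_context` helper are hypothetical illustrations, not part of any Qwen2 API.

```python
# Hypothetical lookup table built from the context lengths quoted in this
# article, assuming "K" means 1024 tokens.
CONTEXT_LIMITS = {
    "Qwen2-0.5B": 32 * 1024,
    "Qwen2-7B-Instruct": 128 * 1024,
    "Qwen2-72B-Instruct": 128 * 1024,
}


def fits_in_context(model: str, prompt_tokens: int, max_new_tokens: int) -> bool:
    """Return True if the prompt plus the requested output fits the window.

    Assumes prompt and generated tokens share a single context window and
    that both counts were measured with the model's own tokenizer.
    """
    if prompt_tokens < 0 or max_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_new_tokens <= CONTEXT_LIMITS[model]


if __name__ == "__main__":
    # Example: a 100,000-token document plus a 2,000-token answer.
    for name in CONTEXT_LIMITS:
        print(name, fits_in_context(name, 100_000, 2_000))
```

Under these assumptions, the 102,000-token request fits the 128K variants but not the 32K one, so long-document work points toward the larger instruction-tuned models.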
Model Sizes and Specifications
The Qwen2 series comes in five sizes to suit different computational budgets and task requirements. For example:
- Qwen2-0.5B: Parameters – 0.49 billion; Non-Embedding Params – 0.35 billion; GQA – True; Tie Embedding – True; Context Length – 32K.
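The gap between total and non-embedding parameters is mostly the token-embedding matrix. As a rough check, the sketch below assumes Qwen2-0.5B uses a vocabulary of 151,936 tokens and a hidden size of 896; these values come from the published model configuration, not from this article, so treat them as assumptions. With tied embeddings, the input embedding and the output projection share one vocab × hidden matrix, so it is counted only once. The `embedding_params` helper is a hypothetical illustration.

```python
# Assumed Qwen2-0.5B configuration values (not stated in this article).
VOCAB_SIZE = 151_936
HIDDEN_SIZE = 896


def embedding_params(vocab_size: int, hidden_size: int, tied: bool) -> int:
    """Parameters in the token embedding plus the output (LM head) matrix.

    With tied embeddings both roles share a single vocab x hidden matrix;
    untied, the output head adds a second matrix of the same shape.
    """
    per_matrix = vocab_size * hidden_size
    return per_matrix if tied else 2 * per_matrix


if __name__ == "__main__":
    tied = embedding_params(VOCAB_SIZE, HIDDEN_SIZE, tied=True)
    untied = embedding_params(VOCAB_SIZE, HIDDEN_SIZE, tied=False)
    print(f"tied:   {tied / 1e9:.3f}B")
    print(f"untied: {untied / 1e9:.3f}B")
```

Multiplying out gives about 0.136 billion parameters, close to the 0.14 billion gap between the 0.49 billion total and the 0.35 billion non-embedding figure. Untied embeddings would double that share, which matters most in the smallest models, where the embedding is a large fraction of all weights.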
At the top of the range, Qwen2-72B contains about 72 billion parameters. Grouped Query Attention (GQA) is used in models of every size, which speeds up inference and reduces memory usage.
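In grouped query attention, several query heads share one key/value head, which shrinks the key/value cache kept during generation. The sketch below shows the query-to-KV head mapping and the resulting cache size per token. The configuration (24 layers, 14 query heads, 2 key/value heads, head dimension 64, 2-byte bfloat16 values) is an assumption based on the published Qwen2-0.5B config rather than on this article, and the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AttentionConfig:
    num_layers: int
    num_query_heads: int
    num_kv_heads: int
    head_dim: int
    bytes_per_value: int = 2  # assumes bfloat16/float16 cache entries


def kv_head_for_query_head(cfg: AttentionConfig, q_head: int) -> int:
    """Index of the shared key/value head used by a given query head."""
    if cfg.num_query_heads % cfg.num_kv_heads != 0:
        raise ValueError("query heads must divide evenly into KV groups")
    group_size = cfg.num_query_heads // cfg.num_kv_heads
    return q_head // group_size


def kv_cache_bytes_per_token(cfg: AttentionConfig) -> int:
    """Bytes of keys and values cached per token across all layers."""
    # Factor 2: each KV head stores one key vector and one value vector.
    return 2 * cfg.num_layers * cfg.num_kv_heads * cfg.head_dim * cfg.bytes_per_value


if __name__ == "__main__":
    # Assumed Qwen2-0.5B-like values (hypothetical for this sketch).
    gqa = AttentionConfig(num_layers=24, num_query_heads=14, num_kv_heads=2, head_dim=64)
    # Same shape with standard multi-head attention: one KV head per query head.
    mha = AttentionConfig(num_layers=24, num_query_heads=14, num_kv_heads=14, head_dim=64)

    print([kv_head_for_query_head(gqa, h) for h in range(gqa.num_query_heads)])
    print(kv_cache_bytes_per_token(gqa), kv_cache_bytes_per_token(mha))
```

With these values, query heads 0 to 6 share key/value head 0 and heads 7 to 13 share head 1. Each token then costs 12,288 bytes of cache instead of 86,016 bytes under standard multi-head attention, a 7× reduction that grows in absolute terms as the context gets longer.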
Open Source Availability
One of the most commendable aspects of this release is that all variants have been made open source on Hugging Face and ModelScope, making them widely accessible to the developer community.
In conclusion, the Qwen2 series represents a significant step forward in natural language processing, combining models of up to 72 billion parameters with multilingual training data.
If you are interested in exploring these models further, or wish to contribute to their development through open-source collaboration, visit the Qwen2 pages on Hugging Face or ModelScope, or the Qwen Team's official website.