Qwen2.5 Technical Report

Published on Dec 19, 2024 Updated on Jan 15, 2026 LLMs LLMs, Qwens One minute

Contents

Qwen Team, Alibaba Group arXiv 2412.15115 Hugging Face

TL;DR

Motivation

Key Innovations

Approach

Tokenization

Based on the Qwen BBPE tokenzier (151,646 tokens: 151,624 regular and 22 control), we expand the control tokens from 3 to 22, including two tool-related tokens and 20 for other model capabilities.

Model Architecture

Dense Model

GQA for efficient KV cache utilization
SwiGLU for activation
RoPE for positional embedding
QKV bias for attention
RMSNorm and pre-normalization for training stability

Mixture-of-Experts (MoE) Model

Pre-training

Pre-training Data

Better data filtering: with Qwen2-Instruct Model
Better math and code data: incorporate high-quality domain-specific datasets (math, code) during pretraining.
Better synthetic data:
- leverage both Qwen2-72B-Instruct and Qwen2-Math-72B-Instruct to generate high-quality synthetic data, particularly in mathematics, code, and knowledge domain.
- further enhance the quality of synthesized data through rigorous filtering using our propretary general reward model and the specialized Qwen2-Math-RM-72B model.
Better data mixture:
- Domains like e-commerce, social media, and entertainment are significantly overrepresented in web-scale data, often containing repetitive, template-based, or machine-generated content.
- Domains such as technology, science, and academic research, while containing higher-quality information, are traditionally underrepresented.
- down-sample overrepresented domains and up-sample high-value domains.

Contents

Qwen2.5 Technical Report

TL;DR

Motivation

Key Innovations

Approach

Tokenization

Model Architecture

Dense Model

Mixture-of-Experts (MoE) Model

Pre-training

Pre-training Data

Experiments

References

Contents

Qwen2.5 Technical Report

TL;DR

Motivation

Key Innovations

Approach

Tokenization

Model Architecture

Dense Model

Mixture-of-Experts (MoE) Model

Pre-training

Pre-training Data

Experiments

References

Related Posts