Qwen3 is the next-generation family of Qwen large language models, featuring dense and Mixture-of-Experts variants, a switchable reasoning "thinking" mode, 32K–131K token context windows, and state-of-the-art performance in multilingual chat, coding, and mathematical reasoning tasks.
## Model Information

| Field | Value |
|---|---|
| Model | Qwen3 |
| Author | Qwen |
| Parameters | 8.2 B |
| Architecture | transformer-dense |
| Format | GGUF |
| Size on disk | 5.03 GB |
| Quantization | |
| License | |
# Qwen3-8B
An 8-billion-parameter, next-generation large language model from the Qwen team. Qwen3-8B natively supports a reasoning-oriented "thinking" mode and an efficient non-thinking mode, letting you trade reasoning depth for speed whenever you need to.
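The card above lists a GGUF build (5.03 GB on disk). Here is a minimal sketch of running it locally with `llama-cpp-python`, an assumed runtime choice (any GGUF-compatible engine works similarly); the model file name is hypothetical, so point it at the file you actually downloaded.

```python
# Minimal sketch: run the GGUF build with llama-cpp-python (assumed runtime;
# the file name below is hypothetical).
from llama_cpp import Llama

llm = Llama(model_path="Qwen3-8B.gguf", n_ctx=32768)

# Appending /no_think is Qwen3's soft switch for skipping the reasoning
# block; use /think (or omit the directive) for full step-by-step reasoning.
result = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Explain grouped-query attention in one sentence. /no_think"}
    ],
    max_tokens=256,
)
print(result["choices"][0]["message"]["content"])
```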
## Key Features
- Seamless switching between thinking (reasoning) and non-thinking (dialogue) modes via the `enable_thinking` flag or the `/think` and `/no_think` chat directives (see the Python sketch after this list).
- Strong improvements in math, coding, and agentic tool use compared with earlier Qwen and QwQ releases.
- Multilingual: more than 100 languages supported.
- Long context: 32K tokens out of the box, and up to 131K with YaRN RoPE scaling.
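As a concrete illustration of the mode switch, here is a minimal sketch using Hugging Face `transformers`, following the usage described in the upstream Qwen3 model card; the prompt is illustrative.

```python
# Minimal sketch: toggling thinking mode via the chat template's
# enable_thinking argument (per the upstream Qwen3 model card).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "How many prime numbers are below 100?"}]

# enable_thinking=True makes the model open with a <think>...</think>
# reasoning block; enable_thinking=False yields a direct dialogue reply.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```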
## Technical Specifications
| Attribute | Details |
|---|---|
| Model Type | Causal Language Model |
| Training Stage | Pretraining & Post-training |
| Parameters | 8.2 B total (6.95 B non-embedding) |
| Layers | 36 |
| Attention Heads (GQA) | 32 for Q / 8 for KV |
| Context Length | 32,768 tokens natively; up to 131,072 with YaRN |
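For contexts beyond the native window, the upstream Qwen3 documentation enables YaRN through the model config's `rope_scaling` field. A minimal `transformers` sketch, assuming a scaling factor of 4.0 (4 × 32,768 = 131,072 tokens):

```python
# Minimal sketch: enable YaRN RoPE scaling to extend context to ~131K tokens
# (factor 4.0 times the native 32,768). Keys follow the upstream Qwen3 model card.
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("Qwen/Qwen3-8B")
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B", config=config)
```

Note that static YaRN applies the scaling factor to all inputs, which can slightly degrade quality on short prompts, so it is generally worth enabling only when long inputs are actually expected.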
## Citation
If you use Qwen3-8B in your research, please cite:
```bibtex
@misc{qwen3technicalreport,
  title         = {Qwen3 Technical Report},
  author        = {Qwen Team},
  year          = {2025},
  eprint        = {2505.09388},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  url           = {https://arxiv.org/abs/2505.09388}
}
```