Qwen3 is the next-generation family of Qwen large language models, featuring dense and Mixture-of-Experts variants, a seamlessly switchable reasoning ("thinking") mode, 32K–131K-token context windows, and state-of-the-art performance on multilingual chat, coding, and mathematical reasoning tasks.
Model Information
- Model: Qwen3
- Author: Qwen
- Parameters: 8B
- Architecture: transformer-dense
- Format: GGUF
- Size on disk: 5.03 GB
- Quantization:
- License:
Qwen3-8B
An 8-billion-parameter, next-generation large language model from the Qwen team. Qwen3-8B natively supports a reasoning-oriented "thinking" mode and an efficient non-thinking mode, letting you balance raw reasoning power against speed as needed.
Key Features
- Seamless switching between thinking (reasoning) and non-thinking (dialogue) modes via the `enable_thinking` flag or the `/think` and `/no_think` chat directives (see the sketch after this list).
- Strong improvements in maths, coding, and agent tool-use compared to earlier Qwen and QwQ releases.
- Multilingual: 100+ languages supported.
- Long context: 32K tokens out of the box, and up to 131K with YaRN RoPE scaling.
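As a minimal sketch of mode switching, the Hugging Face `transformers` chat template exposes the `enable_thinking` flag for this model family (the checkpoint name and prompt below are illustrative; verify details against the official Qwen3 model card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-8B"  # assumed official checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Solve 23 * 17 step by step."}]

# enable_thinking=True lets the model emit its <think>...</think> reasoning
# before the final answer; set it to False (or append /no_think to the
# prompt) for the faster dialogue-only mode.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)
inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=1024)

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = output_ids[0][inputs.input_ids.shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```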
Technical Specifications
| Attribute | Details |
|---|---|
| Model Type | Causal Language Model |
| Training Stage | Pretraining & Post-training |
| Parameters | 8.2B total (6.95B non-embedding) |
| Layers | 36 |
| Attention Heads (GQA) | 32 for Q / 8 for KV |
| Context Length | 32,768 tokens natively; up to 131,072 with YaRN |
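For contexts beyond the native 32,768 tokens, the Qwen releases document a YaRN `rope_scaling` override. The sketch below shows one way to apply it when loading with `transformers`; the key names and the 4.0 factor are taken from the Qwen usage guides and should be checked against the official Qwen3 documentation:

```python
from transformers import AutoModelForCausalLM

# Assumed override keys per the Qwen usage guides: a scaling factor of 4.0
# over the native 32,768-token window yields roughly 131,072 tokens.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-8B",
    torch_dtype="auto",
    device_map="auto",
    rope_scaling={
        "rope_type": "yarn",
        "factor": 4.0,
        "original_max_position_embeddings": 32768,
    },
)
```

Note that this static YaRN scaling applies to all inputs regardless of length, so the Qwen guides advise enabling it only when long prompts are actually required.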
Citation
If you use Qwen3-8B in your research, please cite:
@misc{qwen3technicalreport,
title = {Qwen3 Technical Report},
author = {Qwen Team},
year = {2025},
eprint = {2505.09388},
archivePrefix = {arXiv},
primaryClass = {cs.CL},
url = {https://arxiv.org/abs/2505.09388}
}