Qwen3 is the next-generation family of Qwen large language models, featuring dense and Mixture-of-Experts variants, a switchable reasoning "thinking" mode, 32K–131K token context windows, and state-of-the-art performance in multilingual chat, coding, and mathematical reasoning tasks.
## Model Information

| Field | Value |
|---|---|
| Model | Qwen3 |
| Author | Qwen |
| Parameters | 8.2 B |
| Architecture | transformer-dense |
| Format | GGUF |
| Size on disk | 5.03 GB |
| Quantization | |
| License | |
# Qwen3-8B
An 8-billion-parameter, next-generation large language model from the Qwen team. Qwen3-8B natively supports a reasoning-oriented "thinking" mode and an efficient non-thinking mode, letting you trade reasoning depth for speed whenever you need to.
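The card above lists a GGUF build (5.03 GB on disk). Here is a minimal sketch of running it locally with `llama-cpp-python`, an assumed runtime choice (any GGUF-compatible engine works similarly); the model file name is hypothetical, so point it at the file you actually downloaded.

```python
# Minimal sketch: run the GGUF build with llama-cpp-python (assumed runtime;
# the file name below is hypothetical).
from llama_cpp import Llama

llm = Llama(model_path="Qwen3-8B.gguf", n_ctx=32768)

# Appending /no_think is Qwen3's soft switch for skipping the reasoning
# block; use /think (or omit the directive) for full step-by-step reasoning.
result = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Explain grouped-query attention in one sentence. /no_think"}
    ],
    max_tokens=256,
)
print(result["choices"][0]["message"]["content"])
```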
## Key Features
- Seamless switching between thinking (reasoning) and non-thinking (dialogue) modes via the `enable_thinking` flag or the `/think` and `/no_think` chat directives (see the Python sketch after this list).
- Strong improvements in math, coding, and agentic tool use compared with earlier Qwen and QwQ releases.
- Multilingual: more than 100 languages supported.
- Long context: 32K tokens out of the box, and up to 131K with YaRN RoPE scaling.
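As a concrete illustration of the mode switch, here is a minimal sketch using Hugging Face `transformers`, following the usage described in the upstream Qwen3 model card; the prompt is illustrative.

```python
# Minimal sketch: toggling thinking mode via the chat template's
# enable_thinking argument (per the upstream Qwen3 model card).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "How many prime numbers are below 100?"}]

# enable_thinking=True makes the model open with a <think>...</think>
# reasoning block; enable_thinking=False yields a direct dialogue reply.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```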
## Technical Specifications
| Attribute | Details |
|---|---|
| Model Type | Causal Language Model |
| Training Stage | Pretraining & Post-training |
| Parameters | 8.2 B total (6.95 B non-embedding) |
| Layers | 36 |
| Attention Heads (GQA) | 32 for Q / 8 for KV |
| Context Length | 32,768 tokens natively; up to 131,072 with YaRN |
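For contexts beyond the native window, the upstream Qwen3 documentation enables YaRN through the model config's `rope_scaling` field. A minimal `transformers` sketch, assuming a scaling factor of 4.0 (4 × 32,768 = 131,072 tokens):

```python
# Minimal sketch: enable YaRN RoPE scaling to extend context to ~131K tokens
# (factor 4.0 times the native 32,768). Keys follow the upstream Qwen3 model card.
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("Qwen/Qwen3-8B")
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B", config=config)
```

Note that static YaRN applies the scaling factor to all inputs, which can slightly degrade quality on short prompts, so it is generally worth enabling only when long inputs are actually expected.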
## Citation
If you use Qwen3-8B in your research, please cite:
```bibtex
@misc{qwen3technicalreport,
  title         = {Qwen3 Technical Report},
  author        = {Qwen Team},
  year          = {2025},
  eprint        = {2505.09388},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  url           = {https://arxiv.org/abs/2505.09388}
}
```