Qwen3 is the next generation of the Qwen large language model family, offering dense and Mixture-of-Experts variants, a switchable reasoning ("thinking") mode, context windows from 32K to 131K tokens, and state-of-the-art performance on multilingual chat, coding, and mathematical reasoning tasks.

```
humaan run qwen3
```

Model Information

| Attribute | Details |
| --- | --- |
| Model | Qwen3 |
| Author | Qwen |
| Parameters | 8.2 B |
| Architecture | transformer-dense |
| Format | GGUF |
| Size on disk | 5.03 GB |
| Quantization | |
| License | |


Qwen3-8B

An 8-billion-parameter, next-generation large language model from the Qwen team. Qwen3-8B natively supports a reasoning-oriented "thinking" mode alongside an efficient non-thinking mode, letting you trade reasoning depth for speed as each task demands.
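For orientation, here is a minimal generation sketch using the Hugging Face transformers API. The `Qwen/Qwen3-8B` checkpoint name, prompt, and sampling settings are illustrative assumptions, not part of this card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint identifier; substitute whatever name your runtime uses.
model_name = "Qwen/Qwen3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Explain KV caching in two sentences."}]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# max_new_tokens is an illustrative choice, not the card's recommendation.
output_ids = model.generate(**inputs, max_new_tokens=512)
print(
    tokenizer.decode(
        output_ids[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True
    )
)
```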


Key Features

  • Seamless switching between thinking (reasoning) and non-thinking (dialogue) modes via the enable_thinking flag or /think & /no_think chat directives (see the sketch after this list).
  • Strong improvements in math, coding, and agent tool use compared with earlier Qwen and QwQ releases.
  • Multilingual: more than 100 languages supported.
  • Long context: 32K tokens out of the box, extendable to 131K with YaRN RoPE scaling.
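To make the mode switch concrete, here is a short sketch of both toggles, reusing the tokenizer and messages from the quick-start example above. It assumes Qwen3's chat template exposes the enable_thinking keyword through tokenizer.apply_chat_template, as the Qwen documentation describes.

```python
# Hard switch: disable thinking for this request via the template flag.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # True (the default) enables the reasoning mode
)

# Soft switch: when enable_thinking=True, a /no_think (or /think) directive
# inside the latest user message overrides the mode for that turn.
messages = [{"role": "user", "content": "What is 17 * 24? /no_think"}]
```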

Technical Specifications

| Attribute | Details |
| --- | --- |
| Model Type | Causal Language Model |
| Training Stage | Pretraining & Post-training |
| Parameters | 8.2 B total (6.95 B non-embedding) |
| Layers | 36 |
| Attention Heads (GQA) | 32 for Q / 8 for KV |
| Context Length | 32,768 tokens natively; up to 131,072 with YaRN |
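The YaRN extension is opt-in. The following is a minimal sketch of one way to request it through a transformers config, assuming the runtime honors a rope_scaling entry with rope_type "yarn" and factor 4.0 (4 x 32,768 = 131,072); verify the exact keys against your library version.

```python
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("Qwen/Qwen3-8B")  # assumed checkpoint name
# Stretch the native 32,768-token RoPE window by 4x to reach ~131K tokens.
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-8B", config=config, torch_dtype="auto", device_map="auto"
)
```

Note that static YaRN scaling of this kind applies the factor to all inputs, so it can slightly degrade quality on short prompts; enable it only when you actually need long contexts.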

Citation

If you use Qwen3-8B in your research, please cite:

```bibtex
@misc{qwen3technicalreport,
  title         = {Qwen3 Technical Report},
  author        = {Qwen Team},
  year          = {2025},
  eprint        = {2505.09388},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  url           = {https://arxiv.org/abs/2505.09388}
}
```