GLM-4.5V

by Z.ai

GLM-4.5V is a vision-language foundation model developed by Z.ai, designed for multimodal agent applications. Built on a Mixture-of-Experts (MoE) architecture with 106 billion parameters and 12 billion activated parameters, it achieves state-of-the-art performance in tasks such as video understanding, image question answering, optical character recognition (OCR), and document parsing. The model offers a hybrid inference mode, allowing users to switch between a "thinking mode" for deep reasoning and a "non-thinking mode" for fast responses, balancing processing speed and output quality according to task requirements.

Features

  • Video understanding
  • Image question answering
  • Optical character recognition (OCR)
  • Document parsing
  • Hybrid inference mode with "thinking" and "non-thinking" options
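The hybrid inference mode is typically toggled per request. A minimal sketch of building such a request payload, assuming an OpenAI-compatible chat-completions format and a hypothetical `thinking` request field (the field name, its values, and the model identifier are assumptions, not confirmed by this page):

```python
import json

def build_request(prompt: str, image_url: str, thinking: bool) -> dict:
    """Build a chat-completions payload that toggles the model's
    hybrid inference mode via a hypothetical `thinking` field."""
    return {
        "model": "glm-4.5v",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": prompt},
            ],
        }],
        # Assumed toggle: "enabled" for deep reasoning,
        # "disabled" for fast, low-latency responses.
        "thinking": {"type": "enabled" if thinking else "disabled"},
    }

# Fast OCR-style query: skip the reasoning phase.
fast = build_request(
    "What text is in this receipt?",
    "https://example.com/receipt.png",  # placeholder URL
    thinking=False,
)
print(json.dumps(fast, indent=2))
```

In this sketch, "thinking mode" would be enabled for multi-step reasoning tasks (e.g. document parsing with cross-references) and disabled for simple lookups where latency matters more than depth.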

Product Details

Pricing: Subscription
Deployment: Cloud
Location: 🇨🇳 Beijing, China