Introduction
Zhipu AI has launched GLM-4.6, the most recent model in its General Language Model (GLM) series. Unlike many proprietary frontier systems, the GLM family remains open-weight and is released under permissive licenses such as MIT and Apache, making it one of the only frontier-scale models that organizations can self-host.
GLM-4.6 builds on the reasoning and coding strengths of GLM-4.5 and introduces several major upgrades:
- The context window expands from 128k to 200k tokens, enabling the model to process entire books, codebases or multi-document analysis tasks in a single pass.
- It retains the Mixture-of-Experts architecture with 355 billion total parameters and roughly 32 billion active per token, but improves reasoning quality, coding accuracy and tool-calling reliability.
- A new thinking mode improves multi-step reasoning and complex planning.
- The model supports native tool calls, allowing it to decide when to invoke external functions or services.
- All weights and code are openly available, allowing self-hosting, fine-tuning and enterprise customization.
These upgrades make GLM-4.6 a strong open alternative for developers who need high-performance coding assistance, long-context analysis and agentic workflows.
Model Architecture and Technical Details
Mixture of Experts Core
GLM-4.6 is built on a Mixture-of-Experts (MoE) Transformer architecture. Although the full model contains 355 billion parameters, only around 32 billion are active per forward pass due to sparse expert routing. A gating network selects the appropriate experts for each token, reducing compute overhead while preserving the benefits of a large parameter pool.
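The routing idea can be sketched in a few lines. The toy router below (8 experts, top-2 selection) is purely illustrative and not GLM-4.6's actual routing code; it only shows the mechanism of picking the highest-scoring experts for a token and renormalizing their gate weights.

```python
import math

def top_k_route(gate_logits, k=2):
    """Toy sparse MoE routing: pick the k highest-scoring experts for a
    token and softmax-normalize the gate weights over just those k."""
    # Indices of the k largest logits: the experts this token is routed to.
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    # Softmax over the selected experts only, so the k gate weights sum to 1.
    exp = [math.exp(gate_logits[i]) for i in top]
    total = sum(exp)
    return top, [e / total for e in exp]

# One token's gate scores over 8 toy experts (the real model routes over far more).
logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3]
experts, weights = top_k_route(logits, k=2)
```

Only the selected experts' feed-forward blocks run for that token, which is why the active parameter count stays near 32B even though the full pool is 355B.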
Key architectural features carried over from GLM-4.5 and refined in version 4.6 include:
- Grouped Query Attention, which improves long-range interactions by using a large number of attention heads and partial RoPE for efficient scaling.
- QK-Norm, which stabilizes attention logits by normalizing query–key interactions.
- The Muon optimizer, which enables larger batch sizes and faster convergence.
- A Multi-Token Prediction (MTP) head, which predicts several tokens per step and enhances the performance of the model's thinking mode.
Hybrid Reasoning Modes
GLM-4.6 supports two reasoning modes:
- The standard mode provides fast responses for everyday interactions.
- The thinking mode slows down decoding, uses the MTP head for multi-token planning and generates internal chain-of-thought. This mode improves performance on logic problems, longer coding tasks and multi-step agentic workflows.
Extended Context Window
One of the most important upgrades is the expanded context window. Moving from 128k tokens to 200k tokens allows GLM-4.6 to process large codebases, full legal documents, long transcripts or multi-chapter content without chunking. This capability is especially useful for engineering tasks, research analysis and long-form summarization.
Training Data and Fine-Tuning
Zhipu AI has not disclosed the full training dataset, but GLM-4.6 builds on the foundation of GLM-4.5, which was pre-trained on trillions of diverse tokens and then fine-tuned heavily on code, reasoning and alignment tasks. Reinforcement learning strengthens its coding accuracy, reasoning quality and tool-usage reliability. GLM-4.6 appears to include additional data for tool-calling and agentic workflows, given its improved planning abilities.
Tool-Calling and Agentic Capabilities
GLM-4.6 is designed to function as the control system for autonomous agents. It supports structured function calling and decides when to invoke tools based on context. Its internal reasoning improves argument validation, error rejection and multi-tool planning. In coding-assistant evaluations, GLM-4.6 achieves high tool-call success rates and approaches the performance of top proprietary models.
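Structured function calling uses the standard OpenAI-compatible `tools` schema. The sketch below only assembles such a request; the `get_weather` function is a hypothetical example, not part of the model's specification, and sending the payload to an endpoint is left to the caller.

```python
def build_tool_call_request(user_message):
    """Assemble an OpenAI-compatible chat payload that lets the model
    decide whether to invoke the (hypothetical) get_weather tool."""
    return {
        "model": "glm-4.6",
        "messages": [{"role": "user", "content": user_message}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
        # "auto" leaves the decision to call the tool to the model itself.
        "tool_choice": "auto",
    }

payload = build_tool_call_request("Do I need an umbrella in Berlin today?")
```

When the model decides a tool is needed, the response contains a `tool_calls` entry with the function name and JSON arguments, which your agent loop executes and feeds back as a `tool` message.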
Efficiency and Quantization
Although GLM-4.6 is large, its MoE architecture keeps active parameters manageable. Public weights are available in BF16 and FP32, and community quantizations in 4- to 8-bit formats allow the model to run on more affordable GPUs. It is compatible with common inference frameworks such as vLLM, SGLang and LMDeploy, giving teams flexible deployment options.
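As a rough sketch, self-hosting with vLLM can be as simple as pointing `vllm serve` at the published weights. The Hugging Face repository name and parallelism degree below are assumptions; check the official model card for the exact identifier and the hardware your cluster actually needs.

```shell
# Serve GLM-4.6 with vLLM behind an OpenAI-compatible endpoint.
# Repo name and tensor-parallel degree are illustrative assumptions.
pip install vllm
vllm serve zai-org/GLM-4.6 --tensor-parallel-size 8
```

The server then exposes `/v1/chat/completions`, so the same OpenAI-compatible client code works against your own deployment.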
Benchmark Performance
Zhipu AI evaluated GLM-4.6 on a range of benchmarks covering reasoning, coding and agentic tasks. Across most categories, it shows consistent improvements over GLM-4.5 and competitive performance against high-end proprietary models such as Claude Sonnet 4.
In real-world coding evaluations, GLM-4.6 achieved near-parity results with proprietary models while using fewer tokens per task. It also demonstrates improved performance in tool-augmented reasoning and multi-turn coding workflows, making it one of the strongest open models currently available.
Licensing and Openness
GLM-4.6 is released under permissive licenses such as MIT and Apache, allowing unrestricted commercial use, self-hosting and fine-tuning. Developers can download both base and instruct versions and integrate them into their own infrastructure. This openness stands in contrast to proprietary models like Claude and GPT, which can only be used through paid APIs.
Accessing GLM-4.6 via API
GLM-4.6 is available on the Clarifai Platform, and you can access it via API using the OpenAI-compatible endpoint.
Step 1: Create a Clarifai Account and Get a Personal Access Token (PAT)
Sign up and generate a Personal Access Token. You can also try out GLM-4.6 in the Clarifai Playground by selecting the model and experimenting with coding, reasoning or agentic prompts.
Step 2: Set Up Your Environment
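A minimal setup sketch: install an OpenAI-compatible client and export your token. The `CLARIFAI_PAT` variable name is a convention used in the examples below, not a platform requirement.

```shell
# Install the OpenAI SDK (any OpenAI-compatible client works) and
# export your Clarifai Personal Access Token for the examples below.
pip install openai
export CLARIFAI_PAT="your_personal_access_token_here"
```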
Step 3: Call GLM-4.6 via the API
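A minimal sketch of calling GLM-4.6 through Clarifai's OpenAI-compatible REST endpoint using only the Python standard library (the official OpenAI SDK works the same way if pointed at this `base_url`). The model identifier is an assumption; confirm the canonical value on the Clarifai model page.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.clarifai.com/v2/ext/openai/v1"
MODEL = "glm-4.6"  # assumed identifier; confirm on the Clarifai model page

def build_request(prompt: str) -> urllib.request.Request:
    """Assemble the chat-completions request; sending it is left to the caller."""
    payload = {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": "You are a helpful coding assistant."},
            {"role": "user", "content": prompt},
        ],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ.get('CLARIFAI_PAT', '')}",
            "Content-Type": "application/json",
        },
    )

def ask_glm(prompt: str) -> str:
    """Send the request and extract the assistant's reply."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With a valid PAT exported, `ask_glm("Write a function that reverses a linked list.")` returns the model's answer as a string.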
Step 4: Using TypeScript or JavaScript
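The same endpoint works from TypeScript. This sketch uses Node's built-in `fetch` (Node 18+); the model identifier is again an assumption, and you can equally point the official `openai` npm package at the same base URL.

```typescript
const BASE_URL = "https://api.clarifai.com/v2/ext/openai/v1";

function buildBody(prompt: string): string {
  // Standard OpenAI chat-completions payload.
  return JSON.stringify({
    model: "glm-4.6", // assumed identifier; confirm on the Clarifai model page
    messages: [{ role: "user", content: prompt }],
  });
}

async function askGlm(prompt: string): Promise<string> {
  const res = await fetch(`${BASE_URL}/chat/completions`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.CLARIFAI_PAT}`,
      "Content-Type": "application/json",
    },
    body: buildBody(prompt),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}
```

Call `askGlm("Summarize the Mixture-of-Experts architecture in two sentences.")` and `await` the result.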
You can also access GLM-4.6 through the API using other languages such as Node.js and cURL. Check out all the examples here.
Use Cases for GLM-4.6
Advanced Coding Assistance
GLM-4.6 shows strong improvements in code generation accuracy and efficiency. It produces high-quality code while using fewer tokens than GLM-4.5. In human-rated evaluations, its coding ability approaches that of proprietary frontier models. This makes it suitable for full-stack development assistants, automated code review, bug-fixing agents and repository-level analysis.
Agentic Workflows and Tool Orchestration
GLM-4.6 is built for tool-augmented reasoning. It can plan multi-step tasks, call external APIs, check results and maintain state across interactions. This enables autonomous coding agents, research assistants and complex workflow automation systems that rely on structured tool calls.
Long-Context Document Analysis
With a 200k-token window, the model can read and reason over entire books, legal documents, technical manuals or multi-hour transcripts. It supports compliance review, multi-document synthesis, long-form summarization and codebase understanding.
Bilingual Development and Creative Writing
The model is trained on both Chinese and English and delivers strong performance on bilingual tasks. It is useful for translation, localization, bilingual code documentation and creative writing tasks that require natural style and voice.
Enterprise-Grade Deployment and Customization
Thanks to its open license and flexible MoE architecture, organizations can self-host GLM-4.6 on private clusters, fine-tune it on proprietary data and integrate it with their internal tools. Community quantizations also enable lighter deployments on limited hardware. Clarifai provides an alternative cloud-hosted pathway for teams that want API access without managing infrastructure.
Conclusion
GLM-4.6 is a major milestone in open AI development. It combines a large MoE architecture, a 200k-token context window, hybrid reasoning modes and native tool-calling to deliver performance that rivals proprietary frontier models. It improves on GLM-4.5 across coding, reasoning and tool-augmented tasks while remaining fully open and self-hostable.
Whether you're building autonomous coding agents, analyzing large document sets or orchestrating complex multi-tool workflows, GLM-4.6 provides a flexible, high-performance foundation without vendor lock-in.
