Usage

Wafer streaming performance for GLM and Qwen.

TPS: output tokens per second while the response streams.

TTFT: time to first token. Lower means the model starts responding faster.
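As a rough illustration of how these streaming metrics could be derived from per-token arrival timestamps, here is a minimal sketch. The function names (`stream_metrics`, `percentile`), the 1.0 second stall threshold, and the exact definitions are assumptions for illustration only: streaming throughput is measured from the first token to the last, end-to-end throughput from the request start, and a stall is any gap between consecutive tokens longer than the threshold. The dashboard's real definitions may differ.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile (assumed method; the dashboard may interpolate)."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def stream_metrics(request_start, token_times, stall_threshold=1.0):
    """Compute per-request streaming metrics.

    request_start   -- time the request was sent, in seconds
    token_times     -- arrival time of each output token, in seconds, ascending
    stall_threshold -- hypothetical gap (seconds) above which a pause counts as a stall
    """
    if not token_times:
        raise ValueError("token_times must not be empty")

    n = len(token_times)
    ttft = token_times[0] - request_start          # time to first token
    total = token_times[-1] - request_start        # total request duration
    stream_time = token_times[-1] - token_times[0]  # time spent streaming

    # Streaming throughput: tokens produced after the first one, per second of streaming.
    tps = (n - 1) / stream_time if stream_time > 0 else float("nan")
    # End-to-end throughput: all tokens over the whole request, including the TTFT wait.
    e2e_tps = n / total if total > 0 else float("nan")

    # Gaps between consecutive tokens; long gaps are counted as stalls.
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    stalls = [g for g in gaps if g > stall_threshold]

    return {
        "ttft": ttft,
        "total": total,
        "tps": tps,
        "e2e_tps": e2e_tps,
        "tokens": n,
        "stalls": len(stalls),
        "max_stall": max(stalls) if stalls else 0.0,
    }


if __name__ == "__main__":
    m = stream_metrics(0.0, [0.5, 1.0, 1.5, 4.0, 4.5])
    print(m)
    # Aggregate TTFT across several requests (made-up sample values).
    ttfts = [0.2, 0.4, 0.6, 0.8, 1.0]
    print("p50 TTFT:", percentile(ttfts, 50), "p95 TTFT:", percentile(ttfts, 95))
```

Under these assumptions, end-to-end throughput is always at or below streaming throughput for a request with a nonzero TTFT, because it charges the wait before the first token against the same output.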

Request counts below reflect the selected range.

Model	Runs	OK	p50 TTFT	p95 TTFT	First 100	Total	TPS	E2E TPS	Avg tokens	Avg stalls	Max stall