Usage

Wafer streaming performance for GLM and Qwen.


TPS

Output tokens per second while the response streams.

TTFT

Time to first token. Lower means the model starts responding faster.
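
As a rough illustration of the two metrics above, the sketch below derives TTFT and streaming TPS from per-token arrival timestamps. The dashboard does not state its exact formulas, so this assumes TTFT is the gap between sending the request and receiving the first token, and TPS counts the tokens after the first divided by the time from first to last token. The names `StreamTiming`, `ttft`, and `streaming_tps` are hypothetical and are not part of any Wafer API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StreamTiming:
    """Hypothetical record of one streamed response (all times in seconds)."""
    request_sent: float       # when the request was sent
    token_times: List[float]  # arrival time of each output token, in order


def ttft(timing: StreamTiming) -> float:
    """Time to first token: first token arrival minus request send time."""
    if not timing.token_times:
        raise ValueError("no tokens were received")
    return timing.token_times[0] - timing.request_sent


def streaming_tps(timing: StreamTiming) -> float:
    """Output tokens per second while streaming (assumed definition).

    Counts the tokens that arrive after the first one and divides by the
    time between the first and last token, so TTFT is excluded.
    """
    if len(timing.token_times) < 2:
        return 0.0
    duration = timing.token_times[-1] - timing.token_times[0]
    if duration <= 0:
        return 0.0
    return (len(timing.token_times) - 1) / duration


if __name__ == "__main__":
    # Made-up timestamps for illustration only, not measured data.
    example = StreamTiming(request_sent=0.0, token_times=[0.5, 1.0, 1.5, 2.0, 2.5])
    print(f"TTFT: {ttft(example):.2f} s")
    print(f"TPS:  {streaming_tps(example):.2f} tokens/s")
```

Excluding the wait for the first token keeps the two metrics independent: a slow start shows up in TTFT without dragging down TPS. An end-to-end rate that includes that wait would instead divide the token count by the total time since the request was sent.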

BY MODEL

Request counts below reflect the selected range.

Model | Runs | OK | p50 TTFT | p95 TTFT | First 100 | Total TPS | E2E TPS | Avg tokens | Avg stalls | Max stall