Usage
Wafer streaming performance for GLM and Qwen.
TPS
Output tokens per second while the response streams.
TTFT
Time to first token. Lower means the model starts responding faster.
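As a rough sketch of how these two numbers could be derived for a single run, the snippet below works from per-token arrival timestamps. The function name `stream_metrics`, the `StreamMetrics` type, and the choice to measure TPS over the window from the first to the last token are assumptions for illustration, not the dashboard's documented method.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class StreamMetrics:
    ttft_s: float  # time to first token, in seconds
    tps: float     # output tokens per second while streaming


def stream_metrics(request_start: float, token_times: Sequence[float]) -> StreamMetrics:
    """Compute TTFT and streaming TPS for one run.

    Assumes one timestamp (in seconds) per output token, in arrival order,
    taken from the same clock as request_start.
    """
    if not token_times:
        raise ValueError("no tokens were received")

    # TTFT: delay between sending the request and seeing the first token.
    ttft = token_times[0] - request_start

    # Assumed streaming window: first token to last token. The first token
    # opens the window, so only the remaining tokens are counted in the rate.
    stream_duration = token_times[-1] - token_times[0]
    if stream_duration > 0:
        tps = (len(token_times) - 1) / stream_duration
    else:
        tps = 0.0  # single token or zero-length window: rate is undefined

    return StreamMetrics(ttft_s=ttft, tps=tps)


metrics = stream_metrics(10.0, [10.5, 10.75, 11.0, 11.25, 11.5])
print(metrics)
```

With these timestamps the first token arrives 0.5 s after the request, and the remaining four tokens arrive over the following 1.0 s.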
BY MODEL
Request counts below reflect the selected range.
| Model | Runs | OK | p50 TTFT | p95 TTFT | First 100 | Total | TPS | E2E TPS | Avg tokens | Avg stalls | Max stall |
|---|---|---|---|---|---|---|---|---|---|---|---|
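
The percentile and stall columns could be aggregated along the following lines. This is a sketch under stated assumptions: the nearest-rank percentile method, the definition of a stall as a gap between consecutive tokens longer than a threshold, and the 1.0 s default threshold are all hypothetical, since the page does not say how these columns are computed.

```python
import math
from typing import Sequence, Tuple


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile, with pct in (0, 100]."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def stalls(token_times: Sequence[float], threshold_s: float = 1.0) -> Tuple[int, float]:
    """Return (stall count, longest stall in seconds) for one run.

    Assumption: a stall is any gap between consecutive tokens that
    exceeds threshold_s. The threshold is a hypothetical default.
    """
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    stall_gaps = [gap for gap in gaps if gap > threshold_s]
    return len(stall_gaps), max(stall_gaps, default=0.0)


ttfts = [0.2, 0.4, 0.1, 0.3, 0.5]  # made-up TTFT samples, in seconds
print(percentile(ttfts, 50), percentile(ttfts, 95))
print(stalls([0.0, 0.5, 2.5, 3.0]))
```

Nearest-rank always returns an observed sample, which keeps the p95 value easy to trace back to a specific run. An interpolating method would give slightly different numbers on small samples.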