AI/PROMPT ENGINEERING

How fast is 10 tokens per second really?

Source: Simon Willison

May 20, 2026

1 min read

EXECUTIVE SUMMARY

Understanding Token Output Speeds in AI Models

Summary

This article discusses a tool created by Mike Veerman that simulates the output speeds of large language models (LLMs), measured in tokens per second. It aims to help users understand what advertised token speeds mean in practice.

Key Points

The tool simulates LLM token output speeds ranging from 5 tokens/second to 800 tokens/second.
It provides a visual representation of what different token speeds look like in practice.
The application is particularly useful for evaluating models that claim speeds like "30 tokens/second."
The source code for the app is available, enabling further exploration and customization.
The discussion originated from a post on Hacker News, highlighting community interest in AI performance metrics.
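
To make the idea of simulating a token speed concrete, here is a minimal Python sketch. It is not the code of the tool described above; the function names `emit_times` and `simulate_stream` are hypothetical, and it assumes that one whitespace-separated word stands in for one token, which real tokenizers do not guarantee.

```python
import sys
import time


def emit_times(n_tokens: int, tokens_per_second: float) -> list[float]:
    """Return the offset in seconds at which each token should appear."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    interval = 1.0 / tokens_per_second
    return [i * interval for i in range(n_tokens)]


def simulate_stream(text, tokens_per_second, sleep=time.sleep, out=sys.stdout):
    """Write text to `out` one pseudo-token at a time at the given rate.

    Assumption: each whitespace-separated word counts as one token.
    Returns the number of pseudo-tokens written.
    """
    tokens = text.split()
    start = time.monotonic()
    for offset, token in zip(emit_times(len(tokens), tokens_per_second), tokens):
        # Sleep until this token's scheduled time so timing errors do not add up.
        delay = start + offset - time.monotonic()
        if delay > 0:
            sleep(delay)
        out.write(token + " ")
        out.flush()
    out.write("\n")
    return len(tokens)


if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog " * 5
    simulate_stream(sample, tokens_per_second=10)
```

Running the script prints the sample text at roughly 10 pseudo-tokens per second. Changing `tokens_per_second` to 5 or 800 shows the two ends of the range mentioned above.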

Analysis

Advertised token speeds are hard to judge as bare numbers. Seeing them simulated lets IT professionals working with AI and generative models set realistic expectations for performance, which makes the tool a practical resource for evaluating models in real-world applications.
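
A quick calculation can complement the visual impression. The sketch below estimates how long a response takes at a given speed by dividing its length by the rate. The helper name `generation_seconds` and the 500-token response length are illustrative assumptions, not figures from the article, and the estimate ignores any delay before the first token appears.

```python
def generation_seconds(n_tokens: int, tokens_per_second: float) -> float:
    """Estimate the time to stream n_tokens at a constant output rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return n_tokens / tokens_per_second


if __name__ == "__main__":
    # 500 tokens is an arbitrary example length, chosen for illustration.
    for tps in (5, 30, 800):
        seconds = generation_seconds(500, tps)
        print(f"{tps:>4} tokens/s: {seconds:.1f} s for 500 tokens")
```

At 5 tokens/second, the hypothetical 500-token reply takes 100 seconds, while at 800 tokens/second it finishes in well under a second.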

Conclusion

IT professionals can use this simulation tool to assess how fast a given model will feel in practice and to make informed decisions when selecting technologies for their projects.