AI/PROMPT ENGINEERING

Structured Context Engineering for File-Native Agentic Systems

Source: Simon Willison
February 10, 2026
2 min read

EXECUTIVE SUMMARY

Unlocking the Power of Context in Large Language Models: A New Study

Summary

This article summarizes a new paper by Damon McMillan examining the challenges of context engineering for large language models (LLMs) working with structured data, particularly large SQL schemas. The study runs extensive experiments across multiple models and serialization formats to compare performance.

Key Points

  • The study involved 9,649 experiments across 11 models and 4 formats: YAML, Markdown, JSON, and Token-Oriented Object Notation (TOON).
  • SQL generation was used as a proxy for programmatic agent operations in structured data contexts.
  • Frontier models such as Opus 4.5, GPT-5.2, and Gemini 2.5 Pro outperformed leading open-source models like DeepSeek V3.2, Kimi K2, and Llama 4.
  • Filesystem-based context retrieval significantly benefited frontier models, while open-source models struggled with this approach.
  • The Terminal Bench 2.0 leaderboard remains dominated by Anthropic, OpenAI, and Gemini.
  • An interesting finding was a "grep tax" against TOON: models unfamiliar with the format needed extra search-and-retry iterations over the context, which increased total token usage despite TOON's compact encoding.
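To make the format comparison concrete, the sketch below serializes the same toy rows as indented JSON and as a TOON-style tabular encoding (a header declaring the array length and field names, followed by one comma-joined line per row). The data, field names, and the exact TOON layout here are illustrative assumptions, not examples from the paper; the point is only that the tabular form repeats each key once instead of once per row.

```python
import json

# Toy rows standing in for the contents of a small SQL table (hypothetical data).
rows = [
    {"id": 1, "name": "alice", "role": "admin"},
    {"id": 2, "name": "bob", "role": "viewer"},
]

json_repr = json.dumps(rows, indent=2)

# Approximate TOON-style encoding (an illustrative sketch, not a full spec):
# header line with row count and field names, then one data line per row.
fields = list(rows[0])
header = "rows[%d]{%s}:" % (len(rows), ",".join(fields))
body = "\n".join("  " + ",".join(str(r[f]) for f in fields) for r in rows)
toon_repr = header + "\n" + body

# The tabular form names each field once, so it is markedly shorter.
print(len(json_repr), len(toon_repr))
```

The compactness is exactly what makes the "grep tax" finding interesting: the cheaper encoding can still lose overall if the model needs more iterations to interpret it.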

Analysis

The findings highlight the importance of model selection in handling complex structured data tasks, particularly in environments where large SQL schemas are involved. This research underscores the evolving capabilities of frontier models in effectively managing context retrieval, which is crucial for IT professionals working with LLMs.

Conclusion

IT professionals should consider leveraging frontier models for applications involving structured data and SQL schemas to enhance performance. Continuous evaluation of model capabilities is essential for optimizing context engineering strategies.