Structured Context Engineering for File-Native Agentic Systems
EXECUTIVE SUMMARY
Unlocking the Power of Context in Large Language Models: A New Study
Summary
This article summarizes a new paper by Damon McMillan on the challenges of context engineering for large language models (LLMs) working with structured data, particularly large SQL schemas. The study runs extensive experiments across multiple models and serialization formats to compare performance.
Key Points
- The study involved 9,649 experiments across 11 models and 4 formats: YAML, Markdown, JSON, and Token-Oriented Object Notation (TOON).
- SQL generation was used as a proxy for programmatic agent operations in structured data contexts.
- Frontier models such as Opus 4.5, GPT-5.2, and Gemini 2.5 Pro outperformed leading open-source models like DeepSeek V3.2, Kimi K2, and Llama 4.
- Filesystem-based context retrieval significantly benefited frontier models, while open-source models struggled with this approach.
- The Terminal Bench 2.0 leaderboard remains dominated by Anthropic, OpenAI, and Gemini.
- A notable finding was the "grep tax" levied on TOON: because models were unfamiliar with the format, they made more exploratory searches, which increased token usage across iterations.
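To make the format comparison concrete, the sketch below serializes the same small table in JSON and in a TOON-like compact form and compares their sizes. The TOON syntax shown here is an illustrative approximation, not the exact specification, and the table contents are invented for the example.

```python
import json

# A small "table" of rows, as an agent might receive in its context.
rows = [
    {"id": 1, "name": "users", "columns": 12},
    {"id": 2, "name": "orders", "columns": 9},
    {"id": 3, "name": "invoices", "columns": 15},
]

# Standard JSON: every row repeats every key.
as_json = json.dumps(rows)

# TOON-like compact encoding (approximation for illustration):
# the header declares the fields once, then each row lists only values.
header = "tables[{}]{{{}}}:".format(len(rows), ",".join(rows[0]))
body = "\n".join("  " + ",".join(str(v) for v in r.values()) for r in rows)
as_toon = header + "\n" + body

print(len(as_json), len(as_toon))  # the compact form is markedly shorter
```

Fewer repeated keys means fewer tokens per row; but, as the "grep tax" finding suggests, a format the model has rarely seen can cost more tokens in exploratory tool calls than it saves in serialization.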
Analysis
The findings highlight how much model choice matters for complex structured-data tasks, particularly those involving large SQL schemas. The gap on filesystem-based context retrieval suggests that frontier models are better at agentic search over on-disk context, a capability that matters directly for IT professionals building LLM workflows.
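The filesystem-based retrieval pattern discussed above can be sketched roughly as follows: write the schema to disk as one file per table, then return only the DDL for tables the question actually mentions, rather than inlining the full schema into the prompt. All file names, the toy schema, and the keyword-matching heuristic are hypothetical simplifications of what an agent's grep-style tool use would do.

```python
import pathlib
import re
import tempfile

# Toy schema standing in for a large production one (hypothetical).
SCHEMA = {
    "users": "CREATE TABLE users (id INT PRIMARY KEY, email TEXT);",
    "orders": "CREATE TABLE orders (id INT, user_id INT, total NUMERIC);",
    "invoices": "CREATE TABLE invoices (id INT, order_id INT, paid BOOL);",
}

def write_schema(root: pathlib.Path) -> None:
    # One file per table, so retrieval can be selective.
    for table, ddl in SCHEMA.items():
        (root / f"{table}.sql").write_text(ddl)

def retrieve_context(root: pathlib.Path, question: str) -> str:
    """Grep-style retrieval: return DDL only for tables named in the question."""
    hits = []
    for path in sorted(root.glob("*.sql")):
        if re.search(rf"\b{path.stem}\b", question, re.IGNORECASE):
            hits.append(path.read_text())
    return "\n".join(hits)

root = pathlib.Path(tempfile.mkdtemp())
write_schema(root)
context = retrieve_context(root, "How many orders has each of our users placed?")
print(context)  # only the users and orders DDL, not the full schema
```

The design trade-off this illustrates: selective retrieval keeps the prompt small, but it depends on the model issuing good searches, which is exactly where the study found frontier and open-source models diverge.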
Conclusion
IT professionals should consider leveraging frontier models for applications involving structured data and SQL schemas to enhance performance. Continuous evaluation of model capabilities is essential for optimizing context engineering strategies.