Exploring Llama 4, OpenRouter, and Model Comparison Tools

Episode 11
Featuring: Jason Hand, Ryan MacLean

Ten million tokens. That's not a typo—Meta just dropped Llama 4 with a context window so massive it could swallow entire codebases for breakfast. Jason and Ryan dive headfirst into this technical marvel, testing whether such an enormous memory span might finally make RAG systems obsolete. But here's the plot twist: even with superhuman context capabilities, the latest models still stumble on basic questions about musicians and specialized programming knowledge. Watch them put Llama 4 through its paces alongside comparison platforms like OpenRouter and LM Arena, only to discover an unknown model called LunarCall quietly outperforming the giants. Sometimes the most interesting discoveries happen in the footnotes of technology.

In this episode, Jason and Ryan explore Llama 4, Meta's latest model, released just over the weekend. They dig into its capabilities, test it on Hugging Face, and discuss its groundbreaking 10-million-plus token context window. The conversation covers whether such a massive context window might eliminate the need for RAG (Retrieval Augmented Generation) and how it could simplify prompt engineering by allowing for more detailed system prompts and guardrails. They also explore two model comparison platforms, OpenRouter and LM Arena, which let users test and compare AI models side by side. Along the way, they discover a lesser-known model called LunarCall that surprisingly outperforms the others on a specific test. The episode offers practical insight into the rapidly evolving landscape of AI models and the tools for comparing their performance.
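
For a rough sense of what a 10-million-token window means in practice, here is a minimal sketch that estimates whether a local codebase would fit. The four-characters-per-token ratio is a common rough approximation, not a figure from the episode, and the file extensions are illustrative assumptions.

```python
from pathlib import Path

CONTEXT_WINDOW = 10_000_000  # Llama 4's advertised context size, in tokens
CHARS_PER_TOKEN = 4          # rough rule of thumb; real tokenizers vary

def estimate_tokens(root: str, suffixes=(".py", ".js", ".md")) -> int:
    """Roughly estimate the token count of matching files under root."""
    total_chars = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            total_chars += len(path.read_text(errors="ignore"))
    return total_chars // CHARS_PER_TOKEN

tokens = estimate_tokens(".")  # point at any repo checkout
print(f"~{tokens:,} tokens; fits in a 10M window: {tokens < CONTEXT_WINDOW}")
```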

Key Takeaways

  • Llama 4 features a massive 10+ million token context window, potentially revolutionizing how we work with large documents and complex instructions
  • Despite large context windows, RAG (Retrieval Augmented Generation) remains valuable for cost efficiency and performance optimization (see the cost sketch after this list)
  • Expanded context windows enable more comprehensive guardrails and detailed system prompts in production applications (see the system-prompt sketch after this list)
  • Even the latest AI models still struggle with certain types of knowledge, particularly specialized programming techniques and niche factual information
  • AI hallucinations remain a concern, particularly for factual questions, as demonstrated when the models returned incorrect information about a musician
  • Tools like OpenRouter and LM Arena provide valuable ways to compare different models for specific use cases
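
The cost point above is easy to make concrete. Below is a back-of-envelope sketch; the per-token price is a placeholder assumption rather than any provider's published rate, but the ratio is the point: re-sending a huge context with every query dwarfs the cost of retrieving a few relevant chunks.

```python
# Back-of-envelope comparison: stuffing a huge context vs. RAG retrieval.
# The price below is a placeholder assumption, not any provider's real rate.
PRICE_PER_MILLION_INPUT_TOKENS = 0.20  # USD, hypothetical

def query_cost(context_tokens: int, queries: int) -> float:
    """Input-token cost of sending `context_tokens` with every query."""
    return context_tokens * queries * PRICE_PER_MILLION_INPUT_TOKENS / 1_000_000

full_context = query_cost(context_tokens=10_000_000, queries=100)
rag_context = query_cost(context_tokens=4_000, queries=100)  # ~top-k chunks

print(f"Full 10M-token context, 100 queries: ${full_context:,.2f}")
print(f"RAG with ~4k retrieved tokens, 100 queries: ${rag_context:,.2f}")
# $200.00 vs. $0.08: the size of that gap is why RAG stays attractive.
```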

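The guardrails point can also be sketched out: with a 10-million-token window, an application could inline entire policy documents into the system prompt instead of summarizing them. The policy names and texts below are illustrative assumptions, not material from the episode.

```python
# Hypothetical policy texts; in a real app these might be whole documents
# loaded from disk. With a 10M-token window they can be inlined verbatim.
policies = {
    "acceptable_use": "Never reveal internal tooling or customer data. ...",
    "tone_guide": "Be concise, cite sources, admit uncertainty. ...",
    "escalation_rules": "Hand off to a human when asked about billing. ...",
}

guardrails = "\n\n".join(f"## {name}\n{text}" for name, text in policies.items())

system_prompt = (
    "You are a support assistant. Follow every rule below verbatim.\n\n"
    + guardrails
)
print(f"System prompt length: {len(system_prompt)} characters")
```
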
Resources

Llama 4 on Hugging Face

Try Meta's latest language model

OpenRouter

Platform for comparing and routing requests across multiple AI models (a usage sketch appears at the end of this list)

LM Arena

Interactive tool for blind comparison testing of language models

DeepSeek

AI model used as a point of comparison during the episode
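
For readers who want to reproduce the episode's side-by-side comparisons, here is a minimal sketch against OpenRouter's OpenAI-compatible chat completions endpoint. The model slugs, the environment variable name, and the sample prompt are illustrative assumptions; check OpenRouter's model list for current identifiers.

```python
import os
import requests

# OpenRouter exposes an OpenAI-compatible chat completions endpoint.
OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"
API_KEY = os.environ["OPENROUTER_API_KEY"]  # assumes the key is exported

# Model slugs are assumptions; check openrouter.ai/models for current names.
MODELS = ["meta-llama/llama-4-scout", "deepseek/deepseek-chat"]
PROMPT = "Who played drums on Steely Dan's 'Aja'?"  # sample factual question

for model in MODELS:
    resp = requests.post(
        OPENROUTER_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"model": model, "messages": [{"role": "user", "content": PROMPT}]},
        timeout=120,
    )
    resp.raise_for_status()
    print(f"--- {model} ---")
    print(resp.json()["choices"][0]["message"]["content"], "\n")
```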