In 'Milestone' for Open Source, Meta Releases New Benchmark-Beating Llama 4 Models

It's "a milestone for Meta AI and for open source," Mark Zuckerberg said this weekend. "For the first time, the best small, mid-size, and potentially soon frontier [large-language] models."Zuckerberg anounced four new Llama LLMs in a video posted on Instagram and Facebook — two dropping this weekend, with another two on the way. "Our goal is to build the world's leading AI, open source it, and make it universally accessible so that everyone in the world benefits."Zuckerberg's announcement:Zuck promised more news next month on "Llama 4 Reasoning" — but the fourth model will be called Llama 4 Behemoth. "This thing is massive. More than 2 trillion parameters." (A blog post from Meta AI says it also has a 288 billion active parameter model, outperforms GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on STEM benchmarks, and will "serve as a teacher for our new models.")"I'm not aware of anyone training a larger model out there," Zuckberg says in his video, calling Behemoth "already the highest performing base model in the world, and it is not even done training yet.""If you want to try Llama 4, you can use Meta AI in WhatsApp, Messenger, or Instagram Direct," Zuckberg said in his video, "or you can go to our web site at meta.ai ." The Scout and Maverick models can be downloaded from llama.com and Hugging Face "We continue to believe that openness drives innovation," Meta AI says in their blog post, "and is good for developers, good for Meta, and good for the world." Their blog post declares it's "The beginning of a new era of natively multimodal AI innovation," calling Scout and Maverick "the best choices for adding next-generation intelligence.""The impressive part about Llama 4 Maverick is that with just 17B active parameters, it has scored an ELO score of 1,417 on the LMArena leaderboard," notes the tech news site Beebom . "This puts the Maverick model in the second spot, just below Gemini 2.5 Pro, and above Grok 3, GPT-4o, GPT-4.5, and more."It also achieves comparable results when compared to the latest DeepSeek V3 model on reasoning and coding tasks, and surprisingly, with just half the active parameters."