AI discovers new math algorithms

PLUS: Anthropic reportedly set to launch new Sonnet, Opus models

Good morning, AI enthusiasts. The race to achieve AI that makes genuine scientific breakthroughs just hit a milestone — with DeepMind's AlphaEvolve discovering new math solutions that have eluded humans since the 1960s.

By harnessing Gemini's language capabilities within an evolutionary framework, this AI coding agent isn't just theoretically impressive — it's already optimizing Google's data centers and accelerating the very systems that power it.

In today’s AI rundown:

  • Google’s AlphaEvolve discovers math breakthroughs

  • Anthropic set to launch new Sonnet, Opus models

  • Transform text into polished PDFs instantly

  • OpenAI’s new Safety Evaluations dashboard

  • 4 new AI tools & 4 job opportunities

LATEST DEVELOPMENTS

GOOGLE

The Rundown: Google just debuted AlphaEvolve, a coding agent that harnesses Gemini and evolutionary strategies to craft algorithms for scientific and computational challenges — driving efficiency inside Google and solving historic math problems.

The details:

  • AlphaEvolve uses a mix of Gemini models (Flash for idea generation, Pro for analysis) to create code, which is tested by evaluators and evolved iteratively.

  • The system has already made several mathematical discoveries, including a method for multiplying 4x4 complex-valued matrices with 48 scalar multiplications, the first improvement in that setting on Strassen's algorithm from 1969.

  • It is also boosting efficiency for Google, optimizing data center scheduling, improving AI training (including its own), and helping with chip design.

  • When tested on 50+ open math problems, it matched SOTA solutions in 75% of cases and discovered new, improved solutions in another 20%.
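For a feel of the generate, test, and evolve loop described above, here is a minimal toy sketch. It is not DeepMind's implementation: `propose` is a hypothetical stand-in for the Gemini step (it just nudges one number at random), `evaluate` is a made-up scoring function, and `TARGET` is an arbitrary goal chosen for illustration.

```python
import random

# Arbitrary toy goal; in a real system the evaluator would run and score code.
TARGET = [3, -2, 5, 0]


def evaluate(candidate):
    """Automated evaluator: higher is better, 0 is a perfect score."""
    return -sum((c - t) ** 2 for c, t in zip(candidate, TARGET))


def propose(parent, rng):
    """Hypothetical stand-in for the LLM step: return a modified copy of a parent."""
    child = list(parent)
    i = rng.randrange(len(child))
    child[i] += rng.choice([-1, 1])
    return child


def evolve(seed_candidate, generations=200, population_size=8,
           children_per_gen=16, seed=0):
    """Repeatedly propose children, score them, and keep the best few."""
    rng = random.Random(seed)
    population = [list(seed_candidate)]
    history = []  # best score after each generation
    for _ in range(generations):
        children = [propose(rng.choice(population), rng)
                    for _ in range(children_per_gen)]
        # Parents compete with children, so the best score never gets worse.
        ranked = sorted(population + children, key=evaluate, reverse=True)
        population = ranked[:population_size]
        history.append(evaluate(population[0]))
    return population[0], history


best, history = evolve([0, 0, 0, 0])
print("best candidate:", best, "score:", history[-1])
```

Swapping the random nudge for a model that writes code, and the toy score for a real benchmark, gives the general shape of the approach; the selection and population details here are assumptions made for brevity.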

Why it matters: Yesterday, OpenAI’s Jakub Pachocki said AI has shown “significant evidence” of being capable of novel insights, and today Google has taken that a step further. Math plays a role in nearly every aspect of life, and AI’s pattern-recognition and algorithmic strengths look ready to uncover a whole new world of scientific discovery.

TOGETHER WITH ENCORD

The Rundown: Encord consolidates multimodal AI data management, curation, and annotation pipelines into a single platform, helping teams accelerate model iteration cycles with an agentic AI data workflow system that prepares balanced, accurately labeled datasets 10x faster.

Join the Encord ML team on May 22 for a demo-focused webinar where you’ll learn to:

  • Use world models to build agents that adapt and reason across multimodal contexts

  • Identify and supervise edge-case behavior within petabyte-scale real-world sensor data

  • Create high-quality datasets powering VLAs for robotics, ADAS, and more

ANTHROPIC

The Rundown: Anthropic is reportedly preparing to launch advanced versions of Claude’s Sonnet and Opus models in the “upcoming weeks,” featuring hybrid thinking and expanded tool use capabilities.

The details:

  • The models are reportedly capable of alternating between reasoning and tool use, and can self-correct by stepping back to examine what went wrong.

  • For coding, the models can test their generated code, identify errors, troubleshoot with reasoning, and make corrections without requiring human intervention.

  • An Anthropic model, codenamed Neptune, is undergoing safety testing, with some believing the name hints at a 3.8 (8th planet from the sun) release.

  • The news coincides with Anthropic launching a new bug bounty program focused on stress-testing Claude’s safety measures.

Why it matters: While Anthropic has been in the mix with Google and OpenAI for the top model in the industry, the company has been much slower to bring new ones to market, with 3.7 Sonnet in February marking its only release in 2025. With both rivals also likely releasing upgrades soon, we could be in for a wild few months.

AI TRAINING

The Rundown: In this tutorial, you will learn how to use Grok's new PDF rendering feature to create professional-looking documents directly from prompts — with instant previews and editing capabilities.

Step-by-step:

  1. Visit Grok from your computer browser to access the main chat.

  2. Write a detailed prompt describing the document you need (such as a resume, a literature review for a research paper, or an invoice).

  3. Review the preview and refine your document using follow-up prompts or by editing the LaTeX code directly through the Code button.

  4. Download your finalized PDF using the download button.

Pro tip: For LaTeX research papers, remember to save both the PDF and source code for future editing or journal submissions that require the original LaTeX files!

PRESENTED BY HACKERRANK

The Rundown: Struggling to source high-quality data for your AI models? HackerRank now delivers custom datasets designed by the experts who test millions of human developers every year.

With HackerRank, you can:

  • Curate a custom dataset on specific software development skills

  • Access a workforce of development experts for data labelling and annotation

  • Request an evaluation dataset to test your model’s performance

OPENAI

The Rundown: OpenAI launched a new Safety Evaluations Hub that will publicly and regularly display test results for its AI models, showing how they perform on metrics like harmful content generation, hallucination rates, and jailbreak attempts.

The details:

  • The hub shows comparative performance data across OpenAI models, including metrics for refusing harmful content and accuracy on factual questions.

  • The dashboard currently focuses on four categories: harmful content, jailbreak vulnerability, hallucination rates, and adherence to instruction hierarchy.

  • OpenAI promises to update the page "periodically" as part of what it calls a company-wide effort to communicate more proactively about AI safety.

  • The release comes after critiques that the company is not transparent about safety testing, and follows issues with a recent rollout of a GPT-4o update.

Why it matters: With labs racing to push out models to keep pace with rivals, many believe safety has been taking a backseat to speed. This is a great step towards more transparency, but it relies on OpenAI to self-report and continually update the data, which likely won’t completely satisfy those calling for stricter safety measures.

QUICK HITS

AI tools:

  • 🔌 Gemini Advanced - Connect Google’s advanced assistant to GitHub repos

  • 🤖 GPT-4.1 - OpenAI’s advanced coding model, now available in ChatGPT

  • 🤳 TikTok AI Alive - Turn static images into dynamic videos for TikTok Stories

  • 🐰 CodeRabbit - AI code reviews directly in Cursor, Windsurf, and VSCode

Job opportunities:

  • 🎨 The Rundown - Designer (Brand & Platform)

  • 🧪 Writer - AI Researcher

  • ⚙️ OpenAI - Software Engineer, Inference

  • 💻 Siena - Senior Fullstack Engineer

OpenAI added the coding-focused GPT-4.1 and GPT-4.1 mini models to ChatGPT, now available to both free and paid users.

Stability AI open-sourced Stable Audio Open Small, a text-to-audio model for generating music samples, capable of running on consumer devices with no internet.

Perplexity and PayPal announced a new partnership, allowing users to check out with both PayPal and Venmo when making purchases on the AI platform.

Meta released new science research, including the Open Molecules 2025 dataset, the Universal Model for Atoms, and a study on language development and AI training.

NVIDIA is securing AI chip deals in the Middle East, supplying Saudi Arabia’s Humain and the UAE after meetings with the Trump admin and other regional leaders.

Nous Research launched Psyche, a new open, decentralized AI infrastructure that allows individuals to pool compute to train models without massive investment costs.

Klarna CEO Sebastian Siemiatkowski revealed the fintech giant cut 40% of its workforce due to AI, but now plans to hire human agents after a hit on work quality.

COMMUNITY

Join our next workshop this Friday, May 16th, at 4 PM EST with Dr. Alvaro Cintas, The Rundown’s AI professor. By the end of the workshop, you’ll confidently understand how to design, build, and deploy your own AI systems using OpenAI’s Agents SDK.

RSVP here. Not a member? Join The Rundown University on a 14-day free trial.

We’ll always keep this newsletter 100% free. To support our work, consider sharing The Rundown with your friends, and we’ll send you more free goodies.

That's it for today!

Before you go, we’d love to know what you thought of today's newsletter to help us improve The Rundown experience for you.

See you soon,

Rowan, Joey, Zach, Alvaro, and Jason—The Rundown’s editorial team
