Poetiq cracks major reasoning benchmark
PLUS: Create LinkedIn carousels in ChatGPT with Canva
Good morning, AI enthusiasts. Six months ago, the best AI models could barely hit 5% on the ARC-AGI-2 reasoning benchmark. Today, a tiny startup crossed 50% and beat Google with Google's own model in the process.
With a “meta-system” that refines existing models rather than building from scratch, Poetiq's achievement shows that the next breakthroughs might come from clever engineering, not just pure scale.
In today’s AI rundown:
Poetiq tops ARC-AGI-2 with Gemini variant
The Rundown Roundtable: Our AI use cases
Create LinkedIn carousels in ChatGPT with Canva
Poetry prompts can bypass AI safety guardrails
4 new AI tools, community workflows, and more
LATEST DEVELOPMENTS
POETIQ

Image source: Poetiq
The Rundown: Six-person AI startup Poetiq just officially claimed the top spot on the ARC-AGI-2 reasoning benchmark, beating Google’s Gemini 3 Deep Think at less than half the cost by orchestrating existing models rather than building its own.
The details:
Poetiq's meta-system adapts to new models within hours: it achieved the top-ranked result shortly after Gemini 3 launched, with no retraining.
Using Gemini 3 Pro as a base, Poetiq's refinement system scored 54% at $30 per task, outpacing Google's top variant, Deep Think, at 45% and $77 per task.
The result makes Poetiq's the first system to crack the 50% barrier on ARC-AGI-2, a benchmark on which leading models struggled to hit 5% just six months ago.
The startup’s open-sourced approach uses LLMs to continuously refine their own outputs, with a built-in self-auditing system to ensure quality solutions.
Why it matters: The ARC-AGI-2 progress from sub-5% to over 50% in just months shows how quickly things are advancing. Poetiq’s result points to a future where AI gains come from two directions at once: frontier model development and clever orchestration built on top of those models by teams without massive compute budgets.
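For the curious: the generate, audit, refine pattern described above can be sketched in a few lines of Python. This is a toy illustration, not Poetiq's actual system. The names `refine`, `toy_propose`, and `toy_audit` are hypothetical, and the two toy functions are simple stand-ins for what would be LLM calls in practice.

```python
from typing import Callable, Optional, Tuple


def refine(
    propose: Callable[[Optional[str], Optional[str]], str],
    audit: Callable[[str], Optional[str]],
    max_rounds: int = 5,
) -> Tuple[str, int, bool]:
    """Audit a candidate and feed the critique back until it passes.

    `propose(previous, critique)` and `audit(candidate)` are hypothetical
    stand-ins for model calls. `audit` returns None when the candidate passes.
    Returns (candidate, rounds_used, passed).
    """
    candidate = propose(None, None)  # first draft, no feedback yet
    for round_number in range(1, max_rounds + 1):
        critique = audit(candidate)
        if critique is None:
            return candidate, round_number, True
        if round_number < max_rounds:
            # Hand the critique back so the next draft can address it.
            candidate = propose(candidate, critique)
    return candidate, max_rounds, False


def toy_propose(previous: Optional[str], critique: Optional[str]) -> str:
    """Toy 'model': applies exactly the fix the critique asks for."""
    if previous is None:
        return "arc"
    if critique == "use uppercase":
        return previous.upper()
    if critique == "end with !":
        return previous + "!"
    return previous


def toy_audit(candidate: str) -> Optional[str]:
    """Toy 'auditor': returns one complaint at a time, or None if satisfied."""
    if not candidate.isupper():
        return "use uppercase"
    if not candidate.endswith("!"):
        return "end with !"
    return None


if __name__ == "__main__":
    print(refine(toy_propose, toy_audit))
```

With these toy stand-ins, the loop passes on the third audit and returns `("ARC!", 3, True)`. Capping the loop with `max_rounds` is one simple way to keep per-task cost bounded.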
TOGETHER WITH LINDY
The Rundown: Describe what you need done, and Lindy builds custom AI agents that qualify your leads, draft your reports, handle customer support, and knock out the busywork eating up your team's day. No coding. No complexity. Just results.
What you can automate today:
Sales agents that qualify leads and book meetings while you sleep
Support agents that resolve tickets instantly across phone and chat
Ops agents that turn hours of manual work into minutes
Start free with $20 in credits today and get up and running in minutes with Lindy’s 6,000+ integrations.
THE RUNDOWN ROUNDTABLE

Image source: Ideogram / The Rundown
The Rundown: The Rundown Roundtable is a weekly feature in which we poll members of The Rundown staff about how we use AI in our work and daily lives.
Billy, Educator: I’m a big basketball fan. The launch of Nano Banana 3.0 coincided with the start of the NBA season. So to test its consistency, I used a dynamic prompt formula in Google Sheets + Nano Banana to generate product photos of hats for each NBA team. I was able to get consistent styling across each design as if they were part of a fictional brand. Now I just need AI to get me an NBA licensing deal…
Reagan, Strategic Partnerships: Being outdoors and in nature is part of my daily life. During the week, I often go for long walks in between work blocks and recently discovered Wispr Flow. It’s a time I’m often thinking through work solutions and brainstorming ideas, so having the ability to simply talk and have those ideas transcribed and sent directly to my workspace has been amazing.
Rishi, Product Marketing Manager: I’m building a new paid advertising tracker in Google Sheets, and want to document certain parts that need explanation in our central database (Notion).
An easy way to do this is filming Looms, taking the transcript, and plugging it into ChatGPT with the following prompt: "I filmed a Loom explaining X. Using the transcript below, please write a 5-8 sentence summary which explains what X is, what it does, what it means, and how to use it in a simple to understand way?"
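Billy's dynamic prompt formula and Rishi's reusable Loom prompt rest on the same idea: keep the wording fixed and swap in one variable. A minimal Python sketch of that idea follows. The hat prompt wording and the `fill_prompts` helper are hypothetical; the Loom template reuses Rishi's prompt with X turned into a placeholder.

```python
from typing import List

# Rishi's prompt from above, with "X" replaced by a {item} placeholder.
LOOM_TEMPLATE = (
    "I filmed a Loom explaining {item}. Using the transcript below, please "
    "write a 5-8 sentence summary which explains what {item} is, what it does, "
    "what it means, and how to use it in a simple to understand way?"
)

# Hypothetical wording; the original hat prompt is not given in the newsletter.
HAT_TEMPLATE = (
    "Product photo of a baseball cap for the {item}, same studio lighting, "
    "same camera angle, plain white background."
)


def fill_prompts(template: str, values: List[str]) -> List[str]:
    """Return one prompt per value, with the value substituted for {item}."""
    return [template.format(item=value) for value in values]


if __name__ == "__main__":
    teams = ["Boston Celtics", "Los Angeles Lakers"]
    for prompt in fill_prompts(HAT_TEMPLATE, teams):
        print(prompt)
```

Because only the team name changes between prompts, the styling instructions stay identical across every generation, which is what keeps the results consistent.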
AI TRAINING

The Rundown: In this tutorial, you’ll learn how to create professional LinkedIn carousels in minutes using ChatGPT's Canva app integration, which gives you the ability to draft content and design slides all within a single interface.
Step-by-step:
Go to ChatGPT, open a new chat, click the '+' button to select Canvas, then prompt: "Write a 5-slide LinkedIn carousel on '(your topic)'. Slide 1: A hook. Slides 2-4: One tip each. Slide 5: A CTA. Keep each under 40 words"
Refine your content in Canvas, then activate Canva by prompting: "@canva, create a 5-slide LinkedIn carousel using this content [paste slides]. Use a (detailed style of your choice). Stick to the content copy exactly"
Preview the 4 design options ChatGPT generates, select your favorite, and click the Canva link to open your editable carousel
Review each slide in Canva, make any final tweaks, then click Download and select PDF for LinkedIn documents or PNG for individual slides
Pro tip: Use your brand colors and fonts consistently — once you prompt them in chat, the integration applies them automatically to the carousels.
PRESENTED BY FIDDLER AI
The Rundown: Fiddler AI’s upcoming product webinar breaks down how agentic observability can improve AI performance and behavior with visibility, context, and control. Gain deep insight into your AI systems through end-to-end visibility, from pre-production evaluation to production monitoring.
In this live webinar, learn how to:
Validate agent behavior before production with golden and challenger datasets
Track system-wide health and drill into span-level metrics across the agentic hierarchy
Diagnose reasoning chains and decision paths to pinpoint points of failure
Register today to attend live or receive the recording afterward.
AI RESEARCH

Image source: Reve / The Rundown
The Rundown: A new study from Italy’s Icaro Lab just discovered that reformulating harmful requests as poetry can trick leading AI models into producing dangerous content, with some systems falling for the technique every single time.
The details:
Icaro Lab tested 25 frontier models from major labs like OpenAI, Google, and Anthropic, finding that poetic prompts achieved a 62% average jailbreak success rate.
Google's Gemini 2.5 Pro was most vulnerable at 100%, while OpenAI's smaller GPT-5 nano resisted all attempted poetry attacks.
The poetic prompts unlocked dangerous responses on topics including weapons development, hacking, and psychological manipulation.
Researchers declined to publish the specific poems, calling them "too dangerous" even though they are reportedly simple enough for anyone to create.
Why it matters: AI safety has become a whack-a-mole game, with poetry now joining roleplay scenarios, foreign language tricks, and encoding exploits on the growing list of unexpected vulnerabilities. Each patch seems to invite a new creative workaround — and there’s no finish line for a problem that is only going to get more advanced.
QUICK HITS
3️⃣ Mistral 3 - Mistral’s next generation of open-source models
🌱 Seedream 4.5 - ByteDance’s image AI with powerful editing, text rendering
🧍 Kling Avatar 2.0 - Upgraded avatar model with up to 5-minute generations
🗣️ VibeVoice - Microsoft’s open-source, real-time text-to-speech model
OpenAI is turning off shopping suggestions after backlash over responses that looked like ads, with CRO Mark Chen saying they “fell short” on the implementation.
Meta acquired Limitless, a startup backed by Sam Altman that makes an AI-powered pendant for recording and transcribing real-world conversations.
The New York Times and Chicago Tribune filed separate lawsuits against Perplexity over copyright infringement, marking the NYT’s second lawsuit against the AI startup.
Meta announced a series of new AI licensing deals with publishers, including CNN, Fox News, and USA Today, to feed real-time news content into its Meta AI platform.
The U.S. Department of Energy launched AMP2, a new AI research platform that officials say will be the world’s largest autonomous system for studying microbes.
COMMUNITY
Every newsletter, we showcase how a reader is using AI to work smarter, save time, or make life easier.
Today’s workflow comes from reader Anonymous in Houston, TX:
"I recently used ChatGPT as a strategic partner throughout a full interview and negotiation process, and the experience was surprisingly impactful. I leaned on AI to help me prep for interviews, refine talking points, and rehearse answers so I was confident and concise.
Once the offer stage began, ChatGPT helped me craft positioning statements, negotiation language, and follow-up emails that were assertive but professional."
How do you use AI? Tell us here.
Read our last AI newsletter: Anthropic puts Claude in the interviewer chair
Read our last Tech newsletter: Netflix buys Warner Bros. in $82B deal
Read our last Robotics newsletter: Humanoid breaks record for fastest build
Today’s AI tool guide: Reverse Engineer Ad Creatives in Minutes
Watch our last live workshop: Nano Banana For Slide Decks
That's it for today! Before you go, we’d love to know what you thought of today's newsletter to help us improve The Rundown experience for you.
See you soon,
Rowan, Joey, Zach, Shubham, and Jennifer — the humans behind The Rundown



