AI
Learning AI:
What is the best way for a beginner to learn artificial intelligence?
Benchmark scores transcribed from an image, grouped by task type and then by model:

1. Agentic Coding (SWE-bench Verified)
  * Claude Opus 4: 72.5% / 79.4%
  * Claude Sonnet 4: 72.7% / 80.2%
  * Claude Sonnet 3.7: 62.3% / 70.3%
  * OpenAI o3: 69.1%
  * OpenAI GPT-4.1: 54.6%
  * Gemini 2.5 Pro (Preview 05-06): 63.2%
2. Agentic Terminal Coding (terminal-bench)
  * Claude Opus 4: 43.2% / 50.0%
  * Claude Sonnet 4: 35.5% / 41.3%
  * Claude Sonnet 3.7: 35.2%
  * OpenAI o3: 30.2%
  * OpenAI GPT-4.1: 30.3%
  * Gemini 2.5 Pro (Preview 05-06): 25.3%
3. Graduate-level Reasoning (GPQA Diamond)
  * Claude Opus 4: 79.6% / 83.3%
  * Claude Sonnet 4: 75.4% / 83.8%
  * Claude Sonnet 3.7: 78.2%
  * OpenAI o3: 83.3%
  * OpenAI GPT-4.1: 66.3%
  * Gemini 2.5 Pro (Preview 05-06): 83.0%
4. Agentic Tool Use (TAU-bench)
  * Retail:
    * Claude Opus 4: 81.4%
    * Claude Sonnet 4: 80.5%
    * Claude Sonnet 3.7: 81.2%
    * OpenAI o3: 70.4%
    * OpenAI GPT-4.1: 68.0%
  * Airline:
    * Claude Opus 4: 59.6%
    * Claude Sonnet 4: 60.0%
    * Claude Sonnet 3.7: 58.4%
    * OpenAI o3: 52.0%
    * OpenAI GPT-4.1: 49.4%
    * Gemini 2.5 Pro (Preview 05-06): (No data provided)
5. Multilingual Q&A (MMMLU)
  * Claude Opus 4: 88.8%
  * Claude Sonnet 4: 86.5%
  * Claude Sonnet 3.7: 85.9%
  * OpenAI o3: 88.8%
  * OpenAI GPT-4.1: 83.7%
  * Gemini 2.5 Pro (Preview 05-06): (No data provided)
6. Visual Reasoning (MMMU, validation)
  * Claude Opus 4: 76.5%
  * Claude Sonnet 4: 74.4%
  * Claude Sonnet 3.7: 75.0%
  * OpenAI o3: 82.9%
  * OpenAI GPT-4.1: 74.8%
  * Gemini 2.5 Pro (Preview 05-06): 79.6%
7. High School Math Competition (AIME 2024)
  * Claude Opus 4: 75.5% / 90.0%
  * Claude Sonnet 4: 70.5% / 85.0%
  * Claude Sonnet 3.7: 54.8%
  * OpenAI o3: 88.9%
  * OpenAI GPT-4.1: (No data provided)
  * Gemini 2.5 Pro (Preview 05-06): 83.0%