User Tools

Site Tools


ai

AI

Learning AI:

What is the best way to learn Artificial Intelligence for a beginner

AI Model Performance Comparison

This page presents a comparison of various AI models across different task categories, based on the provided data.

Agentic Coding (SWE-bench Verified)

Model Score 1 Score 2
Claude Opus 4 72.5% 79.4%
Claude Sonnet 4 72.7% 80.2%
Claude Sonnet 3.7 62.3% 70.3%
OpenAI o3 69.1%
OpenAI GPT-4.1 54.6%
Gemini 2.5 Pro (Preview 05-06) 63.2%

Agentic Terminal Coding (terminal-bench)

Model Score 1 Score 2
Claude Opus 4 43.2% 50.0%
Claude Sonnet 4 35.5% 41.3%
Claude Sonnet 3.7 35.2%
OpenAI o3 30.2%
OpenAI GPT-4.1 30.3%
Gemini 2.5 Pro (Preview 05-06) 25.3%

Graduate-level Reasoning (GPQA Diamond)

Model Score 1 Score 2
Claude Opus 4 79.6% 83.3%
Claude Sonnet 4 75.4% 83.8%
Claude Sonnet 3.7 78.2%
OpenAI o3 83.3%
OpenAI GPT-4.1 66.3%
Gemini 2.5 Pro (Preview 05-06) 83.0%

Agentic Tool Use (TAU-bench)

Retail

Model Score
Claude Opus 4 81.4%
Claude Sonnet 4 80.5%
Claude Sonnet 3.7 81.2%
OpenAI o3 70.4%
OpenAI GPT-4.1 68.0%
Gemini 2.5 Pro (Preview 05-06) N/A

Airline

Model Score
Claude Opus 4 59.6%
Claude Sonnet 4 60.0%
Claude Sonnet 3.7 58.4%
OpenAI o3 52.0%

Local deployment

Running AI Models Locally with Docker and Spring AI Play https://www.danvega.dev/blog/docker-model-runner

AI Gemma 3

Gemii-cli

https://www.youtube.com/watch?v=xqvprnPocHs

https://github.com/google-gemini/gemini-cli

winget install -e --id OpenJS.NodeJS

npm install -g @google/gemini-cli

npm upgrade -g @google/gemini-cli

# start
gemini

Docker Model Runner

Spring AI

Interfacing with the AI mode

MCP - Model Context Protocol

ai.txt · Last modified: by skipidar