Casa de Brain

Battle of Robustness: Gemini Flash 2.5 vs. GPT-5 on Noisy Sentiment Analysis

Small grammar mistakes are easy for humans to overlook — but for AI models, messy input can be a serious challenge. To test how robust today’s top models are, we ran Gemini Flash 2.5 and GPT-5 through the same series of prompts, each containing progressively noisier spelling and grammar errors, all describing the movie Misery in a highly positive tone.

Goal: Which model handles noise better, keeps higher confidence, and recovers accuracy after sentence correction?

Test Setup

Each prompt was a degraded version of:

‘Misery’ is the best movie I’ve ever seen, since I was a small boy.

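Across six tests, we introduced progressively more noise: minor typos (prompt 1), more typos (prompt 2), grammar errors (prompt 3), heavy corruption (prompts 4 and 5), and finally a corrected sentence (prompt 6).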

Both models returned probabilities for Positive, Neutral, and Negative sentiment.
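
For readers who want to build a similar set of degraded prompts, here is a minimal sketch of one way to inject typos. It is not the procedure used for the prompts in this post: the `add_typos` helper, the adjacent-letter swap, and the noise rates are all assumptions made for illustration.

```python
import random

# Clean base sentence from the test setup (straight quotes used for simplicity).
BASE = "'Misery' is the best movie I've ever seen, since I was a small boy."

def add_typos(text: str, rate: float, seed: int = 0) -> str:
    """Swap adjacent letters with probability `rate` per letter pair (hypothetical noise model)."""
    rng = random.Random(seed)  # fixed seed so each noise level is reproducible
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip the swapped pair so a letter moves at most once
        else:
            i += 1
    return "".join(chars)

# Assumed rates; the post does not state how much noise each prompt contained.
levels = {"minor typos": 0.05, "more typos": 0.15, "heavy corruption": 0.35}
prompts = {name: add_typos(BASE, rate) for name, rate in levels.items()}
for name, prompt in prompts.items():
    print(f"{name}: {prompt}")
```

Swapping letters only covers spelling noise. Grammar errors (dropped articles, wrong verb forms) would need a separate, likely hand-written, transformation.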

Result: Gemini Flash 2.5 vs. GPT-5

Table — Sentiment Probabilities Across Noise Levels

| Prompt # | Noise Level | Gemini Positive | GPT-5 Positive |
|---|---|---|---|
| 1 | Minor typos | 98.5% | 94% |
| 2 | More typos | 96.8% | 96% |
| 3 | Grammar errors | 94.5% | 95% |
| 4 | Heavy corruption | 92.1% | 94% |
| 5 | Similar heavy corruption | 91.5% | 95% |
| 6 | With correction | 99.2% | 97% |

How Each Model Behaves Under Noise

Gemini Flash 2.5: Smooth degradation, but sensitive to cumulative noise. Gemini begins with very high confidence (98.5%) and shows a gradual, consistent drop as spelling and grammar degrade.

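Weakness: confidence erodes steadily as noise accumulates, falling 7 points from 98.5% (prompt 1) to 91.5% (prompt 5).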

GPT-5: More stable across noise, less shaken by corruption. GPT-5 starts lower than Gemini (94% vs. 98.5%), but its probabilities stay tightly clustered between 94% and 96% across all five noisy prompts.

Weakness: lower peak confidence. Even after correction, GPT-5 reaches 97%, below Gemini’s 99.2%.

Side-by-Side Performance Analysis

A. Noise Robustness
Winner: GPT-5
GPT-5 barely fluctuates. Its lowest confidence (94%) is still higher than Gemini’s lowest (91.5%).

B. Sensitivity to spelling and grammar errors
Winner: GPT-5
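Gemini loses confidence faster as errors accumulate: it drops 7 points from prompt 1 to prompt 5 (98.5% to 91.5%), while GPT-5 stays within a 2-point band (94% to 96%).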
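
To make the robustness comparison in sections A and B reproducible, here is a small sketch that computes each model's lowest score and its max-minus-min spread over the five noisy prompts. The numbers come from the results table; the `robustness` helper and the choice of spread as the metric are illustrative assumptions.

```python
# Positive-sentiment probabilities (%) for noisy prompts 1-5, from the results table.
noisy = {
    "Gemini Flash 2.5": [98.5, 96.8, 94.5, 92.1, 91.5],
    "GPT-5": [94.0, 96.0, 95.0, 94.0, 95.0],
}

def robustness(scores):
    """Return (lowest score, max-min spread) in percentage points."""
    return min(scores), round(max(scores) - min(scores), 1)

for model, scores in noisy.items():
    low, spread = robustness(scores)
    print(f"{model}: lowest={low}%, spread={spread} pts")
```

A smaller spread means the model's confidence moved less as the input got messier. With only five data points per model, treat this as a rough indicator, not a statistical result.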

C. Correction + Re-classification Accuracy
Winner: Gemini Flash 2.5
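After sentence correction: Gemini climbs to 99.2% (up 7.7 points from prompt 5) and GPT-5 to 97% (up 2 points), so Gemini both recovers more and finishes higher.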

D. Overall Confidence Trend
Gemini Trend:
98.5 → 96.8 → 94.5 → 92.1 → 91.5 → 99.2
GPT-5 Trend:
94 → 96 → 95 → 94 → 95 → 97
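
The two trends are easier to compare as step-by-step changes. The sketch below computes the change between consecutive prompts, with the final step being the recovery after correction. The `step_changes` helper is a hypothetical name used only for this illustration.

```python
# Positive-sentiment probabilities (%) for prompts 1-6, from the trends above.
trends = {
    "Gemini Flash 2.5": [98.5, 96.8, 94.5, 92.1, 91.5, 99.2],
    "GPT-5": [94.0, 96.0, 95.0, 94.0, 95.0, 97.0],
}

def step_changes(series):
    """Change in percentage points between each pair of consecutive prompts."""
    return [round(after - before, 1) for before, after in zip(series, series[1:])]

for model, series in trends.items():
    changes = step_changes(series)
    recovery = changes[-1]  # prompt 5 (heavy corruption) to prompt 6 (corrected)
    print(f"{model}: steps={changes}, recovery={recovery:+} pts")
```

Gemini's steps are all negative until the correction, while GPT-5's alternate in sign, which matches the smooth-decline versus tight-cluster pattern described earlier.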

P.S. Gemini ran in its default thinking mode, while GPT-5 ran without a thinking mode; latency was identical for the two models.