Battle of Robustness: Gemini Flash 2.5 vs. GPT-5 on Noisy Sentiment Analysis
Small grammar mistakes are easy for humans to overlook, but for AI models messy input can be a serious challenge. To test how robust today’s top models are, we ran Gemini Flash 2.5 and GPT-5 through the same series of prompts, each with progressively noisier spelling and grammar, all describing the movie Misery in a highly positive tone.
Goal: Which model handles noise better, keeps higher confidence, and recovers accuracy after sentence correction?
Test Setup
Each prompt was a degraded version of:
‘Misery’ is the best movie I’ve ever seen, since I was a small boy.
Across the first five prompts, we introduced progressively more of the following:
- Spelling errors
- Word swaps
- Verb-agreement mistakes
- A misspelled movie title
- Disordered pronouns
In the sixth prompt, the models were asked to correct the sentence before analyzing its sentiment.
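The post does not say how the noisy variants were produced, and they may well have been written by hand. As a minimal sketch, assuming a simple adjacent-letter swap as the noise model, the hypothetical helper `add_typos` below degrades the base sentence at a chosen rate. The function name, the `rate` parameter, and the seed are illustrative assumptions, not the procedure used in these tests.

```python
import random

# Base sentence from the test setup (straight quotes used for simplicity).
BASE = "'Misery' is the best movie I've ever seen, since I was a small boy."


def add_typos(text: str, rate: float, seed: int = 0) -> str:
    """Swap adjacent letters with probability `rate` per eligible position.

    Hypothetical noise model: it only covers spelling errors, not word swaps
    or grammar mistakes.
    """
    rng = random.Random(seed)  # seeded so each noise level is reproducible
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        both_letters = chars[i].isalpha() and chars[i + 1].isalpha()
        if both_letters and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip the swapped pair so a letter moves at most one place
        else:
            i += 1
    return "".join(chars)


for rate in (0.0, 0.1, 0.3):
    print(rate, add_typos(BASE, rate))
```

A rate of 0.0 returns the sentence unchanged, and higher rates scramble more letter pairs while keeping the same characters overall.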
Both models returned probabilities for Positive, Neutral, and Negative sentiment.
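Since each reply carries three class probabilities, it can help to check that they form a valid distribution before comparing models. The sketch below assumes the reply is a JSON object of percentages keyed by label. That reply format, the `parse_sentiment` helper, and the Neutral and Negative values in the example are all illustrative assumptions, since only the Positive figures are reported here.

```python
import json

LABELS = ("Positive", "Neutral", "Negative")


def parse_sentiment(reply: str, tol: float = 0.5) -> dict:
    """Parse a JSON reply of percentages and check it sums to about 100."""
    probs = json.loads(reply)
    missing = [label for label in LABELS if label not in probs]
    if missing:
        raise ValueError(f"missing labels: {missing}")
    scores = {label: float(probs[label]) for label in LABELS}
    total = sum(scores.values())
    if abs(total - 100.0) > tol:
        raise ValueError(f"percentages sum to {total}, expected 100")
    return scores


# Illustrative reply: only the Positive value (98.5) comes from the results.
reply = '{"Positive": 98.5, "Neutral": 1.0, "Negative": 0.5}'
scores = parse_sentiment(reply)
print(max(scores, key=scores.get), scores)
```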
Result: Gemini Flash 2.5 vs. GPT-5
Table — Sentiment Probabilities Across Noise Levels
| Prompt # | Noise Level | Gemini Positive | GPT-5 Positive |
|---|---|---|---|
| 1 | Minor typos | 98.5% | 94% |
| 2 | More typos | 96.8% | 96% |
| 3 | Grammar errors | 94.5% | 95% |
| 4 | Heavy corruption | 92.1% | 94% |
| 5 | Similar heavy corruption | 91.5% | 95% |
| 6 | With correction | 99.2% | 97% |
How Each Model Behaves Under Noise
Gemini Flash 2.5: Smooth degradation, but sensitive to cumulative noise. Gemini begins with very high confidence (98.5%) and shows a gradual, consistent drop as spelling and grammar degrade.
Weakness:
- Confidence drops more sharply under heavy noise than GPT-5’s does: 7 points from prompt 1 to prompt 5, against a 2-point spread for GPT-5.
GPT-5: More stable across noise, less shaken by corruption. GPT-5 starts lower than Gemini (94%), but its probabilities stay tightly clustered between 94% and 96% across all noisy prompts.
Weakness:
- Does not recover as strongly after sentence correction.
- Lacks the dramatic confidence “snap-back” that Gemini shows.
Side-by-Side Performance Analysis
A. Noise Robustness
Winner: GPT-5
GPT-5 barely fluctuates. Its lowest confidence (94%) is still higher than Gemini’s lowest (91.5%).
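The robustness comparison can be checked directly from the table. The sketch below summarizes the five noisy prompts (prompt 6 is excluded because it includes the correction step). The `summarize` helper is an illustrative name, and the numbers are the Positive percentages reported above.

```python
from statistics import mean

# Positive-class percentages for prompts 1-6, copied from the table.
gemini = [98.5, 96.8, 94.5, 92.1, 91.5, 99.2]
gpt5 = [94.0, 96.0, 95.0, 94.0, 95.0, 97.0]


def summarize(scores):
    """Mean, min, max, and range over the five noisy prompts only."""
    noisy = scores[:5]
    return {
        "mean": round(mean(noisy), 2),
        "min": min(noisy),
        "max": max(noisy),
        "range": round(max(noisy) - min(noisy), 2),
    }


print("Gemini:", summarize(gemini))
print("GPT-5: ", summarize(gpt5))
```

On these figures the two means are nearly identical (94.68 and 94.8), so the difference between the models lies in the spread: 7 points for Gemini against 2 for GPT-5.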
B. Sensitivity to spelling and grammar errors
Winner: GPT-5
Gemini loses confidence faster as errors accumulate.
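One way to put a number on "faster" is the least-squares slope of confidence against prompt number over the five noisy prompts. This is a rough sketch: it treats the prompt index as an evenly spaced noise scale, which is an assumption, and five points from a single run are too few for a statistical claim.

```python
def slope(ys):
    """Ordinary least-squares slope of ys against prompt numbers 1..n."""
    n = len(ys)
    xs = range(1, n + 1)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


# Positive-class percentages for noisy prompts 1-5, copied from the table.
gemini_noisy = [98.5, 96.8, 94.5, 92.1, 91.5]
gpt5_noisy = [94.0, 96.0, 95.0, 94.0, 95.0]

print(f"Gemini: {slope(gemini_noisy):.2f} points per prompt")
print(f"GPT-5:  {abs(slope(gpt5_noisy)):.2f} points per prompt (magnitude)")
```

Gemini loses about 1.87 points per prompt, while GPT-5's fitted slope is essentially zero.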
C. Correction + Re-classification Accuracy
Winner: Gemini Flash 2.5
After sentence correction:
- Gemini jumps to 99.2%
- GPT-5 rises to 97%
Gemini appears to benefit more from the correction step, which gives it the advantage here.
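The size of the recovery can be measured as the gain of the corrected prompt over the noisy ones. The sketch below computes that gain against both the last noisy prompt and the worst noisy prompt. The `recovery` helper and its key names are illustrative, and the data is copied from the table.

```python
# Positive-class percentages from the table: prompts 1-5 and prompt 6.
gemini = {"noisy": [98.5, 96.8, 94.5, 92.1, 91.5], "corrected": 99.2}
gpt5 = {"noisy": [94.0, 96.0, 95.0, 94.0, 95.0], "corrected": 97.0}


def recovery(run):
    """Points gained by the corrected prompt over the last and worst noisy prompt."""
    return {
        "vs_last_noisy": round(run["corrected"] - run["noisy"][-1], 1),
        "vs_worst_noisy": round(run["corrected"] - min(run["noisy"]), 1),
    }


print("Gemini:", recovery(gemini))
print("GPT-5: ", recovery(gpt5))
```

Gemini gains 7.7 points on either baseline, while GPT-5 gains 2 points over prompt 5 and 3 points over its lowest noisy score.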
D. Overall Confidence Trend
Gemini Trend:
98.5 → 96.8 → 94.5 → 92.1 → 91.5 → 99.2
GPT-5 Trend:
94 → 96 → 95 → 94 → 95 → 97
P.S. Gemini ran in its default thinking mode, while GPT-5 ran with no thinking mode. Their latency was identical.
