Battle of Robustness: Gemini Flash 2.5 vs. GPT-5 on Noisy Sentiment Analysis

02 Dec, 2025

Small grammar mistakes are easy for humans to overlook — but for AI models, messy input can be a serious challenge. To test how robust today’s top models are, we ran Gemini Flash 2.5 and GPT-5 through the same series of prompts, each containing progressively noisier spelling and grammar errors, all describing the movie Misery in a highly positive tone.

Goal: Which model handles noise better, keeps higher confidence, and recovers accuracy after sentence correction?

Test Setup

Each prompt was a degraded version of:

‘Misery’ is the best movie I’ve ever seen, since I was a small boy.

Across six tests, we introduced:

Spelling errors
Word swaps
Verb-agreement mistakes
Misspelled movie title
Disordered pronouns
Finally, the models were asked to correct the sentence before analyzing sentiment.
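
The article does not publish the exact noisy prompts, so the following is only a minimal sketch of how two of the noise types above (spelling errors and word swaps) could be generated reproducibly. The function names `add_typos` and `swap_words`, the `rate` parameter, and the seeding scheme are all hypothetical and are not the procedure used in these tests.

```python
import random

BASE = "'Misery' is the best movie I've ever seen, since I was a small boy."


def swap_adjacent_letters(word, rng):
    """Transpose two neighbouring inner letters to simulate a typo."""
    if len(word) < 4:
        return word  # too short to change without touching first/last letter
    i = rng.randrange(1, len(word) - 2)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


def add_typos(sentence, rate, seed=0):
    """Apply a letter transposition to roughly `rate` of the words (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed keeps each noise level reproducible
    words = sentence.split()
    noisy = [swap_adjacent_letters(w, rng) if rng.random() < rate else w
             for w in words]
    return " ".join(noisy)


def swap_words(sentence, seed=0):
    """Swap one randomly chosen pair of adjacent words (hypothetical helper)."""
    rng = random.Random(seed)
    words = sentence.split()
    if len(words) < 2:
        return sentence
    i = rng.randrange(len(words) - 1)
    words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)


if __name__ == "__main__":
    # Increasing the rate gives progressively noisier versions of the sentence.
    for rate in (0.2, 0.5, 1.0):
        print(rate, add_typos(BASE, rate))
    print(swap_words(BASE))
```

Seeding the generator means both models can be given byte-identical noisy prompts, which matters when the comparison rests on differences of a few percentage points.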

Both models returned probabilities for Positive, Neutral, and Negative sentiment.
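
The response format is not specified in this write-up. As a sketch, assuming each model is asked to reply with a JSON object holding the three probabilities, a small validator can catch malformed replies before they reach a results table. The key names, the `parse_sentiment` helper, and the tolerance are assumptions made for illustration, and the sample reply uses made-up numbers rather than actual model output.

```python
import json

LABELS = ("positive", "neutral", "negative")


def parse_sentiment(reply, tolerance=0.01):
    """Parse a JSON reply into a label -> probability dict and sanity-check it.

    Assumes a reply shaped like {"positive": 0.9, "neutral": 0.07, "negative": 0.03}.
    """
    data = json.loads(reply)
    probs = {label: float(data[label]) for label in LABELS}
    if any(p < 0 or p > 1 for p in probs.values()):
        raise ValueError("probabilities must lie in [0, 1]")
    total = sum(probs.values())
    if abs(total - 1.0) > tolerance:
        raise ValueError(f"probabilities sum to {total:.3f}, expected 1.0")
    return probs


if __name__ == "__main__":
    # Illustrative reply with invented values, not a real model response.
    sample = '{"positive": 0.90, "neutral": 0.07, "negative": 0.03}'
    print(parse_sentiment(sample))
```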

Results: Gemini Flash 2.5 vs. GPT-5

Table — Sentiment Probabilities Across Noise Levels

Prompt #	Noise Level	Gemini Positive	GPT-5 Positive
1	Minor typos	98.5%	94%
2	More typos	96.8%	96%
3	Grammar errors	94.5%	95%
4	Heavy corruption	92.1%	94%
5	Similar heavy corruption	91.5%	95%
6	With correction	99.2%	97%

How Each Model Behaves Under Noise

Gemini Flash 2.5: Smooth degradation, but sensitive to cumulative noise. Gemini begins with very high confidence (98.5%) and shows a gradual, consistent drop as spelling and grammar degrade.

Weakness:

Confidence drops more sharply under heavy noise than GPT-5's does, falling from 98.5% on prompt 1 to 91.5% on prompt 5.

GPT-5: More stable across noise, less shaken by corruption. GPT-5 starts lower than Gemini (94% vs. 98.5%), but its probabilities stay tightly clustered between 94% and 96% across all noisy prompts.

Weakness:

Does not recover as strongly after sentence correction (95% → 97%, versus Gemini's 91.5% → 99.2%).
Lacks the dramatic confidence “snap-back” that Gemini shows.

Side-by-Side Performance Analysis

A. Noise Robustness
Winner: GPT-5
GPT-5 barely fluctuates. Its lowest confidence (94%) is still higher than Gemini’s lowest (91.5%).

B. Sensitivity to spelling and grammar errors
Winner: GPT-5
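Gemini loses confidence faster as errors accumulate: it drops 7.0 points from prompt 1 to prompt 5 (98.5% → 91.5%), while GPT-5 never moves outside the 94–96% band.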

C. Correction + Re-classification Accuracy
Winner: Gemini Flash 2.5
After sentence correction:

Gemini jumps to 99.2%
GPT-5 rises to 97%
Gemini’s language correction appears to give it an advantage here: it gains 7.7 points over its prompt 5 score, versus 2 points for GPT-5.

D. Overall Confidence Trend
Gemini Trend:
98.5 → 96.8 → 94.5 → 92.1 → 91.5 → 99.2
GPT-5 Trend:
94 → 96 → 95 → 94 → 95 → 97
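
The verdicts in sections A to C can be checked directly from the two trends. The snippet below is a small sketch that recomputes the lowest noisy score, the spread across noisy prompts, the drop from prompt 1 to prompt 5, and the recovery after correction. The metric names and the `summarize` helper are our own labels, not an established benchmark.

```python
# Positive-sentiment confidence (%) for prompts 1-6, copied from the table above.
gemini = [98.5, 96.8, 94.5, 92.1, 91.5, 99.2]
gpt5 = [94.0, 96.0, 95.0, 94.0, 95.0, 97.0]


def summarize(scores):
    """Summarize one model's trend: prompts 1-5 are noisy, prompt 6 is corrected."""
    noisy, corrected = scores[:5], scores[5]
    return {
        "noisy_min": min(noisy),                                # worst noisy score
        "noisy_spread": round(max(noisy) - min(noisy), 1),      # stability under noise
        "drop_1_to_5": round(noisy[0] - noisy[-1], 1),          # cumulative loss
        "recovery": round(corrected - noisy[-1], 1),            # gain from correction
    }


if __name__ == "__main__":
    print("Gemini Flash 2.5:", summarize(gemini))
    # {'noisy_min': 91.5, 'noisy_spread': 7.0, 'drop_1_to_5': 7.0, 'recovery': 7.7}
    print("GPT-5:", summarize(gpt5))
    # {'noisy_min': 94.0, 'noisy_spread': 2.0, 'drop_1_to_5': -1.0, 'recovery': 2.0}
```

GPT-5's negative drop reflects that its prompt 5 score (95%) is slightly above its prompt 1 score (94%). With a single run per prompt, differences of one or two points should be read cautiously.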

[Image: Comparison Table]

P.S. Gemini ran in its default thinking mode, while GPT-5 ran with no thinking mode. Latency was identical for both models.