Incident History

A log of AI service incidents, outages, and degraded performance events detected by TensorFeed monitoring.

Outages happen. They're embarrassing, costly, and, in the aggregate, predictable. Infrastructure fails. Load balancers get misconfigured. Deployments break things. Databases run out of disk space. No major AI provider is immune. By studying incident patterns, we can anticipate when failures are likely and design our systems to tolerate them.

This database captures every incident we've detected in the TensorFeed monitoring network: when it started, how long it lasted, its severity (full outage or partial degradation), and which provider was affected. The data reveals that outages cluster: the Claude API might be flaky for a week, then stable for two months. OpenAI's API has experienced multiple major incidents, each lasting 30 to 90 minutes. Hugging Face and Replicate have historically been less reliable than the major commercial providers. Our monthly AI service outage report synthesizes all of this into actionable insights.
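
To make the record shape concrete, here is a minimal sketch of what one incident row could look like in Python. The `Incident` and `Severity` names and all field names are illustrative assumptions, not TensorFeed's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Severity(Enum):
    # Hypothetical two-level scale mirroring the distinction described above.
    FULL_OUTAGE = "full_outage"
    PARTIAL_DEGRADATION = "partial_degradation"


@dataclass
class Incident:
    """One detected incident: start, duration, severity, affected provider."""
    provider: str          # which provider was affected, e.g. "openai"
    started_at: datetime   # when monitoring first detected the failure
    duration: timedelta    # how long the incident lasted
    severity: Severity     # full outage vs. partial degradation


# Illustrative example: a hypothetical 45-minute partial degradation.
example = Incident(
    provider="openai",
    started_at=datetime(2025, 1, 14, 9, 30),
    duration=timedelta(minutes=45),
    severity=Severity.PARTIAL_DEGRADATION,
)
```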

What should you learn from this? First, avoid single points of failure: distribute your traffic across multiple providers where feasible. Claude and GPT-4 rarely go down at the same time, so the pair is more reliable than either provider alone. Second, implement exponential backoff and retry logic in your client code. Third, cache successful responses and degrade gracefully when APIs are down. Finally, monitor your own dependencies: the sooner you know an API is degraded, the sooner you can mitigate customer impact.
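
Here is a minimal sketch of those patterns combined: per-round failover across two providers, exponential backoff with jitter between rounds, and a response cache as the last-resort graceful degradation. `call_primary` and `call_fallback` are hypothetical stand-ins for real SDK calls, not any provider's actual API.

```python
import random
import time


def call_primary(prompt: str) -> str:
    # Hypothetical stand-in for the primary provider's SDK call.
    raise TimeoutError("simulated outage")


def call_fallback(prompt: str) -> str:
    # Hypothetical stand-in for a second, independent provider.
    return f"fallback answer to: {prompt}"


response_cache: dict[str, str] = {}  # last known-good responses, keyed by prompt


def complete(prompt: str, max_attempts: int = 4) -> str:
    """Try each provider in turn, backing off exponentially between rounds;
    if every attempt fails, serve a cached response rather than an error."""
    for attempt in range(max_attempts):
        for call in (call_primary, call_fallback):
            try:
                result = call(prompt)
                response_cache[prompt] = result  # cache successful responses
                return result
            except Exception:
                continue  # treat provider errors as transient; try the next option
        # Exponential backoff with jitter (1s, 2s, 4s, ... plus noise) so
        # retrying clients don't stampede an API that is recovering.
        time.sleep(2 ** attempt + random.random())
    if prompt in response_cache:
        return response_cache[prompt]  # degrade gracefully with stale data
    raise RuntimeError("all providers unavailable and no cached response")


print(complete("hello"))  # here the fallback answers on the first round
```

One design choice worth noting: the retry loop wraps the whole provider list rather than a single provider, so a hard-down primary costs one failed call per round instead of exhausting the entire retry budget before failover begins.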

Incidents (30 days): ...
Avg. Resolution Time: ...
Most Affected Service: ...