5 min read · by Fernando Rueda Oliva

Why Google Fact Check Tools alone isn't enough for short-form video

Google's Fact Check Tools API is excellent — and it covers only a small fraction of the claims that actually show up in Instagram reels and TikToks. Here's the data we see from the verifAI pipeline and why we treat it as a starting point, not the answer.

Google's Fact Check Tools API is one of the best resources a fact-checking product can plug into. It aggregates ClaimReview markup from Snopes, AP, AFP, PolitiFact, Maldita.es, EFE Verifica, Chequeado, and a long tail of regional fact-checkers — all with original article URLs and a structured rating. When verifAI gets a hit there, we surface it verbatim, label the verdict "Authoritative", and link straight to the original article. Whenever Google has done the work, we shouldn't be re-doing it.
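As a rough illustration of that lookup, here is a minimal sketch in Python. The endpoint and the response fields (`claims`, `claimReview`, `textualRating`, `publisher`, `reviewDate`) follow my reading of Google's public `claims:search` documentation. The function names and the shape of the returned records are assumptions made for this example, not verifAI's actual code.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://factchecktools.googleapis.com/v1alpha1/claims:search"


def build_search_url(claim: str, language: str, api_key: str) -> str:
    """Build the claims:search URL for one claim (hypothetical helper)."""
    params = {"query": claim, "languageCode": language, "key": api_key}
    return f"{ENDPOINT}?{urllib.parse.urlencode(params)}"


def extract_reviews(payload: dict) -> list[dict]:
    """Flatten a response into one record per published review.

    A payload with no "claims" key yields an empty list, i.e. no match.
    """
    reviews = []
    for claim in payload.get("claims", []):
        for review in claim.get("claimReview", []):
            reviews.append({
                "claim_text": claim.get("text", ""),
                "publisher": review.get("publisher", {}).get("name", ""),
                # Keep the fact-checker's own rating verbatim.
                "rating": review.get("textualRating", ""),
                "url": review.get("url", ""),
                "review_date": review.get("reviewDate", ""),
                # Assumed label name, taken from the badge described above.
                "label": "Authoritative",
            })
    return reviews


def lookup(claim: str, language: str, api_key: str) -> list[dict]:
    """Network call; an empty list means no published fact-check matched."""
    url = build_search_url(claim, language, api_key)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return extract_reviews(json.load(resp))
```

Keeping the parsing in `extract_reviews`, separate from the network call, makes it possible to test the mapping against saved responses without an API key.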

But if you build a product on top of it, you quickly run into the gap between what the API knows and what people actually watch. This post is about that gap, what causes it, and how we close it.

The match rate is low on social video — by design

Across the reels and TikToks that go through our pipeline, the Google Fact Check Tools API returns a direct match for fewer than 1 in 10 claims. That's not a knock on Google. It's structural:

  • Fact-checkers cover finished stories, not raw claims. A typical ClaimReview entry refers to a "viral claim", which is itself a thing that a journalist already noticed and wrote up. The reel you just paused on probably hasn't entered that pipeline yet.
  • The unit of fact-checking is the claim, but the unit of publishing is the article. A short reel might contain four claims; the fact-checker's article that addresses them covers one, summarised in their own words. The text we query with has to align closely enough with that summary for the lookup to return it.
  • Time lag is real. A misleading reel usually peaks in shares within 12-48 hours. A serious fact-check article often takes 1-5 days to publish. By the time the ClaimReview entry is indexed, the wave is already breaking somewhere else.
  • Language coverage is uneven. English and Spanish are well-served. Catalan, Basque, Portuguese, and the long tail of European and global languages aren't.

None of this is a flaw to fix. Fact-checkers should be careful, and careful takes time. But it does mean that a product whose only signal is "did Google Fact Check find this?" will say "no idea" to more than 90% of inputs.

What we do when there's no match

When the Fact Check Tools API comes back empty (which is most of the time), the pipeline falls back to a web search via Tavily. Tavily returns a small, ranked set of pages with title, snippet, and URL. We hand those to GPT-OSS-120B with a prompt that:

  1. Tells it the original claim, verbatim, in the language of the reel
  2. Hands it the search results
  3. Forces a structured response: one of true | false | partially_true | misleading | unverified, plus a short explanation, plus the URLs it actually used
  4. Explicitly says that if the sources are not enough to decide, it must answer unverified
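
The four prompt rules above can be sketched roughly as follows. This is an illustration under stated assumptions, not our production prompt: the prompt wording, the JSON key names (`verdict`, `explanation`, `sources_used`), and the rule that a verdict with no recognised source is downgraded to unverified are all choices made for this example. The model call itself is left out.

```python
import json

VERDICTS = {"true", "false", "partially_true", "misleading", "unverified"}


def build_prompt(claim: str, results: list[dict]) -> str:
    """Assemble the synthesis prompt from the claim and search results."""
    sources = "\n".join(
        f"[{i}] {r['title']}\n{r['snippet']}\n{r['url']}"
        for i, r in enumerate(results, start=1)
    )
    return (
        "Assess the claim using only the sources below.\n"
        f"Claim (verbatim): {claim}\n\n"
        f"Sources:\n{sources}\n\n"
        "Reply with JSON only, using the keys verdict, explanation, sources_used.\n"
        "verdict must be one of: true, false, partially_true, misleading, unverified.\n"
        "If the sources are not enough to decide, answer unverified."
    )


def parse_verdict(raw: str, results: list[dict]) -> dict:
    """Validate the model's reply; anything malformed becomes unverified."""
    fallback = {"verdict": "unverified", "explanation": "", "sources_used": []}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return fallback
    if not isinstance(data, dict):
        return fallback
    verdict = data.get("verdict")
    if not isinstance(verdict, str) or verdict not in VERDICTS:
        return fallback
    # Only keep URLs that were actually in the search results.
    allowed = {r["url"] for r in results}
    raw_sources = data.get("sources_used")
    if not isinstance(raw_sources, list):
        raw_sources = []
    used = [u for u in raw_sources if isinstance(u, str) and u in allowed]
    # Assumed policy for this sketch: no usable source, no verdict.
    if verdict != "unverified" and not used:
        return fallback
    return {
        "verdict": verdict,
        "explanation": str(data.get("explanation", "")),
        "sources_used": used,
    }
```

Filtering `sources_used` against the URLs we actually supplied is one cheap guard against the model citing a page it was never shown.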

This is the half of the pipeline that does the heavy lifting on viral content. It's also the half where the model can be wrong, so the verdict card always shows the sources alongside the rating — the reader is the final judge.
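
Put together, the two paths might be wired up along these lines. This is a simplified sketch: the three callables are injected so the control flow can be shown without network access, and their names and signatures are hypothetical.

```python
from typing import Callable


def verify_claim(
    claim: str,
    language: str,
    fact_check_lookup: Callable[[str, str], list],
    web_search: Callable[[str], list],
    synthesize: Callable[[str, list], dict],
) -> dict:
    """Try the fact-check lookup first; fall back to search plus synthesis."""
    reviews = fact_check_lookup(claim, language)
    if reviews:
        # A published fact-check exists: forward it rather than paraphrase.
        return {"path": "fact_check", "label": "Authoritative", "reviews": reviews}

    results = web_search(claim)
    verdict = synthesize(claim, results)
    # Sources always travel with the verdict so the reader can check them.
    return {
        "path": "web_search",
        "verdict": verdict,
        "sources": [r["url"] for r in results],
    }
```

With stubbed callables, a claim with no published review takes the second branch and comes back with its source URLs attached.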

When the two paths disagree

Sometimes the Fact Check Tools API returns a stale article that says "false" while the live web search clearly indicates the story has moved on. Two real examples from this year:

  • A claim about a public official's resignation that Snopes had labelled "false" three months earlier. The resignation had since actually happened, so the fresh search agreed with the claim and the ClaimReview did not.
  • A health claim about a supplement, rated "partially true" in a 2022 Maldita.es article, where the study the article cited had since been retracted. The article was still in the API.

We prefer the authoritative source by default — that's the policy — but we surface both in the verdict card with timestamps so the reader can see the disagreement. The right answer in those cases is rarely "trust the AI"; it's "look at the dates".
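
One way to model that policy is sketched below. It assumes both findings have already been normalised to the same verdict vocabulary, which is a real step this example skips, and the `Finding` type and field names are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Finding:
    source: str   # "fact_check" or "web_search"
    verdict: str  # assumed to be normalised to a shared vocabulary
    as_of: date   # review date, or date of the freshest search result
    url: str


def build_card(authoritative: Finding, fresh: Finding) -> dict:
    """Prefer the authoritative verdict, but expose both with their dates."""
    disagreement = authoritative.verdict != fresh.verdict
    return {
        # Policy: the published fact-check wins the headline by default.
        "headline_verdict": authoritative.verdict,
        "disagreement": disagreement,
        # How much newer the fresh evidence is than the fact-check.
        "age_gap_days": (fresh.as_of - authoritative.as_of).days,
        "findings": [authoritative, fresh],
    }


card = build_card(
    Finding("fact_check", "false", date(2024, 1, 10), "https://example.org/fc"),
    Finding("web_search", "true", date(2024, 4, 12), "https://example.org/news"),
)
print(card["headline_verdict"], card["disagreement"], card["age_gap_days"])
# prints: false True 93
```

The card never overrides the fact-checker; it only makes the age gap visible so the reader can weigh it.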

What a good fallback search looks like

Not every search API is created equal for this job. We landed on Tavily for three reasons:

  • Per-claim search budgets. Some APIs charge per query; the cost of running a small handful of queries per claim adds up across thousands of fact-checks a day.
  • Snippet quality. A search API that returns rich snippets lets the synthesis model decide without dereferencing every URL. The cheaper option (only URLs) forces a fetch step that doubles latency and triples token costs.
  • Recency control. When checking a claim about a current event, we want to bias the search toward the last week or two. APIs that expose a recency parameter let us do this cleanly.
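
To make the budget and recency points concrete, here is a small sketch that only builds request payloads and sends nothing. The field names (`query`, `max_results`, `topic`, `days`) reflect my understanding of Tavily's search endpoint and should be checked against the current documentation. The budget of three queries per claim and the 14-day window are illustrative numbers, not our production settings.

```python
MAX_QUERIES_PER_CLAIM = 3  # illustrative per-claim budget
RECENCY_WINDOW_DAYS = 14   # "the last week or two"


def build_search_payload(query: str, current_event: bool, max_results: int = 5) -> dict:
    """Build one search request body (field names are assumptions)."""
    payload = {"query": query, "max_results": max_results}
    if current_event:
        # Bias toward recent news coverage for claims about ongoing events.
        payload["topic"] = "news"
        payload["days"] = RECENCY_WINDOW_DAYS
    return payload


def plan_searches(queries: list[str], current_event: bool) -> list[dict]:
    """Cap the number of searches issued for a single claim."""
    return [
        build_search_payload(q, current_event)
        for q in queries[:MAX_QUERIES_PER_CLAIM]
    ]
```

Capping queries at the planning stage keeps the cost per claim predictable, whatever the upstream query generator produces.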

For self-hosters we also wire up SearXNG as an alternative, behind a config flag. That route is cheaper and privacy-preserving, but its snippet quality is uneven, so verdict accuracy drops noticeably.

Why we still call out Google's hits when we get them

Even though Google Fact Check Tools answers only a slice of our queries, the slice it answers is the most important slice — the high-stakes, high-visibility claims that real journalists have spent real time on. The fact that a claim is in there at all means a credentialed human looked at it, gathered evidence, and published a finding under their byline.

So when we get a hit, the verdict card gets the "Authoritative" badge and the verdict description quotes the fact-checker's own rating. The user clicks straight through to the original. We try to add as little of our own voice as possible in that path — we're forwarding, not paraphrasing.

Takeaways for anyone building on top of Fact Check Tools

If you're shipping something similar, here are three things that took us longer to learn than they should have:

  1. Don't treat an empty response as the answer. The API is high precision and low recall on social video. "No match" is the start of the work, not the end.
  2. Always show sources. A verdict without links is just an opinion. Even when the model gets it right, you want the reader to verify, because next time it will get something wrong.
  3. Expose the "unverified" verdict and let yourself use it. If you're afraid of looking unconfident, you'll push your model to invent answers. That's how AI fact-checkers earn the reputation they have.

If you've built something in this space and have a different read on any of the above, I'd be happy to compare notes — fernandoruedaoliva@gmail.com.