Method sheet — The Consensus

What this is

Every outlet named here really ran these words. Counting that echo is the point: apparent consensus is not independent confirmation. "47 outlets report X" often means one source, repeated 47 times. This counter-measurement counts what passes as independent but is in fact copied.

1. Sources & licences

News via the GDELT DOC 2.0 API (GDELT: open, free to use). Eight broad beats (politics, economy, technology, health, science, business, sports, weather), English-language. The cited "independent" sources are the domains that ran the sentence verbatim; they are listed by name on the work page.

https://blog.gdeltproject.org/gdelt-doc-2-0-api-debuts/
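A minimal sketch of how the eight per-beat requests might be built. The endpoint is the public GDELT DOC 2.0 one; the query wording, the `sourcelang:english` filter, the `1d` timespan and the record cap are assumptions for illustration, not the project's actual settings, and `beat_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Public GDELT DOC 2.0 endpoint; all parameter values below are assumptions.
ENDPOINT = "https://api.gdeltproject.org/api/v2/doc/doc"

# The eight beats named in the text.
BEATS = ["politics", "economy", "technology", "health",
         "science", "business", "sports", "weather"]


def beat_url(beat: str, timespan: str = "1d", max_records: int = 250) -> str:
    """Build one article-list request URL for a beat (hypothetical helper)."""
    params = {
        "query": f"{beat} sourcelang:english",  # assumed query shape
        "mode": "artlist",                      # list of matching articles
        "format": "json",
        "maxrecords": max_records,
        "timespan": timespan,
    }
    return f"{ENDPOINT}?{urlencode(params)}"


# One URL per beat, i.e. eight fetches per daily run.
urls = [beat_url(b) for b in BEATS]
```

Building the URLs separately from fetching them keeps the request set inspectable, so a dropped beat can be logged against the exact URL that failed.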

2. Cadence

Daily. The machine selects on its own: the phrase with the widest spread across distinct source domains is the "headline of the day". Canonical artefact: versioned JSON in src/data/consensus/ — git is the archive.
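As an illustration of "git is the archive", a daily edition could be written as one deterministic JSON file per day. The directory comes from the text; the date-based file name, the record fields and the `write_edition` helper are assumptions made for this sketch.

```python
import json
from datetime import date
from pathlib import Path


def write_edition(root: str, day: date, record: dict) -> Path:
    """Write one day's result as stable, diff-friendly JSON (hypothetical helper)."""
    path = Path(root) / f"{day.isoformat()}.json"  # assumed naming scheme
    path.parent.mkdir(parents=True, exist_ok=True)
    # sort_keys and a fixed indent keep git diffs small and readable.
    text = json.dumps(record, indent=2, sort_keys=True, ensure_ascii=False)
    path.write_text(text + "\n", encoding="utf-8")
    return path
```

Sorted keys and fixed indentation mean two runs over the same data produce byte-identical files, so the git history only records real changes.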

3. Processing

Pool articles (dedupe by URL) → count verbatim 6-gram title phrases across distinct domains → the most replicated phrase is the headline. Echo index = share of titles belonging to an echo that spans ≥3 domains. Symbolic provenance: the earliest timestamp marks the source candidate; the later timestamps trace the cascade.

→ pipelines/consensus
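The pipeline described above can be sketched in a few lines. This is an illustration under assumptions, not the code in pipelines/consensus: the `url` and `title` field names, lower-cased whitespace tokenisation, the use of the URL host as the domain, and the tie-break are all choices made here.

```python
from collections import defaultdict
from urllib.parse import urlsplit

N = 6            # phrase length in words, per the text
MIN_DOMAINS = 3  # an echo spans at least this many distinct domains


def ngrams(title: str, n: int = N) -> set:
    """All verbatim n-word phrases in a title (assumed tokenisation)."""
    words = title.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def analyse(articles: list) -> tuple:
    """Return (headline phrase, echo index) for a pool of articles."""
    # Dedupe by URL: later duplicates overwrite earlier ones.
    unique = list({a["url"]: a for a in articles}.values())

    # Map each phrase to the set of distinct domains that ran it.
    domains_by_phrase = defaultdict(set)
    for a in unique:
        domain = urlsplit(a["url"]).netloc.lower()
        for phrase in ngrams(a["title"]):
            domains_by_phrase[phrase].add(domain)
    if not domains_by_phrase:
        return None, 0.0

    # Widest spread wins; ties are broken alphabetically (an assumption).
    headline = max(domains_by_phrase,
                   key=lambda p: (len(domains_by_phrase[p]), p))

    # Echo index: share of titles containing a phrase seen on >= 3 domains.
    echoes = {p for p, d in domains_by_phrase.items() if len(d) >= MIN_DOMAINS}
    echoed = sum(1 for a in unique if ngrams(a["title"]) & echoes)
    return headline, echoed / len(unique)


sample = [
    {"url": "https://a.example/1", "title": "Central bank holds interest rates steady"},
    {"url": "https://b.example/1", "title": "Central bank holds interest rates steady"},
    {"url": "https://c.example/1", "title": "Central bank holds interest rates steady"},
    {"url": "https://a.example/1", "title": "Central bank holds interest rates steady"},
    {"url": "https://d.example/9", "title": "Local team wins the regional cup final today"},
]
headline, echo_index = analyse(sample)
```

In the sample, the duplicate URL is dropped, three of the four remaining titles share a phrase across three domains, and the fourth stands alone.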

AI/ML — staged, checkable

The lab experiments with data AND AI. Implemented: v1, the verbatim baseline; v2, TF-IDF/cosine similarity, which catches paraphrased coordination (reworded wire copy that verbatim matching misses); v3, a symbolic, rule-based classifier that uses the graph structure (TLD homogeneity, time window) to separate chain syndication from scattered placement. It is auditable, with no black-box model. Optional/future: deep embeddings and an LLM classifier verified against the graph (prompt disclosed). Condition throughout: every AI step is transparent, and every output is verified or marked as an estimate.
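To show why TF-IDF/cosine catches what verbatim matching misses, here is a dependency-free sketch. The whitespace tokenisation and the smoothed idf formula are assumptions chosen for illustration; the text does not specify the v2 weighting or any similarity threshold.

```python
import math
from collections import Counter


def tfidf_vectors(titles: list) -> list:
    """One sparse TF-IDF vector (dict of word -> weight) per title."""
    docs = [Counter(t.lower().split()) for t in titles]
    n = len(docs)
    # Document frequency: in how many titles each word occurs.
    df = Counter(word for doc in docs for word in doc)
    # Smoothed idf (an assumed variant); rarer words weigh more.
    idf = {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}
    return [{w: tf * idf[w] for w, tf in doc.items()} for doc in docs]


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


titles = [
    "Central bank holds interest rates steady",
    "Interest rates held steady by central bank",  # reworded, no shared 6-gram
    "Local team wins regional cup final",
]
vecs = tfidf_vectors(titles)
paraphrase_sim = cosine(vecs[0], vecs[1])
unrelated_sim = cosine(vecs[0], vecs[2])
```

The first two titles share no verbatim six-word phrase, yet their similarity is well above that of the unrelated pair, which is exactly zero because no word overlaps.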

4. Limits of the method

GDELT coverage skews English/Western; this measures a section, not the world.
GDELT timestamps have ~15-minute/hourly resolution — "first seen" is the earliest GDELT window, not a precise first-publication attribution.
v1 reads titles, not full text; paraphrased coordination escapes it (arrives in v2).
Legitimate wire/chain syndication ≠ manipulation — but it creates the illusion of independent confirmation; the instrument claims no intent.
Beats can drop out of a day's run when the API rate-limits requests (disclosed per day in the stats block).
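The timestamp limit above has a practical consequence: sightings that fall in the same GDELT window cannot be ordered, so more than one domain can be the source candidate. A sketch, assuming a 15-minute window and a `YYYYMMDDTHHMMSSZ` timestamp string; both the format and the `source_candidates` helper are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def window_start(seendate: str) -> datetime:
    """Floor a timestamp to the start of its 15-minute window (UTC)."""
    ts = datetime.strptime(seendate, "%Y%m%dT%H%M%SZ")
    ts = ts.replace(tzinfo=timezone.utc)
    return ts - timedelta(minutes=ts.minute % 15, seconds=ts.second)


def source_candidates(sightings: list) -> tuple:
    """Return (earliest window, domains seen in it) for (domain, seendate) pairs."""
    if not sightings:
        return None, []
    windows = [(window_start(seen), domain) for domain, seen in sightings]
    first = min(w for w, _ in windows)
    # Every domain in the earliest window is an equally valid candidate.
    return first, sorted({d for w, d in windows if w == first})


sightings = [
    ("a.example", "20260601T101400Z"),
    ("b.example", "20260601T100100Z"),
    ("c.example", "20260601T103000Z"),
]
first_window, candidates = source_candidates(sightings)
```

Here a.example and b.example land in the same window, so both are reported as candidates rather than one being named the origin.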

5. Compute footprint

Eight lightweight HTTP fetches per day, no API key, no LLM in v1. The site itself is static.

6. Change log

v1 (2026-06) — first edition: verbatim synchronicity + echo index + symbolic provenance.
