Method sheet — The Tell

What this is

An AI tool measuring AI’s footprint in science. No detector, no black box — just counts of documented LLM-tell words in PubMed. The collective jump after ChatGPT is the fingerprint; no single paper is accused. The self-reference is part of the point: the instrument discloses that it is itself built with AI.

1. Sources & licences

PubMed via NCBI E-utilities (public domain, NLM/NCBI). Only hit counts per query and year are retrieved: no full texts, no personal data.

https://www.ncbi.nlm.nih.gov/books/NBK25501/
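A count-only request of the kind described above might look like the sketch below. The esearch endpoint and its db, term, rettype=count and retmode=json parameters are documented E-utilities features. The exact query string this project sends is not given here, so the [tiab] and [dp] field tags and the helper names count_url and parse_count are assumptions for illustration.

```python
import json
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def count_url(term: str, year: int) -> str:
    """Build an esearch URL that asks only for the hit count of one word in one year.

    Assumption: the word is searched in title/abstract ([tiab]) and the year is
    matched on publication date ([dp]); the real pipeline may phrase this differently.
    """
    query = f"{term}[tiab] AND {year}[dp]"
    params = {"db": "pubmed", "term": query, "rettype": "count", "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"


def parse_count(body: str) -> int:
    """Read the count from an esearch JSON response (it arrives as a string)."""
    return int(json.loads(body)["esearchresult"]["count"])


print(count_url("delve", 2024))
```

No API key appears in the URL, which matches the keyless access the sheet describes, and nothing beyond a single integer per query is read from the response.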

2. Cadence

On every build. PubMed indexes with a lag — the current year is excluded, recent years are incomplete (which is why the peak sits in 2024, not later). Canonical artefact: versioned JSON in src/data/tell/ — git is the archive.

3. Processing

Deterministic: for each marker and year, count hits in title/abstract (esearch) and normalise per 100,000 abstracts. The index for a year is the sum of these normalised marker shares. Baseline = mean index over the pre-ChatGPT years; peak = the year with the highest index; fold = peak index / baseline.

→ pipelines/tell
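The arithmetic of the processing step can be written out as a minimal sketch. The function names and data layout are hypothetical, which years count as "pre-ChatGPT" is left as a parameter because the sheet does not list them, and the numbers in the example are invented to make the calculation easy to follow; they are not measurements.

```python
from statistics import mean


def per_100k(hits: int, total_abstracts: int) -> float:
    """Hits for one marker in one year, normalised per 100,000 abstracts."""
    return hits * 100_000 / total_abstracts


def yearly_index(hits_by_marker: dict[str, dict[int, int]],
                 totals: dict[int, int]) -> dict[int, float]:
    """Index per year: the sum of all normalised marker shares for that year."""
    return {
        year: sum(per_100k(hits[year], total) for hits in hits_by_marker.values())
        for year, total in totals.items()
    }


def fold_change(index: dict[int, float], baseline_years: list[int]):
    """Return (baseline, peak_year, fold) with fold = peak index / baseline."""
    baseline = mean(index[y] for y in baseline_years)
    peak_year = max(index, key=index.get)
    return baseline, peak_year, index[peak_year] / baseline


# Invented numbers, two markers instead of eight.
totals = {2021: 1_000_000, 2022: 1_000_000, 2024: 1_000_000}
hits = {
    "delve": {2021: 100, 2022: 300, 2024: 2000},
    "marker_b": {2021: 100, 2022: 100, 2024: 1000},
}
index = yearly_index(hits, totals)
print(fold_change(index, baseline_years=[2021, 2022]))
```

Because every step is a plain sum or ratio of counts, the same inputs always give the same index, which is what makes the result checkable by hand.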

AI/ML — checkable

v1 counts only (no model). Planned v2: a transparent LLM classifier that estimates a synthetic-likelihood score per abstract and is verified against the marker counts, with the prompt disclosed and uncertainty reported as part of the measurement. Condition: never an unaccountable oracle.

4. Limits of the method

The markers are a PROXY — these words have legitimate uses; only the collective jump is meaningful, not any single hit.
Peak 2024 because PubMed still under-indexes 2025+ — and because the "delve" tell is now known and partly avoided.
Title/abstract only, English-skewed; counts, not semantic full-text analysis.
Correlation with ChatGPT, not causal proof per paper; it measures word use, not "fraud".

5. Compute footprint

About 64 keyless HTTP count requests per build, no LLM. The site is static.
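Keyless E-utilities access is limited by NCBI to about three requests per second, so a build of this size needs only light pacing. The helper below is a hypothetical sketch of such pacing, not the project's actual code; the clock and sleep parameters exist only so the behaviour can be checked without waiting.

```python
import time

# NCBI asks clients without an API key to stay at or below 3 requests per second.
MIN_INTERVAL = 1 / 3


def throttled(calls, min_interval=MIN_INTERVAL,
              clock=time.monotonic, sleep=time.sleep):
    """Run zero-argument callables in order, starting them at least
    min_interval seconds apart, and return their results."""
    results = []
    last_start = None
    for call in calls:
        if last_start is not None:
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        results.append(call())
    return results


# Usage with stand-in callables; a real build would pass functions that fetch counts.
print(throttled([lambda: 1, lambda: 2, lambda: 3], min_interval=0.01))
```

At this spacing, 64 requests add roughly 21 seconds of enforced waiting per build.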

6. Change log

v1 (2026-06) — first edition: eight markers, machine-speak index, fold since ChatGPT.

→ To the experiment