The Margin · Methodology

How this is made

The Margin is compiled by AI and edited by a human. This page documents the pipeline, the sources, and the ranking rules, in full.

Pipeline

Four stages

The pipeline detects, ranks, then sends. A human verdict sits between rank and send. Stages 1, 2, and 4 are automated. Stage 3 is not.

The architecture is adapted from automated AI-engineering digests, which rank by crowd velocity (what is shared and starred fastest). That works when the audience is expert. The education audience is not, and the loudest sources are vendors, so ranking by popularity would surface marketing. The crowd-velocity step is therefore replaced by a teacher-impact score plus a human verdict.

01 Detect · Scan · AI. Reads the source list and collects candidate items from the window.
02 Rank · Score · AI. Scores each item on four axes and outputs a shortlist of 8.
03 Verdict · Judge · Human. Sets each verdict, kills noise, and reads the primary source.
04 Send · Ship · AI. Renders the verdicted issue to the template and sends it.
Stages 1, 2, 4: automated. Stage 3: human, and not automatable.
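The ordering and the human gate can be sketched in Python. This is a minimal illustration, not the production pipeline. The `Item` type, the `run_issue` function, and the choice to pass each stage in as a callable are all hypothetical, chosen only to show that nothing reaches stage 4 without a human verdict.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Item:
    title: str
    provisional_verdict: Optional[str] = None  # AI draft attached at stage 2
    verdict: Optional[str] = None              # set only at stage 3, by the human


def run_issue(
    detect: Callable[[], List[Item]],
    rank: Callable[[List[Item]], List[Item]],
    human_verdict: Callable[[List[Item]], List[Item]],
    send: Callable[[List[Item]], None],
) -> List[Item]:
    """Run one issue through the four stages in order (hypothetical sketch)."""
    candidates = detect()                 # Stage 1 · AI: collect candidates
    shortlist = rank(candidates)          # Stage 2 · AI: score and shortlist
    verdicted = human_verdict(shortlist)  # Stage 3 · Human: set verdicts, kill noise
    # Gate: stage 4 refuses any item that reached it without a human verdict.
    missing = [item.title for item in verdicted if item.verdict is None]
    if missing:
        raise ValueError(f"no human verdict for: {missing}")
    send(verdicted)                       # Stage 4 · AI: render and send
    return verdicted
```

Because the human step is a separate input, the automated stages cannot quietly absorb it. The guard is one way to express "nothing ships without a verdict."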
Stage 1 · Sources

19 sources, four tiers

The source list is fixed and curated, not a wide scrape. It is the first filter. Each source sits in one of four tiers by trust. Research is weighted highest. Lab and vendor announcements are treated as claims to be tested, never as signal on their own.

Tier 1 · Research: arXiv, RAND, IES, AERA, EdArXiv. Primary, peer-reviewed. 6 sources.
Tier 2 · Journalism: EdWeek, Hechinger, Chalkbeat, The 74. Reaching classrooms. 4 sources.
Tier 3 · Labs / vendors: OpenAI, Anthropic, Google, Khan. Claims to test. 4 sources.
Tier 4 · Institutions: Stanford SCALE, MIT, CDT, hand-picked. Named, reviewed quarterly. 5 sources.
Each tier sets a starting weight before human judgment, with research weighted highest. A vendor's claim about its own product starts near zero.
Stage 2 · Ranking

One question, four axes

Items are not ranked by popularity. Each is scored against one question:

Does this change what a teacher should do, believe, or stop believing?

That question is scored on four axes, each from 0 to 10, and the four scores are summed to a total out of 40. Marketing, and items whose only signal is that they are trending, are demoted.

Decision-relevance. Would a teacher act differently?
Durability. Does it survive the next model release?
Source credibility. Research high, vendor claims low.
Hype-correction. Is a loud claim measurably wrong?
Output: a ranked shortlist of 8 with provisional verdicts attached as drafts.
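The scoring rule can be sketched in Python as follows, assuming integer axis scores. The page says marketing and trending-only items are demoted but not by how much. `DEMOTION_PENALTY` is therefore a hypothetical value, and the `is_marketing` and `trending_only` flags are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical: the page says demoted items are pushed down, not by how much.
DEMOTION_PENALTY = 10


@dataclass(frozen=True)
class Scores:
    decision_relevance: int  # Would a teacher act differently?
    durability: int          # Does it survive the next model release?
    source_credibility: int  # Research high, vendor claims low.
    hype_correction: int     # Is a loud claim measurably wrong?

    def total(self) -> int:
        axes = (self.decision_relevance, self.durability,
                self.source_credibility, self.hype_correction)
        for value in axes:
            if not 0 <= value <= 10:
                raise ValueError(f"axis score must be 0 to 10, got {value}")
        return sum(axes)  # 0 to 40


@dataclass(frozen=True)
class Candidate:
    title: str
    scores: Scores
    is_marketing: bool = False   # hypothetical flag
    trending_only: bool = False  # hypothetical flag: trending is its only signal


def rank_score(candidate: Candidate) -> int:
    score = candidate.scores.total()
    if candidate.is_marketing or candidate.trending_only:
        score -= DEMOTION_PENALTY
    return score


def shortlist(candidates: List[Candidate], size: int = 8) -> List[Candidate]:
    # Highest score first; ties keep their input order (sorted is stable).
    return sorted(candidates, key=rank_score, reverse=True)[:size]
```

In this sketch, a vendor item scoring 9 on every axis (36) still drops below a study scoring 8 on every axis (32) once demoted to 26.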
Stage 3 · Verdict

The human step

The pipeline attaches a provisional verdict to each item. These are drafts. The human overrules them freely, kills noise, and reads the primary source before anything ships. Three verdicts are used:

Real. Works, or is true. Act on it.
Hype. The claim is bigger than the thing.
Watch. Not there yet. Direction is real.

Default when uncertain: Watch.
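The resolution rule can be sketched in Python. The human's call always wins. The AI draft ships only if the human confirms it, and an unresolved call falls back to Watch. The `ACCEPT_DRAFT` marker and the `final_verdict` function are hypothetical names for illustration.

```python
from enum import Enum
from typing import Union


class Verdict(Enum):
    REAL = "Real"    # Works, or is true. Act on it.
    HYPE = "Hype"    # The claim is bigger than the thing.
    WATCH = "Watch"  # Not there yet. Direction is real.


# Hypothetical marker for "the human read the source and agrees with the draft".
ACCEPT_DRAFT = "accept"


def final_verdict(draft: Verdict, human_call: Union[Verdict, str, None]) -> Verdict:
    """Resolve the shipped verdict from the AI draft and the human's call.

    human_call is a Verdict to overrule the draft, ACCEPT_DRAFT to confirm it,
    or None when the human is uncertain.
    """
    if human_call is None:
        return Verdict.WATCH       # default when uncertain
    if human_call == ACCEPT_DRAFT:
        return draft               # the draft ships only if the human confirms it
    if isinstance(human_call, Verdict):
        return human_call          # the human overrules freely
    raise ValueError(f"unrecognized human call: {human_call!r}")
```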

This step exists because the pipeline produces confident errors. In one issue, the draft stated that removing AI before a test was what protected student learning. The primary source (Bastani et al., PNAS 2025) showed the opposite: the protective factor was the tool's design, not its removal. The draft was rewritten. The human verdict is the control for this failure mode.

Evidence context

Why the filter is strict

Stanford's SCALE initiative reviewed the K-12 AI research base in 2026. Of 800-plus studies, roughly 20 establish a causal effect with rigorous methods. None of those 20 was conducted in a U.S. K-12 classroom.

800+ studies reviewed
~20 causal and rigorous
0 of those in U.S. K-12 classrooms
Source: Stanford SCALE, 2026 review of the K-12 AI evidence base.
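As a share, the rigorous causal subset is about one study in forty. Both counts are approximate, and the denominator is "800-plus," so the true share is at most roughly this:

```latex
\frac{20}{800} = \frac{1}{40} = 0.025 = 2.5\%
```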
Cadence

Who runs what, when

Stage · Operator · Frequency
Detect · AI · Continuous
Rank · AI · Weekly
Verdict & fact-check · Human · Weekly
Assemble & send · AI · Weekly
One human touch per issue. The rest is automated.
Disclosure rule

Conflict of interest

The editor builds My Planning Partner, an AI lesson-planning tool. Any issue that touches lesson-planning tools, the category My Planning Partner belongs to, carries an explicit disclosure line. The disclosure is fixed policy and is never removed for length or tone.
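The length half of the rule can be sketched in Python. In this sketch the disclosure is attached after trimming and sits outside the length budget, so no length limit can cut it. The tone half remains an editorial judgment. The disclosure wording, the topic tag, and `finalize_issue` are all hypothetical, since the page does not quote the actual line.

```python
from typing import Set

# Hypothetical wording; the page does not quote the actual disclosure line.
DISCLOSURE = ("Disclosure: the editor builds My Planning Partner, "
              "an AI lesson-planning tool.")
LESSON_PLANNING = "lesson-planning tools"  # hypothetical topic tag


def finalize_issue(body: str, topics: Set[str], max_chars: int) -> str:
    """Trim the body to a length budget, then attach the disclosure if required.

    The disclosure is appended after trimming and outside the budget, so no
    length limit can cut it.
    """
    trimmed = body[:max_chars]
    if LESSON_PLANNING in topics:
        return trimmed + "\n\n" + DISCLOSURE
    return trimmed
```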

The Margin · Compiled by AI, edited by a human
My Planning Partner · @myplanningpartner