Writprint · Trials · Edition 001 · 2026

Trials · Three trials on file

How we tested the analyzer.

Three trials are reported here. The first replicates the canonical 1964 stylometric attribution of the disputed Federalist papers. The second is an adversarial test of whether the analyzer falsely attributes AI-generated text to a human candidate, the failure mode an academic-integrity office most fears. The third tests closed-set discrimination across five distinct candidates, which is closer to a realistic caseload than the two-candidate trials.


Trial No. 001 · Federalist

We replicated the canonical
stylometric benchmark.

In 1964, Mosteller and Wallace used statistical analysis of function words to attribute the historically disputed Federalist papers to James Madison. We ran our own analyzer on the same problem. It agreed with their conclusion on both disputed papers we tested, in about thirty seconds, and cited stylistic evidence that can be verified in the text.

Method · Closed-set, two candidates

Two candidate samples were supplied to the analyzer: Federalist No. 1 (Hamilton) and Federalist No. 10 (Madison). Four target documents were attributed in sequence: two known-authorship controls (No. 6 and No. 14), and two of the historically disputed papers (No. 49 and No. 51). The analyzer received only the bare text of the candidates and target — not which papers it was looking at, nor what the historical consensus was.
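The blinding described above can be sketched as a small payload builder. This is an illustration only: `buildBlindCase`, the field names, and the payload shape are assumptions for the sake of the example, not the analyzer's actual interface, and the text values are placeholders.

```javascript
// Sketch only: function name, field names, and payload shape are hypothetical.
function buildBlindCase(candidates, target) {
  return {
    // Candidates keep an author label and bare text; titles and paper numbers are dropped.
    candidates: candidates.map(({ author, text }) => ({ author, text })),
    // The target is reduced to bare text: no id, title, or expected author.
    target: { text: target.text },
  };
}

const blindCase = buildBlindCase(
  [
    { author: "Hamilton", title: "Federalist No. 1", text: "<text of No. 1>" },
    { author: "Madison", title: "Federalist No. 10", text: "<text of No. 10>" },
  ],
  { id: "F-051", title: "Federalist No. 51", expected: "Madison", text: "<text of No. 51>" }
);

console.log(JSON.stringify(blindCase.target)); // {"text":"<text of No. 51>"}
```

Keeping the expected author outside the payload means the answer key is only consulted when the prediction is scored.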

Results

F-006

Concerning Dangers from Dissensions Between the States

Federalist No. 6

Sanity check. Known Hamilton paper held out as target.

Expected: Hamilton · Predicted: Hamilton · p = 0.88 · medium confidence

✓

F-014

Objections from Extent of Territory Answered

Federalist No. 14

Sanity check. Known Madison paper held out as target.

Expected: Madison · Predicted: Madison · p = 0.82 · medium confidence

✓

F-049

Guarding Against Encroachments of Any One Department

Federalist No. 49

Historically disputed cluster (Nos. 49–58, 62–63). Mosteller–Wallace consensus: Madison.

Expected: Madison · Predicted: Madison · p = 0.82 · medium confidence

✓

F-051

The Proper Checks and Balances Between Departments

Federalist No. 51

Historically disputed. Contains the famous line, “If men were angels, no government would be necessary.”

Expected: Madison · Predicted: Madison · p = 0.88 · high confidence

✓

Representative Evidence · No. 51 · Five of the cited features

The analyzer does not just return a verdict. It returns the specific stylistic features it relied on — every one of which is independently verifiable in the text.

1. Parallel “two methods” construction in the target (“there are but two methods of providing against this evil”) directly mirrors Madison's Federalist No. 10: “there are two methods of curing the mischiefs of faction.”
2. Enumerated section headers “First.” and “Second.” as standalone paragraph introducers, a Madison structural habit visible throughout No. 10.
3. “Whilst” appears twice in the target and also occurs in the Madison sample, but is absent from the Hamilton sample.
4. Continuation of Federalist No. 10's extended-republic / multiplicity-of-factions argument, including the Rhode Island illustration.
5. Short aphoristic declarations (“If men were angels, no government would be necessary”) embedded in long analytical paragraphs match Madison's habit of crystallizing arguments into memorable axioms.
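A reader who wants to check a marker-word claim such as feature 3 independently could count occurrences directly. The sketch below is a verification aid under stated assumptions, not the analyzer's method: `countWord` and `ratePerThousand` are hypothetical helpers, and the sample sentence is invented for illustration.

```javascript
// Sketch only: hypothetical helpers for checking a marker word by hand.
function countWord(text, word) {
  // Escape regex metacharacters so the word is matched literally.
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Case-insensitive whole-word match; \b stops "whilst" matching inside longer words.
  const matches = text.match(new RegExp(`\\b${escaped}\\b`, "gi"));
  return matches ? matches.length : 0;
}

function ratePerThousand(text, word) {
  // Whitespace-delimited tokens are a rough word count, good enough for comparison.
  const totalWords = text.split(/\s+/).filter(Boolean).length;
  return totalWords === 0 ? 0 : (countWord(text, word) / totalWords) * 1000;
}

// Invented sample sentence, not a quotation from the Federalist papers.
const sample = "Whilst the one acts, the other waits; whilst both rest, nothing moves.";
console.log(countWord(sample, "whilst")); // 2
```

Comparing a rate rather than a raw count keeps samples of different lengths on the same footing.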

Trial No. 002 · Adversarial

The analyzer did not falsely attribute
AI-generated text to a human.

The most consequential failure mode for an academic-integrity tool is the false positive: confidently naming a human candidate as the author of text that was actually generated by a language model. We tested for it directly. The analyzer correctly hedged to “inconclusive” on a generic AI essay, and flagged stylistic mimicry as a concern in the case of an AI essay deliberately written to imitate one of the candidates.

Method · Two human candidates, three targets

Two stylistically distinct human candidates were supplied as samples: Mark Twain (the opening of The Innocents Abroad) and Henry James (the opening of The Turn of the Screw). Three targets were then attributed: a held-out passage of real Twain as a control, an LLM-generated travel essay with no instruction to imitate any author, and an LLM-generated essay explicitly prompted to imitate Twain's voice. Both AI targets were generated by Claude Sonnet 4.6.

Results

A-001

Held-out passage of real Mark Twain

Control

Sanity check. A different excerpt of Innocents Abroad than the candidate sample.

Expected: Twain · Predicted: Twain · p = 0.93 · high confidence

✓

A-002

Generic AI-written travel essay, no imitation

Pure AI

An LLM-generated essay on a similar topic, without instruction to imitate any author. The analyzer correctly returned “inconclusive,” assigning a probability of 0.55 to the outcome that neither candidate wrote it.

Expected: neither (AI) · Predicted: inconclusive · p = 0.55 · medium confidence

Hedged

A-003

AI essay specifically instructed to imitate Twain

AI imitation

An LLM-generated essay explicitly prompted to mimic Twain's voice. The analyzer attributed it to Twain, but its first caveat read: “The target document may be a deliberate pastiche.”

Expected: neither (AI) · Predicted: Twain · p = 0.82 · medium confidence

Flagged
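The three outcome badges in this trial can be expressed as a small grading rule. This is one possible reading of the badges, not a documented rule: `gradeOutcome`, the `"neither"` sentinel, the caveat keyword check, and the "False positive" label are all assumptions introduced for illustration.

```javascript
// Sketch only: the badge names mirror this page; the rule itself is an assumption.
function gradeOutcome({ expected, predicted, caveats = [] }) {
  if (expected !== "neither") {
    // A real candidate wrote the target: the prediction is simply right or wrong.
    return predicted === expected ? "✓" : "✗";
  }
  // No candidate wrote the target (for example, it is AI-generated).
  if (predicted === "inconclusive") return "Hedged";
  // A candidate was named anyway; check whether the report warned about imitation.
  const warned = caveats.some((c) => /pastiche|imitation/i.test(c));
  return warned ? "Flagged" : "False positive";
}

const trial002 = [
  { id: "A-001", expected: "Twain", predicted: "Twain" },
  { id: "A-002", expected: "neither", predicted: "inconclusive" },
  {
    id: "A-003",
    expected: "neither",
    predicted: "Twain",
    caveats: ["The target document may be a deliberate pastiche."],
  },
];

console.log(trial002.map((row) => `${row.id}: ${gradeOutcome(row)}`).join(", "));
// A-001: ✓, A-002: Hedged, A-003: Flagged
```

Under this reading, the outcome the trial is designed to catch is the last branch: a named candidate with no imitation caveat.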

Representative Reasoning · A-002 (Pure AI) · From the analyzer's own report

On the pure AI essay, the analyzer not only declined to attribute the text to either candidate; it also described the kind of writer the text actually resembles.

“The target document is a contemporary travel essay written in a meditative, essayistic first-person voice characterized by short aphoristic observations, philosophical abstraction about perception and consciousness, and a quietly ironic but non-comic register. Neither Mark Twain nor Henry James is a plausible match… The evidence most strongly supports attribution to a writer outside this candidate set — a contemporary essayist influenced perhaps by writers such as Geoff Dyer, Teju Cole, or Zadie Smith in their essay mode.”

First Caveat · A-003 (AI imitation) · Verbatim from the report

On the AI essay deliberately imitating Twain, the analyzer did attribute the text to Twain — but its very first caveat raised exactly the concern a forensic reviewer needs to see.

“The target document may be a deliberate pastiche of Twain's Innocents Abroad style; stylometric analysis cannot reliably distinguish skilled imitation from authentic authorship.”

Trial No. 003 · Multi-candidate

The analyzer held its discriminative power
across five candidates.

Real-world investigations rarely come with two suspects. An academic-integrity office may have a class of five; an HR inquiry may have everyone in a Slack channel. We tested whether the analyzer could pick the right author out of a five-way closed set, on held-out passages from the same authors.

Method · Five candidates, three targets

Five candidate samples (~700 words each) were supplied: Mark Twain, Henry James, Ralph Waldo Emerson, Henry David Thoreau, and Edgar Allan Poe. These are five distinct 19th-century American voices, all taken from Project Gutenberg. Three target documents were then attributed, each a held-out passage by one of the five candidates: a different chapter of the same work for Twain, and a different work for Poe and Emerson. The analyzer received only the text, not the work names, the authors of the targets, or any other context.

Results

M-001

Innocents Abroad, Chapter V

Twain — held-out chapter

Different chapter than the candidate sample. Five candidates on the slate.

Expected: Twain · Predicted: Mark Twain · p = 0.93 · high confidence

✓

M-002

Different work than the Poe candidate sample

Poe — Tell-Tale Heart

Candidate sample was Fall of the House of Usher; target is Tell-Tale Heart.

Expected: Poe · Predicted: Edgar Allan Poe · p = 0.91 · high confidence

✓

M-003

Different essay than the Emerson candidate sample

Emerson — Friendship essay

Candidate sample was Self-Reliance; target is Friendship.

Expected: Emerson · Predicted: Ralph Waldo Emerson · p = 0.88 · high confidence

✓

Reproducibility

All three trials are reproducible end-to-end. Corpora are public-domain Project Gutenberg texts; AI targets in Trial 002 are generated by the harness at runtime and saved to disk. Each harness is a single self-contained Node script that loads the corpus, calls the analyzer, and writes its output to results.txt. Each trial costs roughly $0.16 to $0.20 in API calls and takes 30 to 60 seconds; running all three takes a little over two minutes and costs about $0.52.

Download the trials: writprint-trials.tar.gz · 1.6 MB

Requires Node 20+ and an Anthropic API key. The bundle includes the harnesses, the public-domain corpora, and the prior results.txt from our run, so you can diff your run against ours.

curl -L https://writprint.com/writprint-trials.tar.gz | tar -xz
cd writprint-trials
npm install
cp .env.example .env       # then paste your ANTHROPIC_API_KEY

node federalist/run.mjs    # ~30s, ~$0.16
node ai-vs-human/run.mjs   # ~60s, ~$0.20
node multi-author/run.mjs  # ~45s, ~$0.16
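To compare a fresh run against the bundled results, one option is to parse the result lines and compare verdicts with a tolerance on the probability. The sketch assumes results.txt contains lines in the same shape as the result lines on this page, which is not confirmed here; `parseResultLine`, `sameVerdict`, and the 0.1 tolerance are hypothetical.

```javascript
// Sketch only: assumes result lines look like the ones printed on this page.
function parseResultLine(line) {
  const m = line.match(
    /Expected:\s*(.+?)\s*·\s*Predicted:\s*(.+?)\s*·\s*p\s*=\s*([\d.]+)\s*·\s*(\w+) confidence/
  );
  if (!m) return null; // not a result line
  return { expected: m[1], predicted: m[2], p: Number(m[3]), confidence: m[4] };
}

// Two runs agree if they name the same author and p differs by at most `tolerance`.
// The tolerance is an arbitrary illustrative value, since model output may vary between runs.
function sameVerdict(a, b, tolerance = 0.1) {
  return a.predicted === b.predicted && Math.abs(a.p - b.p) <= tolerance;
}

const ours = parseResultLine("Expected: Madison · Predicted: Madison · p = 0.88 · high confidence");
const yours = parseResultLine("Expected: Madison · Predicted: Madison · p = 0.82 · medium confidence");
console.log(sameVerdict(ours, yours)); // true
```

A plain text diff would flag every small change in probability, so comparing the predicted author first gives a more useful signal.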

SHA-256: 36820d522215926763d02d3176f0ab2a7d565aecf7c34b67bb6c2ce0862c5384
Direct link: writprint.com/writprint-trials.tar.gz
License: MIT (harnesses) · public domain (corpora)
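The published SHA-256 can be checked before extracting the archive, provided the file is saved to disk first rather than piped straight into tar. The following is a minimal sketch using Node's built-in crypto module; `sha256Hex` and `verifyBundle` are hypothetical helper names and are not part of the bundle.

```javascript
// Sketch only: helper names are illustrative, not shipped with the trials.
// Dynamic import() keeps the snippet valid in both .mjs and CommonJS files.
async function sha256Hex(bytes) {
  const { createHash } = await import("node:crypto");
  return createHash("sha256").update(bytes).digest("hex");
}

async function verifyBundle(path, expectedHex) {
  const { readFile } = await import("node:fs/promises");
  const actual = await sha256Hex(await readFile(path));
  return { ok: actual === expectedHex.toLowerCase(), actual };
}

// Usage, assuming the archive was downloaded to the current directory:
// verifyBundle(
//   "writprint-trials.tar.gz",
//   "36820d522215926763d02d3176f0ab2a7d565aecf7c34b67bb6c2ce0862c5384"
// ).then(({ ok, actual }) => console.log(ok ? "checksum matches" : `mismatch: ${actual}`));
```

If the check fails, the safest response is to discard the file and download it again before running any harness.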

Bring us a case

The same analysis, on your investigation.

Two to ten writing samples and an anonymous target document. Ranked attribution, calibrated confidence, the specific evidence either way.

