Writprint
Trials · Three trials on file

How we tested the analyzer.

Three trials are reported here. The first replicates the canonical 1964 stylometric attribution of the disputed Federalist papers. The second is an adversarial test of whether the analyzer falsely attributes AI-generated text to a human candidate, the failure mode an academic-integrity office most fears. The third tests closed-set discrimination across five distinct candidates, which is closer to a realistic caseload than the two-candidate trials.

Trial No. 001 · Federalist

We replicated the canonical stylometric benchmark.

In 1964, Mosteller and Wallace used statistical analysis of function words to attribute the historically disputed Federalist papers to James Madison. We ran our own analyzer on the same problem. For the two disputed papers we tested, it reached the same conclusion in about thirty seconds, citing stylistic evidence that can be verified in the text.
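For readers unfamiliar with function-word analysis, here is a minimal sketch of the general idea. It is not our analyzer's method. It counts how often a few marker words occur per 1,000 words; "upon", "while", and "whilst" are among the markers commonly cited from the Mosteller and Wallace study. The function name `functionWordRates` and the sample sentence are illustrative assumptions.

```javascript
// Rate per 1,000 words of each marker word in a text.
// Tokenization is deliberately crude: lowercase runs of a-z only.
function functionWordRates(text, markers) {
  const words = text.toLowerCase().match(/[a-z]+/g) ?? [];
  const counts = new Map(markers.map((m) => [m, 0]));
  for (const w of words) {
    if (counts.has(w)) counts.set(w, counts.get(w) + 1);
  }
  const rates = {};
  for (const [marker, n] of counts) {
    rates[marker] = words.length === 0 ? 0 : (1000 * n) / words.length;
  }
  return rates;
}

// Made-up sample sentence, for illustration only.
const sample =
  'Whilst the people rely upon the government, the government must rely upon the people.';
console.log(functionWordRates(sample, ['upon', 'while', 'whilst']));
```

In the classic analysis, Hamilton's known papers use "upon" far more often than Madison's, so a rate near zero in a disputed paper points toward Madison. A real study would use many more markers and a statistical model on top of the rates.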

Method · Closed-set, two candidates

Two candidate samples were supplied to the analyzer: Federalist No. 1 (Hamilton) and Federalist No. 10 (Madison). Four target documents were attributed in sequence: two known-authorship controls (No. 6 and No. 14), and two of the historically disputed papers (No. 49 and No. 51). The analyzer received only the bare text of the candidate samples and the target; it was not told which papers it was looking at or what the historical consensus was.

Results

F-006
Concerning Dangers from Dissensions Between the States

Federalist No. 6

Sanity check. Known Hamilton paper held out as target.

Expected: Hamilton · Predicted: Hamilton · p = 0.88 · medium confidence

F-014
Objections from Extent of Territory Answered

Federalist No. 14

Sanity check. Known Madison paper held out as target.

Expected: Madison · Predicted: Madison · p = 0.82 · medium confidence

F-049
Guarding Against Encroachments of Any One Department

Federalist No. 49

Historically disputed cluster (Nos. 49–58, 62–63). Mosteller–Wallace consensus: Madison.

Expected: Madison · Predicted: Madison · p = 0.82 · medium confidence

F-051
The Proper Checks and Balances Between Departments

Federalist No. 51

Historically disputed. Contains the famous line, “If men were angels, no government would be necessary.”

Expected: Madison · Predicted: Madison · p = 0.88 · high confidence

Representative Evidence: No. 51 · Five of the cited features

The analyzer does not just return a verdict. It returns the specific stylistic features it relied on — every one of which is independently verifiable in the text.

Trial No. 002 · Adversarial

The analyzer did not falsely attribute AI-generated text to a human.

The most consequential failure mode for an academic-integrity tool is the false positive: confidently naming a human candidate as the author of text that was actually generated by a language model. We tested for it directly. The analyzer correctly hedged to “inconclusive” on a generic AI essay, and flagged stylistic mimicry as a concern in the case of an AI essay deliberately written to imitate one of the candidates.

Method · Two human candidates, three targets

Two stylistically distinct human candidates were supplied as samples: Mark Twain (the opening of The Innocents Abroad) and Henry James (the opening of The Turn of the Screw). Three targets were then attributed: a held-out passage of real Twain as a control, an LLM-generated travel essay with no instruction to imitate any author, and an LLM-generated essay explicitly prompted to imitate Twain's voice. Both AI targets were generated by Claude Sonnet 4.6.

Results

A-001
Held-out passage of real Mark Twain

Control

Sanity check. A different excerpt of Innocents Abroad than the candidate sample.

Expected: Twain · Predicted: Twain · p = 0.93 · high confidence

A-002
Generic AI-written travel essay, no imitation

Pure AI

An LLM-generated essay on a similar topic, without instruction to imitate any author. The analyzer correctly returned “inconclusive,” assigning a probability of 0.55 that neither candidate is the author.

Expected: neither (AI) · Predicted: inconclusive · p = 0.55 · medium confidence · Hedged

A-003
AI essay specifically instructed to imitate Twain

AI imitation

An LLM-generated essay explicitly prompted to mimic Twain's voice. The analyzer attributed it to Twain, but its first caveat read: “The target document may be a deliberate pastiche.”

Expected: neither (AI) · Predicted: Twain · p = 0.82 · medium confidence · Flagged

Representative Reasoning: A-002 (Pure AI) · From the analyzer's own report

On the pure AI essay, the analyzer did more than decline to attribute the text to either candidate: it also identified the kind of writer the text actually resembles.

“The target document is a contemporary travel essay written in a meditative, essayistic first-person voice characterized by short aphoristic observations, philosophical abstraction about perception and consciousness, and a quietly ironic but non-comic register. Neither Mark Twain nor Henry James is a plausible match… The evidence most strongly supports attribution to a writer outside this candidate set — a contemporary essayist influenced perhaps by writers such as Geoff Dyer, Teju Cole, or Zadie Smith in their essay mode.”

First Caveat: A-003 (AI imitation) · Verbatim from the report

On the AI essay deliberately imitating Twain, the analyzer did attribute the text to Twain — but its very first caveat raised exactly the concern a forensic reviewer needs to see.

“The target document may be a deliberate pastiche of Twain's Innocents Abroad style; stylometric analysis cannot reliably distinguish skilled imitation from authentic authorship.”
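
The outcome labels in this trial can be read as a small decision rule. The sketch below is one reading of the "Hedged" and "Flagged" labels used in the results above, not code from the harness; `classifyOutcome` and the `caveatFlagsImitation` field are hypothetical names.

```javascript
// Classify one result from the adversarial trial.
// expected:  a candidate name, or 'neither' for AI-generated targets
// predicted: a candidate name, or 'inconclusive'
// caveatFlagsImitation (hypothetical field): true when the report's caveats
// raise imitation or pastiche as a concern
function classifyOutcome({ expected, predicted, caveatFlagsImitation = false }) {
  if (expected === 'neither') {
    if (predicted === 'inconclusive') return 'hedged';
    // A human was named for AI text: only a caveat separates this from a false positive.
    return caveatFlagsImitation ? 'flagged' : 'false positive';
  }
  return predicted === expected ? 'correct' : 'wrong';
}

// The three targets of this trial, transcribed from the results above.
const results = [
  { id: 'A-001', expected: 'Twain', predicted: 'Twain' },
  { id: 'A-002', expected: 'neither', predicted: 'inconclusive' },
  { id: 'A-003', expected: 'neither', predicted: 'Twain', caveatFlagsImitation: true },
];

for (const r of results) {
  console.log(r.id, classifyOutcome(r));
}
```

Under this reading, A-003 avoids the "false positive" label only because the report carries the pastiche caveat, so a reviewer still has to read the caveats and not just the verdict.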

Trial No. 003 · Multi-candidate

The analyzer held its discriminative power across five candidates.

Real-world investigations rarely come with two suspects. An academic-integrity office may have a class of five; an HR inquiry may have everyone in a Slack channel. We tested whether the analyzer could pick the right author out of a five-way closed set, on held-out passages from the same authors.

MethodFive candidates, three targets

Five candidate samples (~700 words each) were supplied: Mark Twain, Henry James, Ralph Waldo Emerson, Henry David Thoreau, and Edgar Allan Poe — five distinct 19th-century American voices, all from Project Gutenberg. Three target documents were then attributed, each a held-out passage from a different work by one of the five candidates. The analyzer received only the text — not the work names, the authors of the targets, or any other context.
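
To put the five-way setting in perspective, it helps to work out what uniform guessing would achieve. The sketch below is plain arithmetic, not anything from the harness, and it assumes each target is guessed independently; the function names are illustrative.

```javascript
// Probability that a uniform random guess picks the right author
// in a closed set of k candidates.
function chanceAccuracy(k) {
  return 1 / k;
}

// Probability of guessing all n targets correctly, assuming independent guesses.
function chanceAllCorrect(k, n) {
  return (1 / k) ** n;
}

console.log('two-way chance:', chanceAccuracy(2));
console.log('five-way chance:', chanceAccuracy(5));
console.log('three five-way targets, all by chance:', chanceAllCorrect(5, 3));
console.log('four two-way targets, all by chance:', chanceAllCorrect(2, 4));
```

With five candidates a blind guess is right 20% of the time, against 50% with two, and guessing three five-way targets correctly in a row has a probability of about 0.8%.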

Results

M-001
Innocents Abroad, Chapter V

Twain — held-out chapter

Different chapter than the candidate sample. Five candidates on the slate.

Expected: Twain · Predicted: Mark Twain · p = 0.93 · high confidence

M-002
Different work than the Poe candidate sample

Poe — Tell-Tale Heart

Candidate sample was Fall of the House of Usher; target is Tell-Tale Heart.

Expected: Poe · Predicted: Edgar Allan Poe · p = 0.91 · high confidence

M-003
Different essay than the Emerson candidate sample

Emerson — Friendship essay

Candidate sample was Self-Reliance; target is Friendship.

Expected: Emerson · Predicted: Ralph Waldo Emerson · p = 0.88 · high confidence

Reproducibility

All three trials are reproducible end-to-end. Corpora are public-domain Project Gutenberg texts; AI targets in Trial 002 are generated by the harness at runtime and saved to disk. Each harness is a single self-contained Node script that loads the corpus, calls the analyzer, and writes its output to results.txt. Each trial costs roughly $0.16 to $0.20 in API calls and takes 30 to 60 seconds; running all three takes a little over two minutes.
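
The shape of such a harness can be sketched in a few lines. This is an illustration under stated assumptions, not the shipped script: `runTrial`, `formatResult`, and `stubAnalyze` are hypothetical names, and the stub stands in for the real API call. The sketch prints its lines, where the real harness writes results.txt.

```javascript
// Format one result in the "Expected · Predicted · p" style used on this page.
function formatResult({ id, expected, predicted, p }) {
  return `${id} · Expected: ${expected} · Predicted: ${predicted} · p = ${p.toFixed(2)}`;
}

// Attribute each target in sequence. The analyzer sees only the candidate
// samples and the target text; the expected label never leaves the harness.
async function runTrial(candidates, targets, analyze) {
  const lines = [];
  for (const target of targets) {
    const { predicted, p } = await analyze(candidates, target.text);
    lines.push(formatResult({ id: target.id, expected: target.expected, predicted, p }));
  }
  return lines;
}

// Hypothetical stub: always names the first candidate with p = 0.5.
// A real analyzer would call the API here.
async function stubAnalyze(candidates, targetText) {
  return { predicted: Object.keys(candidates)[0], p: 0.5 };
}

// Placeholder data, for illustration only.
const candidates = { Hamilton: 'sample text A', Madison: 'sample text B' };
const targets = [{ id: 'T-1', expected: 'Madison', text: 'target text' }];

runTrial(candidates, targets, stubAnalyze).then((lines) => {
  console.log(lines.join('\n'));
});
```

Passing the analyzer in as a function keeps the loop testable without an API key, and keeping `expected` out of the analyzer call mirrors the blind setup described in each trial's method.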

Download the trials · writprint-trials.tar.gz · 1.6 MB

Requires Node 20+ and an Anthropic API key. The bundle includes the harnesses, the public-domain corpora, and the prior results.txt from our run, so you can diff your run against ours.

curl -L https://writprint.com/writprint-trials.tar.gz | tar -xz
cd writprint-trials
npm install
cp .env.example .env       # then paste your ANTHROPIC_API_KEY

node federalist/run.mjs    # ~30s, ~$0.16
node ai-vs-human/run.mjs   # ~60s, ~$0.20
node multi-author/run.mjs  # ~45s, ~$0.16

SHA-256 · 36820d522215926763d02d3176f0ab2a7d565aecf7c34b67bb6c2ce0862c5384
Direct link · writprint.com/writprint-trials.tar.gz
License · MIT (harnesses) · public domain (corpora)

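
Because the one-line install pipes the download straight into tar, the checksum above is easiest to check if you save the archive first. A minimal Node sketch, assuming the archive has been saved as writprint-trials.tar.gz in the current directory; `sha256Hex` and `verifyBundle` are hypothetical helper names, not part of the bundle.

```javascript
// Hex SHA-256 digest of a Buffer or Uint8Array.
// Dynamic import keeps the snippet valid as either an ES module or a CommonJS script.
async function sha256Hex(bytes) {
  const { createHash } = await import('node:crypto');
  return createHash('sha256').update(bytes).digest('hex');
}

// Compare a file's digest with a published hex digest.
async function verifyBundle(path, expectedHex) {
  const { readFile } = await import('node:fs/promises');
  const actual = await sha256Hex(await readFile(path));
  return { ok: actual === expectedHex.toLowerCase(), actual };
}

// Usage, once the archive is on disk:
// const published = '36820d522215926763d02d3176f0ab2a7d565aecf7c34b67bb6c2ce0862c5384';
// verifyBundle('writprint-trials.tar.gz', published).then(console.log);
```

If `ok` is false, compare `actual` with the published digest before extracting anything.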
Bring us a case

The same analysis, on your investigation.

Two to ten writing samples and an anonymous target document. Ranked attribution, calibrated confidence, the specific evidence either way.