AI Detection Heuristics Breakdown
Pruneify scores text on a 0–100 AI-likeness scale using transparent heuristics. No black-box model, no server-side classification — every signal is visible and auditable. This guide breaks down each heuristic: what it measures, how it is scored, its weight in the overall score, and what the numbers mean. If you want to understand why a piece of text scores high or low, this is the reference.
Key Takeaways
- Pruneify uses four signal categories: linguistic (phrase density), structural (burstiness + list density), statistical (vocabulary richness), and tone (first-person avoidance).
- Each signal is normalized to 0–1, multiplied by its weight, then summed and scaled to a 0–100 score.
- Phrase density (weight 0.35) and sentence burstiness (weight 0.30) are the two strongest contributors.
- Vocabulary richness uses a type-token ratio with a Zipf-style correction so longer texts are not penalized unfairly.
- The detection panel shows every signal, its raw value, threshold, and contribution — nothing is hidden.
How the 0–100 AI-Likeness Score Is Calculated
The scoring pipeline runs four functions — linguistic, structural, statistical, and tone — each returning one or more signals. Every signal has a normalized value (0–1), a threshold, and a weight. The final score is the sum of each signal's value multiplied by its weight, scaled to 0–100.
The formula is straightforward: for each signal, contribution = value × weight. Sum all contributions, divide by the maximum possible weighted sum, and multiply by 100. The result is a percentage — 0 means no AI-like patterns detected, 100 means every signal is at maximum.
Thresholds determine when a signal "activates." A signal below its threshold contributes minimally; above the threshold, its contribution increases linearly. This prevents weak, noisy signals from inflating the score.
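The weighted-sum scoring described above can be sketched in a few lines of Python. One assumption to flag: the exact behavior below the threshold is only described as "contributes minimally," so this sketch treats sub-threshold signals as contributing zero and rescales the above-threshold portion linearly.

```python
def ai_likeness_score(signals):
    """Weighted-sum scoring sketch.

    `signals` is a list of (value, weight, threshold) tuples, with each
    value already normalized to 0-1. Sub-threshold signals contribute
    zero here (an assumption); above the threshold, the contribution
    grows linearly up to the signal's full weight.
    """
    if not signals:
        return 0.0
    total = 0.0
    max_total = 0.0
    for value, weight, threshold in signals:
        if value > threshold:
            # Rescale the above-threshold portion back to 0-1.
            activated = (value - threshold) / (1 - threshold)
        else:
            activated = 0.0
        total += activated * weight
        max_total += weight
    # Divide by the maximum possible weighted sum and scale to 0-100.
    return round(100 * total / max_total, 1)
```

With every signal at its maximum, the score is 100; with every signal below its threshold, the score is 0, matching the description above.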
Takeaway: The score is deterministic and reproducible. The same text always produces the same score. No randomness, no model inference, no server calls.
Signal Weights and Thresholds Reference
| Signal | Category | Weight | Threshold | What it measures |
|---|---|---|---|---|
| LLM phrase density | Linguistic | 0.35 | 0.15 | Total templated phrase count per 1,000 chars |
| Individual phrase patterns | Linguistic | 0.25 each | 0.20 | Per-pattern occurrence density (6 patterns) |
| Sentence burstiness | Structural | 0.30 | 0.30 | Coefficient of variation of sentence lengths (inverted) |
| List/heading density | Structural | 0.20 | 0.40 | Ratio of list-like lines to total sentences |
| Vocabulary richness | Statistical | 0.30 | 0.50 | Type-token ratio with Zipf correction (inverted) |
| First-person avoidance | Tone | 0.20 | 0.40 | Ratio of first-person pronouns to total words (inverted) |
Higher weight means stronger influence on the final score. Phrase density and burstiness together account for the largest share — this aligns with research showing that phrase-level and structural features outperform perplexity for GPT detection.
Linguistic Signals: Templated Phrase Detection
LLMs produce recurring phrases at rates humans do not. Pruneify scans for six phrase pattern categories using regular expressions:
- "As an AI" — Direct self-reference. Rare in human writing, common in unedited LLM output.
- Refusal patterns — "I don't/can't provide/assist/help." Guardrail language that leaks into longer responses.
- Hedging openers — "Certainly, I can," "Sure, here is," "Absolutely, let me." LLMs use these to sound agreeable.
- Meta notes — "It's important to note," "It's worth noting," "Note that." Filler that adds no information.
- Buzz phrases — "Comprehensive solution," "robust solution," "scalable solution." Vague descriptors LLMs overuse.
- Formal verbs — "Delve," "leverage," "utilize," "facilitate." Humans rarely use these in informal or semi-formal writing.
Each pattern is counted and normalized per 1,000 characters. The individual pattern score is the occurrence rate per 1,000 characters divided by 5, capped at 1.0. The overall phrase density signal aggregates all pattern hits (total hits per 1,000 characters, divided by 10, capped at 1.0) and carries the highest weight in the system: 0.35.
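The counting and normalization above can be sketched with a few illustrative regexes. These patterns are simplified stand-ins, not Pruneify's actual expressions:

```python
import re

# Illustrative patterns only -- simplified stand-ins for three of the
# six categories, not Pruneify's real regexes.
PHRASE_PATTERNS = {
    "self-reference": r"\bas an ai\b",
    "meta note": r"\bit'?s (important|worth) (to note|noting)\b",
    "formal verb": r"\b(delve|leverage|utilize|facilitate)\b",
}

def phrase_density(text):
    """Count pattern hits and normalize per 1,000 characters."""
    per_1k = 1000 / max(len(text), 1)
    hits = {name: len(re.findall(rx, text, re.IGNORECASE))
            for name, rx in PHRASE_PATTERNS.items()}
    total = sum(hits.values())
    # Overall signal: total hits per 1k chars, divided by 10, capped at 1.0.
    signal = min(1.0, total * per_1k / 10)
    return hits, signal
```

Running this on a short sentence containing "worth noting" and "leverage" counts one hit in each category; because the text is short, the per-1,000-character rate is high and the overall signal caps at 1.0.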
Why Phrase Density Matters Most
Templated phrases are the most reliable fingerprint because they persist even in well-structured AI text. An LLM can produce grammatically perfect, topically relevant content — but the stock phrases give it away. This is why phrase density has the highest weight. Cutting these phrases is the single most effective way to humanize AI text.
Phrase Highlights in the UI
Pruneify highlights matched phrases directly in the text. Each highlight is labeled with the pattern category (e.g., "hedging opener," "formal verb"). This lets you see exactly which phrases triggered detection and decide whether to remove or rewrite them.
Structural Signals: Burstiness and List Density
Sentence Burstiness
Burstiness captures how much sentence length varies within a text. The calculation: split the text into sentences, measure each sentence's character length, compute the mean and standard deviation, then divide standard deviation by mean to get the coefficient of variation (CV).
Human writing typically produces a CV between 0.4 and 0.8 — short punchy sentences mixed with long complex ones. LLM output tends to cluster around a CV of 0.2–0.3 because the model generates sentences of similar length. The signal value is inverted (1 − burstiness), so low variation produces a high AI-likeness contribution.
With a weight of 0.30 and a threshold of 0.30, burstiness is the second most influential signal. A text with perfectly uniform sentences will push this signal to its maximum.
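The coefficient-of-variation calculation is short enough to sketch directly. The sentence splitter here is deliberately naive; robust splitting needs to handle abbreviations, quotes, and ellipses:

```python
import re
import statistics

def burstiness(text):
    """Coefficient of variation of sentence lengths (in characters).

    The signal value used for scoring would be the inversion,
    1 - min(1, cv), so uniform sentences read as more AI-like.
    """
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s) for s in sentences]
    if len(lengths) < 2:
        return 0.0  # not enough sentences for meaningful variation
    mean = statistics.mean(lengths)
    stdev = statistics.pstdev(lengths)
    return stdev / mean if mean else 0.0
```

Three identical sentences produce a CV of 0.0 (maximally AI-like after inversion), while mixing a two-character sentence with a long one pushes the CV well above the 0.4 floor typical of human writing.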
List and Heading Density
LLMs overuse bulleted lists and numbered steps. Pruneify counts lines that start with a digit, dash, asterisk, or bullet character and divides by the total number of sentences. The result is the list-like ratio.
A high list ratio is not automatically bad — instructional content uses lists legitimately. The threshold is set at 0.40, meaning the signal only activates when more than 40% of content is list-like. The weight is 0.20, lower than burstiness, because lists are context-dependent.
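The list-like ratio can be sketched as follows. The function signature and the exact bullet-character set are assumptions for illustration:

```python
import re

def list_density(text, sentence_count):
    """Ratio of list-like lines to total sentences (sketch).

    A line counts as list-like if it starts with a digit followed by
    a period or parenthesis, or with a dash, asterisk, or bullet.
    """
    list_like = sum(
        1 for line in text.splitlines()
        if re.match(r"\s*(\d+[.)]|[-*\u2022])\s", line)
    )
    return list_like / max(sentence_count, 1)
```

For a text with two bulleted lines and one prose sentence (three sentences total), the ratio is 2/3, well above the 0.40 activation threshold.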
Statistical Signals: Vocabulary Richness
Vocabulary richness is measured by the type-token ratio (TTR): unique words divided by total words. A TTR of 0.80 means 80% of words are unique — high richness. A TTR of 0.40 means heavy repetition — low richness.
Raw TTR has a well-known problem: it naturally declines as text gets longer, because any writer eventually reuses common words. To compensate, Pruneify applies a Zipf-style correction. The adjusted formula is: richness = (TTR × 0.7) + (length_penalty × 0.3), where length_penalty = min(1, word_count / 500). This ensures a 2,000-word article is not penalized simply for being long.
The signal value is inverted (1 − richness): low vocabulary richness produces a high AI-likeness contribution. Weight is 0.30, threshold is 0.50. LLM output often falls in the 0.35–0.50 TTR range; human writing on the same topic typically scores 0.55–0.75.
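The adjusted-richness formula above translates directly to code. The word tokenizer here is a simplifying assumption:

```python
import re

def vocabulary_richness(text):
    """Type-token ratio with the length penalty described above.

    richness = (TTR * 0.7) + (length_penalty * 0.3),
    where length_penalty = min(1, word_count / 500).
    """
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    ttr = len(set(words)) / len(words)
    length_penalty = min(1.0, len(words) / 500)
    return ttr * 0.7 + length_penalty * 0.3
```

For "the cat sat on the mat" (six words, five unique), TTR is 5/6 and the length penalty is 6/500, giving a richness just under 0.59. The scoring signal would then be 1 minus this value.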
When Low Richness Is Not AI
Technical documentation, legal writing, and formulaic content (product specs, compliance text) naturally have lower TTR. This is why vocabulary richness alone is not conclusive. It is most meaningful when combined with high phrase density and low burstiness. The threshold at 0.50 prevents low-richness text from triggering unless other signals also fire.
Tone Signals: First-Person Pronoun Avoidance
LLMs default to impersonal, neutral phrasing. They use "it," "this," "that," "these," and "there" far more than "I," "my," "we," "our," or "me." Pruneify counts first-person pronouns, divides by total word count, and applies an inversion: avoidance = 1 − min(1, first_person_ratio × 10).
A first-person density of 2% (e.g., 10 first-person pronouns in 500 words) produces an avoidance value of 0.80 — still moderately high. To push avoidance below 0.50 (the comfort zone), you need roughly 5% or more first-person density, which is typical of personal essays or opinion pieces.
Weight is 0.20, threshold is 0.40. This makes tone the lowest-weighted signal. The reason: formal human writing (academic papers, news articles, legal text) also avoids first-person. Tone is confirmatory — it adds confidence when other signals are already elevated, but it should not drive the score on its own.
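The avoidance formula can be sketched as follows. The pronoun set and tokenizer are illustrative assumptions:

```python
import re

# Illustrative first-person pronoun set (an assumption, not Pruneify's list).
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our", "ours"}

def first_person_avoidance(text):
    """avoidance = 1 - min(1, first_person_ratio * 10)."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 1.0  # no words at all reads as maximally impersonal
    ratio = sum(w in FIRST_PERSON for w in words) / len(words)
    return 1 - min(1.0, ratio * 10)
```

This reproduces the worked example above: 10 first-person pronouns in 500 words is a 2% ratio, giving an avoidance value of 0.80.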
When Tone Misleads
Academic writing, journalism, and business communications often avoid first-person by convention. A doctoral thesis will score high on tone avoidance regardless of authorship. Always interpret tone in context. The detection panel labels this signal clearly so you can discount it when the genre explains the pattern.
How Signals Interact: Why the Combination Matters
No single signal is sufficient for reliable detection. A formal academic paper might have low first-person usage (high tone signal) and lower vocabulary richness (high statistical signal) but zero templated phrases and high burstiness. The weighted sum keeps the overall score low because the two strongest signals (phrase density and burstiness) are not contributing.
Conversely, unedited ChatGPT output typically fires on all four categories: stock phrases push phrase density above 0.15, uniform sentence lengths drop burstiness below 0.30, repetitive vocabulary lowers richness, and impersonal tone elevates avoidance. When all signals converge, the score climbs rapidly.
This multi-signal approach is what makes heuristic detection more robust than any single-metric test. It also explains why lightly edited AI text still scores high — editing out phrases helps, but if burstiness and vocabulary remain unchanged, the score stays elevated.
Takeaway: To meaningfully lower an AI-likeness score, address multiple signal categories. See the guide to making AI text undetectable for the step-by-step workflow.
How to Read the Detection Panel
When you run detection in Pruneify, the panel displays each signal with its label, raw value, threshold line, and weighted contribution. Here is how to interpret each element:
- Signal label: The name (e.g., "LLM phrase density," "Sentence burstiness").
- Value (0–1): The normalized measurement. Higher means more AI-like for that specific signal.
- Threshold: The activation point. Below threshold, the signal contributes minimally.
- Rationale: A plain-language explanation (e.g., "Coefficient of variation 22.5%; uniform length suggests synthetic text").
- Contribution: The actual impact on the final score (value × weight).
Focus on the signals with the highest contributions. Those are the patterns to address if you want to lower the score. If phrase density is the top contributor, cut templated phrases. If burstiness is high, vary sentence length. The panel turns detection from a black-box judgment into actionable feedback.
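Prioritizing by contribution is a one-liner once the panel rows are in hand. The data layout below is hypothetical, chosen only to illustrate the ranking:

```python
# Hypothetical panel rows: (label, normalized value, weight).
panel = [
    ("LLM phrase density", 0.72, 0.35),
    ("Sentence burstiness", 0.81, 0.30),
    ("Vocabulary richness", 0.40, 0.30),
    ("First-person avoidance", 0.55, 0.20),
]

# Rank signals by weighted contribution (value x weight), largest first.
ranked = sorted(panel, key=lambda row: row[1] * row[2], reverse=True)
for label, value, weight in ranked:
    print(f"{label}: contribution {value * weight:.3f}")
```

With these example values, phrase density (0.252) edges out burstiness (0.243), so cutting templated phrases would be the first edit to make.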
Takeaway: The detection panel is your editing roadmap. Use it to prioritize which patterns to fix first. Learn how to detect AI-generated text for the full 5-step workflow.
Limitations of Heuristic Detection
Heuristic detection is powerful but not perfect. Understanding its limitations helps you interpret results accurately:
Short Text
Below 100 words, statistical signals (burstiness, TTR) become noisy. There are not enough sentences to compute meaningful variation or enough words for a reliable type-token ratio. Scores on very short text should be treated as rough estimates.
Non-English Text
Phrase patterns are calibrated for English. "Certainly, I can" and "It's important to note" are English-specific. Pronoun ratios, vocabulary metrics, and sentence splitting also assume English grammar. Detection on other languages is experimental.
Heavily Edited AI Text
If a human rewrites sentences, adds voice, varies length, and removes stock phrases, the statistical fingerprint diminishes. At some point, the text is genuinely human-written — and the score reflects that. This is a feature, not a bug: the goal is to measure AI-like patterns, not to make unfalsifiable accusations.
Formal Human Writing
Academic papers, legal documents, and news articles can trigger tone and vocabulary signals because they share stylistic features with AI output (impersonal, formal, repetitive). Check the breakdown: if only tone and vocabulary are elevated while phrase density is zero, the text is likely human but formal.
Why Transparent Heuristics Beat Black-Box Detection
Black-box detectors give you a number. You do not know what drives it, you cannot verify it, and you cannot act on it except to blindly rewrite. Transparent heuristics give you a number and an explanation: which signals fired, by how much, and why.
Transparency serves three purposes. First, trust: you can audit the logic. Second, education: you learn what makes AI text detectable. Third, action: you know exactly what to edit. A detector that says "87% AI" without explanation is a coin flip with confidence. A detector that says "phrase density 0.72 (weight 0.35), burstiness 0.81 (weight 0.30)" is a diagnostic tool.
Pruneify shows every signal, threshold, and contribution. All processing happens in your browser. Your text never touches a server. For educators, businesses, and privacy-conscious users, this is not a nice-to-have — it is a requirement.
Takeaway: A detection score without an explanation is useless for improvement. Transparent heuristics turn detection into feedback. Read more about how Pruneify handles your data.
Frequently Asked Questions
What heuristics does Pruneify use to detect AI text?
Pruneify uses four categories of heuristics: linguistic signals (templated phrase density), structural signals (sentence burstiness and list density), statistical signals (vocabulary richness via type-token ratio with Zipf correction), and tone signals (first-person pronoun avoidance). Each signal is scored 0–1 and weighted to produce a 0–100 AI-likeness score.
How is the AI-likeness score calculated?
Each signal produces a normalized value between 0 and 1. The value is multiplied by the signal's weight (e.g., phrase density at 0.35, burstiness at 0.30, vocabulary at 0.30, tone at 0.20). The weighted contributions are summed and scaled to 0–100. The breakdown panel shows each signal's individual contribution.
What is sentence burstiness and why does it matter?
Sentence burstiness measures how much sentence length varies in a text. It is calculated as the coefficient of variation (standard deviation divided by mean) of sentence lengths. Human writing typically has high burstiness (0.4–0.8). AI output tends toward low burstiness (below 0.3), because LLMs produce sentences of similar length.
Can I see the exact formulas Pruneify uses?
Yes. Pruneify is transparent. The scoring logic is open: phrase density is normalized per 1,000 characters, burstiness uses coefficient of variation, vocabulary richness uses type-token ratio with a Zipf-style correction, and tone measures first-person pronoun density. The detection panel shows every signal and its rationale.
Every signal in Pruneify is documented, auditable, and transparent. Phrase density, sentence burstiness, vocabulary richness, and tone — each contributes to the 0–100 score with visible weights and thresholds. Use the detection panel to understand what drives your score, then target the strongest contributors. Try Pruneify to see the full heuristics breakdown on your own text — no uploads, no signup. For the complete detection workflow, see the guide to detecting AI-generated text.