Probeo

Content Quality: Readability Measurement and Analysis

Probeo measures content readability using established formulas and aggregate scoring. This page describes the scope of automated readability analysis, its limitations, and what the resulting scores do and do not tell you.

Last updated 02/08/2026

Probeo's content quality analysis measures readability using established formulas that quantify structural text properties: sentence length, syllable density, word familiarity, and character count. These formulas produce scores that correlate with reading difficulty under controlled conditions. They do not measure whether content is clear, accurate, well-organized, or appropriate for its audience. Readability scores are observability data, not quality verdicts.

What Probeo measures

Probeo applies 7 readability formulas to the visible text content of each page: Flesch-Kincaid Grade Level, Flesch Reading Ease, Gunning Fog Index, SMOG Grade, Automated Readability Index, Coleman-Liau Index, and Dale-Chall Readability Score. Each formula quantifies a different structural dimension of text. Syllable density, sentence length, word length, and vocabulary familiarity are the primary inputs. The outputs are numeric scores or grade-level estimates. Probeo also calculates 4 aggregate scores that combine individual formula outputs into composite metrics, reducing the bias of relying on any single formula.
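To make concrete how these formulas turn structural counts into scores, here is a minimal sketch of two of them using their published coefficients. This is an illustration, not Probeo's implementation; the function names and the sample counts are hypothetical.

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: weights sentence length and syllable density."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    """Gunning Fog Index: weights sentence length and the share of 3+ syllable words."""
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))

# A hypothetical 100-word passage: 5 sentences, 150 syllables, 10 complex words.
print(round(flesch_kincaid_grade(100, 5, 150), 2))  # 9.91
print(round(gunning_fog(100, 5, 10), 2))            # 12.0
```

Note that the two formulas disagree on the same passage: Fog estimates a higher grade because it counts the 10 polysyllabic words directly, which Flesch-Kincaid only sees diluted through average syllable density. This disagreement is why aggregate scores exist.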

Measurement, not judgment

Readability formulas were designed for specific contexts: military training materials, health literacy screening, educational publishing. Applying them to web content stretches each formula beyond its original design. A product page, a legal disclosure, an API reference, and a blog post will produce different scores because they serve different purposes and audiences. None of those scores indicate a problem on their own. Probeo surfaces readability data as measurement. It does not prescribe target scores, flag pages as failing, or recommend simplification. Whether a score warrants action depends on the content type, the intended audience, and the editorial intent behind the text.

What readability formulas cannot assess

Readability formulas operate on surface-level text statistics. They cannot evaluate whether an explanation is logically structured, whether a sentence conveys its intended meaning, whether technical terminology is appropriate for the audience, or whether the content achieves its communicative purpose. A page of grammatically simple nonsense will score as highly readable. A well-crafted technical explanation will score as difficult. Formulas also cannot account for formatting, visual hierarchy, illustrations, or interactive elements that affect how users actually process content. These are real dimensions of content quality that exist outside the measurement surface of any text-statistics formula.
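The "readable nonsense" failure mode is easy to demonstrate. The sketch below scores two sentences with Flesch Reading Ease (higher means easier), using a deliberately naive vowel-group syllable counter; both the heuristic and the sample sentences are illustrative assumptions, not Probeo's extraction pipeline.

```python
import re

def count_syllables(word: str) -> int:
    """Naive heuristic: count vowel groups, ignoring a trailing silent 'e'."""
    word = word.lower()
    if word.endswith("e") and len(word) > 2:
        word = word[:-1]
    return max(1, len(re.findall(r"[aeiouy]+", word)))

def reading_ease(text: str) -> float:
    """Flesch Reading Ease, computed from the heuristic syllable count."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))

nonsense = "The mauve spoon sang to the glad rock. The rock was a hat."
technical = "Idempotent retransmission semantics guarantee deterministic convergence."
print(reading_ease(nonsense) > reading_ease(technical))  # True
```

The semantically empty sentence scores as far easier than the precise technical one, because the formula only sees short words and short sentences.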

Content types have legitimately different readability levels

Medical documentation, legal terms, API references, and consumer-facing product descriptions operate at different reading levels by design. A legal disclosure written at a sixth-grade reading level would likely be imprecise. An API reference simplified to avoid polysyllabic words would lose technical accuracy. Readability scores are most informative when compared within a content type, not across content types. A help article that scores three grade levels higher than similar help articles on the same site is worth investigating. That same score compared against a terms-of-service page is meaningless.

How teams use readability data

Readability scores serve three operational purposes. First, drift detection: tracking scores over time on the same page or content section reveals when complexity has shifted, often gradually and without anyone noticing. Second, consistency monitoring: comparing scores across pages of the same type identifies outliers that may warrant editorial review. Third, comparative benchmarking: aggregate scores across a content section provide a baseline that new content can be measured against. In all three cases, the scores identify where to look. They do not determine what to do.
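The consistency-monitoring use can be sketched as a within-type outlier check. The page paths, scores, and the two-standard-deviation cutoff below are all illustrative assumptions; Probeo does not prescribe a flagging rule.

```python
from statistics import mean, stdev

def flag_outliers(scores: dict[str, float], threshold: float = 2.0) -> list[str]:
    """Flag pages whose grade-level score sits more than `threshold`
    standard deviations from the mean of their content type."""
    mu, sigma = mean(scores.values()), stdev(scores.values())
    if sigma == 0:
        return []
    return [page for page, s in scores.items() if abs(s - mu) / sigma > threshold]

# Hypothetical grade-level scores for help articles on one site.
help_articles = {
    "/help/billing": 8.1,
    "/help/invites": 7.9,
    "/help/exports": 8.4,
    "/help/search": 8.0,
    "/help/teams": 8.2,
    "/help/labels": 7.8,
    "/help/mobile": 8.3,
    "/help/sso": 12.6,   # several grade levels above its peers
}
print(flag_outliers(help_articles))  # ['/help/sso']
```

The flagged page is where to look, not a verdict: an SSO article may legitimately carry heavier vocabulary than a billing article, which is exactly the judgment the section above leaves to editors.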

Scope

Content quality analysis applies at the page level. Scores are calculated from the visible text content extracted during crawl. Navigation elements, footer text, and boilerplate are included in the extraction unless the page structure isolates them from the main content area. Readability scores are recalculated on each crawl, so they reflect the current state of the page content.

What becomes visible

  • Structural readability measurements across all crawled pages using 7 established formulas
  • Aggregate scores that synthesize individual formula outputs into composite metrics
  • Readability drift over time as content is edited, expanded, or restructured
  • Outlier pages within a content type whose scores differ significantly from their peers
  • The boundary between what automated text analysis can measure and what requires editorial judgment

Common questions

Does a low readability score mean the content is bad?
No. A low readability score means the text has structural properties associated with higher reading difficulty: longer sentences, more syllables per word, less common vocabulary. Whether that is a problem depends on the content type and audience. Technical documentation for engineers has legitimately different readability characteristics from a consumer-facing help article.
Should we set target readability scores for our content?
Targets are less useful than baselines. Establishing the current readability range for each content type gives you a reference point for detecting drift and identifying outliers. Prescribing a universal target across content types forces inappropriate simplification in some areas and unnecessary complexity in others.
Why does Probeo use multiple readability formulas instead of just one?
Each formula weights different text properties. Flesch-Kincaid emphasizes syllable density and sentence length. Dale-Chall measures vocabulary familiarity against a fixed word list. Gunning Fog is sensitive to polysyllabic words. Using multiple formulas and aggregate scores reduces dependence on any single formula's assumptions and produces a more stable signal.
Can readability scores detect content that is unclear or misleading?
No. Readability formulas operate on statistical text properties. A sentence can be grammatically simple, use common words, and score as highly readable while being vague, ambiguous, or factually wrong. Clarity, accuracy, and intent are outside the measurement surface of any formula-based approach.
How do readability scores relate to SEO?
Search engines do not use readability formula scores as a direct ranking signal. Content that is genuinely difficult to read may correlate with engagement metrics that search engines observe, but that relationship is indirect and confounded by content type, audience, and intent. Readability scores are a content quality signal for editorial teams, not an SEO metric.
What is the difference between individual formula scores and aggregate scores?
Individual formula scores reveal which specific text feature is driving difficulty: syllable density, sentence length, vocabulary familiarity. Aggregate scores combine multiple formula outputs into a composite that reduces noise from any single formula. Use individual scores for diagnosis. Use aggregate scores for monitoring and comparison.
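The diagnosis-versus-monitoring split can be sketched in a few lines. The per-formula estimates below are hypothetical, and aggregating with a plain mean is an illustrative assumption; Probeo's composite metrics may weight formulas differently.

```python
from statistics import mean

# Hypothetical grade-level estimates for one page from three formulas.
grade_estimates = {
    "flesch_kincaid": 9.9,
    "gunning_fog": 12.0,
    "smog": 10.4,
}

# Monitoring: a simple mean smooths out any single formula's assumptions.
aggregate = mean(grade_estimates.values())

# Diagnosis: the formula furthest from the aggregate points at the text
# feature driving difficulty (Fog is the most sensitive to polysyllables).
driver = max(grade_estimates, key=lambda f: abs(grade_estimates[f] - aggregate))
print(round(aggregate, 2), driver)  # 10.77 gunning_fog
```

Track the aggregate over time for drift; open the individual scores only when the aggregate moves and you need to know why.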