PRISM™ Methodology

How the PLI is built

Universities write differently from banks. Italian institutions differ from Swedish ones. These are measurable differences, and they matter.

The Observatory uses PRISM™ to track how European institutions communicate. We start with historical data from 2015–2019, before AI writing tools were widely available, and build empirical baselines for how different types of organisations in different countries actually wrote. Once those norms are established, we measure drift: how far current language has moved from those patterns.

If Italian universities averaged 15 cultural markers per 1,000 tokens in 2017 and now average 8, that is a measurable shift towards generic language. PRISM™ quantifies this shift across three levels of analysis, with statistical validation at every step.

A document entering the pipeline passes through three levels of analysis:

Level 1: Atomic Signals (run on every document). Perplexity, Burstiness, Lexical Diversity, Readability, Cultural Markers, Idiom Density, Generic Phrases, Jargon Load, Citation Density, Hedging Language.

Level 2: Composite Detectors (each combining multiple Level 1 signals). TONE, CULTURAL, COMPLIANCE, PERFORMANCE.

Level 3: Observatory Indices (the headline findings). TEMPORAL, within-organisation drift against an organisation's own baseline; HOMOGENISATION, cross-organisation convergence across sectors.

Level 1 runs on every document. Ten atomic tests, each measuring a specific, independently verifiable property of text. A perplexity reading tells you something. A burstiness reading tells you something else. Together, they form the foundation for everything built on top of them.

Level 2 combines those signals into four composite detectors. Where Level 1 asks "what does this text look like?", Level 2 asks "what is happening to this text?" Each detector focuses on a specific dimension: how the language sounds, how culturally grounded it remains, how transparent and well-evidenced it is, and how stable the overall linguistic profile has stayed.

Level 3 produces the Observatory's two headline indices. The TEMPORAL index tracks how much each organisation's language has changed from its own baseline. The HOMOGENISATION index tracks whether different sectors and countries are beginning to sound alike. Together, they map the structural impact of AI on institutional communication across Europe.

PRISM™ measures change. It does not assume change is bad. An organisation that has deliberately repositioned its voice, whether through rebranding, a new editorial strategy, or a shift in target audience, will register drift on the TEMPORAL index. What matters is what the HOMOGENISATION index shows alongside it. If the organisation's language has changed but remains distinctively its own, that is evolution: a voice moving forward on its own terms. If the language has changed and now resembles everyone else's, that is erosion: distinctiveness lost to a shared algorithmic template. The two indices read together tell you which has occurred.

Architecture Overview

Figure 1 maps all ten L1 tests through the four L2 composite detectors and into the two L3 indices.


Figure 1. PRISM™ architecture: 10 atomic L1 tests → 4 L2 composite detectors → 2 L3 indices → PLI score. L4–L5 VOICE modules are commercial extensions not included in Observatory data.

Level 1: Atomic Tests (10 metrics)

Each test targets a specific, independently measurable property of text. Together they create a multidimensional fingerprint of how an organisation writes.

Perplexity

How predictable are the word choices? Human writers make surprising decisions: unexpected metaphors, unusual juxtapositions, distinctive phrasings. AI systems favour high-probability continuations. This difference is detectable.

We calculate perplexity using trigram language models trained on pre-AI institutional text (2015–2019). Lower perplexity means more predictable text, a potential AI signature. The test works across all languages because it measures statistical patterns, not vocabulary.

Formulaic human writing (annual report boilerplate, regulatory disclosures) also shows low perplexity. One signal among many, never used in isolation.
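As a sketch of the computation, assuming an add-k smoothed trigram model and hypothetical count structures (the methodology specifies only that trigram models are trained on 2015–2019 institutional text):

```python
import math

def trigram_perplexity(tokens, trigram_counts, bigram_counts, vocab_size, k=1.0):
    # Perplexity under an add-k smoothed trigram model whose counts come
    # from the 2015-2019 baseline corpus. Add-k smoothing and these count
    # structures are illustrative assumptions, not PRISM's documented setup.
    padded = ["<s>", "<s>"] + tokens + ["</s>"]
    log_prob = 0.0
    for i in range(2, len(padded)):
        context = (padded[i - 2], padded[i - 1])
        trigram = (*context, padded[i])
        p = (trigram_counts.get(trigram, 0) + k) / \
            (bigram_counts.get(context, 0) + k * vocab_size)
        log_prob += math.log2(p)
    n = len(padded) - 2  # number of predicted positions
    return 2 ** (-log_prob / n)  # lower = more predictable text
```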

Burstiness

Human writers vary their sentence lengths naturally. Short for emphasis. Long for explanation. Medium for progression. This creates rhythm. AI systems tend toward uniformity, producing sentences that cluster around the mean length. Grammatically correct, semantically coherent, rhythmically monotonous.

We measure burstiness as the coefficient of variation in sentence lengths. High burstiness means human-like variation. Low burstiness suggests AI influence.

Some genres conventionally show low burstiness, including technical specifications and regulatory filings. Genre classification accounts for this.
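Because the definition is just the coefficient of variation, the computation is short. A minimal sketch, assuming sentences are already segmented and whitespace tokenisation is adequate for length counting:

```python
import statistics

def burstiness(sentences):
    # Coefficient of variation (stdev / mean) of sentence lengths in tokens.
    # High values = human-like rhythmic variation; low values = uniformity.
    lengths = [len(s.split()) for s in sentences]
    return statistics.stdev(lengths) / statistics.mean(lengths)

print(round(burstiness([
    "Short for emphasis.",
    "Longer sentences carry the explanatory weight, unfolding clause by clause.",
    "Medium ones keep things moving.",
]), 2))
```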

Lexical Diversity

Vocabulary richness, measured using Type-Token Ratio: unique words divided by total words. TTR decreases naturally with document length, so we apply normalisation. Language-specific adjustments handle morphologically rich languages (German creates compound words freely, Finnish uses extensive case marking).

Low TTR may indicate AI generation or simply reflect constrained subject matter. Cross-referencing with other tests disambiguates.
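One common normalisation is the moving-average TTR, which computes TTR over fixed-size windows and averages the results; the sketch below uses a 500-token window as an illustrative parameter, not PRISM™'s documented one:

```python
def mattr(tokens, window=500):
    # Moving-average type-token ratio: mean TTR over fixed-size windows.
    # Plain TTR falls with document length; a fixed window removes that
    # dependence. The 500-token window is an illustrative parameter.
    if len(tokens) <= window:
        return len(set(tokens)) / len(tokens)
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)
```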

Readability

Sentence and word complexity, measured using adapted Flesch-Kincaid and language-specific equivalents. AI-generated text tends toward a narrow readability band: accessible but not simple, complex but not demanding. Human institutional writing shows more variation, including deliberate complexity for specialist audiences and deliberate simplicity for public communication.

Readability norms vary significantly by sector and audience. A government document for citizens should read differently from a central bank research paper. Sector baselines account for this.
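For reference, the published English Flesch-Kincaid grade-level formula appears below; the language-specific adaptations mentioned above would recalibrate these coefficients:

```python
def flesch_kincaid_grade(words, sentences, syllables):
    # Published English coefficients; language-specific adaptations
    # would recalibrate these.
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# 120 words, 8 sentences, 180 syllables -> roughly grade 8
print(round(flesch_kincaid_grade(120, 8, 180), 1))  # 8.0
```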

Cultural Markers

References to local context: place names, cultural figures, historical events, regional idioms, local institutions. These are the signals that root a text in a specific place and tradition. AI systems trained predominantly on English-language data tend to produce culturally generic output, even when writing in other languages.

We use language-specific lexicons of cultural markers, validated against institutional corpora from each country. Declining cultural marker density is one of the strongest signals in the Observatory corpus.

Lexicon coverage is uneven across languages. Italian and French lexicons are more developed than Finnish or Portuguese. Expansion is ongoing.
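The density computation itself is simple counting against the lexicon, reported per 1,000 tokens to match the figures quoted earlier; the same pattern, with a different lexicon, underlies the idiom density, generic phrase, jargon, citation, and hedging measurements below. Single-token matching is a simplification, since real lexicons include multi-word entries:

```python
def density_per_1000(tokens, lexicon):
    # Lexicon hits per 1,000 tokens. Single-token, lower-cased matching
    # is a simplification: real lexicons include multi-word entries.
    hits = sum(1 for t in tokens if t.lower() in lexicon)
    return 1000 * hits / len(tokens)
```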

Idiom Density

Idiomatic expressions, fixed phrases, and figurative language. These are among the hardest elements for AI systems to produce naturally. They require cultural knowledge, contextual judgement, and a feel for register. AI-generated text tends to avoid them, defaulting to literal, compositional language.

We measure idiom density using validated phrase libraries for each language, cross-referenced against institutional genre norms.

Formal institutional writing uses fewer idioms than informal communication. Genre baselines prevent false positives.

Generic Phrases

The presence of phrases that appear across multiple sectors, organisations, and languages at high frequency in AI-generated text. "We are committed to", "our mission is to", "in today's rapidly changing world". These phrases are not wrong. They are simply everywhere, and their prevalence is a measurable signal.

We maintain a corpus-derived list of generic phrases, updated quarterly as new patterns emerge.

Some generic phrases are genuinely appropriate in specific contexts (mission statements, press releases). Context classification reduces false positives.

Jargon Load

Sector-specific technical vocabulary. Jargon is legitimate: it communicates precisely within a specialist community. But jargon patterns are changing. AI systems sometimes over-apply jargon (producing technically dense text that lacks substance) or under-apply it (producing accessible text that lacks precision). Both are detectable deviations from sector norms.

Jargon lexicons require continuous maintenance as terminology evolves. Sector specialists review updates quarterly.

Citation Density

References to external sources, data, research, and evidence. Human institutional writers, particularly in academic and government contexts, anchor claims in evidence. AI systems can produce plausible-sounding claims without evidence, or produce citation patterns that differ from human norms (over-citing, under-citing, or citing in unusual positions within a document).

Citation norms vary dramatically by sector. Academic text has very different citation patterns from corporate communications. Sector baselines are essential.

Hedging Language

Epistemic markers: "may", "might", "could", "appears to", "suggests that". Human writers hedge when they are uncertain. AI systems can over-hedge (producing excessively cautious text) or under-hedge (producing overconfident assertions). Both patterns deviate from human institutional norms.

We measure hedging density and compare against sector and genre baselines.

Hedging norms vary by genre and regulatory context. Legal and compliance documents hedge more than press releases. Genre classification is required.

Level 2: Composite Detectors (4 dimensions)

Each detector combines multiple Level 1 signals to capture a distinct aspect of how institutional language changes.

TONE

How has the emotional register shifted? TONE combines hedging, generic phrases, and readability into a composite picture. A rising score suggests communications are becoming more cautious, more generic, or more formulaic, consistent with AI-assisted production where human editorial voice is gradually displaced by algorithmic defaults.

Tone shifts can reflect legitimate strategic decisions. The detector flags the shift; interpretation requires context.

CULTURAL

Is the text losing its locally grounded language? CULTURAL combines cultural markers, idiom density, and jargon load. This detector is central to the Observatory's European mission. As AI tools trained predominantly on English-language data proliferate, European organisations may unconsciously adopt Anglophone communication patterns, eroding the linguistic diversity that reflects genuine cultural identity.

Internationalisation strategies legitimately reduce cultural specificity. The detector cannot distinguish intentional globalisation from unintentional erosion without human review.

COMPLIANCE

Has the language become more cautious and defensive? COMPLIANCE combines hedging, citation density, and readability. AI-generated drafts frequently over-hedge and under-cite, producing text that reads as if written by a committee trying to avoid saying anything definitive. When this pattern appears across an organisation's output, it signals systematic reliance on AI-generated first drafts without sufficient editorial intervention.

Increased compliance language may reflect genuine regulatory pressure (GDPR, AI Act). Sector-specific baselines help disambiguate.

PERFORMANCE

The three most language-agnostic signals combined: perplexity, burstiness, and lexical diversity. These metrics work across all languages and require no cultural or domain-specific databases. When all three shift simultaneously toward AI-characteristic patterns, the probability of genuine AI adoption is high.

Highly technical writing in narrow domains can score as AI-like due to constrained vocabulary and predictable sentence patterns. Genre-specific baselines mitigate this.
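A hedged sketch of how such a composite might be assembled, assuming equal weighting and per-signal baseline means and standard deviations (the methodology text does not publish the actual weights):

```python
def performance(doc, baseline):
    # Mean z-score of the three language-agnostic signals, signed so that
    # positive values mean movement toward AI-typical patterns (all three
    # signals tend to fall under AI influence). Equal weights are assumed.
    signals = ("perplexity", "burstiness", "lexical_diversity")
    zs = [(baseline[s]["mean"] - doc[s]) / baseline[s]["std"] for s in signals]
    return sum(zs) / len(zs)
```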

Level 3: Observatory Indices (2 indices)

The Observatory's headline findings. These aggregate Level 1 and Level 2 results across time and across organisations.

TEMPORAL

How much has this organisation's language changed from its 2015–2019 baseline? By comparing each organisation to itself, the TEMPORAL index controls for sector conventions, cultural norms, and institutional style. A luxury brand is measured against its own past voice, not against a government agency's baseline.

The TEMPORAL index is the foundation of the PLI score. It captures the magnitude of linguistic change and provides early warning when an organisation's voice begins to shift significantly.
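Purely as a sketch (the published text does not specify the aggregation), within-organisation drift can be expressed as a distance between current metric values and the organisation's own baseline distribution:

```python
import math

def temporal_drift(current, base_mean, base_std):
    # Root-mean-square of per-metric z-scores against the organisation's
    # own 2015-2019 baseline. The RMS combiner is an assumption.
    zs = [(current[m] - base_mean[m]) / base_std[m] for m in current]
    return math.sqrt(sum(z * z for z in zs) / len(zs))
```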

HOMOGENISATION

Are different organisations, sectors, and countries beginning to sound alike? Individual drift is interesting. Collective convergence is alarming. When a European luxury automaker and a French maison produce linguistically identical communications — organisations with distinct histories, audiences, and product categories — something systemic is at work.

The index computes pairwise similarity scores across the entire corpus and tracks how those distributions evolve. A narrowing distribution means AI tools are imposing a shared linguistic template across institutional boundaries.
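A minimal sketch of that measurement, assuming each organisation is represented by a vector of its L1 scores; cosine similarity is an illustrative choice of pairwise measure:

```python
import itertools
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def similarity_spread(profiles):
    # Mean and standard deviation of all pairwise similarities between
    # organisations' L1 feature vectors. A rising mean with a shrinking
    # spread across quarters signals convergence on a shared template.
    sims = [cosine(a, b) for a, b in itertools.combinations(profiles, 2)]
    mean = sum(sims) / len(sims)
    sd = math.sqrt(sum((s - mean) ** 2 for s in sims) / len(sims))
    return mean, sd
```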

Baselines

Every PRISM™ measurement is relative to a baseline. Baselines are calculated from the 2015–2019 institutional archive: how each organisation, in each sector, in each country, actually wrote before AI writing tools were widely available.

Baselines are organisation-specific, not sector averages. A luxury brand and a government ministry have different baselines, even if they operate in the same country. This specificity is what makes PRISM™ results meaningful: drift is measured against the organisation's own voice, not against a generic norm.

Baselines are updated annually as the archive expands and as validation improves. Organisations that joined the corpus after 2019 use a retrospective baseline calculated from their earliest available documents.

Scoring and normalisation

All PRISM™ scores are normalised to a 0–100 scale. 100 represents the highest observed distinctiveness in the corpus (2015–2019 baseline period). 0 represents complete convergence with the corpus mean.

Scores are calculated at document level and aggregated to organisation level using a weighted mean that accounts for document length, recency, and genre. Sector benchmarks are calculated from all organisations in the sector, weighted equally.
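A sketch of the aggregation step, assuming the three weighting factors combine multiplicatively (the text names the factors but not the combiner):

```python
def organisation_score(docs):
    # Weighted mean of document-level scores. Each doc dict carries its
    # score plus length, recency, and genre weights (hypothetical keys);
    # multiplicative combination of the weights is an assumption.
    weights = [d["length_w"] * d["recency_w"] * d["genre_w"] for d in docs]
    return sum(w * d["score"] for w, d in zip(weights, docs)) / sum(weights)
```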

Confidence intervals are reported for all scores. Organisations with fewer than 50 documents in the analysis period show wider confidence intervals. The Observatory displays confidence intervals in all detailed views.

Data Sources

The Observatory corpus draws from two channels.

Web archives. Internet Archive (Wayback Machine) is the primary source, providing historical snapshots of institutional websites dating back to 2015. Archive.today, Arquivo.pt (the Portuguese web archive, with broad European coverage), and Common Crawl provide supplementary sources and cross-validation. Multiple archives ensure that no single source's collection biases dominate the corpus.

Academic partnerships. Validated datasets, specialist corpora, and cross-disciplinary linguistic research from collaborations with European research institutes and universities. These partnerships strengthen baseline quality and expand language coverage.

All documents undergo language detection, content extraction with boilerplate removal, tokenisation, and quality filtering. Exclusions are recorded with specific reasons. Every datapoint carries a cryptographic timestamp (see below).

Blockchain Verification

Every datapoint and every PRISM™ analysis result is cryptographically timestamped using OpenTimestamps on the Bitcoin blockchain. A research platform making claims about how language has changed over a decade must be able to prove when its evidence was collected and that results have not been altered after the fact. Blockchain verification provides this guarantee without requiring trust in any single institution, including CopyMama itself.
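As an illustration of the workflow rather than the Observatory's actual pipeline, a datapoint can be hashed and stamped with the open-source OpenTimestamps client, which writes a proof file that anyone can later verify against the Bitcoin blockchain:

```python
import hashlib
import subprocess

def timestamp_datapoint(path):
    # Record the SHA-256 digest, then stamp the file with the open-source
    # OpenTimestamps client (pip install opentimestamps-client), which
    # writes a <path>.ots proof anchored to the Bitcoin blockchain.
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    subprocess.run(["ots", "stamp", path], check=True)
    return digest

# Anyone can later verify independently: `ots verify <path>.ots`
```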

For academic partners, every dataset contributed to the Observatory is independently verifiable. For the research community, the Observatory's findings are auditable from raw datapoint to published index. Early high-scoring authenticity results become increasingly valuable over time: timestamped proof of genuine human communication patterns recorded before widespread AI adoption reshaped the institutional linguistic landscape.

Interpretation and Appropriate Use

PRISM™ results should be interpreted within institutional context. Sector norms vary. Regional linguistic practices differ across EU member states. Genre conventions influence legitimate language patterns. Temporal trends require historical perspective.

Valid applications include longitudinal monitoring of communication patterns, comparative analysis across sectors or regions, baseline establishment for organisational standards, and research into linguistic evolution and AI impact.

Invalid applications include determining individual authorship or intent, enforcement actions based on automated analysis alone, definitive AI detection without human review, and policy decisions without contextual investigation.

Current Limitations

Language coverage currently encompasses English, Italian, French, German, and Spanish, with Finnish in early collection. Expansion to all 24 EU official languages is underway through academic partnerships.

AI detection reflects the state of the field. Current stylistic analysis detects patterns associated with LLM output but cannot definitively prove AI authorship. Hybrid human-AI workflows present particular challenges.

Baseline validation is ongoing. Current baselines derive from 2015–2019 institutional archives. Comprehensive establishment requires larger validated corpora across all sectors, regions, and languages. This is the core scaling objective of the Observatory's collection programme.

How to Cite

PLI Observatory. Public Language Index, PRISM v2.1. Milan: CopyMama S.R.L. Available at: pli.copymama.it

For academic use, please include the methodology version and data retrieval date.

Why Statistics, Not LLMs

PRISM™ uses statistical and computational linguistics methods — not large language models — to measure institutional text. This is a deliberate methodological choice, not a technical constraint.

The problem with using AI to detect AI influence is circularity. An LLM trained on post-2022 text has already absorbed the patterns it is being asked to identify. Its judgements about what sounds “AI-like” are contaminated by its own training distribution. Statistical measures — perplexity, burstiness, type-token ratio, idiom density — are derived from the text itself, not from a model’s prior expectations about what text should look like.

Statistical measures are also independently verifiable. A perplexity score of 42.7 is reproducible: run the same model on the same text and you get the same number. An LLM’s qualitative assessment of “this reads like AI” is not. For a research instrument designed to track change over time across 600 organisations, reproducibility is not optional.

Finally, statistical measures do not drift. An LLM’s sense of what constitutes “natural” institutional language will shift as its training data evolves. A burstiness calculation does not. The Observatory’s 2015 baseline documents will score identically in 2025 and in 2035. That stability is what makes longitudinal comparison meaningful.

PRISM™ does use language models in one limited context: the dimension interpretations on organisation profile pages, where Mistral generates plain-language explanations of what a specific score means for a specific organisation. These interpretations are clearly labelled, cached, and generated from the statistical scores — not used to produce the scores themselves.

Frequently Asked Questions

Does a high PLI score mean an organisation is using AI?

Not necessarily. A high PLI score means an organisation's language has drifted significantly from its own 2015–2019 baseline toward patterns that are statistically associated with AI-assisted production. That drift could reflect AI adoption, a deliberate editorial rebrand, staff turnover, or a shift in target audience.

PRISM™ measures linguistic change. It does not attribute cause. What it can say is: this organisation's language looks measurably different from how it used to write, and the direction of that change is consistent with AI influence across these specific dimensions.

Why focus on Europe?

Two reasons. First, European institutional communication is linguistically diverse — 24 official EU languages, distinct national communication cultures, and a regulatory environment that creates specific genre norms. This diversity makes AI-driven homogenisation both more detectable and more consequential.

Second, the EU AI Act creates a specific accountability context. European institutions are subject to transparency requirements that make linguistic analysis both legally relevant and practically actionable.

How is this different from AI detection tools?

AI detection tools attempt to classify individual documents as AI-generated or human-written. The Observatory does not. It tracks longitudinal change across institutional corpora — how an organisation's language evolves over years, not whether a specific document was written by a human or a machine.

This distinction matters. A single document classified as AI-generated tells you something about that document. A corpus showing consistent drift across 200 documents over three years tells you something about an organisation's communication strategy.

Can organisations access their own detailed results?

Yes. Organisations can request their full PRISM™ profile, including dimension scores, baseline comparisons, and sector rankings, through the VOICE client module. The Observatory's public-facing data shows aggregate patterns and sector rankings; detailed organisation-level data is available under the client programme.

How often is the data updated?

The corpus is updated continuously. The scraper runs daily, collecting current-period documents from the Wayback Machine archive. Scores are recalculated quarterly to incorporate new documents and any methodology updates. The baseline (2015–2019) is fixed and does not change.

Which languages does PRISM™ cover?

PRISM™ currently analyses documents in English, French, German, Italian, Spanish, Dutch, Polish, Swedish, Danish, and Finnish. Language-specific models and lexicons are maintained for each. Coverage is uneven: English, French, Italian, and German have the most developed lexicons. Expansion to additional EU languages is ongoing.

Research Collaboration

The Observatory actively seeks collaboration with European computational linguistics research institutions for methodology validation, baseline corpus development, cross-linguistic analysis, and publication of research findings.

Research enquiries: research@copymama.it

Methodology version 2.1 · Last reviewed February 2026 · Quarterly updates incorporating academic feedback