Observing the Intrinsic Value of Information

[Figure: Fragile Scarcity in a Tech Debt Spiral (High Scarcity, Poor Lifecycle/Stability, Low Validity, Poor Integrity). Author, with Gemini]

At some point we will want to turn the six Infonomics metrics into something an analyst can score, by listing what they can actually see and count. For each metric, we can organise the observables into a simple hierarchy (Level 1 dimension → Level 2 indicators → Level 3 measures).

Below is a practical scaffold, which can be adapted into a rubric or checklist, for the first of the six Infonomics metrics: the Intrinsic Value of Information (IVI). The IVI asks: how correct, complete and scarce is the data itself, regardless of how it is used?

1. Intrinsic Value of Information (IVI)

Question: How good is this data in itself?

Level 1 dimensions

  1. Validity / Accuracy
  2. Completeness / Coverage
  3. Scarcity / Uniqueness
  4. Lifecycle / Stability
  5. Integrity / Consistency

Level 2 indicators and observables

  • Validity / Accuracy
    • % records passing validation rules (type, range, referential integrity)
    • % sample records that match trusted external sources (e.g. registry, benchmark).
    • Number of critical data quality incidents per period.
  • Completeness / Coverage
    • % required fields populated.
    • Records as % of total potential universe (e.g. “80% of active customers represented”).
    • Presence of key entities/attributes needed for common analyses.
  • Scarcity / Uniqueness
    • Existence of exclusive sources (only you can collect it).
    • Difficulty for competitors to replicate (regulatory barriers, physical access).
    • % of data elements that are not available via public or commercial datasets.
  • Lifecycle / Stability
    • Typical useful life of a record before it becomes stale (days/months/years).
    • Observed drift rates over time (how fast means/distributions shift).
    • Frequency of schema changes or breakages.
  • Integrity / Consistency
    • Degree of duplication / conflict between systems (same entity, different values).
    • Existence and quality of MDM / golden records.
    • % of key entities with a single, reconciled identifier.
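
Several of the Level 3 measures above can be computed mechanically from a record set. A minimal sketch in Python, where the field names ("customer_id", "email", "age") and the validation rule are hypothetical examples, not part of the framework:

```python
# Sketch: two Level 3 measures computed over a list of records.
# Field names and the age-range rule are illustrative assumptions.

def validity_rate(records, rules):
    """% of records passing all validation rules (type/range checks)."""
    passing = sum(1 for r in records if all(rule(r) for rule in rules))
    return 100.0 * passing / len(records)

def completeness_rate(records, required_fields):
    """% of required field slots that are populated across all records."""
    total = len(records) * len(required_fields)
    filled = sum(1 for r in records
                 for f in required_fields
                 if r.get(f) not in (None, ""))
    return 100.0 * filled / total

records = [
    {"customer_id": 1, "email": "a@example.com", "age": 34},
    {"customer_id": 2, "email": "", "age": 151},  # missing email, invalid age
]
rules = [lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120]

print(validity_rate(records, rules))  # 50.0
print(round(completeness_rate(records, ["customer_id", "email", "age"]), 1))  # 83.3
```

The same pattern extends to the other measures (duplication rates, drift, schema-change counts) given access to the relevant metadata.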

How to use

  • For each dimension, define a 1–5 scale with concrete thresholds (e.g. 95%+ validity = 5, <70% = 1).
  • IVI score = weighted average of those five dimension scores, with weights tuned to context.
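
As a sketch, the threshold step for one dimension can be expressed as a simple banding function. The endpoints below (95%+ validity = 5, <70% = 1) come from the example above; the intermediate cut-offs are illustrative assumptions an analyst would tune:

```python
# Hypothetical 1-5 banding for the validity dimension.
# Endpoints (>=95 -> 5, <70 -> 1) are from the text; the 90 and 80
# cut-offs are illustrative assumptions.
def validity_score(pct_valid):
    if pct_valid >= 95:
        return 5
    if pct_valid >= 90:
        return 4
    if pct_valid >= 80:
        return 3
    if pct_valid >= 70:
        return 2
    return 1
```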

IVI scoring rubric (1–5)

Here’s a concrete 1–5 scoring rubric for IVI (Intrinsic Value of Information), parallel to the BVI and PVI structures.

Assume IVI is the weighted average of five dimensions:

  • Validity / accuracy – 30%
  • Completeness / coverage – 25%
  • Scarcity / uniqueness – 20%
  • Lifecycle / stability – 15%
  • Integrity / consistency – 10%

1. Validity / accuracy (weight 30%)

Question: How correct and error‑free is the data, objectively?

  • 1 – Very poor: High error rates; frequent contradictions with trusted sources. Data is often unusable without heavy manual correction. No systematic validation rules.
  • 2 – Poor: Many known errors; basic validation exists but is patchy. Spot checks routinely uncover material inaccuracies. Not trusted for critical work.
  • 3 – Adequate: Reasonable accuracy for most uses; automated validation catches obvious issues. Occasional material errors, but generally usable with some caution.
  • 4 – High: Strong validation and monitoring; rare material errors. Spot checks against reference data/ground truth show high agreement (e.g. >95% for key fields).
  • 5 – Very high: Accuracy approaches authoritative source levels for its domain. Rigorous validation, reconciliation with external references, and rapid correction of any discrepancies.

2. Completeness / coverage (weight 25%)

Question: How fully does the data describe the entities/events it is meant to cover?

  • 1 – Very incomplete: Many key fields are missing; large gaps in time or population. <50% of required attributes or records present for target use cases.
  • 2 – Incomplete: Important fields often missing; 50–70% of required attributes/records present. Analysts regularly struggle with gaps or need to impute heavily.
  • 3 – Adequate: 70–90% of required attributes and records present. Some gaps exist but are tolerable or can be mitigated for most use cases.
  • 4 – High: 90–98% of required attributes and records present. Only minor or low‑value gaps remain. Considered “good enough” for almost all analyses.
  • 5 – Near complete: >98% of required attributes and records present, with well‑understood and documented residual gaps. Often treated as the definitive record for its scope.
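
The completeness bands above map directly to a scoring function. A minimal sketch, where the boundary handling (>= at each cut-off, >98 for the top band) is an assumption about how the overlapping band edges should be resolved:

```python
# Completeness bands taken from the rubric above:
# <50 -> 1, 50-70 -> 2, 70-90 -> 3, 90-98 -> 4, >98 -> 5.
# Boundary handling at the shared edges is an assumption.
def completeness_score(pct_present):
    if pct_present > 98:
        return 5
    if pct_present >= 90:
        return 4
    if pct_present >= 70:
        return 3
    if pct_present >= 50:
        return 2
    return 1
```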

3. Scarcity / uniqueness (weight 20%)

Question: How difficult is it for others to obtain equivalent data?

  • 1 – Commodity: Essentially the same data is widely available (public/open data, low‑cost vendors). Easy for others to replicate or substitute.
  • 2 – Low scarcity: Slight differentiation (e.g. marginally better quality or convenience), but similar content is available from several external sources.
  • 3 – Moderately unique: Some elements are hard to reproduce (e.g. specific combinations of variables, moderate barriers to collection), but partial substitutes exist.
  • 4 – Highly unique: Most of the asset cannot be feasibly replicated by others due to technical, legal, or structural barriers (e.g. proprietary channels, exclusive rights).
  • 5 – Exclusive / strategic: Data is essentially unique to the holder (exclusive access, legal monopoly, or deeply entrenched structural advantage). Competitors cannot obtain a close substitute at any reasonable cost.

4. Lifecycle / stability (weight 15%)

Question: How long does the data remain valid and how stable is its meaning over time?

  • 1 – Very short‑lived / volatile: Data becomes obsolete very quickly (e.g. hours/days), with high drift and frequent redefinitions. Past values have limited ongoing utility.
  • 2 – Short‑lived: Useful life is limited (weeks) for most key uses; structure or semantics change often, requiring frequent rework.
  • 3 – Moderate life: Useful for months; some drift, but manageable with periodic recalibration. Occasional schema or definition changes.
  • 4 – Long‑lived: Useful for years; relatively stable semantics and distributions. Changes are controlled and well‑documented.
  • 5 – Very long‑lived / persistent: Data elements (e.g. physical characteristics, long‑term contractual attributes) retain relevance for many years with minimal drift. Acts as durable reference data.

5. Integrity / consistency (weight 10%)

Question: How internally coherent is the data across systems, records, and time?

  • 1 – Highly inconsistent: Many conflicting records for the same entity; IDs unstable or reused; frequent contradictions across systems. No single source of truth.
  • 2 – Inconsistent: Some attempts at reconciliation, but duplication and conflict are common. Analysts often build their own fixes to use the data.
  • 3 – Mostly consistent: Clear primary system for most entities, with occasional conflicts that can be resolved. Basic master data practices in place.
  • 4 – High integrity: Strong identity management and reconciliation. Golden records exist for key entities, and discrepancies are rare and quickly resolved.
  • 5 – Very high integrity: Comprehensive, well‑governed master data and reference data. Entity relationships and history (slowly changing attributes, versions) are accurately maintained across the ecosystem.

Computing the composite IVI score

  1. For a given dataset, assign 1–5 for each IVI dimension based on the descriptors.
  2. Compute a weighted average:

IVI_composite = 0.30·Va + 0.25·Cc + 0.20·Su + 0.15·Ls + 0.10·Ic

Where:

  • Va = Validity / accuracy score
  • Cc = Completeness / coverage score
  • Su = Scarcity / uniqueness score
  • Ls = Lifecycle / stability score
  • Ic = Integrity / consistency score

Optionally, normalise by dividing by 5 to get a 0–1 IVI index.
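
The composite calculation above can be sketched in a few lines of Python, using the weights given earlier; the example scores are invented for illustration:

```python
# Composite IVI from the five dimension scores (each 1-5),
# using the weights stated above.
WEIGHTS = {"Va": 0.30, "Cc": 0.25, "Su": 0.20, "Ls": 0.15, "Ic": 0.10}

def ivi_composite(scores, normalise=False):
    """Weighted average of the five dimension scores; optionally 0-1."""
    total = sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)
    return total / 5 if normalise else total

# Illustrative dataset: strong scarcity, weak lifecycle/stability.
scores = {"Va": 4, "Cc": 3, "Su": 5, "Ls": 2, "Ic": 3}
print(round(ivi_composite(scores), 2))                  # 3.55
print(round(ivi_composite(scores, normalise=True), 2))  # 0.71
```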