At some point we will want to turn the six Infonomics metrics into something an analyst can score, based on what they can see and count. For each metric, we can organise the observables into a simple hierarchy (Level 1 dimension → Level 2 indicators → Level 3 measures).
Below is a practical scaffold that can be adapted into a rubric or checklist for the first of the six Infonomics metrics, the Intrinsic Value of Information (IVI). IVI asks: how correct, complete, and scarce is the data itself, regardless of how it is used?
Question: How good is this data in itself?
Level 1 dimensions
- Validity / Accuracy
- Completeness / Coverage
- Scarcity / Uniqueness
- Lifecycle / Stability
- Integrity / Consistency
Level 2 indicators and observables
- Validity / Accuracy
- % records passing validation rules (type, range, referential integrity)
- % sample records that match trusted external sources (e.g. registry, benchmark).
- Number of critical data quality incidents per period.
- Completeness / Coverage
- % required fields populated.
- Records as % of total potential universe (e.g. “80% of active customers represented”).
- Presence of key entities/attributes needed for common analyses.
- Scarcity / Uniqueness
- Existence of exclusive sources (only you can collect it).
- Difficulty for competitors to replicate (regulatory barriers, physical access).
- % of data elements that are not available via public or commercial datasets.
- Lifecycle / Stability
- Typical useful life of a record before it becomes stale (days/months/years).
- Observed drift rates over time (how fast means/distributions shift).
- Frequency of schema changes or breakages.
- Integrity / Consistency
- Degree of duplication / conflict between systems (same entity, different values).
- Existence and quality of MDM / golden records.
- % of key entities with a single, reconciled identifier.
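The dimension → indicator scaffold above can be captured as a simple data structure, so that each dimension's observables can be enumerated and scored programmatically. This is a minimal sketch; the dimension and indicator names follow the lists above, but the structure itself (a plain dict keyed by dimension) is an illustrative choice, not part of the Infonomics framework.

```python
# Level 1 dimensions mapped to their Level 2 indicators, following the
# lists above. Names are abbreviated; the dict shape is an assumption.
IVI_SCAFFOLD = {
    "validity_accuracy": [
        "% records passing validation rules",
        "% sample records matching trusted external sources",
        "critical data quality incidents per period",
    ],
    "completeness_coverage": [
        "% required fields populated",
        "records as % of total potential universe",
        "presence of key entities/attributes for common analyses",
    ],
    "scarcity_uniqueness": [
        "existence of exclusive sources",
        "difficulty for competitors to replicate",
        "% of data elements not available externally",
    ],
    "lifecycle_stability": [
        "typical useful life of a record",
        "observed drift rates over time",
        "frequency of schema changes or breakages",
    ],
    "integrity_consistency": [
        "duplication/conflict between systems",
        "existence and quality of MDM / golden records",
        "% of key entities with a single reconciled identifier",
    ],
}
```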
How to use
- For each dimension, define a 1–5 scale with concrete thresholds (e.g. 95%+ validity = 5, <70% = 1).
- The IVI score is the weighted average of the five dimension scores, with weights tuned to context.
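The threshold idea can be sketched as a small function that maps a measured validity percentage onto the 1–5 scale. The 95%+ → 5 and <70% → 1 anchors come from the example above; the intermediate bands are illustrative assumptions that would be tuned per organisation.

```python
def score_validity(pct_valid: float) -> int:
    """Map a measured validity percentage to a 1-5 score.

    The 95%+ -> 5 and <70% -> 1 anchors follow the example thresholds
    in the text; the intermediate bands are assumptions.
    """
    if pct_valid >= 95:
        return 5
    if pct_valid >= 90:
        return 4
    if pct_valid >= 80:
        return 3
    if pct_valid >= 70:
        return 2
    return 1
```

A similar banded function would be defined per dimension, with thresholds matched to that dimension's observables.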
IVI scoring rubric (1–5)
Here’s a concrete 1–5 scoring rubric for IVI (Intrinsic Value of Information), parallel to the BVI and PVI structures.
Assume IVI is the weighted average of five dimensions:
- Validity / accuracy – 30%
- Completeness / coverage – 25%
- Scarcity / uniqueness – 20%
- Lifecycle / stability – 15%
- Integrity / consistency – 10%
1. Validity / accuracy (weight 30%)
Question: How correct and error‑free is the data, objectively?
| Score | Descriptor |
| --- | --- |
| 1 – Very poor | High error rates; frequent contradictions with trusted sources. Data is often unusable without heavy manual correction. No systematic validation rules. |
| 2 – Poor | Many known errors; basic validation exists but is patchy. Spot checks routinely uncover material inaccuracies. Not trusted for critical work. |
| 3 – Adequate | Reasonable accuracy for most uses; automated validation catches obvious issues. Occasional material errors, but generally usable with some caution. |
| 4 – High | Strong validation and monitoring; rare material errors. Spot checks against reference data/ground truth show high agreement (e.g. >95% for key fields). |
| 5 – Very high | Accuracy approaches authoritative source levels for its domain. Rigorous validation, reconciliation with external references, and rapid correction of any discrepancies. |
2. Completeness / coverage (weight 25%)
Question: How fully does the data describe the entities/events it is meant to cover?
| Score | Descriptor |
| --- | --- |
| 1 – Very incomplete | Many key fields are missing; large gaps in time or population. <50% of required attributes or records present for target use cases. |
| 2 – Incomplete | Important fields often missing; 50–70% of required attributes/records present. Analysts regularly struggle with gaps or need to impute heavily. |
| 3 – Adequate | 70–90% of required attributes and records present. Some gaps exist but are tolerable or can be mitigated for most use cases. |
| 4 – High | 90–98% of required attributes and records present. Only minor or low‑value gaps remain. Considered “good enough” for almost all analyses. |
| 5 – Near complete | >98% of required attributes and records present, with well‑understood and documented residual gaps. Often treated as the definitive record for its scope. |
3. Scarcity / uniqueness (weight 20%)
Question: How difficult is it for others to obtain equivalent data?
| Score | Descriptor |
| --- | --- |
| 1 – Commodity | Essentially the same data is widely available (public/open data, low‑cost vendors). Easy for others to replicate or substitute. |
| 2 – Low scarcity | Slight differentiation (e.g. marginally better quality or convenience), but similar content is available from several external sources. |
| 3 – Moderately unique | Some elements are hard to reproduce (e.g. specific combinations of variables, moderate barriers to collection), but partial substitutes exist. |
| 4 – Highly unique | Most of the asset cannot be feasibly replicated by others due to technical, legal, or structural barriers (e.g. proprietary channels, exclusive rights). |
| 5 – Exclusive / strategic | Data is essentially unique to the holder (exclusive access, legal monopoly, or deeply entrenched structural advantage). Competitors cannot obtain a close substitute at any reasonable cost. |
4. Lifecycle / stability (weight 15%)
Question: How long does the data remain valid and how stable is its meaning over time?
| Score | Descriptor |
| --- | --- |
| 1 – Very short‑lived / volatile | Data becomes obsolete very quickly (e.g. hours/days), with high drift and frequent redefinitions. Past values have limited ongoing utility. |
| 2 – Short‑lived | Useful life is limited (weeks) for most key uses; structure or semantics change often, requiring frequent rework. |
| 3 – Moderate life | Useful for months; some drift, but manageable with periodic recalibration. Occasional schema or definition changes. |
| 4 – Long‑lived | Useful for years; relatively stable semantics and distributions. Changes are controlled and well‑documented. |
| 5 – Very long‑lived / persistent | Data elements (e.g. physical characteristics, long‑term contractual attributes) retain relevance for many years with minimal drift. Acts as durable reference data. |
5. Integrity / consistency (weight 10%)
Question: How internally coherent is the data across systems, records, and time?
| Score | Descriptor |
| --- | --- |
| 1 – Highly inconsistent | Many conflicting records for the same entity; IDs unstable or reused; frequent contradictions across systems. No single source of truth. |
| 2 – Inconsistent | Some attempts at reconciliation, but duplication and conflict are common. Analysts often build their own fixes to use the data. |
| 3 – Mostly consistent | Clear primary system for most entities, with occasional conflicts that can be resolved. Basic master data practices in place. |
| 4 – High integrity | Strong identity management and reconciliation. Golden records exist for key entities, and discrepancies are rare and quickly resolved. |
| 5 – Very high integrity | Comprehensive, well‑governed master data and reference data. Entity relationships and history (slowly changing attributes, versions) are accurately maintained across the ecosystem. |
Computing the composite IVI score
- For a given dataset, assign 1–5 for each IVI dimension based on the descriptors.
- Compute a weighted average:
IVI_composite = 0.30·Va + 0.25·Cc + 0.20·Su + 0.15·Ls + 0.10·Ic
Where:
- Va = Validity / accuracy score
- Cc = Completeness / coverage score
- Su = Scarcity / uniqueness score
- Ls = Lifecycle / stability score
- Ic = Integrity / consistency score
Optionally, normalise by dividing by 5 to obtain a 0–1 IVI index.
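The composite calculation above can be sketched as a short function. The weights match those stated earlier; the dimension key names are illustrative assumptions.

```python
# Weights from the rubric above (must sum to 1.0).
WEIGHTS = {
    "validity_accuracy": 0.30,
    "completeness_coverage": 0.25,
    "scarcity_uniqueness": 0.20,
    "lifecycle_stability": 0.15,
    "integrity_consistency": 0.10,
}

def ivi_composite(scores: dict, normalise: bool = False) -> float:
    """Weighted average of the five 1-5 dimension scores.

    With normalise=True the result is divided by 5 to give a 0-1 index.
    """
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the five IVI dimensions")
    composite = sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)
    return composite / 5 if normalise else composite
```

For example, a dataset scoring 4, 3, 5, 2, 4 on the five dimensions would yield 0.30·4 + 0.25·3 + 0.20·5 + 0.15·2 + 0.10·4 = 3.65, or 0.73 on the normalised 0–1 index.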