At some point we will want to turn the six Infonomics metrics into something an analyst can score, based on what they can see and count. For each metric, we can organise the observables into a simple hierarchy (Level 1 dimension → Level 2 indicators → Level 3 measures).
Below is a practical scaffold that can be adapted into a rubric or checklist for the first of the six Infonomics metrics, the Intrinsic Value of Information (IVI). IVI asks: how correct, complete, and scarce is the data itself, regardless of how it is used?
Question: How good is this data in itself?
Level 1 dimensions
- Validity / Accuracy
- Completeness / Coverage
- Scarcity / Uniqueness
- Lifecycle / Stability
- Integrity / Consistency
Level 2 indicators and observables
- Validity / Accuracy
- % records passing validation rules (type, range, referential integrity)
- % sample records that match trusted external sources (e.g. registry, benchmark).
- Number of critical data quality incidents per period.
- Completeness / Coverage
- % required fields populated.
- Records as % of total potential universe (e.g. “80% of active customers represented”).
- Presence of key entities/attributes needed for common analyses.
- Scarcity / Uniqueness
- Existence of exclusive sources (only you can collect it).
- Difficulty for competitors to replicate (regulatory barriers, physical access).
- % of data elements that are not available via public or commercial datasets.
- Lifecycle / Stability
- Typical useful life of a record before it becomes stale (days/months/years).
- Observed drift rates over time (how fast means/distributions shift).
- Frequency of schema changes or breakages.
- Integrity / Consistency
- Degree of duplication / conflict between systems (same entity, different values).
- Existence and quality of MDM / golden records.
- % of key entities with a single, reconciled identifier.
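The dimension → indicator scaffold above can be captured as a simple data structure, so that each dimension's observables can be enumerated and scored programmatically. This is a minimal sketch; the dimension and indicator names follow the lists above, but the structure itself (a plain dict keyed by dimension) is an illustrative choice, not part of the Infonomics framework.

```python
# Level 1 dimensions mapped to their Level 2 indicators, following the
# lists above. Names are abbreviated; the dict shape is an assumption.
IVI_SCAFFOLD = {
    "validity_accuracy": [
        "% records passing validation rules",
        "% sample records matching trusted external sources",
        "critical data quality incidents per period",
    ],
    "completeness_coverage": [
        "% required fields populated",
        "records as % of total potential universe",
        "presence of key entities/attributes for common analyses",
    ],
    "scarcity_uniqueness": [
        "existence of exclusive sources",
        "difficulty for competitors to replicate",
        "% of data elements not available externally",
    ],
    "lifecycle_stability": [
        "typical useful life of a record",
        "observed drift rates over time",
        "frequency of schema changes or breakages",
    ],
    "integrity_consistency": [
        "duplication/conflict between systems",
        "existence and quality of MDM / golden records",
        "% of key entities with a single reconciled identifier",
    ],
}
```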
How to use
- For each dimension, define a 1–5 scale with concrete thresholds (e.g. 95%+ validity = 5, <70% = 1).
- The IVI score is the weighted average of the five dimension scores, with weights tuned to context.
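The threshold idea can be sketched as a small function that maps a measured validity percentage onto the 1–5 scale. The 95%+ → 5 and <70% → 1 anchors come from the example above; the intermediate bands are illustrative assumptions that would be tuned per organisation.

```python
def score_validity(pct_valid: float) -> int:
    """Map a measured validity percentage to a 1-5 score.

    The 95%+ -> 5 and <70% -> 1 anchors follow the example thresholds
    in the text; the intermediate bands are assumptions.
    """
    if pct_valid >= 95:
        return 5
    if pct_valid >= 90:
        return 4
    if pct_valid >= 80:
        return 3
    if pct_valid >= 70:
        return 2
    return 1
```

A similar banded function would be defined per dimension, with thresholds matched to that dimension's observables.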
IVI scoring rubric (1–5)
Here’s a concrete 1–5 scoring rubric for IVI (Intrinsic Value of Information), parallel to the BVI and PVI structures.
Assume IVI is the weighted average of five dimensions:
- Validity / accuracy – 30%
- Completeness / coverage – 25%
- Scarcity / uniqueness – 20%
- Lifecycle / stability – 15%
- Integrity / consistency – 10%
1. Validity / accuracy (weight 30%)
Question: How correct and error‑free is the data, objectively?
| Score | Descriptor |
| --- | --- |
| 1 – Very poor | High error rates; frequent contradictions with trusted sources. Data is often unusable without heavy manual correction. No systematic validation rules. |
| 2 – Poor | Many known errors; basic validation exists but is patchy. Spot checks routinely uncover material inaccuracies. Not trusted for critical work. |
| 3 – Adequate | Reasonable accuracy for most uses; automated validation catches obvious issues. Occasional material errors, but generally usable with some caution. |
| 4 – High | Strong validation and monitoring; rare material errors. Spot checks against reference data/ground truth show high agreement (e.g. >95% for key fields). |
| 5 – Very high | Accuracy approaches authoritative source levels for its domain. Rigorous validation, reconciliation with external references, and rapid correction of any discrepancies. |
2. Completeness / coverage (weight 25%)
Question: How fully does the data describe the entities/events it is meant to cover?
| Score | Descriptor |
| --- | --- |
| 1 – Very incomplete | Many key fields are missing; large gaps in time or population. <50% of required attributes or records present for target use cases. |
| 2 – Incomplete | Important fields often missing; 50–70% of required attributes/records present. Analysts regularly struggle with gaps or need to impute heavily. |
| 3 – Adequate | 70–90% of required attributes and records present. Some gaps exist but are tolerable or can be mitigated for most use cases. |
| 4 – High | 90–98% of required attributes and records present. Only minor or low‑value gaps remain. Considered “good enough” for almost all analyses. |
| 5 – Near complete | >98% of required attributes and records present, with well‑understood and documented residual gaps. Often treated as the definitive record for its scope. |
3. Scarcity / uniqueness (weight 20%)
Question: How difficult is it for others to obtain equivalent data?
| Score | Descriptor |
| --- | --- |
| 1 – Commodity | Essentially the same data is widely available (public/open data, low‑cost vendors). Easy for others to replicate or substitute. |
| 2 – Low scarcity | Slight differentiation (e.g. marginally better quality or convenience), but similar content is available from several external sources. |
| 3 – Moderately unique | Some elements are hard to reproduce (e.g. specific combinations of variables, moderate barriers to collection), but partial substitutes exist. |
| 4 – Highly unique | Most of the asset cannot be feasibly replicated by others due to technical, legal, or structural barriers (e.g. proprietary channels, exclusive rights). |
| 5 – Exclusive / strategic | Data is essentially unique to the holder (exclusive access, legal monopoly, or deeply entrenched structural advantage). Competitors cannot obtain a close substitute at any reasonable cost. |
4. Lifecycle / stability (weight 15%)
Question: How long does the data remain valid and how stable is its meaning over time?
| Score | Descriptor |
| --- | --- |
| 1 – Very short‑lived / volatile | Data becomes obsolete very quickly (e.g. hours/days), with high drift and frequent redefinitions. Past values have limited ongoing utility. |
| 2 – Short‑lived | Useful life is limited (weeks) for most key uses; structure or semantics change often, requiring frequent rework. |
| 3 – Moderate life | Useful for months; some drift, but manageable with periodic recalibration. Occasional schema or definition changes. |
| 4 – Long‑lived | Useful for years; relatively stable semantics and distributions. Changes are controlled and well‑documented. |
| 5 – Very long‑lived / persistent | Data elements (e.g. physical characteristics, long‑term contractual attributes) retain relevance for many years with minimal drift. Acts as durable reference data. |
5. Integrity / consistency (weight 10%)
Question: How internally coherent is the data across systems, records, and time?
| Score | Descriptor |
| --- | --- |
| 1 – Highly inconsistent | Many conflicting records for the same entity; IDs unstable or reused; frequent contradictions across systems. No single source of truth. |
| 2 – Inconsistent | Some attempts at reconciliation, but duplication and conflict are common. Analysts often build their own fixes to use the data. |
| 3 – Mostly consistent | Clear primary system for most entities, with occasional conflicts that can be resolved. Basic master data practices in place. |
| 4 – High integrity | Strong identity management and reconciliation. Golden records exist for key entities, and discrepancies are rare and quickly resolved. |
| 5 – Very high integrity | Comprehensive, well‑governed master data and reference data. Entity relationships and history (slowly changing attributes, versions) are accurately maintained across the ecosystem. |
Computing the composite IVI score
- For a given dataset, assign 1–5 for each IVI dimension based on the descriptors.
- Compute a weighted average:
IVI_composite = 0.30·Va + 0.25·Cc + 0.20·Su + 0.15·Ls + 0.10·Ic
Where:
- Va = Validity / accuracy score
- Cc = Completeness / coverage score
- Su = Scarcity / uniqueness score
- Ls = Lifecycle / stability score
- Ic = Integrity / consistency score
Optionally, normalise by dividing by 5 to obtain a 0–1 IVI index.
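The composite calculation above can be sketched as a short function. The weights match those stated earlier; the dimension key names are illustrative assumptions.

```python
# Weights from the rubric above (must sum to 1.0).
WEIGHTS = {
    "validity_accuracy": 0.30,
    "completeness_coverage": 0.25,
    "scarcity_uniqueness": 0.20,
    "lifecycle_stability": 0.15,
    "integrity_consistency": 0.10,
}

def ivi_composite(scores: dict, normalise: bool = False) -> float:
    """Weighted average of the five 1-5 dimension scores.

    With normalise=True the result is divided by 5 to give a 0-1 index.
    """
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the five IVI dimensions")
    composite = sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)
    return composite / 5 if normalise else composite
```

For example, a dataset scoring 4, 3, 5, 2, 4 on the five dimensions would yield 0.30·4 + 0.25·3 + 0.20·5 + 0.15·2 + 0.10·4 = 3.65, or 0.73 on the normalised 0–1 index.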