Observing the Cost Value of Information (CVI)

Figure: The Suffocating Hoard (Very Large Thermodynamic Cost, Excessive Maintenance Costs and Infinite Loss of Value). Author, using Gemini.

Question: What does it cost to produce, maintain and protect this data?

Level 1 dimensions

  1. Acquisition / production cost
  2. Processing / storage / maintenance cost
  3. Quality management cost
  4. Compliance / security overhead

Level 2 indicators and observables

  • Acquisition / production
    • Annual spend on collection channels (sensors, surveys, vendors).
    • Internal labour time attributed to data creation.
    • Licensing fees for external data.
  • Processing / storage / maintenance
    • ETL/ELT pipeline costs (compute, engineering time).
    • Storage and backup costs (per TB per month, by tier).
    • Number and cost of system integrations that exist purely to support this dataset.
  • Quality management
    • Hours / FTEs devoted to data cleaning, deduplication, reconciliation.
    • Cost of data quality tools used primarily for this asset.
    • Re‑work costs due to data errors (e.g. corrections, reversals).
  • Compliance / security overhead
    • Incremental costs for privacy/security controls specific to this dataset (e.g. high‑tier cloud/security SKUs).
    • DSAR / consent management volumes linked to this data.
    • Audit costs attributable to this asset.
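
The observables above can be rolled up into an annual cost per dimension. As a minimal sketch (the ledger structure and field names are illustrative assumptions, not part of the framework), a per-dataset cost ledger might look like this in Python:

```python
from dataclasses import dataclass, field

@dataclass
class CviCostLedger:
    """Annual costs for one dataset, grouped by CVI dimension (all in one currency)."""
    acquisition: dict = field(default_factory=dict)   # e.g. {"vendor_licences": 120_000, "survey_labour": 40_000}
    maintenance: dict = field(default_factory=dict)    # e.g. {"storage": 18_000, "etl_compute": 25_000}
    quality: dict = field(default_factory=dict)        # e.g. {"stewardship_ftes": 90_000, "dq_tooling": 15_000}
    compliance: dict = field(default_factory=dict)     # e.g. {"dsar_handling": 10_000, "audit": 8_000}

    def totals(self) -> dict:
        """Annual total per dimension, plus the overall CVI for the dataset."""
        per_dim = {
            "acquisition": sum(self.acquisition.values()),
            "maintenance": sum(self.maintenance.values()),
            "quality": sum(self.quality.values()),
            "compliance": sum(self.compliance.values()),
        }
        per_dim["total_cvi"] = sum(per_dim.values())
        return per_dim
```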

How to use

  • Express CVI as an annual total and per unit (per record, per customer, per decision); a small worked example follows this list.
  • High CVI is not bad in itself; it becomes problematic when it exceeds medium‑term EVI.
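
The figures below (annual cost, record, customer, and decision counts) are made up purely to illustrate the per-unit view:

```python
# Illustrative figures only: derive per-unit CVI from an annual total.
annual_cvi = 326_000          # total annual cost of producing, maintaining and protecting the dataset
records = 12_000_000          # assumed record count
customers = 450_000           # assumed customer count
decisions_supported = 60_000  # assumed number of decisions that consume this data per year

per_unit = {
    "per_record": annual_cvi / records,                # ~0.03 per record
    "per_customer": annual_cvi / customers,            # ~0.72 per customer
    "per_decision": annual_cvi / decisions_supported,  # ~5.43 per decision
}
```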

CVI scoring rubric (1–10)

Here’s a concrete 1–10 scoring rubric for CVI (Cost Value of Information), aligned with the definition “what it costs to acquire, maintain, replace, or lose the data.”

Assume CVI is scored as the weighted sum of four dimensions, each observed in monetary terms over a defined period (e.g. per year) and then mapped to a 1–10 score:

  • Acquisition / production cost – 30%
  • Processing / storage / maintenance cost – 25%
  • Quality / governance / compliance cost – 25%
  • Replacement / loss impact – 20%

We can invert the scoring if we prefer an index where higher means cheaper; here we keep 1 = low cost and 10 = high cost, so a high score marks economically heavy data.

1. Acquisition / production cost (weight 30%)

Question: How expensive is it to obtain or generate this data?

  • 1–2 (Very low): Minimal direct spend. Data is mostly generated as a by‑product of existing operations; no separate licences or collection channels.
  • 3–4 (Low): Some incremental cost (e.g. small vendor feeds, simple surveys), but clearly minor relative to typical project/opex budgets.
  • 5–6 (Moderate): Noticeable recurring spend: specialised collection infrastructure, paid sources, or significant internal labour. Still affordable at scale.
  • 7–8 (High): Substantial annual spend (e.g. major vendor contracts, dedicated sensor networks, sizable in‑house data‑creation teams). Stopping acquisition would be a budget decision.
  • 9–10 (Very high / strategic investment): Very large, long‑term investments (e.g. multi‑million external licences, extensive proprietary collection infrastructure) that require explicit executive sponsorship.
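
To make scoring repeatable, the spend bands can be encoded as thresholds. The monetary cut‑offs below are assumptions for illustration, not part of the rubric; calibrate them against your own project and opex budgets:

```python
def acquisition_score(annual_spend: float) -> int:
    """Map annual acquisition/production spend to the upper value of a 1-10 band.

    The thresholds are hypothetical and organisation-specific; adjust to local budgets.
    """
    bands = [
        (10_000, 2),      # very low: by-product of existing operations
        (100_000, 4),     # low: small vendor feeds, simple surveys
        (500_000, 6),     # moderate: paid sources, dedicated collection effort
        (2_000_000, 8),   # high: major contracts, in-house data-creation teams
    ]
    for threshold, score in bands:
        if annual_spend < threshold:
            return score
    return 10             # very high: strategic, executive-sponsored investment

acquisition_score(150_000)   # -> 6 with these assumed thresholds
```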

2. Processing / storage / maintenance cost (weight 25%)

Question: What does it cost to process, store, and technically maintain this data?

  • 1–2 (Very low): Small volume, simple structure. Processing fits into existing pipelines; storage costs negligible. No special SLAs or performance requirements.
  • 3–4 (Low): Some dedicated ETL/ELT and storage, but incremental compute and storage spend is minor; no special technologies needed.
  • 5–6 (Moderate): Non‑trivial processing and storage footprint (e.g. large datasets, moderate complexity). Requires tuned pipelines and some capacity planning.
  • 7–8 (High): Significant infrastructure and engineering effort: large volumes, complex pipelines, or strict performance/availability demands. May drive dedicated clusters or premium storage tiers.
  • 9–10 (Very high): Processing and storage dominate part of the data platform budget (e.g. petabyte‑scale, real‑time, high‑availability requirements). Changes to this asset materially affect platform TCO.

3. Quality / governance / compliance cost (weight 25%)

Question: How much does it cost to ensure this data is clean, governed, and compliant?

  • 1–2 (Very low): Little or no dedicated quality, governance, or compliance effort beyond standard controls. Few, if any, regulatory constraints.
  • 3–4 (Low): Some data quality rules and ownership defined, but limited investment in tools or specialist staff. Compliance requirements light.
  • 5–6 (Moderate): Regular data quality work, monitoring, and stewardship. Some specialised tools or processes. Moderate regulatory/privacy obligations.
  • 7–8 (High): Substantial ongoing investment in DQ tooling, stewards, audits, and compliance processes (e.g. frequent DSAR, sector regulation). Non‑compliance would be costly.
  • 9–10 (Very high): Heavy governance/compliance regime: strict regulation, complex consent models, intensive audits. Maintaining compliant, high‑quality data is a major cost line.

4. Replacement / loss impact (weight 20%)

Question: What would it cost to replace the data or absorb the impact if it were lost/compromised?

  • 1–2 (Trivial): Easy and cheap to recreate or reacquire. Loss would cause minor disruption only. No meaningful financial or operational impact.
  • 3–4 (Low): Some effort to replace (e.g. re‑run processes, re‑buy feeds), but costs are limited and timelines short. Impact manageable.
  • 5–6 (Moderate): Replacement would take time and non‑trivial money; some downtime or degraded performance. Possible minor contract or SLA penalties.
  • 7–8 (High): Replacement costly and slow; loss would cause sustained operational or commercial issues, and may trigger material penalties or emergency spend.
  • 9–10 (Very high / critical): Replacement is extremely costly or impossible. Loss or corruption would force major remediation programs, significant fines, brand damage, or long‑term business impact.

Computing the composite CVI score

  1. For a dataset, assign 1–10 for each CVI dimension using the descriptors.
  2. Compute a weighted average:

CVI_composite = 0.30·A + 0.25·P + 0.25·Q + 0.20·R

Where:

  • A = Acquisition / production cost score
  • P = Processing / storage / maintenance cost score
  • Q = Quality / governance / compliance cost score
  • R = Replacement / loss impact score
  3. Separately, maintain the actual monetary CVI (annual cost plus estimated loss impact) in your financial model; use the 1–10 rubric as a normalised, comparable index across assets.
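
As a minimal sketch of the composite calculation (the dimension scores in the example are invented; the weights follow the rubric above):

```python
WEIGHTS = {"A": 0.30, "P": 0.25, "Q": 0.25, "R": 0.20}

def cvi_composite(a: float, p: float, q: float, r: float) -> float:
    """Weighted average of the four 1-10 dimension scores."""
    return WEIGHTS["A"] * a + WEIGHTS["P"] * p + WEIGHTS["Q"] * q + WEIGHTS["R"] * r

# Example dataset scored A=7, P=5, Q=8, R=6:
score = cvi_composite(7, 5, 8, 6)   # 0.30*7 + 0.25*5 + 0.25*8 + 0.20*6 = 6.55
```

The monetary CVI itself stays in the financial model; the composite is only the normalised index used to compare assets.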