Observing the Cost Value of Information (CVI)

The Suffocating Hoard (Very Large Thermodynamic Cost, Excessive Maintenance Costs and Infinite Loss of Value). Author, using Gemini

Question: What does it cost to produce, maintain and protect this data?

Level 1 dimensions

Acquisition / production cost
Processing / storage / maintenance cost
Quality management cost
Compliance / security overhead

Level 2 indicators and observables

Acquisition / production
- Annual spend on collection channels (sensors, surveys, vendors).
- Internal labour time attributed to data creation.
- Licensing fees for external data.
Processing / storage / maintenance
- ETL/ELT pipeline costs (compute, engineering time).
- Storage and backup costs (per TB per month, by tier).
- Number and cost of system integrations that exist purely to support this dataset.
Quality management
- Hours / FTEs devoted to data cleaning, deduplication, reconciliation.
- Cost of data quality tools used primarily for this asset.
- Re‑work costs due to data errors (e.g. corrections, reversals).
Compliance / security overhead
- Incremental costs for privacy/security controls specific to this dataset (e.g. high‑tier cloud/security SKUs).
- Volumes of data subject access requests (DSARs) and consent-management actions linked to this data.
- Audit costs attributable to this asset.
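The Level 2 observables only become comparable once each one is converted to an annual monetary amount and rolled up into its Level 1 dimension. The Python sketch below shows one way to do that roll-up. It assumes every observable has already been priced per year. The names `LineItem`, `roll_up` and `DIMENSIONS`, and all figures, are hypothetical and for illustration only.

```python
from dataclasses import dataclass

# The four Level 1 dimensions, keyed by short hypothetical names.
DIMENSIONS = ("acquisition", "processing", "quality", "compliance")


@dataclass(frozen=True)
class LineItem:
    """One Level 2 observable, already converted to an annual monetary amount."""

    indicator: str
    dimension: str  # must be one of DIMENSIONS
    annual_cost: float


def roll_up(items: list[LineItem]) -> dict[str, float]:
    """Sum line items into annual totals per Level 1 dimension."""
    totals = {d: 0.0 for d in DIMENSIONS}
    for item in items:
        if item.dimension not in totals:
            raise ValueError(f"unknown dimension: {item.dimension!r}")
        totals[item.dimension] += item.annual_cost
    return totals


# Hypothetical annual figures for one dataset, for illustration only.
items = [
    LineItem("vendor licence", "acquisition", 120_000),
    LineItem("survey collection labour", "acquisition", 30_000),
    LineItem("ELT compute", "processing", 18_000),
    LineItem("storage and backup", "processing", 6_000),
    LineItem("cleaning and reconciliation FTE share", "quality", 45_000),
    LineItem("audit hours", "compliance", 10_000),
]

totals = roll_up(items)
print(totals)
print(sum(totals.values()))  # 229000.0
```

Keeping each observable as a separate line item means that when a figure is revised, the dimension totals can be traced back to the observable that changed.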

How to use

Express CVI both as an annual total and per unit (per record, per customer, per decision).
High CVI is not bad in itself; it becomes problematic when it exceeds medium‑term EVI.
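The per-unit view and the comparison with EVI reduce to simple arithmetic. The sketch below divides an annual CVI total by each unit count and flags the case where cost exceeds value. It assumes that EVI has already been estimated elsewhere and is expressed over the same period as CVI (here, annualised). The function names and all inputs are hypothetical.

```python
def per_unit_cvi(annual_cvi: float, unit_counts: dict[str, int]) -> dict[str, float]:
    """Divide the annual CVI by each unit count (records, customers, decisions)."""
    per_unit = {}
    for unit, count in unit_counts.items():
        if count <= 0:
            raise ValueError(f"count for {unit!r} must be positive")
        per_unit[unit] = annual_cvi / count
    return per_unit


def exceeds_evi(annual_cvi: float, annual_evi: float) -> bool:
    """True when cost exceeds value; both figures must cover the same period."""
    return annual_cvi > annual_evi


# Hypothetical inputs: an annual CVI total, unit counts, and an annualised EVI.
annual_cvi = 229_000.0
units = {"record": 2_000_000, "customer": 50_000, "decision": 1_000}
print(per_unit_cvi(annual_cvi, units))
print(exceeds_evi(annual_cvi, annual_evi=180_000.0))  # True
```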

CVI scoring rubric (1–10)

The rubric below scores CVI (Cost Value of Information) on a 1–10 scale, aligned with the definition “what it costs to acquire, maintain, replace, or lose the data.”

The composite CVI score is a weighted sum of four dimension scores, each assessed from monetary amounts over a defined period (e.g. per year):

Acquisition / production cost – 30%
Processing / storage / maintenance cost – 25%
Quality / governance / compliance cost – 25%
Replacement / loss impact – 20%

Scores run in the direction of cost: 1 = low cost, 10 = high cost, so a high score marks economically heavy data. This orientation matches the monetary CVI figure, where a larger number also means more cost.

1. Acquisition / production cost (weight 30%)

Question: How expensive is it to obtain or generate this data?

Score	Descriptor
1 – 2 Very low	Minimal direct spend. Data is mostly generated as a by‑product of existing operations; no separate licences or collection channels.
3 – 4 Low	Some incremental cost (e.g. small vendor feeds, simple surveys), but clearly minor relative to typical project/opex budgets.
5 – 6 Moderate	Noticeable recurring spend: specialised collection infrastructure, paid sources, or significant internal labour. Still affordable at scale.
7 – 8 High	Substantial annual spend (e.g. major vendor contracts, dedicated sensor networks, sizable in‑house data‑creation teams). Stopping acquisition would be a budget decision.
9 – 10 Very high / strategic investment	Very large, long‑term investments (e.g. multi‑million external licences, extensive proprietary collection infrastructure) that require explicit executive sponsorship.

2. Processing / storage / maintenance cost (weight 25%)

Question: What does it cost to process, store, and technically maintain this data?

Score	Descriptor
1 – 2 Very low	Small volume, simple structure. Processing fits into existing pipelines; storage costs negligible. No special SLAs or performance requirements.
3 – 4 Low	Some dedicated ETL/ELT and storage, but incremental compute and storage spend is minor; no special technologies needed.
5 – 6 Moderate	Non‑trivial processing and storage footprint (e.g. large datasets, moderate complexity). Requires tuned pipelines and some capacity planning.
7 – 8 High	Significant infrastructure and engineering effort: large volumes, complex pipelines, or strict performance/availability demands. May drive dedicated clusters or premium storage tiers.
9 – 10 Very high	Processing and storage account for a large share of the data platform budget (e.g. petabyte‑scale, real‑time, high‑availability requirements). Changes to this asset materially affect platform TCO.

3. Quality / governance / compliance cost (weight 25%)

Question: How much does it cost to ensure this data is clean, governed, and compliant?

Score	Descriptor
1 – 2 Very low	Little or no dedicated quality, governance, or compliance effort beyond standard controls. Few, if any, regulatory constraints.
3 – 4 Low	Some data quality rules and ownership defined, but limited investment in tools or specialist staff. Compliance requirements light.
5 – 6 Moderate	Regular data quality work, monitoring, and stewardship. Some specialised tools or processes. Moderate regulatory/privacy obligations.
7 – 8 High	Substantial ongoing investment in DQ tooling, stewards, audits, and compliance processes (e.g. frequent DSARs, sector regulation). Non‑compliance would be costly.
9 – 10 Very high	Heavy governance/compliance regime: strict regulation, complex consent models, intensive audits. Maintaining compliant, high‑quality data is a major cost line.

4. Replacement / loss impact (weight 20%)

Question: What would it cost to replace the data or absorb the impact if it were lost/compromised?

Score	Descriptor
1 – 2 Trivial	Easy and cheap to recreate or reacquire. Loss would cause minor disruption only. No meaningful financial or operational impact.
3 – 4 Low	Some effort to replace (e.g. re‑run processes, re‑buy feeds), but costs are limited and timelines short. Impact manageable.
5 – 6 Moderate	Replacement would take time and non‑trivial money; some downtime or degraded performance. Possible minor contract or SLA penalties.
7 – 8 High	Replacement costly and slow; loss would cause sustained operational or commercial issues, and may trigger material penalties or emergency spend.
9 – 10 Very high / critical	Replacement is extremely costly or impossible. Loss or corruption would force major remediation programs, significant fines, brand damage, or long‑term business impact.
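All four tables use the same banding: each descriptor covers two consecutive scores (1–2, 3–4, …, 9–10). When reporting scores, it can help to print the band label next to the number. The sketch below copies the labels from the four tables and finds the band with integer division. It assumes scores are whole numbers. The dimension keys and the function name `band_label` are hypothetical.

```python
# Band labels per dimension, copied from the four rubric tables above.
BAND_LABELS = {
    "acquisition": ("Very low", "Low", "Moderate", "High",
                    "Very high / strategic investment"),
    "processing": ("Very low", "Low", "Moderate", "High", "Very high"),
    "quality": ("Very low", "Low", "Moderate", "High", "Very high"),
    "replacement": ("Trivial", "Low", "Moderate", "High", "Very high / critical"),
}


def band_label(dimension: str, score: int) -> str:
    """Return the rubric band for an integer score from 1 to 10.

    Each band spans two scores, so the band index is (score - 1) // 2.
    """
    if not isinstance(score, int) or not 1 <= score <= 10:
        raise ValueError("score must be an integer from 1 to 10")
    return BAND_LABELS[dimension][(score - 1) // 2]


print(band_label("acquisition", 7))   # High
print(band_label("replacement", 2))   # Trivial
print(band_label("replacement", 10))  # Very high / critical
```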

Computing the composite CVI score

For each dataset, assign a score from 1 to 10 to each CVI dimension using the descriptors above.
Compute a weighted average:

$$\mathrm{CVI}_{\text{composite}} = 0.30\,A + 0.25\,P + 0.25\,Q + 0.20\,R$$

Where:

A = Acquisition / production cost score
P = Processing / storage / maintenance cost score
Q = Quality / governance / compliance cost score
R = Replacement / loss impact score
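The formula translates directly into code. The sketch below uses the weights from the rubric and the symbols A, P, Q, R defined above. It checks that the weights sum to 1, which keeps the composite on the same 1–10 scale as its inputs. It also rejects missing keys and out-of-range scores. The function name `cvi_composite` and the example scores are hypothetical.

```python
import math

# Weights from the rubric: A 30%, P 25%, Q 25%, R 20%.
WEIGHTS = {"A": 0.30, "P": 0.25, "Q": 0.25, "R": 0.20}

# The weights sum to 1, so the weighted sum is a weighted average and the
# composite stays on the same 1-10 scale as the inputs.
assert math.isclose(sum(WEIGHTS.values()), 1.0)


def cvi_composite(scores: dict[str, float]) -> float:
    """Return 0.30*A + 0.25*P + 0.25*Q + 0.20*R for scores on a 1-10 scale."""
    if set(scores) != set(WEIGHTS):
        raise ValueError(f"expected exactly the keys {sorted(WEIGHTS)}")
    for key, value in scores.items():
        if not 1 <= value <= 10:
            raise ValueError(f"score {key}={value} is outside 1-10")
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)


# Hypothetical scores for one dataset.
example = {"A": 8, "P": 6, "Q": 7, "R": 9}
print(round(cvi_composite(example), 2))  # 7.45
```

Worked by hand, the example is 0.30 × 8 + 0.25 × 6 + 0.25 × 7 + 0.20 × 9 = 2.40 + 1.50 + 1.75 + 1.80 = 7.45.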

Separately, maintain the actual monetary CVI (annual cost plus estimated loss impact) in your financial model; use the 1–10 rubric as a normalised index for comparing assets.
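The two figures can be kept side by side in one record per asset. The sketch below computes the monetary CVI as annual cost plus estimated loss impact, following the definition in the text. It also computes the normalised index with the rubric weights and ranks assets by that index. The class name `AssetCVI`, the asset names and all figures are hypothetical.

```python
from dataclasses import dataclass

# Rubric weights for A, P, Q, R.
WEIGHTS = {"A": 0.30, "P": 0.25, "Q": 0.25, "R": 0.20}


@dataclass
class AssetCVI:
    name: str
    annual_cost: float            # monetary: acquisition, processing, quality/compliance
    estimated_loss_impact: float  # monetary: replacement / loss estimate
    scores: dict[str, int]        # rubric scores A, P, Q, R on 1-10

    @property
    def monetary_cvi(self) -> float:
        """Annual cost plus estimated loss impact, as defined in the text."""
        return self.annual_cost + self.estimated_loss_impact

    @property
    def index(self) -> float:
        """Normalised composite score for cross-asset comparison."""
        return sum(WEIGHTS[k] * self.scores[k] for k in WEIGHTS)


assets = [
    AssetCVI("crm_contacts", 229_000, 400_000, {"A": 8, "P": 6, "Q": 7, "R": 9}),
    AssetCVI("web_clickstream", 90_000, 20_000, {"A": 3, "P": 8, "Q": 4, "R": 2}),
]

# Rank by the normalised index; the monetary figure stays available alongside it.
for asset in sorted(assets, key=lambda x: x.index, reverse=True):
    print(f"{asset.name}: index={asset.index:.2f}, monetary CVI={asset.monetary_cvi:,.0f}")
```

Ranking by the index rather than by the monetary figure lets small and large assets be compared on the same 1–10 scale. The monetary figure remains the input for budget decisions.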