Question: What does it cost to produce, maintain and protect this data?
Level 1 dimensions
- Acquisition / production cost
- Processing / storage / maintenance cost
- Quality management cost
- Compliance / security overhead
Level 2 indicators and observables
- Acquisition / production
  - Annual spend on collection channels (sensors, surveys, vendors).
  - Internal labour time attributed to data creation.
  - Licensing fees for external data.
- Processing / storage / maintenance
  - ETL/ELT pipeline costs (compute, engineering time).
  - Storage and backup costs (per TB per month, by tier).
  - Number and cost of system integrations that exist purely to support this dataset.
- Quality management
  - Hours / FTEs devoted to data cleaning, deduplication, reconciliation.
  - Cost of data quality tools used primarily for this asset.
  - Re‑work costs due to data errors (e.g. corrections, reversals).
- Compliance / security overhead
  - Incremental costs for privacy/security controls specific to this dataset (e.g. high‑tier cloud/security SKUs).
  - DSAR / consent management volumes linked to this data.
  - Audit costs attributable to this asset.
How to use
- Express CVI both as an annual total and per unit (per record, per customer, per decision).
- High CVI is not bad in itself; it becomes problematic when it exceeds medium‑term EVI.
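As a rough illustration of the annual-total and per-unit views, the sketch below sums one cost figure per Level 1 dimension and divides by a unit count. The class name `AnnualCosts`, the helper `per_unit`, and all monetary figures and volumes are hypothetical, chosen only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class AnnualCosts:
    """Annual cost per Level 1 dimension, in one currency (illustrative)."""
    acquisition: float   # acquisition / production cost
    processing: float    # processing / storage / maintenance cost
    quality: float       # quality management cost
    compliance: float    # compliance / security overhead

    def total(self) -> float:
        """Annual monetary CVI as the sum of the four dimensions."""
        return self.acquisition + self.processing + self.quality + self.compliance


def per_unit(annual_total: float, units: int) -> float:
    """Annual CVI divided by a unit count (records, customers, decisions)."""
    if units <= 0:
        raise ValueError("units must be a positive count")
    return annual_total / units


# Made-up figures for a single dataset.
costs = AnnualCosts(acquisition=120_000, processing=80_000,
                    quality=40_000, compliance=10_000)
annual_cvi = costs.total()
per_record = per_unit(annual_cvi, 2_000_000)   # assumed 2,000,000 records
per_customer = per_unit(annual_cvi, 50_000)    # assumed 50,000 customers

print(annual_cvi, per_record, per_customer)
```

Reporting the same total against several unit counts makes it easier to compare datasets of very different sizes.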
CVI scoring rubric (1–10)
The following 1–10 scoring rubric for CVI (Cost Value of Information) is aligned with the definition “what it costs to acquire, maintain, replace, or lose the data.”
Assume CVI is the weighted sum of four dimensions (all measured in monetary terms over a defined period, e.g. per year):
- Acquisition / production cost – 30%
- Processing / storage / maintenance cost – 25%
- Quality / governance / compliance cost – 25%
- Replacement / loss impact – 20%
Scores here run from 1 (low cost) to 10 (high cost), so a score of 10 marks economically heavy data. The scale can be inverted if your framework needs high scores to mean cheap data.
Acquisition / production cost (weight 30%)
Question: How expensive is it to obtain or generate this data?
Score | Level | Descriptor |
1–2 | Very low | Minimal direct spend. Data is mostly generated as a by‑product of existing operations; no separate licences or collection channels. |
3–4 | Low | Some incremental cost (e.g. small vendor feeds, simple surveys), but clearly minor relative to typical project/opex budgets. |
5–6 | Moderate | Noticeable recurring spend: specialised collection infrastructure, paid sources, or significant internal labour. Still affordable at scale. |
7–8 | High | Substantial annual spend (e.g. major vendor contracts, dedicated sensor networks, sizable in‑house data‑creation teams). Stopping acquisition would be a budget decision. |
9–10 | Very high / strategic investment | Very large, long‑term investments (e.g. multi‑million external licences, extensive proprietary collection infrastructure) that require explicit executive sponsorship. |
Processing / storage / maintenance cost (weight 25%)
Question: What does it cost to process, store, and technically maintain this data?
Score | Level | Descriptor |
1–2 | Very low | Small volume, simple structure. Processing fits into existing pipelines; storage costs negligible. No special SLAs or performance requirements. |
3–4 | Low | Some dedicated ETL/ELT and storage, but incremental compute and storage spend is minor; no special technologies needed. |
5–6 | Moderate | Non‑trivial processing and storage footprint (e.g. large datasets, moderate complexity). Requires tuned pipelines and some capacity planning. |
7–8 | High | Significant infrastructure and engineering effort: large volumes, complex pipelines, or strict performance/availability demands. May drive dedicated clusters or premium storage tiers. |
9–10 | Very high | Processing and storage dominate part of the data platform budget (e.g. petabyte‑scale, real‑time, high‑availability requirements). Changes to this asset materially affect platform TCO. |
Quality / governance / compliance cost (weight 25%)
Question: How much does it cost to ensure this data is clean, governed, and compliant?
Score | Level | Descriptor |
1–2 | Very low | Little or no dedicated quality, governance, or compliance effort beyond standard controls. Few, if any, regulatory constraints. |
3–4 | Low | Some data quality rules and ownership defined, but limited investment in tools or specialist staff. Compliance requirements light. |
5–6 | Moderate | Regular data quality work, monitoring, and stewardship. Some specialised tools or processes. Moderate regulatory/privacy obligations. |
7–8 | High | Substantial ongoing investment in DQ tooling, stewards, audits, and compliance processes (e.g. frequent DSARs, sector regulation). Non‑compliance would be costly. |
9–10 | Very high | Heavy governance/compliance regime: strict regulation, complex consent models, intensive audits. Maintaining compliant, high‑quality data is a major cost line. |
Replacement / loss impact (weight 20%)
Question: What would it cost to replace the data or absorb the impact if it were lost/compromised?
Score | Level | Descriptor |
1–2 | Trivial | Easy and cheap to recreate or reacquire. Loss would cause minor disruption only. No meaningful financial or operational impact. |
3–4 | Low | Some effort to replace (e.g. re‑run processes, re‑buy feeds), but costs are limited and timelines short. Impact manageable. |
5–6 | Moderate | Replacement would take time and non‑trivial money; some downtime or degraded performance. Possible minor contract or SLA penalties. |
7–8 | High | Replacement costly and slow; loss would cause sustained operational or commercial issues, and may trigger material penalties or emergency spend. |
9–10 | Very high / critical | Replacement is extremely costly or impossible. Loss or corruption would force major remediation programs, significant fines, brand damage, or long‑term business impact. |
Computing the composite CVI score
- For a dataset, assign 1–10 for each CVI dimension using the descriptors.
- Compute a weighted average:
CVI_composite = 0.30·A + 0.25·P + 0.25·Q + 0.20·R
Where:
- A = Acquisition / production cost score
- P = Processing / storage / maintenance cost score
- Q = Quality / governance / compliance cost score
- R = Replacement / loss impact score
- Separately, maintain the actual monetary CVI (annual cost plus estimated loss impact) in your financial model; use the 1–10 rubric as a normalised, comparable index across assets.
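The weighted average above can be sketched in a few lines of Python. The weights come from the rubric; the function name `cvi_composite`, the single-letter keys, and the example scores are illustrative assumptions, not part of the framework.

```python
from typing import Mapping

# Rubric weights: A = acquisition, P = processing, Q = quality, R = replacement.
WEIGHTS = {"A": 0.30, "P": 0.25, "Q": 0.25, "R": 0.20}


def cvi_composite(scores: Mapping[str, int]) -> float:
    """Weighted average of the four 1-10 dimension scores."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must contain exactly the keys A, P, Q, R")
    for name, score in scores.items():
        if not 1 <= score <= 10:
            raise ValueError(f"score {name}={score} is outside the 1-10 range")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical dataset: costly to acquire and govern, cheaper to replace.
example = {"A": 7, "P": 5, "Q": 8, "R": 4}
composite = cvi_composite(example)

# 0.30*7 + 0.25*5 + 0.25*8 + 0.20*4 = 2.10 + 1.25 + 2.00 + 0.80
print(round(composite, 2))
```

Because the weights sum to 1, the composite stays on the same 1–10 scale as the individual dimension scores, which keeps it comparable across assets.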