The Data Governance Acid Test Sense Check: Statistical Physics

The Data Governance Acid Test. Author with Gemini

If we have lined up our management, accounting and economics cases, have moved our data governance to a profit centre and know how to get data assets onto the balance sheet, then we’ve avoided the floor of our habitable range and worked out our maturity pathway to using portfolio techniques to optimise our data governance regime.

The next question we need to ask ourselves is how much information and governance a person, firm or network can actually hold before they get saturated or start losing structural integrity. There is a limit to what I can know, a limit to what my smart network can know, a limit to what my organisation can know and a limit to what my group of organisations can know. 

Even if we achieve the highest possible informational efficiency, there is still a limit. If Elon Musk bought all the world's organisations, there would still be a limit. And yes, if he filled space with data centres, there would still be that limit. To describe that limit I turn to the statistical physics described by César Hidalgo in Why Information Grows. Hidalgo’s central thesis is that economic growth is the growth of physical order (information), constrained by the computational capacity of humans and the cost of connecting them.

This was initially a notional test, but then quantum computing came along and now our nonrivalrous superscaling data starts to look quite different. The question is, can our data governance regime cope? 


Personbytes, firmbytes and saturation in data governance

Hidalgo’s insight is to treat knowledge and knowhow as bounded capacities embodied in people and networks. Knowledge involves relationships or linkages between entities and knowhow is the embodied tacit computational capacity that allows us to perform actions.

  • A personbyte is the maximum knowledge + knowhow one human nervous system can carry.
  • A firmbyte is the maximum knowledge + knowhow that can be effectively embodied in a single organisation before it becomes more efficient to split work across a network of organisations.

These physical limits have significant implications for our system of data governance. Reflecting on those implications leads to the rational decision to observe stopping points in our scaling strategies and to avoid holding too much data. Information grows through the combination of diverse capabilities, so if we work within a complex industry we need all the unique pieces. This goes to Ashby’s Law: the effective limit of a regulatory system is set by how well the regulator can model and simulate the system it regulates.
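Before going further, it helps to make the two limits concrete. The following is a minimal sketch, my own illustration rather than Hidalgo’s formalism, that treats personbytes and firmbytes as abstract capacity budgets with a saturation check; the class, the names and the unit scale are all assumptions.

```python
from dataclasses import dataclass

# Minimal sketch only: personbytes and firmbytes treated as abstract capacity budgets.
# The unit scale (one personbyte = 1.0) and the firmbyte figure are illustrative assumptions.

@dataclass
class Carrier:
    name: str
    capacity: float      # a personbyte (a person) or a firmbyte (an organisation)
    load: float = 0.0    # knowledge + knowhow currently embodied

    def absorb(self, chunk: float) -> bool:
        """Try to embody another chunk of knowledge + knowhow; refuse once saturated."""
        if self.load + chunk > self.capacity:
            return False  # saturation: the work has to be split across a network
        self.load += chunk
        return True

steward = Carrier("data steward", capacity=1.0)       # one personbyte, by definition
platform = Carrier("mega-platform", capacity=30.0)    # an assumed firmbyte budget

print(steward.absorb(0.4), steward.absorb(0.7))       # True, False -> time to network
```

The only point of the sketch is that absorb eventually says no, whatever the carrier; everything after that is a question of network design.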

For individuals, learning is experiential and finite. When the complexity of a task exceeds one personbyte, the individual cannot perform it alone. This insight puts the shadow AI problem in a different light: we can interpret it as a personbyte saturation event. The individual’s administrative load has exceeded their biological capacity to process it and they’re reaching for ChatGPT to augment their personbyte capacity using external computation, similar to how we use tools to lift heavy weights.

This suggests that if we are overly conservative with AI, we force the task back into the biological limit. Since the limit is exceeded, the result is either failure via burnout and errors, or simplification because tasks don't get done. Our response is to form networks, but there we run into the firmbyte limit. If interaction is hard (e.g. via high deadweight loss, DWL), the network fragments. This is how we can get large organisations with low information capacity: most of the personbytes available are consumed by internal procedures.

Governance designs must respect cognitive bandwidth.

A Data Steward, Data Protection Officer, or Product Owner can only internalise so many policies, rules, standards, guidelines, models, frameworks, processes, procedures, exceptions and metrics before their personbyte is saturated. This would be true even if the person in question were Nikola Tesla, Albert Einstein or Stephen Hawking: at some point they will get saturated.

This means that if our governance model requires the individual to track 20+ metrics (IVI, BVI, PVI, CVI, EVI, n data governance specialisms, n local policy variants, etc.), we’re pushing beyond a personbyte and guaranteeing some degree of non-compliance or tick-boxing. Given that our individual might also have three devices, a stuffed calendar and a concatenation of deadlines, we will reach saturation sooner rather than later.

“One giant platform” is physically suspect.

A single mega‑platform that tries to internalise all knowhow about all data will eventually hit the firmbyte limit, even if it is wildly successful. As we add to surface area, volume multiplies, and with all that interactional complexity the system will inevitably slow. If we are governing in a pre-AI mode, then we will be solving many problems via administration, and bureaucracy eats personbytes: the network becomes large but low in effective knowledge density.

This pushes us to reconsider networks, because information can be routed through the links which connect them, boosting our effective personbytes and firmbytes. The science we can turn to here is modern portfolio theory, which will help us manage the portfolio effects that are inevitable as our network grows. Whatever the case, we don’t want to be sitting on a huge pile of data.

Use personbytes/firmbytes as explicit design constraints.

As I noted in my last post, the data governance professions include the likes of architecture, assurance, audit, cybersecurity, data, digital, ethics, information, legal, policy, privacy, risk, research and, in New Zealand, Māori. That’s a lot of internal complexity to negotiate, and the optimum is going to come from either design or luck (and I know which one I rely on).

This means that when we’re structuring the roles of data governance personnel, it’s in our interest to cap the number of governance concepts each role is expected to master. Given that each concept is also usually internally complex, working through 3–5 core metrics per persona will keep that role in the habitable zone (e.g., IVI/BVI/PVI for the CDO).

If the design problem is organisational, then we want to ask how many distinct domains, products or controls a unit can manage before it saturates. This gives us a firmbyte limit: the point at which we need to split into specialised teams or extend into external networks. It also gives us a hard decision rule to say no to the budget bid for 1,000 advisors the DPO thinks they need to get effective coverage (given their existing business model).
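As a rough illustration of what that decision rule could look like in practice, here is a hedged sketch. The caps (five metrics per persona, an assumed twelve domains per unit), the role assignments and the unit figures are all illustrative assumptions, not empirically derived limits.

```python
# A hedged sketch of personbyte/firmbyte design constraints. The caps and the example
# figures below are assumptions chosen only to show the shape of the decision rule.

PERSONBYTE_METRIC_CAP = 5        # core metrics a single persona is asked to master
FIRMBYTE_DOMAIN_CAP = 12         # distinct domains/products/controls per unit (assumption)

roles = {
    "CDO": ["IVI", "BVI", "PVI"],
    "DPO": ["PVI", "CVI", "EVI", "DWL", "policy variants", "local exceptions", "audit findings"],
}

units = {"privacy office": 9, "data platform team": 17}   # domains currently governed

for role, metrics in roles.items():
    if len(metrics) > PERSONBYTE_METRIC_CAP:
        print(f"{role}: {len(metrics)} concepts - beyond the personbyte, redesign the role")

for unit, domains in units.items():
    if domains > FIRMBYTE_DOMAIN_CAP:
        print(f"{unit}: {domains} domains - split into teams or extend into the network")
```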

There are two important implications here. The first is that if we need to get effective coverage (and we need to in order to remain a profit centre) then we must form the group of data governance professionals into a network and govern that network efficiently. The second is that if our regime shifts administrative load across organisational boundaries to a partner with a lower firmbyte capacity, we will swamp them immediately and drive out all the potential value.

This is why I push for simple, role‑specific metrics and why I warn against over‑controlled governance regimes that consume more personbytes than they add in knowledge and knowhow.


Hidalgo makes a key rule explicit: network capacity is limited by the cost of links, and those costs can be raised or lowered. When links are cheap, large networks can form, and more knowledge + knowhow can be distributed and recombined. This is what we want. But we don’t want expensive links: bureaucracy, distrust and incompatible language/standards all act to make links more expensive. When that happens, we should expect networks to fragment, less knowledge + knowhow to be embodied and more personbytes to be consumed by internal processes.
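A toy calculation, my own illustration rather than anything from the book, shows the direction of the effect: the more of a node’s personbyte each link consumes, the smaller the largest network those nodes can sustain and the less knowledge it can hold. The linear form and the numbers are assumptions made purely for illustration.

```python
# Toy illustration only: every node has one personbyte, each link a node maintains
# consumes `link_cost` of it, and a team of n nodes is fully connected. The point is
# just the direction: cheaper links -> larger networks -> more embodied knowledge.

def embodied_knowledge(n: int, link_cost: float) -> float:
    per_node = max(0.0, 1.0 - link_cost * (n - 1))   # personbyte left after link overhead
    return n * per_node                               # knowledge the network can still hold

for link_cost in (0.01, 0.05, 0.20):                  # cheap, moderate, bureaucratic links
    best_n = max(range(1, 200), key=lambda n: embodied_knowledge(n, link_cost))
    print(f"link cost {link_cost:.2f}: best network size {best_n}, "
          f"knowledge held {embodied_knowledge(best_n, link_cost):.1f}")
```

On these assumed numbers, cheap links let roughly fifty people hold far more between them than expensive links let three people hold, which is the whole argument for lowering link costs.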

The key insight is that trust is the mechanism that lowers link cost, allowing larger, more complex networks to form without expensive verification. This is very convenient for data governance regimes as there are very few which don’t explicitly address trust as a key aim. 

The deadweight compliance and Millennium Bridge effects are high link‑cost regimes.

Where we find over‑engineered approvals, bespoke DPIAs and duplicative sign‑offs, we should expect expensive interpersonal and firm links, which Hidalgo shows reduce a network of firms’ ability to accumulate knowledge + knowhow. Excessive governance raises the cost of links, so more personbytes are spent navigating procedure than creating new information. This is why over‑controlled data governance regimes destroy PVI/EVI.

When we privilege ‘simplicity’, we drive out requisite complexity and replace information with entropy. The lesson here is that a monoculture reduces the combinatorial complexity of the network. If every node is identical (isomorphic compliance), the network loses the ability to encode complex, differentiated information. This means we design in systemic fragility and reduce the total information content of the sector. The compliance wobble is the system collapsing to a lower energy state because it lacks the diversity to handle complex shocks.

Standards are good—until they induce unstable synchrony.

When our governance and management protocols apply standards, use repeatable processes and exchange via shared rules, we lower our link costs and enable larger productive networks. My Millennium Bridge post talked about how over‑standardisation can create destructive synchrony and single points of failure.

The design principle is this: use standards to reduce link cost, but preserve controlled diversity in architectures and vendors. Using the Infonomics grammar, high deadweight loss (DWL) plus high Cost Value of Information (CVI) generates high link costs in the form of excessive bureaucracy, lengthy approvals and incompatible processes. A governance reform that simplifies link contracts, templates and standards would show up as reduced DWL, reduced CVI and higher observed Performance Value of Information (PVI) and Economic Value of Information (EVI).
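As a rough illustration, such a reform could be tracked with a simple link‑cost proxy. The proxy and the figures below are my own assumptions, not one of the published Infonomics formulas.

```python
# Illustrative proxy only: treat the link cost of a governance interface as the deadweight
# loss it imposes plus the cost side of CVI, and compare the same interface before and
# after a reform. The figures are invented for the example.

def link_cost_proxy(dwl: float, cvi_cost: float) -> float:
    """Assumed proxy: what it costs two parties to exchange data through this interface."""
    return dwl + cvi_cost

before = link_cost_proxy(dwl=120_000, cvi_cost=80_000)    # bespoke DPIAs, duplicated sign-offs
after = link_cost_proxy(dwl=30_000, cvi_cost=55_000)      # shared templates and standards

print(f"link cost before reform: {before:,.0f}")
print(f"link cost after reform:  {after:,.0f}")
# If the reform works, the reduction should show up downstream as higher observed PVI/EVI.
```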


Out‑of‑equilibrium systems and information growth 

Why Information Grows is grounded in physics, hence Hidalgo’s core message: information grows naturally in out‑of‑equilibrium systems that are in a steady state and which minimise entropy production. Our Earth is a ‘singularity of physical order’ and economies are systems that amplify knowledge and knowhow through products. One thing our data governance regime can’t do is supplant physics and our fundamental physical limits.

The AI economy is an out‑of‑equilibrium information engine.

In Hidalgo’s poetic phrase, data products and AI models are ‘crystals of imagination’: embodied information that augments human capacity and allows the accumulation of more information. We can understand high‑IVI / high‑BVI / high‑PVI assets as examples of such ‘crystals’, and the portfolio governance view then becomes tracking where information is accumulating productively and where it is being dissipated. This is one reason why we don’t want to over-saturate people, business units or organisations.

Governance as entropy control, not entropy maximisation.

In physical systems, steady states can minimise entropy production. Information is the opposite of entropy and consists of uncommon, highly correlated configurations. Over‑controlled governance practices (such as over‑sanitisation or over‑standardisation) effectively push systems toward high‑entropy average states of low information value. This is where we end up with bland over‑anonymised data, identical compliance architectures and a reduced variety of use‑cases. In Infonomics terms, this looks like IVI preserved in a narrow sense, but BVI/PVI/EVI end up crushed and there is a high residual DWL.

Three physics‑friendly tests for governance reforms.

For any proposed governance control we can ask whether it (see the sketch after this list):

  1. increases the economy’s capacity to compute (more useful knowledge and knowhow distributed across networks), or does it consume personbytes in administrivia?
  2. enables more uncommon correlated configurations (such as new products, novel data combinations and varied architectures), or does it force everything toward a high‑entropy low-information average?
  3. lowers the cost of links while preserving diversity, or does it raise link costs and push systems towards a brittle, low‑resilience monoculture?
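Expressed as code, the three tests might look something like this minimal sketch; the class, the field names and the all‑three‑must‑pass rule are my assumptions about how strictly to apply the tests.

```python
# Minimal sketch of the three tests as a checklist. The questions mirror the list above;
# the scoring rule (all three must pass) is an assumption, not a claim from the book.

from dataclasses import dataclass

@dataclass
class GovernanceControl:
    name: str
    grows_computation: bool      # 1. adds distributed knowledge + knowhow, not admin load
    grows_configurations: bool   # 2. enables new products, combinations and architectures
    lowers_link_cost: bool       # 3. cheaper links while preserving diversity

    def physics_verdict(self) -> str:
        if all((self.grows_computation, self.grows_configurations, self.lowers_link_cost)):
            return "pro-information"
        return "anti-information and anti-growth"

print(GovernanceControl("shared DPIA template", True, True, True).physics_verdict())
print(GovernanceControl("quadruplicate sign-off", False, False, False).physics_verdict())
```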

If the answer is ‘more admin, fewer links and more average states’, then we can argue - physically - that the data governance strategy is anti‑information and anti‑growth.

By contrast, a data governance strategy will pass the physics sanity check if:

  1. it respects the hard bounds of personbytes and firmbytes.
  2. it aims to reduce link costs without enforcing brittle synchrony.
  3. it treats governance as a way to minimise entropy production while allowing information to grow.