An Effective Data Governance Framework for the AI Act Era

Q: What is a data governance framework, and why is implementing it urgent now?

A data governance framework is the set of policies, processes, roles, and technologies that govern how an organization manages, accesses, and uses its data. With the European AI Act's obligations taking effect, implementing one goes from being a best practice to a legal obligation for companies that use high-risk AI systems.

Q: What roles are essential in a data governance framework?

The minimum roles are: a Chief Data Officer (CDO), Data Stewards per domain, Data Owners with formal accountability, a Data Engineering team for technical implementation, and a Data Governance Committee for strategic decisions. In regulated environments, the DPO must have direct visibility into the framework.

Q: How does the AI Act affect my company's data governance?

The AI Act requires high-risk AI systems to have documented training data, traceable lineage, auditable quality metrics, and access governance. Your framework must cover not only operational data but also the datasets used in AI models, with documentary evidence to support regulatory inspections.

Q: What data lineage tools are best to start with?

For cloud teams on Snowflake or Databricks, OpenMetadata is the strongest open source option. For enterprise solutions, Collibra, Alation, or Microsoft Purview are the most mature. For most teams just starting out, dbt plus OpenMetadata offers the best balance of cost, automation, and adoption curve.

Q: How long does it take to implement a complete data governance framework?

A minimum viable framework can be operational in 3 to 6 months with organizational commitment and dedicated resources. A mature framework with full automation and complete domain coverage takes 12 to 24 months, depending on the organization's size and its accumulated data technical debt.

What a Data Governance Framework Is Today (and What It Isn't)

A data governance framework is the structured set of policies, processes, roles, and technologies that determines how an organization defines, manages, protects, and uses its data across its entire lifecycle. It's not a cataloging project with an end date. It's not installing a tool and calling it a "catalog." It's not naming someone "data owner" without giving them real authority or time to exercise it.

In 2026, a complete framework spans six dimensions that must operate in parallel, each with visibility into the others. If they live in silos, what you have isn't governance but the illusion of governance. That is arguably worse, because it creates false confidence. The six dimensions are:

Data governance architecture: structure of domains, platforms, and access layers.
Data governance roles: who decides, who executes, who audits, and with what authority.
Data quality management: rules, thresholds, alerts, and continuous remediation processes.
Metadata automation: frictionless, automated capture, enrichment, and publishing of metadata.
Data lineage tools: traceability from origin to the report or AI model.
AI Act compliance: dataset documentation, bias management, and auditable records of automated decisions.

Real Problems I See in Complex Environments

Managing data in a group with multiple airlines — each with its own stack, its own teams, and its own business definitions — you learn to spot structural governance failures before they show up in an audit report. These are the ones that recur most often, regardless of industry:

The same KPI, five different definitions. In multi-entity environments without a centralized business glossary, each team calculates its metrics independently. The result is a leadership meeting that starts by arguing over which number is correct instead of which decision to make.
Catalogs created and abandoned. Budget gets invested in a tool, an initial metadata load is done, and three months later nobody updates it because "there's no time." Without a maintenance process built into day-to-day work, the catalog becomes a liability, not an asset.
Inherited RBAC with no review. Access groups in Snowflake or Power BI configured in year one are still active in year three, with users who changed roles, left the company, or have privileges that no longer match their function. This is both a security risk and an audit problem.
Broken lineage between layers. Data arrives from a transactional source, goes through several transformations, and lands on an executive dashboard. But if someone asks "where exactly does this number come from?", the answer is silence or a multi-day investigation.
One-off data quality management. Quality is validated at the start of the project, a report is published, and it isn't measured again until the next audit. Data quality isn't a state; it's a continuous process.

These problems aren't exclusive to small or technically immature organizations. I've found them in environments with thousands of users, modern cloud stacks, and well-trained data teams. The root cause is almost always the same: governance is treated as a compliance project with an end date, not as a permanent organizational capability.

How the AI Act Changes the Rules of the Game

AI Act compliance introduces a level of documentary rigor few organizations have today. From August 2, 2026, AI systems classified as high-risk under Annex III must meet a set of requirements on the data that feeds them. Examples include recruitment and candidate-selection tools, other HR decision systems, credit scoring of individuals, and safety components of critical infrastructure.

What matters here is that the AI Act doesn't just regulate the model: it regulates the data the model was trained, validated, and monitored with. This means your data governance framework must explicitly extend to AI datasets:

Training dataset documentation: origin, transformations applied, selection and exclusion criteria.
Bias management: representativeness analysis, fairness metrics, and ongoing monitoring evidence.
Full traceability: lineage covering everything from the original source to the model deployed in production.
Automated decision logging: auditable logs of when and how the model made decisions affecting people.
AI data lifecycle management: retraining processes, dataset versioning, and controlled model retirement.
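Of the items above, automated decision logging is the one most often left vague. The AI Act's record-keeping article (Art. 12) requires high-risk systems to support automatic logging of events, but it does not prescribe a format.

The sketch below shows one way to make such logs tamper-evident: each entry is chained to the previous one by hash, so editing an earlier record breaks verification. It assumes a hypothetical `DecisionLog` class and illustrative field names.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Optional

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


class DecisionLog:
    """Hypothetical append-only log: each entry embeds the hash of the one before it."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    @staticmethod
    def _digest(body: dict) -> str:
        # Canonical JSON (sorted keys) so the same record always hashes the same way
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()

    def append(self, model_id: str, model_version: str, dataset_version: str,
               subject_ref: str, decision: str, ts: Optional[str] = None) -> dict:
        body = {
            "ts": ts or datetime.now(timezone.utc).isoformat(),
            "model_id": model_id,
            "model_version": model_version,
            "dataset_version": dataset_version,  # ties the decision to a versioned training dataset
            "subject_ref": subject_ref,          # pseudonymised reference, not raw personal data
            "decision": decision,
            "prev_hash": self.entries[-1]["hash"] if self.entries else GENESIS,
        }
        entry = {**body, "hash": self._digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; editing, reordering, or deleting a non-final entry breaks it."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev or self._digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```

The chain alone does not detect truncation of the newest entries, nor a full rewrite of the chain. In practice you would also store the latest hash somewhere outside the log's own storage.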

If you already have an operational data governance framework with lineage, a catalog, and quality control, adapting to the AI Act is incremental and manageable. If you don't, the cost of adaptation multiplies. Fines for breaching the obligations that apply to high-risk systems can reach €15 million or 3% of worldwide annual turnover, whichever is higher. Engaging in prohibited AI practices can cost up to €35 million or 7%.
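The penalty figures are ceilings, not fixed amounts. Article 99 pairs each percentage of worldwide annual turnover (for the preceding financial year) with a fixed amount in euros. The higher of the two applies in general; for SMEs and start-ups, the lower one applies.

The sketch below only computes that theoretical ceiling. The tier labels and function name are hypothetical, and the actual fine is set case by case by the competent authority.

```python
# Ceilings from Article 99 of the AI Act: (fixed amount in EUR, percentage points of turnover)
FINE_CAPS = {
    "prohibited_practices": (35_000_000, 7),   # Art. 5 prohibited practices
    "operator_obligations": (15_000_000, 3),   # e.g. high-risk obligations of providers and deployers
    "incorrect_information": (7_500_000, 1),   # misleading information to authorities or notified bodies
}


def max_fine_eur(worldwide_turnover_eur: int, tier: str, is_sme: bool = False) -> float:
    """Theoretical ceiling only; the actual amount is decided by the authority."""
    fixed, pct_points = FINE_CAPS[tier]
    turnover_based = worldwide_turnover_eur * pct_points / 100
    # General rule: whichever is higher. For SMEs and start-ups (Art. 99(6)): whichever is lower.
    return min(fixed, turnover_based) if is_sme else max(fixed, turnover_based)
```

For a group with €2 billion in turnover, 3% (€60 million) exceeds the €15 million fixed amount, so the percentage sets the ceiling.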

For a deeper look at the Regulation's specific deadlines and obligations, see AI Act Key Dates 2025-2027: Complete Regulatory Calendar.

Data Governance Framework, Step by Step

1. Define the Data Governance Architecture Before Touching Tools

The first mistake organizations make is buying a tool before the structure is clear. Data governance architecture answers three questions: what are my data domains, who has authority over each one, and how are they connected through platforms and access layers? Without this map, any catalog you install will be a metadata repository without context, useful to nobody.

In multi-entity environments like the one I managed at IAG, this architecture is designed in two layers: a corporate layer with master domains (Customers, Product, Operations, Finance) and shared canonical definitions; and a local layer per entity with its own adaptations, always subordinate to corporate standards.
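The two-layer design can be expressed as a simple rule. The corporate layer owns the canonical definition, and a local entity may adapt only the fields the corporate layer explicitly marks as adaptable.

Below is a minimal sketch with hypothetical class, field, and term names. The "active passenger" definition is illustrative only.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CorporateTerm:
    """Canonical definition owned by the corporate layer (hypothetical structure)."""
    name: str
    definition: str
    owner_domain: str
    attributes: dict = field(default_factory=dict)
    adaptable_fields: frozenset = frozenset()  # the only keys a local entity may override


def resolve_term(term: CorporateTerm, local_overrides: dict) -> dict:
    """Apply a local entity's adaptations on top of the corporate definition.

    Any attempt to override a non-adaptable field is rejected, which keeps
    the local layer subordinate to the corporate standard.
    """
    illegal = set(local_overrides) - term.adaptable_fields
    if illegal:
        raise ValueError(f"{term.name}: local layer cannot override {sorted(illegal)}")
    resolved = {"name": term.name, "definition": term.definition,
                "owner_domain": term.owner_domain, **term.attributes}
    resolved.update(local_overrides)
    return resolved


# Illustrative corporate term with one locked attribute and one adaptable one
active_passenger = CorporateTerm(
    name="active_passenger",
    definition="Passenger with at least one flown segment in the last 12 months",
    owner_domain="Customers",
    attributes={"lookback_months": 12, "reporting_timezone": "UTC"},
    adaptable_fields=frozenset({"reporting_timezone", "local_label"}),
)
```

A local entity can change its reporting time zone. If it tries to redefine the 12-month lookback, it gets an error instead of a silent divergence from the corporate KPI.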

2. Formalize Data Governance Roles With Real Authority

Data governance roles without formal authority are decorative. The Data Steward who can't make any decision without going through three committees can't do their job. The Data Owner with no dedicated time to review their domain's quality is a name on a RACI, not an owner.

The minimum viable structure includes: Data Owners per domain with formal accountability and dedicated time, Data Stewards with operational decision-making capacity, a Data Governance Committee meeting monthly with a real agenda, and a Data Engineering team that implements and maintains the framework's technical infrastructure. In regulated environments, the DPO must have direct visibility into the framework as a whole, not just into GDPR matters.
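A formal RACI is only useful if someone checks it. The function below flags two problems: a domain without exactly one Accountable Data Owner, and a domain without any Responsible Data Steward.

This is a minimal sketch that assumes assignments are kept as simple (domain, person, RACI letter) tuples. The function name and sample data are hypothetical.

```python
from collections import defaultdict


def validate_raci(assignments: list[tuple[str, str, str]]) -> dict[str, list[str]]:
    """Check a RACI given as (domain, person, letter) tuples; return issues per domain."""
    by_domain: dict = defaultdict(lambda: defaultdict(list))
    for domain, person, letter in assignments:
        by_domain[domain][letter.upper()].append(person)

    issues: dict[str, list[str]] = {}
    for domain, roles in by_domain.items():
        found = []
        owners = roles.get("A", [])
        if len(owners) != 1:
            found.append(f"expected exactly one Accountable (Data Owner), found {len(owners)}")
        if not roles.get("R"):
            found.append("no Responsible (Data Steward)")
        if found:
            issues[domain] = found
    return issues


# Hypothetical RACI extract
raci = [
    ("Customers", "ana", "A"), ("Customers", "luis", "R"),
    ("Finance", "marta", "A"), ("Finance", "pablo", "A"),
    ("Operations", "jorge", "R"),
]
```

Running a check like this before each committee meeting turns "who owns this domain?" into a short list of gaps rather than a discussion.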

3. Implement a Metadata Catalog With Metadata Automation

Metadata automation is what makes a catalog sustainable long-term. If populating and maintaining metadata depends on intensive manual work, the catalog dies within weeks. The goal is for technical metadata (schemas, types, lineage) to be captured automatically from Snowflake, dbt, or your data platform, with Data Stewards only needing to enrich the business context: definitions, owners, sensitivity classification, and quality rules.

In practice, this means integrating the catalog with your data pipelines from day one, not as an afterthought. A catalog populated retroactively is never up to date.
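Here is what integrating the catalog with your pipelines can look like with dbt. dbt writes a `manifest.json` artifact to its `target/` directory, in which each model node lists its columns and its upstream `depends_on.nodes`.

The sketch below reads that file and separates what is captured automatically from the business context stewards still need to add. Treat it as an illustration of the idea only:

- The `owner` and `sensitivity` keys under `meta` are an assumed team convention, not dbt built-ins.
- Catalogs such as OpenMetadata ship their own dbt ingestion.

```python
import json
from pathlib import Path


def load_manifest(path: str = "target/manifest.json") -> dict:
    """Read the artifact dbt produces on compile/run."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def catalog_entries(manifest: dict) -> list[dict]:
    """Extract technical metadata for dbt models and flag business-context gaps."""
    entries = []
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, etc.
        meta = node.get("meta", {})
        business_context = {
            "description": node.get("description"),
            "owner": meta.get("owner"),              # assumed convention
            "sensitivity": meta.get("sensitivity"),  # assumed convention
        }
        entries.append({
            "id": unique_id,
            "name": node.get("name"),
            # Captured automatically:
            "columns": {c: col.get("data_type") for c, col in node.get("columns", {}).items()},
            "upstream": node.get("depends_on", {}).get("nodes", []),
            # What Data Stewards still have to enrich:
            "missing": [k for k, v in business_context.items() if not v],
        })
    return entries
```

Run after every deploy, this keeps the technical side current without manual work. The `missing` list becomes the stewards' to-do queue.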

4. Activate Data Lineage Tools With End-to-End Coverage

Lineage is the proof of life of data. It must cover the full journey: transactional source → ingestion → transformation → semantic model → consumption (dashboard, API, or AI model). With Snowflake and dbt, technical lineage can be almost fully automated. Business lineage covers what business transformation each step represents, what rules were applied, and who approved them. It requires initial manual enrichment; once established, it only needs updating when the underlying business rules change.

Data lineage tools are also the basis for the documentation the AI Act requires for high-risk systems. No lineage, no traceability. No traceability, no compliance.
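Once lineage edges are captured, answering "where exactly does this number come from?" becomes a graph traversal. The same traversal serves a dashboard or an AI model.

This is a minimal sketch. It assumes lineage is stored as a hypothetical mapping from each asset to its direct parents, and the asset names are illustrative.

```python
from collections import deque


def upstream_closure(parents: dict[str, list[str]], asset: str) -> list[str]:
    """All assets upstream of `asset`, nearest first (breadth-first, cycle-safe)."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque(parents.get(asset, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        queue.extend(parents.get(node, []))
    return order


def root_sources(parents: dict[str, list[str]], asset: str) -> list[str]:
    """Upstream assets with no parents of their own, i.e. the original sources."""
    return [n for n in upstream_closure(parents, asset) if not parents.get(n)]


# Hypothetical lineage: each asset mapped to its direct parents
LINEAGE = {
    "dashboard.exec_revenue": ["semantic.revenue"],
    "model.churn_v3": ["mart.customer_features"],
    "semantic.revenue": ["mart.revenue"],
    "mart.revenue": ["stg.bookings", "stg.fx_rates"],
    "mart.customer_features": ["stg.bookings"],
    "stg.bookings": ["src.pss.bookings"],
    "stg.fx_rates": ["src.treasury.fx"],
}
```

For the AI model `model.churn_v3`, `root_sources` returns only `src.pss.bookings`. That is the source-to-model traceability a high-risk system's documentation needs to show.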

5. Establish Data Quality Management as a Continuous Process

Data quality management isn't an ETL validation. It's a data SLA: a set of defined rules, acceptance thresholds agreed with the business, automatic alerts when they're breached, and remediation processes with an assigned owner. Tools like dbt tests, Great Expectations, or Soda let you define these rules as code, integrate them into pipelines, and publish results on dashboards visible to Data Owners.

In environments with multiple data sources — like those found in groups with several operating entities — quality rules are defined on two levels: global rules applicable to all domains and local rules specific to each entity or market.
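dbt tests, Great Expectations, and Soda each have their own syntax, so the sketch below does not imitate any of their APIs. Instead it uses plain Python to show the underlying structure: each rule has a threshold agreed with the business and a remediation owner. An entity is evaluated against the global rules plus its own local rules. All rule, entity, and owner names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class QualityRule:
    name: str
    column: str
    check: Callable[[object], bool]  # row-level predicate on the column value
    min_pass_rate: float             # acceptance threshold agreed with the business
    owner: str                       # who remediates when the threshold is breached


# Global rules apply to every entity; local rules only to the entity they are keyed under
GLOBAL_RULES = [
    QualityRule("customer_id_not_null", "customer_id", lambda v: v is not None, 1.0, "customers-steward"),
]
LOCAL_RULES = {
    "airline_x": [
        QualityRule("fare_non_negative", "fare", lambda v: v is not None and v >= 0, 0.99, "airline-x-steward"),
    ],
}


def evaluate(rows: list[dict], entity: str) -> list[dict]:
    """Evaluate global + local rules for one entity; each result says whether to alert its owner."""
    results = []
    for rule in GLOBAL_RULES + LOCAL_RULES.get(entity, []):
        passed = sum(1 for r in rows if rule.check(r.get(rule.column)))
        rate = passed / len(rows) if rows else 1.0
        results.append({"rule": rule.name, "pass_rate": rate,
                        "breached": rate < rule.min_pass_rate, "owner": rule.owner})
    return results
```

Results like these can be published to a dashboard for Data Owners on every pipeline run, which is what turns a one-off validation into an SLA.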

6. Govern Access With Granular RBAC and Periodic Review

Access management is one of the most neglected dimensions of governance and one of the most critical in audits. Implementing RBAC in Snowflake, combined with Row-Level Security (RLS) in Power BI's semantic layer, ensures every user sees exactly what they should and no more. Automating access request and approval workflows adds full traceability: who requested, who approved, when, and why.

Equally important is periodic review. Active access should be reviewed at least quarterly, with documented evidence. An audit dashboard helps here: it shows active access, last review dates, and users whose privileges are overdue for review. That turns the task into a manageable routine rather than a semi-annual investigation project.
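The review logic behind such an audit dashboard can be small. The function below flags three kinds of grant:

- grants held by inactive users;
- grants not reviewed within roughly a quarter;
- self-approved grants.

This is a minimal sketch that assumes each grant records who requested it, who approved it, when, and why. The 92-day window, the `Grant` fields, and the role names are all assumptions.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Grant:
    user: str
    role: str
    requested_by: str
    approved_by: str
    granted_on: date
    reason: str
    last_reviewed: Optional[date] = None
    user_active: bool = True


def review_findings(grants: list[Grant], today: date,
                    max_age_days: int = 92) -> list[tuple[str, str, str]]:
    """Return (user, role, finding) tuples for the quarterly access review."""
    findings = []
    for g in grants:
        if not g.user_active:
            findings.append((g.user, g.role, "revoke: user no longer active"))
            continue
        # A grant never reviewed is measured from the date it was granted
        reference = g.last_reviewed or g.granted_on
        if today - reference > timedelta(days=max_age_days):
            findings.append((g.user, g.role, "pending quarterly review"))
        if g.approved_by == g.requested_by:
            findings.append((g.user, g.role, "self-approved grant"))
    return findings
```

Each finding maps directly to a row on the audit dashboard. Once a finding is resolved, the updated `last_reviewed` date is the documented evidence auditors ask for.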

7. Document AI Datasets Per the AI Act

For every high-risk AI model, create a dataset spec sheet: source, extraction date, transformations applied, representativeness metrics, owner, and last review date. In Spain, this document is the first line of defense in an inspection by AESIA (the Spanish Agency for the Supervision of Artificial Intelligence) or the AEPD (the Spanish Data Protection Agency). It doesn't need to be complex; it needs to be complete, current, and accessible to the compliance team.
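A spec sheet can also be checked automatically against the "complete and current" criteria. Below is a minimal sketch with a hypothetical `DatasetSpecSheet` structure that mirrors the fields listed above. The 180-day review window is an assumption; adapt it to your own policy.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional


@dataclass
class DatasetSpecSheet:
    model_id: str
    source: str
    extraction_date: Optional[date]
    transformations: list[str]
    representativeness_metrics: dict[str, float]
    owner: str
    last_review: Optional[date]


def spec_sheet_gaps(sheet: DatasetSpecSheet, today: date,
                    max_review_age_days: int = 180) -> list[str]:
    """List empty fields (incomplete) and a stale review date (not current)."""
    gaps = [f.name for f in fields(sheet) if getattr(sheet, f.name) in (None, "", [], {})]
    if sheet.last_review and (today - sheet.last_review).days > max_review_age_days:
        gaps.append("last_review is stale")
    return gaps


# Illustrative sheet for a hypothetical CV-screening model
sheet = DatasetSpecSheet(
    model_id="cv-screening-v2",
    source="ATS export",
    extraction_date=date(2026, 2, 1),
    transformations=["dedupe candidates", "drop free-text fields"],
    representativeness_metrics={},
    owner="hr-data-owner",
    last_review=date(2025, 12, 1),
)
```

An empty result is a reasonable precondition before a model is promoted or retrained. A non-empty one tells the compliance team exactly what is missing.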

Recommended Tools by Maturity and Stack

There's no perfect tool for every context. The choice depends on the technology stack, budget, and existing governance maturity level. This table summarizes the most solid options on the market in 2026:

Tool	Category	Best for	Note
OpenMetadata	Catalog + Lineage	Cloud teams, SMEs	Very active open source; native integration with Snowflake and dbt
Collibra	Enterprise governance	Large corporations	Very complete stewardship workflow; high cost
Microsoft Purview	Catalog + Compliance	Microsoft ecosystem	Native integration with Azure, Power BI and M365; AI Act features on roadmap
dbt + dbt Docs	Lineage + Quality	Data engineering teams	Native tests, automatic lineage; a must in any modern stack
Apache Atlas	Metadata + Lineage	Legacy Hadoop/Hive environments	Powerful but complex; consider OpenMetadata as a more modern alternative
Alation	Catalog + Collaboration	Organizations with a strong data culture	Excellent UX for business users; mid-to-high price
Great Expectations / Soda	Data Quality	Modern data pipelines	Perfect for defining quality SLAs as code and integrating into CI/CD

For a complementary view from the AI management framework angle, the article ISO 42001 vs NIST AI RMF: How to Choose Your AI Governance Framework pairs well with this guide.

What Works in Practice: Cases From Complex Environments

Case 1 — Federated Governance in a Multi-Airline Group

Managing data in a group with seven airlines operating under the same holding company poses a challenge most frameworks don't account for: each entity has its own definition of key concepts like "active passenger," its own quality rules, and its own data teams with different cultures. Implementing a single, centralized framework doesn't work; it generates pushback and immediate workarounds.

The model that worked was a federated, two-layer data governance architecture: a corporate layer with master domains, canonical definitions, cross-entity lineage in Snowflake, and global access policies; and a local layer per airline with its own adaptations, always subordinate to corporate standards. The centralized catalog with automated ingestion from Snowflake and the semantic models in Power BI drastically reduced discrepancies between reports in the first six months of operation.

Case 2 — RBAC and Access Auditing in Snowflake and Power BI

In auditable environments with sensitive operational data, access management can't depend on emails and verbal approvals. Implementing granular RBAC in Snowflake — with roles by business domain, not individual user — combined with Row-Level Security in Power BI's semantic layer ensures every profile accesses exactly what it needs, with no over-privilege.

The differentiating element was the audit dashboard: a Power BI view showing active access per role in real time, last review dates, and users with privileges pending quarterly validation. This turned access review from a dreaded manual task into a twenty-minute process with documented evidence.

Case 3 — Metadata Automation in Commercial BI Teams

In BI consulting projects for commercial teams, one of the main bottlenecks is the time analysts spend looking for a metric's correct definition or a field's origin. The solution was integrating dbt Docs as a living documentation layer — automatically updated with every deploy — and publishing business metadata directly into Power BI Service datasets, with descriptions, owners, and sensitivity classification visible to any user.

The most tangible result wasn't technical: it was the reduction in onboarding time for new analysts and fewer questions to the data team about definitions. When documentation lives where the user works, it gets used. When it's in a separate wiki, it doesn't.

Common Mistakes Adopting a Data Governance Framework

These are the most common failure patterns observed in the market, both in organizations just starting out and in those that have spent years trying to consolidate their governance:

Starting with the tool, not the structure. Buying Collibra or Purview before domains, roles, and policies are defined guarantees an expensive, empty catalog. The tool amplifies what already exists; if nothing exists, it amplifies chaos.
Data governance roles with no authority or time. The Data Steward "is accountable" but can't make any decision without three committee approvals and spends 10% of their day on governance. Accountability without authority or resources doesn't produce results; it produces frustration and quiet abandonment of the role.
Data quality as a one-off event. It's measured at project kickoff, a findings report is published, and it isn't reviewed again until the next audit. Data quality is perishable: a process without continuous monitoring degrades quickly.
Assuming the AI Act is only for Big Tech. Any company using risk scoring, candidate selection systems, predictive analytics in essential services, or automated decisions affecting people may fall within the regulation's scope. Company size isn't the criterion; the type of AI use is.
Technical lineage with no business context. Knowing data comes from table X of system Y isn't enough. What the business needs — and what the AI Act requires — is understanding what business transformation that data represents, what rules were applied, who approved them, and whether the data is subject to regulatory restrictions.
Not measuring governance ROI. Without visible metrics — reduced quality incidents, time to resolve data questions, number of unauthorized access attempts detected — the governance budget gets cut in the next planning cycle. What isn't measured isn't defended.
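Measuring governance ROI can start with a before/after comparison of a few KPIs. In the sketch below, the KPI names and figures are purely illustrative. Each KPI is tagged with whether a lower value counts as an improvement, so the report does not misread a drop in incidents as a decline.

```python
def kpi_report(before: dict[str, float], after: dict[str, float],
               lower_is_better: set[str]) -> dict[str, dict]:
    """Percentage change per KPI between two periods, plus whether it improved."""
    report = {}
    for kpi, old in before.items():
        new = after.get(kpi)
        if new is None or old == 0:
            continue  # no comparable baseline
        improved = new < old if kpi in lower_is_better else new > old
        report[kpi] = {"change_pct": round((new - old) / old * 100, 1), "improved": improved}
    return report


# Illustrative figures only
before = {"quality_incidents": 40, "median_hours_to_answer": 16, "assets_with_owner_pct": 55}
after = {"quality_incidents": 28, "median_hours_to_answer": 6, "assets_with_owner_pct": 80}
report = kpi_report(before, after, lower_is_better={"quality_incidents", "median_hours_to_answer"})
```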

Conclusion: Data Governance Is a Competitive Advantage

A well-implemented data governance framework doesn't just reduce regulatory risk under the AI Act. It speeds up analyst onboarding, improves trust in data for decision-making, reduces production quality incidents, and creates the documentary infrastructure that allows AI use to scale sustainably.

The difference between organizations that move forward and those that stall isn't budget or technology. It's whether they treat governance as a project with an end date or as a permanent organizational capability. Those that treat it as a permanent capability will be the ones able, in the coming years, to deploy high-risk AI with confidence, speed, and documentary evidence ready for any inspector.

If you're figuring out where to start or how to mature your current framework, the first step is always the same: an honest diagnosis of where you stand. Without that map, any investment in tools or processes is likely to get lost in already-accumulated technical and organizational debt.

Checklist: An AI Act-Ready Data Governance Framework

Data domains defined and assigned to Data Owners with a formal RACI.
Operational data governance roles with real authority, dedicated time, and periodic meetings.
Active metadata catalog with automated ingestion from data platforms.
End-to-end data lineage documented: transactional source → transformation → consumption.
Data quality management with defined rules, thresholds, alerts, and remediation owners.
RBAC implemented with RLS in the semantic layer and documented quarterly access review.
Dataset spec sheets for all high-risk AI models (AI Act, Art. 10).
Auditable automated decision logs accessible to AESIA and the AEPD.
Audit dashboard with governance KPIs visible to stakeholders and compliance.
Governance ROI measured and communicated quarterly to leadership.

Frequently Asked Questions

What is a data governance framework, and why is implementing it urgent now?

A data governance framework is the set of policies, processes, roles, and technologies that govern how an organization manages, accesses, and uses its data. With the AI Act's obligations for high-risk AI systems applying from August 2, 2026, implementing it goes from being a best practice to a legal obligation for companies operating high-risk AI systems. Additionally, in an advanced analytics environment, the quality of business decisions depends directly on data quality and traceability.

What roles are essential in a data governance framework?

The minimum roles are: Chief Data Officer (CDO) or equivalent with executive authority, Data Stewards per business domain with operational decision-making capacity, Data Owners with formal accountability for data quality and use, a Data Engineering team for technical implementation, and a Data Governance Committee for strategic decisions. In regulated environments, the DPO must have direct visibility into the framework.

How does the AI Act affect my company's data governance?

The AI Act requires high-risk AI systems to have documented training data, traceable lineage, auditable quality metrics, and access governance. Your framework must therefore cover not just operational data but also the datasets used in AI models, with documentary evidence to support inspections by AESIA and the AEPD. Fines for breaching high-risk obligations can reach €15 million or 3% of worldwide annual turnover, whichever is higher.

What data lineage tools are best to start with?

For cloud teams with Snowflake or Databricks, OpenMetadata is the most solid and active open source option in 2026. For supported enterprise solutions, Collibra, Alation, or Microsoft Purview are the most mature. For most teams just starting out, dbt plus OpenMetadata offers the best balance of cost, automation level, and a reasonable adoption curve.

How long does it take to implement a complete data governance framework?

A minimum viable framework — defined roles, basic catalog, access policies, and lineage for critical domains — can be operational in 3 to 6 months with organizational will and dedicated resources. A mature framework with metadata automation, continuous data quality management, and full domain coverage requires 12 to 24 months, depending on organization size, accumulated technical debt, and the level of cultural change required.
