
This guidance document examines how two complementary technical mechanisms — data provenance and machine unlearning — can support compliance with EU data governance obligations while preserving the utility of legitimate data resources.
As the European Union accelerates its data strategy through landmark legislation such as the Data Act and the European Health Data Space Regulation (EHDS), new governance challenges are emerging around the traceability, accountability, and lawful reuse of data in complex, multi-actor ecosystems.
The report provides a detailed analysis of how provenance frameworks can enable systematic tracking of data origin, legal basis, and transformation history across the full data lifecycle, including within machine learning pipelines. It further explores machine unlearning as a targeted compliance mechanism for removing the influence of unlawfully obtained or processed data from analytical models, without requiring the invalidation of entire datasets. Both exact and approximate unlearning approaches are assessed, with particular attention to the trade-offs between formal guarantees, computational cost, and scalability.
Building on this technical and legal analysis, the document sets out concrete policy recommendations for integrating provenance-by-design into data sharing infrastructures, establishing machine unlearning as a recognised compliance tool under EU law, and clarifying liability in interconnected data ecosystems. The report is intended for policymakers, regulatory authorities, and technical practitioners working at the intersection of data governance, AI regulation, and digital infrastructure.
By Pratiksha Ashok, Inge Graef, Pradeep Kumar, Patricia Prüfer, Berkay Serceoglu, Teun Siebers
