Removing Corporate Data from AI Models
When multiple collaborating companies feed data into an AI system, the learning model draws on an especially wide variety of data, which improves the quality and reliability of the results it generates. To retain data sovereignty, companies rely on federated, decentralized training approaches. In this approach, the data is not sent to a central server. Instead, it is fed into a local copy of the AI model, and the partners exchange only abstract model parameters rather than the actual data. This enables each partner to contribute data to the AI without having to disclose it to the other companies.
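The parameter exchange described above can be sketched in a few lines of Python. This is a minimal illustration of federated averaging, not the Fraunhofer/Fujitsu system: the one-parameter linear model, learning rate, and partner datasets are all invented for the example.

```python
def local_update(w, data, lr=0.1, epochs=5):
    # One partner refines its local copy of the model (here y = w * x)
    # on its own private (x, y) pairs. The data never leaves this function.
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_round(w, partners):
    # Each partner returns only an updated parameter, and the coordinator
    # averages those parameters. No raw data is exchanged.
    updates = [local_update(w, data) for data in partners]
    return sum(updates) / len(updates)

# Hypothetical partners whose private data all follow the rule y = 2x.
partners = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0), (0.5, 1.0)],
    [(1.5, 3.0), (2.5, 5.0)],
]

w = 0.0
for _ in range(30):
    w = federated_round(w, partners)
# w now approximates the shared underlying model (w close to 2.0).
```

Each partner thus contributes to a common model while its training data stays on its own infrastructure; only the averaged parameters circulate.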
But there is still a problem: When a company leaves the collaborative project, its data and parameters remain deeply embedded in the AI model. Until now, it has been nearly impossible to extract this data from the "black box" of the AI without compromising the quality of its results, such as predictions or simulations.
A clean sweep
In collaboration with industrial partner Fujitsu Research, the Fraunhofer Institute for Software and Systems Engineering ISST in Dortmund has developed a solution: unlearning for decentralized, federated AI collaborative projects. This method goes back through the history of the step-by-step AI learning process to the point where the relevant partner introduced its data.
AI training resumes from this point—only without the data from the partner who has withdrawn. This ensures a clean sweep of the AI, removing all the information and data from the company leaving the collaboration. Because training restarts from the stored parameters rather than from scratch, retraining is also considerably more efficient than the initial training.
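The rollback-and-retrain idea can be illustrated with a checkpoint log. Again, this is a hedged sketch under invented assumptions (a one-parameter linear model and hypothetical companies A, B, and C), not the actual Fraunhofer ISST implementation: the global parameters are stored after every round, and when a partner withdraws, training resumes from the checkpoint recorded just before that partner joined.

```python
def local_update(w, data, lr=0.1, epochs=5):
    # A partner trains its local copy of the model y = w * x on private data.
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_round(w, partner_sets):
    # Only parameters are averaged; the raw data stays with each partner.
    updates = [local_update(w, data) for data in partner_sets]
    return sum(updates) / len(updates)

def train(w, partner_sets, rounds, checkpoints):
    # Run federated rounds, logging the global parameters after each round
    # so that training can later be rolled back to any point in its history.
    for _ in range(rounds):
        w = federated_round(w, partner_sets)
        checkpoints.append(w)
    return w

# Hypothetical collaboration: A and B train first; C joins at round 10.
data_a = [(1.0, 2.0), (2.0, 4.0)]
data_b = [(3.0, 6.0), (0.5, 1.0)]
data_c = [(1.5, 3.1), (2.5, 5.2)]  # C's data deviates slightly

checkpoints = [0.0]  # index r holds the parameters after round r
w = train(0.0, [data_a, data_b], rounds=10, checkpoints=checkpoints)
w = train(w, [data_a, data_b, data_c], rounds=10, checkpoints=checkpoints)

# C withdraws: roll back to the checkpoint from just before C joined
# (after round 10) and resume with the remaining partners only.
w_clean = train(checkpoints[10], [data_a, data_b], rounds=10, checkpoints=[])
```

Only the ten rounds trained after C joined have to be repeated, not the full twenty, which is why restarting from stored parameters is cheaper than retraining from scratch; the result contains no trace of C's contributions.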
Fraunhofer ISST research scientist Florian Zimmer explains: "The learning model isn't rebuilt all the way back from zero with the remaining partners’ data. Relatively little effort is thus needed to restore the performance and integrity of the AI. Depending on the application, a certain loss in the quality of the results is unavoidable due to the removal of part of the data, but this is compensated for by further AI learning as time goes on."
Promoting greater use of AI
The federated unlearning method for decentralized AI models allows companies to engage in collaborative projects without reservations. They can draw on the enormous potential of AI to develop high-quality solutions very efficiently, while resting assured of their ability to withdraw from the collaborative project without leaving behind their own proprietary data. This also benefits companies that are required to handle data in compliance with regulatory requirements such as the General Data Protection Regulation (GDPR).