User privacy is now top of mind for companies across every industry. To that end, Google has released an open-source version of its differential privacy library to help developers and organizations mask users’ personal data, the company said in a blog post.
Differentially private data analysis enables organizations to draw actionable insights from their data without allowing any individual's data to be distinguished or re-identified. By open-sourcing this set of privacy tools, Google aims to make it easier for companies to protect their users’ privacy.
“If you are a health researcher, you may want to compare the average amount of time patients remain admitted across various hospitals in order to determine if there are differences in care,” wrote Miguel Guevara, product manager in Google’s privacy and data protection office. “Differential privacy is a high-assurance, analytic means of ensuring that use cases like this are addressed in a privacy-preserving manner,” he said.
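The hospital example above can be sketched in a few lines. This is not Google's library (which is written in C++); it is a minimal, hypothetical Python illustration of the underlying idea, using the Laplace mechanism: clamp each value to an assumed range, then add calibrated noise to both the sum and the count before dividing. The function names, the range `[lower, upper]`, and the even split of the privacy budget are all assumptions for illustration.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Draw one sample from a Laplace(0, scale) distribution
    via inverse-CDF sampling of a uniform draw."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_average(values, lower, upper, epsilon):
    """Differentially private mean (illustrative sketch):
    clamp each value into [lower, upper], then add Laplace noise
    to the sum and to the count, splitting the privacy budget."""
    clamped = [min(max(v, lower), upper) for v in values]
    eps_each = epsilon / 2.0  # half the budget for the sum, half for the count
    # After clamping, replacing one record changes the sum by at most
    # (upper - lower), which calibrates the noise on the sum.
    noisy_sum = sum(clamped) + laplace_noise((upper - lower) / eps_each)
    # Adding or changing one record changes the count by at most 1.
    noisy_count = len(clamped) + laplace_noise(1.0 / eps_each)
    return noisy_sum / max(noisy_count, 1.0)

# Hypothetical lengths of stay in days; a small epsilon gives more privacy
# at the cost of a noisier estimate.
stays = [3, 5, 7, 9]
print(dp_average(stays, lower=0, upper=30, epsilon=1.0))
```

With a small epsilon the returned average can differ noticeably from the true mean; that deliberate noise is what prevents any single patient's stay from being inferred from the result.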
According to Guevara, here are some of the key features of the library:
- Statistical functions: Most common data science operations are supported by this release. Developers can compute counts, sums, averages, medians, and percentiles using our library.
- Testing: Getting differential privacy right is challenging. Besides an extensive test suite, we’ve included an extensible ‘Stochastic Differential Privacy Model Checker library’ to help prevent mistakes.
- Ready to use: The real utility of an open-source release is in answering the question “Can I use this?” That’s why we’ve included a PostgreSQL extension along with common recipes to get you started. We’ve described the details of our approach in a technical paper that we’ve just released today.
- Modular: We designed the library so that it can be extended to include other functionalities such as additional mechanisms, aggregation functions, or privacy budget management.
Differential privacy tools are challenging to build from scratch, Google said. "To make the library easy for developers to use, we're focusing on features that can be particularly difficult to execute from scratch, like automatically calculating bounds on user contributions," Guevara wrote.
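Bounding user contributions is worth a brief sketch, because it is the step that makes the noise calibration above meaningful: if one user can contribute unlimited records, no fixed amount of noise can hide them. Google's library calculates such bounds automatically; the simplified, hypothetical Python function below shows the manual version, capping the number of records kept per user before aggregation.

```python
from collections import defaultdict

def bound_contributions(records, max_per_user):
    """Keep at most `max_per_user` records per user, so that no single
    user can shift an aggregate by more than a known, bounded amount.
    `records` is an iterable of (user_id, value) pairs; order is preserved."""
    kept_per_user = defaultdict(int)
    bounded = []
    for user_id, value in records:
        if kept_per_user[user_id] < max_per_user:
            bounded.append((user_id, value))
            kept_per_user[user_id] += 1
    return bounded

# Illustrative input: "alice" has three records, "bob" has one.
rows = [("alice", 1), ("alice", 2), ("alice", 3), ("bob", 4)]
print(bound_contributions(rows, max_per_user=2))
```

Once every user contributes at most `max_per_user` records, the sensitivity of a downstream count or sum is bounded, and the Laplace noise can be scaled accordingly.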
So far this year, Google has announced several open-source privacy technologies, including TensorFlow Privacy, TensorFlow Federated, and Private Join and Compute. Google has also used differential privacy to build features in its own products, such as assessing the popularity of a particular restaurant’s dishes in Google Maps and improving Google Fi. “From medicine, to government, to business, and beyond, it’s our hope that these open-source tools will help produce insights that benefit everyone,” Guevara said.