AI, Read The Privacy Policy For Me

Welcome to this week’s Tip of the Hat! Last week we took a deep dive into ALA’s privacy policy to figure out where our information was going if we agreed to receive information from exhibitors while registering for the Annual Conference.

[ICYMI, LDH will be exhibiting at Annual! Let us know if you want to meet up and talk about all things privacy and libraries!]

As we saw last week, privacy policies are not the most exciting documents to read. In fact, you can test this out by checking out the impressive list of electronic resource vendor privacy policies generated by the folks at York University (the code is available on GitHub). Try picking out a couple of privacy policies and reading them from start to finish now. We’ll be here waiting for you.

…..

……. all done?

Chances are you found yourself skimming the policies, if you made it all the way to the bottom at all. If so, you’re not alone – studies have shown that the majority of folks do not read these policies, which could lead to surprises and confusion when your data is collected, shared, or breached. The fact is that it takes a long time to get through long, detailed documents – a recent study showed that many privacy policies require a high reading level and can take up to half an hour to read. What’s a busy person to do?
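To get a rough feel for where a half hour comes from, here is a back-of-the-envelope sketch. The reading speed of 250 words per minute and the 7,500-word stand-in policy are our own assumptions for illustration; neither number comes from the study mentioned above.

```python
import math


def estimate_reading_minutes(text: str, words_per_minute: int = 250) -> int:
    """Estimate minutes needed to read `text`, rounded up.

    The default of 250 words per minute is an assumed reading speed,
    not a figure taken from any particular study.
    """
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    word_count = len(text.split())  # crude count: split on whitespace
    return math.ceil(word_count / words_per_minute)


# Made-up stand-in for a long policy: a 4-word sentence repeated
# 1,875 times, which comes to 7,500 words.
sample_policy = "We collect your data. " * 1875

print(estimate_reading_minutes(sample_policy))  # prints 30
```

Swap in the text of a real policy and your own reading speed to see how your numbers compare.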

One way some folks are addressing this is by letting the machines do the reading. The last few years have seen several tools that use AI and machine learning (ML) to analyze privacy policies and pick out the most important parts that users should know. For example, the Usable Privacy Policy Project, an NSF-funded project, used a collection of 115 privacy policies annotated by law students to train machine classifiers to annotate over 7,000 privacy policies. Another group of researchers used the same 115 annotated privacy policies for ML training, creating two different tools for AI-generated analysis of policies. The first is Polisis, which creates a Sankey diagram based on the AI’s analysis of the policy, while the second is Pribot, a chatbot that allows users to explore and ask questions about specific privacy policies.
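For the curious, the general idea of training a classifier on human-annotated text can be sketched in a few lines. This is a toy naive Bayes classifier of our own, not the method used by the Usable Privacy Policy Project, Polisis, or Pribot. The three category names and the six training sentences are made up for illustration, where the real projects worked from 115 full policies annotated by law students.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and strip basic punctuation from each word."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return [w for w in words if w]


def train(examples):
    """Count words per label from (sentence, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def classify(text, model):
    """Return the label with the highest naive Bayes score."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total)
        # Add-one smoothing so unseen words do not zero out a label.
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical hand-labeled sentences standing in for annotated policies.
annotated = [
    ("We collect your name and email address when you register.", "collection"),
    ("We collect information about the device you use.", "collection"),
    ("We share your information with third party advertisers.", "sharing"),
    ("We may share data with our partners and service providers.", "sharing"),
    ("We retain your records for two years.", "retention"),
    ("Data is retained until you delete your account.", "retention"),
]

model = train(annotated)
print(classify("We share your email with advertisers.", model))  # prints sharing
```

With only six training sentences this is easy to fool, which is part of why the human annotation work behind the real tools matters so much.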

Each AI privacy analysis tool takes a different approach to displaying the results to end users. Let’s use OverDrive’s privacy policy as our test policy. [1] The Usable Privacy site uses different colored fonts to indicate which parts of the policy belong to each of ten categories. Users can click on a category to show only the sections of the policy that fall under it, or to exclude those sections. The site also directs us to a separate analysis of OverDrive’s Privacy Policy for Children.

A screenshot of Usable Privacy's analysis of the OverDrive privacy policy.

For Polisis’ analysis of OverDrive’s policy, the site takes the same ten categories and creates separate visualizations for most of them. Users can click on a stream to highlight it in the diagram – for example, showing what information is shared and for what reason.

A screenshot of Polisis' analysis of OverDrive's privacy policy.

We are still a ways away from widespread adoption of AI-annotated privacy policies; however, the possibilities are promising. With GDPR, CCPA, and other upcoming privacy regulations, AI and ML could help end users keep up with all the changes in policies, as well as dig through mountains of text in a fraction of the time it would take to read it all manually. Humans will still need to play a considerable role in training the AI and supervising the ML to ensure proper analysis, though, and human labor will be needed to create effective and accessible interfaces. Perhaps one day there could be an API service that has AI analyze the privacy policies listed on the York University page.

[1] Both sites are analyzing older versions of OverDrive’s privacy policy. The most up-to-date privacy policy is at https://company.cdn.overdrive.com/policies/privacy-policy.htm.