So You Want to Work With Patron Data… De-identification Basics

Welcome to this week’s Tip of the Hat! This week’s post is a “back to basics” look at de-identification and patron data. Why? A recent article published in the Code4Lib Journal combined patron data with external data sets without de-identifying it first, so now is as good a time as any to remind library workers about de-identification. [1]

De-identification Definitions

Before we talk about de-identification, we must talk about anonymization and the differences between the two:

  • De-identification is when you remove the connection between the data and any identifiable individual in the real world. Sometimes a de-identified data set replaces the personally identifiable information (PII) attached to data points with a unique identifier, which is called pseudonymization.

De-identification provides a way for some to work with data to track individual trends with a reduced risk of re-identification and other privacy risks. Why “for some” and “reduced”? We’ll get into the whys of the issues with de-identification later in this post.
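As a rough illustration of pseudonymization, the sketch below replaces a patron identifier with a keyed hash so that the same patron always maps to the same pseudonym. Everything here is an assumption made for illustration: the field names, the sample barcodes, the key, and the choice of HMAC-SHA256 are not taken from any particular library system.

```python
import hashlib
import hmac

# Hypothetical secret key. It would need to be stored separately from the
# data set, because anyone holding it can recompute the pseudonyms.
SECRET_KEY = b"replace-with-a-long-random-secret"


def pseudonymize(patron_id: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable pseudonym for a patron identifier using a keyed hash."""
    digest = hmac.new(key, patron_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability


# Made-up checkout records with made-up barcodes.
checkouts = [
    {"patron_id": "21234000123456", "item": "ebook-001"},
    {"patron_id": "21234000123456", "item": "ebook-002"},
    {"patron_id": "21234000987654", "item": "dvd-314"},
]

# Replace the identifier with the pseudonym and drop the original field.
pseudonymized = [
    {"patron": pseudonymize(row["patron_id"]), "item": row["item"]}
    for row in checkouts
]
```

The pseudonym still lets you follow one patron’s activity across records, which is exactly why pseudonymized data carries re-identification risk and should not be treated as anonymous.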

De-identification Method Basics

PII comes in two forms: data about a person and data about a person’s activities that can be linked back to the person. The methods and level of work needed to sufficiently de-identify patron data depend on the type of PII in the data set. The methods commonly used to de-identify PII include truncation, obfuscation, and aggregation.

  • Obfuscation moves the reference point of the data up a few levels of granularity. An example is using a birth year or age instead of the person’s full birth date.
  • Truncation strips the raw data to a small subsection general enough that it cannot be easily connected to an identifiable person. A real-world example of truncation is HIPAA’s guidance on physical address de-identification, truncating the address to the first three digits of the zip code.
  • Aggregation further groups individual data points, creating a more generalized data set. Going back to the obfuscation example, individual ages can be aggregated into age ranges.

There are more methods to de-identify data, some of which can get quite complex, such as differential privacy. The three methods mentioned above, nonetheless, are some of the more accessible de-identification methods available to libraries.
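The three methods can be sketched in a few lines of code. This is a minimal example under stated assumptions: the function names, the sample records, the ten-year range width, and the fixed “as of” date are all hypothetical, and the zip code helper only keeps the first three digits without modeling any further conditions that HIPAA’s guidance attaches to that practice.

```python
from collections import Counter
from datetime import date


def obfuscate_birth_date(birth_date: date) -> int:
    """Obfuscation: keep only the birth year instead of the full date."""
    return birth_date.year


def truncate_zip(zip_code: str) -> str:
    """Truncation: keep only the first three digits of the zip code."""
    return zip_code[:3]


def age_on(birth_date: date, as_of: date) -> int:
    """Age in whole years on a given date."""
    had_birthday = (as_of.month, as_of.day) >= (birth_date.month, birth_date.day)
    return as_of.year - birth_date.year - (0 if had_birthday else 1)


def age_range(age: int, width: int = 10) -> str:
    """Aggregation: place an individual age into a range such as 30-39."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


# Made-up patron records.
patrons = [
    {"birth_date": date(1985, 6, 15), "zip": "98101"},
    {"birth_date": date(1990, 1, 20), "zip": "98115"},
    {"birth_date": date(1952, 11, 3), "zip": "98004"},
]

as_of = date(2020, 6, 1)
deidentified = [
    {
        "birth_year": obfuscate_birth_date(p["birth_date"]),
        "zip3": truncate_zip(p["zip"]),
        "age_range": age_range(age_on(p["birth_date"], as_of)),
    }
    for p in patrons
]

# Aggregated view: how many patrons fall into each age range.
range_counts = Counter(row["age_range"] for row in deidentified)
```

In practice you would keep only the fields you need. Keeping both the birth year and the age range, as this sketch does for demonstration, gives back some of the granularity that aggregation was meant to remove.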

Before You De-identify…

Remember in the first section that we mentioned that de-identification only works for some data sets and only reduces privacy risk? There are two main reasons why this is:

  1. De-identification does not protect outliers in the data or small-population data sets. There are equations and properties that can help you determine whether your data set can be re-identified, but for most libraries, de-identification is not possible due to the type or size of the data set they wish to de-identify.
  2. De-identified data can still be re-identified through the use of external data sets, particularly if the data in the de-identified dataset was not properly de-identified. An evergreen example is the AOL data set that retained identifying data in the search queries, even though AOL scrubbed identifying data about the searcher.

It is possible to have a de-identified data set of patron data, but the process is not foolproof. De-identification requires running multiple sample de-identification processes and analyzing how easy it is to reconnect the resulting data to an individual.
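One way to start that kind of analysis is to count how many records share each combination of indirectly identifying fields. The sketch below uses the idea behind k-anonymity, which is one commonly cited property and is our assumption here, not necessarily the one this post has in mind. The field names, the sample rows, and the threshold of 2 are hypothetical.

```python
from collections import Counter


def group_sizes(rows, quasi_identifiers):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)


def risky_groups(rows, quasi_identifiers, k):
    """Return the combinations shared by fewer than k rows."""
    sizes = group_sizes(rows, quasi_identifiers)
    return {combo: n for combo, n in sizes.items() if n < k}


# Made-up de-identified records.
rows = [
    {"zip3": "981", "age_range": "30-39"},
    {"zip3": "981", "age_range": "30-39"},
    {"zip3": "981", "age_range": "30-39"},
    {"zip3": "980", "age_range": "60-69"},  # an outlier: the only such record
]

outliers = risky_groups(rows, ["zip3", "age_range"], k=2)
```

Any combination that appears in `outliers` describes so few people that the record may point to one individual, which is the outlier and small-population problem described above. Passing this check does not make a data set safe on its own; it only flags the most obvious cases.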

Overall, de-identification is a tool to help protect patron privacy, but it should not be the only privacy tool used in the patron data lifecycle. The most effective privacy tools and methods in the patron data lifecycle are the questions you ask at the beginning of the lifecycle:

  • Why are you collecting this data?
  • Does this reason tie to a demonstrated business need?
  • Are there other ways you can achieve the business need without collecting high-risk patron PII?

If you want to learn more about de-identification and privacy risks, check out the resources below:

[1] The article contains additional privacy and security concerns that we will not cover in this post, including technical, administrative, and ethical concerns.

Ch-ch-ch-ch-changes…

Welcome to this week’s Tip of the Hat!

We’ve been busy the last couple of weeks with website and newsletter changes. Now that the dust has mostly settled, we’d like to give you an update.

Newsletter changes

LDH has been sending newsletters to your inbox for almost a year and a half. While it’s a convenient way to receive the latest privacy updates, searching and linking to these posts were less than convenient. To make access to our privacy updates easier for our subscribers and to the general public, we are proud to launch our Tip of the Hat blog!

What does this mean for newsletter subscribers? You will still receive the latest posts in your inbox. The greatest change is the ease of searching and accessing older posts. The majority of the newsletter archive has now been migrated to the blog, where you can search the archive in multiple ways: free text search, tags, and categories. Each post also has a shorter, permanent URL for easier sharing with your colleagues. We hope that this new blog will give you easier access to all the privacy news you can use!

Website changes

In addition to the blog, LDH has updated our website, including:

  • Services – updated list of services LDH provides for clients and examples of previous client work
  • About – updated list of library privacy work in the field, as well as adding a personnel entry for our Assistant to the Executive Assistant

We’re always looking for ways to improve the website, including content offerings. What would you like to find on the LDH website? Let us know by sending an email to newsletter@ldhconsultingservices.com and we’ll take it from there.

Summer Homework – Understanding Your State’s Library Privacy Law

Welcome to this week’s Tip of the Hat!

Have you always dreamed of spending countless hours reading legal regulations and reviews? If so, you might be suited for legal life! Reading laws is probably not high on your list of things to do; nonetheless, it’s always good to know how to navigate the text of a legal regulation when you are researching what laws could apply to you or to the third parties that you do business with. Even though we’re not lawyers, knowing how to read legal regulation text enables people to have more productive conversations with legal staff.

Here are three questions that can help you start understanding a law or statute:

  1. Who is covered by this law?
    • Does your state library privacy law cover only publicly funded libraries, or does the scope include other types of libraries, no matter the funding source? Does it include third parties acting on behalf of the library?
  2. What types of information (and what uses of information) are covered?
    • What does the law mean when it says “patron data”? Are there any definitions or descriptions of specific data points covered by the law?
  3. What exactly is required or prohibited?
    • In particular, what exemptions are listed in the law?

You might not be able to answer all the questions depending on what law you choose to study. However, not being able to answer a question might be a topic of discussion with legal staff, particularly around the specifics of who is within the scope of the law. There’s also the question of preemption between different governmental levels of legal regulation (or even within the same level of government). Sometimes a lower government’s law is stricter than a higher government’s law, but if the higher government’s law states that their law preempts any laws from lower governments, then you are not bound to follow the lower government’s law in that specific matter.

Now it’s time to take what you learned and put it into practice. Find your state’s library privacy law and read the law while trying to answer the questions above. Let us know if these questions help you through the legal text! Don’t be afraid to let us know if this exercise brings up more questions than it answers – we’ll do our best to address them, or at least help you prepare to ask your legal staff these questions.

[Legal questions source: Swire, Peter, and DeBrae Kennedy-Mayo. (2018). U.S. Private-Sector Privacy: Law and Practice for Information Privacy Professionals, 2nd ed.]

Libraries, Privacy, and… Tropes?

Welcome to this week’s Tip of the Hat!

A popular way to procrastinate at LDH is to dig through the pile of articles and other literature about all facets of privacy: regulations, ethics, practices, current events… the current events pile is at overcapacity at the moment. In these piles of articles, we come across one particular trope that we’d like to address – libraries as exemplars of privacy ethics and practices.

This trope is similar to others in other mainstream stories that use libraries as exemplars for other things, such as community engagement, democracy, and learning centers. The “library as privacy exemplar” trope coexists with these other tropes, sometimes in the same story. Other times the trope is front and center of an article. An example of this is an IAPP article about general privacy practices at the library. At best, this article demonstrates the attitude and tone of how many writers think about the library as an enlightened entity with their focus on privacy. Near the end of the article comes another trait that these articles tend to share, which is modeling privacy practices off of the library profession: “While library culture tilts heavily in favor of protecting the ‘citizen from state’ intrusion, that same culture can be mobilized to advocate for ‘customer’ privacy as well in relation to third-party service providers.”

All of this leads us to a hidden danger in the “library as privacy exemplar” trope, which is unquestioned trust in libraries in all matters of privacy and data ethics. Some of that trust has been earned – there are several library privacy initiatives, such as the Library Freedom Institute, that are very active in the greater community in their advocacy and education around data privacy. In addition, LDH’s conversations with technology workers in other fields have made it clear that professionals in other industries wish that they had strong professional ethics and standards like the library profession.

Nonetheless, others from outside the library profession take this trust too far. For example, in Emma Trotter’s “Patron Data Privacy Protection at Public Libraries: The Ethical Model Big Data Lacks”, Trotter proposes that libraries should become personal data stores (PDS) where people can gather their data in one secure place and then manage the processing of their data by third parties. Trotter is very confident that libraries can become the ethical role model for Big Data with this marriage between PDS and library privacy ethics. Overall, Trotter believes that the ethical issues around Big Data would be negated once libraries become front and center in the overall management of Big Data.

While libraries do have a strong ethical basis around advocacy and adoption of privacy practices, libraries also have their fair share of privacy issues and gaps. Libraries are not immune to the same threats and vulnerabilities as other professions and industries, such as data leaks and breaches, ransomware attacks, phishing, and even underfunding or undertraining staff in ways to protect patron privacy. Librarianship also deals with ethical issues around its collection and processing of patron data, particularly for marketing and user profiling, as well as working with vendors who also collect and process patron data without giving the patron control over what is collected and processed. One doesn’t need to search far to find an example, such as the Santa Cruz Public Library’s Civil Grand Jury Report about the numerous ethics breaches surrounding their use of patron data without full patron notice and consent, among other violations of patron privacy.

Yes, other industries can learn from libraries about how to approach privacy in their daily work, including ethics and advocacy, but libraries also have to be honest about the profession’s struggles around data privacy, both on a practical and ethical level. Part of that is being public with these struggles in the public discourse, be it with patrons or with people from other industries who are looking for a model to base their professional privacy ethics and practices on. Another part is re-evaluating how we, as a library profession, market ourselves as privacy experts and safe-keepers of data to our patrons. Again, libraries set themselves apart from other industries regarding privacy ethics and advocacy, but they cannot set themselves apart from the reality that is working with data in the real world that has real needs that fall into ethical gray areas and real data security and privacy risks.

Summer Homework – Requesting Your Data

Welcome to this week’s Tip of the Hat!

Have you ever wondered what data OverDrive collects while you’re reading the latest ebook? Or what Kanopy collects when you’re watching a documentary? As library workers, we have some sense as to what vendors are collecting, but we are also patrons – what exactly are vendors collecting about *us*?

GDPR and CCPA both give different sets of users (EU residents and CA consumers, respectively) the right to access the data collected by organizations and businesses; however, some organizations extended that right to all users, regardless of geographic residency. Below are some of the more well-known library vendors who are offering some form of data request process for their users (aka library patrons, including you!):

  • Cengage
  • Elsevier
  • Kanopy’s data request appears only to apply to CA consumers: “Under California Civil Code Section 1798.83, if you are a California resident and your business relationship with us is primarily for personal, family or household purposes, you may request certain data regarding our disclosure, if any, of personal information to third parties for the third parties’ direct marketing purposes. To make such a request, please send an email to privacy@kanopy.com with “Request for California Privacy Information” in the subject line. You may make such a request up to once per calendar year. If applicable, we will provide to you via email a list of the categories of personal information disclosed to third parties for their direct marketing purposes during the immediately-preceding calendar year, along with the third parties’ names and addresses. Please note that not all personal information sharing is covered by Section 1798.83’s requirements.”
  • LexisNexis
  • OverDrive
  • ProQuest
    • ExLibris, owned by ProQuest, appears to have a different data request process: “You may request to review, correct or delete the personal information that you have previously provided to us through the Ex Libris Sites. For requests to access, correct or delete your personal information, please send your request along with any details you may have regarding the method by which the information was submitted to privacy@exlibrisgroup.com. Requests to access, change, or delete your information will be addressed within a reasonable timeframe.”

What is surprising is that more library vendors do not offer this option or extend it to all users. This might change over time, depending on how the newest data privacy ballot initiative in California goes in November, or if additional regulations are passed in other states or even in the federal government. If more companies provide this right to access for all users, then it’s more likely that this practice will become a standard practice industry-wide. LDH will provide the latest updates around data access options from library vendors when they come along!

New ALA Guidelines and Zoom Update

Welcome to this week’s Tip of the Hat!

In case you missed it – last week ALA announced a trio of new guidelines for libraries concerned with patron privacy during the reopening process as well as libraries who use security cameras at their branches:

Guidelines for Reopening Libraries During the COVID-19 Pandemic – Theresa Chmara, J.D. guides libraries with planning reopening procedures and policies, including requirements around wearing masks, health screenings of both patrons and staff, and contact tracing. While these guidelines are not legal advice, these guidelines should inform your discussions with your local legal advisors.

Guidelines on Contact Tracing, Health Checks, and Library Users’ Privacy – This statement from IFC reaffirms the importance of patron privacy in the reopening process, including giving newly published guidelines around contact tracing at the library. The statement also directs libraries to the Protecting Privacy in a Pandemic Resource Guide, which brings together several privacy resources for libraries to incorporate into their reopening processes, as well as the expansion of existing patron services to online.

Video Surveillance in the Library Guidelines – Libraries who use security cameras should review their existing policies around camera placement, recording storage and retention, and law enforcement requests for recordings considering the new guidelines. There are also sections around patrons filming library staff and other patrons which public libraries should review regarding staff and patron privacy and safety.

Take some time to review the above guidelines and discuss how these guidelines might affect your library’s reopening or use of security cameras in the building!

Zoom Update

Zoom reported that they will not provide end-to-end encryption for free-tier users so Zoom can comply with law enforcement. Now that you know how Zoom will respond to law enforcement requests, does their stance line up with your library’s law enforcement request policy, as well as your patron privacy policy? If not, how will your library adjust your use of Zoom for patron services? One option is to not use Zoom, but as we covered in previous newsletters, Zoom is arguably one of the more user-friendly video conferencing applications on the market. Nonetheless, there are alternatives out there that do a better job protecting privacy, including Jitsi. If you must use Zoom for patron services, check out the Zoom Security Recommendations, Settings List, and Resources document from LDH’s Remote Work presentation in April to help you secure your Zoom calls.

Black Lives Matter

Hello everyone,

Black Lives Matter.

If your library or archive is thinking about collecting photographs, videos, or other materials from the protests around George Floyd’s death caused by Minneapolis police, what are you doing to protect the privacy of the protesters? Black Lives Matter protestors and organizers, as well as many protesters and organizers in other activist circles, face ongoing harassment due to their involvement. Some have died. Recently Vice reported on a website created by white supremacists to dox interracial couples, illustrating how easy it is to identify and publish personal information with the intent to harm people. This isn’t the first website to do so, and it won’t be the last.

Going back to our question – if your response to the protests this weekend is to archive photos, videos, and other materials that contain personally identifiable information about living persons, what are you doing to protect the privacy and security of those people? There was a call made this weekend on social media to archive everything into the Internet Archive, but this call ignores the reality that these materials will be used to harass protesters and organizers. Here is what you should be considering:

  • Scrubbing metadata and blurring faces of protesters – a recently created tool is available to do this work for you: https://twitter.com/everestpipkin/status/1266936398055170048
  • Reading and incorporating the resources at https://library.witness.org/product-tag/protests/ into your processes and workflows
  • Working with organizations and groups such as Documenting The Now
    A tweet that summarizes some of the risks that you bring onto protestors if you collect protest materials: https://twitter.com/documentnow/status/1266765585024552960

You should also consider if archiving is the most appropriate action to take right now. Dr. Rachel Mattson lists what archives and libraries can do to contribute right now – https://twitter.com/captain_maybe/status/1267182535584419842

Archives, like libraries, are not neutral institutions. The materials archivists collect can put people at risk if the archives do not adopt a duty of care in their work in acquiring and curating their collections. This includes protecting the privacy of any living person included in these materials. Again, if your archive’s response is to archive materials that identify living people at these protests, how are you going to ensure that these materials are not used to harm these people?

Black Lives Matter.

Just Published! Library Data Risk Assessment Guide

Welcome to this week’s Tip of the Hat!

To build or to outsource?

Building an application or creating a process in a library takes time and resources. A major benefit of keeping it local, though, is that libraries have the greatest control over the data collected, stored, and processed by that application or system. Conversely, a major drawback of keeping it local is the sheer number of moving parts to keep track of in the building process. Some libraries have the technical know-how to build their own applications or have the resources to keep a process in house. Keeping track of privacy risks is another matter. Risk assessment and management must be addressed in any system or process that touches patron data, so how can libraries with limited privacy risk assessment or management experience make sure that their local systems and processes mitigate patron privacy risks?

Libraries have a new resource to help with privacy risk management! The Digital Library Federation’s Privacy and Ethics in Technology Working Group (formerly known as the Technologies of Surveillance Working Group) published “A Practical Guide to Performing a Library User Data Risk Assessment in Library-Built Systems”. This 28-page guide provides best practices and practical strategies in conducting a data risk assessment, including:

  • Classifications of library user data and privacy risk
  • A table of common risk areas, including probability, severity, and mitigation strategies
  • Practical steps to mitigate data privacy risks in the library, ranging from policy to data minimization
  • A template for readers to conduct their own user data inventory and risk assessment

This guide joins the other valuable resources produced by the DLF Privacy and Ethics in Technology Working Group:

The group also plans to publish a set of guidelines around vendor privacy in the coming months, so be sure to bookmark https://wiki.diglib.org/Privacy_and_Ethics_in_Technology and check back for any updates!

Contact Tracing At The Library

Welcome to this week’s Tip of the Hat!

Contact tracing has been used in the past with other diseases, where it helped curb infection rates in populations, so health and government officials are looking at contact tracing once again as a tool to help control the spread of disease, this time with COVID-19. There have been various reports and concerns about contact tracing through mobile apps, including ones developed by Google and Apple. However, mobile contact tracing will not stop local health and government officials from taking other measures when it comes to other contact tracing methods and requirements, and libraries should be prepared when their local government or health officials require contact tracing as part of the reopening process.

While there are no known cases of libraries doing contact tracing as part of their reopening process, there are some ways in which libraries can satisfy contact tracing requirements while still protecting patron privacy.

Collect only what you absolutely need

What is the absolute minimum you need to contact a patron? Name, email address, and/or telephone number are all options. Sometimes patrons do not have a reliable way to be contacted outside the library – health and government officials should have recommendations in handling those cases.

But what about having patrons scan in with their library card and using that as the contact tracing log? What seems to be a simple technological solution is, in reality, one that introduces complexity in the logging process as well as privacy risks:

  • Some of the people visiting the library will not have their library card or are not registered cardholders.
  • Contact logs can be subject to search or request from officials – maintaining the separation between the contact log and any other patron information in the library system will minimize the amount of patron data handed over to officials when there is a request for information.

Paper or digital log?

Some libraries might be tempted to have patrons scan in with their barcodes (see above section as to why that’s not such a good idea) or keep an electronic log of patrons coming in and out of the building. However, an electronic log introduces several privacy and security risks:

  • Where is the digital file being stored? Local drive on a staff computer that isn’t password protected? Network storage? Google Drive (yikes!)?
  • Who has access to the digital file? All staff in the library?
  • How many other copies of the file are floating around the library’s network, drives, or even printed out?

In this instance, however, a paper log will provide better privacy and security protections when you take the following precautions:

  • The paper log should be securely stored in a locked cabinet or desk in a secured area, preferably a locked office or other controlled entry space.
  • During business hours, the paper log should be filled out by designated staff members tasked to collect information from patrons. Do not leave the paper log out for patrons to sign – not only do you give patrons the names of others in the building (for example, a law enforcement agent can read the log and see who’s in the building without staff knowledge), you also potentially expose patrons and staff to health risks by having them share the same hard surfaces and pen.
  • Restrict access to the paper log to only staff who are designated to keep logs, and prohibit copying (both physical and electronic copies) of the log.

Equitable service and privacy

Some patrons might not have reliable contact information or might refuse to give information when asked. If the local government or health officials state that someone can’t enter a building if they don’t provide information, how can your library work with your officials in addressing the need for libraries to provide equitable service to all patrons who come to the library?

Retention and disposal

Keep the contact tracing logs for only as long as the government or health officials require. If there is no retention period, ask! Your logs should be properly disposed of – a paper log should be shredded and the shredded paper should go to a secured disposal area or service.
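If it helps to make the retention rule concrete, here is a small sketch that works out which dated log sheets are past their retention period and due for shredding. The 21-day period, the function names, and the sample dates are hypothetical; use whatever period your government or health officials actually require.

```python
from datetime import date, timedelta

# Hypothetical retention period in days, not an official requirement.
RETENTION_DAYS = 21


def disposal_date(visit_date: date, retention_days: int = RETENTION_DAYS) -> date:
    """The first date on which a log sheet from visit_date can be disposed of."""
    return visit_date + timedelta(days=retention_days)


def sheets_due(sheet_dates, today: date, retention_days: int = RETENTION_DAYS):
    """Return the dates of log sheets whose retention period has ended."""
    return [d for d in sheet_dates if disposal_date(d, retention_days) <= today]


# Made-up example: two daily log sheets checked on June 25, 2020.
due = sheets_due([date(2020, 6, 1), date(2020, 6, 15)], today=date(2020, 6, 25))
```

Labeling each paper sheet with its disposal date when it is filed achieves the same result without any software at all.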

Keeping a log of visits to the library is something not to be taken lightly – you are creating a log of a patron’s use of the library. Several other privacy concerns might be specific to your library that could affect how you go about contact tracing, such as unaccompanied minors. Contact tracing has been an effective tool in containing disease outbreaks in the past, but it doesn’t have to come at the expense of patron privacy if the library works with its staff and government officials in creating a process that minimizes patron data collection, access, and retention.

Choose Privacy Week Recap

Welcome to this week’s Tip of the Hat!

This weekend was hot in Seattle, with temperatures near 90 F. While the Executive Assistant took this time to bask in this heat, we at LDH tried to find a cool spot in the home office to work, away from the Executive Assistant’s gaze.

Last week was a busy week on the Choose Privacy Every Day site for Choose Privacy Week! Here’s what you might have missed:

  • Virtual Programming and Patron Privacy – Jaime Eastman, along with the ALSC Children and Technology committee, gives much-needed guidance for library workers who are moving children-oriented programs and services online due to the pandemic. The post goes into the Children’s Online Privacy Protection Act (COPPA), and what library workers need to do to protect the privacy of children while keeping in compliance with COPPA. Bookmark the ALSC Virtual Storytime Services Resource Guide for additional guidance (coming soon!).
  • Protecting Privacy In A Pandemic: A Resource Guide – On Friday, May 8th, OIF hosted a Privacy Town Hall about patron privacy. While we wait for the recording of the Town Hall event, the blog post lists the main topics and resources covered by the panelists in the Town Hall.
  • When libraries become medical screeners: User health data and library privacy – Some libraries are now giving medical screenings to patrons who want to enter the library building. What privacy risks are there in collecting health data of your patrons? Read the article by LDH to find out why library workers might not be the best choice in handling health data.

Finally, if you have that one library privacy topic that you’ve been meaning to write about or if you want to share your privacy thoughts to a wide audience, Choose Privacy Every Day is looking for blog authors! There are some requirements for being an author for the blog, but this is a great opportunity to get your ideas and thoughts out into the library world.

That’s a wrap! Or, at least, the computer core temperature says it’s time to put the computer in the freezer. If you’re on the West Coast, stay cool, and for those of you who got snow on the East Coast, stay warm!