Just Published! Library Data Risk Assessment Guide

Welcome to this week’s Tip of the Hat!

To build or to outsource?

Building an application or creating a process in a library takes time and resources. A major benefit of keeping it local, though, is that libraries have the greatest control over the data collected, stored, and processed by that application or system. Conversely, a major drawback of keeping it local is the sheer number of moving parts to keep track of in the building process. Some libraries have the technical know-how to build their own applications or have the resources to keep a process in house. Keeping track of privacy risks is another matter. Risk assessment and management must be addressed in any system or process that touches patron data, so how can libraries with limited privacy risk assessment or management experience make sure that their local systems and processes mitigate patron privacy risks?

Libraries have a new resource to help with privacy risk management! The Digital Library Federation’s Privacy and Ethics in Technology Working Group (formerly known as the Technologies of Surveillance Working Group) published “A Practical Guide to Performing a Library User Data Risk Assessment in Library-Built Systems”. This 28-page guide provides best practices and practical strategies for conducting a data risk assessment, including:

  • Classifications of library user data and privacy risk
  • A table of common risk areas, including probability, severity, and mitigation strategies
  • Practical steps to mitigate data privacy risks in the library, ranging from policy to data minimization
  • A template for readers to conduct their own user data inventory and risk assessment

This guide joins the other valuable resources produced by the DLF Privacy and Ethics in Technology Working Group:

The group also plans to publish a set of guidelines around vendor privacy in the coming months, so be sure to bookmark https://wiki.diglib.org/Privacy_and_Ethics_in_Technology and check back for any updates!

CRMS 101

Welcome to this week’s Tip of the Hat! Today we have a brief overview of a tool that is becoming popular in libraries – the customer relationship management system [CRMS] – and how this new player in the library field affects patron privacy. While some folks know about CRMS, there might be others who are not exactly sure what these systems are and what they have to do with libraries. Below is a “101”-type guide to help folks get up to speed on the ongoing conversation.


What is a CRMS?

A customer relationship management system [CRMS] manages an organization’s interactions with customers, with the goal of growing and maintaining customer relationships with the organization. CRMS products have been used for decades outside of librarianship, mostly in commercial businesses, but the increased importance of data analysis and of improving customer experiences has led to wider adoption of CRMS products in other fields, including libraries.

What is a CRMS used for?

Many organizations use CRMS products to track various communications with customers (email, social media, phone, etc.) as well as data about a customer’s interests, demographics, and other data that can be used for data analysis. This analysis is then used to improve and customize the user experience (targeted marketing, personal recommendations, invitations, etc.) and to make business decisions surrounding products, services, and organization-customer relations. This analysis can also be used to create user profiles or for market segmentation research.

What are some examples of CRMS?

There are many proprietary and open source options, though Salesforce is one of the most recognized CRM companies in the overall field. In the library world, several library vendors sell standalone CRMS products, such as OrangeBoy’s Savannah. Other library vendors have started offering products that integrate the CRMS into the Integrated Library System [ILS]. OCLC’s WISE is one such example of this integration, while other library vendors plan to release their versions in the near future.

What data is collected in a CRMS?

A CRMS is capable of collecting a large quantity of very detailed data about a customer. Types of patron data that can be collected with a library CRMS include (but are not limited to):

  • Demographic information
  • Circulation information like total checkouts, types of materials checked out, and physical location of checkouts
  • Public computer reservation information
  • Electronic resource usage
  • Program attendance

In addition to library-supplied data, other data sets from external sources can be imported into the CRMS, ranging from US Census data to open data sets from cities and other organizations, which could include other demographic information by geographical area (such as zip code) or by other indicators.

How is patron privacy impacted by CRMS?

The amount of information that can be collected by a CRMS is akin to the type of information collected by commercial companies that sell services and products. By creating a user profile, the company can use that information to personalize that customer’s experience and interactions with the company, with the ultimate goal of creating and maintaining return customers. Traditionally libraries do collect and store some of the same information that CRMS products collect; however, it is usually not stored in one central database. Creating a profile of a patron’s use of the library leaves both the library and the patron at high risk for harm on both a personal and organizational level. This user profile is subject to unauthorized access by library staff, data breaches and leaks, or intentional misuse by staff or by the vendor that is hosting the system. This user profile can also be subject to a judicial subpoena, which puts patrons who are part of vulnerable populations at higher risk for personal harm if the information is collected and stored in the CRMS.

Further reading on the conflict between the CRMS, data collection, and library privacy:

What can we do to mitigate privacy risks if we use a CRMS?

If your library chooses to use a CRMS:

  • Limit the type and amount of patron data collected by the system. For data that is collected and stored in the CRMS, consider de-identification methods, such as aggregation, obfuscation, and truncation
  • Perform risk assessments to gauge the level of potential harm connected with collecting and storing certain types of patron information as well as matching up patron information with imported data sets from external sources
  • Negotiate at the contract signing or renewal stage with the vendor regarding privacy and security policies and standards around the collection, storage, access, and deletion/retention of patron data, as well as who is responsible for what in case there is a data breach
  • Perform regular privacy and security audits for both the library and the vendor
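
The three de-identification methods named in the first bullet above (truncation, obfuscation, and aggregation) can be sketched in a few lines of Python. This is a minimal illustration and not a vetted de-identification tool: the record fields, the `SECRET_KEY` value, and the function names are all hypothetical, and a keyed hash only pseudonymizes an identifier, so it should be paired with data minimization rather than replace it.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical patron records; the field names are illustrative only.
records = [
    {"patron_id": "P1001", "zip": "98101-1234", "checkout_type": "ebook"},
    {"patron_id": "P1002", "zip": "98109", "checkout_type": "ebook"},
    {"patron_id": "P1003", "zip": "98052", "checkout_type": "dvd"},
]

# Assumption: a secret held by the library and never stored next to the data.
SECRET_KEY = b"replace-with-a-locally-held-secret"


def truncate_zip(zip_code: str) -> str:
    """Truncation: keep only the first three characters of a ZIP code."""
    return zip_code[:3]


def obfuscate_id(patron_id: str) -> str:
    """Obfuscation: swap the patron ID for a shortened keyed hash (HMAC-SHA256)."""
    digest = hmac.new(SECRET_KEY, patron_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def aggregate_checkouts(rows) -> Counter:
    """Aggregation: count checkouts per (ZIP prefix, material type) pair."""
    return Counter((truncate_zip(r["zip"]), r["checkout_type"]) for r in rows)


if __name__ == "__main__":
    for row in records:
        print(obfuscate_id(row["patron_id"]), truncate_zip(row["zip"]))
    print(aggregate_checkouts(records))
```

Small aggregate counts can still point to an individual, so a risk assessment may call for suppressing or combining groups below a threshold that the library chooses.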

We hope that you find this guide useful! Please feel free to forward or pass along the guide in your organizations if you are having conversations about CRMS adoption or implementation. LDH can also help you through the decision, negotiation, or implementation processes – contact us to learn more!

[REDACTED] – Redacting PII From Digital Collections

Welcome to this week’s Tip of the Hat! The Executive Assistant is back, and you know what that means…

A sitting black cat looking up at the camera, meowing loudly.
We’re back in business, newsletter-wise!

This week’s topic comes from a recent post to the Code4Lib mailing list. A library is planning to scan a batch of archival documents to PDF format, and is looking for ways to automate the process of identifying personally identifiable information [PII] in the documents and redacting said PII. The person mentioned that the documents might contain Social Security Numbers or credit card numbers.

Many libraries and archives have resources – digital and physical – that contain some form of PII in the source. While physical resources can be restricted to specific physical locations (unless someone copies the source via copier, pencil and paper, camera, etc.), digital resources that are available through a digital repository can increase the risk of privacy harm if that digital resource contains unredacted PII.

When libraries and archives are incorporating personal collections, research data sets, or other resources that may contain PII, here are some considerations to keep in mind to help through the process of mitigating the risk of data breaches and other privacy harms:

Who is included or mentioned in the resource – Some archival collections contain PII surrounding the individual who donated their materials. When dealing with institutional/educational records or research data sets, however, you might be dealing with different types of PII regulations and policies depending on who is included in the resource and what type of PII is present.

What PII is in the resource – When most folks think about PII, they think about information about a person: name, Social Security number, financial information, addresses, and so on. What tends to be overlooked is PII that is information about an activity surrounding a person that could identify that person. Think library checkout histories, web search histories, and purchase history. You will need to decide what types of PII need redacting, but keep both facets of PII in mind when deciding.

What is the redaction workflow – This gets into the question from the mailing list. The workflow of redacting PII depends on several factors, including what PII needs to be redacted, the number of resources needing to be redacted, and what format the resource is in. Integrating redaction into a digitization or intake workflow reduces the time spent retroactively redacting PII by staff. Here I’d like to offer a word of caution – while automating workflows for efficiency can be positive, sub-optimizing a part of a workflow can lead to a less efficient overall workflow as well as have negative effects on work quality or resources.

What tools and resources are available – While looking at the overall workflow for redacting PII, keep in mind that the resources and knowledge available to your organization to build and maintain a redaction workflow will greatly shape said workflow, or even the ability to redact PII in a systematic manner. There are many commercial tools that automate data classification and redaction workflows, and there are options to “roll your own” identification and redaction tool using various programming languages and regular expressions. If you work at a library or archive that is part of a bigger institution, there might be tools or resources already available through central IT or through departments that oversee compliance or information security and privacy. Don’t be afraid to reach out to these folks!
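
As a rough idea of what a “roll your own” approach with regular expressions might look like, here is a minimal Python sketch that flags dash-formatted Social Security Numbers and likely credit card numbers in text that has already been extracted from a document (for scanned PDFs, that means after OCR). The patterns, placeholder strings, and function names are assumptions made for illustration. The Luhn checksum is used only to cut down on false positives from other long digit strings. Pattern matching will miss unformatted or badly scanned numbers, so treat this as a first pass that still needs human review.

```python
import re

# Assumption: SSNs appear in the common 3-2-4 dash-separated format.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:  # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def redact(text: str) -> str:
    """Replace SSN-like and card-like number strings with placeholders."""
    text = SSN_PATTERN.sub("[REDACTED SSN]", text)

    def replace_card(match: re.Match) -> str:
        digits = re.sub(r"\D", "", match.group())
        # Leave the text alone when the checksum fails.
        return "[REDACTED CARD]" if luhn_valid(digits) else match.group()

    return CARD_PATTERN.sub(replace_card, text)


if __name__ == "__main__":
    sample = "SSN 123-45-6789 on file; card 4111 1111 1111 1111 was charged."
    print(redact(sample))
```

Note that this only changes extracted text. Redacting the PDF itself means removing the underlying text and image data, not just drawing a box over it, which is usually a job for a dedicated PDF tool.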

If you’re wondering where to begin or how other organizations approach redaction, here are some resources to start with: