[REDACTED] – Redacting PII From Digital Collections

Welcome to this week’s Tip of the Hat! The Executive Assistant is back, and you know what that means…

A sitting black cat looking up at the camera, meowing loudly.
We’re back in business, newsletter-wise!

This week’s topic comes from a recent post to the Code4Lib mailing list. A library is planning to scan a batch of archival documents to PDF format and is looking for ways to automate the process of identifying personally identifiable information (PII) in the documents and redacting it. The person mentioned that the documents might contain Social Security numbers or credit card numbers.

Many libraries and archives have resources – digital and physical – that contain some form of PII in the source. While physical resources can be restricted to specific physical locations (unless someone copies the source via copier, pencil and paper, camera, etc.), digital resources that are available through a digital repository can increase the risk of privacy harm if that digital resource contains unredacted PII.

When libraries and archives are incorporating personal collections, research data sets, or other resources that may contain PII, here are some considerations to keep in mind to help through the process of mitigating the risk of data breaches and other privacy harms:

Who is included or mentioned in the resource – Some archival collections contain PII surrounding the individual who donated their materials. When dealing with institutional/educational records or research data sets, however, you might be dealing with different types of PII regulations and policies depending on who is included in the resource and what type of PII is present.

What PII is in the resource – When most folks think about PII, they think about information about a person: name, Social Security number, financial information, addresses, and so on. What tends to be overlooked is PII that is information about an activity surrounding a person that could identify that person. Think library checkout histories, web search histories, and purchase histories. You will need to decide what types of PII need redacting, but keep both facets of PII in mind when deciding.

What is the redaction workflow – This gets into the question from the mailing list. The workflow of redacting PII depends on several factors, including what PII needs to be redacted, the number of resources needing to be redacted, and what format the resource is in. Integrating redaction into a digitization or intake workflow reduces the time spent retroactively redacting PII by staff. Here I’d like to offer a word of caution – while automating workflows for efficiency can be positive, sub-optimizing a part of a workflow can lead to a less efficient overall workflow as well as have negative effects on work quality or resources.

What tools and resources are available – While looking at the overall workflow for redacting PII, keep in mind that the resources and knowledge available to your organization to build and maintain a redaction workflow will greatly shape that workflow, or even the ability to redact PII in a systematic manner. There are many commercial tools that automate data classification and redaction workflows, and there are options to “roll your own” identification and redaction tool using various programming languages and regular expressions. If you work at a library or archive that is part of a bigger institution, there might be tools or resources already available through central IT or through departments that oversee compliance or information security and privacy. Don’t be afraid to reach out to these folks!
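To make the “roll your own” option a little more concrete, here is a minimal sketch in Python. It assumes the text has already been pulled out of the scanned PDFs (for example, through OCR), and the two patterns and the `redact_pii` and `luhn_valid` names are hypothetical choices for illustration, not a complete PII detector. Card-like numbers are only redacted if they pass the Luhn checksum, which helps cut down on false positives such as accession numbers.

```python
import re

# Hypothetical patterns for illustration; real collections need broader rules.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:
            # Double every second digit, counting from the right.
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def redact_pii(text: str) -> str:
    """Replace SSN-like and card-like numbers with placeholder labels."""
    text = SSN_PATTERN.sub("[REDACTED SSN]", text)

    def replace_card(match: re.Match) -> str:
        digits = re.sub(r"[ -]", "", match.group())
        # Leave digit runs that fail the checksum untouched.
        return "[REDACTED CARD]" if luhn_valid(digits) else match.group()

    return CARD_PATTERN.sub(replace_card, text)
```

A sketch like this only catches the formats it knows about, so it works best as a first pass that flags documents for staff review instead of the final word on what gets redacted.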

If you’re wondering where to begin or how other organizations approach redaction, here are some resources to start with:

Baking Privacy Into Your Library: The What, How, and Why of Privacy by Design

Welcome to this week’s Tip of the Hat!

This week’s topic comes to you thanks to the endless hours spent last week cleaning up inactive online accounts as part of our #dataspringcleaning efforts at LDH. It’s frustrating as a user not to have the ability to choose how applications and third parties collect and process your data. Not having the ability to delete your account is one example – some systems were not built to delete data. You all have run across other examples, such as not being able to opt out of having certain personal data collected, processed, or shared with other third parties. To an extent, library workers and vendors can determine how much control patrons have over how applications and services collect and process personal data. However, these controls are often put into place after the fact, leaving patrons in the lurch with limited privacy options. How can library workers and vendors avoid this lurch?

Enter Privacy by Design (PbD). First created by Ann Cavoukian, then refined by various international organizations, PbD advocates making privacy a priority throughout the lifecycle of a service or application, including the planning and implementation stages. PbD has made a major impact in the privacy and systems development worlds, as well as the legal realm – GDPR is the latest regulation where PbD has made an appearance.

There are seven foundational principles of PbD:

  1. Proactive not reactive; preventative not remedial
  2. Privacy as the default setting
  3. Privacy embedded into design
  4. Full functionality – positive-sum, not zero-sum
  5. End-to-end security – full lifecycle protection
  6. Visibility and transparency – keep it open
  7. Respect for user privacy – keep it user-centric

What would PbD look like for library workers and vendors? An example is turning off by default any features that might share user activity with others. Users who want to share their activity would have the option to turn on the feature, giving the application their consent in doing so. Another example comes from our tale of woe at the beginning of the newsletter – building a system so that a user can delete their account or personal data without consequence to the system’s integrity. It is much easier to create a system that can handle such deletions than to try to retroactively get a legacy system to learn a new trick!

Both examples highlight the user’s ability to control what data is collected, stored, and shared. Notice that privacy by default does not mean not collecting or processing data at all, but instead takes the position of letting users decide what level of privacy they are most comfortable with. On another level, PbD’s integrated approach to privacy in the development lifecycle guides all those involved in the development and planning processes in assessing how systems can protect user privacy and meet business needs at the same time. Discussing data collection and processing, privacy features, and how to address potential user concerns early in the development process can save both time and headaches when the system launches to users.
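Here is a small sketch of what “privacy as the default setting” could look like in code. The `SharingPreferences` class and its three feature names are hypothetical and only there for illustration; the point is that every sharing feature starts switched off and only turns on when the patron explicitly opts in.

```python
from dataclasses import dataclass, fields


@dataclass
class SharingPreferences:
    """Hypothetical per-patron settings; every sharing feature starts off."""

    share_reading_history: bool = False
    share_reviews_publicly: bool = False
    allow_third_party_analytics: bool = False

    def opt_in(self, feature: str) -> None:
        """Turn a feature on only when the patron explicitly asks for it."""
        valid_features = {field.name for field in fields(self)}
        if feature not in valid_features:
            raise ValueError(f"Unknown feature: {feature}")
        setattr(self, feature, True)


# A new patron record shares nothing until the patron chooses otherwise.
preferences = SharingPreferences()
preferences.opt_in("share_reading_history")
```

Putting the defaults in one place also makes them easy to review when a new feature is added, which is one way to keep the privacy conversation early in the development process.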

Below are a few resources to get you started with PbD:

[H/T to Chad Nelson for the inspiration for this week’s newsletter title!]

Into the Breach!

Welcome to this week’s Tip of the Hat!

Last week brought word of two data leaks from two major library vendors, Elsevier and Kanopy. Elsevier’s leak involved a server storing user credentials, including passwords, that was not properly secured. Kanopy’s leak involved an unsecured database storing website logs, including user activity. Both leaks involved library patron information, and both leaks were caused by a lapse in security measures on the part of the vendor.

As the fallout from these two breaches continues in the library world, now is as good a time as any to talk about data breaches in general. Data breaches are inevitable, even if you follow security and privacy best practices. What matters is what you do when you learn of a possible data breach at your library.

On a high level, your response to a possible data breach should look something like this:

  1. Determine if there was an actual breach – depending on the nature of the breach, this could be fairly easy (like a lost laptop with patron information) or require more investigation (like looking at access logs to see if inactive accounts have sudden bursts of activity).
  2. Contain and analyze the breach – some breaches can be contained by recovering lost equipment, while others can be contained by shutting off access to the data source itself. Once the breach is contained, you can then investigate the “who, what, when, where, and how” of the breach. This information will be useful in the next steps…
  3. Notify affected parties – this does not only include individual users but organizational and government agencies as well.
  4. Follow up with actions to mitigate future data breaches – this one is self-explanatory with regard to applying what you learned from the breach.
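
The access log check mentioned in the first step can be sketched in a few lines of Python. This is a rough illustration under some assumptions: the log has already been parsed into `(account_id, timestamp)` pairs, you have a record of when each account was last active before the log window, and the `dormant_days` and `burst_threshold` values are placeholders you would tune for your own systems.

```python
from collections import Counter
from datetime import datetime, timedelta


def flag_dormant_bursts(events, last_seen, now, dormant_days=365, burst_threshold=20):
    """Return account IDs that were dormant but show a burst of recent activity.

    events: iterable of (account_id, timestamp) pairs from the recent access log.
    last_seen: dict mapping account_id to its last activity before that log window.
    """
    cutoff = now - timedelta(days=dormant_days)
    # Count how many recent log entries each account has.
    recent_counts = Counter(account_id for account_id, _ in events)
    return sorted(
        account_id
        for account_id, count in recent_counts.items()
        if count >= burst_threshold
        and account_id in last_seen
        and last_seen[account_id] < cutoff
    )
```

Accounts flagged this way are not proof of a breach, only a prompt for the closer investigation described in the second step.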

The US does not have a comprehensive federal data breach notification law. What the US does have is 50+ data breach notification laws that vary from state to state. These laws have different regulations pertaining to who needs to be notified at a certain time, and what information should be included in the notification. If you are also a part of a larger organization, that organization might have a data breach incident response procedure. All of the above should be taken into consideration when building your own incident response procedure.

However, that does not address what many of you might be thinking in light of last week’s data breaches – how do you prevent having your patrons’ information breached in a vendor’s system? It’s frustrating when your library’s patron information is left unsecured with a vendor, be it through unencrypted passwords or open databases containing patron data. There are a couple of steps you can take to mitigate risk with the vendor:

  • Vendor security audits – One practice is to audit the vendor’s data security policies and procedures. There are some library related examples that you can pull from: San Jose Public Library performed a vendor security audit in 2018, while Alex Caro and Chris Markman created an assessment framework in their article for the Code4Lib Journal.
  • Contract negotiations – Writing privacy and security clauses into a vendor contract introduces a layer of legal protection not only for your patrons but also for your organization as a whole, with regard to possible liability that comes with a data breach. Additions can clarify expectations about levels of security surrounding patron data in vendor systems as well as data breach management expectations and roles between the vendor and the library.

Ultimately, it’s up to the vendor if they want to follow security best practices and have a data breach incident management procedure (though, if a vendor chooses not to implement security protocols, that could adversely affect their business). Nonetheless, it never hurts to regularly bring up security and privacy in contract negotiations and renewals, procurement processes, and in regularly scheduled vendor rep meetings. Make it clear that your library considers security and privacy as priorities in serving patrons, and (hopefully) that will lead to a partnership that is beneficial to all involved and leaves patrons at a lower risk of having their data breached.

Phew! There’s a lot more on this topic that can be said, but we must leave it here for now. Below are a couple of resources that will help you in creating a data breach incident response plan:

#dataspringcleaning

Welcome to this week’s Tip of The Hat!

This week’s newsletter is inspired by last week’s #ChatOpenS Twitter chat about patron privacy, where the topic of #dataspringcleaning made its appearance.

I’m starting the hashtag #dataspringcleaning — I need to do this in my personal life, too! https://t.co/ueVfafKDQ0
— Equinox OLI (@EquinoxOLI) March 13, 2019

Springtime is around the corner, which means Spring Cleaning Time. While you are cleaning your physical spaces, take some time to declutter your data inventory. By getting rid of personally identifiable data that you no longer need, you are scrubbing some of the toxicity out of your data inventory, and lessening the privacy risks to patrons.

When you are done with data, what do you do with it? First, you need to check in to see if you are truly done with that data. Unfortunately, we cannot use Marie Kondo’s approach by asking if the data sparks joy, but here are some questions to ask instead:

  • Is the dataset no longer needed for operational purposes?
  • Are you done creating an aggregated dataset from the raw data?
  • Is the dataset past the record retention period set by policy or regulation? Don’t forget about backup copies as well!
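
The retention question is the easiest of the three to check automatically. Below is a minimal sketch that assumes you keep a simple inventory of datasets with a creation date and a retention period in days; the `datasets_to_review` function and the tuple layout are hypothetical, and backup copies are listed as their own inventory entries so they are not forgotten.

```python
from datetime import date, timedelta


def is_past_retention(created_on: date, retention_days: int, today: date) -> bool:
    """Return True once a dataset is older than its retention period."""
    return today > created_on + timedelta(days=retention_days)


def datasets_to_review(inventory, today):
    """List names of datasets (including backups) that are past retention.

    inventory: iterable of (name, created_on, retention_days) tuples.
    """
    return [
        name
        for name, created_on, retention_days in inventory
        if is_past_retention(created_on, retention_days, today)
    ]
```

The output is a review list, not a delete list: the first two questions still need a human answer before anything is removed.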

Once you have determined that you no longer need the data, it’s time to clean up! For data on paper – surveys, signup or sign in sheets, reservation sheets – shred the paper and dispose of it through a company that securely disposes of shredded documents. Resist the temptation of throwing the shredding into the regular recycling bin – if your shredder shreds only in long strips, or otherwise doesn’t turn your documents into tiny bits of confetti, dumpster divers can piece together the shredded document.

Electronic data requires a bit more scrubbing. When you delete electronic data, the data is still there on the drive; you’ve just deleted the pointer to that file. Using software that can wipe the file or the entire drive will reduce the risk of someone finding the deleted file. There are free and paid software options to complete the task, depending on your system and your needs (hard drive, USB sticks, etc.).
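
To show the difference between deleting a pointer and wiping the data, here is a rough Python sketch that overwrites a file’s contents with random bytes before removing it. The `overwrite_and_delete` name and the single-pass default are assumptions for illustration. Keep in mind that on SSDs, journaling or copy-on-write file systems, and cloud storage, overwriting in place may not destroy every copy of the data, so a dedicated wiping tool is still the safer route.

```python
import os


def overwrite_and_delete(path: str, passes: int = 1) -> None:
    """Overwrite a file's bytes in place with random data, then remove it."""
    size = os.path.getsize(path)
    with open(path, "r+b") as handle:
        for _ in range(passes):
            handle.seek(0)
            remaining = size
            while remaining > 0:
                # Write in chunks of at most 1 MiB to keep memory use small.
                chunk = min(remaining, 1024 * 1024)
                handle.write(os.urandom(chunk))
                remaining -= chunk
            # Push the overwritten bytes out of buffers and onto the drive.
            handle.flush()
            os.fsync(handle.fileno())
    os.remove(path)
```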

And now we get to the fun part of deleting data. Any disk drives, CDs, floppy disks, or (where I give my age away) backup tapes that held patron data need to be disposed of properly as well. If you are lucky, you are close to a disk disposal center where you can destroy your drives via degaussing machines. If you can’t find a center, then you have to literally take matters into your own hands. Remember that scene from Office Space with the printer?

A man beating a printer with a baseball bat.
That is what you are going to do, but with safety gear. Hammers, power drills, anything that will destroy the platters in the drive or the disk itself – just practice safety while doing so!

And who says that cleaning can’t be fun?

Resources to get you started:

California [Privacy] Dreamin’

A young white boy standing outside of a car saying Californiaaaa.
California is a trendsetter when it comes to state regulation. California’s 2003 data breach notification regulation served as the inspiration for many other states in later data breach regulations. It should be no surprise to learn that California is again setting a trend in data privacy and security regulation.

The California Consumer Privacy Act (CCPA) passed in 2018 after a short six months in the state legislature. The Act is modeled on the European Union’s GDPR. Depending on who you talk to, GDPR’s enforcement date of May 2018 was one of the reasons why the Act was rushed through the state legislature. Some of the similarities between GDPR and CCPA include users’ rights to request, access, receive, and delete any personal data that the business has collected.

CCPA differs in several key ways from GDPR, nonetheless. One difference is CCPA’s scope. To fall under CCPA, your business (this most likely includes libraries and library vendors!) must meet at least one of the following criteria:

  • Have $25 million or more in annual revenue,
  • Possess the personal information of more than 50,000 Californian consumers, households, or devices, or
  • Earn more than half of its annual revenue selling Californian consumers’ personal information

Not having a physical business presence in California is not a guaranteed exemption from CCPA compliance. You have to prove that you are not doing business in the state, which can be tricky at best. Most libraries that fall under the scope of CCPA will likely do so due to the second criterion of processing personal information.
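
As a quick illustration, the three criteria can be written as a single “at least one of” check. This sketch uses the thresholds exactly as they are worded in this newsletter, and the function and parameter names are hypothetical; the statutory text and any later amendments may draw the lines differently, and the check does not answer whether you are doing business in California in the first place.

```python
def meets_ccpa_threshold(
    annual_revenue_usd: float,
    california_records: int,
    revenue_share_from_selling_data: float,
) -> bool:
    """Return True if at least one of the three criteria listed above is met."""
    return (
        annual_revenue_usd >= 25_000_000
        or california_records > 50_000
        or revenue_share_from_selling_data > 0.5
    )


# A hypothetical library with a modest budget but 60,000 patron records.
in_scope = meets_ccpa_threshold(
    annual_revenue_usd=2_000_000,
    california_records=60_000,
    revenue_share_from_selling_data=0.0,
)
```

In this example the library is pulled into scope by the record count alone, which matches the point above about the second criterion.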

Even though the CCPA passed in 2018, the enforcement date is not until January 1st, 2020. State legislators can change the Act up to the enforcement date, which makes planning for CCPA compliance difficult. There have been major amendment proposals to CCPA in the past few months: some to address problematic lines in the Act, while others add extra protections. The latest amendment is the “Privacy for All” Act, which would further extend the rights of consumers, including more explicit notification and consent for data collection and use, as well as prohibiting discrimination against customers who choose to limit the data collected and shared by the business.

There remain many other loopholes. One loophole that will affect libraries and vendors is who can make a data request. Currently, the definition of “personal information” is very broad in the CCPA – not only does it include data about a person, but also data about the household associated with that person. For libraries, this could have ramifications when patrons request information about a member of their household, including adult children or ex-partners, and for libraries that grant teens over the age of 13 the same confidentiality privileges as adults. Confidentiality and privacy policies and procedures will need to be reviewed in light of this broad definition, and organization-wide discussions about the unintended consequences for patron privacy will be needed as well.

With other states adopting CCPA-type laws, even libraries and vendors who do not fall under CCPA’s scope will have to reckon with CCPA-type requirements. That is, unless the US Federal Government passes a privacy law that overrules individual state laws. As always, stay tuned!

Resources for further reading:

There’s a Checklist For That!

Welcome to this week’s Tip of the Hat!

Last week was a busy week on both state and federal privacy regulation fronts, and it was a busy week for one-half of LDH too due to jury duty! The Executive Assistant was tasked to keep an eye on the state and federal updates; however, when asked for the report, the Executive Assistant was not forthcoming:

A black cat curled up on a yellow and green blanket.
While we catch up from a very busy week of updates, let’s talk about checklists.

Many of us use checklists each day, either as a to-do list or to confirm that everything is in place before opening a library or launching a new online service. Checklists can help prioritize and direct focus on otherwise large, nebulous, all-encompassing things, making sure that the important bits are not overlooked.

When we talk about privacy, many folks become overwhelmed as to what they should be doing at work to protect patron privacy. Libraries, in particular, have many bases to cover when it comes to implementing privacy best practices, ranging from electronic resources and public computing to websites and applications. Where does one start?

In 2016, the ALA Intellectual Freedom Committee published the ALA Library Privacy Guidelines, aimed to help libraries and vendors in developing and implementing best practices surrounding digital privacy and security:

  • E-book Lending and Digital Content Vendors
  • Data Exchange Between Networked Devices and Services
  • Public Access Computers and Networks
  • Library Websites, OPACs, and Discovery Services
  • Library Management Systems
  • Students in K-12 Schools

There is a lot of good information in these guides; however, we run into the same overwhelming feeling when reading all the guides, not knowing where to start. Enter the checklists!

To give folks direction in working through the Library Privacy Guidelines, volunteers from the LITA Patron Privacy Interest Group and the Intellectual Freedom Committee’s Privacy Subcommittee created Library Privacy Checklists for each corresponding Guideline. Each checklist is broken down into three sections:

Priority 1 lists best practices that the majority of libraries and vendors should take with minimal additional resources. These practices are a baseline, the minimal amount that one needs to do to protect patron privacy.

Priority 2 are practices that will require a bit more planning and effort than those in the previous section. These practices can be done with some additional resources, be it in-house knowledge/skills or external vendors or contractors. Depending on the checklist, many libraries and vendors can implement at least one practice in this section, but some might not be able to go beyond this section.

Priority 3 are practices that require a higher level of technical skill and resources to implement. For those libraries and vendors that have the available resources, this section gives guidance as to where to focus those resources.

These checklists break the ALA Library Privacy Guidelines down into prioritized, actionable tasks for libraries and organizations to use when trying to align themselves with the Guidelines. The prioritization helps those organizations with limited resources to focus on core best privacy practices as well as giving more resourced organizations guidance as to where to go next in their privacy efforts. These checklists can also be used as a foundation for conversations about overall privacy practices at an organizational level, which could turn into a comprehensive privacy program review. There are many ways one can use these checklists at their organization!

The checklists were published in 2017; nevertheless, even though the technological landscape rapidly changes year to year, many of the practices in the checklists are still good practices to follow in 2019. Take some time today to visit or revisit the checklists, and think about how those checklists can help you address some of your organization’s privacy questions or issues.

“It’s complicated”: GDPR Compliance and US Libraries

Hello and welcome to the inaugural issue of Tip of The Hat! Today’s topic is the complicated relationship between GDPR compliance and US Libraries.

We mean it when we say it’s complicated.

Many academic and public libraries scrambled in 2018 to determine if they would need to comply with the European Union’s newly launched General Data Protection Regulation (GDPR). Some libraries, particularly academic and special libraries, are following the lead of their parent organization in deciding if they need to comply. In the case of academic libraries, some higher education institutions have satellite campuses in the European Union, making compliance almost a certainty. Public libraries find themselves wondering if they need to comply even though they do not have a physical presence in the EU. Instead, public libraries might have EU citizens with library cards (if they are visiting workers or students, for example) or otherwise have EU citizens using library resources that collect user information.

In her article for The Privacy Advisor, Katya Kulesova, CIPP/US, lays out five questions for US organizations wondering if they fall under the scope of GDPR:

  1. Do you personalize your goods or services for EU customers?
  2. Do you target EU users with advertising campaigns?
  3. Is there an establishment in the EU that is processing personal data on your entity’s behalf?
  4. Do you monitor European users?
  5. Do you have a large customer base in the EU?

Katya explores each question, noting key gray areas that can pop up in each question. For example, does using web analytic software, such as Google Analytics, on the library website count as monitoring EU users? If you are using that data to create user profiles that would then be used to influence user behavior, you might fall under the scope of GDPR.

The best way to determine if your library needs to comply with GDPR is to talk with your legal staff. Nonetheless, GDPR case law is sparse, and it could take a couple of years to build a solid foundation of case law surrounding GDPR enforcement. In the meantime, these questions can help you and your legal staff start the conversation about GDPR compliance.

Even if your legal staff advises that your library does not fall under the scope of GDPR, you may still want to implement some of the privacy requirements laid out in the regulation. Many state laws, including the California Consumer Privacy Act, share many similarities with GDPR. With talk of a federal privacy law in recent months, it’s only a matter of time until US libraries will need to look into revising data privacy policies and procedures to comply with state and/or federal law. Take advantage of the advance notice GDPR is giving you and start work now on your procedures and policies – you’ll be in good standing when your library is covered under an upcoming state or federal privacy law!

A few more resources surrounding GDPR and US libraries: