The Threat Within

A headshot of Chadwick Jason Seagraves with text overlay: 'Anonymous Comrades Collective - Doxer Gets Doxed: "Proud Boy" Chadwick Jason Seagraves of NCSU'

People sometimes ask what keeps privacy professionals up at night. What is the one “worst-case scenario” that we dread? Personally, one of the scenarios hanging over my head is insider threat – when a library employee, vendor, or another person who has access to patron data uses that data to harm patrons. A staff member collecting patron addresses, birthdates, and names to steal patrons’ identities is one example of insider threat. Another is a staff member accessing a coworker’s patron record to obtain personal information used to harass or stalk that coworker.

Last week, an IT employee at NCSU was doxed as a local leader of a white supremacist group. This person, who previously worked in IT for the university libraries, doxed individuals, including students at his own university, to harass them and, in some cases, incite violence against them. As an IT employee, he most likely had unchecked access to the personal information of students, staff, and faculty. It wouldn’t be a stretch to say that he still had access to patron information, given his connections to the library and his IT staff position.

Libraries spend a lot of time and attention worrying about external threats to patron privacy: vendors, law enforcement, even other patrons. We forget that sometimes the greatest threat to patron privacy works at the library. Library workers who have access to patron data – staff, administrators, board members, volunteers – can exploit that data for financial gain through identity theft, or they can search for specific library activity, checkouts of certain materials, or names and other demographic information with the intent to harass or assault patrons. The reality is that there might be few barriers, if any, to stop library workers from doing so.

The good news is that there are ways to mitigate insider threat in the library, but the library must be proactive in implementing these strategies for them to be the most effective:

Practice data minimization – only collect, use, and retain data that is necessary for business operations. If you don’t collect it, it can’t be used to harm patrons.

Implement the Principle of Least Privilege – who has access to what data and where? Use roles and other access management tools to provide staff (and applications!) access to only the data that is absolutely needed to perform their intended duty or function.

Regularly review internal access to patron data – set up a scheduled review of who has access to which patron data. When an employee or other library worker or affiliate changes roles in the organization or leaves the library, have policies and procedures in place to revoke or change their access to patron data at the time of the role change or departure.

Confidentiality Agreements For Library Staff, Volunteers, and Affiliates – your privacy and confidentiality policy should make it clear to staff that patrons have the right to privacy and confidentiality while using library resources and services. Some libraries go further in ensuring patron privacy by using confidentiality agreements. These agreements state when patron data can be accessed and the acceptable uses for patron data, and violation of the agreement can lead to immediate termination of employment. Confidentiality agreements from other libraries can serve as examples to start your drafting process.

Regularly train and discuss privacy – ensure that everyone who is involved with the library – staff, volunteers, board members, anyone who might potentially access patron data as part of their role with the library – is up to date on current patron privacy and confidentiality policies and procedures. This is also an opportunity to include training scenarios that involve insider threat to generate discussion and awareness of this threat to patron privacy.

A note about IT staff, be it internal library IT staff or an external IT department (campus IT, city government IT, or another form of organizational IT) – do not automatically assume that IT staff are following privacy and security standards and policies just because they are IT. Now is the time to talk with your IT contacts about what their current access is and what the minimum is that they need for daily operations. Even if the IT department practices good security and privacy hygiene (such as following the Principle of Least Privilege), any IT staff member who works with the library in any capacity must, at the very minimum, also sign a confidentiality agreement and be included in training sessions.

A data inventory is a good place to start if you are not sure who has access to what data in the library. The PLP Data Privacy Best Practices for Libraries project has several templates and resources to help with creating a data inventory, assessing privacy risks, and taking practical actions to reduce the risk of an insider threat.
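
If it helps to see where a data inventory might begin, below is a minimal sketch in Python. The systems, fields, and review flags are illustrative assumptions on my part, not the PLP project’s templates; the idea is simply to record what is collected, why, and who can see it, and then flag entries that deserve a closer look.

```python
# A minimal data inventory sketch. Field names and example entries are
# illustrative assumptions, not the PLP templates themselves.

inventory = [
    {
        "system": "ILS patron database",
        "data_elements": ["name", "address", "birthdate", "checkouts"],
        "business_need": "circulation and account management",
        "who_has_access": ["circulation staff", "ILS administrators"],
        "retention": "purge checkout history at checkin",
    },
    {
        "system": "public PC reservation logs",
        "data_elements": ["barcode", "session times"],
        "business_need": "none documented",
        "who_has_access": ["library IT", "campus IT"],
        "retention": "indefinite",
    },
]

# Flag entries that deserve review: no documented need, indefinite
# retention, or access granted to more than one group.
for entry in inventory:
    flags = []
    if entry["business_need"] == "none documented":
        flags.append("no documented business need")
    if entry["retention"] == "indefinite":
        flags.append("no retention limit")
    if len(entry["who_has_access"]) > 1:
        flags.append("review whether every group needs access")
    if flags:
        print(f'{entry["system"]}: {"; ".join(flags)}')
```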

Libraries serve everyone. We serve patrons who are already at high risk for harassment and violence. Libraries must do their part in mitigating the risk that insider threat creates for our patrons who depend on the library for resources and support. Otherwise, we become one more threat to our patrons’ privacy and potentially their lives or the lives of their loved ones.

Just Published – Data Privacy Best Practices Toolkit for Libraries

Welcome to this week’s Tip of the Hat!

Today we’re happy to announce the publication of the Data Privacy Best Practices Toolkit for Libraries. This toolkit is part of the Data Privacy Best Practices Training for Libraries project, an LSTA-funded collaborative project between the Pacific Library Partnership and LDH focusing on teaching libraries the basics of data privacy. This introduction to data privacy in libraries serves as a guide for both administration and front-line workers, providing practical advice and knowledge for protecting patron data privacy.

The cover page for Data Privacy Best Practices Toolkit for Libraries: A Guide for Managing and Protecting Patron Data.

What does the toolkit cover? The topics range from the data lifecycle and managing vendor relationships to creating policies and procedures to protect patron privacy. The toolkit covers specific privacy concerns in the library, including law enforcement requests, surveillance, and data analytics. We also get to meet Mel and Rafaël, two library patrons who have unique privacy issues that libraries need to consider when thinking about patron privacy. At the end of the toolkit is an extensive resource section with library privacy scholarship, professional standards, and regulations for further reading.

This toolkit is part of a larger group of resources, including templates and examples libraries can use to develop contract addendums, privacy policies and procedures, and data inventories and privacy risk assessments. In short, there are a lot of resources that are freely available for you to use in your library! Please let us know if you have any questions about the project resources.

Finally, stay tuned – the project is going into its second year, focusing on “train the trainer” workshops for both data privacy and cybersecurity. We’ll keep you updated as more materials are published!

Just Published! Library Data Risk Assessment Guide

Welcome to this week’s Tip of the Hat!

To build or to outsource?

Building an application or creating a process in a library takes time and resources. A major benefit of keeping it local, though, is that libraries have the greatest control over the data collected, stored, and processed by that application or system. Conversely, a major drawback of keeping it local is the sheer number of moving parts to keep track of in the building process. Some libraries have the technical know-how to build their own applications or have the resources to keep a process in house. Keeping track of privacy risks is another matter. Risk assessment and management must be addressed in any system or process that touches patron data, so how can libraries with limited privacy risk assessment or management experience make sure that their local systems and processes mitigate patron privacy risks?

Libraries have a new resource to help with privacy risk management! The Digital Library Federation’s Privacy and Ethics in Technology Working Group (formerly known as the Technologies of Surveillance Working Group) published “A Practical Guide to Performing a Library User Data Risk Assessment in Library-Built Systems”. This 28-page guide provides best practices and practical strategies for conducting a data risk assessment, including:

  • Classifications of library user data and privacy risk
  • A table of common risk areas, including probability, severity, and mitigation strategies (a minimal sketch of one such entry follows this list)
  • Practical steps to mitigate data privacy risks in the library, ranging from policy to data minimization
  • A template for readers to conduct their own user data inventory and risk assessment
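
To give a feel for what one entry in a table of risk areas could look like in practice, here is a minimal, hedged sketch. The fields and the 1–5 scoring scale are illustrative assumptions rather than the guide’s own format; use the guide’s template for the real work.

```python
# A minimal sketch of a risk register entry, loosely modeled on the idea of
# a table of risk areas with probability, severity, and mitigations. The
# fields and the 1-5 scoring scale are illustrative assumptions, not the
# guide's own format.
from dataclasses import dataclass, field

@dataclass
class RiskEntry:
    risk_area: str              # e.g., "checkout history retained indefinitely"
    data_classification: str    # e.g., "patron activity data"
    probability: int            # 1 (rare) to 5 (almost certain)
    severity: int               # 1 (negligible) to 5 (severe harm to patrons)
    mitigations: list[str] = field(default_factory=list)

    @property
    def score(self) -> int:
        # Simple probability x severity score to help rank what to fix first.
        return self.probability * self.severity

risks = [
    RiskEntry(
        risk_area="checkout history retained indefinitely",
        data_classification="patron activity data",
        probability=4,
        severity=4,
        mitigations=["purge history at checkin", "offer opt-in reading history"],
    ),
]

for r in sorted(risks, key=lambda r: r.score, reverse=True):
    print(f"{r.risk_area}: score {r.score}, mitigations: {', '.join(r.mitigations)}")
```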

This guide joins the other valuable resources produced by the DLF Privacy and Ethics in Technology Working Group.

The group also plans to publish a set of guidelines around vendor privacy in the coming months, so be sure to bookmark https://wiki.diglib.org/Privacy_and_Ethics_in_Technology and check back for any updates!

That Little Driver’s License Card…

Welcome to this week’s Tip of the Hat!

A driver’s license card is the first document many people use to prove their identity, be it at work, the bank, or the airport. The card has key information needed by organizations and institutions: name, date of birth, address, photo, and the illustrious driver’s license number. Driver’s license cards can be a convenient form of identification, but they can also be a convenient way for your patrons’ identities to be stolen if your library is not careful in its handling of the card’s information.

As part of the library card registration process, many public libraries require some form of identification with a current address to confirm the patron’s home address. These libraries almost always accept driver’s license cards as one form of identification. But what do libraries do with the information on the card? Some record the driver’s license number in the patron record, while others take a photocopy or scan of the card (yes, this has happened!). Several libraries use specially programmed barcode scanners to automatically populate fields in the patron record with information from the driver’s license barcode.

Each method carries its own level of risk to the library patron’s privacy. Storing driver’s license numbers in the patron record or other places can open the patron up to identity theft if the library’s systems or physical spaces are compromised. There are various ways to compromise a physical or electronic space. We are familiar with the story of a person breaking into a system to steal information, but sometimes it is a staff person who steals the information. We also can’t forget that a leak is as damaging as a breach – sometimes staff leave the patron record up on the screen at a public service desk, or a report printout is left on a desk for anyone to see or take.

Overall, the best way to mitigate the risk of a breach or leak of driver’s license numbers is to not collect or store driver’s license numbers in the first place. In the collection stage of the patron data lifecycle, we decide what data to collect. The data you collect should be tied to a specific, demonstrated business need at the point of collection. If you are collecting driver’s license numbers as a way to verify patrons and addresses, what are the business needs for collecting and storing that number in the patron record? You can achieve the same business need by other means, including a process that validates the information in the patron record against the identification card without recording additional personal information in the record. Another consideration is that while driver’s license cards are a convenient form of identification, the card might have a name that the patron no longer uses and might have other outdated or incorrect information, including address information if the state does not mail a new card when there is an address change. Finally, not all patrons have driver’s license cards, and your patron registration policies and procedures need to accommodate this reality.

Even if you don’t collect or store the driver’s license number, there are still ways the library might inadvertently collect more patron information than it needs from the card. Scanning driver’s license barcodes to auto-populate patron registration forms and records can save time in data entry, but be aware that these barcodes carry much more information than what is printed on the card, including gender and, in some cases, even Social Security Numbers. The software you use to scan the barcodes should only record the information needed for the patron form and not store the additional information in the barcode. Your software vendor should have information about how they treat this extra data; if they do not, then the vendor product is a potential security risk for the library and its patrons that needs to be addressed with the vendor.
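
As one way to put “only record what the form needs” into practice, here is a minimal sketch that filters a decoded barcode down to an allow-list of fields before anything touches the patron record. The element IDs follow the AAMVA driver’s license barcode standard as commonly documented, but treat them – and the assumption that your scanner software hands you a decoded dictionary – as illustrative rather than a description of any particular product.

```python
# Minimal sketch: keep only an allow-list of fields from a decoded driver's
# license barcode. AAMVA element IDs (DAC, DCS, DAG, ...) are used here for
# illustration; check your scanner software's documentation for the actual
# decoded format it hands you.

ALLOWED_FIELDS = {
    "DAC": "first_name",
    "DCS": "last_name",
    "DAG": "street_address",
    "DAI": "city",
    "DAJ": "state",
    "DAK": "postal_code",
}

def filter_barcode_fields(decoded: dict[str, str]) -> dict[str, str]:
    """Return only the fields needed for the patron record.

    Everything else in the barcode (license number, birth date, sex, and
    any other elements) is dropped and never stored.
    """
    return {
        ALLOWED_FIELDS[element]: value
        for element, value in decoded.items()
        if element in ALLOWED_FIELDS
    }

# Example: a decoded barcode with more data than the registration form needs.
decoded_barcode = {
    "DAC": "JANE", "DCS": "DOE", "DAG": "123 MAIN ST", "DAI": "SPRINGFIELD",
    "DAJ": "WA", "DAK": "98100", "DAQ": "DOE123456789",  # license number
    "DBB": "01151985",                                   # birth date
}
print(filter_barcode_fields(decoded_barcode))
# -> only the name and address fields survive
```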

No matter how your library handles driver’s license cards, your library should be reviewing its privacy practices on a regular basis. In 2019, the Contra Costa County Library System decided to stop collecting driver’s license numbers and purged existing numbers from its patron records. This decision came at just the right moment – the library system suffered a ransomware attack at the beginning of 2020. While recent reports state that no personal data was compromised, the risk of identity theft to library patrons would have been much greater if the driver’s license numbers had still been stored at the library. In short, it’s never too late to review policies and procedures around patron address verification at your library!

A New Privacy Framework For You

Welcome to this week’s Tip of the Hat!

The National Institute of Standards and Technology recently published version 1.0 of their Privacy Framework. The purpose of the framework is to create a holistic approach to managing privacy risks in an organization. The Framework differs from other standards in that the goal is not full compliance with the Framework. Instead, the Framework encourages organizations to design a privacy program that best meets the current realities and needs of the organization and key stakeholders, such as customers.

The Framework structure is split into three parts:

  • The Core comprises the activities and outcomes for protecting privacy in an organization. These are broken down by Function, Category, and Subcategory. For example:
    • Identify-P (the P differentiates it from NIST’s Cybersecurity Framework) is a Function in which the organization develops organizational awareness of the privacy risks in its data processing practices.
    • A Category of the Identify-P Function is Inventory and Mapping, which involves taking stock of the organization’s various systems and processes.
    • The Subcategories of that Category are what you would expect from a data inventory: what data is being collected where, when, how, by whom, and why.
  • The Profile plays two roles – it can represent the current privacy practices of an organization as well as a target set of practices for the organization to aim for. A Current Profile lists the Functions, Categories, and Subcategories the organization is currently using to manage privacy risks. A Target Profile helps the organization figure out which Functions, Categories, and Subcategories should be in place to best protect privacy and to mitigate privacy risk (a minimal sketch comparing the two follows this list).
  • The Implementation Tiers are a measurement of how the organization is doing in terms of managing privacy risk. There are four Tiers in total, ranging from minimal to proactive privacy risk management. Organizations can use their Current Profile to determine which Tier describes their current operations. Target Profiles can be developed with the desired Tier in mind.
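
To make the Profile idea a bit more concrete, here is a minimal sketch that compares a Current Profile against a Target Profile and lists the gaps. The Subcategory identifiers and maturity labels below are illustrative assumptions that mimic the Framework’s naming pattern (e.g., ID.IM-P1 for an Inventory and Mapping outcome); real Profiles should be built from the published Core.

```python
# Minimal sketch: compare a Current Profile to a Target Profile and list the
# gaps. Subcategory identifiers follow the Framework's naming pattern but are
# used here for illustration only; build your own Profiles from the
# published Core.

current_profile = {
    "ID.IM-P1": "partial",   # systems processing patron data are inventoried
    "ID.IM-P3": "none",      # categories of data processed are inventoried
}

target_profile = {
    "ID.IM-P1": "full",
    "ID.IM-P3": "full",
    "ID.RA-P4": "partial",   # privacy risks are prioritized and responded to
}

def profile_gaps(current: dict[str, str], target: dict[str, str]) -> list[str]:
    """Return the Subcategories where current practice falls short of the target."""
    order = {"none": 0, "partial": 1, "full": 2}
    gaps = []
    for subcategory, desired in target.items():
        achieved = current.get(subcategory, "none")
        if order[achieved] < order[desired]:
            gaps.append(f"{subcategory}: at '{achieved}', target is '{desired}'")
    return gaps

for gap in profile_gaps(current_profile, target_profile):
    print(gap)
```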

Why should libraries care about this framework? Libraries, like other organizations, have a variety of risks to manage as part of their daily operations. Privacy risks come in a variety of shapes and sizes, from collecting more data than operationally necessary and not restricting the sharing of patron data with vendors to a lack of clear communication with staff about privacy-related policies and procedures. Some organizations deal with privacy risks through privacy risk assessments (or privacy impact assessments). The drawback is that these assessments are best suited for focusing on specific parts of an organization, not the organization as a whole.

The Privacy Framework provides a way for organizations to manage privacy risks on an organizational level. The Framework takes the same approach to privacy as Privacy by Design (PbD) by making privacy a part of the entire process or project. The Framework can be integrated into existing organizations, which is by design – one of the criticisms of PbD is the complication of trying to implement it in existing projects and processes. The flexibility of the Framework means that different types of libraries – school, academic, public, and special – can create Profiles that address the realities of their organization as well as Target Profiles that incorporate the standards and regulations specific to their library. School libraries can address the risks and needs surrounding student library data as presented in FERPA, while public libraries can identify and mitigate privacy risks facing different patron groups in their community. The Framework also allows for the creation of new Subcategories to cover gaps specific to an industry or organization not covered by the existing Framework, which gives libraries added flexibility to address library-specific needs and risks.

The flexibility of the Framework is a strength for organizations looking for a customized approach to organizational privacy risk management. This same flexibility can also be a drawback for libraries looking for a more structured approach. The Framework incorporates other NIST standards and frameworks, which can help ease the apprehension of those looking for more structure. Nonetheless, libraries that want to explore risk management and incorporate privacy into their organization should give the NIST Privacy Framework some consideration.

Who Knows, Who Decides, and Who Decides Who Decides

Welcome to this week’s Tip of the Hat!

Shoshana Zuboff’s book The Age of Surveillance Capitalism provides a comprehensive overview of the commodification of personal information in the digital age. Surveillance capitalism is a specific form of capitalism that focuses on using personal data to predict and control user behavior. Zuboff’s analysis of surveillance capitalism centers around three questions:

  • Who knows?
  • Who decides?
  • Who decides who decides?

In the book, Zuboff provides some context to the questions:

The first question is “Who knows?” This is a question about the distribution of knowledge and whether one is included or excluded from the opportunity to learn. The second question is “Who decides?” This is a question about authority: which people, institutions, or processes determine who is included in learning, what they are able to learn, and how they are able to act on their knowledge. What is the legitimate basis of that authority? The third question is “Who decides who decides?” This is a question about power. What is the source of power that undergirds the authority to share or withhold knowledge?

Zuboff offers answers to these three questions in her book: “As things currently stand, it is the surveillance capitalist corporations that know. It is the market form that decides. It is the competitive struggle among surveillance capitalists that decides who decides.” While the current prognosis is grim according to Zuboff’s analysis, the three questions are a powerful tool in which one can discover the underlying power structures of a particular organization or culture.

An interesting thought exercise involves applying these three questions to the library. On a basic level, the data lifecycle provides some answers to “Who knows?” concerning access to patron data as well as the publication and disclosure of data in reports, data sets, and so on to third parties. The “Who decides?” question goes beyond the data lifecycle and ventures into the realm of data governance, where decisions about the data practices of the library are made. However, the answer goes beyond data governance. Library use of third-party tools and services to collect or process patron data brings those third parties into the realm of “Who knows?” as well as “Who decides?” A third party can adjust its tools or products according to what best serves its bottom line, as well as what it can market to libraries. Third parties decide what products to put on the market, and libraries decide which products meet their needs. Both parties share authority, which brings this thought experiment closer to Zuboff’s analysis of the market as the decider.

That brings us to the third question, “Who decides who decides?” Again, our thought experiment starts to blend in with Zuboff’s answer to the same question. There is indeed a struggle between vendors competing in a niche market with limited funds. We would be remiss, though, if we left our analysis at competition between third parties in the market. Part of what drives the marketplace, and the tools and services offered within it, is libraries themselves. Libraries are pressured to provide data for assessment and outcomes to those who directly influence budgets and resources. Libraries also see themselves as direct competitors to Google, Amazon, and other commercial companies that openly engage in surveillance capitalism. Instead of rejecting the methods used by these companies, libraries have to some extent adopted the practices of these perceived market competitors to keep patrons using library services. A library on this path could find itself upholding surveillance capitalism’s grasp on patrons’ lives.

Fitting this thought experiment into one newsletter does not give the questions the full attention they deserve, but it gives us a place to start thinking about how the library shares some of the traits and qualities found in surveillance capitalism. Data from patron activities can provide valuable insight into patron behaviors, creating personalized library services where yet more data can be collected and analyzed for marketing purposes. It’s no surprise that data analytics and customer relationship management systems have taken off in the library market in recent years – libraries believe these tools offer a power that wouldn’t otherwise be accessible to them. Nonetheless, that belief is influenced by surveillance capitalists.

Decide for yourself – give Zuboff’s book a read (or a listen, for the audiobook) and use the three questions as a starting point when you investigate your library’s role in the data economy.

Shining a Light on Dark Data

Welcome to this week’s Tip of the Hat!

Also, welcome to the first week of Daylight Saving Time in most of the US! To celebrate the extra hour of daylight in the morning (we at LDH are early birds), we will shed light on a potential privacy risk at your organization – dark data.

The phrase “dark data” might conjure up images of undiscovered data lurking in a dark back corner of a system. It could also bring to mind the deep web, where the vast amount of data your organization holds is hidden from the majority of your staff, with only a handful having the skills and knowledge to find it.

The actual meaning of the phrase is much less dramatic. Dark data refers to collected data that is not used for analysis or other organizational purposes. This data can appear in many places in an organization: log files, survey results, application databases, email, and so on. The business world views dark data as an untapped organizational asset that will eventually serve a purpose, but for now, it just takes up space in the organization.

While the reality of dark data is less exciting than the deep web, the potential privacy issues of dark data should be taken seriously. The harm isn’t that the organization doesn’t know what it’s collecting – dark data is not unknown data. One factor that leads to dark data in an organization is the “just in case” rationale used to justify data collection. For example, a project team might set up a new web application to collect patron demographic information such as birth date, gender, and race/ethnicity not because they need the data right now, but because that data might be needed for a potential report or analysis in the future. Not having the data when the need arises means you could be missing out on important insights and measures that could sway decision-makers and the future of operations. It is that fear of not having the data, or data FOMO, that drives the collection of dark data.

When you have dark data that is also patron or other sensitive data, you put your organization and patrons at risk. Data sitting in servers, applications, files, and other places in your organization is subject to being leaked, breached, or otherwise accessed without authorization. This data is also subject to disclosure through judicial subpoenas or warrants. If you choose to collect dark data, you choose to collect a toxic asset that will only become more toxic over time, as the risk of a breach, leak, or disclosure increases. It’s a matter of when, not if, the dark data is compromised.

Dark data is a reality at many organizations in part because it’s very easy to collect without much thought. The strategies for minimizing the harms that come with dark data require some forethought and planning; however, once operationalized, these strategies can be effective in reducing the dark data footprint in your organization:

  • Tying data collection to demonstrated business needs – When you are deciding what data to collect, be it through a survey, a web application, or even your system logs, what data can be tied back to a demonstrated business need? Orienting your data collection decisions to what is needed now for operational purposes and analysis shifts the mindset away from “just in case” collection to what data is absolutely needed.
  • Data inventories – Sometimes dark data is collected and stored and falls off the radar of your organization. Conducting regular data inventories of your organization will help identify any potential dark data sets for review and action.
  • Retention and deletion policies – Even if dark data persists after the above strategies, you have one more way to mitigate privacy risks. Retention policies and proper deletion and disposal of electronic and physical items can limit the amount of dark data sitting in your organization (a minimal sketch of a retention sweep follows this list).
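
As a sketch of what enforcing a retention policy might look like for one data source, here is a minimal example that deletes survey responses older than a stated retention period. The database, table, and column names are hypothetical; adapt it to your own systems, confirm the retention period in your policy, and test against a backup before deleting anything.

```python
# Minimal sketch of a retention sweep: delete survey responses older than the
# retention period stated in policy. The database, table, and column names
# are hypothetical; adapt to your own systems and confirm the policy's
# retention period before deleting anything.
import sqlite3
from datetime import date, timedelta

RETENTION_DAYS = 365  # e.g., "survey responses are kept for one year"

def purge_expired_responses(db_path: str) -> int:
    cutoff = (date.today() - timedelta(days=RETENTION_DAYS)).isoformat()
    with sqlite3.connect(db_path) as conn:
        cursor = conn.execute(
            "DELETE FROM survey_responses WHERE submitted_on < ?", (cutoff,)
        )
        return cursor.rowcount  # number of expired rows removed

if __name__ == "__main__":
    removed = purge_expired_responses("library_surveys.db")
    print(f"Removed {removed} responses past the retention period")
```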

The best strategies to minimize dark data in your organization happen *before* you collect the data. Asking yourself why you need to collect this data in the first place, and looking at the system or web application to see what data is collected by default, will allow you to identify potential dark data and prevent its collection.

You Say Security, I Say Privacy…

Welcome to this week’s Tip of the Hat!

You might have seen the words “security” and “privacy” used interchangeably in articles, blog posts, and other discussions about protecting sensitive data. Sometimes that interchange of words further complicates already complex matters. A recent article by Steve Touw explores the confusion surrounding encryption and redaction methods in the CCPA. Touw breaks down encryption and redaction into their basic components, showing that each method ultimately lives in a different world: encryption in the security world, and redaction in the realm of privacy.

But aren’t privacy and security essentially the same thing – a means of protecting an asset (in our case, data)? While both arguably have the same goal of protecting a particular asset, privacy and security differ in how they approach risk assessment and evaluation. In the scope of information management:

Security pertains to actions that protect organizational assets, including both personal and non-personal data.

Privacy pertains to the handling, controlling, sharing, and disposal of personal data.

Security and privacy do share key concepts and concerns, including appropriate use, confidentiality, and access to organizational assets (including personal data). Nonetheless, implementing security practices doesn’t necessarily guarantee privacy; a quote that makes the rounds in privacy professional groups is “You can have security without privacy, but you cannot have privacy without security.”

An example of the above quote in action comes when you log into a system or application. Let’s use staff access to the integrated library system (ILS) for this example. A login allows you to control which staff can access the ILS. Assigning individual logins to staff members and ensuring that only those logins can access the staff functions in the ILS is a security measure. This security measure protects patron data from being inappropriately accessed by other patrons or anyone else looking for that data. On the point of using security to protect privacy: so far, so good.

Once we get past the login, though, we come to a potential privacy issue. You have staff logins, which prevent unauthorized access to patron data by the public, but what about unauthorized access to patron data by your own staff? Not every staff member needs access to patron data to perform their daily duties. By giving staff logins free rein over what they can access in the ILS database, you are at risk of violating patron privacy even though you have security measures in place to limit system access to staff members. To mitigate this risk, another security measure can be used – assigning who can access what through role- or group-level access controls. Most ILSes have a basic level of role-based access control where systems administrators can assign the lowest level of access needed for each role, and applying these roles consistently will limit instances of unauthorized access to data by staff.
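
Most ILSes handle this through their own administration screens, but the underlying idea of least-privilege roles can be sketched in a few lines. The roles and permission names below are hypothetical and not drawn from any particular ILS; the point is that access is denied unless a role has been explicitly granted it.

```python
# Minimal sketch of role-based access: each role maps to the smallest set of
# permissions it needs, and every read of patron data is checked against
# that map. Roles and permission names are hypothetical, not drawn from any
# particular ILS.

ROLE_PERMISSIONS = {
    "circulation": {"view_patron_contact", "view_current_checkouts"},
    "cataloging": set(),                       # no patron data needed
    "systems_admin": {"view_patron_contact", "view_current_checkouts",
                      "export_patron_reports"},
}

def can_access(role: str, permission: str) -> bool:
    """Return True only if the role was explicitly granted the permission."""
    return permission in ROLE_PERMISSIONS.get(role, set())

# A cataloger asking for patron contact information is denied by default.
print(can_access("cataloging", "view_patron_contact"))   # False
print(can_access("circulation", "view_patron_contact"))  # True
```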

All the security measures in the world, nonetheless, will not mitigate the risk of privacy harm to your patrons if your ILS is collecting highly sensitive data in the first place! These security measures don’t prevent you from collecting this type of data. This is where privacy policies and determining what data needs to be collected to meet operational needs come into play. If you don’t collect the data, the data cannot be breached or leaked.

It’s clear from this example that both privacy and security have parts to play in protecting patron privacy. Understanding these parts – where they overlap, and where they diverge – will help you build and maintain a robust set of data privacy and security practices throughout your organization.