Stop Collecting Data About Your Patrons’ Gender Identity

A four-way stop sign in front of snow-covered tree branches.
Image source: https://www.flickr.com/photos/ben_grey/4383358421/ (CC BY-SA 2.0)

tl;dr – Your library doesn’t need to collect data about your patrons’ gender identity.

Longer tl;dr – Your library doesn’t need to collect data about your patrons’ gender identity for library workers to do their daily work.

Nuanced tl;dr – Your library doesn’t need to collect data about your patrons’ gender identity 99% of the time, and in that 1% where the data is required, you’re probably doing more harm than good in your collection methods.

This post is brought to you by yet another conversation about including gender identity data in patron records. Libraries have collected this data on their patrons for decades; it’s not uncommon to have a “gender” field in the patron record of many integrated library systems and patron-facing vendor services and applications. But why collect this data in the first place?

Two explanations that come up are that gender identity data can be used for marketing to patrons and for reading recommendations. However, these explanations do not account for the problem of relying on harmful gender stereotypes. Take the belief that boys are reluctant readers, for example. Joel A. Nichols wrote about his experience as a children’s librarian and how libraries do more harm than help in adopting this belief:

These efforts presume that some boys are not achieving well in school because teachers and librarians (who are mostly women) are offering them books that are not interesting to them (because they are boys). I find this premise illogical and impracticable, in particular because I am queer: the things that were supposed to interest boys did not necessarily interest me, and the things that were supposed to interest girls sometimes did. Additionally, after years of working in children’s departments, I found over and over again that lots of different things interested lots of different kids. In my experience, it was the parents that sometimes asked for “boy books” or “girl books.” The premise that boys need special “boy” topics shortchanges librarians and the children themselves, and can alienate kids who are queer or genderqueer.

This collection of patron data can be used to harm patrons in other ways, such as library staff misgendering and harassing patrons based on the patron’s gender identity. A recent example comes from the 2019 incident where library staff repeatedly misgendered a minor patron when she was with her parent to sign up for her library card. While the library decided to stop collecting gender identity data on library card applications as a result of the incident, the harm done cannot be remedied as easily as changing the application form.

The ALA Rainbow Round Table recommends that libraries do not collect gender identity data from patrons unless absolutely needed. Since the recommendation in 2015, several libraries evaluated their collection of gender identity data only to find that they were not using that data. Collecting data for “just in case” opens library patrons to additional harm if the library suffers a data breach. If there is no demonstrated business need for a data point, do not collect that data point.

In the rare case that your library absolutely must collect data about the gender identity of your patrons (such as a requirement to report on aggregated patron demographic data for a grant-funded project), care must be taken in collecting this data to mitigate additional harms through alienation and exclusion. The Rainbow Round Table recommends the Williams Institute’s report “Best Practices for Asking Questions to Identify Transgender and Other Gender Minority Respondents on Population-Based Surveys” as a guide to collecting such data. The Williams Institute has also created a short guide to creating survey questions around gender identity. Here are more resources that can guide respectful demographic data collection:

Again, the resources above are only for the rare case that your library absolutely must collect this data from your patrons. Libraries considering collecting gender identity data must review the rationale behind the collection. A patron should not be required to tell the library their gender identity to use the library’s collections and services. Even the act of collecting this data can harm and disenfranchise patrons.

tl;dr – Your library doesn’t need to collect data about your patrons’ gender identity.

A Quick Data Privacy Check-in for The New Year

A small orange and white kitten sits on an Apple floppy drive, while a picture of a gray cat is displayed on an Apple monitor.
Image source: https://www.flickr.com/photos/50946938@N03/5957820087/ (CC BY 2.0)

Welcome to 2021! We hope that everyone had a restful holiday break. There might be some changes to your work environment for the new year that could affect the privacy and security of your patrons’ data. Let’s start this year off with a quick (and gentle) check-in.

Smart devices

Smartwatches, smart speakers, smart TVs – what new internet-enabled smart device has taken residence in your home, office, or even on your person? You might not know that these devices eavesdrop on your conversations and, in some instances, eavesdrop on what you type. If you are working with a patron or talking with a colleague about patron information, what smart devices are in listening range that weren’t before the new year?

Depending on the device, you might be able to prevent eavesdropping; however, other devices might not have this option. Disconnecting the device from the internet is also an option, but this might be more of a hassle than a help. The one sure way to stop a device from eavesdropping is to remove it from listening range or, better yet, to disconnect the device from its power source.

Computers and mobile devices

A new year could mean a new computer or mobile device. If this is you, and if you are using a personal computer or mobile device for working with patrons or patron data, don’t forget to do the following while setting up your new device:

  • Install antivirus software (depending on your organization, you might have access to free or discounted software)
  • Install the VPN client provided by your organization
  • Install privacy-preserving tools and browser extensions
  • Enable auto-updates for the operating system and any applications installed on the device
  • Review the privacy and security settings for your operating system:
    • Mac and iOS devices – Apple recently published a document listing security and privacy settings on all Apple devices. The tl;dr summary by Lifehacker is a good resource if you’re not sure where to begin
    • Android – Computerworld’s guide to Android privacy is long but worthwhile if you want a list of actions to take based on the level of privacy you want on your device. Also, visit Google’s Data Privacy Settings and Controls page to change your Google account privacy settings (because now is as good a time as any to review Google settings).

Evergreen recommendations

Even if you didn’t get a new smart device or computer for the holidays, here are a few actions you can do with any device to start the new year right by protecting your and your patrons’ privacy:

Take a few moments this week to review privacy settings and risks – a moment of prevention can prevent a privacy breach down the road.

Patron Privacy Support: Holiday Edition

An orange cat looking at a laptop screen and pawing a mouse tracking pad.
Image source: https://www.flickr.com/photos/25473210@N00/421211549 (CC BY 2.0)

Black Friday and Cyber Monday have come and gone, but there are still plenty of opportunities to buy the last-minute gift to mark the end of a rough year. Patrons who might have gone to the library to ask for help setting up their new tech gadget will still find their way to the library help desk via chat, email, or phone. Other patrons might come to the help desk with questions from researching which tech gadgets to gift to others (or to themselves!). Why not use this time to do a bit of privacy instruction?

For patrons wondering what to buy – Mozilla’s *privacy not included is an excellent starting point for researching tech gifts that connect to the internet. The guide contains information about data privacy and security for each product and even warns you if a particular product doesn’t meet a minimum security standard.

For patrons who are shopping online – Even though most of our lives have shifted to online thanks to the pandemic, patrons might not have online safety and privacy in mind while shopping online. Account privacy settings, passwords, credit cards, web tracking, digital fingerprinting, phishing emails – the list of vulnerabilities and threats goes on and on. Having a sense of the patron’s threat model will help you determine which guides and resources you can use to help the patron protect their privacy while online. The Virtual Privacy Lab from the San Jose Public Library gives patrons a customizable privacy toolkit they can then use to protect their online privacy and security. You can also send along this short newsletter from SANS about secure online shopping that will help patrons to protect themselves while they shop online.

For patrons setting up their new tech gadget – The patron is excited about their new tech gadget! That is until they can’t figure out how to set it up. This is a great place to introduce privacy-preserving practices found in the Data Detox Kit and in other resources on the Choose Privacy Every Day site to set up their devices to protect their privacy and security right when they start using the gadget.

Last, an evergreen reminder – do not buy or gift an Amazon Ring.

No matter the gadget question or help request this holiday season, there’s always an opportunity to give the gift of privacy to patrons through sharing ways to help them protect their data. While this year might prove a challenge to provide the same level of support at the information or help desk, the above online resources make meeting that challenge a little easier for both the patron and for library staff. Happy shopping and tech support-ing!

FYI – New Newsletter Privacy Policy

Today (as in an hour before publishing our post!) MailPoet announced that it has been acquired by WooCommerce. LDH uses MailPoet for our weekly newsletter mailings. We will be reviewing the new Privacy Policy for the app to decide if we should continue to use the app. While we do not currently use any of the analytics features on MailPoet, we will need to determine if this acquisition means a change in data collection and processing with the third-party vendor. LDH will announce any changes to the newsletter app or other updates in a future post. If you have any questions in the meantime, please feel free to email us.

The Threat Within

A headshot of Chadwick Jason Seagraves with text overlay: 'Anonymous Comrades Collective - Doxer Gets Doxed: "Proud Boy" Chadwick Jason Seagraves of NCSU'

People sometimes ask what keeps privacy professionals up at night. What is that one “worst-case scenario” that we dread? Personally, one of the scenarios hanging over my head is insider threat – when a library employee, vendor, or another person who has access to patron data uses that data to harm patrons. A staff person collecting patron addresses, birthdays, and names to steal the patrons’ identities is an example of insider threat. Another example is a staff person accessing another staff member’s patron record to obtain personal information to harass or stalk the staff member.

Last week, an IT employee at NCSU was doxed as a local leader of a white supremacist group. This person, who worked IT for the libraries in the past, doxed individuals, including students in his own university, to harass and, in some cases, incite violence toward the people being doxed. As an IT employee, this person most likely had unchecked access to students, staff, and faculty personal information. It wouldn’t be a stretch to say that he still had access to patron information, given his connections to the library and his IT staff position.

Libraries spend a lot of time and attention worrying about external threats to patron privacy: vendors, law enforcement, even other patrons. We forget that sometimes the greatest threat to patron privacy works at the library. Library workers who have access to patron data – staff, administration, board members, volunteers – can exploit patrons through the use of their data for financial gain in the case of identity theft or harm them through searching for specific library activity, checkouts of certain materials, or even names or other demographic information with the intent to harass or assault. The reality is that there might not be many barriers, if any, to stop library workers from doing so.

The good news is that there are ways to mitigate insider threat in the library, but the library must be proactive in implementing these strategies for them to be the most effective:

Practice data minimization – only collect, use, and retain data that is necessary for business operations. If you don’t collect it, it can’t be used by others with the intent to harm others.

Implement the Principle of Least Privilege – who has access to what data and where? Use roles and other access management tools to provide staff (and applications!) access to only the data that is absolutely needed to perform their intended duty or function.

Regularly review internal access to patron data – set up a scheduled review of who has what access to patron data. Develop and implement policies and procedures for revoking or changing access to patron data at the time an employee or other library worker/affiliate changes roles in the organization or leaves the library.

Confidentiality Agreements For Library Staff, Volunteers, and Affiliates – your privacy and confidentiality policy should make it clear to staff that patrons have the right to privacy and confidentiality while using library resources and services. Some libraries go further in ensuring patron privacy by using confidentiality agreements. These confidentiality agreements state the times when patron data can be accessed and the acceptable uses for patron data. Violation of the agreement can lead to immediate termination of employment. Here are some examples of confidentiality agreements to start your drafting process:

Regularly train on and discuss privacy – ensure that everyone who is involved with the library – staff, volunteers, board members, anyone that might potentially access patron data as part of their role with the library – is up to date on current patron privacy and confidentiality policies and procedures. This is also an opportunity to include training scenarios that involve insider threat to generate discussion and awareness of this threat to patron privacy.
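
The least-privilege and access-review strategies above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a feature of any particular integrated library system: the role names, the patron record fields, and the `redact_record` and `accounts_due_for_review` helpers are all hypothetical, and the 90-day review window is an arbitrary example value.

```python
from datetime import date, timedelta

# Hypothetical mapping of staff roles to the patron record fields each role needs.
ROLE_FIELDS = {
    "circulation": {"name", "barcode", "items_out"},
    "cataloging": set(),  # this role needs no patron data at all
    "admin": {"name", "barcode", "items_out", "address", "email"},
}


def can_view(role: str, field: str) -> bool:
    """Return True only if the role has been explicitly granted the field."""
    return field in ROLE_FIELDS.get(role, set())


def redact_record(role: str, record: dict) -> dict:
    """Return a copy of the patron record limited to the fields the role may see."""
    return {key: value for key, value in record.items() if can_view(role, key)}


def accounts_due_for_review(accounts: list[dict], today: date,
                            max_age_days: int = 90) -> list[str]:
    """List usernames whose holder has left the library, or whose access
    has not been reviewed within max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [
        account["username"]
        for account in accounts
        if not account["active_employee"] or account["last_reviewed"] < cutoff
    ]
```

Unknown roles get no access by default, which is the point of least privilege: access is granted deliberately rather than removed after the fact.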

A note about IT staff, be it internal library IT staff or an external IT department (campus IT, city government IT, or another form of organizational IT) – Do not automatically assume that IT staff are following privacy/security standards and policy just because they are IT. Now is the time to talk with your IT connections about what their current access is and what minimum access they need for daily operations. However, even if the IT department practices good security and privacy hygiene (such as making sure they follow the Principle of Least Privilege), any IT staff member who works with the library in any capacity must also sign a confidentiality agreement and be included in training sessions at the very minimum.

A data inventory is a good place to start if you are not sure who has access to what data in the library. The PLP Data Privacy Best Practices for Libraries project has several templates and resources to help with creating a data inventory, assessing privacy risks, and practical actions libraries can take in reducing the risk of an insider threat.

Libraries serve everyone. We serve patrons who are already at high risk for harassment and violence. Libraries must do their part in mitigating the risk that insider threat creates for our patrons who depend on the library for resources and support. Otherwise, we become one more threat to our patrons’ privacy and potentially their lives or the lives of their loved ones.

Just Published – Data Privacy Best Practices Toolkit for Libraries

Welcome to this week’s Tip of the Hat!

Today we’re happy to announce the publication of the Data Privacy Best Practices Toolkit for Libraries. This toolkit is part of the Data Privacy Best Practices Training for Libraries project, an LSTA-funded collaborative project between the Pacific Library Partnership and LDH focusing on teaching libraries the basics of data privacy. This introduction into data privacy in libraries serves as a guide for both administration and front-line workers, providing practical advice and knowledge in protecting patron data privacy.

The cover page for Data Privacy Best Practices Toolkit for Libraries: A Guide for Managing and Protecting Patron Data.

What does the toolkit cover? The topics range from the data lifecycle and managing vendor relationships to creating policies and procedures to protect patron privacy. The toolkit covers specific privacy concerns in the library, including law enforcement requests, surveillance, and data analytics. We also get to meet Mel and Rafaël, two library patrons who have unique privacy issues that libraries need to consider when thinking about patron privacy. At the end of the toolkit is an extensive resource section with library privacy scholarship, professional standards, and regulations for further reading.

This toolkit is part of a larger group of resources, including templates and examples libraries can use to develop contract addendums, privacy policies and procedures, and data inventories and privacy risk assessments. In short, there are a lot of resources that are freely available for you to use in your library! Please let us know if you have any questions about the project resources.

Finally, stay tuned – the project is going into its second year, focusing on “train the trainer” workshops for both data privacy and cybersecurity. We’ll keep you updated as more materials are published!

NaNoWriMo: Data Privacy Edition

A Siamese cat sitting in front of an open laptop computer.
‘Tis the season for all things writing. Your cat might have some opinions about that… Source: https://www.flickr.com/photos/cedwardmoran/4179761302/

Welcome to this week’s Tip of the Hat!

Today marks the second day of NaNoWriMo – National Novel Writing Month. For years, many aspiring (and established) writers have spent countless hours writing to reach the goal of a 50,000-word manuscript. If you do the math, you would have to write about 1700 words a day to reach the goal! Novels are the primary genre for NaNoWriMo, but that hasn’t stopped others from taking the idea of a writing month and using it for other genres. For example, this month is also AcWriMo, or Academic Writing Month, for academics who need to buckle down to write that research book or article.

With November being the month of writing, why not join in the fray with writing about data security and privacy? Our recent Cybersecurity Awareness Month posts discussed the importance of interactive and engaging training, so the question now is how you can build a data security and privacy training that won’t put staff to sleep, or worse, demotivate them from taking proactive privacy and security measures to protect patron data. One way to create engaging training is to use stories and scenarios. Drawing from real-world examples is a start, but the challenge is turning that example into a scenario where training participants are invested in addressing the problems presented in the story. Here are a few tips to help you with the writing process!

Characters – who are the major players in the scenario? Staff person, patron, vendor, random person who comes off the street, the cat who keeps sneaking into the library building? Once you have the characters, what roles do they play? What are their motivations? Why do they do the things they do or think the way they think?

So many questions, even for a short scenario! Take a page from User Experience (UX) and create personas to help with the character-building process. Even a short list of who they are, what motivates them, what they want, and what they know can help hone the scenario narrative as well as introduce training attendees to common types of motivations, knowledge/skill levels, and different types of threat actors or people that might face additional privacy risks.

If you need more inspiration for characters, may I introduce you to Alice and Bob and their crypto-friends?

Story – Your real-world examples or the case studies you learn from others are two good places to start. That shouldn’t stop you from exploring building scenarios from scratch! Or perhaps you would like to modify the real-world examples into a scenario that would be a better fit for the training you’re developing. One concept to explore for your scenario is threat modeling, or identifying potential weaknesses at the library (systems, procedures, policies, etc.), who or what might take advantage of the weakness, and what can be done to either avoid or mitigate the threat. The threat modeling process can uncover a complex web of threats and vulnerabilities that interact with each other. On the other hand, it could lead to valuable conversations with trainees about how one vulnerability can create a ripple effect if exploited, or how a threat actor isn’t always acting with malicious intent. Sometimes the most dangerous threat actors are not aware that they are putting data privacy at risk, such as a staff person with good intentions sharing patron data without knowledge of patron privacy procedures.

Visual aids – What’s a story without visual aids? You might not have the resources or acting chops to create scenario videos, but there are always pictures to give life to your characters and scenarios. Luckily, there are several Creative Commons licensed resources to choose from:

You can also search for CC-licensed photos on Flickr and Creative Commons.

There is a lot more you can do with building scenarios for your data privacy and security trainings, but these three areas will hopefully get you started down the path of becoming an accomplished author… of training scenarios 😉 Enjoy your writing journey, and good luck!

Tracking the Trackers: Blacklight

Welcome to this week’s Tip of the Hat!

Visiting a website almost always means that you will be tracked. Be it a cookie, or a script, or even an access log on the server that hosts the site, you will leave some sort of data trail for folks to collect, analyze, and use. However, it’s becoming increasingly difficult to track all the ways (pun semi-intended) a website is keeping tabs on you. What trackers should you be worried about the most? Which trackers should you allow in your browser? Are there any trackers that might track you even when you leave the site?

The Markup published Blacklight, the latest tool in the suite of tracker detection tools that allow users to discover the many ways a website is tracking users and collecting data in the process. In all, Blacklight reports on major tracking methods, including cookies, ad trackers, Facebook tracking, and Google Analytics. Blacklight also checks to find out if the website is taking your digital fingerprint on top of logging your keystrokes or session. The creators of the tool blogged about their development process, for those who want the nitty-gritty technical details on the development of the tool and how it works.

One unique feature of Blacklight is giving the user the ability to find out how a website tracks without having to visit the website. This is nothing new for folks who can write a script; however, Blacklight makes this process much easier to achieve for the majority of users who are otherwise visiting website after website to investigate how each website is tracking them. One example would be libraries performing privacy audits or reviews on library or vendor websites. Instead of having to potentially expose the worker to various tracking methods while auditing or dealing with different browsers and their settings during the auditing/testing process, the worker can work from a list of URLs and stay on one tab in their browser of choice.
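
For readers curious what “writing a script” might look like, here is a minimal sketch in Python. It is not Blacklight’s method and is far cruder than what Blacklight checks: it only parses a page’s static HTML for script tags loaded from a different host, so it will miss cookies, fingerprinting, session logging, and anything injected after the page loads. The class and function names are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen


class ScriptSourceParser(HTMLParser):
    """Collect the src attribute of every <script> tag in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def third_party_script_hosts(page_url: str, html: str) -> set[str]:
    """Return hostnames of scripts served from a host other than the page's own.

    A different subdomain counts as a different host in this simple check.
    """
    page_host = urlparse(page_url).hostname
    parser = ScriptSourceParser()
    parser.feed(html)
    hosts = set()
    for src in parser.sources:
        # Resolve relative and protocol-relative URLs against the page URL.
        host = urlparse(urljoin(page_url, src)).hostname
        if host and host != page_host:
            hosts.add(host)
    return hosts


def fetch(url: str) -> str:
    """Download a page's HTML (requires network access)."""
    request = Request(url, headers={"User-Agent": "privacy-audit-sketch"})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")
```

To work through an audit list, loop over the URLs and call `third_party_script_hosts(url, fetch(url))` for each one. Note that running this does visit each site from your own machine, which is exactly what Blacklight lets you avoid.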

There are some drawbacks if libraries want to use Blacklight as their main tracker detection tool. As mentioned above, Blacklight detects major tracking methods, but the resulting report does not give much information beyond whether Blacklight found something. Let’s take Hoopla for example. We entered the main site URL – www.hoopladigital.com – and Blacklight visited a random page…

A screenshot explaining how Blacklight accessed the Hoopla homepage, including two screenshots of the mobile version of the Hoopla home page and their privacy policy.
The irony of the random page chosen is not lost on us.

This is what Blacklight found:

  • Three ad trackers
  • Facebook tracking
  • Google Analytics cross-site tracking
  • Session logging (as well as possible keystroke logging)

However, the report only tells the user that these trackers are present. There is no information in the report about how to prevent session logging or blocking ad trackers. Instead, the user will need to go elsewhere for that information. The tool creators did create a post for users wondering what to do with the results, but this information is not front and center in the report.

Another drawback is that several library vendor URLs might not be able to be checked due to proxy or access restrictions. Let’s say you want to test https://web-a-ebscohost-com.ezproxy.spl.org/ehost/search/basic?vid=1&sid=e58a91f5-4f12-4648-991f-4bdc9ff8f94b%40sdc-v-sessmgr01 – the link to access an EBSCO database for a local public library. Blacklight will try to visit the website but will be stopped at the EZproxy login page every time. There is a possible way to work around this limitation by taking the source code from the two Blacklight GitHub repositories and reworking the code to allow for authentication during the testing process. However, it might be simpler for some libraries to visit the individual site with tracking detection and blocking browser add-ons, such as Privacy Badger, DuckDuckGo Privacy Essentials, and Ghostery.

Despite these drawbacks, Blacklight is useful in illustrating the prevalence of tracking on major websites. Library workers might use Blacklight alongside other tracking detection tools for privacy audits, provided that the library workers know the next steps in interpreting the results, such as comparing what they found to the privacy policy of the vendor or library to determine if the policy reflects reality. The tool would also be a welcome addition to any digital literacy and privacy programming for patrons to demonstrate how websites can track users, even when a user leaves the website. Blacklight will most likely have updates and new features since the code is freely available, so it might be that some of these drawbacks will be addressed in an update down the road. But enough talking – take Blacklight out for a spin! First destination – your library’s homepage. 😉

Teaching Privacy in Information Literacy Sessions

Welcome to this week’s Tip of the Hat!

Summer is over, and for many library workers, the start of the fall season means an uptick of library instruction sessions and programs. Academic and school library workers who already face the challenge of creating and teaching “one-shot” instructional sessions have the added challenge of moving these sessions to online instruction during a pandemic. With this move to online comes the increased use of learning management systems and other online tools and applications that collect, process, and share student data. This increase in use translates into an increased risk to student privacy, particularly while interacting with the library’s online services and programs, and this risk might not be readily apparent to students who are facing many stressors and challenges in their first few weeks into the new school year.

Navigating “one-shot” library instruction sessions or other short interactions between the library and the student is not easy; however, these instruction sessions and interactions also present the opportunity to raise awareness about data privacy and security. One way to take advantage of this opportunity is to move away from the mindset of approaching data privacy in library instructional sessions as “yet-one-more-thing” to teach in an already packed session. That’s not an easy task for anyone, even for those of us who are privacy advocates.

In their article “Privacy literacy instruction practices in academic libraries: Past, present, and possibilities”, Sarah Hartman-Caverly and Alexandria Chisholm surveyed academic library workers and their experiences incorporating privacy into their instructional sessions. Out of 80 respondents, over one-third reported not including privacy topics in their library instruction sessions. Even those who include privacy topics in their instruction were not satisfied with privacy instruction at their institutions, with the majority being neutral or somewhat dissatisfied. This dissatisfaction stems from a variety of factors, with 80% of 55 respondents (n=44) stating that they do not have enough instructional time to cover privacy. This is the reality of many library instructors overall and requires a radical departure from how libraries traditionally deliver library instruction to students, as well as working with faculty and staff in developing and delivering this instruction.

What caught our attention at LDH is the second factor that almost 62% of survey respondents (n=34) identified as to why they are dissatisfied with privacy instruction – “Privacy is not a priority learning outcome for IL sessions”. What can make privacy a priority, then? Again, this requires a radical departure from how libraries approach information literacy (IL), but it also requires an examination of the priorities of the individual library as well as the professional frameworks library workers use to inform their approach to IL and pedagogy. While ALA’s Library Bill of Rights explicitly states privacy as a patron right, the ACRL Framework for Information Literacy for Higher Education only includes one mention of privacy concerning “issues related to privacy and the commodification of personal information.” Privacy is much more than the commodification of personal information, but the Framework does not reflect this reality. The lack of guidance in the Framework, as well as the dearth of concrete case studies of privacy in IL in the LIS literature noted by Hartman-Caverly and Chisholm, leaves IL instructors little to work with at a time when privacy instruction is more vital than ever.

Hartman-Caverly and Chisholm give their readers some guidance in their privacy literacy case study as well as their recommendations for addressing the barriers noted by survey respondents. The literature review of the article is another resource to glean strategies in bringing privacy into IL practices.

For those who are still struggling in thinking about how to incorporate privacy into an already packed lesson plan, think about this – what library resources and apps are you teaching to your students? Library systems and applications, particularly third-party apps and resources, also collect, process, and share patron data. Talking about digital data privacy and security in the context of using library services and resources can be one way to introduce students to privacy literacy while educating patrons about the library’s privacy practices. This approach to privacy literacy in “one-shot” instructional sessions can be strengthened by offering patron data privacy services such as the services provided by Cornell University; nonetheless, using the library’s own resources and tools when talking about privacy is a start for library instructors who are short on time.

So You Want to Work With Patron Data… De-identification Basics

Welcome to this week’s Tip of the Hat! This week’s post is a “back to basics” about de-identification and patron data. Why? A recent article published in the Code4Lib Journal combined patron data with external data sets without de-identifying it first, so now is as good a time as any to remind library workers about de-identification. [1]

De-identification Definitions

Before we talk about de-identification, we must talk about anonymization and the differences between the two:

  • De-identification is when you remove the connection between the data and any identifiable individual in the real world. Sometimes a de-identified dataset replaces the personally identifiable information (PII) attached to data points with a unique identifier, which is then called pseudonymization.

De-identification provides a way for some to work with data to track individual trends with a reduced risk of re-identification and other privacy risks. Why “for some” and “reduced”? We’ll get into the whys of the issues with de-identification later in this post.
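As a rough illustration of the pseudonymization described above, here is a minimal Python sketch that swaps a patron identifier for a keyed hash so that checkouts by the same patron still group together. The record layout, the barcodes, the `pseudonymize` helper, and the placeholder key are all hypothetical, and this is only one of several ways to generate a replacement identifier. Anyone who obtains the key can rebuild the mapping, so the result is pseudonymized data, not anonymized data.

```python
import hashlib
import hmac


def pseudonymize(patron_id: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for patron_id using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, patron_id.encode("utf-8"), hashlib.sha256)
    # Shorten the hex digest for readability; this is an illustrative choice.
    return digest.hexdigest()[:16]


# Hypothetical checkout records: (patron barcode, item identifier).
checkouts = [
    ("21234000123456", "item-001"),
    ("21234000123456", "item-002"),
    ("21234000987654", "item-003"),
]

# Placeholder only. A real key would be randomly generated and stored
# separately from the data set.
SECRET_KEY = b"replace-with-a-randomly-generated-key"

# The same patron always maps to the same pseudonym, so individual trends
# can still be tracked without the barcode appearing in the data set.
pseudonymized = [(pseudonymize(pid, SECRET_KEY), item) for pid, item in checkouts]
```

The activity data attached to each pseudonym is left untouched here, which is one reason a pseudonymized data set can still carry re-identification risk.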

De-identification Method Basics

PII comes in two forms: data about a person and data about a person’s activities that can be linked back to the person. The methods and level of work needed to sufficiently de-identify patron data depend on the type of PII in the data set. The methods commonly used to de-identify PII include truncation, obfuscation, and aggregation.

  • Obfuscation moves the reference point of the data up a few levels of granularity. An example is using a birth year or age instead of the person’s full birth date.
  • Truncation strips the raw data to a small subsection general enough that it cannot be easily connected to an identifiable person. A real-world example of truncation is HIPAA’s guidance on physical address de-identification, truncating the address to the first three digits of the zip code.
  • Aggregation further groups individual data points creating a more generalized data set. Going back to the obfuscation example, individual ages can be aggregated into age ranges.

There are more methods to de-identify data, some of which can get quite complex, such as differential privacy. The three methods mentioned above, nonetheless, are some of the more accessible de-identification methods available to libraries.
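To make the three methods above more concrete, here is a small Python sketch applied to a made-up patron list. The field names, the sample records, the ten-year range width, and the reference year are all assumptions chosen for illustration, not a recommended standard.

```python
from collections import Counter
from datetime import date


def truncate_zip(zip_code: str) -> str:
    """Truncation: keep only the first three digits of a ZIP code."""
    return zip_code[:3]


def obfuscate_birth_date(birth_date: date) -> int:
    """Obfuscation: keep the birth year instead of the full birth date."""
    return birth_date.year


def age_range(age: int, width: int = 10) -> str:
    """Aggregation: place an individual age into a range such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


# Hypothetical patron records.
patrons = [
    {"zip": "98101", "birth_date": date(1985, 3, 14)},
    {"zip": "98115", "birth_date": date(1989, 11, 2)},
    {"zip": "98052", "birth_date": date(2001, 7, 30)},
]

REFERENCE_YEAR = 2024  # assumed year used to turn a birth year into an approximate age

deidentified = []
for patron in patrons:
    birth_year = obfuscate_birth_date(patron["birth_date"])
    deidentified.append({
        "zip3": truncate_zip(patron["zip"]),
        "age_range": age_range(REFERENCE_YEAR - birth_year),
    })

# Count how many patrons share each (zip3, age_range) combination.
counts = Counter((row["zip3"], row["age_range"]) for row in deidentified)
```

In this sample, two patrons end up in the same ("981", "30-39") group while the third sits alone in ("980", "20-29"). Even after all three methods are applied, a group of one can still point to a single person.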

Before You De-identify…

Remember in the first section that we mentioned that de-identification only works for some data sets and only reduces privacy risk? There are two main reasons why this is:

  1. De-identification does not protect outliers in the data, and it does not work well for data sets that cover small populations. There are equations and properties that can help you determine whether your dataset is safe from re-identification, but for most libraries, de-identification is not possible due to the type or size of the data set they wish to de-identify.
  2. De-identified data can still be re-identified through the use of external data sets, particularly if the data in the de-identified dataset was not properly de-identified. An evergreen example is the AOL data set that retained identifying data in the search queries, even though AOL scrubbed identifying data about the searcher.

It is possible to have a de-identified data set of patron data, but the process is not foolproof. De-identification requires multiple trial runs of the de-identification process, along with analysis of how easily the data could be reconnected to an individual.
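One simple check that can be part of that risk analysis is counting how many records share each combination of indirectly identifying values and flagging the combinations that fall below a threshold. This is the idea behind k-anonymity. Naming k-anonymity here is our assumption, since the post does not say which equations or properties it has in mind. The sketch below uses hypothetical field names and a hypothetical threshold, and passing it does not mean a data set is safe from re-identification through external data sets.

```python
from collections import Counter


def small_groups(rows, quasi_identifiers, k):
    """Return the value combinations shared by fewer than k records."""
    groups = Counter(
        tuple(row[field] for field in quasi_identifiers) for row in rows
    )
    return {combo: size for combo, size in groups.items() if size < k}


# Hypothetical de-identified rows: truncated ZIP code and age range only.
rows = [
    {"zip3": "981", "age_range": "30-39"},
    {"zip3": "981", "age_range": "30-39"},
    {"zip3": "981", "age_range": "30-39"},
    {"zip3": "980", "age_range": "70-79"},  # an outlier: the only record in its group
]

# With k=3, any combination shared by fewer than three records is flagged.
flagged = small_groups(rows, ["zip3", "age_range"], k=3)
```

Here `flagged` contains only the ("980", "70-79") combination, which is the kind of outlier that de-identification does not protect.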

Overall, de-identification is a tool to help protect patron privacy, but it should not be the only privacy tool used in the patron data lifecycle. The most effective privacy tools and methods in the patron data lifecycle are the questions you ask at the beginning of the lifecycle:

  • Why are you collecting this data?
  • Does this reason tie to a demonstrated business need?
  • Are there other ways you can achieve the business need without collecting high-risk patron PII?


[1] The article contains additional privacy and security concerns that we will not cover in this post, including technical, administrative, and ethical concerns.

Summer Homework – Understanding Your State’s Library Privacy Law

Welcome to this week’s Tip of the Hat!

Have you always dreamed of spending countless hours reading legal regulations and reviews? If so, you might be suited for legal life! Reading laws is probably not high on your list of things to do; nonetheless, it’s always good to know how to navigate the text of a legal regulation when you are researching what laws could apply to you or to the third parties that you do business with. Even though we’re not lawyers, knowing how to read legal regulation text enables people to have more productive conversations with legal staff.

Here are three questions that can help you start understanding a law or statute:

  1. Who is covered by this law?
    • Does your state library privacy law cover only publicly funded libraries, or does the scope include other types of libraries, no matter the funding source? Does it include third parties acting on behalf of the library?
  2. What types of information (and what uses of information) are covered?
    • What does the law mean when it says “patron data”? Are there any definitions or descriptions of specific data points covered by the law?
  3. What exactly is required or prohibited?
    • In particular, what exemptions are listed in the law?

You might not be able to answer all the questions depending on what law you choose to study. However, a question you cannot answer might become a topic of discussion with legal staff, particularly around the specifics of who is within the scope of the law. There’s also the question of preemption between different governmental levels of legal regulation (or even within the same level of government). Sometimes a lower government’s law is stricter than a higher government’s law, but if the higher government’s law states that it preempts any laws from lower governments, then you are not bound to follow the lower government’s law in that specific matter.

Now it’s time to take what you learned and put it into practice. Find your state’s library privacy law and read the law while trying to answer the questions above. Let us know if these questions help you through the legal text! Don’t be afraid to let us know if this exercise brings up more questions than it answers – we’ll do our best to address them, or at least help you prepare to ask your legal staff these questions.

[Legal questions source: Swire, Peter, and DeBrae Kennedy-Mayo. (2018). U.S. Private-Sector Privacy: Law and Practice for Information Privacy Professionals, 2nd ed.]