An Audacity Postmortem for the Library World

A black silhouette of a condenser microphone against a white background with a blue audio wave track spanning across the middle of the background.
Image source: https://www.flickr.com/photos/187045112@N03/50135316221/ (CC BY 2.0)

It seemed so long ago – last week at this time, LDH was logging back into the online world only to find yelling. Lots of yelling. Why were so many people yelling in our timeline? What did someone in the library world do this time to set people off?

It turns out that the source of the outrage wasn’t located in the library world but instead in the open source community. Users of the popular audio editor Audacity loudly objected to the recently updated privacy policy, claiming that the new language in the policy violates the existing license of the software and turns Audacity into spyware. Even after clarification about the new language from Audacity, several users took the current Audacity code to start their own Audacity-like software projects that wouldn’t be subject to the new policy language. This created its own issues – one project maintainer was attacked in a targeted harassment campaign after they objected to the offensive name of another project.

The Audacity debacle continues; nevertheless, there are a couple of lessons that libraries can take away from this mess.

Privacy Notices and Your Patrons

We will start with the privacy notice. In the privacy world, a privacy notice differs from a privacy policy. The latter is an internal document, while the former is a document published to the public. In part, a privacy notice informs the public about your privacy policies/practices and what rights the public has regarding their privacy. The language changes to the privacy notice carry several possible points of failure, which we encountered with the Audacity example. A comment thread in the language clarification post identifies some of the significant issues with how Audacity went about the changes:

“I think what a lot of people are also taking issue with… is that these major, scary-sounding changes are popping up seemingly out of nowhere without any sense of community consultation. Right now, I think people feel caught off-guard yet again and are frustrated that the maintainers aren’t demonstrating that they care about what the broader community thinks of their decisions.”

What can libraries take away from this?

  • Write for your audience – Privacy notices are notoriously riddled with legal language that many in the general public are not equipped to navigate or interpret. Your privacy notice can’t skip the vetting process by your legal staff, but you can avoid confusion by using language that is appropriate for your audience. This includes limiting library and legal jargon or creating summaries or explanations for specific points in the notice to help readers understand more detailed or longer sections of the notice. Twitter’s use of summaries and lists in their privacy notice is one example of writing to the audience. In addition, don’t forget to write the notice in the major languages of your audience. Everyone in your community deserves to know what’s going on with their privacy at the library.
  • Involve your audience – The earlier quote from an Audacity community member demonstrates what can happen when key stakeholders are left out of critical decision-making processes. How is your library working with patrons in the creation and review of the privacy notice? Asking patrons to review notices is one way to involve patrons, but involving patrons throughout the entire process of creating and reviewing a privacy notice can reveal hidden or overlooked privacy issues and considerations at the library.
  • Communicate to your audience – What do you do when you publish a change in the privacy notice? Your patrons should not be caught off guard with a significant change to the notice. Luckily, your library already has many of the tools needed to tell your patrons about important updates, from your library’s news blog or newsletter to in-library physical signage and flyers. Website alerts are also an option if used judiciously and designed well – a website popup, while tempting, can be easily clicked away without reading the popup text.

Open Source Software and Privacy Expectations

We’ll go ahead and get this out of the way – open source software is not inherently more private and secure than its proprietary counterparts. OSS can be private and secure, but not all OSS is designed with high privacy and security standards by default. One of the primary reasons why so many in the Audacity community were upset with the changes is their assumption that OSS would not engage in data collection and tracking. However, several other popular OSS projects engage in collecting some level of user data, such as collecting data for crash reporting. Having other major OSS players collect user data doesn’t automatically make this practice okay. Instead, the practice reminds those who make software decisions for their libraries that OSS projects should be subject to the same rigorous data privacy and security review as proprietary products.

A strength of OSS is the increased level of control users have over the data in the software – libraries who have the in-house skills and knowledge can modify OSS to increase the level of privacy and security of patron data in those systems. The library OSS community can provide privacy-preserving options for libraries. Other libraries have already shared their experiences adopting privacy-preserving OSS at the library, such as Matomo Analytics and Tor. Ultimately, libraries who want to protect patron privacy must choose any software that might touch patron data with care and with the same level of scrutiny regardless of software licensing status.

Stop Collecting Data About Your Patrons’ Gender Identity

A four-way stop sign in front of snow-covered tree branches.
Image source: https://www.flickr.com/photos/ben_grey/4383358421/ (CC BY-SA 2.0)

tl;dr – Your library doesn’t need to collect data about your patrons’ gender identity.

Longer tl;dr – Your library doesn’t need to collect data about your patrons’ gender identity for library workers to do their daily work.

Nuanced tl;dr – Your library doesn’t need to collect data about your patrons’ gender identity 99% of the time, and in that 1% where the data is required, you’re probably doing more harm than good in your collection methods.

This post is brought to you by yet another conversation about including gender identity data in patron records. Libraries collected this data on their patrons for decades; it’s not uncommon to have a “gender” field in the patron record of many integrated library systems and patron-facing vendor services and applications. But why collect this data in the first place?

Two explanations that come up are that gender identity data can be used for marketing to patrons and for reading recommendations. However, these explanations do not account for the problem of relying on harmful gender stereotypes. Take the belief that boys are reluctant readers, for example. Joel A. Nichols wrote about his experience as a children’s librarian and how libraries do more harm than help in adopting this belief:

These efforts presume that some boys are not achieving well in school because teachers and librarians (who are mostly women) are offering them books that are not interesting to them (because they are boys). I find this premise illogical and impracticable, in particular because I am queer: the things that were supposed to interest boys did not necessarily interest me, and the things that were supposed to interest girls sometimes did. Additionally, after years of working in children’s departments, I found over and over again that lots of different things interested lots of different kids. In my experience, it was the parents that sometimes asked for “boy books” or “girl books.” The premise that boys need special “boy” topics shortchanges librarians and the children themselves, and can alienate kids who are queer or genderqueer.

This collection of patron data can be used to harm patrons in other ways, such as library staff misgendering and harassing patrons based on the patron’s gender identity. A recent example comes from the 2019 incident where library staff repeatedly misgendered a minor patron when she was with her parent to sign up for her library card. While the library decided to stop collecting gender identity data on library card applications as a result of the incident, the harm done cannot be remedied as easily as changing the application form.

The ALA Rainbow Round Table recommends that libraries do not collect gender identity data from patrons unless absolutely needed. Since the recommendation in 2015, several libraries evaluated their collection of gender identity data only to find that they were not using that data. Collecting data “just in case” opens library patrons to additional harm if the library suffers a data breach. If there is no demonstrated business need for a data point, do not collect that data point.

In the rare case that your library absolutely must collect data about the gender identity of your patrons (such as a requirement to report on aggregated patron demographic data for a grant-funded project), care must be taken in collecting this data to mitigate additional harms through alienation and exclusion. The Rainbow Round Table recommends the Williams Institute’s report “Best Practices for Asking Questions to Identify Transgender and Other Gender Minority Respondents on Population-Based Surveys” as a guide to collecting such data. The Williams Institute has also created a short guide to creating survey questions around gender identity. Here are more resources that can guide respectful demographic data collection:

Again, the resources above are only for the rare case that your library absolutely must collect this data from your patrons. Libraries considering collecting gender identity data must review the rationale behind the collection. A patron should not be required to tell the library their gender identity to use the library’s collections and services. Even the act of collecting this data can harm and disenfranchise patrons.

tl;dr – Your library doesn’t need to collect data about your patrons’ gender identity.

That Little Driver’s License Card…

Welcome to this week’s Tip of the Hat!

A driver’s license card is the first document many people use to prove their identity, be it at work, or the bank, or the airport. The card has key information needed for organizations and institutions: name, date of birth, address, photo, and the illustrious driver’s license number. Driver’s license cards can be a convenient form of identification, but they can also be a convenient way for your patrons’ identities to be stolen if your library is not careful in its handling of the card’s information.

As part of the library card registration process, many public libraries require some form of identification with a current address to confirm the patron’s home address. These libraries almost always accept driver’s license cards as one form of identification. But what do libraries do with the information on the card? Some record the driver’s license number in the patron record, while others take a photocopy or scan of the card (yes, this has happened!). Several libraries use specially programmed barcode scanners to automatically populate the fields in the patron record from the information in the driver’s license barcode.

Each method carries its own level of risk to the library patron’s privacy. Storing driver’s license numbers in the patron record or other places can open the patron up to identity theft if the library’s systems or physical spaces are compromised. There are various ways to compromise a physical or electronic space. We are familiar with the story of a person breaking into the system to steal information, but sometimes it is a staff person who steals the information. We also can’t forget that a leak is as damaging as a breach – sometimes staff leave the patron record up on the screen at public service desks, or a report printout is left on a desk for anyone to see or take.

Overall, the best way to mitigate the risk of a breach or leak of driver’s license numbers is to not collect or store driver’s license numbers. In the collection stage of the patron data lifecycle, we decide what data to collect. The data you collect should be tied to a specific, demonstrated business need at the point of collection. If you are collecting driver’s license numbers as a way to verify patrons and addresses, what are the business needs for collecting and storing that number in the patron record? You can achieve the same business need by other means, including creating a process of validating the patron record information with the identification without recording additional personal information in the record. Another consideration is that while driver’s license cards are a convenient form of identification, the card might have a name that the patron no longer uses and might have other outdated or incorrect information, including address information if the state does not mail a new card when there is an address change. Finally, not all patrons have driver’s license cards, and your patron registration policies and procedures need to accommodate this reality.

Even if you don’t collect or store the driver’s license number, there are still ways in which the library might inadvertently collect more patron information than it needs from the card. Scanning driver’s license barcodes to auto-populate patron registration forms and records can save time in data entry, but be aware that these barcodes carry much more information than what is presented on the card, including gender and even Social Security Numbers. The software that you use to scan the barcodes should only record the information needed for the patron form and not store the additional information in the barcode. Your software vendor should have information about how they treat this extra data; if they do not, then the vendor product is a potential security risk for the library and the patrons, and that risk needs to be addressed with the vendor.
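To make the “only record the information needed” idea concrete, here is a minimal sketch in Python. It assumes the scanner software has already parsed the barcode into a dictionary; the field names (`first_name`, `license_number`, and so on) and the `minimize_scan` helper are hypothetical and will differ from whatever your vendor’s product actually uses. The point is the allowlist: anything not explicitly needed for the patron form is dropped before it can be stored.

```python
# Hypothetical field names: a real scanner or parsing library will use its own keys.
# Only the fields the patron registration form actually needs are allowlisted.
ALLOWED_FIELDS = {
    "first_name",
    "last_name",
    "street_address",
    "city",
    "state",
    "postal_code",
}


def minimize_scan(parsed_barcode: dict[str, str]) -> dict[str, str]:
    """Return only the allowlisted fields; everything else is discarded."""
    return {
        field: value
        for field, value in parsed_barcode.items()
        if field in ALLOWED_FIELDS
    }


# Made-up example data standing in for a parsed barcode.
scan = {
    "first_name": "Alex",
    "last_name": "Reader",
    "street_address": "123 Main St",
    "city": "Springfield",
    "state": "WA",
    "postal_code": "98000",
    "license_number": "EXAMPLE12345",
    "date_of_birth": "1990-01-01",
    "gender": "X",
}

patron_form = minimize_scan(scan)
```

An allowlist (naming what to keep) is safer than a blocklist (naming what to drop) here, because any extra field the barcode carries that you did not anticipate is excluded by default.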

No matter how your library handles driver’s license cards, your library should be actively reviewing privacy practices on a regular basis. In 2019, the Contra Costa County Library System decided to stop collecting driver’s license numbers and purged existing numbers from their patron records. This decision came just at the right moment – the library system suffered a ransomware attack at the beginning of 2020. While recent reports state that no personal data was compromised, the risk of identity theft to library patrons would have been much greater if the driver’s license numbers were still stored at the library. In short, it’s never too late to review policies and procedures around patron address verification at your library!

Shining a Light on Dark Data

[Author’s note – this post uses the term “Dark Data” which is an outdated term. Learn more about the problem with the term’s use of “dark” at Intuit’s content design manual.]

Welcome to this week’s Tip of the Hat!

Also, welcome to the first week of Daylight Saving Time in most of the US! To celebrate the extra hour of daylight in the morning (we at LDH are early birds), we will shed light on a potential privacy risk at your organization – dark data.

The phrase “dark data” might conjure up images of undiscovered data lurking in the dark back corner of a system. It could also bring to mind a similar image of the deep web where the vast amount of data your organization has is hidden to the majority of your staff, with only a handful of staff having the skills and knowledge to find this data.

The actual meaning of the phrase is much less dramatic. Dark data refers to collected data that is not used for analysis or other organizational purposes. This data can appear in many places in an organization: log files, survey results, application databases, email, and so on. The business world views dark data as an untapped organizational asset that will eventually serve a purpose, but for now, it just takes up space in the organization.

While the reality of dark data is less exciting than the deep web, the potential privacy issues of dark data should be taken seriously. The harm isn’t that the organization doesn’t know what it’s collecting – dark data is not unknown data. One factor that leads to dark data in an organization is the “just in case” rationale used to justify data collection. For example, a project team might set up a new web application to collect patron demographic information such as birth date, gender, and race/ethnicity not because they need the data right now, but because that data might be needed for a potential report or analysis in the future. Not having the data when the need arises means that you could miss out on important insights and measures that could sway decision-makers and the future of operations. It is that fear of not having that data, or data FOMO, that drives this collection of dark data.

When you have dark data that is also patron or other sensitive data, you put your organization and patrons at risk. Data sitting in servers, applications, files, and other places in your organization is at risk of being leaked, breached, or otherwise accessed without authorization. This data is also subject to disclosure by judicial subpoenas or warrants. If you choose to collect dark data, you choose to collect a toxic asset that will only become more toxic over time, as the risk of a breach, leak, or disclosure increases. It’s a matter of when, not if, the dark data is compromised.

Dark data is a reality at many organizations in part because it’s very easy to collect without much thought. The strategies for minimizing the harms that come with dark data require some forethought and planning; however, once operationalized, these strategies can be effective in reducing the dark data footprint in your organization:

  • Tying data collection to demonstrated business needs – When you are deciding what data to collect, be it through a survey, a web application, or even your system logs, what data can be tied back to a demonstrated business need? Orienting your data collection decisions to what is needed now for operational purposes and analysis shifts the mindset away from “just in case” collection to what data is absolutely needed.
  • Data inventories – Sometimes dark data is collected and stored and falls off the radar of your organization. Conducting regular data inventories of your organization will help identify any potential dark data sets for review and action.
  • Retention and deletion policies – Even if dark data continues to persist after the above strategies, you have one more strategy to mitigate privacy risks. Retention policies and proper deletion and disposal of electronic and physical items can limit the amount of dark data sitting in your organization.
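For libraries with some in-house scripting capacity, the three strategies above can be combined into a simple recurring check. The sketch below is only an illustration under assumptions of our own: the `DataSet` structure, the one-year retention period, and the example inventory entries are all hypothetical, and a real inventory would live in whatever spreadsheet or system your organization already uses.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class DataSet:
    """One row in a (hypothetical) data inventory."""
    name: str
    business_need: Optional[str]  # None means no demonstrated need on record
    last_used: date               # last time the data was used for analysis or operations


def flag_for_review(
    inventory: list[DataSet],
    today: date,
    retention: timedelta = timedelta(days=365),  # assumed retention period
) -> list[tuple[str, str]]:
    """Return (data set name, reason) pairs that need review or deletion."""
    flagged = []
    for data_set in inventory:
        if not data_set.business_need:
            flagged.append((data_set.name, "no demonstrated business need"))
        elif today - data_set.last_used > retention:
            flagged.append((data_set.name, "past retention period"))
    return flagged


# Made-up inventory entries for illustration.
inventory = [
    DataSet("circulation_stats", "annual state report", date(2024, 1, 10)),
    DataSet("event_survey_2019", None, date(2019, 6, 1)),
    DataSet("old_web_logs", "troubleshooting", date(2021, 3, 1)),
]

flagged = flag_for_review(inventory, today=date(2024, 3, 1))
```

In this example, `event_survey_2019` is flagged because no business need is on record, and `old_web_logs` is flagged because it has gone unused for longer than the assumed retention period. The script only produces a review list; the decision to delete still belongs to the people responsible for the data.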

The best strategies to minimize dark data in your organization happen *before* you collect the data. Asking yourself why you need to collect this data in the first place and looking at the system or web application to see what data is collected by default will allow you to identify potential dark data and prevent its collection.