That Little Driver’s License Card…

Welcome to this week’s Tip of the Hat!

A driver’s license card is the first document many people use to prove their identity, be it at work, or the bank, or the airport. The card has key information needed for organizations and institutions: name, date of birth, address, photo, and the illustrious driver’s license number. Driver’s license cards can be a convenient form of identification, but it can also be a convenient way for your patrons’ identities to be stolen if your library is not careful in its handling of the card’s information.

As part of the library card registration process, many public libraries require some form of identification with a current address to confirm the patron’s home address. These libraries almost always accept driver’s license cards as one form of identification. But what do libraries do with the information on the card? Some record the driver’s license number in the patron record, while others take a photocopy scan of the card (yes, this has happened!). Several libraries use specially programmed barcode scanners to automatically populate the fields in the patron record from the information provided from the driver’s license barcode.

Each method carries its level of risk to the library patron’s privacy. Storing driver’s license numbers in the patron record or other places can open the patron up to identify theft if the library’s systems or physical spaces are compromised. There are various ways to compromise a physical or electronic space. We are familiar with the story of a person breaking into the system to steal information, but sometimes it is a staff person who steals the information. We also can’t forget that a leak is as damaging as a breach – sometimes staff leave the patron record up on the screen at public service desks, or a report printout is left on a desk for anyone to see or take.

Overall, the best way to mitigate the risk of a breach or leak of driver’s license numbers is to not collect or store driver’s license numbers. In the collection stage of the patron data lifecycle, we decide what data to collect. The data you collect should be tied to a specific, demonstrated business need at the point of collection. If you are collecting driver’s license numbers as a way to verify patrons and addresses, what are the business needs for collecting and storing that number in the patron record? You can achieve the same business need by other means, including creating a process of validating the patron record information with the identification without recording additional personal information in the record. Another consideration is that while driver’s license cards are a convenient form of identification, the card might have a name that the patron no longer uses and might have other outdated or incorrect information, including address information if the state does not mail a new card when there is an address change. Finally, not all patrons have driver’s license cards, and your patron registration policies and procedures need to accommodate this reality.

Even if you don’t collect or store the driver’s license number, there are still ways in which the library might inadvertently collect more patron information than they need from the card. Scanning driver’s license barcodes to auto-populate patron registration forms and records can save time in data entry, but be aware that these barcodes carry much more information than what is presented on the card, including gender and even Social Security Numbers. The software that you use to scan the barcodes should only record the information needed for the patron form and not store the additional information in the barcode. Your software vendor should have information about how they treat this extra data; if they do not, then the vendor product is a potential security risk for the library and the patrons which needs to be addressed with the vendor.

No matter how your library handles driver’s license cards, your library should be actively reviewing privacy practices on a regular basis. In 2019, the Contra Costa County Library System decided to stop collecting driver’s license numbers and purged existing numbers from their patron records. This decision came just at the right moment – the library system suffered a ransomware attack at the beginning of 2020. While recent reports state that no personal data was compromised, the risk of identity theft to library patrons would have been much greater if the driver’s license numbers were still stored at the library. In short, it’s never too late to review policies and procedures around patron address verification at your library!

Shining a Light on Dark Data

Welcome to this week’s Tip of the Hat!

Also, welcome to the first week of Daylight Savings Time in most of the US! To celebrate the extra hour of daylight in the morning (we at LDH are early birds), we will shed light on a potential privacy risk at your organization – dark data.

The phrase “dark data” might conjure up images of undiscovered data lurking in the dark back corner of a system. It could also bring to mind a similar image of the deep web where the vast amount of data your organization has is hidden to the majority of your staff, with only a handful of staff having the skills and knowledge to find this data.

The actual meaning of the phrase is much less dramatic. Dark data refers to collected data that is not used for analysis or other organizational purposes. This data can appear in many places in an organization: log files, survey results, application databases, email, and so on. The business world views dark data as an untapped organizational asset that will eventually serve a purpose, but for now, it just takes up space in the organization.

While the reality of dark data is less exciting than the deep web, the potential privacy issues of dark data should be taken seriously. The harm isn’t that the organization doesn’t know what it’s collecting – dark data is not unknown data. One factor that leads to dark data in an organization is the “just in case” rationale used to justify data collection. For example, a project team might set up a new web application to collect patron demographic information such as birth date, gender, and race/ethnicity not because they need the data right now, but because that data might be needed for a potential report or analysis in the future. Not having the data when the need arises means that you could be out on important insights and measures that could sway decision-makers and the future of operations. It is that fear of not having that data, or data FOMO, that drives this collection of dark data.

When you have dark data that is also patron or other sensitive data, you put your organization and patrons at risk. Data sitting in servers, applications, files, and other places in your organization are subject to being leaked, breached, or otherwise subject to unauthorized access by others. This data is also subject to disclosure by judicial subpoenas or warrants. If you choose to collect dark data, you choose to collect a toxic asset that will only become more toxic over time, as the risk of a breach, leak, or disclosure increases. It’s a matter of when, not if, the dark data is compromised.

Dark data is a reality at many organizations in part because it’s very easy to collect without much thought. The strategies in minimizing the harms that come with dark data require some forethought and planning; however, once operationalized, these strategies can be effective in reducing the dark data footprint in your organization:

  • Tying data collection to demonstrated business needs – When you are deciding what data to collect, be it through a survey, a web application, or even your system logs, what data can be tied back to a demonstrated business need? Orienting your data collection decisions to what is needed now for operational purposes and analysis shifts the mindset away from “just in case” collection to what data is absolutely needed.
  • Data inventories – Sometimes dark data is collected and stored and falls off the radar of your organization. Conducting regular data inventories of your organization will help identify any potential dark data sets for review and action.
  • Retention and deletion policies – Even if dark data continues to persist after the above strategies, you have one more strategy to mitigate privacy risks. Retention policies and proper deletion and disposal of electronic and physical items can limit the amount of dark data sitting in your organization.

The best strategies to minimize dark data in your organization happens *before* you collect the data. Asking yourself why you need to collect this data in the first place and looking at the system or web application to see what data is collected by default will allow you to identify potential dark data and prevent its collection.