Who Knows, Who Decides, and Who Decides Who Decides

Welcome to this week’s Tip of the Hat!

Shoshana Zuboff’s book The Age of Surveillance Capitalism provides a comprehensive overview of the commodification of personal information in the digital age. Surveillance capitalism is a specific form of capitalism that focuses on using personal data to predict and control user behavior. Zuboff’s analysis of surveillance capitalism centers on three questions:

  • Who knows?
  • Who decides?
  • Who decides who decides?

In the book, Zuboff provides some context to the questions:

The first question is “Who knows?” This is a question about the distribution of knowledge and whether one is included or excluded from the opportunity to learn. The second question is “Who decides?” This is a question about authority: which people, institutions, or processes determine who is included in learning, what they are able to learn, and how they are able to act on their knowledge. What is the legitimate basis of that authority? The third question is “Who decides who decides?” This is a question about power. What is the source of power that undergirds the authority to share or withhold knowledge?

Zuboff offers answers to these three questions in her book: “As things currently stand, it is the surveillance capitalist corporations that know. It is the market form that decides. It is the competitive struggle among surveillance capitalists that decides who decides.” While the current prognosis is grim according to Zuboff’s analysis, the three questions are a powerful tool with which one can uncover the underlying power structures of a particular organization or culture.

An interesting thought exercise involves applying these three questions to the library. At a lower level, the data lifecycle provides some answers to “Who knows?” concerning access to patron data as well as the publication and disclosure of data in reports, data sets, and so on to third parties. The “Who decides?” question goes beyond the data lifecycle and ventures into the realm of data governance, where the library determines who sets its data practices. However, the answer goes beyond data governance. Library use of third-party tools and services to collect or process patron data brings these third parties into the realm of “Who knows?” as well as “Who decides?” A third party can adjust its tools or products according to what best serves its bottom line, as well as what it can market to libraries. Third parties decide what products to put on the market, and libraries decide which products meet their needs. Both parties share authority, which brings this thought experiment closer to Zuboff’s analysis of the market as the decider.

That brings us to the third question, “Who decides who decides?” Again, our thought experiment starts to blend in with Zuboff’s answer to the same question. There is indeed a struggle between vendors competing in a niche market with limited funds. We would be remiss, though, if we left our analysis pointing only to competition between third parties in the market. Part of what drives the marketplace and the tools and services offered within it is libraries themselves. Libraries are pressured to provide data for assessment and outcomes to those who directly influence budgets and resources. Libraries also see themselves as direct competitors to Google, Amazon, and other commercial companies that openly engage in surveillance capitalism. Instead of rejecting the methods used by these companies, libraries have to some extent adopted the practices of these perceived market competitors to keep patrons using library services. A library on this path could find itself upholding surveillance capitalism’s grasp on patrons’ lives.

Fitting this thought experiment into one newsletter does not give the questions the full attention they deserve, but it gives us a place to start thinking about how the library shares some of the same traits and qualities found in surveillance capitalism. Data from patron activities can provide valuable insight into patron behaviors, enabling personalized library services through which yet more data can be collected and analyzed for marketing purposes. It’s no surprise that data analytics and customer relationship management systems have taken off in the library market in recent years – libraries believe these tools grant a power that wouldn’t be accessible through other means. Nonetheless, that belief is shaped by surveillance capitalists.

Decide for yourself – give Zuboff’s book a read (or a listen, for the audiobook) and use the three questions as a starting point when you investigate your library’s role in the data economy.

Shining a Light on Dark Data

[Author’s note – this post uses the term “dark data,” which is an outdated term. Learn more about the problem with the term’s use of “dark” at Intuit’s content design manual.]

Welcome to this week’s Tip of the Hat!

Also, welcome to the first week after the end of Daylight Saving Time in most of the US! To celebrate the extra hour of daylight in the morning (we at LDH are early birds), we will shed light on a potential privacy risk at your organization – dark data.

The phrase “dark data” might conjure up images of undiscovered data lurking in a dark back corner of a system. It could also bring to mind a similar image of the deep web, where the vast amount of data your organization holds is hidden from the majority of your staff, with only a handful of staff having the skills and knowledge to find it.

The actual meaning of the phrase is much less dramatic. Dark data refers to collected data that is not used for analysis or other organizational purposes. This data can appear in many places in an organization: log files, survey results, application databases, email, and so on. The business world views dark data as an untapped organizational asset that will eventually serve a purpose, but for now, it just takes up space in the organization.

While the reality of dark data is less exciting than the deep web, the potential privacy issues of dark data should be taken seriously. The harm isn’t that the organization doesn’t know what it’s collecting – dark data is not unknown data. One factor that leads to dark data in an organization is the “just in case” rationale used to justify data collection. For example, a project team might set up a new web application to collect patron demographic information such as birth date, gender, and race/ethnicity not because they need the data right now, but because that data might be needed for a potential report or analysis in the future. Not having the data when the need arises means that you could be missing out on important insights and measures that could sway decision-makers and the future of operations. It is that fear of not having the data, or data FOMO, that drives the collection of dark data.

When you have dark data that is also patron or other sensitive data, you put your organization and patrons at risk. Data sitting in servers, applications, files, and other places in your organization is subject to leaks, breaches, and other unauthorized access. This data is also subject to disclosure through judicial subpoenas or warrants. If you choose to collect dark data, you choose to collect a toxic asset that only becomes more toxic over time as the risk of a breach, leak, or disclosure increases. It’s a matter of when, not if, the dark data is compromised.

Dark data is a reality at many organizations in part because it’s very easy to collect without much thought. The strategies for minimizing the harms that come with dark data require some forethought and planning; however, once operationalized, they can be effective in reducing the dark data footprint in your organization:

  • Tying data collection to demonstrated business needs – When you are deciding what data to collect, be it through a survey, a web application, or even your system logs, what data can be tied back to a demonstrated business need? Orienting your data collection decisions to what is needed now for operational purposes and analysis shifts the mindset away from “just in case” collection to what data is absolutely needed.
  • Data inventories – Sometimes dark data is collected, stored, and then falls off your organization’s radar. Conducting regular data inventories of your organization will help identify any potential dark data sets for review and action.
  • Retention and deletion policies – Even if dark data persists after the above strategies, you have one more way to mitigate privacy risks. Retention policies and proper deletion and disposal of electronic and physical items can limit the amount of dark data sitting in your organization (a minimal sketch of such a cleanup job follows this list).
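
To make the retention and deletion strategy concrete, here is a minimal sketch of a scheduled cleanup job. It assumes a hypothetical SQLite database with a patron_activity table and a recorded_at timestamp stored as ISO 8601 text – your own systems will have their own schemas, and your retention policy sets the real window.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 90  # assumption: your retention policy sets the real window

def purge_expired_rows(db_path: str) -> int:
    """Delete patron activity rows older than the retention window."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=RETENTION_DAYS)
    conn = sqlite3.connect(db_path)
    try:
        # 'patron_activity' and 'recorded_at' are hypothetical names; swap in
        # the tables and columns identified by your data inventory.
        cur = conn.execute(
            "DELETE FROM patron_activity WHERE recorded_at < ?",
            (cutoff.isoformat(),),
        )
        conn.commit()
        return cur.rowcount  # rows actually purged
    finally:
        conn.close()

if __name__ == "__main__":
    print(f"Purged {purge_expired_rows('library.db')} expired rows")
```

Run a job like this on a schedule (cron, Task Scheduler) so deletion is routine rather than a one-off cleanup, and make sure backups follow the same retention rules.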

The best strategies to minimize dark data in your organization happen *before* you collect the data. Asking yourself why you need to collect the data in the first place, and reviewing the system or web application to see what data it collects by default, will allow you to identify potential dark data and prevent its collection.
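
One way to operationalize that pre-collection review is to compare what a form or application collects against an allowlist of fields with a demonstrated need. The sketch below uses hypothetical field names; the allowlist would come from your own review.

```python
# Fields tied to a demonstrated business need (illustrative names only;
# your own review produces the real list).
APPROVED_FIELDS = {"card_number", "email", "pickup_branch"}

def audit_collected_fields(collected: set) -> set:
    """Return fields collected without a documented need: dark data candidates."""
    return collected - APPROVED_FIELDS

# Example: a signup form that also grabs demographics "just in case".
form_fields = {"card_number", "email", "pickup_branch",
               "birth_date", "gender", "race_ethnicity"}
print(sorted(audit_collected_fields(form_fields)))
# -> ['birth_date', 'gender', 'race_ethnicity']
```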

You Say Security, I Say Privacy…

Welcome to this week’s Tip of the Hat!

You might have seen the words “security” and “privacy” used interchangeably in articles, blog posts, and other discussions about protecting sensitive data. Sometimes that interchange further complicates already complex matters. A recent article by Steve Touw explores the confusion surrounding encryption and redaction methods in the CCPA. Touw breaks down encryption and redaction into their basic components, showing that the two methods ultimately live in different worlds: encryption in the security world, and redaction in the realm of privacy.
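
To see why the two methods live in different worlds, consider this minimal sketch (it assumes the third-party cryptography package is installed, and the patron record is made up): encryption protects data while leaving it fully recoverable to anyone holding the key, whereas redaction destroys the value outright.

```python
from cryptography.fernet import Fernet  # third-party: pip install cryptography

record = "Patron: Jane Doe, card 21900012345678"  # made-up example record

# Encryption (a security control): the data is unreadable in storage or
# transit, but the original is fully recoverable with the key.
key = Fernet.generate_key()
token = Fernet(key).encrypt(record.encode())
assert Fernet(key).decrypt(token).decode() == record

# Redaction (a privacy control): the personal data is removed for good,
# no matter who later obtains a copy of the record.
redacted = "Patron: [REDACTED], card [REDACTED]"
```

An encrypted record is still personal data with a lock on it; a redacted record no longer contains the personal data at all, which is part of why the two methods matter differently under laws like the CCPA.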

But aren’t privacy and security essentially the same thing – a means of protecting an asset (in our case, data)? While both arguably share the goal of protecting a particular asset, privacy and security differ in how they approach risk assessment and evaluation. In the scope of information management:

Security pertains to actions that protect organizational assets, including both personal and non-personal data.

Privacy pertains to the handling, controlling, sharing, and disposal of personal data.

Security and privacy do share key concepts and concerns, including appropriate use, confidentiality, and access to organizational assets (including personal data). Nonetheless, implementing security practices doesn’t necessarily guarantee privacy; a quote that makes the rounds in privacy professional groups is “You can have security without privacy, but you cannot have privacy without security.”

The quote above plays out when you log into a system or application. Let’s use staff access to the integrated library system (ILS) for this example. A login allows you to control which staff can access the ILS. Assigning individual logins to staff members and ensuring that only those logins can access the staff functions in the ILS is a security measure. This measure protects patron data from inappropriate access by other patrons or by anyone else looking for that data. On the point of using security to protect privacy, so far, so good.

Once we get past the login, though, we come to a potential privacy issue. You have staff logins, which prevent unauthorized access to patron data by the public, but what about unauthorized access to patron data by your own staff? Not every staff member needs access to patron data to perform their daily duties. By giving staff logins free rein over what they can access in the ILS database, you risk violating patron privacy even though you have security measures in place to limit system access to staff members. To mitigate this risk, another security measure can be used – assigning who can access what through role- or group-level access controls. Most ILSes have a basic level of role-based access control, where systems administrators can assign the lowest level of access needed for each role; applying these roles consistently will limit unauthorized access to data by staff.
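
As a rough illustration of that least-privilege idea, here is a minimal sketch of role-based access control; the roles and permissions are hypothetical stand-ins for whatever your ILS actually offers.

```python
# Hypothetical roles mapped to the minimum permissions each one needs.
ROLE_PERMISSIONS = {
    "circulation": {"checkout", "checkin", "view_patron_record"},
    "cataloging":  {"edit_bib_record"},
    "shelver":     {"checkin"},
}

def can(role: str, permission: str) -> bool:
    """Check whether a role carries a given permission."""
    return permission in ROLE_PERMISSIONS.get(role, set())

assert can("circulation", "view_patron_record")
assert not can("cataloging", "view_patron_record")  # no patron data access
assert not can("shelver", "view_patron_record")
```

The key point is that access to patron data attaches to roles, not to individual logins: staff whose duties don’t require patron records never see them.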

All the security measures in the world, nonetheless, will not mitigate the risk of privacy harm to your patrons if your ILS is collecting highly sensitive data in the first place! These security measures don’t prevent you from collecting this type of data. This is where privacy policies and determining what data needs to be collected to meet operational needs come into play. If you don’t collect the data, the data cannot be breached or leaked.

It’s clear from this example that both privacy and security have parts to play in protecting patron privacy. Understanding those parts – where they overlap and where they diverge – will help you build and maintain a robust set of data privacy and security practices throughout your organization.