Last Minute Panic: A CCPA Update

Welcome to this week’s Tip of the Hat!

We hate to break it to you, but there are only a few weeks left in 2019. Do you know what that means? That’s right – only a few more weeks before the California Consumer Privacy Act comes into effect. A lot has happened since our first newsletter about the CCPA in March, so let’s take some time to catch everyone up on the need-to-knows about CCPA as we head into 2020.

Everything and nothing have changed

In the past few months, lawmakers introduced almost 20 amendments in the State Legislature, ranging from grammatical edits to substantial changes to the CCPA. In the end, the state governor signed only a handful of amendments, none of which substantially changes the core of the CCPA. The amendments do add a few exceptions to the CCPA, such as employee data, but that is the extent of the changes to the Act going into 2020.

However, this doesn’t mean the stalled or dead amendments won’t come back in the next legislative session. Expect additional amendments in the coming year, including new ones that might affect the regulation and scope of the Act.

What you need to know about regulation and enforcement

In October 2019, the California Attorney General’s office published a draft set of regulations describing how the office will enforce the CCPA. While the public comment period is open until December 6th, many businesses are already treating the draft regulations as their playbook for CCPA compliance.

“Household” dilemma

The problematic definition of “personal information” remains… problematic. The amendment that sought to remove “household” from the definition stalled in the State Legislature. The draft regulations address the handling of household information to a small extent: if someone requests access to personal information, including household information, the business has the option to respond with aggregated data if it cannot verify the identity of the requester.

Again, this broad definition has ramifications for patrons requesting information from library vendors. Libraries should work with their vendors to review confidentiality and privacy policies and procedures and to discuss the possible impact this definition will have on patron privacy.

Hello, COPPA!

One of the major elements of the CCPA is the set of regulations surrounding the collection and processing of personal information from anyone under 16 years of age. The CCPA requires businesses to get affirmative authorization from anyone between 13 and 16 years old before the business can sell their personal information. To comply with the new requirement, many businesses might now have to collect or otherwise verify the age of the online user. This leads into the realm of the Children’s Online Privacy Protection Act (COPPA) – now that a business has actual knowledge of an online user’s age, more businesses could be subject to liability under COPPA.
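
As a rough illustration of the age buckets involved, here is a minimal sketch (not legal advice); the type and function names are made up for this example, and the under-13 branch reflects the CCPA’s separate requirement that a parent or guardian authorize the sale:

    // Illustrative sketch of the CCPA's age-based rules for selling
    // personal information (not legal advice; all names here are made up).
    type SaleRule =
      | "parental-authorization-required" // under 13: parent or guardian opts in
      | "opt-in-required"                 // at least 13 but under 16: the consumer opts in
      | "opt-out-default";                // 16 and over: the consumer may opt out of the sale

    function saleRuleForAge(age: number): SaleRule {
      if (age < 13) return "parental-authorization-required";
      if (age < 16) return "opt-in-required";
      return "opt-out-default";
    }

    console.log(saleRuleForAge(14)); // "opt-in-required"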

This could lead to another tricky conversation for libraries – library vendors who fall under CCPA collecting additional patron data for compliance. Collecting and processing patron data is sometimes unavoidable due to operational needs, but it’s still worthwhile to ensure that the data is properly secured, processed, and deleted.

Do Not Track, for real this time

Do your browsers on your library public computers have “Do Not Track” turned on by default, or have other browser plugins that prevent tracking by third parties? If not, here’s another reason to do so – the regulations state that “If a business collects personal information from consumers online, the business shall treat user-enabled privacy controls, such as a browser plugin or privacy setting or other mechanism, that communicate or signal the consumer’s choice to opt-out of the sale of their personal information as a valid request…” So get installing those privacy plugins already!
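
On the website side of that same signal, a minimal sketch of honoring a “Do Not Track” request before loading an analytics script might look like the following; loadAnalytics is a hypothetical placeholder, and doNotTrack is a non-standard browser property, so treat this as illustrative rather than definitive:

    // Illustrative sketch: check the browser's "Do Not Track" signal before
    // loading any third-party analytics. `loadAnalytics` is a hypothetical
    // placeholder; `doNotTrack` is a non-standard navigator property, hence
    // the cast to keep the TypeScript compiler happy.
    function userSignalsOptOut(): boolean {
      const nav = navigator as any;
      const dnt = nav.doNotTrack ?? (window as any).doNotTrack ?? nav.msDoNotTrack;
      return dnt === "1" || dnt === "yes";
    }

    function loadAnalytics(): void {
      // Hypothetical: inject the analytics <script> tag here.
    }

    if (!userSignalsOptOut()) {
      loadAnalytics();
    }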

Do we have to comply with CCPA?

It depends on who the “we” is in this question. As of now, most California libraries are most likely out of the scope of the CCPA (though, as Joshua Metayer pointed out, the CCPA gives no guidance as to what is considered a “for profit” business). Library vendors will most likely have to comply if they do business in California. Some businesses are trying to limit CCPA compliance strictly to California residents by setting up a separate site for California, while other businesses, such as Microsoft, plan to give all US residents the same rights California residents have under the CCPA.

We’ve only covered a portion of what’s going on with the CCPA – there’s still a lively debate about what the definition of “sale” of personal information entails, which is a newsletter in itself! We could also devote an entire newsletter to CCPA 2.0, which is slated to be on the November 2020 ballot. California continues to be a forerunner in US privacy law, and the next year will prove to be an important one not only for everyone under the scope of the CCPA but also for other states looking to implement their own CCPA-like laws.

Beyond Web Cookies: The Ways Google Tracks Your Users

Welcome to this week’s Tip of the Hat!

Earlier we discussed the basics of web cookies, including the cookies used in tracking applications such as Google Analytics. However, there are many ways Google can track your online behavior even when you block Google Analytics cookies and avoid using Google Chrome. Because Google provides applications and infrastructure for many web developers to use on their sites, it’s extremely hard to avoid Google when you are browsing the Web.

An example of this is Google Fonts. The LDH website uses a font provided by the service. To use the font, the following code is inserted into the web page’s HTML:

<link href="https://fonts.googleapis.com/css?family=Open+Sans&display=swap" rel="stylesheet">

For those who are not familiar with HTML, the above line instructs the web page to pull in the font style from the external fonts.googleapis.com site. The FAQ question about user privacy describes the data exchanged between our site and the Google Fonts API service. The exact data mentioned in the FAQ is limited to the number of requests for the specific font family and the font file itself. On the surface, the answer seems reasonable, though there is always the possibility that details were omitted.

This isn’t to say that other Google services provide the same type of assurance, though. In his research study about how Google tracks users, Vanderbilt University Professor Douglas C. Schmidt identifies many other Google services that collect data that can be tied back to individuals. Schmidt’s study leans heavily toward tracking through mobile devices, but it does cover how users can be tracked even through the exclusive use of non-Google products, thanks to the pervasiveness of third-party tracking and services that feed data back to Google.

In an earlier newsletter, we covered some ways that you, as a web user, can avoid being tracked by Google, including browser add-ons that block cookies and other trackers. Some of the same add-ons and browsers also block other ways that Google tracks web users. Still, there is the same question that we brought up before – what can web developers and website owners do to protect the privacy of their users?

First, take an audit of the Google products and API services you’re currently using in your websites and applications. The audit is easy when you’re using widgets or integrating Google products such as Calendar and Docs into your site or application. However, several Google services can fly under the radar if you don’t know where to look. You can make quick work of finding these services by using a browser plugin such as NoScript or Privacy Badger to spot any of the domain URLs listed under the Cookies section of Google’s Privacy and Terms site. Any of the domains listed there has the potential to collect user data.
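
One rough way to start that audit is to scan a page’s HTML for references to known Google domains. The sketch below assumes a runtime with a global fetch (recent versions of Node, or a browser console), and its domain list is illustrative rather than exhaustive; check it against the domains on Google’s Privacy and Terms site:

    // Rough audit sketch: fetch a page and report which known Google domains
    // its HTML references. The domain list is illustrative, not exhaustive.
    const googleDomains = [
      "googleapis.com",
      "gstatic.com",
      "google-analytics.com",
      "googletagmanager.com",
      "doubleclick.net",
    ];

    async function auditPage(url: string): Promise<void> {
      const html = await (await fetch(url)).text();
      for (const domain of googleDomains) {
        if (html.includes(domain)) {
          console.log(`${url} references ${domain}`);
        }
      }
    }

    // Replace with a page from your own site or catalog.
    auditPage("https://www.example.org/").catch(console.error);

This only catches domains that appear in the delivered HTML; services injected later by other scripts are one reason the browser plugins mentioned above are still worth running during an audit.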

Next, determine how each product or service collects and processes user data. If you are integrating Google products into your application or website, examine the privacy and security policies on the Google Product Privacy Guide. APIs are another matter. Some services are good at documenting what they do with user data – for example, Google Fonts has documentation stating that it does not collect personal data. For other API services, Google doesn’t explicitly state what it collects or processes. Your best bet is to start at the Google APIs Terms of Service page if you cannot find a separate policy or terms of service page for a specific API service. There are two sections, in particular, to pay attention to:

  • In Section 3: Your API Clients, Google states that it may monitor API use to ensure quality, improve its services, and verify that you are in compliance with the terms of use.
  • In Section 5: Content, use of the API grants Google the “perpetual, irrevocable, worldwide, sublicensable, royalty-free, and non-exclusive license to Use content submitted, posted, or displayed to or from the APIs”. While not exclusively a privacy concern, it is worth knowing if you are passing personal information through the API.

All of that sounds like using any Google service means that user tracking is going to happen no matter what you do. For the most part, that is a real possibility. You can find alternatives to Google products such as Calendar and Maps, but what about APIs and other services? Some of the JavaScript libraries served by Google can instead be hosted on your own server. Take a look at the Hosted Libraries page. Is your site or application using any libraries on the list? You can install those libraries on your server from the various home sites listed on the page. Your site or application might be a smidge slower, but that slight slowness is worth it when protecting user privacy.

Thank you to subscriber Bobbi Fox for the topic suggestion!

Shining a Light on Dark Data

[Author’s note – this post uses the term “Dark Data”, which is an outdated term. Learn more about the problem with the term’s use of “dark” at Intuit’s content design manual.]

Welcome to this week’s Tip of the Hat!

Also, welcome to the first week of Daylight Savings Time in most of the US! To celebrate the extra hour of daylight in the morning (we at LDH are early birds), we will shed light on a potential privacy risk at your organization – dark data.

The phrase “dark data” might conjure up images of undiscovered data lurking in a dark back corner of a system. It could also bring to mind a similar image of the deep web, where the vast amount of data your organization holds is hidden from the majority of your staff, with only a handful of people having the skills and knowledge to find it.

The actual meaning of the phrase is much less dramatic. Dark data refers to collected data that is not used for analysis or other organizational purposes. This data can appear in many places in an organization: log files, survey results, application databases, email, and so on. The business world views dark data as an untapped organizational asset that will eventually serve a purpose, but for now, it just takes up space in the organization.

While the reality of dark data is less exciting than the deep web, the potential privacy issues of dark data should be taken seriously. The harm isn’t that the organization doesn’t know what it’s collecting – dark data is not unknown data. One factor that leads to dark data in an organization is the “just in case” rationale used to justify data collection. For example, a project team might set up a new web application to collect patron demographic information such as birth date, gender, and race/ethnicity not because they need the data right now, but because that data might be needed for a potential report or analysis in the future. Not having the data when the need arises means you could miss out on important insights and measures that could sway decision-makers and the future of operations. It is that fear of not having the data, or data FOMO, that drives the collection of dark data.

When you have dark data that is also patron or other sensitive data, you put your organization and patrons at risk. Data sitting in servers, applications, files, and other places in your organization is subject to being leaked, breached, or otherwise accessed without authorization. This data is also subject to disclosure through judicial subpoenas or warrants. If you choose to collect dark data, you choose to collect a toxic asset that will only become more toxic over time, as the risk of a breach, leak, or disclosure increases. It’s a matter of when, not if, the dark data is compromised.

Dark data is a reality at many organizations in part because it’s very easy to collect without much thought. The strategies for minimizing the harms that come with dark data require some forethought and planning; however, once operationalized, these strategies can be effective in reducing the dark data footprint in your organization:

  • Tying data collection to demonstrated business needs – When you are deciding what data to collect, be it through a survey, a web application, or even your system logs, what data can be tied back to a demonstrated business need? Orienting your data collection decisions to what is needed now for operational purposes and analysis shifts the mindset away from “just in case” collection to what data is absolutely needed.
  • Data inventories – Sometimes dark data is collected and stored and then falls off your organization’s radar. Conducting regular data inventories will help identify any potential dark data sets for review and action.
  • Retention and deletion policies – Even if dark data persists after applying the above strategies, you have one more way to mitigate privacy risks. Retention policies and the proper deletion and disposal of electronic and physical items can limit the amount of dark data sitting in your organization (a small example follows this list).
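
As a small illustration of that last strategy, here is a hedged sketch that deletes log files older than a chosen retention window; the directory path and the 90-day window are placeholders, not recommendations:

    // Illustrative retention sketch: delete log files older than the
    // retention window. The path and window below are placeholders.
    import { readdir, stat, unlink } from "fs/promises";
    import { join } from "path";

    const LOG_DIR = "/var/log/myapp"; // placeholder path
    const RETENTION_DAYS = 90;        // placeholder retention window

    async function purgeOldLogs(): Promise<void> {
      const cutoff = Date.now() - RETENTION_DAYS * 24 * 60 * 60 * 1000;
      for (const name of await readdir(LOG_DIR)) {
        const path = join(LOG_DIR, name);
        const info = await stat(path);
        if (info.isFile() && info.mtimeMs < cutoff) {
          await unlink(path); // permanently remove the expired file
          console.log(`Deleted ${path}`);
        }
      }
    }

    purgeOldLogs().catch(console.error);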

The best strategy for minimizing dark data in your organization happens *before* you collect the data. Asking yourself why you need to collect the data in the first place, and looking at the system or web application to see what data it collects by default, will allow you to identify potential dark data and prevent its collection.