Cookies, Tracking, and You: Part 2

Welcome to this week’s Tip of the Hat! We covered the basics of web cookies in Part One, including tracking and what users can do to protect their online privacy and not be tracked by these not-so-delicious cookies. Part Two focuses on the site owners who use tracking products to serve up those cookies to their users.

A Necessary Evil (Cookie)?

Many site owners use web analytics products to assess the effectiveness of an online service or site. These products can measure not only site visits but also how visitors get to the site, including search terms in popular search engines. Other products can track and visualize the “flow” or “path” through the site: where users enter the site (landing page), how users navigate between pages on the site, and the page on which users end their site visit (exit page).

Web analytics products provide site metrics that can help assess the current site and determine the next steps in investigating potential site issues, such as developing usability testing for a particular area of the site where the visitor flow seems to drop off dramatically. Yet, products such as Google Analytics collect personal information by default, creating user profiles that are accessible to you, the company, and whoever the company decides to share the data with. Libraries try to limit data collection in other systems such as the ILS to prevent such a setup, so it shouldn’t be any different for web analytics products.

Protecting Site User Privacy

There are a few ways libraries can protect patron privacy while collecting site data. Most libraries can do at least one or two of these strategies, while other strategies might require negotiation with vendors or external IT departments.

User consent and opt-out

Many sites nowadays have banners and popups notifying visitors that the site uses cookies. You can thank EU regulations for all of these popups, namely the ePrivacy Directive (the initial law that prompted them) and GDPR. US libraries such as Santa Cruz Public Library and Stanford University Libraries [1] have either adopted the popup or otherwise provided information on how to opt out of being tracked while on their site. The major drawback to this approach, as one study points out, is that these popups and pages can be meaningless to users, or even confuse them. If you decide to go this route, user notification needs to be clear and concise and user consent needs to be explicit.

Use a product other than Google Analytics

Chances are, your server is already keeping track of site visits. Install AWStats and you’ll find site visit counts, IP addresses, dates and times of visits, search engine keyword results, and more just from your server logs.

(BTW, do you know what logs your server keeps and what data those logs collect?)
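If you want a feel for what a single line of a server log holds, here is a minimal sketch in Python. It assumes your server writes the Combined Log Format (a common default for Apache and nginx, but check your own configuration), and the sample lines, addresses, and the function name daily_page_counts are made up for illustration. It is not how AWStats works internally; it only shows that you can count page visits from a log while throwing the IP addresses away.

```python
import re
from collections import Counter

# Assumption: the server writes the Combined Log Format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def daily_page_counts(lines):
    """Count successful requests per (day, path); IP addresses are not kept."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match and match["status"] == "200":
            counts[(match["day"], match["path"])] += 1
    return counts


# Hypothetical log lines using documentation-only IP ranges.
sample = [
    '203.0.113.7 - - [16/Sep/2019:10:15:32 -0700] "GET /hours HTTP/1.1" 200 5120 "https://www.example.org/" "Mozilla/5.0"',
    '198.51.100.23 - - [16/Sep/2019:10:16:01 -0700] "GET /hours HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '198.51.100.23 - - [16/Sep/2019:10:16:05 -0700] "GET /missing HTTP/1.1" 404 312 "-" "Mozilla/5.0"',
]

counts = daily_page_counts(sample)
print(counts)
```

Notice how much each raw line carries (IP address, timestamp, referring page, browser) compared to the aggregated counts that come out the other end.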

Several web analytics products provide site data without compromising user privacy by default. One of the more popular products is Matomo, formerly Piwik, which is used by several libraries. Cornell University Library wrote about their decision to move to Piwik and the installation process, and other libraries are already running Matomo or are starting to make the migration. You can find more information about privacy-focused analytics products in the Action Handbook from the National Forum of Web Privacy and Web Analytics. Many of these products allow you to control what data is being collected, as well as allow you to install and host the product on a local server.

If you must use Google Analytics

There are times when you can’t avoid GA. Your vendor or organization might use GA by default. You might not have the resources to use another analytics product. While this is not the optimal setup, there are a couple of ways to protect user privacy, including telling GA to anonymize IP addresses and turning off data sharing options. Again, you can find a list of recommended actions in the Action Handbook. You might also want to read Eric Hellman’s posts about GA and privacy in libraries, as well as how library catalogs leak searches to Amazon via cookies.
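What does anonymizing an IP address actually do? Google’s documentation describes it as zeroing out the last octet of an IPv4 address and the last 80 bits of an IPv6 address. The Python sketch below imitates that masking so you can see the effect; the function name anonymize_ip is our own, and this is an illustration of the idea, not GA’s code.

```python
import ipaddress


def anonymize_ip(address: str) -> str:
    """Zero the last octet (IPv4) or the last 80 bits (IPv6) of an address."""
    ip = ipaddress.ip_address(address)
    # Keep the first 24 of 32 bits for IPv4, the first 48 of 128 bits for IPv6.
    prefix = 24 if ip.version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


# Documentation-only example addresses.
print(anonymize_ip("203.0.113.77"))
print(anonymize_ip("2001:db8:85a3:8d3:1319:8a2e:370:7348"))
```

The masked address still points to a general network neighborhood, so treat this as reducing the risk and not as making the data anonymous.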

Protecting patron privacy while they use your library’s online services doesn’t necessarily mean prohibiting any data collection, or cookies for that matter. Controlling what data is collected by the web analytics product and giving your patrons meaningful information about your site’s cookie use are two ways in which you can protect patron privacy and still have data for assessing online services.

[1] Hat tip to Shana McDanold for the Stanford link!

Cookies, Tracking, and You: Part 1

Welcome to this week’s Tip of the Hat!

LDH would like to let our readers know that in the eternal feud between Team Cookie and Team Brownie, we are firmly on Team Brookie.
[Image: A pan of brookies cut into bars, with two bars missing. One bar sits on top of the other bars.]
But that doesn’t mean we don’t appreciate a good cookie!
[Image: A plate of honey nut cookies.]
Unfortunately, not all cookies are as tasty as the ones above, and some we actively want to avoid if we want to keep what we do online private. One such cookie is the web cookie.

Web Cookie 101

You have probably encountered the terms browser cookie, HTTP cookie, and web cookie when reading articles about cookies and tracking, and they all refer to the same thing. A web cookie is data sent from a website and stored in the user’s browser, such as Edge, Chrome, or Firefox. Web cookies come in many different flavors, including cookies that keep you signed into a website, remember your site preferences, and remember what you put in your shopping cart when you were doing some online shopping at 2 am. Some cookies only last until you close your browser (session cookies) and some will stick around after you close and reopen your browser (persistent cookies). A website can have cookies from the site owner (first-party cookies) and cookies from other sites (third-party cookies). Yep, you read that right: the site that you’re visiting may have other sites tracking you, even if you don’t visit those other sites.
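For the curious, here is a small Python sketch of the difference between a session cookie and a persistent cookie, using the standard library’s http.cookies module. The cookie names and values (session_id, theme) and the helper is_persistent are invented for the example. The gist: a cookie with no expiry information goes away when the browser closes, while one with a Max-Age or Expires attribute sticks around.

```python
from http.cookies import SimpleCookie

cookies = SimpleCookie()

# Session cookie: no Max-Age or Expires, so the browser drops it on close.
cookies["session_id"] = "abc123"
cookies["session_id"]["path"] = "/"
cookies["session_id"]["secure"] = True
cookies["session_id"]["httponly"] = True

# Persistent cookie: Max-Age asks the browser to keep it for 30 days.
cookies["theme"] = "dark"
cookies["theme"]["path"] = "/"
cookies["theme"]["max-age"] = 60 * 60 * 24 * 30


def is_persistent(morsel):
    """A cookie is persistent if it carries a Max-Age or Expires attribute."""
    return bool(morsel["max-age"] or morsel["expires"])


for name, morsel in cookies.items():
    kind = "persistent" if is_persistent(morsel) else "session"
    # OutputString() is what would follow "Set-Cookie:" in the server response.
    print(f"{kind}: {morsel.OutputString()}")
```

Whether a cookie is first-party or third-party is not written in the cookie itself; it depends on whether the site that set it matches the site in your address bar.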

However, you don’t need a third-party cookie for a site to track you. Chances are that you’ve been tracked when you are browsing the Web by web analytics products such as Google Analytics. What does that all entail, and how does it affect your privacy online?

Tracking Cookies and Privacy

Many web analytics products use cookies to collect data from site visitors. Google Analytics, for example, collects user IP addresses, user device information (such as browser and OS), network information, geolocation, whether the user is a returning or new site visitor, and user behavior on the site itself. A site owner can build a user profile of your activity on their website based on this information alone, but Google Analytics doesn’t stop there. Google Analytics also generates demographic reports for site owners. Where do they get this demographic data from? Cookies, for the most part. This is a feature that site owners have to turn on, but the option is there if the owner wants to build a more complete user profile.

(Let’s not think about how many libraries might have this feature turned on, lest you want to stress-eat a batch of cookies in one sitting.)

This is one example of how cookies can compromise user privacy. There are other examples out there, including social media sites and advertising companies using cookies to collect user information. Facebook is notorious for tracking users on other sites and even tracking users who do not have a Facebook account. If there’s a way to track and collect user data, there’s a web site that’s doing it.

Using Protection While Browsing The Web

Web users have several options for blocking tracking cookies. The following guides and resources can help you set up a more private online browsing experience:

You can also test out your current browser setup with Panopticlick from the EFF to find out if your browser tracker blocker settings are set up correctly.

Stay Tuned…

But why do users have to do all the work? Where do site owners come into protecting their users’ privacy? Next week, we’ll switch to the site owners’ side and talk about cookies: what you can do to collect data responsibly, regulations around web cookies, and resources and examples from the library world. For now, go get a real-world cookie while you wait!

Silent Librarian and Tracking Report Cards

Welcome to this week’s Tip of the Hat! We at LDH survived the full moon on Friday the 13th, though our Executive Assistant failed to bring donuts into the office to ward off bad luck. Unfortunately, several universities need more than luck against a widespread cyberattack that has a connection to libraries.

This attack, called Cobalt Dickens or Silent Librarian, relies on phishing to gain access to university systems. The potential victims receive a spoofed email from the library stating that their library account is expired, followed by instructions to click on a link to reactivate the account by entering their account information on a spoofed library website. With this attack happening at the beginning of many universities’ semesters, incoming students and faculty might click through without giving a second thought to the email.

We are used to seeing attackers spoof banking and other commercial sites to obtain user credentials. Nonetheless, Silent Librarian reminds us that libraries are not exempt from being spoofed. Silent Librarian is also a good prompt to review incident response policies and procedures surrounding patron data leaks or breaches with your staff. Periodic reviews will help ensure that policies and procedures reflect the changing threats and risks that come with a changing technology environment. Reviews can also be a good time to revisit incident response materials and training for library staff, as well as cybersecurity basics. If a patron calls the library about an email regarding their expired account, a trained staff member has a better chance of preventing that patron from falling for the phishing email, which in turn better protects library systems from being accessed by attackers.

We move from phishing to tracking with the release of a new public tool to assess privacy on library websites. The library directory on Marshall Breeding’s Library Technology Guides site is a valuable resource, listing thousands of libraries around the world. Each listing has basic library information, including the types of systems used by the library and specific products such as the integrated library system, digital repository, and discovery layer. Each listing now includes a Privacy and Security Report Card that grades the main library website on the following factors:

  • HTTPS use
  • Redirection to an encrypted version of the web page
  • Use of Google Analytics, including if the site is instructing GA to anonymize data from the site
  • Use of Google Tag Manager, DoubleClick, and other trackers from Google
  • Use of Facebook trackers
  • Use of other third-party services and trackers, such as Crazy Egg and NewRelic
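To make the factors above a bit more concrete, here is a rough Python sketch of how a check like this could look. To be clear, this is not how Library Technology Guides builds its report cards; the function name report_card, the short signature list, and the sample page are all our own assumptions. It simply looks for well-known tracker script addresses in a page’s HTML and for the setting that tells GA to anonymize IP addresses.

```python
# Assumption: a small, incomplete list of script addresses used by these trackers.
TRACKER_SIGNATURES = {
    "Google Analytics": ("google-analytics.com", "googletagmanager.com/gtag/js"),
    "Google Tag Manager": ("googletagmanager.com/gtm.js",),
    "DoubleClick": ("doubleclick.net",),
    "Facebook": ("connect.facebook.net",),
}

# GA setting names, lowercased: anonymizeIp (analytics.js), anonymize_ip (gtag.js).
ANONYMIZE_MARKERS = ("anonymizeip", "anonymize_ip")


def report_card(url: str, html: str) -> dict:
    """Return a simple privacy summary for one page (illustrative only)."""
    page = html.lower()
    found = sorted(
        name
        for name, needles in TRACKER_SIGNATURES.items()
        if any(needle in page for needle in needles)
    )
    return {
        "https": url.lower().startswith("https://"),
        "trackers": found,
        "ga_anonymizes_ip": "Google Analytics" in found
        and any(marker in page for marker in ANONYMIZE_MARKERS),
    }


# Hypothetical library home page that loads GA with IP anonymization turned on.
sample_html = """
<script async src="https://www.google-analytics.com/analytics.js"></script>
<script>
  ga('create', 'UA-XXXXX-Y', 'auto');
  ga('set', 'anonymizeIp', true);
  ga('send', 'pageview');
</script>
"""

card = report_card("https://library.example.org/", sample_html)
print(card)
```

A real checker would also need to fetch the page, follow redirects from the unencrypted address, and watch what scripts load after the page opens, so treat this as a starting point for a conversation with your IT staff.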

You can check what your library’s card looks like by clicking on the Privacy and Security Report button on the individual library page listing. In addition to individual statistics, you can view the aggregated statistics at https://bit.ly/ltg-https-report. The majority of public library websites are HTTPS, which is good news! The number of public libraries using Google Analytics to collect non-anonymized data, however, is not so good news. If you are one of those libraries, here are a couple of resources to help you get started in addressing this potential privacy risk for your patrons: