Cross-posted from the Center for the Protection of Intellectual Property (CPIP) Blog.
After my last post discussing the need for notice-and-staydown to help copyright owners with the never-ending game of whack-a-mole under the DMCA, I was asked to clarify how this would work for Google Search in particular. The purpose of that post was to express the need for something better and the hope that fingerprinting technologies offer. But, admittedly, I did not do a good job of separating out how notice-and-staydown would work for hosting platforms as compared to search engines. I think the whack-a-mole problem with hosting sites is indeed different from the one with search engines, and while fingerprinting may work well for the former, it’s probably ill-suited for the latter.
It’s clear enough how fingerprinting technologies can be applied to hosting platforms, and the simple fact is that they are already being deployed. YouTube uses its own proprietary technology, Content ID, while other platforms, such as Facebook and SoundCloud, use Audible Magic. These technologies create digital fingerprints of content that are then compared to user-uploaded content. When there’s a match, the copyright owner can choose to allow, track, mute, monetize, or block the uploaded content.
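To make that matching step concrete, here is a toy sketch in Python. It is not how Content ID or Audible Magic actually work: real systems use perceptual fingerprints that survive re-encoding, cropping, and pitch changes, while this sketch substitutes exact SHA-256 hashes of fixed-size byte chunks. The `Policy` names, the `check_upload` function, and the 50% overlap threshold are all hypothetical, chosen only to illustrate comparing an upload against reference fingerprints and then applying whatever action the owner picked.

```python
from __future__ import annotations

import hashlib
from enum import Enum


class Policy(Enum):
    """Hypothetical labels for the actions an owner might choose on a match."""
    ALLOW = "allow"
    TRACK = "track"
    MUTE = "mute"
    MONETIZE = "monetize"
    BLOCK = "block"


def fingerprint(data: bytes, chunk_size: int = 4) -> set[str]:
    """Toy fingerprint: the set of SHA-256 digests of fixed-size chunks.

    Exact hashes break if a single byte changes, unlike real perceptual
    fingerprints, so this only illustrates the comparison step.
    """
    return {
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    }


def check_upload(
    upload: bytes,
    references: dict[str, tuple[set[str], Policy]],
    threshold: float = 0.5,
) -> tuple[str, Policy] | None:
    """Return (matched work, owner's policy), or None if no reference matches."""
    upload_fp = fingerprint(upload)
    if not upload_fp:
        return None
    for work, (ref_fp, policy) in references.items():
        # Fraction of the upload's chunks that also appear in the reference.
        overlap = len(upload_fp & ref_fp) / len(upload_fp)
        if overlap >= threshold:
            return work, policy
    return None
```

For example, with `refs = {"Example Film": (fingerprint(b"the reference film bytes"), Policy.BLOCK)}`, an identical upload returns `("Example Film", Policy.BLOCK)`, while an unrelated upload returns `None` and goes through untouched.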
There isn’t a lot of publicly available information about how accurate these fingerprinting technologies are or how widely copyright owners utilize them. We do know from Google’s Katherine Oyama, who testified to Congress in early 2014, that “more than 4,000 partners” used Content ID at the time and that copyright owners had “‘claimed’ more than 200 million videos on YouTube” with the technology. She also acknowledged that “Content ID is not perfect, sometimes mistakenly ascribing ownership to the wrong content and sometimes failing to detect a match in a video.” Despite these imperfections, the scale of which she didn’t spell out, YouTube continues to offer Content ID to copyright owners.
Oyama also indicated that Content ID does not “work for a service provider that offers information location tools (like search engines and social networks) but does not possess copies of all the audio and video files that it links to.” This scenario is clearly different. When a site hosts content uploaded by its users, it can easily match those uploads to the content it’s already fingerprinted. When a site links to content that’s hosted elsewhere, it may not be possible to analyze that content in the same way. For example, the linked-to site could simply prevent automated crawling. Of course, not all sites block such crawling, but more would probably start doing so if fingerprinting were used in this way.
For Google Search, notice-and-staydown would likely not depend upon fingerprinting technology. Instead, the changes would come from: (1) delisting rogue sites, (2) promoting legitimate content, (3) improving auto-complete, and (4) ceasing to link to the very links that have already been taken down. None of these suggestions is new, but it’s clear that Google has not done all it can to make them effective. This is not to say that improvements haven’t been made, and Google is to be commended for the work that it’s done so far. But it can and should do more.
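Items (1), (2), and (4) can be pictured as a post-processing step on a ranked list of results. The Python sketch below is hypothetical and is not a description of how Google Search works: it assumes the engine keeps a set of URLs already removed under prior notices, a set of hostnames chosen for delisting, and a set of hostnames known to offer licensed copies. It matches URLs by exact string, whereas a real system would need to normalize them first, and it leaves auto-complete, item (3), aside.

```python
from urllib.parse import urlsplit


def _host(url: str) -> str:
    """Lowercased hostname of a URL, or "" if it has none."""
    return (urlsplit(url).hostname or "").lower()


def rerank(
    results: list[str],
    removed_urls: set[str],
    delisted_hosts: set[str],
    licensed_hosts: set[str],
) -> list[str]:
    """Hypothetical staydown filter applied to an already-ranked result list."""
    kept = []
    for url in results:
        if url in removed_urls:
            continue  # (4) staydown: never resurface a link already taken down
        if _host(url) in delisted_hosts:
            continue  # (1) delisting: drop every result from a rogue site
        kept.append(url)
    # (2) promotion: a stable sort moves licensed sources to the top while
    # preserving the original ranking within each group (False sorts first).
    kept.sort(key=lambda u: _host(u) not in licensed_hosts)
    return kept
```

Because Python’s sort is stable, the only change to the surviving results is that licensed sources move ahead of everything else.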
Sticking with the example of The Hateful Eight from my prior post, it’s easy to see how Google Search promotes piracy. Using a fresh installation of Chrome so as not to skew the results, I need only type “watch hat” into Google Search before its auto-complete first suggests I search for “watch hateful 8 online.” After following this suggestion, the first seven results are links to obviously infringing copies of the film. The first link points to the film at the watchmovie.ms site. A quick glance at that site’s homepage shows that much, if not all, of what it offers consists of films that are still in theaters, including Spectre, Star Wars: The Force Awakens, Creed, and The Hateful Eight. In other words, Google’s first search result from its first suggested search term points me to an illicit copy of the film on a site that’s obviously dedicated to infringement.
I had never heard of watchmovie.ms, so I checked its WHOIS data. The site was registered on October 14th of last year, and Google’s Transparency Report indicates that it started receiving takedown notices for it just a few days later. To date, Google has received 568 requests to remove 724 infringing links to watchmovie.ms, but its search engine dutifully continues to crawl and index “about 39,000 results” at the site. And, for reasons I simply cannot fathom, Google prefers to send me to that pirate site rather than point me to Google Play (or to any number of other sites) where I can pre-order the film and “watch hateful 8 online” legally.
Making matters worse, at the bottom of the first page of search results for “watch hateful 8 online,” Google links to four DMCA takedown notices it received from copyright owners that resulted in five links being removed from the first page of results. These four notices, in turn, contain a combined total of 499 illicit links to The Hateful Eight that Google has already taken down. This truly boggles the mind. Google takes down five infringing links from one page of search results, consistent with the DMCA, but then it links to those five links along with 494 more such links. And these linked-to links are even more useful to infringers, since Google itself has vetted them as worth taking down.
As Richard Gladstein, the producer of The Hateful Eight, told The Hollywood Reporter, Google told him that it is “not in a position to decide what is legal and what is illegal online.” This is a cop-out. In other venues, Google contends that it’s doing a lot to fight piracy. It submitted comments to the U.S. Intellectual Property Enforcement Coordinator this past October explaining how “changes made to [its] algorithm have been highly effective in demoting sites receiving a high number of takedown notices.” This shows that it is indeed in a position to determine what is “illegal online” and to take action against pirate sites. But simply demoting these sites is not enough; they should be delisted altogether.
Everyone knows that The Pirate Bay is a pirate site, yet Google indexes “about 914,000 results” from just one of its domains. As of today, Google has received 187,540 requests to remove 3,628,242 links to that domain, yet Google chooses not to delist the site from its results. Nor does it even appear to be demoting it. The top three search results for “thepiratebay hateful 8” are links to infringing copies of the film on The Pirate Bay. It’s clear that these links are infringing, yet Google makes copyright owners continue playing whack-a-mole for even the most obvious infringements.
This isn’t how the DMCA is supposed to work. Congress even envisioned this whack-a-mole scenario with search engines when it wrote the DMCA. The legislative history provides: “If, however, an Internet site is obviously pirate, then seeing it may be all that is needed for the service provider [i.e., search engine or other information location tool] to encounter a ‘red flag.’ A provider proceeding in the face of such a ‘red flag’ must do so without the benefit of a safe harbor.” The Pirate Bay is “obviously pirate,” and Google knows as much even without the 3.6 million takedown notices it’s received. It knows the same thing about lots of pirate sites, including the other domain names contained in its list of greatest hits.
Google could be doing more to help copyright owners with the whack-a-mole problem, but so far it has taken only a few baby steps. And when defending its refusal to delist obvious pirate sites, Google contends that it’s defending freedom of speech:
[W]hole-site removal sends the wrong message to other countries by favoring over-inclusive private censorship over the rule of law. If the U.S. embraces such an overbroad approach to address domestic law violations (e.g., copyright), it will embolden other countries to seek similar whole-site removal remedies for violations of their laws (e.g., insults to the king, dissident political speech). This would jeopardize free speech principles, emerging services, and the free flow of information online globally and in contexts far removed from copyright.
Delisting The Pirate Bay from Google Search isn’t about favoring “censorship over the rule of law.” It’s about Google favoring the rule of law over blatant criminal infringement and doing its part to be a good citizen in the digital economy where it plays no small role. The comparison of the conduct of criminal infringers to the speech of political dissidents rings hollow, and delisting the most obvious and egregious sites does not threaten free speech. Google already claims to demote pirate sites, yet that doesn’t “jeopardize free speech principles.” Neither will delisting them. And as long as Google consciously decides to index known pirate sites with its search engine, the whack-a-mole problem will only continue unabated for copyright owners.