Published online by Cambridge University Press: 12 June 2014
It has become increasingly common for a reader to follow a URL cited in a court opinion or a law review article, only to be met with an error message because the resource has been moved from its original online address. This form of reference rot, commonly referred to as ‘linkrot’, has arisen from the disconnect between the transience of online materials and the permanence of legal citation, and will only become more prevalent as scholarly materials move online. The present paper, written by Jonathan Zittrain, Kendra Albert and Lawrence Lessig, explores the pervasiveness of linkrot in academic and legal citations, finding that more than 70% of the URLs within the Harvard Law Review and other journals, and 50% of the URLs within United States Supreme Court opinions, do not link to the originally cited information. In light of these results, a solution is proposed for authors and editors of new scholarship that involves libraries undertaking the distributed, long-term preservation of link contents.
1 For example, The Bluebook style guide for legal citation says: “The Bluebook requires the use and citation of traditional printed sources when available, unless there is a digital copy of the source available that is authenticated . . . .” The Bluebook: A Uniform System of Citation R. 18.2, at 165 (Columbia Law Review Ass'n et al. eds., 19th ed. 2010).
2 The Hiberlink and Memento project team at Los Alamos National Lab helpfully distinguishes between the two phenomena, a useful distinction that we import. See Robert Sanderson, Mark Phillips, & Herbert Van de Sompel, Analyzing the Persistence of Referenced Web Resources with Memento, arXiv (May 17, 2011, 7:21 PM), http://arxiv.org/abs/1105.3459, archived at http://perma.cc/0ee5QbGfp5F.
3 E.g., Davis, Helane E., Keeping Validity in Cite: Web Resources Cited in Select Washington Law Reviews, 2001–03, 98 Law Libr. J. 639 (2006); Liebler, Raizel & Liebert, June, Something Rotten in the State of Legal Citation: The Life Span of a United States Supreme Court Citation Containing an Internet Link (1996–2010), 15 Yale J.L. & Tech. 273 (2013); Rumsey, Mary, Runaway Train: Problems of Permanence, Accessibility, and Stability in the Use of Web Sources in Law Review Citations, 94 Law Libr. J. 27 (2002); Koehler, Wallace, A Longitudinal Study of Web Pages Continued: A Consideration of Document Persistence, 9 Information Research (Jan. 2004), http://informationr.net/ir/9-2/paper174.html, archived at http://perma.cc/8767-F7NG; Markwell, John & Brooks, David W., “Link Rot” Limits the Usefulness of Web-based Educational Materials in Biochemistry and Molecular Biology, 31 Biochemistry & Molecular Biology Educ. 69 (2003), available at http://onlinelibrary.wiley.com/doi/10.1002/bmb.2003.494031010165/full, archived at http://perma.cc/N969-86A4.
4 WebCite, http://www.webcitation.org, archived at http://perma.cc/0p7xfMNg8Kf.
5 Rumsey, supra note 3, at 32, 34–35.
6 Id. at 35. Rumsey defines working links as links that take a viewer to the document or take a viewer to a list where the document appears. Id. at 31.
7 Koehler, supra note 3; Markwell & Brooks, supra note 3, at 70–71.
8 “Link Rot” and Legal Resources on the Web: A 2013 Analysis, Chesapeake Digital Preservation Group (2013), http://cdm16064.contentdm.oclc.org/ui/custom/default/collection/default/resources/custompages/reportsandpublications/2013LinkRotReport.pdf (last visited Jan. 15, 2014); Liebler & Liebert, supra note 3, at 297–99.
9 Overview, Chesapeake Digital Preservation Group, http://cdm16064.contentdm.oclc.org/cdm/about#overview (last visited Jan. 15, 2014), archived at http://perma.cc/0L5yFmvwjaS; see also Rhodes, Sarah, Breaking Down Link Rot: The Chesapeake Project Legal Information Archive's Examination of URL Stability, 102 Law Libr. J. 581 (2010).
10 Rhodes, supra note 9, at 582.
11 Id.
12 Id.
13 “Link Rot” and Legal Resources on the Web: A 2013 Analysis, supra note 8.
14 Id.
15 All Collections, Chesapeake Digital Preservation Group, http://cdm16064.contentdm.oclc.org/cdm/search/collection (last visited Jan. 15, 2014), archived at http://perma.cc/0SvYRpDG26n.
16 See “Link Rot” and Legal Resources on the Web: A 2013 Analysis, supra note 8.
17 Liebler & Liebert, supra note 3, at 298.
18 One less important additional factor is that our work was limited to resources available on the open Internet, whereas the Liebler and Liebert work was interested in citation more generally.
19 Liebler & Liebert, supra note 3, at 294.
20 See the article submission policies of each of the journals: Submissions, Harv. L. Rev., http://www.harvardlawreview.org/submissions.php (last visited Jan. 15, 2014), archived at http://perma.cc/42FG-NGWE; Submissions, Harv. Hum. Rts. J., http://harvardhrj.com/about/submissions (last visited Jan. 15, 2014), archived at http://perma.cc/8EAA-U5UH; Submissions, Harv. J.L. & Tech., http://jolt.law.harvard.edu/submissions (last visited Jan. 15, 2014), archived at http://perma.cc/JVM5-WCMD.
21 See, e.g., The Bluebook: A Uniform System of Citation R. 16, at 146 (Columbia Law Review Ass'n et al. eds., 19th ed. 2010).
22 At the time that we pulled data, the HLR did not include URLs for sources that were accessible in print, like New York Times articles. JOLT uses parallel citations to print available sources, as does HRJ.
23 Roy T. Fielding et al., Hypertext Transfer Protocol -- HTTP/1.1, RFC 2616, World Wide Web Consortium, http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html (last visited Jan. 15, 2014), archived at http://perma.cc/QP8S-8HJN.
24 The term “soft 404” was explained extensively in an earlier paper on web decay. See Ziv Bar-Yossef et al., Sic Transit Gloria Telae: Towards an Understanding of the Web's Decay, Proc. 13th Int'l Conf. on World Wide Web 329 (2004).
25 See Appendix 1 for a list of HTTP status code meanings. “OPEN,” which is not an HTTP status code, means the server did not return anything.
26 Articles, Harv. J.L. & Tech., http://jolt.law.harvard.edu/articles (last visited Jan. 15, 2014), archived at http://perma.cc/D73W-9AWB.
27 About, Harv. L. Rev., http://www.harvardlawreview.org/about.php (last visited Jan. 15, 2014), archived at http://perma.cc/8MCP-F6PX.
28 About, Harv. Hum. Rts. J., http://harvardhrj.com/about (last visited Jan. 15, 2014), archived at http://perma.cc/0QMWnM4Lhxs.
29 Court Listener, https://www.courtlistener.com (last visited Nov. 24, 2013), archived at http://perma.cc/0FXzJ8DpvKs.
30 518 U.S. 727 (1996).
31 See, e.g., Frank McCown, Catherine C. Marshall & Michael L. Nelson, Why Web Sites Are Lost (and How They're Sometimes Found), Comm. ACM, Sept. 2009, at 141.
32 E.g., Prosecutor v. Rajic, Indictment (Int'l Crim. Trib. for the Former Yugoslavia Aug. 23, 1995), https://web.archive.org/web/20070528065139/http://www.un.org/icty/indictment/english/raj-ii950829e.htm (last visited Jan. 15, 2014).
33 E.g., United Nations International Criminal Tribunal for the former Yugoslavia, http://www.un.org/icty/indictment/english/raj-ii950829e.htm (last visited Jan. 15, 2014).
34 For a list of the major print primary sources for the Nuremberg Trials, see Nuremberg Trials Resources, Harv. L. School Libr. Nuremberg Trials Project, http://nuremberg.law.harvard.edu/php/docs_swi.php?DI=1&text=bibliogr (last updated Feb. 2003), archived at http://perma.cc/ZKD7-DYCC.
35 When readers visit the link, they find a page that says “Aren't you glad you didn't cite to this webpage in the Supreme Court Reporter at Brown v. Entertainment Merchants Association, 131 S.Ct. 2729, 2749 n.14 (2011). If you had, like Justice Alito did, the original content would long since have disappeared and someone else might have come along and purchased the domain in order to make a comment about the transience of linked information in the internet age.” 404 Error—File Not Found, http://ssnat.com/, archived at http://perma.cc/0gwuqRxEJJW.
36 Scott Althaus & Kalev Leetaru, Airbrushing History, American Style, Cline Center for Democracy (Nov. 25, 2008), http://www.clinecenter.illinois.edu/airbrushing_history, archived at http://perma.cc/G8PW-798L.
37 129 S. Ct. 1800, 1836 (2009) (Breyer, J., dissenting).
38 See, e.g., Lee F. Peoples, The Citation of Blogs in Judicial Opinions, 13 Tul. J. Tech. & Intell. Prop. 39, 73.
39 Of course, conscientious website owners can take steps to prevent it. For example, when moving to a new URL scheme or website organization, owners can keep old links with archived previous versions of pages, or make the redirection process transparent. Realizing that government-published materials may be widely cited, governments creating new URL schemes should be especially careful to preserve the accessibility of older materials.
40 See Benjamin J. Keele, What if Law Journal Citations Included Digital Object Identifiers? (Mar. 18, 2010) (unpublished manuscript), available at http://dx.doi.org/10.2139/ssrn.1577074; Susan Lyons, Persistent Identification of Electronic Documents and the Future of Footnotes, 97 Law Libr. J. 681 (2005).
41 This distinguishes The Bluebook and legal citation from many of the other citation styles in other fields, which allow DOIs. In fact, the APA style requires the use of DOIs if available. See Publication Manual of the American Psychological Association (6th ed. 2010); The Chicago Manual of Style § 14.6 (16th ed. 2010).
42 The Wayback Machine: FAQ, Internet Archive, http://archive.org/about/faqs.php#The_Wayback_Machine (last visited Jan. 15, 2014), archived at http://perma.cc/0V2j3ibrkrG (“Why isn't the site I'm looking for in the archive?: Some sites may not be included because the automated crawlers were unaware of their existence at the time of the crawl. It's also possible that some sites were not archived because they were password protected, blocked by robots.txt, or otherwise inaccessible to our automated systems. Siteowners might have also requested that their sites be excluded from the Wayback Machine. When this has occurred, you will see a ‘blocked error’ message. When a site is excluded because of robots.txt you will see a ‘robots.txt query exclusion error’ message.”).
43 See Adding Time to the Web, Memento, http://mementoweb.org/ (last visited Jan. 15, 2014), archived at http://perma.cc/09Z5S1xWjLH; see also H. Van de Sompel, HTTP Framework for Time-Based Access to Resource States, Memento (Dec. 2013), http://www.mementoweb.org/guide/rfc/ID/, archived at http://perma.cc/0XcKmZfbQat.
44 See Herbert Van de Sompel, Martin Klein, Robert Sanderson & Michael Nelson, Thoughts on Referencing, Linking, Reference Rot, Memento, http://mementoweb.org/missing-link/ (last visited Jan. 15, 2014), archived at http://perma.cc/DUB4-VNYM.
45 See Archive-It—Learn More, Internet Archive, https://archive-it.org/learn-more/ (last visited Jan. 15, 2014), archived at http://perma.cc/W3T9-ZSH3.
46 WebCite Consortium FAQ, WebCite, http://www.webcitation.org/faq (last visited Jan. 15, 2014), archived at http://perma.cc/0jRLzTskc8o.
47 See WebCite, http://www.webcitation.org/ (last visited Jan. 15, 2014), archived at http://perma.cc/0p7xfMNg8Kf.
48 Archive.is, http://archive.is/ (last visited Jan. 15, 2014), archived at http://perma.cc/0yezTLau6VK.
49 See the Archive.is frequently asked questions page, which states, in part, “[Archive.is] is privately funded; there are no complex finances behind it. It may look more or less reliable compared to startup-style funding or a university project, depending on which risks are taken into account. My death can cause interruption of service, but something like new market conditions or changing head of a department cannot.” FAQ, Archive.is, http://archive.is/faq.html (last visited Jan. 15, 2014), archived at http://perma.cc/0A72qhQbNAE.
50 This process will permit sites archived by Perma to take down allegedly copyright-infringing or defamatory material while allowing librarians to provide it to potential readers with due care.
51 See Memento, supra note 43; Chrome Web Store–Memento Time Travel, https://chrome.google.com/webstore/detail/memento/jgbfpjledahoajcppakbgilmojkaghgm (last visited Jan. 21, 2014), archived at http://perma.cc/P6GP-GJZQ (describing and linking to the Memento for Chrome extension that allows for page retrieval); Hvdsomp, Memento Extension for Chrome: A Preview (Sept. 9, 2013), http://www.youtube.com/watch?v=WtZHKeFwjzk (demonstrating the use of the Memento for Chrome extension).
52 Excerpted from Fielding et al., supra note 23.
53 J. Franks et al., HTTP Authentication: Basic and Digest Access Authentication, Internet Engineering Task Force (June 1999), http://tools.ietf.org/pdf/rfc2617.pdf, archived at http://perma.cc/5TMQ-64KF.