ABSTRACT

The international network of computers has posed a series of challenges to legislators worldwide.1 Attempts to regulate, censor, and otherwise police the internet face complex decentered and distributed architectures that often present multiple opportunities for unregulated communication and networking. The internet is, however, not without its historical and technological forms of governance, and since its days as a wing of the U.S. military (ARPANET) it continues to be defined and refined. This chapter is thus concerned with the set of technological rules, standards, and protocols that provide for common functions and software platforms on the internet. This digital commons is distinct, however, in its particular form of techno-governmentality. In part because of the rapid development and deployment of the internet, historically speaking, the network’s standards continue to be overseen by a set of engineers and computer scientists who first initiated its common protocols, namely TCP/IP.2 What results is a complex mix of self-regulatory ethics defined by university researchers, research and development (R&D) departments from the new media sector, and public sector policymakers, many of whom routinely move in and out of these three spheres.

While much has been written about new regulatory bodies charged with overseeing the global governance of the internet (Kahin and Keller, 1997; Mueller, 2002), and about the controversies that such bodies have of late been adjudicating (internet addresses, standards, regulations, and protocols) (Pare, 2003; Galloway, 2004), studies of the internet’s distinct technological forms of governmentality (in and through code) remain underdeveloped. To limit our understanding of internet governance to such institutions as the Internet Society, the Internet Engineering Task Force, or the World Wide Web Consortium (W3C) would significantly downplay the synergistic forces that have come to produce other internet conventions that similarly attempt to regulate practices of internet connectivity and networking. This chapter focuses on one such convention, robots.txt exclusion commands, to outline the contours of internet governmentality on the peripheries of the regulatory bodies.

Exclusion commands offer at once familiar and unique perspectives on debates over internet governance and politics. To start, the commands are meant to exclude web content from internet search engines, a practice that raises questions about security, censorship, and the representativeness of search engine databases, all issues that have been dealt with at length by the aforementioned bodies. Robots.txt commands were also, at one time, subject to review by the Internet Society, though to date the convention has not been adopted as a formal protocol by the Internet Engineering Task Force. The point, of course, is that the exclusion commands serve a governmental role without having been formally recognized as such through the internet’s governmental bodies.
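By way of illustration, the convention amounts to a short plain-text file placed at the root of a web server; the following is a minimal, hypothetical example (the directory name is a placeholder, not drawn from any actual site):

    # Applies to all compliant crawlers
    User-agent: *
    # Request that the /private/ directory not be crawled or indexed
    Disallow: /private/

Compliance is voluntary: the file expresses a request that cooperating crawlers are expected to honor, not a technical barrier.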

This chapter begins with a historical and technical overview of robots.txt commands, making note of their relationship to industry insiders/engineers, the protocol governance process through the Internet Society, and, most importantly, the rather banal language used to frame the need for, and functionality of, such commands. The chapter then focuses on the broader public articulation and rationale of robots.txt commands made in response to a political controversy. Latour (2005) makes a compelling argument that studies of social systems should begin by “feeding off controversies” in an effort to locate their central actors, discursive characteristics, formats, reach, and intensity, rather than assuming a priori the legitimacy and centrality of traditional political institutions. Since the internet is such a dispersed, content-rich environment, however, we see in the example of robots.txt commands that information controversies often erupt at the very highest level, for the simple fact that they expose contradictions in traditional, hierarchical centers of government. Information controversies are made more broadly public (to less “wired” worlds), in other words, as mass-mediated controversies. Furthermore, information controversies, as we shall see, are also often articulated as political controversies, particularly on the web, where libertarian ethics still prevail in certain circles.

We therefore begin by mapping the political controversy that erupted on the internet over the White House’s use of robots.txt exclusion commands to reportedly keep content related to Iraq from being included in search engine databases. The controversy refutes the fallacy that data cleaning and formatting are simply attempts at making information retrieval more relevant, useful, and aesthetically pleasing. Rather, the chapter argues that robots.txt commands serve to expand proprietary spaces and ideologies of the web, even where no explicit forms of security or password-protected domains exist.
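For illustration only, an exclusion of the kind reportedly at issue would take roughly the following form (the path shown is a hypothetical placeholder rather than a reproduction of the actual whitehouse.gov file):

    User-agent: *
    # Request that documents under an /iraq/ path be excluded from crawling
    Disallow: /iraq/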

The remainder of the chapter focuses on Google, both as governmental archive and as self-regulatory space. Earlier in the chapter it is noted that the sheer paucity of public information on robots.txt exclusion commands has amplified the monopolistic tendencies of Google’s ranking of information on this topic. The “inventor” of robots.txt commands dominates the top-ranked Google pages on the topic. Moreover, in addition to centralizing and amplifying the language of the informal protocol’s inventor, Google has also sought to develop new tools that yet again highlight the unruly and unmanageable robots.txt commands and files. The chapter concludes with a discussion of how Google’s own web management systems and software have incorporated the robot exclusion convention in an effort to increasingly standardize, and make search-engine ready, the formatting of web content via web management tools.
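As a closing technical note, the following minimal sketch shows how the convention is typically operationalized on the crawler side, using Python’s standard urllib.robotparser module and a hypothetical site address; it illustrates the general mechanism, not Google’s own systems:

    # Minimal sketch: a cooperating crawler consults robots.txt before fetching a page.
    # The site address is a hypothetical placeholder.
    from urllib import robotparser

    parser = robotparser.RobotFileParser()
    parser.set_url("https://www.example.org/robots.txt")
    parser.read()  # fetch and parse the site's exclusion file

    # can_fetch() returns False when the file asks that the path not be crawled
    if parser.can_fetch("*", "https://www.example.org/private/report.html"):
        print("the convention permits crawling this page")
    else:
        print("the convention asks crawlers to skip this page")

Nothing in such code enforces the exclusion; the crawler’s operator simply chooses to respect it, which is the self-regulatory dynamic the chapter examines.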