ABSTRACT

So-called ‘Big Data’ analysis emerged as a new research field only a few years ago. It is seen as a “step change” in the production of the “scope” of knowledge about a “given phenomenon” (Schroeder, 2014:6) and as a “computational turn” in the social sciences (Boyd & Crawford, 2012). While some consider these new research magnitudes a ‘fascinating’ field, others feel that the metrics of all sorts of data footprints are merely a side effect of the ubiquity of digital communication and nothing more than a ‘buzz.’ The term ‘Big Data’ refers to the analysis of code figurations within magnitudes of ‘high volume,’ ‘velocity,’ and ‘high variety.’ Algorithm codes are aggregated by software tools operating across strings of either ‘fluid’ networks, such as ‘live’ networks, or ‘static’ sites, such as fixed public or corporate sector data sets. Big Data studies in the social sciences and communication studies are often based on fluid platforms, i.e., mainly commercial network sites such as Google or Twitter. A typical sample consists of at least one million ‘units.’ One of the few transnational comparative Big Data studies, an investigation of the use of ‘emoticons’ posted in tweets, is based on a 549 GB file and a data set of 1,755,925,520 tweets produced by 55 million users worldwide (Park, Baek, & Cha, 2014). This example reveals not only a gigantic ‘horizontal’ magnitude of ‘units,’ aggregated by a metrical logic, but also a ‘vertical’ magnitude, as each of these ‘units’ in turn contains its own magnitude of what we might call ‘data demographics.’ For instance, each tweet ‘unit’ contains “the corresponding time stamp” in addition to “user information,” which consists of “the number of tweets, followers and followees, as well as the start date of the account and the geo-location” (Park et al., 2014:339).
Data demographics, which are rarely addressed in these debates, include significant amounts of ‘privacy’ information, contact networks, and names within multilayered, individualized digital footprints. The combination of these horizontal and vertical magnitudes alone makes it almost impossible to produce meaningful research outputs. It is therefore not surprising that Big Data studies mainly rely on methods that reduce the complexity of these unprecedented magnitudes. Their results are produced along the parameters of descriptive statistics of network patterns.