Chapter 1 Introduction

Caminante, son tus huellas
el camino y nada más;
Caminante, no hay camino,
se hace camino al andar.

Al andar se hace el camino,
y al volver la vista atrás
se ve la senda que nunca
se ha de volver a pisar.

Caminante no hay camino
sino estelas en la mar.

Wanderer, it is your footprints
winding down, and nothing more;
wanderer, no roads lie waiting,
roads you make as you explore.

Step by step your road is charted,
and behind your turning head
lies the path that you have trodden,
not again for you to tread.

Wanderer, there are no roadways,
only wakes upon the sea.

Proverbios y cantares, Antonio Machado

“But although, as a matter of history, statistical mechanics owes its origin to investigations in thermodynamics, it seems eminently worthy of an independent development, both on account of the elegance and simplicity of its principles, and because it yields new results and places old truths in a new light in departments quite outside of thermodynamics” wrote Josiah W. Gibbs in the preface of his seminal book published in 1902, Elementary principles in statistical mechanics [1]. Yet, starting from that point might fool the reader into thinking that science progresses in eureka steps, isolated sparks of inspiration only attainable by geniuses. In reality, however, its progress is much more continuous than discrete. As Picasso said, inspiration exists, but it has to find you working.

XIX The birth of statistical mechanics

We shall begin this thesis with the work published by Carnot in 1824 [2]. The industrial revolution had brought steam engines all around Europe, completely reshaping the fabric of society. Yet, in Carnot’s words, “their theory is very little understood, and the attempts to improve them are still directed almost by chance”¹. Although clear efforts had already been made to understand the science behind what was yet to be named thermodynamics, Carnot’s work is usually regarded as the starting point of the modern discipline. The book was, however, somewhat overlooked until 1834, when it fell into the hands of Clapeyron, who found its ideas “fertile and incontestable” [4]. In fact, it was Clapeyron who used the pressure-volume diagram (developed, in turn, by Watt in the late XVIII century) to represent the Carnot cycle, an image that is nowadays etched into the memory of every student of thermodynamics.

Subsequently, the ideas behind thermodynamics were developed mainly by Clausius and Kelvin, with the indispensable insights provided by experiments such as the ones carried out by Joule. It was Clausius who coined the term entropy in 1865 [5] although, interestingly, he had obtained that same quantity eleven years earlier, in 1854, without realizing its full potential [6]. Albeit in a different shape, this concept will be one of the cornerstones of chapter 2.

To continue discussing the path followed by Clausius we need to bring two more theories to the table: the kinetic theory of gases and the theory of probability. The former stated that a gas was composed of many tiny particles or atoms, whose movement was responsible for the observed pressure and temperature of the gas. This theory had been proposed more than a century earlier, in 1738, by Daniel Bernoulli, although it did not attract much attention at the time [7]. The theory of probability, on the other hand, had received several contributions over the years, the one by Daniel’s uncle, Jacob Bernoulli, being among the best known (for instance, Bernoulli trials and the Bernoulli distribution are named after him). Probability had been regarded for some time as something mainly related to gambling, but step by step it started to lose that negative connotation when scientists in the XVIII century introduced it into error theory for data analysis. The process was then firmly established with the works by Gauss in the early 1800s and by the middle of the century it was already common in physics textbooks².

This leads us to the year 1859. Clausius had published a work about molecular movement for the kinetic theory of gases that, at first glance, implied that molecules could move freely in space. Some scientists criticized this: if it were true, they wondered, why did clouds of tobacco smoke spread slowly rather than quickly filling up the whole room? Clausius regarded that objection as legitimate and further developed his theory to account for “how far on an average can the molecule move, before its centre of gravity comes into the sphere of action of another molecule” [9]. In other words, he calculated what we know today as the mean free path. Furthermore, he introduced the concept of average speeds and random impacts. As we will see, this work was fundamental for the development of statistical physics.

St John’s College, in Cambridge, launched a scientific competition in 1855, with solutions to be delivered by the end of 1857. The problem was to explain the composition of Saturn’s rings, something that had puzzled scientists for over 200 years. Laplace had already shown that a solid ring would be unstable, but the examiners nevertheless proposed three hypotheses: the rings could be solid, liquid or composed of many separate pieces of matter. The only contestant who submitted a proposal was James C. Maxwell, who showed that the only stable solution was the last one, which earned him the prize [10]. Interestingly, he claimed in his solution that collisions between those pieces were possible, but that he was unable to calculate them [11]. But then, just three years later, in 1860, he derived an equation that today is regarded as the origin of statistical mechanics, the Maxwell distribution of velocities, obtained precisely by calculating collisions between particles [12]. In that three-year period only one thing had changed: the paper by Clausius in 1859, which Maxwell actually cites at the beginning of his own. Notwithstanding his great achievement, it seems clear that the spark was not isolated, but came from a burning wick.

At this point we need to add a new scientist to the group, Ludwig E. Boltzmann. Boltzmann’s scientific career started in 1866, when he tried to give an analytical proof of the second law of thermodynamics. His actual accomplishment was quite modest. Yet, two years later, in 1868, he changed his approach and started a series of lengthy memoirs in which he extended Maxwell’s results, leading to the full development of statistical mechanics [13]. The fundamental work by Boltzmann was, in turn, expanded by Gibbs in the classical treatise of 1902 that opened this chapter [1] (although some of Gibbs’ ideas had already been proposed by Boltzmann, they had been somewhat overlooked by his colleagues, see [14] for a discussion on why this might have happened). The development of statistical mechanics would culminate in 1905 with the work by Einstein on Brownian motion, regarded by Born as the final proof for physicists “of the reality of atoms and molecules, of the kinetic theory of heat, and of the fundamental part of probability in the natural laws” [15].

It should be noted, however, that one of the key elements of the theory introduced by the aforementioned scientists was the concept of the ensemble, which we will describe further in chapter 2. In spite of its importance, the use of ensembles raised some mathematical problems that they could not solve, “but like the good physicists they were, they assumed that everything was or could be made all right mathematically and went on with the physics” [16]. Some years later this led to the subject of ergodic theory, which we will not address in this thesis.

XIX.1 Meanwhile, in the social sciences

The XIX century can also be regarded as the century when social science was born. Multiple factors led to such an enterprise, of which we might highlight the social changes induced by the industrial revolution, the standardization of statistical approaches beyond physics and the publication of Darwin’s On the Origin of Species. Some of the ideas developed in these areas, perhaps surprisingly, echoed in the development of physics itself in the XX century.

To give a brief overview, we shall start with Quetelet’s view of statistics. Back at the beginning of the XIX century, statistics was mostly restricted to the calculation of errors in astronomy. In 1823 Quetelet traveled to Paris to study astronomical activities, and he became passionate about the subject of probability. From then on, he put it to practical use in the study of the human body, in an attempt to find the average man. This led to the creation of the Body Mass Index, which we still use today [17]. In subsequent years he developed his ideas further and applied statistics not only to the human body but also to states and even to the social body, i.e. the aggregation of the whole human race. We find it particularly interesting that, as one of the precursors of social sciences, he believed that it was possible to find laws for the social body “as fixed as those which govern the heavenly bodies: [like in] physics, where the freewill of man is entirely effaced, so that the work of the Creator may predominate without hindrance. The collection of these laws, which exist independently of time and of the caprices of man, form a separate science, which I have considered myself entitled to name social physics”³.

Another branch of science that started in this century, and an important one for our interests, is the mathematical study of demography. At the beginning of the XIX century it was claimed that surnames were being lost (particularly among the nobility). Francis Galton, a famous statistician (and cousin of Darwin), thought that this could be addressed mathematically and posed it as an open problem for the readers of the Educational Times in 1873. The proposed solutions did not please him, so he joined another mathematician, Henry W. Watson, and together they developed the theory that later came to be known as the Galton-Watson process [20]. Their theory, based on generating functions, was a novel way of tackling the study of demography, especially because it was not deterministic. This seemingly innocent problem is credited as the origin of the theory of branching processes [21], which in turn was very important in the development of graph theory, epidemiology and the theory of point processes, as we shall see in chapters 2, 3 and 4 respectively. Furthermore, this problem, together with the great impact that his cousin’s book had on him, led him to the study of the infamous “cultivation of race” or eugenics [22]. Lastly, in 1906 he went back to the roots of statistics and performed the first experiment on collective intelligence, to which we shall return in chapter 4.
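To make the generating-function idea concrete, the following is a minimal sketch, not a reconstruction of Galton and Watson’s original calculation. It assumes a hypothetical offspring distribution (each individual leaves 0, 1 or 2 descendants with probabilities 1/4, 1/4 and 1/2) and uses the standard result that the extinction probability of the family line is the smallest non-negative solution of G(s) = s, where G(s) = Σ p_k s^k is the generating function of the number of descendants. The function names and all parameter values are illustrative choices.

```python
import random


def extinction_probability(offspring_probs, tol=1e-12, max_iter=100_000):
    """Smallest non-negative fixed point of G(s) = sum_k p_k * s**k.

    Iterating q <- G(q) from q = 0 converges to the extinction probability.
    """
    q = 0.0
    for _ in range(max_iter):
        q_next = sum(p * q**k for k, p in enumerate(offspring_probs))
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q


def simulate_extinction(offspring_probs, trials=1_000, generations=200,
                        cap=1_000, seed=0):
    """Monte Carlo estimate of the extinction probability.

    A line that grows beyond `cap` individuals is treated as surviving
    (an approximation made to keep the simulation cheap).
    """
    rng = random.Random(seed)
    counts = range(len(offspring_probs))
    extinct = 0
    for _ in range(trials):
        population = 1
        for _ in range(generations):
            if population == 0 or population > cap:
                break
            # Each individual draws its number of descendants independently.
            population = sum(rng.choices(counts, weights=offspring_probs,
                                         k=population))
        extinct += population == 0
    return extinct / trials


if __name__ == "__main__":
    probs = [0.25, 0.25, 0.5]  # hypothetical distribution, mean 1.25
    # For this choice G(s) = s has roots s = 1/2 and s = 1.
    print("fixed point:", extinction_probability(probs))
    print("simulation :", simulate_extinction(probs))
```

For this particular distribution the equation G(s) = s can be solved by hand, giving an extinction probability of 1/2, so the numerical fixed point and the stochastic simulation can be checked against each other.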

XX The century of Big Science

The enormous expenditures for research and development during World War II brought a revolution in the physical sciences [23]. For instance, branching processes and Monte Carlo methods, which we will use throughout this thesis, were developed at that time. Branching processes (term coined by Kolmogorov [24]), following the path started by Galton in the previous century, were used to study neutron chain reactions and cosmic radiation [25]. In turn, Monte Carlo methods were a tool used to study several stochastic processes. In particular, we will use these methods in chapters 3 and 4 (see [26] for a nice review of their history and the origin of the name).

Both branching processes and Monte Carlo methods are intimately related to percolation processes, proposed by Broadbent and Hammersley in 1956. Initially, the idea was to study the diffusion of a fluid in a medium but focusing on the medium, in contrast to common diffusion processes that used to focus on the fluid, in order to design better gas masks for coal miners (Broadbent received support from the British Coal Utilisation Research Association during his PhD⁴). Interestingly, though, among the examples they gave of problems that could be tackled with this formalism was the spreading of a disease in an orchard [28]. In recent years these processes have been applied to the study of disease spreading and network percolation, as we will see in chapter 3.

But there was also space for fundamental research. Using the information theory established by Shannon [29], Jaynes proposed in 1957 that statistical physics could be derived from an information point of view [30]. Rather than deriving the theory from dynamical arguments, he argued that the objective of statistical physics was to infer which probability distribution was consistent with the data while having the least possible bias with respect to all other degrees of freedom of the system. In this sense, the entropy would be a measure of the information about the system, so that maximizing the entropy would be equivalent to maximizing the ignorance subject to the data that is known to be true. This view was not unanimously embraced, as some scientists believed that this definition depends on the observer, which goes against the fact that entropy is a definite physical quantity that can be measured in the laboratory [31].

This debate is still open today. For instance, in the book by Callen that is widely used to teach thermodynamics in physics courses, this view of statistical mechanics is deemed to be “a subjective science of prediction”. Instead, he proposes the more common view of entropy as a measure of “objective” disorder [32]. Yet, if one goes to the original work by Bridgman, published in 1943, where he introduces the notion of disorder, he claims that the definition is “anthropomorphic” and not absolute [33]. Others, like Ben-Naim, claim that the problem lies in the history of the development of statistical mechanics. As we saw in section XIX, this discipline started from thermodynamics. Hence, entropy was defined as a quantity of heat divided by temperature, yielding units of energy over temperature. If entropy, instead, refers to information, it has to be a dimensionless quantity. The problem, he argues, derives from the fact that the concept of temperature was developed in a pre-atomistic era. Once Maxwell identified the temperature with the kinetic energy of atoms, the very definition of temperature could have been changed to units of energy. In that case, heat over temperature would result in a dimensionless quantity, making it much easier to accept as a measure of information. Furthermore, he claims that most people have misinterpreted the very concept of information in this context, see [34]. In spite of these controversies, in chapter 2 we will follow Jaynes’ definition to be able to apply the formalism of statistical physics to graphs.
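As an illustration of the maximum entropy prescription discussed above, consider the textbook case in which the only known datum is the average energy of the system. The notation is our own choice for this sketch and is not taken from the cited works: p_i is the probability of state i, E_i its energy, and α and β are Lagrange multipliers. Boltzmann’s constant is omitted, so that the entropy is dimensionless.

```latex
% Maximize the entropy subject to normalization and a fixed average energy
S[p] = -\sum_i p_i \ln p_i ,
\qquad \sum_i p_i = 1 ,
\qquad \sum_i p_i E_i = \langle E \rangle .

% Stationarity of the Lagrangian with multipliers \alpha and \beta
\frac{\partial}{\partial p_i}
\Big[ S - \alpha \Big( \sum_j p_j - 1 \Big)
        - \beta \Big( \sum_j p_j E_j - \langle E \rangle \Big) \Big]
= -\ln p_i - 1 - \alpha - \beta E_i = 0 .

% Solving for p_i and imposing normalization
p_i = \frac{e^{-\beta E_i}}{Z} ,
\qquad Z = \sum_j e^{-\beta E_j} .
```

Under these assumptions the least biased distribution compatible with the data is the familiar canonical one, with β fixed by the value of the average energy.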

Jaynes’ proposal is one of the first hints about the usefulness of statistical physics outside the classical realm of physics. In particular, the framework of statistical physics has proven to be quite useful in a new branch of science that started to develop during the 1960s: complex systems.

XX.1 More is Different

This section is named after the famous paper by P. W. Anderson, where he claimed that the reductionist approach followed by physicists up to that moment had to be revisited [35]. He argued that obtaining information about the fundamental components of a system did not mean that you could then understand the behavior of the whole. Instead, he proposed that systems were arranged in hierarchies so that each upper level had to conform to the rules of the lower level, but it could also exhibit its own laws that could not be inferred from the ones of the fundamental constituents. His book More and Different offers some insights about the reasons that led him to write that paper⁵, as well as a glimpse of how condensed matter physics was born from the ashes of the Second World War [37].

This view of systems as a multiscale arrangement grew during the 1960s and 1970s, led by several observations that would end up composing what we know as complex systems. For instance, in condensed matter physics the interest in disordered systems (such as spin glasses or polymer networks) started to increase. The expertise obtained with these models allowed for the study of the collective behavior of completely different systems whose components were also heterogeneous, such as biological systems [38].

Another example of the results that composed the early theory of complex systems is chaos theory. Despite the pioneering work by Poincaré in the late XIX century, chaos theory was mainly developed in the middle of the XX century [39]. One of its main starting points was the study by Lorenz in 1963 on hydrodynamics [40]. In that study, he showed that a simple system of nonlinear differential equations, meant to reproduce weather dynamics, exhibited wildly different evolutions from very similar starting conditions. In other words, he had discovered chaos. His study had a huge impact on the community because it not only opened the door to a whole new area of research but also, in the particular case of weather prediction, signaled that long-term predictions might not be attainable, as perfect knowledge of the initial conditions was not achievable. He summarized that statement saying “if the theory were correct, one flap of a sea gull’s wings would be enough to alter the course of the weather forever” (the sea gull was turned into a butterfly later on for aesthetic reasons) [41].
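The sensitivity to initial conditions described above can be reproduced with a few lines of code. The sketch below integrates the three-variable system commonly known as the Lorenz equations, with the parameter values usually quoted for it (σ = 10, ρ = 28, β = 8/3), and follows two trajectories whose initial conditions differ by one part in 10⁸. The integrator (a fixed-step fourth-order Runge-Kutta scheme), the step size and the initial point are assumptions made for this illustration, not details taken from the original study.

```python
def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(f, state, dt):
    """One fixed-step fourth-order Runge-Kutta update."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def separation(initial, perturbation=1e-8, dt=0.01, steps=4000):
    """Distance between two trajectories started `perturbation` apart in x."""
    a = initial
    b = (initial[0] + perturbation, initial[1], initial[2])
    distances = []
    for _ in range(steps):
        a = rk4_step(lorenz, a, dt)
        b = rk4_step(lorenz, b, dt)
        distances.append(sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5)
    return distances


if __name__ == "__main__":
    d = separation((1.0, 1.0, 1.0))
    # Print the separation every 5 time units to show its growth.
    for step in range(0, len(d), 500):
        print(f"t = {(step + 1) * 0.01:6.2f}   distance = {d[step]:.3e}")
```

The distance between the two trajectories grows by many orders of magnitude until it becomes comparable to the size of the attractor, at which point the two runs are effectively unrelated.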

From that point forward, the study of nonlinear systems exploded. During the 1970s mathematicians, physicists, biologists, chemists, physiologists, ecologists… found a way through disorder. One of those scientists was R. May, a theoretical physicist who initially focused his research on superconductivity. However, he was soon captivated by the ideas behind nonlinear equations and their application to population dynamics [42]. In fact, he is considered the founder of theoretical ecology in the 1970s and, as we shall see in chapter 3, his contributions to theoretical epidemiology were also outstanding.

In the following decades many more concepts were added to the body of complex systems: critical phenomena, self-similarity, fractals… It was found that several systems from very different fields exhibited similar properties, such as scale-free distributions [43]. Unfortunately, the greatest strength of complex systems is also their main weakness. They are everywhere, but there is not yet a universal law that can be applied to a wide array of them. For some time, it was proposed that self-organized criticality (critical phenomena arising independently of the initial condition) could be that holy grail [44], but it turned out not to be the case [45]. Having scientists from so many different backgrounds tackling problems set in such a wide array of fields (from molecular biology to economics, urban studies or disordered systems) without common laws tying them together renders even the very definition of a complex system a daunting task. Finding that law, or framework, common to all such systems is still today one of the greatest challenges in the field.

This problem, unifying complex systems, has been addressed from many diverse perspectives. For instance, by the end of the century an international group of ecologists, economists, social scientists and mathematicians collaborated in the “Resilience project” with the objective of deepening the understanding of linked socio-ecological systems. From that 5-year project they developed the concept of panarchy, in an attempt to remove the rigid top-down nature that is associated with hierarchies [46]. Interestingly, they claimed to be inspired by the work by Simon, but did not mention the work published by Anderson in 1972, which presents essentially the same concept in the context of physics. In turn, Anderson did not cite Simon in his article, even though Simon had been discussing the problem of hierarchies since the 1960s [47]. This is a great example of the fragmentation that hampers the development of complex systems.

The view of Anderson in this matter is particularly interesting. He argued that physics in the XX century solved problems that had clear hierarchical levels such as atomic theory, electroweak theory or classical hydrodynamics. Consequently, the XXI century should be devoted to building “generalizations that jump and jumble the hierarchies”. Furthermore, he claimed that, by embracing complexity, the “theorist” will no longer be confined by a modifier specifying “physics”, “biology” or “economics” [37].

An arguably more pessimistic view is presented by Newman in his great resource letter for complex systems. He claims that, since there is not a general theory of complex systems, and one might never arrive, it might be better to talk about “general theories”, as complex systems is not a monolithic body of knowledge. He summarizes this view saying that “complex systems theory is not a novel, but a series of short stories” [48]. In this thesis we will revisit some of those stories.

XX.2 From lattices to networks

Towards the end of the century, however, the initial hype around nonlinear dynamical systems started to decline. It was time to add more details to the models, and consequently more data, although this was a more challenging venture than it might seem. For instance, the common approach of making theoretical predictions and comparing them with experiments is not that straightforward in complex systems, as the very definition of prediction can have very different meanings in chaotic or stochastic systems due to the extreme sensitivity to the initial conditions. Thus, the comparison with data in these systems had to focus initially on extracting universal patterns rather than going into specific details [49]. A great leap forward in this direction was the introduction, in the shape of networks, of the classical graph theory developed in the middle of the XX century.

A holistic view of a system implies that its components are no longer isolated and their interactions have to be properly taken into account. Networks represent a particularly useful tool for such an endeavor. In order to gauge their importance in complex systems, suffice it to say that at the Conference on Complex Systems of 2018, 60% of the more than 400 contributions explicitly mentioned the term “network” in their abstract. However, networks did not originate with complex systems, but long before.

The use of networks as a tool to study other systems, rather than as a mathematical object of their own, began in 1934. The psychologist Jacob Moreno proposed that the evolution of society might be based on some laws, which he wanted to uncover in order to develop better therapies to treat his patients. In order to do so, he proposed to study communities down to their “social atoms”. He then studied the relations between those social atoms, which in his first work were babies and children in a school. This represented a shift from classical sociological and psychological studies, where the attributes of the actors (a generic term used to refer to the element under study in sociology) used to be more important. Furthermore, he represented the actors in his studies with circles and connected them with lines if they had some relation. These diagrams, which he called sociograms, were the first examples of networks [50]. This procedure was mostly forgotten until the 1960s, when it was picked up by sociologists who further developed the theory of networks, with new tools and frameworks⁶.

A particularly interesting example of social research on networks is the well-known study performed by Milgram in 1967 [52]. In his experiment, Milgram sent a package to randomly chosen individuals with the instruction that, if they wanted to participate, they should forward it to an acquaintance who, in their opinion, might know a target person chosen by Milgram, or at least be closer to her. The purpose of the experiment was to determine the number of steps that the package had to take to navigate the social network of the country. Interestingly, he found that the average path length was close to six. These results led to the notions of six degrees of separation and small world that have been part of popular folklore ever since.

Yet, the use of networks remained constrained to the fields of sociology and some areas of mathematics until the end of the century, when research in networks exploded in several fields at the same time. This, however, meant that many advances made over the preceding decades were not widely known, leading to multiple rediscoveries of the same concepts. For instance, the multiplex networks that we will see in chapter 2 were introduced around 2010, although the term multiplex had been coined in 1955 by the anthropologist Max Gluckman during his studies of judicial processes among the Barotse [53]. Similarly, Park and Newman introduced in 2004 [54] the exponential random graph model that had already been developed in 1981 by Holland and Leinhardt [55]. Nonetheless, it should be noted that they acknowledge the work by Holland and Leinhardt in their paper and present a different formulation of the model. In fact, chapter 2 will mostly be devoted to the formulation by Park and Newman which, in turn, is based on the statistical physics framework proposed by Jaynes that we discussed earlier.

The first paper of what we might call the “modern” view of networks was the work by Watts and Strogatz on small-world networks, published in 1998 [56]. In their work, they took three very different networks (the neural network of a worm, the power grid of the western United States and the collaboration network of film actors) and measured their average path length. Surprisingly, they found that all three exhibited the same small-world behavior observed by Milgram 30 years before. Besides, they created a model to explain those networks that interpolated between the well-known lattices and random networks. One may wonder, then, why that paper was so successful if most of its results were, one way or another, already known. The answer, we believe, is data and universality.

Indeed, in sociology most of the networks analyzed were fairly small, as they were collected manually. The spread of the internet in the 1990s, however, allowed scientists to share information in an unprecedented way. Moreover, larger sets of data could be stored and analyzed. The fact that Watts and Strogatz showed that three large systems of completely different nature had the same properties was decisive for the success of their paper. In fact, those networks were not extremely interesting on their own; they selected them “largely because they were available in a format that could easily be put into a computer as the adjacency matrix of a graph” [57]. But it was clearly a good choice. Thanks to that variety, researchers from many different areas saw small-world networks as a Rorschach test, in which every scientist saw different problems depending on their discipline [58].

We can summarize this point of view using the words of Stanley et al. in the seminal paper that started the area of econophysics, “if what we discovered is so straightforward, why was it not done before? [Because] a truly gargantuan amount of data analysis was required” [59].
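The interpolation between lattices and random networks mentioned above can be sketched in a few lines. The code below is a simplified rendition of the small-world construction, written for illustration and not necessarily identical to the procedure in the original paper: nodes are placed on a ring, each one is linked to its k nearest neighbours, and every link is then rewired to a randomly chosen node with probability p. The function names, the network size and the parameter values are hypothetical choices, and the average path length is computed only over pairs of nodes that remain connected.

```python
import random
from collections import deque


def watts_strogatz(n, k, p, seed=0):
    """Ring of n nodes, each linked to its k nearest neighbours (k even);
    every lattice edge is rewired with probability p."""
    rng = random.Random(seed)
    neighbours = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(1, k // 2 + 1):
            neighbours[i].add((i + j) % n)
            neighbours[(i + j) % n].add(i)
    for j in range(1, k // 2 + 1):
        for i in range(n):
            old = (i + j) % n
            if rng.random() < p:
                # Avoid self-loops and duplicate edges.
                candidates = [v for v in range(n)
                              if v != i and v not in neighbours[i]]
                if candidates and old in neighbours[i]:
                    new = rng.choice(candidates)
                    neighbours[i].discard(old)
                    neighbours[old].discard(i)
                    neighbours[i].add(new)
                    neighbours[new].add(i)
    return neighbours


def average_path_length(neighbours):
    """Mean shortest-path length over all connected ordered pairs (BFS)."""
    total, pairs = 0, 0
    for source in neighbours:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in neighbours[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


if __name__ == "__main__":
    for p in (0.0, 0.01, 0.1, 1.0):
        graph = watts_strogatz(n=200, k=4, p=p)
        print(f"p = {p:<5} average path length = {average_path_length(graph):.2f}")
```

Even a small fraction of rewired links is enough to shorten the typical distance between nodes considerably with respect to the regular ring, which is the qualitative behavior associated with the small-world effect.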

XXI The Information Age

Undoubtedly, we live in the information age. To put it into perspective, while the small-world paper mentioned previously used 3 networks, in chapter 2 we will compare 1,326 networks that were collected with just a couple of clicks.

Obtaining meaningful information from high-dimensional and noisy data is not an easy task. To achieve this, the limits of the theoretical framework of statistical physics will have to be extended [60]. A large amount of data also means data from very different sources. Hence, we need to combine temporal and spatial scales and nonlinear effects in the context of out-of-equilibrium systems. Furthermore, it is important not only to extract the information and build appropriate models to increase our knowledge of a given system, but also to develop quantities that might be useful to describe multiple sociotechnical systems at the same time [61].

For instance, in epidemics, data from very different scales, from flight data to physical contact patterns, can be combined with economic and social analysis to produce much more informative spreading models. But mathematical models able to capitalize on such data streams are not available yet [62]. The question of whether the information shared on the internet can be used to track epidemic evolution is also open. Google Flu Trends claimed that it could predict the evolution of flu using search statistics, but it was shown that it was better at predicting winter than diseases [63].

Interestingly though, sociology, one of the areas that would seem to benefit most from having huge amounts of data about human behavior and communication patterns, has not embraced it yet [64]. This is even more striking given that the precursors of sociology, Quetelet and Comte, as previously discussed, believed in the possibility of addressing social systems in a similar fashion to other experimental sciences, i.e., with data.

There is currently a huge debate in sociology about the impact that this amount of data can have on the field itself. For instance, McFarland et al. talk about sociology being subverted to computer science. Their fear is that data might be used only to seek solutions without explaining why. Moreover, they argue that the scientific cultures of the two disciplines are completely different. While computer science is characterized by large collaborations, fast review periods and quick development, sociology is a slower, more “monastic” science, with longer review periods and more theory [65]. But they also observe that it is a new opportunity, as data from behaviors that could not be analyzed before is being collected now. In fact, data about new behaviors, which deserve scientific analysis, is also being collected, as we will see in chapter 4. Their proposal is to move towards a forensic social science in which applied and theory-driven perspectives are merged.

A similar approach is proposed by Halford and Savage, who fear that big data might corner sociology into a defensive position [66]. Instead, they propose to forget about inductive theory and to weave theory together with data, in what they call symphonic social science. Moreover, they believe that the limitations of the data, under proper guidance, can be leveraged. For instance, it is known that most classical psychological experiments are done on WEIRD populations (Western, educated, and from industrialized, rich, democratic countries) [67]. On the other hand, Twitter has a disproportionate number of young, male, black and Hispanic users compared to the national population. Thus, it might offer some insights into groups that are underrepresented in some traditional scenarios.

There are clear signs that the interconnectedness of society is bringing changes into our sociotechnical systems, even if they are not yet understood. For instance, it has been observed that since the appearance of Google Scholar the citation patterns among scholars have changed [68]. Nowadays, older articles are being cited more commonly than before and, at the same time, non-top journals are getting more attention [69]. Still, many sociologists remain unconvinced that the sources of data and methods present something new, or claim that, instead of studying society, the use of data will lead us to study technology. But maybe society and technology cannot be disentangled anymore, and they have to be addressed together [70]. In Castells’ terms, we live in the culture of real virtuality and society is no longer structured around individual actors but around networks [71].

In any case, it is clear that in the XXI century the world will no longer be controlled by those who merely possess the information, but by those who are able to understand it. In Edward O. Wilson’s words, we are drowning in information, while starving for wisdom [72].

¹ Quote extracted from the English translation published by Thurston in 1897 [3].
² See [8] for an overview of the introduction of probability into physics.
³ Auguste Comte, the father of sociology, held a similar view. He proposed that sciences could be arranged in order of generality of their theories and complexity: astronomy, physics, chemistry and physiology. However, there was one type of phenomena yet to be addressed, the “most individual, the most complicated, the most dependent on all others, and therefore […] the latest”, social phenomena. Oddly enough, he coined the term social physics to refer to this new branch of science that had to be affected in part by physiology and, at the same time, by the influence of individuals over each other [18]. However, once he discovered that Quetelet had used the same term, he changed it to sociology [19].
⁴ For a historical review of branching processes in general and their relation to percolation processes see [27].
⁵ However, we need to look elsewhere for the origin of the phrase “more is different” itself. According to Pietronero [36], Anderson confessed that the paper originated from a sort of resentment that physicists in the field of condensed matter had developed towards a certain arrogance of those in the field of elementary particles, who thought that their research was the only true intellectual challenge. At that time the British environmental movement had various slogans such as “small is beautiful” and “more is worse”, from which he drew inspiration.
⁶ See [51] for the history of the development of social network analysis.