Chapter 5 Conclusions

Do you know the saying “The whole is greater than the sum of the parts?”
It is an insane statement. It is a nonsense. But now I believe that it is true.
“Thief of Time”, Terry Pratchett

We began this thesis by showing that science usually advances in small steps rather than in big leaps. This observation holds even more strongly for complex systems, where a unified theory does not yet exist (if it ever will) and all we have is a collection of short stories.

In chapter 2 we focused on one of the most important tools used in complex systems: networks. In particular, we addressed the problem of how to create adequate null models for a network as a function of the availability of data. Recall that the most naïve approach is to use random graphs as a baseline against which to compare the real network. However, this procedure has two main drawbacks. First, the microscopic structure of the network may give rise to higher-order structures that are not present in random graphs. Although this is clearly valuable information, it might fool us into thinking that the system under consideration evolved specifically to create those structures. Instead, they may be just a direct consequence of lower-order properties, which should then be the focus of our research.

The other main issue is that, as networks can be used in a huge number of systems with very diverse characteristics, comparing a real network with a completely random graph might lead us to think that the network is stranger than it actually is. For instance, it has been observed that in friendship networks the number of triangles (i.e., cases in which A and B are friends, B and C are friends, and A and C are friends too) is much larger than in random networks. This is indeed an important property of these networks. Yet, if we were given a new friendship network, we might not be interested in measuring whether the number of triangles is larger than expected at random, because we already know that it probably will be. Instead, it might be more enlightening to check whether that number is higher than in other friendship networks, or than in a null model that reflects the common characteristics of these networks. This example can be extended to many different systems. As a consequence, there is almost one null model for each application that we can think of.
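The triangle comparison described above can be illustrated with a minimal sketch. It is not the exponential random graph formalism used in chapter 2: as a simpler stand-in, it randomizes the network with degree-preserving double-edge swaps and reports a z-score for the triangle count. All function names, the number of swaps, and the number of samples are illustrative assumptions, and nodes are assumed to be integers.

```python
import random
from statistics import mean, pstdev


def edges_to_adj(edges, nodes):
    """Build an undirected adjacency map {node: set(neighbours)}."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def count_triangles(adj):
    """Count each triangle once by enforcing the ordering u < v < w."""
    total = 0
    for u in adj:
        for v in adj[u]:
            if v > u:
                total += sum(1 for w in adj[u] & adj[v] if w > v)
    return total


def degree_preserving_null(edges, n_swaps, rng):
    """Randomize edges with double-edge swaps (a,b),(c,d) -> (a,d),(c,b).

    Every node keeps its degree; swaps that would create a self-loop
    or a repeated edge are rejected.
    """
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:  # consider both orientations of the second edge
            c, d = d, c
        if len({a, b, c, d}) < 4:
            continue
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges


def triangle_zscore(edges, nodes, n_samples=200, seed=0):
    """Compare the observed triangle count with the null ensemble."""
    rng = random.Random(seed)
    observed = count_triangles(edges_to_adj(edges, nodes))
    null_counts = []
    for _ in range(n_samples):
        shuffled = degree_preserving_null(edges, 10 * len(edges), rng)
        null_counts.append(count_triangles(edges_to_adj(shuffled, nodes)))
    mu, sigma = mean(null_counts), pstdev(null_counts)
    z = (observed - mu) / sigma if sigma > 0 else float("nan")
    return observed, mu, z
```

A large positive z-score under this null only says that the triangles are not explained by the degree sequence alone. Following the argument above, a null model that also encodes the typical clustering of friendship networks would be the more informative baseline for a new friendship network.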

When we wanted to study the anomalies present in the betweenness centrality of transportation networks, we could have created a null model that specifically took into account the characteristics of these systems, such as the population living in each municipality, their size, etc. Instead, we chose to follow the framework inspired by Jaynes of seeing statistical physics as a problem of information, which yielded the exponential random graph model. This more general framework allowed us to determine that the observed anomalies were just a consequence of the weight distribution. Hence, rather than wondering why these networks exhibit those anomalies, the focus should be on studying the mechanisms leading to those weight distributions. Furthermore, this study also highlighted the importance of having the proper amount of data.

We concluded the chapter by applying the formalism to create multilayer contact networks using data on both the contact distribution of individuals and their age mixing patterns. We showed how this information can be extracted from real datasets and introduced into the model to generate realistic contact networks. Note that in this case we did not want to create a null model for comparison purposes, but rather to build networks in an unbiased way given the available data. Thus, another advantage of the exponential random graph model is that it unifies several problems into a single framework.

In chapter 3, we focused on the mathematical study of epidemic spreading. We followed the historical development of the field, from the simplest approximation to highly detailed models. Moreover, at each step, we observed the influence of including more data in the models. In the first case, we saw that the challenge of incorporating data is not restricted to the problem of obtaining it; it is also important to be aware of its characteristics. In particular, we saw that the age contact matrices cannot be naïvely applied to any population, as they already implicitly encode some information about it. Thus, if the population changes, the matrices also have to change.
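To make the last point concrete, the following minimal sketch shows why a contact matrix measured in one population cannot simply be reused in another. It assumes that M[i][j] is the mean number of contacts that an individual of age group i has with group j, so that reciprocity requires M[i][j] * N[i] == M[j][i] * N[j]. The rescaling used here is a commonly used density correction, taken as an assumption and not necessarily the method adopted in chapter 3; the function names and the numbers are hypothetical.

```python
def reciprocity_error(M, N):
    """Largest violation of M[i][j] * N[i] == M[j][i] * N[j]."""
    k = len(M)
    return max(abs(M[i][j] * N[i] - M[j][i] * N[j])
               for i in range(k) for j in range(k))


def project_contact_matrix(M, N_old, N_new):
    """Rescale column j by the change in the population share of group j.

    Assumed density correction:
    M'[i][j] = M[i][j] * (N_new[j] / sum(N_new)) / (N_old[j] / sum(N_old))
    """
    tot_old, tot_new = sum(N_old), sum(N_new)
    k = len(M)
    return [[M[i][j] * (N_new[j] / tot_new) / (N_old[j] / tot_old)
             for j in range(k)]
            for i in range(k)]


# Hypothetical population with three age groups.
N_old = [200, 500, 300]
# Symmetric total number of contacts between groups (illustrative values).
C = [[1000, 800, 100],
     [800, 3000, 600],
     [100, 600, 900]]
# Per-capita contact matrix, reciprocal with respect to N_old by construction.
M = [[C[i][j] / N_old[i] for j in range(3)] for i in range(3)]

# The same total population after its age structure has changed.
N_new = [150, 450, 400]
M_new = project_contact_matrix(M, N_old, N_new)

print(reciprocity_error(M, N_new))      # reusing M unchanged breaks reciprocity
print(reciprocity_error(M_new, N_new))  # the rescaled matrix restores it
```

Reusing the original matrix with the new age structure implies that group i has a different total number of contacts with group j than group j has with group i, which is impossible. With this rescaling, reciprocity is recovered whenever the original matrix was reciprocal in the original population.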

Next, we created a highly realistic numerical model for the spreading of influenza-like diseases and showed that common theoretical assumptions might not be good enough to capture the complexity of the process. In particular, we observed that the definition of one of the most important quantities in epidemiology, the basic reproduction number, does not successfully capture the real dynamics of the epidemic. The reason was that the mathematical definition relies on several assumptions that are invalidated by data.

The third study, on the other hand, focused on extending the theoretical models of disease spreading in multilayer networks to the case in which the direction of the links is known. We showed that populations in which the underlying network possesses some directionality are more resilient against an epidemic than those that are completely undirected. Admittedly, information on directionality is much easier to obtain for social systems, thanks to online social platforms. Nevertheless, the basic formulation can be easily adapted to analyze the spreading of information. Hence, our results also imply that platforms in which communication is undirected can play a very different role from those in which it is directed.

Lastly, we analyzed the consequences of using different epidemic models as a function of data availability, with particular emphasis on the networks that we created at the end of chapter 2. We saw that the more data we have, the better, but that for some applications the simplest models with less data can also provide valuable information. In particular, even though knowledge of the underlying network is crucial to determine the epidemic threshold, information on the age structure of the population is essential for the correct definition of risk groups.

To conclude, in chapter 4, we analyzed two examples of collective social behavior using data extracted from very different sources. First, we showed that Hawkes processes can be effectively used for the analysis of online boards such as Forocoches. Furthermore, we were able to distinguish two different types of activity: one that was independent of the rest of the users and another in which the social component of the process was indispensable. We finished the chapter by studying an online crowd event, Twitch Plays Pokémon. We saw that, despite its unique characteristics, some properties of the online crowd were similar to those that offline crowds exhibit, signaling once again that modern societies are intertwined with the online world.
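As a reminder of how a Hawkes process separates these two types of activity, the sketch below simulates a univariate process with intensity λ(t) = μ + Σ α exp(-β (t - t_i)), where the sum runs over past events. The constant background rate μ stands for activity that is independent of other users, and the self-exciting term stands for the social component. The exponential kernel, the parameter values, and the function names are assumptions made for illustration and are not the specification fitted in chapter 4. The simulation uses Ogata's thinning algorithm and requires μ > 0 and α < β.

```python
import math
import random


def intensity(t, events, mu, alpha, beta):
    """Conditional intensity at time t given the strictly earlier events."""
    return mu + sum(alpha * math.exp(-beta * (t - s)) for s in events if s < t)


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate event times on [0, horizon) with Ogata's thinning algorithm."""
    rng = random.Random(seed)
    events = []
    t = 0.0
    while True:
        # The intensity only decays between events, so its value just after
        # t (counting an event at t itself) bounds it until the next event.
        lam_bar = mu + sum(alpha * math.exp(-beta * (t - s)) for s in events)
        t += rng.expovariate(lam_bar)
        if t >= horizon:
            break
        # Accept the candidate with probability intensity(t) / lam_bar.
        if rng.random() * lam_bar <= intensity(t, events, mu, alpha, beta):
            events.append(t)
    return events


# Illustrative parameters: on average, each event triggers
# alpha / beta = 0.8 further events (the branching ratio).
times = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.0, horizon=50.0, seed=42)
print(len(times), "events simulated")
```

Setting alpha to zero recovers a homogeneous Poisson process with rate μ, which corresponds to purely independent activity. The ratio alpha / beta gives the expected number of events directly triggered by each event, and it must be below one for the process to remain stable.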

To sum up, we have overviewed a tiny fraction of the field of complex systems, with special emphasis on the role that new data can have in problems ranging from the most theoretical work to highly realistic computer simulations. We hope that this collection of short stories will show the huge diversity of problems that are still open in the field of complex systems and, at the same time, shed some light on them.

5.1 Future work

There are multiple ways in which the results presented in this thesis can be extended, either to deepen our knowledge of particular systems or to increase our global understanding of complex systems. Some of them have already started, while others are currently just projects.

In chapter 2, we focused on studying the exponential random graph model. Despite its many advantages, it is important to bear in mind that it also has drawbacks. For instance, the computational resources needed to obtain the parameters of the model numerically can be quite large, depending on the size and characteristics of the networks. Furthermore, there are currently many researchers, coming from very different fields, who want to use networks but lack the technical background needed to understand this model. For this reason, one of the next steps will be to summarize the null models found in the literature, systematically studying their advantages and disadvantages. The objective of this work will be to provide a reference for those researchers working on complex systems who might not be used to studying networks and, hence, are not aware of the pitfalls that simple techniques can have.

Regarding chapter 3, note that the main driver of the four studies was data: (1) how to handle data; (2) theory vs. data-driven simulations; (3) improving theories in light of new data; and (4) combining data. Thus, in future work we will continue to explore new data sources, sometimes using them to improve theoretical approaches and at other times to create more realistic simulations in which many different types of data can be combined. For instance, we are currently studying a new dataset that contains information about the daily routines of workers in a hospital, with the objective of devising effective strategies to reduce the spreading of healthcare-associated infections.

Lastly, in chapter 4, we saw two examples of online collective behavior. For the case of Forocoches, there is still a lot of work to do. Regarding its dynamics, we can add non-constant background intensities to better characterize the behavior of threads, or further explore the relation between the success of a thread and its content (or the users that participate in it). Furthermore, the data itself can also be used to study the creation and evolution of memes, as we hinted, or as a complement to the analysis of events that are currently studied mainly using data from Twitter. On the other hand, the work on Twitch Plays Pokémon can be considered, for the moment, closed, although it has sparked some ideas about mimicking the rules of the game in simpler settings in order to perform controlled experiments on the behavior and organization of crowds. Nevertheless, this chapter has also shown that there are many new research opportunities in online systems, sometimes with connections to the “classical” offline world, but at other times with completely different characteristics. We will remain vigilant and, as new data appear and new phenomena are uncovered, we will explore them.