Reproducible science of science at scale: pySciSci

Alexander J. Gates, Albert-László Barabási

Quantitative Science Studies (2023) 4 (3): 700–710

Science of science (SciSci) is a growing field encompassing diverse interdisciplinary research programs that study the processes underlying science. The field has benefited greatly from access to massive digital databases containing the products of scientific discourse—including publications, journals, patents, books, conference proceedings, and grants. The subsequent proliferation of mathematical models and computational techniques for quantifying the dynamics of innovation and success in science has made it difficult to disentangle universal scientific processes from those dependent on specific databases, data-processing decisions, field practices, etc. Here we present pySciSci, a freely available and easily adaptable package for the analysis of large-scale bibliometric data. The pySciSci package standardizes access to many of the most common data sets in SciSci and provides efficient implementations of common and advanced analytical techniques.

The clinical trials puzzle: How network effects limit drug discovery

Kishore Vasan, Deisy Morselli Gysi, Albert-László Barabási

iScience 26, 108361 (2023)

The depth of knowledge offered by post-genomic medicine has carried the promise of new drugs, and cures for multiple diseases. To explore the degree to which this capability has materialized, we extract meta-data from 356,403 clinical trials spanning four decades, aiming to offer mechanistic insights into the innovation practices in drug discovery. We find that convention dominates over innovation, as over 96% of the recorded trials focus on previously tested drug targets, and the tested drugs target only 12% of the human interactome. If current patterns persist, it would take 170 years to target all druggable proteins. We uncover two network-based fundamental mechanisms that currently limit target discovery: preferential attachment, leading to the repeated exploration of previously targeted proteins; and local network effects, limiting exploration to proteins interacting with highly explored proteins. We build on these insights to develop a quantitative network-based model to enhance drug discovery in clinical trials.
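
The preferential attachment mechanism described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' model: the function `simulate_trials`, the parameter `p_new` (the chance that a trial tests a never-before-tested target, set to 0.04 only to loosely echo the 96% figure), and all sizes are hypothetical.

```python
import random
from collections import Counter

def simulate_trials(n_targets, n_trials, p_new=0.04, seed=0):
    """Toy model in which each trial tests exactly one target.

    With probability p_new the trial explores an untested target (if any remain);
    otherwise it reuses a previously tested target, chosen with probability
    proportional to the number of trials that already tested it.
    Assumes n_targets >= 1.
    """
    rng = random.Random(seed)
    counts = Counter()               # target id -> number of trials
    untested = list(range(n_targets))
    rng.shuffle(untested)
    history = []                     # one entry per trial; a uniform draw from it
                                     # is a draw proportional to past trial counts
    for _ in range(n_trials):
        if untested and (not history or rng.random() < p_new):
            target = untested.pop()
        else:
            target = rng.choice(history)
        counts[target] += 1
        history.append(target)
    return counts

counts = simulate_trials(n_targets=1000, n_trials=5000)
print(f"explored {len(counts)} of 1000 hypothetical targets")
print("most-tested targets:", counts.most_common(3))
```

In this sketch a small set of early targets accumulates most trials while the majority of targets stay untested, which is the qualitative pattern the abstract attributes to preferential attachment.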

A network-based normalized impact measure reveals successful periods of scientific discovery across disciplines

Qing Ke, Alexander J. Gates, and Albert-László Barabási

PNAS 120 (48) e2309378120 (2023)

The impact of a scientific publication is often measured by the number of citations it receives from the scientific community. However, citation count is susceptible to well-documented variations in citation practices across time and discipline, limiting our ability to compare different scientific achievements. Previous efforts to account for citation variations often rely on a priori discipline labels of papers, assuming that all papers in a discipline are identical in their subject matter. Here, we propose a network-based methodology to quantify the impact of an article by comparing it with locally comparable research, thereby eliminating the discipline label requirement. We show that the developed measure is not susceptible to discipline bias and follows a universal distribution for all articles published in different years, offering an unbiased indicator for impact across time and discipline. We then use the indicator to identify science-wide high impact research in the past half century and quantify its temporal production dynamics across disciplines, helping us identify breakthroughs from diverse, smaller disciplines, such as geosciences, radiology, and optics, as opposed to citation-rich biomedical sciences. Our work provides insights into the evolution of science and paves the way for fair comparisons of the impact of diverse contributions across many fields.
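
The idea of comparing an article with locally comparable research, rather than with a discipline-wide average, can be sketched as follows. This is an illustration only: the abstract does not give the formula, so the ratio used here (citations divided by the mean citations of a paper's network neighbors), the function name `local_normalized_impact`, and the toy data are all assumptions.

```python
from statistics import mean

def local_normalized_impact(paper, citations, neighbors):
    """Citations of `paper` divided by the mean citations of its comparable papers.

    `neighbors` maps a paper to the papers treated as locally comparable
    (how that set is built is outside this sketch).
    """
    peers = neighbors.get(paper, [])
    if not peers:
        return None                  # no local reference set, score undefined
    baseline = mean(citations[p] for p in peers)
    if baseline == 0:
        return None                  # avoid dividing by zero
    return citations[paper] / baseline

# Hypothetical citation counts: A sits in a low-citation area, D in a high-citation one.
citations = {"A": 40, "B": 10, "C": 30, "D": 400, "E": 800, "F": 200}
neighbors = {"A": ["B", "C"], "D": ["E", "F"]}

score_a = local_normalized_impact("A", citations, neighbors)
score_d = local_normalized_impact("D", citations, neighbors)
print(score_a, score_d)
```

Paper D has ten times the raw citations of paper A, yet A scores higher once each is measured against its own neighborhood, which is the kind of cross-field comparison the abstract aims for.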

Who Supports American Art Museums? Introducing a New Dataset and Data Sources about Museum Funding

Albert-László Barabási, Louis Shekhtman

Panorama: Journal of the Association of Historians of American Art 9, no. 2 (Fall 2023)

“New Scrutiny of Museum Boards Takes Aim at World of Wealth and Status.” “Warren Kanders Quits Whitney Board after Tear Gas Protests.” “Julie Mehretu Becomes Third Artist Ever to Join Whitney Board.” These are all headlines that have run in the New York Times since 2019. Whether condemning how trustees have made their money or celebrating new and diverse perspectives added to boards, they are exemplary of the ways in which the funding of art museums in the United States is, of late, a divisive topic. In many other countries, especially in Europe, governments serve as the main source of support for the arts. In the United States, governmental support largely takes a back seat to funding from private individuals and foundations. Private donors, in particular, play a significant role not only as sources of financial support but also in taking on major governance roles as trustees of institutions.

This funding structure leads to important questions about what roles these donors play in museums and how they influence which works are displayed, institutional priorities, and myriad other issues, in addition to ethical questions about the sources of funds used to support art museums. For all the discussion of this topic, however, there is a paucity of data available to inform the conversation. This essay seeks to start rectifying that by showing the ways in which public tax filings of both museums and foundations that donate to museums (often called institutional donors) can create a dataset that allows scholars and cultural commentators to understand better who funds and governs art institutions in the United States. To supplement the tax data, we also use a corpus of museum annual reports that have been published online.

As network scientists, we often seek to bring large datasets to bear on subjects that may not have previously had significant quantitative data available as part of their analytical toolkit. We came to the topic of museum funding through another project that used crowdsourced data from the LittleSis database to understand how billionaires and their families were connected to a range of not-for-profits, including arts institutions. As figure 1 shows, certain institutions, such as the Museum of Modern Art (MoMA) in New York City and the Kennedy Center in Washington, DC, attract many billionaires, serving as the center of an elite network of wealthy donors, while others, like Pérez Art Museum Miami, are supported by just one billionaire, in this case the billionaire for whom the museum is named. This essay builds on that initial work on studying networks of billionaires and their philanthropic giving by focusing on philanthropic giving to art museums in the United States in particular. In line with Panorama’s focus on American art, we center our attention on the funding of “American art” by using a sample of museums that articulate their support of American art in their mission statements.

Impact of physicality on network structure

Márton Pósfai, Balázs Szegedy, Iva Bačić, Luka Blagojević, Miklós Abért, János Kertész, László Lovász & Albert-László Barabási

Nat. Phys. (2023).

The emergence of detailed maps of physical networks, such as the brain connectome, vascular networks or composite networks in metamaterials, whose nodes and links are physical entities, has demonstrated the limits of the current network science toolset. Link physicality imposes a non-crossing condition that affects both the evolution and the structure of a network, in a way that the adjacency matrix alone—the starting point of all graph-based approaches—cannot capture. Here, we introduce a meta-graph that helps us to discover an exact mapping between linear physical networks and independent sets, which is a central concept in graph theory. The mapping allows us to analytically derive both the onset of physical effects and the emergence of a jamming transition, and to show that physicality affects the network structure even when the total volume of the links is negligible. Finally, we construct the meta-graphs of several real physical networks, which allows us to predict functional features, such as synapse formation in the brain connectome, that agree with empirical data. Overall, our results show that, to understand the evolution and behaviour of real complex networks, the role of physicality must be fully quantified.

Non-Coding RNAs Improve the Predictive Power of Network Medicine

Deisy Morselli Gysi and Albert-László Barabási

PNAS October 31, 2023 120 (45) e2301342120

Network medicine has improved the mechanistic understanding of disease, offering quantitative insights into disease mechanisms, comorbidities, and novel diagnostic tools and therapeutic treatments. Yet, most network-based approaches rely on a comprehensive map of protein–protein interactions (PPI), ignoring interactions mediated by noncoding RNAs (ncRNAs). Here, we systematically combine experimentally confirmed binding interactions mediated by ncRNA with PPI, constructing a comprehensive network of all physical interactions in the human cell. We find that the inclusion of ncRNA expands the number of genes in the interactome by 46% and the number of interactions by 107%, significantly enhancing our ability to identify disease modules. Indeed, we find that 132 diseases lacked a statistically significant disease module in the protein-based interactome but have a statistically significant disease module after inclusion of ncRNA-mediated interactions, making these diseases accessible to the tools of network medicine. We show that the inclusion of ncRNAs helps unveil disease–disease relationships that were not detectable before and expands our ability to predict comorbidity patterns between diseases. Taken together, we find that including noncoding interactions improves both the breadth and the predictive accuracy of network medicine.

Quantifying hierarchy and prestige in US ballet academies as social predictors of career success

Yessica Herrera-Guzmán, Alexander J. Gates, Cristian Candia & Albert-László Barabási

Sci Rep 13, 18594 (2023)

In the recent decade, we have seen major progress in quantifying the behaviors and the impact of scientists, resulting in a quantitative toolset capable of monitoring and predicting the career patterns of the profession. It is unclear, however, if this toolset applies to other creative domains beyond the sciences. In particular, while performance in the arts has long been difficult to quantify objectively, research suggests that professional networks and prestige of affiliations play a similar role to those observed in science, hence they can reveal patterns underlying successful careers. To test this hypothesis, here we focus on ballet, as it allows us to investigate in a quantitative fashion the interplay of individual performance, institutional prestige, and network effects. We analyze data on competition outcomes from 6363 ballet students affiliated with 1603 schools in the United States, who participated in the Youth America Grand Prix (YAGP) between 2000 and 2021. Through multiple logit models and matching experiments, we provide evidence that schools’ strategic network position bridging between communities captures social prestige and predicts the placement of students into jobs in ballet companies. This work reveals the importance of institutional prestige on career success in ballet and showcases the potential of network science approaches to provide quantitative viewpoints for the professional development of careers beyond science.

Network medicine framework reveals generic herb-symptom effectiveness of traditional Chinese medicine

Xiao Gan, Zixin Shu, Xinyan Wang, Dengying Yan, Jun Li, Shany Ofaim, Réka Albert, Xiaodong Li, Baoyan Liu, Xuezhong Zhou, and Albert-László Barabási

Sci. Adv. 9, eadh0215 (2023)

Understanding natural and traditional medicine can lead to world-changing drug discoveries. Despite the therapeutic effectiveness of individual herbs, traditional Chinese medicine (TCM) lacks a scientific foundation and is often considered a myth. In this study, we establish a network medicine framework and reveal the general TCM treatment principle as the topological relationship between disease symptoms and TCM herb targets on the human protein interactome. We find that proteins associated with a symptom form a network module, and the network proximity of an herb’s targets to a symptom module is predictive of the herb’s effectiveness in treating the symptom. These findings are validated using patient data from a hospital. We highlight the translational value of our framework by predicting herb-symptom treatments with therapeutic potential. Our network medicine framework reveals the scientific foundation of TCM and establishes a paradigm for understanding the molecular basis of natural medicine and predicting disease treatments.
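
The notion of network proximity between an herb's targets and a symptom module can be made concrete with a small example. The sketch below uses the "closest" distance often seen in network medicine (the mean, over targets, of the shortest-path distance to the nearest module protein); whether the paper uses exactly this variant is an assumption, and the protein names and toy interactome are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path lengths (in hops) from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def closest_proximity(adj, herb_targets, symptom_module):
    """Mean distance from each herb target to its nearest symptom-module protein.

    Targets that cannot reach the module are skipped; returns infinity
    if no target can reach it.
    """
    total, reachable = 0, 0
    for target in herb_targets:
        dist = bfs_distances(adj, target)
        hops = [dist[s] for s in symptom_module if s in dist]
        if hops:
            total += min(hops)
            reachable += 1
    return total / reachable if reachable else float("inf")

# Toy undirected interactome: a simple chain P1 - P2 - P3 - P4 - P5.
adj = {
    "P1": ["P2"],
    "P2": ["P1", "P3"],
    "P3": ["P2", "P4"],
    "P4": ["P3", "P5"],
    "P5": ["P4"],
}
symptom_module = {"P1", "P2"}
near = closest_proximity(adj, {"P2", "P3"}, symptom_module)   # targets inside or next to the module
far = closest_proximity(adj, {"P4", "P5"}, symptom_module)    # targets several hops away
print(near, far)
```

A lower value means the herb's targets sit closer to the symptom module. In practice such raw distances are usually compared against randomized target sets before being interpreted, a step omitted here.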

Philanthropy in art: locality, donor retention, and prestige

Louis Michael Shekhtman & Albert-László Barabási

Sci Rep 13, 12157 (2023)

A significant portion of funding for art comes from foundations, representing a key revenue stream for most art organizations. Little is known, however, about the quantitative patterns that govern art funding, limiting the fundraising efficiency of organizations in need of resources, as well as optimal funding allocation of donors. To address these shortcomings, here we relied on the IRS e-file dataset to identify $36B in grants from 46,643 foundations to 48,766 art recipients between 2010 and 2019, allowing us to quantify donor-recipient relationships in art. We find that philanthropic giving is broadly distributed, following a stable power-law distribution, indicating that some funders give considerably and predictably more than others. Giving is highly localized, with 60% of grants and funds going to recipients in the donor’s state. Furthermore, donors often support multiple local organizations that offer distinct artforms, rather than advancing a particular subarea within art. Donor retention is strong, with nearly 70% of relationships continuing the next year. Finally, we explored the role of institutional prestige in foundation giving, finding that funding does correlate with prestige, with notable exceptions. Our results present the largest and most comprehensive data-driven exploration of giving by foundations to art to date, unveiling multiple insights that could benefit both donors and recipients.
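
Donor retention, as described above, is the share of donor-recipient relationships that continue into the next year. A minimal sketch of that calculation follows; the record layout `(donor, recipient, year)`, the function name `retention_rate`, and the sample grants are hypothetical and not taken from the IRS e-file dataset.

```python
def retention_rate(grants, year):
    """Share of donor-recipient pairs active in `year` that are also active in `year + 1`."""
    current = {(donor, recipient) for donor, recipient, y in grants if y == year}
    following = {(donor, recipient) for donor, recipient, y in grants if y == year + 1}
    if not current:
        return None                  # no relationships to retain
    return len(current & following) / len(current)

# Hypothetical grant records: (foundation, art organization, tax year).
grants = [
    ("F1", "M1", 2018),
    ("F1", "M2", 2018),
    ("F2", "M1", 2018),
    ("F1", "M1", 2019),
    ("F2", "M1", 2019),
    ("F3", "M3", 2019),
]
rate_2018 = retention_rate(grants, 2018)
print(f"retention from 2018 to 2019: {rate_2018:.0%}")
```

Pairs are deduplicated with sets, so several grants between the same foundation and recipient in one year count as a single relationship.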

Machine learning prediction of the degree of food processing

Giulia Menichetti, Babak Ravandi, Dariush Mozaffarian & Albert-László Barabási

Nature Communications 14, 2312 (2023)

Despite the accumulating evidence that increased consumption of ultra-processed food has adverse health implications, it remains difficult to decide what constitutes processed food. Indeed, the current processing-based classification of food has limited coverage and does not differentiate between degrees of processing, hindering consumer choices and slowing research on the health implications of processed food. Here we introduce a machine learning algorithm that accurately predicts the degree of processing for any food, indicating that over 73% of the US food supply is ultra-processed. We show that the increased reliance of an individual’s diet on ultra-processed food correlates with higher risk of metabolic syndrome, diabetes, angina, elevated blood pressure and biological age, and reduces the bio-availability of vitamins. Finally, we find that replacing foods with less processed alternatives can significantly reduce the health implications of ultra-processed food, suggesting that access to information on the degree of processing, currently unavailable to consumers, could improve population health.

Improving the generalizability of protein-ligand binding predictions with AI-Bind

Ayan Chatterjee, Robin Walters, Zohair Shafi, Omair Shafi Ahmed, Michael Sebek, Deisy Gysi, Rose Yu, Tina Eliassi-Rad, Albert-László Barabási & Giulia Menichetti

Nature Communications 14, 1989 (2023)

Identifying novel drug-target interactions is a critical and rate-limiting step in drug discovery. While deep learning models have been proposed to accelerate the identification process, here we show that state-of-the-art models fail to generalize to novel (i.e., never-before-seen) structures. We unveil the mechanisms responsible for this shortcoming, demonstrating how models rely on shortcuts that leverage the topology of the protein-ligand bipartite network, rather than learning the node features. Here we introduce AI-Bind, a pipeline that combines network-based sampling strategies with unsupervised pre-training to improve binding predictions for novel proteins and ligands. We validate AI-Bind predictions via docking simulations and comparison with recent experimental evidence, and step up the process of interpreting machine learning prediction of protein-ligand binding by identifying potential active binding sites on the amino acid sequence. AI-Bind is a high-throughput approach to identify drug-target combinations with the potential of becoming a powerful tool in drug discovery.

Accelerating network layouts using graph neural networks

Csaba Both, Nima Dehmamy, Rose Yu & Albert-László Barabási

Nature Communications 14, 1560 (2023)

Graph layout algorithms used in network visualization represent the first and the most widely used tool to unveil the inner structure and the behavior of complex networks. Current network visualization software relies on the force directed layout (FDL) algorithm, whose high computational complexity makes the visualization of large real networks computationally prohibitive and traps large graphs into high energy configurations, resulting in hard-to-interpret “hairball” layouts. Here we use Graph Neural Networks (GNN) to accelerate FDL, showing that deep learning can address both limitations of FDL: it offers a 10 to 100 fold improvement in speed while also yielding layouts which are more informative. We analytically derive the speedup offered by GNN, relating it to the number of outliers in the eigenspectrum of the adjacency matrix, predicting that GNNs are particularly effective for networks with communities and local regularities. Finally, we use GNN to generate a three-dimensional layout of the Internet, and introduce additional measures to assess the layout quality and its interpretability, exploring the algorithm’s ability to separate communities and the link-length distribution. The novel use of deep neural networks can help accelerate other network-based optimization problems as well, with applications from reaction-diffusion systems to epidemics.
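
To show where the cost of force-directed layout comes from, here is a plain gradient-descent version of it. This is a baseline sketch only, not the GNN-accelerated method of the paper: the energy (unit springs on links plus 1/d repulsion between all node pairs), the step-halving rule, and the function names are assumptions chosen for brevity.

```python
import math
import random

def energy(pos, edges):
    """Spring energy on links plus 1/d repulsion over all node pairs (O(n^2))."""
    nodes = list(pos)
    total = sum(0.5 * math.dist(pos[u], pos[v]) ** 2 for u, v in edges)
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            total += 1.0 / max(math.dist(pos[u], pos[v]), 1e-9)
    return total

def gradient(pos, edges):
    """Gradient of `energy` with respect to every node coordinate."""
    grad = {u: [0.0, 0.0] for u in pos}
    for u, v in edges:
        for k in range(2):
            diff = pos[u][k] - pos[v][k]
            grad[u][k] += diff
            grad[v][k] -= diff
    nodes = list(pos)
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            d = max(math.dist(pos[u], pos[v]), 1e-9)
            for k in range(2):
                push = (pos[u][k] - pos[v][k]) / d ** 3
                grad[u][k] -= push
                grad[v][k] += push
    return grad

def layout(nodes, edges, steps=200, seed=0):
    """Gradient descent that accepts a step only if it lowers the energy."""
    rng = random.Random(seed)
    pos = {u: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for u in nodes}
    step, current = 0.1, energy(pos, edges)
    for _ in range(steps):
        grad = gradient(pos, edges)
        trial = {u: [pos[u][k] - step * grad[u][k] for k in range(2)] for u in pos}
        trial_energy = energy(trial, edges)
        if trial_energy < current:
            pos, current = trial, trial_energy
        else:
            step *= 0.5              # overshoot: retry with a smaller step
    return pos, current

nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d")]
_, initial_energy = layout(nodes, edges, steps=0)
positions, final_energy = layout(nodes, edges, steps=200)
print(f"energy: {initial_energy:.3f} -> {final_energy:.3f}")
```

Every iteration visits all node pairs, so the cost per step grows quadratically with network size, and descent from a random start can settle in a poor local minimum. These are the two limitations the abstract says the GNN approach addresses.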

Genomics and phenomics of body mass index reveals a complex disease network

Huang, J., Huffman, JE., Huang, Y., Do Valle, I., Assimes, TL., Raghavan, S., Voight, B.F., Liu, C., Barabasi, A.-L., Huang, RDL., Hui, Q., Nguyen, X-M T., Ho, Y.-L., Djousse, L., Lynch, J.A., Vujkovic, M., Tcheandjieu, C., Tang, H., Damrauer, SM., Reaven, P.D., Miller, D., Phillips, L.S., Ng, MCY., Graff, M., Haiman, C.A., Loos, RJF., North, KE., Yengo, L., Smith, GD., Saleheen, D., Gaziano, JM., Rader, DJ., Tsao, PS., Cho, K., Chang, K-M., Wilson, PWF., VA Million Veteran Program, Sun, Y.V., O’Donnell, CJ.

Nature Communications 13, 7973 (2022)

Elevated body mass index (BMI) is heritable and associated with many health conditions that impact morbidity and mortality. The study of the genetic association of BMI across a broad range of common disease conditions offers the opportunity to extend current knowledge regarding the breadth and depth of adiposity-related diseases. We identify 906 (364 novel) and 41 (6 novel) genome-wide significant loci for BMI among participants of European (N~1.1 million) and African (N~100,000) ancestry, respectively. Using a BMI genetic risk score including 2446 variants, 316 diagnoses are associated in the Million Veteran Program, with 96.5% showing increased risk. A co-morbidity network analysis reveals seven disease communities containing multiple interconnected diseases associated with BMI as well as extensive connections across communities. Mendelian randomization analysis confirms numerous phenotypes across a breadth of organ systems, including conditions of the circulatory (heart failure, ischemic heart disease, atrial fibrillation), genitourinary (chronic renal failure), respiratory (respiratory failure, asthma), musculoskeletal and dermatologic systems that are deeply interconnected within and across the disease communities. This work shows that the complex genetic architecture of BMI associates with a broad range of major health conditions, supporting the need for comprehensive approaches to prevent and treat obesity.

Research gaps and opportunities in precision nutrition: an NIH workshop report

Bruce Y Lee, José M Ordovás, Elizabeth J Parks, Cheryl A M Anderson, Albert-László Barabási, Steven K Clinton, Kayla de la Haye, Valerie B Duffy, Paul W Franks, Elizabeth M Ginexi, Kristian J Hammond, Erin C Hanlon, Michael Hittle, Emily Ho, Abigail L Horn, Richard S Isaacson, Patricia L Mabry, Susan Malone, Corby K Martin, Josiemer Mattei, Simin Nikbin Meydani, Lorene M Nelson, Marian L Neuhouser, Brendan Parent, Nicolaas P Pronk, Helen M Roche, Suchi Saria, Frank A J L Scheer, Eran Segal, Mary Ann Sevick, Tim D Spector, Linda Van Horn, Krista A Varady, Venkata Saroja Voruganti, Marie F Martinez

The American Journal of Clinical Nutrition 116, 6 (2022)

Precision nutrition is an emerging concept that aims to develop nutrition recommendations tailored to different people's circumstances and biological characteristics. Responses to dietary change and the resulting health outcomes from consuming different diets may vary significantly between people based on interactions between their genetic backgrounds, physiology, microbiome, underlying health status, behaviors, social influences, and environmental exposures. On 11–12 January 2021, the National Institutes of Health convened a workshop entitled “Precision Nutrition: Research Gaps and Opportunities” to bring together experts to discuss the issues involved in better understanding and addressing precision nutrition. The workshop proceeded in 3 parts: part I covered many aspects of genetics and physiology that mediate the links between nutrient intake and health conditions such as cardiovascular disease, Alzheimer disease, and cancer; part II reviewed potential contributors to interindividual variability in dietary exposures and responses such as baseline nutritional status, circadian rhythm/sleep, environmental exposures, sensory properties of food, stress, inflammation, and the social determinants of health; part III presented the need for systems approaches, with new methods and technologies that can facilitate the study and implementation of precision nutrition, and workforce development needed to create a new generation of researchers. The workshop concluded that much research will be needed before more precise nutrition recommendations can be achieved. This includes better understanding and accounting for variables such as age, sex, ethnicity, medical history, genetics, and social and environmental factors. The advent of new methods and technologies and the availability of considerably more data bring tremendous opportunity. 
However, the field must proceed with appropriate levels of caution and make sure the factors listed above are all considered, and systems approaches and methods are incorporated. It will be important to develop and train an expanded workforce with the goal of reducing health disparities and improving precision nutritional advice for all Americans.

Fragmentation of outage clusters during the recovery of power distribution grids

H Wu, X Meng, MM Danziger, SP Cornelius, H Tian, AL Barabási.

Nature Communications 13, 7372 (2022)

The understanding of recovery processes in power distribution grids is limited by the lack of realistic outage data, especially large-scale blackout datasets. By analyzing data from three electrical companies across the United States, we find that the recovery duration of an outage is connected with the downtime of its nearby outages and blackout intensity (defined as the peak number of outages during a blackout), but is independent of the number of customers affected. We present a cluster-based recovery framework to analytically characterize the dependence between outages, and interpret the dominant role blackout intensity plays in recovery. The recovery of blackouts is not random and has a universal pattern that is independent of the disruption cause, the post-disaster network structure, and the detailed repair strategy. Our study reveals that suppressing blackout intensity is a promising way to speed up restoration.
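
Blackout intensity is defined above as the peak number of outages during a blackout. Given outage start and restoration times, that peak can be computed with a single sweep over the sorted events. The sketch below assumes each outage is a half-open interval `(start, end)` in arbitrary time units; the function name and the sample data are hypothetical.

```python
def blackout_intensity(outages):
    """Peak number of simultaneously active outages.

    Each outage is a (start, end) pair and is treated as active on [start, end).
    """
    events = []
    for start, end in outages:
        events.append((start, 1))    # outage begins
        events.append((end, -1))     # outage is restored
    # Sort by time; at equal times, restorations (-1) come before new outages (+1).
    events.sort()
    active = peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak

# Hypothetical outage log, in hours since the start of the event.
outages = [(0, 5), (1, 3), (2, 6), (5, 8), (7, 9)]
intensity = blackout_intensity(outages)
print("blackout intensity:", intensity)
```

Sorting dominates the cost, so the calculation runs in O(n log n) time for n outages.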

Maximizing Brain Networks engagement via Individualized Connectome-wide Target Search

Menardi, A., Momi, D., Vallesi, A., Barabasi, A.-L., Towlson, E.K., Santarnecchi, E.

Brain Stimulation 15, 6 (2022)


In recent years, the possibility to noninvasively interact with the human brain has led to unprecedented diagnostic and therapeutic opportunities. However, the vast majority of approved interventions and approaches still rely on anatomical landmarks and rarely on the individual structure of networks in the brain, drastically reducing the potential efficacy of neuromodulation.

Here we implemented a target search algorithm leveraging mathematical tools from Network Control Theory (NCT) and whole brain connectomics analysis. By means of computational simulations, we aimed to identify the optimal stimulation target(s), at the individual brain level, capable of reaching maximal engagement of the stimulated networks’ nodes.

At the model level, in silico predictions suggest that stimulation of NCT-derived cerebral sites might induce significantly higher network engagement, compared to traditionally employed neuromodulation sites, demonstrating NCT to be a useful tool in guiding brain stimulation. Indeed, NCT allows us to computationally model different stimulation scenarios tailored to the individual structural connectivity profiles and initial brain states.

The use of NCT to computationally predict TMS pulse propagation suggests that individualized targeting is crucial for more successful network engagement. Future studies will be needed to verify such prediction in real stimulation scenarios.

MilkyBase, a database of human milk composition as a function of maternal-, infant- and measurement conditions

Tünde Pacza, Mayara L. Martins, Maha Rockaya, Katalin Müller, Ayan Chatterjee, Albert-László Barabási & József Baranyi

Scientific Data 9, 557 (2022)

This study describes the development of a database, called MilkyBase, of the biochemical composition of human milk. The data were selected, digitized and curated partly by machine-learning, partly manually from publications. The database can be used to find patterns in the milk composition as a function of maternal-, infant- and measurement conditions and as a platform for users to put their own data in the format shown here. The database is an Excel workbook of linked sheets, making it easy to input data by non-computationally minded nutritionists. The hierarchical organisation of the fields makes sure that statistical inference methods can be programmed to analyse the data. Uncertainty quantification and recording dynamic (time-dependent) compositions offer predictive potentials.

Visualizing Novel Connections and Genetic Similarities Across Diseases Using a Network Medicine Based Approach

Ferolito, B., Do Valle, I.F., Gerlovin, H., Costa, L., Casas, JP, Gaziano, J.M., Gagnon, D.R., Begoli, E. B., Barabasi, A.-L., Cho, K.

Scientific Reports 12, 14914 (2022)

Understanding the genetic relationships between human disorders could lead to better treatment and prevention strategies, especially for individuals with multiple comorbidities. A common resource for studying genetic-disease relationships is the GWAS Catalog, a large and well curated repository of SNP-trait associations from various studies and populations. Some of these populations are contained within mega-biobanks such as the Million Veteran Program (MVP), which has enabled the genetic classification of several diseases in a large well-characterized and heterogeneous population. Here we aim to provide a network of the genetic relationships among diseases and to demonstrate the utility of quantifying the extent to which a given resource such as MVP has contributed to the discovery of such relations. We use a network-based approach to evaluate shared variants among thousands of traits in the GWAS Catalog repository. Our results indicate many more novel disease relationships that did not exist in early studies and demonstrate that the network can reveal clusters of diseases mechanistically related. Finally, we show novel disease connections that emerge when MVP data is included, highlighting methodology that can be used to indicate the contributions of a given biobank.
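
The network-based approach of linking traits through shared variants can be sketched in a few lines. The example assumes the input is a mapping from trait to its set of associated variants and links two traits when they share at least `min_shared` variants; the function name, the threshold, and the placeholder trait and variant labels are hypothetical and do not come from the GWAS Catalog.

```python
from itertools import combinations

def shared_variant_network(trait_to_snps, min_shared=1):
    """Return {(trait_a, trait_b): number of shared variants} for linked trait pairs."""
    edges = {}
    for a, b in combinations(sorted(trait_to_snps), 2):
        shared = trait_to_snps[a] & trait_to_snps[b]
        if len(shared) >= min_shared:
            edges[(a, b)] = len(shared)
    return edges

# Placeholder associations; real input would be SNP-trait pairs from a catalog.
trait_to_snps = {
    "trait_1": {"snp_a", "snp_b", "snp_c"},
    "trait_2": {"snp_b", "snp_c"},
    "trait_3": {"snp_d"},
    "trait_4": {"snp_a", "snp_d"},
}
network = shared_variant_network(trait_to_snps)
for (a, b), weight in network.items():
    print(f"{a} -- {b} (shared variants: {weight})")
```

Running the function once without and once with the associations contributed by a given biobank, then comparing the two edge sets, is one simple way to see which connections that resource adds.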

Identification of potent inhibitors of SARS-CoV-2 infection by combined pharmacological evaluation and cellular network prioritization

J.J. Patten, Patrick T. Keiser, Deisy Morselli-Gysi, Giulia Menichetti, Hiroyuki Mori, Callie J. Donahue, Xiao Gan, Italo do Valle, Kathleen Geoghegan-Barek, Manu Anantpadma, RuthMabel Boytz, Jacob L. Berrigan, Sarah H. Stubbs, Tess Ayazika, Colin O’Leary, Sallieu Jalloh, Florence Wagner, Seyoum Ayehunie, Stephen J. Elledge, Deborah Anderson, Joseph Loscalzo, Marinka Zitnik, Suryaram Gummuluru, Mark N. Namchuk, Albert-László Barabási and Robert A. Davey

iScience 25, 9 (2022)

Pharmacologically active compounds with known biological targets were evaluated for inhibition of SARS-CoV-2 infection in cell and tissue models to help identify potent classes of active small molecules and to better understand host-virus interactions. We evaluated 6,710 clinical and preclinical compounds targeting 2,183 host proteins by immunocytofluorescence-based screening to identify SARS-CoV-2 infection inhibitors. Computationally integrating relationships between small molecule structure, dose-response antiviral activity, host target, and cell interactome produced cellular networks important for infection. This analysis revealed 389 small molecules with micromolar to low nanomolar activities, representing >12 scaffold classes and 813 host targets. Representatives were evaluated for mechanism of action in stable and primary human cell models with SARS-CoV-2 variants and MERS-CoV. One promising candidate, obatoclax, significantly reduced SARS-CoV-2 viral lung load in mice. Ultimately, this work establishes a rigorous approach for future pharmacological and computational identification of host factor dependencies and treatments for viral diseases.

Network-medicine framework for studying disease trajectories in U.S. veterans

Do Valle, I.F., Ferolito, B., Gerlovin, H., Costa, L., Demissie, S., Linares, F., Cohen, J., Gagnon, D.R., Gaziano, J.M., Begoli, E., Cho, K., Barabasi, A.-L.

Scientific Reports 12, 12018 (2022)

A better understanding of the sequential and temporal aspects in which diseases occur in patients' lives is essential for developing improved intervention strategies that reduce burden and increase the quality of health services. Here we present a network-based framework to study disease relationships using Electronic Health Records from > 9 million patients in the United States Veterans Health Administration (VHA) system. We create the Temporal Disease Network, which maps the sequential aspects of disease co-occurrence among patients, and demonstrate that network properties reflect clinical aspects of the respective diseases. We use the Temporal Disease Network to identify disease groups that reflect patterns of disease co-occurrence and the flow of patients among diagnoses. Finally, we define a strategy for the identification of trajectories that lead from one disease to another. The framework presented here has the potential to offer new insights for disease treatment and prevention in large health care systems.

Nutrient concentrations in food display universal behaviour

Giulia Menichetti and Albert-László Barabási

Nature Food 3, 375–382 (2022)

Extensive programmes around the world endeavour to measure and catalogue the composition of food. Here we analyse the nutrient content of the full US food supply and show that the concentration of each nutrient follows a universal single-parameter scaling law that accurately captures the eight orders of magnitude in nutrient content variability. We show that the universality is rooted in the biochemical constraints obeyed by the metabolic pathways responsible for nutrient modulation, allowing us to confirm the empirically observed scaling law and to predict its variability in agreement with the data. We propose that the natural nutrient variability in food can be quantitatively formalized. This provides a mathematical rationale for imputing missing values in food composition databases and paves the way towards a quantitative understanding of the impact of food processing on nutrient balance and health effects.

Dynamics of ranking

Gerardo Iñiguez, Carlos Pineda, Carlos Gershenson, & Albert-László Barabási

Nature Communications 13, 1646 (2022)

Virtually anything can be and is ranked: people, institutions, countries, words, genes. Rankings reduce complex systems to ordered lists, reflecting the ability of their elements to perform relevant functions, and are being used from socioeconomic policy to knowledge extraction. A century of research has found regularities when temporal rank data is aggregated. Far less is known, however, about how rankings change in time. Here we explore the dynamics of 30 rankings in natural, social, economic, and infrastructural systems, comprising millions of elements and timescales from minutes to centuries. We find that the flux of new elements determines the stability of a ranking: for high flux only the top of the list is stable, otherwise top and bottom are equally stable. We show that two basic mechanisms, displacement and replacement of elements, capture empirical ranking dynamics. The model uncovers two regimes of behavior: fast and large rank changes, or slow diffusion. Our results indicate that the balance between robustness and adaptability in ranked systems might be governed by simple random processes irrespective of system details.

Recovery coupling in multilayer networks

Michael M. Danziger & Albert-László Barabási

Nature Communications 13, 955 (2022)

The increased complexity of infrastructure systems has resulted in critical interdependencies between multiple networks—communication systems require electricity, while the normal functioning of the power grid relies on communication systems. These interdependencies have inspired an extensive literature on coupled multilayer networks, assuming a hard interdependence, where a component failure in one network causes failures in the other network, resulting in a cascade of failures across multiple systems. While empirical evidence of such hard failures is limited, the repair and recovery of a network requires resources typically supplied by other networks, resulting in documented interdependencies induced by the recovery process. In this work, we explore recovery coupling, capturing the dependence of the recovery of one system on the instantaneous functional state of another system. If the support networks are not functional, recovery will be slowed. Here we collected data on the recovery time of millions of power grid failures, finding evidence of universal nonlinear behavior in recovery following large perturbations. We develop a theoretical framework to address recovery coupling, predicting quantitative signatures different from the multilayer cascading failures. We then rely on controlled natural experiments to separate the role of recovery coupling from other effects like resource limitations, offering direct evidence of how recovery coupling affects a system’s functionality.

Quantifying NFT‑driven networks in crypto art

Kishore Vasan, Milán Janosov & Albert‑László Barabási

Scientific Reports 12, 2769 (2022)

The evolution of the art ecosystem is driven by largely invisible networks, defined by undocumented interactions between artists, institutions, collectors and curators. The emergence of cryptoart, and the NFT-based digital marketplace around it, offers unprecedented opportunities to examine the mechanisms that shape the evolution of networks that define artistic practice. Here we mapped the Foundation platform, identifying over 48,000 artworks through the associated NFTs listed by over 15,000 artists, allowing us to characterize the patterns that govern the networks that shape artistic success. We find that NFT adoption by both artists and collectors has undergone major changes, starting with a rapid growth that peaked in March 2021 and the emergence of a new equilibrium in June. Despite significant changes in activity, the average price of the sold art remained largely unchanged, with the price of an artist’s work fluctuating in a range that determines his or her reputation. The artist invitation network offers evidence of rich and poor artist clusters, driven by homophily, indicating that the newly invited artists develop similar engagement and sales patterns as the artist who invited them. We find that successful artists receive disproportional, repeated investment from a small group of collectors, underscoring the importance of artist–collector ties in the digital marketplace. These reproducible patterns allow us to characterize the features, mechanisms, and the networks enabling the success of individual artists, a quantification necessary to better understand the emerging NFT ecosystem.

Network medicine framework for identifying drug-repurposing opportunities for COVID-19

Deisy Morselli Gysi, Ítalo do Valle, Marinka Zitnik, Asher Ameli, Xiao Gan, Onur Varol, Susan Dina Ghiassian, J. J. Patten, Robert A. Davey, Joseph Loscalzo, and Albert-László Barabási

PNAS May 11, 2021 118 (19) e2025581118

The COVID-19 pandemic has highlighted the need to quickly and reliably prioritize clinically approved compounds for their potential effectiveness for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infections. Here, we deployed algorithms relying on artificial intelligence, network diffusion, and network proximity, tasking each of them to rank 6,340 drugs for their expected efficacy against SARS-CoV-2. To test the predictions, we used as ground truth 918 drugs experimentally screened in VeroE6 cells, as well as the list of drugs in clinical trials that capture the medical community’s assessment of drugs with potential COVID-19 efficacy. We find that no single predictive algorithm offers consistently reliable outcomes across all datasets and metrics. This outcome prompted us to develop a multimodal technology that fuses the predictions of all algorithms, finding that a consensus among the different predictive methods consistently exceeds the performance of the best individual pipelines. We screened in human cells the top-ranked drugs, obtaining a 62% success rate, in contrast to the 0.8% hit rate of nonguided screenings. Of the six drugs that reduced viral infection, four could be directly repurposed to treat COVID-19, proposing novel treatments for COVID-19. We also found that 76 of the 77 drugs that successfully reduced viral infection do not bind the proteins targeted by SARS-CoV-2, indicating that these network drugs rely on network-based mechanisms that cannot be identified using docking-based strategies. These advances offer a methodological pathway to identify repurposable drugs for future pathogens and neglected diseases underserved by the costs and extended timeline of de novo drug development.
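
The abstract does not state how the individual predictions are fused, so the sketch below uses a plain mean-rank (Borda-style) aggregation as a stand-in. The function name and the toy drug labels are hypothetical, and the code assumes every pipeline ranks exactly the same set of items.

```python
def consensus_ranking(rankings):
    """Order items by their mean rank position across pipelines.

    `rankings` is a list of lists, each an ordering of the same items
    from best (position 1) to worst. Ties are broken alphabetically.
    Mean-rank fusion is an assumption, not the paper's stated method.
    """
    items = set(rankings[0])
    rank_sum = {item: 0 for item in items}
    for ranking in rankings:
        for position, item in enumerate(ranking, start=1):
            rank_sum[item] += position
    return sorted(items, key=lambda item: (rank_sum[item] / len(rankings), item))


# Three hypothetical pipelines ranking three placeholder drugs.
pipelines = [
    ["d1", "d2", "d3"],
    ["d2", "d1", "d3"],
    ["d2", "d3", "d1"],
]
fused = consensus_ranking(pipelines)
```

Here `d2` comes first because its mean rank (4/3) is lower than that of `d1` (2) and `d3` (8/3), even though one pipeline placed `d1` on top.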

Network medicine framework shows that proximity of polyphenol targets and disease proteins predicts therapeutic effects of polyphenols

Italo F. do Valle, Harvey G. Roweth, Michael W. Malloy, Sofia Moco, Denis Barron, Elisabeth Battinelli, Joseph Loscalzo & Albert-László Barabási

Nature Food 2, 143–155 (2021)

Polyphenols, natural products present in plant-based foods, play a protective role against several complex diseases through their antioxidant activity and by diverse molecular mechanisms. Here we develop a network medicine framework to uncover mechanisms for the effects of polyphenols on health by considering the molecular interactions between polyphenol protein targets and proteins associated with diseases. We find that the protein targets of polyphenols cluster in specific neighbourhoods of the human interactome, whose network proximity to disease proteins is predictive of the molecule’s known therapeutic effects. The methodology recovers known associations, such as the effect of epigallocatechin-3-O-gallate on type 2 diabetes, and predicts that rosmarinic acid has a direct impact on platelet function, representing a novel mechanism through which it could affect cardiovascular health. We experimentally confirm that rosmarinic acid inhibits platelet aggregation and α-granule secretion through inhibition of protein tyrosine phosphorylation, offering direct support for the predicted molecular mechanism. Our framework represents a starting point for mechanistic interpretation of the health effects underlying food-related compounds, allowing us to integrate into a predictive framework knowledge on food metabolism, bioavailability and drug interaction.
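
One simple way to picture network proximity is the average shortest-path distance from each target to its closest disease protein. The sketch below implements that "closest" measure on an unweighted graph. The choice of this particular measure, the function names, and the toy graph are assumptions made for illustration. The abstract does not give the formula, and a full analysis would typically also compare the value against a randomized baseline.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closest_proximity(edges, targets, disease_proteins):
    """Mean distance from each target to its nearest disease protein.

    Assumes every target can reach at least one disease protein;
    otherwise min() raises ValueError.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    total = 0
    for target in targets:
        dist = bfs_distances(adj, target)
        total += min(dist[p] for p in disease_proteins if p in dist)
    return total / len(targets)


# Toy path graph a-b-c-d-e with hypothetical node labels.
toy_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
proximity = closest_proximity(toy_edges, ["a", "c"], ["d", "e"])
```

In the toy graph, target `a` is three hops from its nearest disease protein and target `c` is one hop away, so the mean is 2.0.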

A wealth of discovery built on the Human Genome Project — by the numbers

Alexander J. Gates, Deisy Morselli Gysi, Manolis Kellis & Albert-László Barabási

Nature 590, 212-215 (2021)

The 20th anniversary of the publication of the first draft of the human genome offers an opportunity to track how the project has empowered research into the genetic roots of human disease, changed drug discovery and helped to revise the idea of the gene itself.

Here we distill these impacts and trends. We combined several data sets to quantify the different types of genetic element that have been discovered and that generated publications, and how the pattern of discovery and publishing has changed over the years. Our analysis linked together data including RNA transcripts; around 1 million single nucleotide polymorphisms (SNPs); human diseases with documented genetic roots; approved and experimental pharmaceuticals; and scientific publications between 1900 and 2017.

Social network structure and composition in former NFL football players

Amar Dhand, Liam McCafferty, Rachel Grashow, Ian M. Corbin, Sarah Cohan, Alicia J. Whittington, Ann Connor, Aaron Baggish, Mark Weisskopf, Ross Zafonte, Alvaro Pascual-Leone & Albert-László Barabási

Scientific Reports 11, 1630 (2021)

Social networks have broad effects on health and quality of life. Biopsychosocial factors may also modify the effects of brain trauma on clinical and pathological outcomes. However, social network characterization is missing in studies of contact sports athletes. Here, we characterized the personal social networks of former National Football League players compared to non-football US males. In 303 former football players and 269 US males, we found that network structure (e.g., network size) did not differ, but network composition (e.g., proportion of family versus friends) did differ. Football players had more men than women, and more friends than family in their networks compared to US males. Black players had more racially diverse networks than White players and US males. These results are unexpected because brain trauma and chronic illnesses typically cause diminished social relationships. We anticipate our study will inform more multi-dimensional study of, and treatment options for, contact sports athletes. For example, the strong allegiances of former athletes may be harnessed in the form of social network interventions after brain trauma. Because preserving health of contact sports athletes is a major goal, the study of social networks is critical to the design of future research and treatment trials.

Uncovering the genetic blueprint of the C. elegans nervous system

István A. Kovács, Dániel L. Barabási, and Albert-László Barabási

PNAS December 29, 2020 117 (52) 33570-33577

A fundamental question of neuroscience is how the brain wires itself. Here, we propose a modeling framework that explains how cellular connectivity emerges from neuronal identity, allowing us to offer experimentally falsifiable predictions on the genetic encoding of the connectome. The rapid advances in brain science require quantitative frameworks to integrate genetic and connectome information. The proposed model responds to this need, helping us unveil the genetically driven mechanisms that govern the formation of individual links in the brain.

A systematic comprehensive longitudinal evaluation of dietary factors associated with acute myocardial infarction and fatal coronary heart disease

Soodabeh Milanlouei, Giulia Menichetti, Yanping Li, Joseph Loscalzo, Walter C. Willett & Albert-László Barabási

Nature Communications 11, 6074 (2020)

Environmental factors, and in particular diet, are known to play a key role in the development of Coronary Heart Disease. Many of these factors were unveiled by detailed nutritional epidemiology studies, focusing on the role of a single nutrient or food at a time. Here, we apply an Environment-Wide Association Study approach to Nurses’ Health Study data to explore comprehensively and agnostically the association of 257 nutrients and 117 foods with coronary heart disease risk (acute myocardial infarction and fatal coronary heart disease). After accounting for multiple testing, we identify 16 food items and 37 nutrients that show statistically significant association – while adjusting for potential confounding and control variables such as physical activity, smoking, calorie intake, and medication use – among which 38 associations were validated in Nurses’ Health Study II. Our implementation of Environment-Wide Association Study successfully reproduces prior knowledge of diet-coronary heart disease associations in the epidemiological literature, and helps us detect new associations that were only marginally studied, opening potential avenues for further extensive experimental validation. We also show that Environment-Wide Association Study allows us to identify a bipartite food-nutrient network, highlighting which foods drive the associations of specific nutrients with coronary heart disease risk.

Isotopy and energy of physical networks

Yanchen Liu, Nima Dehmamy & Albert-László Barabási

Nature Physics (2020)

While the structural characteristics of a network are uniquely determined by its adjacency matrix, in physical networks, such as the brain or the vascular system, the network’s three-dimensional layout also affects the system’s structure and function. We lack, however, the tools to distinguish physical networks with identical wiring but different geometrical layouts. To address this need, here we introduce the concept of network isotopy, representing different network layouts that can be transformed into one another without link crossings, and show that a single quantity, the graph linking number, captures the entangledness of a layout, defining distinct isotopy classes. We find that a network’s elastic energy depends linearly on the graph linking number, indicating that each local tangle offers an independent contribution to the total energy. This finding allows us to formulate a statistical model for the formation of tangles in physical networks. We apply the developed framework to a diverse set of real physical networks, finding that the mouse connectome is more entangled than expected based on optimal wiring.

Exploring food contents in scientific literature with FoodMine

Forrest Hooton, Giulia Menichetti & Albert‐László Barabási

Scientific Reports 10, 16191 (2020)

Thanks to the many chemical and nutritional components it carries, diet critically affects human health. However, the currently available comprehensive databases on food composition cover only a tiny fraction of the total number of chemicals present in our food, focusing on the nutritional components essential for our health. Indeed, thousands of other molecules, many of which have well documented health implications, remain untracked. To explore the body of knowledge available on food composition, we built FoodMine, an algorithm that uses natural language processing to identify papers from PubMed that potentially report on the chemical composition of garlic and cocoa. After extracting from each paper information on the reported quantities of chemicals, we find that the scientific literature carries extensive information on the detailed chemical components of food that is currently not integrated in databases. Finally, we use unsupervised machine learning to create chemical embeddings, finding that the chemicals identified by FoodMine tend to have direct health relevance, reflecting the scientific community’s focus on health-related chemicals in our food.

Science, advocacy, and quackery in nutritional books: an analysis of conflicting advice and purported claims of nutritional best-sellers

Rebecca M. Marton, Xindi Wang, Albert-László Barabási & John P. A. Ioannidis

Palgrave Communications 6, 43 (2020)

Nutritional decisions may be important for health, and yet identifying trustworthy sources of advice can be difficult. Many people turn to books for nutritional advice, making the contents of these books and the expertise of their authors relevant to public health. Here, the top 100 best-selling books were identified and assessed for both the claims they make in their summaries and the credentials of the authors. Weight loss was a common theme in the summaries of nutritional best-selling books. In addition to weight loss, 31 of the books promised to cure or prevent a host of diseases, including diabetes, heart disease, cancer, and dementia; however, the nutritional advice given to achieve these outcomes varied widely in terms of which types of foods should be consumed or avoided and this information was often contradictory between books. Recommendations regarding the consumption of carbohydrates, dairy, proteins, and fat in particular differed greatly between books. To determine the qualifications of each author in making nutritional claims, the highest earned degree and listed occupations of each author were researched and analyzed. Out of 83 unique authors, 33 had an M.D. or Ph.D. degree. Twenty-eight of the authors were physicians, three were dietitians, and other authors held a wide range of jobs, including personal trainers, bloggers, and actors. Of 20 authors who had or claimed university affiliations, seven had a current university appointment that could be verified online in university directories. This study illuminates the range of the incongruous information being dispersed to the public and emphasizes the need for future efforts to improve the dissemination of sound nutritional advice.

Historical comparison of gender inequality in scientific careers across countries and disciplines

Junming Huang, Alexander J. Gates, Roberta Sinatra, and Albert-László Barabási

PNAS March 3, 2020 117 (9) 4609-4616

There is extensive, yet fragmented, evidence of gender differences in academia suggesting that women are underrepresented in most scientific disciplines and publish fewer articles throughout a career, and their work acquires fewer citations. Here, we offer a comprehensive picture of longitudinal gender differences in performance through a bibliometric analysis of academic publishing careers by reconstructing the complete publication history of over 1.5 million gender-identified authors whose publishing career ended between 1955 and 2010, covering 83 countries and 13 disciplines. We find that, paradoxically, the increase of participation of women in science over the past 60 years was accompanied by an increase of gender differences in both productivity and impact. Most surprisingly, though, we uncover two gender invariants, finding that men and women publish at a comparable annual rate and have equivalent career-wise impact for the same size body of work. Finally, we demonstrate that differences in publishing career lengths and dropout rates explain a large portion of the reported career-wise differences in productivity and impact, although productivity differences still remain. This comprehensive picture of gender inequality in academia can help rephrase the conversation around the sustainability of women’s careers in academia, with important consequences for institutions and policy makers.

The exposome and health: Where chemistry meets biology

Roel Vermeulen, Emma L. Schymanski, Albert-László Barabási, Gary W. Miller

Science 24 Jan 2020: 367, 6476, 392-396

Despite extensive evidence showing that exposure to specific chemicals can lead to disease, current research approaches and regulatory policies fail to address the chemical complexity of our world. To safeguard current and future generations from the increasing number of chemicals polluting our environment, a systematic and agnostic approach is needed. The “exposome” concept strives to capture the diversity and range of exposures to synthetic chemicals, dietary constituents, psychosocial stressors, and physical factors, as well as their corresponding biological responses. Technological advances such as high-resolution mass spectrometry and network science have allowed us to take the first steps toward a comprehensive assessment of the exposome. Given the increased recognition of the dominant role that nongenetic factors play in disease, an effort to characterize the exposome at a scale comparable to that of the human genome is warranted.

Understanding the Representation Power of Graph Neural Networks in Learning Graph Topology

Nima Dehmamy, Albert-László Barabási, Rose Yu

NeurIPS 32 (2019)

To deepen our understanding of graph neural networks, we investigate the representation power of Graph Convolutional Networks (GCN) through the looking glass of graph moments, a key property of graph topology encoding paths of various lengths. We find that GCNs are rather restrictive in learning graph moments. Without careful design, GCNs can fail miserably even with multiple layers and nonlinear activation functions. We analyze theoretically the expressiveness of GCNs, concluding that a modular GCN design, using different propagation rules with residual connections, could significantly improve the performance of GCN. We demonstrate that such modular designs are capable of distinguishing graphs from different graph generation models for surprisingly small graphs, a notoriously difficult problem in network science. Our investigation suggests that depth is much more influential than width, with deeper GCNs being more capable of learning higher order graph moments. Additionally, combining GCN modules with different propagation rules is critical to the representation power of GCNs.

The unmapped chemical complexity of our diet

Albert-László Barabási, Giulia Menichetti & Joseph Loscalzo

Nature Food 1, 33-37 (2019)

Our understanding of how diet affects health is limited to 150 key nutritional components that are tracked and catalogued by the United States Department of Agriculture and other national databases. Although this knowledge has been transformative for health sciences, helping unveil the role of calories, sugar, fat, vitamins and other nutritional factors in the emergence of common diseases, these nutritional components represent only a small fraction of the more than 26,000 distinct, definable biochemicals present in our food—many of which have documented effects on health but remain unquantified in any systematic fashion across different individual foods. Using new advances such as machine learning, a high-resolution library of these biochemicals could enable the systematic study of the full biochemical spectrum of our diets, opening new avenues for understanding the composition of what we eat, and how it affects health and disease.

A Genetic Model of the Connectome

Dániel L. Barabási, Albert-László Barabási

Neuron 105, 1-11 (2019)

The connectomes of organisms of the same species show remarkable architectural and often local wiring similarity, raising the question: where and how is neuronal connectivity encoded? Here, we start from the hypothesis that the genetic identity of neurons guides synapse and gap-junction formation and show that such genetically driven wiring predicts the existence of specific biclique motifs in the connectome. We identify a family of large, statistically significant biclique subgraphs in the connectomes of three species and show that within many of the observed bicliques the neurons share statistically significant expression patterns and morphological characteristics, supporting our expectation of common genetic factors that drive the synapse formation within these subgraphs. The proposed connectome model offers a self-consistent framework to link the genetics of an organism to the reproducible architecture of its connectome, offering experimentally falsifiable predictions on the genetic factors that drive the formation of individual neuronal circuits.

Synthetic ablations in the C. elegans nervous system

Emma K. Towlson and Albert-László Barabási

Network Neuroscience 2020, pp. 1–17

Synthetic lethality, the finding that the simultaneous knockout of two or more individually nonessential genes leads to cell or organism death, has offered a systematic framework to explore cellular function, and also offered therapeutic applications. Yet the concept lacks its parallel in neuroscience—a systematic knowledge base on the role of double or higher order ablations in the functioning of a neural system. Here, we use the framework of network control to systematically predict the effects of ablating neuron pairs and triplets on the gentle touch response. We find that surprisingly small sets of 58 pairs and 46 triplets can reduce muscle controllability in this context, and that these sets are localized in the nervous system in distinct groups. Further, they lead to highly specific experimentally testable predictions about mechanisms of loss of control, and which muscle cells are expected to experience this loss.

Nature’s reach: narrow work has broad impact

Alexander J. Gates, Qing Ke, Onur Varol & Albert-László Barabási

Nature 575, 32-34 (2019)

How knowledge informs and alters disciplines is itself an enlightening and vibrant field. This type of meta-research into new findings, insights, conceptual frameworks and techniques is important, among other things, for policymakers who fund research in the hope of tackling society’s most pressing challenges, which inevitably span disciplines.

Since its founding in 1869, Nature has offered a venue for publishing major advances from many fields. To mark its anniversary, we track here how papers cite and are cited across disciplines, using data on tens of millions of scientific articles indexed in Clarivate Analytics’ Web of Science (WoS), a bibliometric database that encompasses many thousands of research journals starting from 1900. We pay particular attention to articles that appeared in Nature. In our view, this snapshot, for all its idiosyncrasies, reveals how scientific work is ever more becoming a mixture of disciplines.

Success in books: predicting book sales before publication

Xindi Wang, Burcu Yucesoy, Onur Varol, Tina Eliassi-Rad, Albert-László Barabási

EPJ Data Science 8: 31 (2019)

Reading remains a preferred leisure activity fueling an exceptionally competitive publishing market: among more than three million books published each year, only a tiny fraction are read widely. It is largely unpredictable, however, which books those will be, and how many copies they will sell. Here we aim to unveil the features that affect the success of books by predicting a book’s sales prior to its publication. We do so by employing the Learning to Place machine learning approach, which can predict sales for both fiction and nonfiction books as well as explain the predictions by comparing and contrasting each book with similar ones. We analyze features contributing to the success of a book by feature importance analysis, finding that a strong driving factor of book sales across all genres is the publishing house. We also uncover differences between genres: for thrillers and mystery, the publishing history of an author (as measured by previous book sales) is highly important, while in literary fiction and religion, the author’s visibility plays a more central role. These observations provide insights into the driving forces behind success within the current publishing industry, as well as how individuals choose what books to read.

Network-based prediction of protein interactions

István A. Kovács, Katja Luck, Kerstin Spirohn, Yang Wang, Carl Pollis, Sadie Schlabach, Wenting Bian, Dae-Kyum Kim, Nishka Kishore, Tong Hao, Michael A. Calderwood, Marc Vidal & Albert-László Barabási

Nature Communications 10, Article number: 1240 (2019)

Despite exceptional experimental efforts to map out the human interactome, the continued data incompleteness limits our ability to understand the molecular roots of human disease. Computational tools offer a promising alternative, helping identify biologically significant, yet unmapped protein-protein interactions (PPIs). While link prediction methods connect proteins on the basis of biological or network-based similarity, interacting proteins are not necessarily similar and similar proteins do not necessarily interact. Here, we offer structural and evolutionary evidence that proteins interact not if they are similar to each other, but if one of them is similar to the other’s partners. This approach, that mathematically relies on network paths of length three (L3), significantly outperforms all existing link prediction methods. Given its high accuracy, we show that L3 can offer mechanistic insights into disease mechanisms and can complement future experimental efforts to complete the human interactome.
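
The idea of scoring candidate interactions by paths of length three can be illustrated with a short sketch. The function name `l3_scores` and the degree normalization (dividing each path by the square root of the product of the degrees of its two intermediate nodes) are assumptions made for this illustration; the abstract itself only states that the method relies on paths of length three.

```python
from collections import defaultdict
from math import sqrt


def l3_scores(edges):
    """Score non-adjacent node pairs by degree-normalized paths of length three.

    Hypothetical helper for illustration; node labels must be mutually
    comparable (e.g. all strings) so each pair can be stored once as (x, y), x < y.
    """
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            adj[a].add(b)
            adj[b].add(a)

    scores = defaultdict(float)
    for x in adj:
        for u in adj[x]:
            for v in adj[u]:
                # Each path x-u-v-y is down-weighted by the degrees of u and v.
                weight = 1.0 / sqrt(len(adj[u]) * len(adj[v]))
                for y in adj[v]:
                    # Keep only candidate (currently unconnected) pairs.
                    if x < y and y not in adj[x]:
                        scores[(x, y)] += weight
    return dict(scores)


# Toy network: a single chain X - U - V - Y.
scores = l3_scores([("X", "U"), ("U", "V"), ("V", "Y")])
```

In this toy chain, X and Y share no neighbor, so a common-neighbor score would not rank the pair at all, whereas the length-three path through U and V gives it a nonzero score.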

Network-based prediction of drug combinations

Feixiong Cheng, István A. Kovács & Albert-László Barabási

Nature Communications 10, 1197 (2019)

Drug combinations, offering increased therapeutic efficacy and reduced toxicity, play an important role in treating multiple complex diseases. Yet, our ability to identify and validate effective combinations is limited by a combinatorial explosion, driven by both the large number of drug pairs and the number of dosage combinations. Here we propose a network-based methodology to identify clinically efficacious drug combinations for specific diseases. By quantifying the network-based relationship between drug targets and disease proteins in the human protein–protein interactome, we show the existence of six distinct classes of drug–drug–disease combinations. Relying on approved drug combinations for hypertension and cancer, we find that only one of the six classes correlates with therapeutic effects: the case in which the targets of both drugs hit the disease module but target separate neighborhoods. This finding allows us to identify and validate antihypertensive combinations, offering a generic, powerful network methodology to identify efficacious combination therapies in drug development.

Taking Census of Physics

Federico Battiston, Federico Musciotto, Dashun Wang, Albert-László Barabási, Michael Szell, and Roberta Sinatra

Nature Reviews Physics 1, 89-97 (2019)

Over the past decades, the diversity of areas explored by physicists has exploded, encompassing new topics from biophysics and chemical physics to network science. However, it is unclear how these new subfields emerged from the traditional subject areas and how physicists explore them. To map out the evolution of physics subfields, here, we take an intellectual census of physics by studying physicists’ careers. We use a large-scale publication data set, identify the subfields of 135,877 physicists and quantify their heterogeneous birth, growth and migration patterns among research areas. We find that the majority of physicists began their careers in only three subfields, branching out to other areas at later career stages, with different rates and transition times. Furthermore, we analyse the productivity, impact and team sizes across different subfields, finding drastic changes attributable to the recent rise in large-scale collaborations. This detailed, longitudinal census of physics can inform resource allocation policies and provide students, editors and scientists with a broader view of the field’s internal dynamics.

The Chaperone Effect in Scientific Publishing

Vedran Sekara, Pierre Deville, Sebastian E. Ahnert, Albert-László Barabási, Roberta Sinatra, and Sune Lehmann

PNAS 115:50, 12603-12607 (2018)

Experience plays a critical role in crafting high-impact scientific work. This is particularly evident in top multidisciplinary journals, where a scientist is unlikely to appear as senior author if he or she has not previously published within the same journal. Here, we develop a quantitative understanding of author order by quantifying this “chaperone effect,” capturing how scientists transition into senior status within a particular publication venue. We illustrate that the chaperone effect has a different magnitude for journals in different branches of science, being more pronounced in medical and biological sciences and weaker in natural sciences. Finally, we show that in the case of high-impact venues, the chaperone effect has significant implications, specifically resulting in a higher average impact relative to papers authored by new principal investigators (PIs). Our findings shed light on the role played by experience in publishing within specific scientific journals, on the paths toward acquiring the necessary experience and expertise, and on the skills required to publish in prestigious venues.

The Universal Decay of Collective Memory and Attention

Cristian Candia, C. Jara-Figueroa, Carlos Rodriguez-Sickert, Albert-László Barabási, and César A. Hidalgo

Nature Human Behaviour 3, 82–91 (2019)

Collective memory and attention are sustained by two channels: oral communication (communicative memory) and the physical recording of information (cultural memory). Here, we use data on the citation of academic articles and patents, and on the online attention received by songs, movies and biographies, to describe the temporal decay of the attention received by cultural products. We show that, once we isolate the temporal dimension of the decay, the attention received by cultural products decays following a universal biexponential function. We explain this universality by proposing a mathematical model based on communicative and cultural memory, which fits the data better than previously proposed log-normal and exponential models. Our results reveal that biographies remain in our communicative memory the longest (20–30 years) and music the shortest (about 5.6 years). These findings show that the average attention received by cultural products decays following a universal biexponential function.
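
A biexponential decay of this kind arises naturally from a two-compartment picture, sketched below. The specific equations are an assumption made for illustration: attention starts in a communicative pool that decays at rate p and feeds a cultural pool at rate r, and the cultural pool decays at rate q. The names `attention`, `n0`, `p`, `r` and `q` are hypothetical, and the sketch requires p + r to differ from q.

```python
from math import exp


def attention(t, n0, p, r, q):
    """Total attention at time t for an assumed two-channel memory model.

    Communicative memory u:  du/dt = -(p + r) * u,   u(0) = n0
    Cultural memory v:       dv/dt = r * u - q * v,  v(0) = 0
    The returned value is u(t) + v(t), a sum of two exponentials.
    """
    fast = p + r  # decay rate of the communicative channel
    return n0 / (fast - q) * ((p - q) * exp(-fast * t) + r * exp(-q * t))


# Illustrative parameters only: a fast early decay followed by a slow tail.
curve = [attention(t, n0=100.0, p=0.5, r=0.1, q=0.05) for t in range(0, 30, 5)]
```

With p + r much larger than q, the first exponential dominates early on and the second one sets the long tail, which is the qualitative shape a biexponential function describes.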

A Structural Transition in Physical Networks

Nima Dehmamy, Soodabeh Milanlouei & Albert-László Barabási

Nature 563, 676–680 (2018)

In many physical networks, including neurons in the brain, three-dimensional integrated circuits and underground hyphal networks, the nodes and links are physical objects that cannot intersect or overlap with each other. To take this into account, non-crossing conditions can be imposed to constrain the geometry of networks, which consequently affects how they form, evolve and function. However, these constraints are not included in the theoretical frameworks that are currently used to characterize real networks. Most tools for laying out networks are variants of the force-directed layout algorithm (which assumes dimensionless nodes and links) and are therefore unable to reveal the geometry of densely packed physical networks. Here we develop a modelling framework that accounts for the physical sizes of nodes and links, allowing us to explore how non-crossing conditions affect the geometry of a network. For small link thicknesses, we observe a weakly interacting regime in which link crossings are avoided via local link rearrangements, without altering the overall geometry of the layout compared to the force-directed layout. Once the link thickness exceeds a threshold, a strongly interacting regime emerges in which multiple geometric quantities, such as the total link length and the link curvature, scale with the link thickness. We show that the crossover between the two regimes is driven by the non-crossing condition, which allows us to derive the transition point analytically and show that networks with large numbers of nodes will ultimately exist in the strongly interacting regime. We also find that networks in the weakly interacting regime display a solid-like response to stress, whereas in the strongly interacting regime they behave in a gel-like fashion. Networks in the weakly interacting regime are amenable to 3D printing and so can be used to visualize network geometry, and the strongly interacting regime provides insights into the scaling of the sizes of densely packed mammalian brains.

Quantifying Reputation and Success in Art

Samuel P. Fraiberger, Roberta Sinatra, Magnus Resch, Christoph Riedl, Albert-László Barabási

Science 08 Nov 2018: eaau7224 DOI: 10.1126/science.aau7224

In areas of human activity where performance is difficult to quantify in an objective fashion, reputation and networks of influence play a key role in determining access to resources and rewards. To understand the role of these factors, we reconstructed the exhibition history of half a million artists, mapping out the coexhibition network that captures the movement of art between institutions. Centrality within this network captured institutional prestige, allowing us to explore the career trajectory of individual artists in terms of access to coveted institutions. Early access to prestigious central institutions offered life-long access to high-prestige venues and reduced dropout rate. By contrast, starting at the network periphery resulted in a high dropout rate, limiting access to central institutions. A Markov model predicts the career trajectory of individual artists and documents the strong path and history dependence of valuation in art.

Functional Structures for US state governments

Stephen Kosack, Michele Coscia, Evann Smith, Kim Albrecht, Albert-László Barabási, and Ricardo Hausmann

Proceedings of the National Academy of Sciences Oct 2018, 201803228; DOI: 10.1073/pnas.1803228115


Governments in modern societies undertake an array of complex functions that shape politics and economics, individual and group behavior, and the natural, social, and built environment. How are governments structured to execute these diverse responsibilities? How do those structures vary, and what explains the differences? To examine these longstanding questions, we develop a technique for mapping the Internet “footprint” of government with network science methods. We use this approach to describe and analyze the diversity in functional scale and structure among the 50 US state governments reflected in the webpages and links they have created online: 32.5 million webpages and 110 million hyperlinks among 47,631 agencies. We first verify that this extensive online footprint systematically reflects known characteristics: 50 hierarchically organized networks of state agencies that scale with population and are specialized around easily identifiable functions in accordance with legal mandates. We also find that the footprint reflects extensive diversity among these state functional hierarchies. We hypothesize that this variation should reflect, among other factors, state income, economic structure, ideology, and location. We find that government structures are most strongly associated with state economic structures, with location and income playing more limited roles. Voters’ recent ideological preferences about the proper roles and extent of government are not significantly associated with the scale and structure of their state governments as reflected online. We conclude that the online footprint of governments offers a broad and comprehensive window on how they are structured that can help deepen understanding of those structures.

Caenorhabditis elegans and the network control framework—FAQs

Emma K. Towlson, Petra E. Vértes, Gang Yan, Yee Lian Chew, Denise S. Walker, William R. Schafer, and Albert-László Barabási

Phil. Trans. R. Soc. B 373: 20170372

Control is essential to the functioning of any neural system. Indeed, under healthy conditions the brain must be able to continuously maintain a tight functional control between the system’s inputs and outputs. One may therefore hypothesize that the brain’s wiring is predetermined by the need to maintain control across multiple scales, maintaining the stability of key internal variables, and producing behaviour in response to environmental cues. Recent advances in network control have offered a powerful mathematical framework to explore the structure–function relationship in complex biological, social and technological networks, and are beginning to yield important and precise insights on neuronal systems. The network control paradigm promises a predictive, quantitative framework to unite the distinct datasets necessary to fully describe a nervous system, and provide mechanistic explanations for the observed structure and function relationships. Here, we provide a thorough review of the network control framework as applied to Caenorhabditis elegans (Yan et al. 2017 Nature 550, 519–523. (doi:10.1038/nature24056)), in the style of Frequently Asked Questions. We present the theoretical, computational and experimental aspects of network control, and discuss its current capabilities and limitations, together with the next likely advances and improvements. We further present the Python code to enable exploration of control principles in a manner specific to this prototypical organism. This article is part of a discussion meeting issue ‘Connectome to behaviour: modelling C. elegans at cellular resolution’.

Network-based approach to prediction and population-based validation of in silico drug repurposing

Feixiong Cheng, Rishi J. Desai, Diane E. Handy, Ruisheng Wang, Sebastian Schneeweiss, Albert-László Barabási & Joseph Loscalzo

Nature Communications 9, Article number: 2691 (2018)

Here we identify hundreds of new drug-disease associations for over 900 FDA-approved drugs by quantifying the network proximity of disease genes and drug targets in the human (protein–protein) interactome. We select four network-predicted associations to test their causal relationship using large healthcare databases with over 220 million patients and state-of-the-art pharmacoepidemiologic analyses. Using propensity score matching, two of four network-based predictions are validated in patient-level data: carbamazepine is associated with an increased risk of coronary artery disease (CAD) [hazard ratio (HR) 1.56, 95% confidence interval (CI) 1.12–2.18], and hydroxychloroquine is associated with a decreased risk of CAD (HR 0.76, 95% CI 0.59–0.97). In vitro experiments show that hydroxychloroquine attenuates pro-inflammatory cytokine-mediated activation in human aortic endothelial cells, supporting mechanistically its potential beneficial effect in CAD. In summary, we demonstrate that a unique integration of protein-protein interaction network proximity and large-scale patient-level longitudinal data complemented by mechanistic in vitro studies can facilitate drug repurposing.
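
One simple way to quantify the network proximity between drug targets and disease genes is the average shortest-path distance from each target to its closest disease gene. The sketch below uses that "closest distance" definition as an assumption; the abstract does not specify which proximity measure is used, and the function names and the toy network are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path distances (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closest_distance(adj, drug_targets, disease_genes):
    """Mean, over drug targets, of the distance to the nearest disease gene.

    Assumes every target can reach at least one disease gene; otherwise
    min() raises ValueError on an empty sequence.
    """
    total = 0
    for target in drug_targets:
        dist = bfs_distances(adj, target)
        total += min(dist[g] for g in disease_genes if g in dist)
    return total / len(drug_targets)


# Toy interactome: a chain a - b - c - d - e.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
proximity = closest_distance(adj, drug_targets=["a", "c"], disease_genes=["d", "e"])
```

In practice a raw distance like this is usually compared against randomized target and gene sets to judge whether the observed proximity is smaller than expected by chance; that step is omitted here.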

Predicting Perturbation Patterns from the Topology of Biological Networks

Marc Santolini and Albert-Laszlo Barabasi

PNAS | vol. 115 | no. 27 | E6375–E6383

High-throughput technologies, offering an unprecedented wealth of quantitative data underlying the makeup of living systems, are changing biology. Notably, the systematic mapping of the relationships between biochemical entities has fueled the rapid development of network biology, offering a suitable framework to describe disease phenotypes and predict potential drug targets. However, our ability to develop accurate dynamical models remains limited, due in part to the limited knowledge of the kinetic parameters underlying these interactions. Here, we explore the degree to which we can make reasonably accurate predictions in the absence of the kinetic parameters. We find that simple dynamically agnostic models are sufficient to recover the strength and sign of the biochemical perturbation patterns observed in 87 biological models for which the underlying kinetics are known. Surprisingly, a simple distance-based model achieves 65% accuracy. We show that this predictive power is robust to topological and kinetic parameter perturbations, and we identify key network properties that can increase the recovery rate of the true perturbation patterns up to 80%. We validate our approach using experimental data on the chemotactic pathway in bacteria, finding that a network model of perturbation spreading predicts with ∼80% accuracy the directionality of gene expression and phenotype changes in knock-out and overproduction experiments. These findings show that the steady advances in mapping out the topology of biochemical interaction networks open avenues for accurate perturbation spread modeling, with direct implications for medicine and drug development.

Success In Books: A Big Data Approach to Bestsellers

Burcu Yucesoy, Xindi Wang, Junming Huang, Albert-Laszlo Barabasi

EPJ Data Science 7:7

Reading remains the preferred leisure activity for most individuals, continuing to offer a unique path to knowledge and learning. As such, books remain an important cultural product, consumed widely. Yet, while over 3 million books are published each year, very few are read widely and fewer than 500 make it to the New York Times bestseller lists. And once there, only a handful of authors can command the lists for more than a few weeks. Here we bring a big data approach to book success by investigating the properties and sales trajectories of bestsellers. We find that there are seasonal patterns to book sales with more books being sold during holidays, and even among bestsellers, fiction books sell more copies than nonfiction books. General fiction and biographies make the list more often than any other genre books, and the higher a book’s initial place in the rankings, the longer the book stays on the list as well. Looking at patterns characterizing authors, we find that fiction writers are more productive than nonfiction writers, commonly achieving bestseller status with multiple books. Additionally, there is no gender disparity among bestselling fiction authors, but in nonfiction most bestsellers are written by male authors. Finally, we find that there is a universal pattern to book sales. Using this universality we introduce a statistical model to explain the time evolution of sales. This model not only reproduces the entire sales trajectory of a book but also predicts the total number of copies it will sell in its lifetime, based on its early sales numbers. The analysis of the bestseller characteristics and the discovery of the universal nature of sales patterns with its driving forces are crucial for our understanding of the book industry, and more generally, of how we as a society interact with cultural products.

Science of Science

Santo Fortunato, Carl T. Bergstrom, Katy Borner, James A. Evans, Dirk Helbing, Stasa Milojevic, Alexander M. Petersen, Filippo Radicchi, Roberta Sinatra, Brian Uzzi, Alessandro Vespignani, Ludo Waltman, Dashun Wang, Albert-Laszlo Barabasi

Science 359: 6379 (2018)

The science of science (SciSci) is based on a transdisciplinary approach that uses large data sets to study the mechanisms underlying the doing of science--from the choice of a research problem to career trajectories and progress within a field. In a Review, Fortunato et al. explain that the underlying rationale is that with a deeper understanding of the precursors of impactful science, it will be possible to develop systems and policies that improve each scientist's ability to succeed and enhance the prospects of science as a whole.

The Fundamental Advantages of Temporal Networks

A. Li, S. P. Cornelius, Y.-Y. Liu, L. Wang, A.-L. Barabasi

Science 358:6366, 1042-1046 (2017).

Most networked systems of scientific interest are characterized by temporal links, meaning the network’s structure changes over time. Link temporality has been shown to hinder many dynamical processes, from information spreading to accessibility, by disrupting network paths. Considering the ubiquity of temporal networks in nature, we ask: Are there any advantages of the networks’ temporality? We use an analytical framework to show that temporal networks can, compared to their static counterparts, reach controllability faster, demand orders of magnitude less control energy, and have control trajectories that are considerably more compact than those characterizing static networks. Thus, temporality ensures a degree of flexibility that would be unattainable in static networks, enhancing our ability to control them.

Network Control Principles Predict Neuron Function in the Caenorhabditis elegans Connectome

G. Yan, P. E. Vertes, E. K. Towlson, Y. L. Chew, S. Walker, W. R. Schafer, A.-L. Barabasi

Nature 550, 519–523 (2017)

Recent studies on the controllability of complex systems offer a powerful mathematical framework to systematically explore the structure–function relationship in biological, social, and technological networks. Despite theoretical advances, we lack direct experimental proof of the validity of these widely used control principles. Here we fill this gap by applying a control framework to the connectome of the nematode Caenorhabditis elegans, allowing us to predict the involvement of each C. elegans neuron in locomotor behaviours. We predict that control of the muscles or motor neurons requires 12 neuronal classes, which include neuronal groups previously implicated in locomotion by laser ablation, as well as one previously uncharacterized neuron, PDB. We validate this prediction experimentally, finding that the ablation of PDB leads to a significant loss of dorsoventral polarity in large body bends. Importantly, control principles also allow us to investigate the involvement of individual neurons within each neuronal class. For example, we predict that, within the class of DD motor neurons, only three (DD04, DD05, or DD06) should affect locomotion when ablated individually. This prediction is also confirmed; single cell ablations of DD04 or DD05 specifically affect posterior body movements, whereas ablations of DD02 or DD03 do not. Our predictions are robust to deletions of weak connections, missing connections, and rewired connections in the current connectome, indicating the potential applicability of this analytical framework to larger and less well-characterized connectomes.

The Elegant Law that Governs Us All

A.-L. Barabasi

Science 357:6347 (2017)

A physicist probes a phenomenon seen in cells, cities, and almost everything in between.

Academia Under Fire in Hungary

A.-L. Barabasi

Science 356: 6338 (2017)

On 10 April, Hungarian President Janos Ader signed into law an amendment to the National Higher Education Law that would outlaw the Central European University (CEU). Although portrayed by the government as a purely administrative step, the "Lex-CEU" law is a strident attempt to curtail academic freedom and limit the independence of academic institutions.

Identifying and modeling the structural discontinuities of human interactions

S. Grauwin, M. Szell, S. Sobolevsky, P. Hovel, F. Simini, M. Vanhoof, Z. Smoreda, A.-L. Barabasi & C. Ratti

Scientific Reports 7: 46677 (2017)

The idea of a hierarchical spatial organization of society lies at the core of seminal theories in human geography that have strongly influenced our understanding of social organization. Along the same line, the recent availability of large-scale human mobility and communication data has offered novel quantitative insights hinting at a strong geographical confinement of human interactions within neighboring regions, extending to local levels within countries. However, models of human interaction largely ignore this effect. Here, we analyze several country-wide networks of telephone calls - both mobile and landline - and in either case uncover a systematic decrease of communication induced by borders we identify as the missing variable in state-of-the-art models. Using this empirical evidence, we propose an alternative modeling framework that naturally stylizes the damping effect of borders. We show that this new notion substantially improves the predictive power of widely used interaction models. This increases our ability to understand, model and predict social activities and to plan the development of infrastructures across multiple scales.

Integrating Personalized Gene Expression Profiles into Predictive Disease-associated Gene Pools

J. Menche, E. Guney, A. Sharma, P. J. Branigan, M. J. Loza, F. Baribaud, R. Dobrin, A.-L. Barabasi

npj Systems Biology and Applications 3:10 (2017)

Gene expression data are routinely used to identify genes that on average exhibit different expression levels between a case and a control group. Yet, very few of such differentially expressed genes are detectably perturbed in individual patients. Here, we develop a framework to construct personalized perturbation profiles for individual subjects, identifying the set of genes that are significantly perturbed in each individual. This allows us to characterize the heterogeneity of the molecular manifestations of complex diseases by quantifying the expression-level similarities and differences among patients with the same phenotype. We show that despite the high heterogeneity of the individual perturbation profiles, patients with asthma, Parkinson and Huntington's disease share a broad pool of sporadically disease-associated genes, and that individuals with statistically significant overlap with this pool have an 80-100% chance of being diagnosed with the disease. The developed framework opens up the possibility to apply gene expression data in the context of precision medicine, with important implications for biomarker identification, drug development, diagnosis and treatment.

From Comorbidities of Chronic Obstructive Pulmonary Disease to Identification of Shared Molecular Mechanisms by Data Integration

D. Gomez-Cabrero, J. Menche, C. Vargas, I. Cano, D. Maier, A.-L. Barabasi, J. Tegner, J. Roca (Synergy-COPD Consortia)

BMC Bioinformatics 17: 1291 (2016)

Background: Deep mining of healthcare data has provided maps of comorbidity relationships between diseases. In parallel, integrative multi-omics investigations have generated high-resolution molecular maps of putative relevance for understanding disease initiation and progression. Yet, it is unclear how to advance an observation of comorbidity relations (one disease to others) to a molecular understanding of the driver processes and associated biomarkers. Results: Since Chronic Obstructive Pulmonary Disease (COPD) has emerged as a central hub in temporal comorbidity networks, we developed a systematic integrative data-driven framework to identify shared disease-associated genes and pathways, as a proxy for the underlying generative mechanisms inducing comorbidity. We integrated records from approximately 13 M patients from the Medicare database with disease-gene maps that we derived from several resources including a semantic-derived knowledge-base. Using rank-based statistics we not only recovered known comorbidities but also discovered a novel association between COPD and digestive diseases. Furthermore, our analysis provides the first set of COPD co-morbidity candidate biomarkers, including IL15, TNF and JUP, and characterizes their association to aging and life-style conditions, such as smoking and physical activity. Conclusions: The developed framework provides novel insights in COPD and especially COPD co-morbidity associated mechanisms. The methodology could be used to discover and decipher the molecular underpinning of other comorbidity relationships and furthermore, allow the identification of candidate co-morbidity biomarkers.

Quantifying the Evolution of Individual Scientific Impact

R. Sinatra, D. Wang, P. Deville, C. Song, A.-L. Barabasi

Science 4: 354, 6312 (November 2016)

Despite the frequent use of numerous quantitative indicators to gauge the professional impact of a scientist, little is known about how scientific impact emerges and evolves in time. Here, we quantify the changes in impact and productivity throughout a career in science, finding that impact, as measured by influential publications, is distributed randomly within a scientist’s sequence of publications. This random-impact rule allows us to formulate a stochastic model that uncouples the effects of productivity, individual ability, and luck and unveils the existence of universal patterns governing the emergence of scientific success. The model assigns a unique individual parameter Q to each scientist, which is stable during a career, and it accurately predicts the evolution of a scientist’s impact, from the h-index to cumulative citations, and independent recognitions, such as prizes.
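
The random-impact rule can be illustrated with a toy simulation in which each paper's impact is the product of a fixed individual parameter Q and an independent random "luck" factor. The lognormal luck distribution, the parameter values and the function names below are assumptions made for this sketch, not details taken from the abstract.

```python
import random


def simulate_career(q, n_papers, rng, mu=0.0, sigma=1.0):
    """Impacts of one scientist's papers, in publication order: Q times luck."""
    return [q * rng.lognormvariate(mu, sigma) for _ in range(n_papers)]


def relative_position_of_best(impacts):
    """Position of the highest-impact paper, rescaled to the interval [0, 1]."""
    best = max(range(len(impacts)), key=impacts.__getitem__)
    return best / (len(impacts) - 1)


rng = random.Random(0)  # fixed seed so the run is repeatable
positions = [
    relative_position_of_best(simulate_career(q=2.0, n_papers=20, rng=rng))
    for _ in range(2000)
]
mean_position = sum(positions) / len(positions)
```

Because Q multiplies every paper equally and the luck factors are independent of publication order, the best paper is equally likely to fall anywhere in the sequence, so `mean_position` should come out close to 0.5.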

Controllability of multiplex, multi-time-scale networks

M. Posfai, J. Gao, S. P. Cornelius, A.-L. Barabasi, R. D'Souza

Physical Review E 94: 3, 032316 (2016)

The paradigm of layered networks is used to describe many real-world systems, from biological networks to social organizations and transportation systems. While recently there has been much progress in understanding the general properties of multilayer networks, our understanding of how to control such systems remains limited. One fundamental aspect that makes this endeavor challenging is that each layer can operate at a different time scale; thus, we cannot directly apply standard ideas from structural control theory of individual networks. Here we address the problem of controlling multilayer and multi-time-scale networks focusing on two-layer multiplex networks with one-to-one interlayer coupling. We investigate the practically relevant case when the control signal is applied to the nodes of one layer. We develop a theory based on disjoint path covers to determine the minimum number of inputs (Ni) necessary for full control. We show that if both layers operate on the same time scale, then the network structure of both layers equally affects controllability. In the presence of time-scale separation, controllability is enhanced if the controller interacts with the faster layer: Ni decreases as the time-scale difference increases up to a critical time-scale difference, above which Ni remains constant and is completely determined by the faster layer. We show that the critical time-scale difference is large if layer I is easy and layer II is hard to control in isolation. In contrast, control becomes increasingly difficult if the controller interacts with the layer operating on the slower time scale and increasing time-scale separation leads to increased Ni, again up to a critical value, above which Ni still depends on the structure of both layers. This critical value is largely determined by the longest path in the faster layer that does not involve cycles. By identifying the underlying mechanisms that connect time-scale difference and controllability for a simplified model, we provide crucial insight into disentangling how our ability to control real interacting complex systems is affected by a variety of sources of complexity.
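
For orientation, the single-layer analogue of counting the minimum number of inputs can be sketched with a maximum matching: in structural controllability of one directed network, the minimum number of driver nodes equals the number of nodes left unmatched, with at least one input always required. The code below covers only that single-layer case, not the multiplex, multi-time-scale theory described above, and the function name `min_driver_nodes` is hypothetical.

```python
def min_driver_nodes(nodes, edges):
    """Minimum driver nodes of a single directed network via maximum matching.

    Each directed edge (u, v) may match u's outgoing side to v's incoming side;
    nodes whose incoming side stays unmatched must be driven directly.
    Uses simple recursive augmenting paths, adequate for small examples.
    """
    successors = {n: [] for n in nodes}
    for u, v in edges:
        successors[u].append(v)

    matched_from = {}  # v -> u, meaning edge (u, v) is in the matching

    def try_match(u, seen):
        for v in successors[u]:
            if v in seen:
                continue
            seen.add(v)
            # Take v if it is free, or if its current partner can be re-routed.
            if v not in matched_from or try_match(matched_from[v], seen):
                matched_from[v] = u
                return True
        return False

    matching_size = sum(try_match(u, set()) for u in nodes)
    return max(len(nodes) - matching_size, 1)


chain = min_driver_nodes([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)])
star = min_driver_nodes([0, 1, 2, 3], [(0, 1), (0, 2), (0, 3)])
```

A directed chain can be steered from its first node alone, whereas a hub pointing at three leaves needs three inputs, because the hub can independently control only one of its targets.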

Control Principles of Complex Systems

Y.-Y. Liu and A.-L. Barabasi

Reviews of Modern Physics 88: 3, 035006-035064 (2016)

A reflection of our ultimate understanding of a complex system is our ability to control its behavior. Typically, control has multiple prerequisites: it requires an accurate map of the network that governs the interactions between the system’s components, a quantitative description of the dynamical laws that govern the temporal behavior of each component, and an ability to influence the state and temporal behavior of a selected subset of the components. With deep roots in dynamical systems and control theory, notions of control and controllability have taken a new life recently in the study of complex networks, inspiring several fundamental questions: What are the control principles of complex systems? How do networks organize themselves to balance control with functionality? To address these questions, recent advances on the controllability and the control of complex networks are reviewed here, exploring the intricate interplay between the network topology and dynamical laws. The pertinent mathematical results are matched with empirical findings and applications. Uncovering the control principles of complex systems can help us explore and ultimately understand the fundamental laws that govern their behavior.

Control of Fluxes in Metabolic Networks

G. Basler, Z. Nikoloski, A. Larhlimi, A.-L. Barabasi, and Y.-Y. Liu

Genome Research 26: 7, 956-968 (2016)

Understanding the control of large-scale metabolic networks is central to biology and medicine. However, existing approaches either require specifying a cellular objective or can only be used for small networks. We introduce new coupling types describing the relations between reaction activities, and develop an efficient computational framework, which does not require any cellular objective for systematic studies of large-scale metabolism. We identify the driver reactions facilitating control of 23 metabolic networks from all kingdoms of life. We find that unicellular organisms require a smaller degree of control than multicellular organisms. Driver reactions are under complex cellular regulation in Escherichia coli, indicating their preeminent role in facilitating cellular control. In human cancer cells, driver reactions play pivotal roles in malignancy and represent potential therapeutic targets. The developed framework helps us gain insights into regulatory principles of diseases and facilitates design of engineering strategies at the interface of gene regulation, signaling, and metabolism.

Scaling Identity Connects Human Mobility and Social Interactions

P. Deville, C. Song, N. Eagle, V. D. Blondel, A.-L. Barabasi, D. Wang

PNAS 113: 26, 7047-7052 (2016)

Both our mobility and communication patterns obey spatial constraints: Most of the time, our trips or communications occur over a short distance, and occasionally, we take longer trips or call a friend who lives far away. These spatial dependencies, best described as power laws, play a consequential role in broad areas ranging from how an epidemic spreads to diffusion of ideas and information. Here we established the first formal link, to our knowledge, between mobility and communication patterns by deriving a scaling relationship connecting them. The uncovered scaling theory not only allows us to derive human movements from communication volumes, or vice versa, but it also documents a new degree of regularity that helps deepen our quantitative understanding of human behavior. Massive datasets that capture human movements and social interactions have catalyzed rapid advances in our quantitative understanding of human behavior during the past years. One important aspect affecting both areas is the critical role space plays. Indeed, growing evidence suggests both our movements and communication patterns are associated with spatial costs that follow reproducible scaling laws, each characterized by its specific critical exponents. Although human mobility and social networks develop concomitantly as two prolific yet largely separated fields, we lack any known relationships between the critical exponents explored by them, despite the fact that they often study the same datasets. Here, by exploiting three different mobile phone datasets that capture simultaneously these two aspects, we discovered a new scaling relationship, mediated by a universal flux distribution, which links the critical exponents characterizing the spatial dependencies in human mobility and social networks. Therefore, the widely studied scaling laws uncovered in these two areas are not independent but connected through a deeper underlying reality.

Untangling performance from success

B. Yucesoy, A.-L. Barabási

EPJ Data Science 5 (1), 17

Fame, popularity and celebrity status, frequently used tokens of success, are often loosely related to, or even divorced from professional performance. This dichotomy is partly rooted in the difficulty to distinguish performance, an individual measure that captures the actions of a performer, from success, a collective measure that captures a community’s reactions to these actions. Yet, finding the relationship between the two measures is essential for all areas that aim to objectively reward excellence, from science to business. Here we quantify the relationship between performance and success by focusing on tennis, an individual sport where the two quantities can be independently measured. We show that a predictive model, relying only on a tennis player’s performance in tournaments, can accurately predict an athlete’s popularity, both during a player’s active years and after retirement. Hence the model establishes a direct link between performance and momentary popularity. The agreement between the performance-driven and observed popularity suggests that in most areas of human achievement exceptional visibility may be rooted in detectable performance measures.

Controllability Analysis of the Directed Human Protein Interaction Network Identifies Disease Genes and Drug Targets

A. Vinayagam, T. E. Gibson, H.-J. Lee, B. Yilmazel, C. Roesel, Y. Hu, Y. Kwon, A. Sharma, Y.-Y. Liu, N. Perrimon, A.-L. Barabasi

Proceedings of the National Academy of Sciences 10.1073/pnas.1603992113, 1-6 (2016)

The protein-protein interaction (PPI) network is crucial for cellular information processing and decision-making. With suitable inputs, PPI networks drive the cells to diverse functional outcomes such as cell proliferation or cell death. Here, we characterize the structural controllability of a large directed human PPI network comprising 6,339 proteins and 34,813 interactions. This network allows us to classify proteins as "indispensable," "neutral," or "dispensable," which correlates to increasing, no effect, or decreasing the number of driver nodes in the network upon removal of that protein. We find that 21% of the proteins in the PPI network are indispensable. Interestingly, these indispensable proteins are the primary targets of disease-causing mutations, human viruses, and drugs, suggesting that altering a network's control property is critical for the transition between healthy and disease states. Furthermore, analyzing copy number alterations data from 1,547 cancer patients reveals that 56 genes that are frequently amplified or deleted in nine different cancers are indispensable. Among the 56 genes, 46 of them have not been previously associated with cancer. This suggests that controllability analysis is very useful in identifying novel disease genes and potential drug targets.
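The node classification described in this abstract can be illustrated with a minimal sketch. It assumes the standard structural-controllability result that the number of driver nodes of a directed network equals max(N - |M|, 1), where |M| is the size of a maximum matching. The function names, the toy network, and the simple augmenting-path matching are illustrative assumptions, not the authors' implementation, and the recursion is only suitable for small examples.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def max_matching_size(nodes: Set[Hashable], edges: Iterable[Edge]) -> int:
    """Maximum matching size in the bipartite (source copy, target copy) graph."""
    succ: Dict[Hashable, List[Hashable]] = {u: [] for u in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:  # ignore edges touching removed nodes
            succ[u].append(v)
    matched_to: Dict[Hashable, Hashable] = {}  # target v -> source u matched to it

    def augment(u: Hashable, seen: Set[Hashable]) -> bool:
        # Kuhn's augmenting-path search starting from source copy u.
        for v in succ[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in matched_to or augment(matched_to[v], seen):
                matched_to[v] = u
                return True
        return False

    return sum(augment(u, set()) for u in nodes)


def driver_count(nodes: Set[Hashable], edges: List[Edge]) -> int:
    """Minimum number of driver nodes: max(N - |M|, 1); 0 for an empty network."""
    if not nodes:
        return 0
    return max(len(nodes) - max_matching_size(nodes, edges), 1)


def classify(nodes: Set[Hashable], edges: List[Edge]) -> Dict[Hashable, str]:
    """Label each node by how its removal changes the driver-node count."""
    base = driver_count(nodes, edges)
    labels: Dict[Hashable, str] = {}
    for n in nodes:
        after = driver_count(nodes - {n}, edges)
        if after > base:
            labels[n] = "indispensable"
        elif after < base:
            labels[n] = "dispensable"
        else:
            labels[n] = "neutral"
    return labels


# Hypothetical toy network: a directed path a -> b -> c.
path_labels = classify({"a", "b", "c"}, [("a", "b"), ("b", "c")])
```

In the toy path, removing the middle node splits the chain and raises the driver-node count, so it is labeled indispensable, while the two end nodes are neutral.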

The Network Behind the Cosmic Web

B.C. Coutinho, S. Hong, K. Albrecht, A. Day, A.-L. Barabasi, P. Torrey, M. Vogelsberger, L. Hernquist

arXiv:1604.03236v2 (13 April 2016)

The concept of the cosmic web, viewing the universe as a set of discrete galaxies held together by gravity, is deeply ingrained in cosmology. Yet, little is known about the most effective construction and the characteristics of the underlying network. Here we explore seven network construction algorithms that use various galaxy distributions provided by both simulations and observations. We find that a model relying only on spatial proximity offers the best correlations between the physical characteristics of the connected galaxies. We show that the properties of the networks generated from simulations and observations are identical, unveiling a deep universality of the cosmic web.

Universal resilience patterns in complex networks

J. Gao, B. Barzel, A.-L. Barabási

Nature 530, 307-312 (2016)

Resilience, a system’s ability to adjust its activity to retain its basic functionality when errors, failures and environmental changes occur, is a defining property of many complex systems. Despite widespread consequences for human health, the economy and the environment, events leading to loss of resilience—from cascading failures in technological systems to mass extinctions in ecological networks—are rarely predictable and are often irreversible. These limitations are rooted in a theoretical gap: the current analytical framework of resilience is designed to treat low-dimensional models with a few interacting components, and is unsuitable for multi-dimensional systems consisting of a large number of components that interact through a complex network. Here we bridge this theoretical gap by developing a set of analytical tools with which to identify the natural control and state parameters of a multi-dimensional complex system, helping us derive effective one-dimensional dynamics that accurately predict the system’s resilience. The proposed analytical framework allows us systematically to separate the roles of the system’s dynamics and topology, collapsing the behaviour of different networks onto a single universal resilience function. The analytical results unveil the network characteristics that can enhance or diminish resilience, offering ways to prevent the collapse of ecological, biological or economic systems, and guiding the design of technological systems resilient to both internal failures and environmental changes.

Network-based in silico drug efficacy screening

E. Guney, J. Menche, M. Vidal, A.-L. Barabási

Nature Communications 7:10331, 1-13 (2016)

The increasing cost of drug development together with a significant drop in the number of new drug approvals raises the need for innovative approaches for target identification and efficacy prediction. Here, we take advantage of our increasing understanding of the network-based origins of diseases to introduce a drug-disease proximity measure that quantifies the interplay between drug targets and diseases. By correcting for the known biases of the interactome, proximity helps us uncover the therapeutic effect of drugs, as well as to distinguish palliative from effective treatments. Our analysis of 238 drugs used in 78 diseases indicates that the therapeutic effect of drugs is localized in a small network neighborhood of the disease genes and highlights efficacy issues for drugs used in Parkinson and several inflammatory disorders. Finally, network-based proximity allows us to predict novel drug-disease associations that offer unprecedented opportunities for drug repurposing and the detection of adverse effects.

Endophenotype Network Models: Common Core of Complex Diseases

S. D. Ghiassian, J. Menche, D. I. Chasman, F. Giulianini, R. Wang, P. Ricchiuto, M. Aikawa, H. Iwata, C. Muller, T. Zeller, A. Sharma, P. Wild, K. Lackner, S. Singh, P. M. Ridker, S. Blankenberg, A.-L. Barabasi, J. Loscalzo

Scientific Reports 6: 27414, 1-13 (2016)

Historically, human diseases have been differentiated and categorized based on the organ system in which they primarily manifest. Recently, an alternative view is emerging that emphasizes that different diseases often have common underlying mechanisms and shared intermediate pathophenotypes, or endo(pheno)types. Within this framework, a specific disease’s expression is a consequence of the interplay between the relevant endophenotypes and their local, organ-based environment. Important examples of such endophenotypes are inflammation, fibrosis, and thrombosis and their essential roles in many developing diseases. In this study, we construct endophenotype network models and explore their relation to different diseases in general and to cardiovascular diseases in particular. We identify the local neighborhoods (module) within the interconnected map of molecular components, i.e., the subnetworks of the human interactome that represent the inflammasome, thrombosome, and fibrosome. We find that these neighborhoods are highly overlapping and significantly enriched with disease-associated genes. In particular they are also enriched with differentially expressed genes linked to cardiovascular disease (risk). Finally, using proteomic data, we explore how macrophage activation contributes to our understanding of inflammatory processes and responses. The results of our analysis show that inflammatory responses initiate from within the cross-talk of the three identified endophenotypic modules.

Tissue Specificity of Human Disease Module

M. Kitsak, A. Sharma, J. Menche, E. Guney, S. D. Ghiassian, J. Loscalzo, A.-L. Barabasi

Scientific Reports 6: 35241 (2016)

Genes carrying mutations associated with genetic diseases are present in all human cells; yet, clinical manifestations of genetic diseases are usually highly tissue-specific. Although some disease genes are expressed only in selected tissues, the expression patterns of disease genes alone cannot explain the observed tissue specificity of human diseases. Here we hypothesize that for a disease to manifest itself in a particular tissue, a whole functional subnetwork of genes (disease module) needs to be expressed in that tissue. Driven by this hypothesis, we conducted a systematic study of the expression patterns of disease genes within the human interactome. We find that genes expressed in a specific tissue tend to be localized in the same neighborhood of the interactome. By contrast, genes expressed in different tissues are segregated in distinct network neighborhoods. Most important, we show that it is the integrity and the completeness of the expression of the disease module that determines disease manifestation in selected tissues. This approach allows us to construct a disease-tissue network that confirms known and predicts unexpected disease-tissue associations.

Canonical genetic signatures of the adult human brain

M. Hawrylycz, J. A. Miller, V. Menon, D. Feng, T. Dolbeare, A. L. Guillozet-Bongaarts, A. G. Jegga, B. J. Aronow, C.-K. Lee, A. Bernard, M. F. Glasser, D. L. Dierker, J. Menche, A. Szafer, F. Collman, P. Grange, K. A. Berman, S. Mihalas, Z. Yao, L. Stewart, A.-L. Barabási, J. Schulkin, J. Phillips, L. Ng, C. Dang, D. R. Haynor, A. Jones, D. C. Van Essen, C. Koch, D. Lein

Nature Neuroscience 4171, 1-15 (2015)

The structure and function of the human brain are highly stereotyped, implying a conserved molecular program responsible for its development, cellular structure and function. We applied a correlation-based metric called differential stability to assess reproducibility of gene expression patterning across 132 structures in six individual brains, revealing mesoscale genetic organization. The genes with the highest differential stability are highly biologically relevant, with enrichment for brain-related annotations, disease associations, drug targets and literature citations. Using genes with high differential stability, we identified 32 anatomically diverse and reproducible gene expression signatures, which represent distinct cell types, intracellular components and/or associations with neurodevelopmental and neurodegenerative disorders. Genes in neuron-associated compared to non-neuronal networks showed higher preservation between human and mouse; however, many diversely patterned genes displayed marked shifts in regulation between species. Finally, highly consistent transcriptional architecture in neocortex is correlated with resting state functional connectivity, suggesting a link between conserved gene expression and functionally relevant circuitry.

Returners and explorers dichotomy in human mobility

L. Pappalardo, F. Simini, S. Rinzivillo, D. Pedreschi, F. Giannotti, A.-L. Barabási

Nature Communications 6:8166, 1-8 (2015)

The availability of massive digital traces of human whereabouts has offered a series of novel insights on the quantitative patterns characterizing human mobility. In particular, numerous recent studies have led to an unexpected consensus: the considerable variability in the characteristic travelled distance of individuals coexists with a high degree of predictability of their future locations. Here we shed light on this surprising coexistence by systematically investigating the impact of recurrent mobility on the characteristic distance travelled by individuals. Using both mobile phone and GPS data, we discover the existence of two distinct classes of individuals: returners and explorers. As existing models of human mobility cannot explain the existence of these two classes, we develop more realistic models able to capture the empirical findings. Finally, we show that returners and explorers play a distinct quantifiable role in spreading phenomena and that a correlation exists between their mobility patterns and social interactions.

Spectrum of controlling and observing complex networks

G. Yan, G. Tsekenis, B. Barzel, J.-J. Slotine, Y.-Y. Liu, A.-L. Barabási

Nature Physics 11, 779-796 (2015)

Recent studies have made important advances in identifying sensor or driver nodes, through which we can observe or control a complex system. But the observational uncertainty induced by measurement noise and the energy required for control continue to be significant challenges in practical applications. Here we show that the variability of control energy and observational uncertainty for different directions of the state space depend strongly on the number of driver nodes. In particular, we find that if all nodes are directly driven, control is energetically feasible, as the maximum energy increases sub-linearly with the system size. If, however, we aim to control a system through a single node, control in some directions is energetically prohibitive, increasing exponentially with the system size. For the cases in between, the maximum energy decays exponentially when the number of driver nodes increases. We validate our findings in several model and real networks, arriving at a series of fundamental laws to describe the control energy that together deepen our understanding of complex systems.

Constructing minimal models for complex system dynamics

B. Barzel, Y.-Y. Liu, A.-L. Barabási

Nature Communications 6:7186, 1-8 (2015)

One of the strengths of statistical physics is the ability to reduce macroscopic observations into microscopic models, offering a mechanistic description of a system’s dynamics. This paradigm, rooted in Boltzmann’s gas theory, has found applications from magnetic phenomena to subcellular processes and epidemic spreading. Yet, each of these advances was the result of decades of meticulous model building and validation, which are impossible to replicate in most complex biological, social or technological systems that lack accurate microscopic models. Here we develop a method to infer the microscopic dynamics of a complex system from observations of its response to external perturbations, allowing us to construct the most general class of nonlinear pairwise dynamics that are guaranteed to recover the observed behavior. The result, which we test against both numerical and empirical data, is an effective dynamic model that can predict the system’s behavior and provide crucial insights into its inner workings.

The observation that disease associated proteins often interact with each other has fueled the development of network-based approaches to elucidate the molecular mechanisms of human disease. Such approaches build on the assumption that protein interaction networks can be viewed as maps in which diseases can be identified with localized perturbation within a certain neighborhood. The identification of these neighborhoods, or disease modules, is therefore a prerequisite of a detailed investigation of a particular pathophenotype. While numerous heuristic methods exist that successfully pinpoint disease associated modules, the basic underlying connectivity patterns remain largely unexplored. In this work we aim to fill this gap by analyzing the network properties of a comprehensive corpus of 70 complex diseases. We find that disease associated proteins do not reside within locally dense communities and instead identify connectivity significance as the most predictive quantity. This quantity inspires the design of a novel Disease Module Detection (DIAMOnD) algorithm to identify the full disease module around a set of known disease proteins. We study the performance of the algorithm using well-controlled synthetic data and systematically validate the identified neighborhoods for a large corpus of diseases.
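As an illustration of the connectivity significance mentioned in this abstract, the sketch below assumes it is measured as a hypergeometric tail probability: the chance that a protein with a given degree has at least the observed number of links to seed (known disease) proteins if its links were placed at random. The function name and parameter names are hypothetical, and the abstract itself does not spell out the formula.

```python
from math import comb


def connectivity_pvalue(n_total: int, n_seeds: int, degree: int, links_to_seeds: int) -> float:
    """P(X >= links_to_seeds) for X ~ Hypergeometric(n_total, n_seeds, degree).

    n_total        -- number of proteins in the network
    n_seeds        -- number of known disease (seed) proteins
    degree         -- number of links of the candidate protein
    links_to_seeds -- how many of those links reach seed proteins
    """
    upper = min(degree, n_seeds)
    # Sum the probability mass of every outcome at least as extreme as observed.
    tail = sum(
        comb(n_seeds, i) * comb(n_total - n_seeds, degree - i)
        for i in range(links_to_seeds, upper + 1)
    )
    return tail / comb(n_total, degree)


# Hypothetical numbers: 10 proteins, 3 seeds, a candidate whose 2 links both hit seeds.
p_small = connectivity_pvalue(n_total=10, n_seeds=3, degree=2, links_to_seeds=2)
```

A smaller value means the candidate's links to the seed set are less likely to arise by chance, so under this assumption candidates would be ranked by ascending p-value.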

Uncovering disease-disease relationships through the incomplete interactome

J. Menche, A. Sharma, M. Kitsak, D. Ghiassian, M. Vidal, J. Loscalzo, A.-L. Barabasi

Science 347:6224, 1257601-1 (2015)

According to the disease module hypothesis, the cellular components associated with a disease segregate in the same neighborhood of the human interactome, the map of biologically relevant molecular interactions. Yet, given the incompleteness of the interactome and the limited knowledge of disease-associated genes, it is not obvious if the available data have sufficient coverage to map out modules associated with each disease. Here we derive mathematical conditions for the identifiability of disease modules and show that the network-based location of each disease module determines its pathobiological relationship to other diseases. For example, diseases with overlapping network modules show significant coexpression patterns, symptom similarity, and comorbidity, whereas diseases residing in separated network neighborhoods are phenotypically distinct. These tools represent an interactome-based platform to predict molecular commonalities between phenotypically related diseases, even if they do not share primary disease genes.

A disease module in the interactome explains disease heterogeneity, drug response and captures novel pathways and genes in asthma

A. Sharma, J. Menche, C. C. Huang, T. Ort, X. Zhou, M. Kitsak, N. Sahni, D. Thibault, L. Voung, F. Guo, S. D. Ghiassian, N. Gulbahce, F. Baribaud, J. Tocker, R. Dobrin, E. Barnathan, H. Liu, R. A. Panettieri Jr., K. G. Tantisira, W. Qiu, B. A. Raby, E. K. Silverman, M. Vidal, S. T. Weiss, and A.-L. Barabási

Human Molecular Genetics 101093, 1-16 (2015)

Recent advances in genetics have spurred rapid progress towards the systematic identification of genes involved in complex diseases. Still, the detailed understanding of the molecular and physiological mechanisms through which these genes affect disease phenotypes remains a major challenge. Here, we identify the asthma disease module, i.e. the local neighborhood of the interactome whose perturbation is associated with asthma, and validate it for functional and pathophysiological relevance, using both computational and experimental approaches. We find that the asthma disease module is enriched with modest GWAS P-values against the background of random variation, and with differentially expressed genes from normal and asthmatic fibroblast cells treated with an asthma-specific drug. The asthma module also contains immune response mechanisms that are shared with other immune-related disease modules. Further, using diverse omics (genomics, gene-expression, drug response) data, we identify the GAB1 signaling pathway as an important novel modulator in asthma. The wiring diagram of the uncovered asthma module suggests a relatively close link between GAB1 and glucocorticoids (GCs), which we experimentally validate, observing an increase in the level of GAB1 after GC treatment in BEAS-2B bronchial epithelial cells. The siRNA knockdown of GAB1 in the BEAS-2B ce

Destruction perfected

I. A. Kovács, A.-L. Barabási

Nature (News & Views) 524, 38-39 (2015)

Pinpointing the nodes whose removal most effectively disrupts a network has become a lot easier with the development of an efficient algorithm. Potential applications might include cybersecurity and disease control. See Letter p.65, by F. Morone and H. A. Makse (Supplementary 1).

A proteome-scale map of the human interactome network

T. Rolland, M. Tasan, B. Charloteaux, S. J. Pevzner, Q. Zhong, N. Sahni, S. Yi, I. Lemmens, C. Fontanillo, R. Mosca, A. Kamburov, S. D. Ghiassian, X. Yang, L. Ghamsari, D. Balcha, B. E. Begg, P. Braun, M. Brehme, M. P. Broly, A.-R. Carvunis, D. Convery-Zupan, R. Corominas, J. Coulombe-Huntington, E. Dann, M. Dreze, A. Dricot, C. Fan, E. Franzosa, F. Gebreab, B. J. Gutierrez, M. F. Hardy, M. Jin, S. Kang, R. Kiros, G. Lin, K. Luck, A. MacWilliams, J. Menche, R. R. Murray, A. Palagi, M. M. Poulin, X. Rambout, J. Rasla, P. Reichert, V. Romero, E. Ruyssinck, J. M. Sahalie, plus 20 more co-authors

Cell 159:5, 1212-1226 (2014)

Just as reference genome sequences revolutionized human genetics, reference maps of interactome networks will be critical to fully understand genotype-phenotype relationships. Here, we describe a systematic map of ∼14,000 high-quality human binary protein-protein interactions. At equal quality, this map is ∼30% larger than what is available from small-scale studies published in the literature in the last few decades. While currently available information is highly biased and only covers a relatively small portion of the proteome, our systematic map appears strikingly more homogeneous, revealing a “broader” human interactome network than currently appreciated. The map also uncovers significant interconnectivity between known and candidate cancer gene products, providing unbiased evidence for an expanded functional cancer landscape, while demonstrating how high-quality interactome models will help “connect the dots” of the genomic revolution.

Collective credit allocation in science

H.-W. Shen, A.-L. Barabasi

Proceedings of the National Academy of Sciences 10.1073/pnas.1401992111, 1-6 (2014)

Collaboration among researchers is an essential component of the modern scientific enterprise, playing a particularly important role in multidisciplinary research. However, we continue to wrestle with allocating credit to the coauthors of publications with multiple authors, because the relative contribution of each author is difficult to determine. At the same time, the scientific community runs an informal field-dependent credit allocation process that assigns credit in a collective fashion to each work. Here we develop a credit allocation algorithm that captures the coauthors’ contribution to a publication as perceived by the scientific community, reproducing the informal collective credit allocation of science. We validate the method by identifying the authors of Nobel-winning papers that are credited for the discovery, independent of their positions in the author list. The method can also compare the relative impact of researchers working in the same field, even if they did not publish together. The ability to accurately measure the relative credit of researchers could affect many aspects of credit allocation in science, potentially impacting hiring, funding, and promotion decisions.

A network framework of cultural history

M. Schich, C. Song, Y. Y. Ahn, A. Mirsky, M. Martino, A.-L. Barabási, D. Helbing

Science 345, 558-562 (2014)

The emergent processes driving cultural history are a product of complex interactions among large numbers of individuals, determined by difficult-to-quantify historical conditions. To characterize these processes, we have reconstructed aggregate intellectual mobility over two millennia through the birth and death locations of more than 150,000 notable individuals. The tools of network and complexity theory were then used to identify characteristic statistical patterns and determine the cultural and historical relevance of deviations. The resulting network of locations provides a macroscopic perspective of cultural history, which helps us to retrace cultural narratives of Europe and North America using large-scale visualization and quantitative dynamical tools and to derive historical trends of cultural centers beyond the scope of specific events or narrow time intervals.

A genetic epidemiology approach to cyber-security

S. Gil, A. Kott, A.-L. Barabási

Scientific Reports 4:5659, 1-7 (2014)

While much attention has been paid to the vulnerability of computer networks to node and link failure, there is limited systematic understanding of the factors that determine the likelihood that a node (computer) is compromised. We therefore collect threat log data in a university network to study the patterns of threat activity for individual hosts. We relate this information to the properties of each host as observed through network-wide scans, establishing associations between the network services a host is running and the kinds of threats to which it is susceptible. We propose a methodology to associate services to threats inspired by the tools used in genetics to identify statistical associations between mutations and diseases. The proposed approach allows us to determine probabilities of infection directly from observation, offering an automated high-throughput strategy to develop comprehensive metrics for cyber-security.

Human symptoms–disease network

X. Z. Zhou, J. Menche, A.-L. Barabási, A. Sharma

Nature Communications 5:4212, 1-10 (2014)

In the post-genomic era, the elucidation of the relationship between the molecular origins of diseases and their resulting phenotypes is a crucial task for medical research. Here, we use a large-scale biomedical literature database to construct a symptom-based human disease network and investigate the connection between clinical manifestations of diseases and their underlying molecular interactions. We find that the symptom-based similarity of two diseases correlates strongly with the number of shared genetic associations and the extent to which their associated proteins interact. Moreover, the diversity of the clinical manifestations of a disease can be related to the connectivity patterns of the underlying protein interaction network. The comprehensive, high-quality map of disease–symptom relations can further be used as a resource helping to address important questions in the field of systems medicine, for example, the identification of unexpected associations between diseases, disease etiology research or drug design.

Career on the move: Geography, stratification, and scientific impact

P. Deville, D. Wang, R. Sinatra, C. Song, V. Blondel, A.-L. Barabási

Scientific Reports 4, 1-7 (2014)

Changing institutions is an integral part of academic life. Yet little is known about the mobility patterns of scientists at an institutional level and how these career choices affect scientific outcomes. Here, we examine over 420,000 papers to track the affiliation information of individual scientists, allowing us to reconstruct their career trajectories over decades. We find that career movements are not only temporally and spatially localized, but also characterized by a high degree of stratification in institutional ranking. When cross-group movement occurs, we find that while going from elite to lower-rank institutions is on average associated with a modest decrease in scientific performance, transitioning into elite institutions does not result in a subsequent performance gain. These results offer empirical evidence on institutional level career choices and movements and have potential implications for science policy.

A diVIsive Shuffling Approach (VIStA) for gene expression analysis to identify subtypes in Chronic Obstructive Pulmonary Disease

J. Menche, A. Sharma, M. H. Cho, R. J. Mayer, S. I. Rennard, B. Celli, B. E. Miller, N. Locantore, R. Tal-Singer, S. Ghosh, C. Larminie, G. Bradley, J. H. Riley, A. Agusti, E. K. Silverman, A.-L. Barabási

BMC Systems Biology 8, 1-13 (2014)

Background: An important step toward understanding the biological mechanisms underlying a complex disease is a refined understanding of its clinical heterogeneity. Relating clinical and molecular differences may allow us to define more specific subtypes of patients that respond differently to therapeutic interventions. Results: We developed a novel unbiased method called diVIsive Shuffling Approach (VIStA) that identifies subgroups of patients by maximizing the difference in their gene expression patterns. We tested our algorithm on 140 subjects with Chronic Obstructive Pulmonary Disease (COPD) and found four distinct, biologically and clinically meaningful combinations of clinical characteristics that are associated with large gene expression differences. The dominant characteristic in these combinations was the severity of airflow limitation. Other frequently identified measures included emphysema, fibrinogen levels, phlegm, BMI and age. A pathway analysis of the differentially expressed genes in the identified subtypes suggests that VIStA is capable of capturing specific molecular signatures within each group. Conclusions: The introduced methodology allowed us to identify combinations of clinical characteristics that correspond to clear gene expression differences. The resulting subtypes for COPD contribute to a better understanding of its heterogeneity.

Bordering Fiction

A.-L. Barabási

Science 343: 6169 (2014)

Eggers portrays a world--in which an omnipotent social networking company encourages everyone to monitor everybody everywhere--that feels eerily everyday.

Quantifying information flow during emergencies

L. Gao, C. Song, Z. Gao, A.-L. Barabási, J. P. Bagrow, D. Wang

Scientific Reports 4, 1-6 (2014)

Recent advances in human dynamics have focused on the normal patterns of human activities, with the quantitative understanding of human behavior under extreme events remaining a crucial missing chapter. This has a wide array of potential applications, ranging from emergency response and detection to traffic control and management. Previous studies have shown that human communications are both temporally and spatially localized following the onset of emergencies, indicating that social propagation is a primary means of spreading situational awareness. We study real anomalous events using country-wide mobile phone data, finding that information flow during emergencies is dominated by repeated communications. We further demonstrate that the observed communication patterns cannot be explained by inherent reciprocity in social networks, and are universal across different demographics.

Modeling and predicting popularity dynamics via reinforced Poisson processes

H. Shen, D. Wang, C. Song, A.-L. Barabási

Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, 291-297 (2014)

An ability to predict the popularity dynamics of individual items within a complex evolving system has important implications in an array of areas. Here we propose a generative probabilistic framework using a reinforced Poisson process to explicitly model the process through which individual items gain their popularity. This model distinguishes itself from existing models via its capability of modeling the arrival process of popularity and its remarkable power at predicting the popularity of individual items. It possesses the flexibility of applying Bayesian treatment to further improve the predictive power using a conjugate prior. Extensive experiments on a longitudinal citation dataset demonstrate that this model consistently outperforms existing popularity prediction methods.
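As a rough illustration of the idea of a reinforced Poisson process, the sketch below simulates arrival times whose rate grows with the number of arrivals so far. The abstract does not state the rate function, so the form used here, `lam * exp(-decay * t) * (m + n(t))`, is an assumption chosen for illustration, and the names `simulate_rpp`, `lam`, `m`, and `decay` are hypothetical. It is not the authors' implementation and does not cover the Bayesian treatment or the prediction step.

```python
import math
import random


def simulate_rpp(lam, m, t_max, decay, rng):
    """Simulate arrival times of a reinforced Poisson process on [0, t_max).

    Assumed rate (illustrative only, not taken from the abstract):
        rate(t) = lam * exp(-decay * t) * (m + n(t))
    where n(t) is the number of arrivals observed before time t.
    """
    times = []
    t = 0.0
    while True:
        # Until the next arrival the count is fixed and exp(-decay * t) only
        # decreases, so the rate at the current time is an upper bound.
        bound = lam * math.exp(-decay * t) * (m + len(times))
        if bound <= 0.0:
            break  # no further arrivals are possible
        # Thinning: draw a candidate from the bounding rate...
        t += rng.expovariate(bound)
        if t >= t_max:
            break
        # ...and accept it with probability rate(t) / bound.
        rate = lam * math.exp(-decay * t) * (m + len(times))
        if rng.random() < rate / bound:
            times.append(t)
    return times


if __name__ == "__main__":
    arrivals = simulate_rpp(lam=0.5, m=1.0, t_max=5.0, decay=0.3,
                            rng=random.Random(42))
    print(len(arrivals), "arrivals")
```

Each accepted arrival raises the rate for later arrivals (the reinforcement), while the decaying factor stands in for an item's fading novelty. Setting `decay=0` gives pure reinforcement without aging.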

Target control of complex networks

Jianxi Gao, Y.-Y. Liu, R. M. D'Souza, A.-L. Barabási

Nature Communications 5:5415, 1-7 (2014)