My current research interests explore the role of networks in various domains, from biology to social science, using large-scale datasets. Behind all my research projects stands the question of how collective phenomena emerge from elementary parts: cells from proteins, organisms from cells, teams from individuals, etc. These collective phenomena exhibit macroscopic observables, or “phenotypes”. As such, understanding the disease state of an individual through the lens of the molecular protein interaction network involves a framework similar to that of understanding the performance of a team through the lens of team members’ interactions. My research combines network science, for representing the data and extracting relevant features, with machine learning, data science and physics, to understand how these features relate to the macroscopic observable in question.
The projects I’m currently pursuing are varied and can be grouped under several umbrellas:
- Role of non-coding RNAs in diseases. In collaboration with the Sharma Lab at Harvard Medical School, we study the impact of miRNAs and lincRNAs in the interactome to prioritize their role in diseases. We also built a disease similarity map based on shared miRNAs, whose high resolution allows disease subtypes to be stratified.
- Multi-layer interactome. We use information on cell-to-cell and tissue-to-tissue communication to build the multi-layer interactome, which represents the interactions between tissue-specific layers. Using this map, we study how perturbations spread through the organism.
- Personalized medicine. Using data from stressors or drug treatments at the individual level, we study how gene expression relates to the response using machine learning and biological networks.
- Virus-host protein interactions. In collaboration with Thomas Rolland and Yves Jacob from the Pasteur Institute, we use experimental data on interactions between virus and host proteins to decipher the interactome effects that explain the differing virulence of various Ebola virus strains.
- Hospital network mobility. In collaboration with the Dhand Lab at Northeastern University, we study the mobility of millions of patients in the hospital system in California.
- Social network modules. Based on the “disease module” hypothesis, in which genes linked to a similar disease state cluster in the same neighborhood of the interactome, we study similar effects in the context of “social” biomarkers that associate with social properties. In particular, we look at the role of hormones and their link to social network properties, such as betweenness centrality in a team. For this, we are designing low-cost DIY solutions to map the level of several hormones from saliva samples, using color changes on a test strip that can be recorded with a smartphone.
- Large-scale database on emotions. In collaboration with CRI fellow Felix Schoeller, we investigate what stimuli cause strong emotions, such as chills, using large-scale data collection and analysis. To do so, we extracted a large dataset of YouTube videos during which individuals experienced strong emotions (goosebumps), as inferred from an analysis of the comments. Using video metadata as well as the underlying relational network, we reconstruct the various types of stimuli behind these strong emotional experiences to build a first large-scale unsupervised investigation of what causes chills.
- Large-scale analysis of team success in the iGEM competition. In collaboration with the Barabási Lab at Northeastern University, we use data extracted from the open wiki pages of 2,000+ teams that have participated in the International Genetically Engineered Machine (iGEM) competition over the past 10 years. From these wikis, we can extract information on the collaborative structure of the project (who did what at what time), as well as which teams collaborated with other teams, and how teams recombined BioBricks produced by other teams to make new, innovative BioBricks. Features from this data are then associated with team success (medals, prizes, finalist status) to explore their importance in performance and success.
- Measuring iGEM team social interactions. In this project, we work with iGEM teams to study their interactions in the lab using proximity sensors (a Bluetooth-enabled smartphone app) as well as communication data (e-mail, Slack…). This is an ongoing study with the goal of studying 20+ teams in 2019. The proximity sensors are developed in collaboration with the Matter Lab at the Child Mind Institute in NYC.
- Problem-based learning. We leverage a dataset of 1,000 teams from a medical school that have pursued a curriculum of collaborative problem-based learning. Students can interact on a Moodle platform, which allows us to reconstruct their interaction networks. Network properties can then be associated with grades to identify healthy collaborative practices in learning.
- The social network of learners. In collaboration with Orange Labs, we also study another dataset of 400+ learners from Madagascar, for whom we have answers to quizzes as well as interactions through mobile phones over a period of 3 months.
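The betweenness centrality mentioned in the social network modules project can be computed directly from a team's interaction graph. The following is a minimal sketch using Brandes' algorithm for an undirected, unweighted graph. The team members, their ties, and the function name are hypothetical and chosen for illustration; this is not the project's own code.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Unnormalized betweenness centrality (Brandes' algorithm).

    `adjacency` maps each node to a list of its neighbours and is assumed
    to describe an undirected, unweighted graph.
    """
    centrality = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search from `source`, counting shortest paths.
        order = []
        predecessors = {node: [] for node in adjacency}
        n_paths = {node: 0 for node in adjacency}
        distance = {node: -1 for node in adjacency}
        n_paths[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:  # first time w is reached
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:  # v lies on a shortest path to w
                    n_paths[w] += n_paths[v]
                    predecessors[w].append(v)
        # Accumulate pair dependencies, starting from the farthest nodes.
        dependency = {node: 0.0 for node in adjacency}
        for w in reversed(order):
            for v in predecessors[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1.0 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    # Each undirected pair was counted from both of its endpoints.
    return {node: value / 2.0 for node, value in centrality.items()}


# Hypothetical five-person team in which "carol" and "dave" act as bridges.
team = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "carol"],
    "carol": ["alice", "bob", "dave"],
    "dave": ["carol", "erin"],
    "erin": ["dave"],
}
scores = betweenness_centrality(team)
```

In this toy team, the members who connect otherwise separate colleagues receive the highest scores, which is the kind of network property that could then be compared against a biomarker measurement.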
Science of Science / Science Design
- The rise and fall of scientific fields. Here we investigate the emergence of new major research areas by analyzing scientific publications from arXiv. We extract the publication path of each researcher and analyze it in the space of scientific categories. The trajectories are modeled as vectors in an n-dimensional space, where each dimension is a science-field tag. Each trajectory is hence a sequence of jumps between the corners of an n-dimensional hypercube (single-tag papers) or intermediate points (2 or 3 tags). We aim to: 1/ study the universal patterns of the birth, life and death of a topic and 2/ study the types of trajectories (interdisciplinary, specialist…) underlying the different phases of the life of a topic.
- Measuring social bias in research. We analyze how laboratories around the world have created the “interactome” map of human protein-protein interactions in the past 30 years. By comparing this map with the recently obtained “ground truth” systematic map (obtained by checking random pairs of proteins for interactions), we can disentangle the importance of biology (involvement of a protein in a disease), topology (how far that protein is in the known network) and social bias (who published about that protein, and are they a collaborator?) in the evolution of knowledge about the interactome.
- Science Design. In this project, we build recommendation systems for large open science projects. This work is done in the context of the JOGL (Just One Giant Lab) platform, with the key question being: if there are 1,000 people working on the same open science project, how can we filter information so that each person can contribute significantly without being overwhelmed by all of the project information?
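The trajectory representation described for the rise and fall of scientific fields can be sketched as follows. This is only an illustration under stated assumptions: a multi-tag paper is given equal weight on each of its tags (the text only says such papers are intermediate points), the three arXiv categories are an arbitrary example, and the function names and the researcher's papers are hypothetical.

```python
import math


def paper_vector(tags, categories):
    """Map one paper's tags to a point in category space.

    A single-tag paper lands on a corner of the hypercube (weight 1 on one
    axis). Assumption: a multi-tag paper spreads its weight equally over
    its tags.
    """
    unique = set(tags)
    if not unique:
        raise ValueError("a paper needs at least one tag")
    unknown = unique - set(categories)
    if unknown:
        raise ValueError(f"unknown tags: {sorted(unknown)}")
    weight = 1.0 / len(unique)
    return [weight if category in unique else 0.0 for category in categories]


def trajectory(papers, categories):
    """Turn a chronological list of tagged papers into a list of points."""
    return [paper_vector(tags, categories) for tags in papers]


def jump_lengths(points):
    """Euclidean distance between consecutive papers of a trajectory."""
    return [math.dist(a, b) for a, b in zip(points, points[1:])]


# Illustrative category axes and a hypothetical researcher's papers.
categories = ["cs.SI", "physics.soc-ph", "q-bio.MN"]
papers = [
    ["q-bio.MN"],                    # single tag: a corner
    ["q-bio.MN", "physics.soc-ph"],  # two tags: an intermediate point
    ["physics.soc-ph"],              # single tag: another corner
]
path = trajectory(papers, categories)
jumps = jump_lengths(path)
```

Summary statistics of such paths, for instance the jump lengths or the number of distinct axes visited, are one possible way to separate specialist from interdisciplinary trajectories.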
AI and networks
- Building an AI network scientist. In collaboration with CRI fellow Remy Kusters, we investigate how to use neural networks to 1/ learn and recognize various types of networks (Barabási-Albert, Erdős-Rényi, Watts-Strogatz…) using convolutional neural networks, and 2/ learn the finer network structure that distinguishes networks of different sizes corresponding to team networks in the iGEM competition, with the goal of predicting their end performance (medals).
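As a rough illustration of the first step, labelled training examples for the three network models named above can be generated with textbook versions of each model. The sketch below uses only the Python standard library. It assumes that the classifier takes adjacency matrices as input, which the text does not specify, and all parameter values (30 nodes, connection probabilities, seed) are arbitrary choices for the example.

```python
import random


def erdos_renyi(n, p, rng):
    """G(n, p): each possible edge is present independently with probability p."""
    return {(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p}


def barabasi_albert(n, m, rng):
    """Preferential attachment: each new node links to m distinct existing
    nodes, chosen with probability proportional to their degree (n > m >= 1)."""
    edges = set()
    targets = list(range(m))  # the first new node links to the m seed nodes
    endpoints = []            # each node appears once per incident edge
    for new in range(m, n):
        edges.update((target, new) for target in targets)
        endpoints.extend(targets)
        endpoints.extend([new] * m)
        chosen = set()
        while len(chosen) < m:  # degree-proportional sampling, no repeats
            chosen.add(rng.choice(endpoints))
        targets = list(chosen)
    return edges


def watts_strogatz(n, k, p, rng):
    """Ring lattice with k nearest neighbours per node (k even, k < n - 1),
    where each lattice edge is rewired with probability p."""
    neighbours = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            neighbours[i].add(j)
            neighbours[j].add(i)
    for step in range(1, k // 2 + 1):
        for i in range(n):
            j = (i + step) % n
            if rng.random() < p and len(neighbours[i]) < n - 1:
                free = [c for c in range(n) if c != i and c not in neighbours[i]]
                new = rng.choice(free)
                neighbours[i].discard(j)
                neighbours[j].discard(i)
                neighbours[i].add(new)
                neighbours[new].add(i)
    return {(min(i, j), max(i, j)) for i in neighbours for j in neighbours[i]}


def adjacency_matrix(n, edges):
    """Symmetric 0/1 matrix, a possible image-like input for a classifier."""
    matrix = [[0] * n for _ in range(n)]
    for i, j in edges:
        matrix[i][j] = matrix[j][i] = 1
    return matrix


rng = random.Random(0)  # fixed seed so the example is reproducible
n = 30
dataset = [
    ("erdos_renyi", adjacency_matrix(n, erdos_renyi(n, 0.15, rng))),
    ("barabasi_albert", adjacency_matrix(n, barabasi_albert(n, 2, rng))),
    ("watts_strogatz", adjacency_matrix(n, watts_strogatz(n, 4, 0.1, rng))),
]
```

Repeating the generation with different seeds and parameters would give a labelled set of (model name, matrix) pairs. Note that the same graph has many equivalent matrices depending on node ordering, so any classifier trained on this representation would have to cope with that ambiguity.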