Integrating Behavior, Text, and Networks to Forecast Online Participation

As online platforms increasingly rely on voluntary contributions, from open science to collaborative innovation, the ability to anticipate user engagement becomes both a scientific and a practical priority. Yet predicting who will stay active, who will disengage, and why remains a complex challenge. Our recent paper, KEGNN: Knowledge-Enhanced Graph Neural Networks for User Engagement Prediction (Fan et al., International Conference on Multimedia Retrieval 2025), introduces a novel framework that addresses this challenge by integrating behavioral, social, and semantic signals into a unified predictive model.

From Text to Network: Mapping Scientific Collaboration Using LLMs

Understanding how scientists collaborate is key to improving research, but much of that collaboration is informal and buried in unstructured text. In our new article published in Applied Network Science, we show how Large Language Models (LLMs) can uncover these hidden networks—retrieving both inter-team collaborations and intra-team task allocations from free-form text with high accuracy.
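The post does not say how accuracy was measured. One common way to score this kind of extraction, sketched here as an assumption rather than the article's evaluation protocol, is to treat each retrieved collaboration as an undirected edge and compare the extracted edge set with a hand-annotated one using precision, recall, and F1. The function and team names below are hypothetical, and the LLM call that would produce the predicted pairs is not shown.

```python
def to_edges(pairs):
    """Normalize (team_a, team_b) pairs into a set of undirected, de-duplicated edges."""
    return {frozenset((a, b)) for a, b in pairs if a != b}


def edge_scores(predicted_pairs, gold_pairs):
    """Edge-level precision, recall and F1 of an extracted network against a reference."""
    predicted, gold = to_edges(predicted_pairs), to_edges(gold_pairs)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical example: one of three extracted links is wrong and one true link is missed.
gold = [("TeamA", "TeamB"), ("TeamB", "TeamC"), ("TeamC", "TeamD")]
predicted = [("TeamB", "TeamA"), ("TeamC", "TeamB"), ("TeamA", "TeamD")]
print(edge_scores(predicted, gold))
```

The same comparison could, under the same assumptions, be applied to intra-team (member, task) assignments by passing those pairs instead of team pairs.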

From Self to System: Designing Social Containers for Collective Intelligence

How do we design social containers that support the emergence of healthy, adaptive groups?

In our forthcoming article in Group, we propose a multi-level developmental framework rooted in complexity and network science. We explore how structured environments—composed of nested scales of interaction (self, dyad, group, community)—can cultivate core relational competencies such as co-regulation, perspective-taking, and group-level coordination. These capacities are not merely psychological traits but emergent properties of well-designed interaction networks.

The Shape of Participation: Uncovering Structure in Open Collaboration

Open-source communities are often seen as paragons of decentralized collaboration. But even in these systems, invisible hierarchies shape how people contribute, coordinate, and influence. Can we better understand these structures—not just who contributes the most, but how participation patterns emerge, persist, and evolve?

In our recent study published in Physica A, we explore this question using rank-size distributions, mathematical tools that describe how activity is distributed across contributors ranked from most to least active. The most famous of these, Zipf’s Law, has long been used to describe hierarchies in systems as varied as cities, languages, and scientific publishing. In its classic form, it assumes a simple power-law decay in which the contributor at rank r does 1/r as much as the top contributor: the second most active contributor does half as much as the first, the third one-third as much, and so on. But while this model is elegant, it’s also limited.
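To make the classic Zipf form concrete, f(r) = f(1) / r^α with α = 1, the sketch below computes the activity Zipf’s Law predicts at each rank and estimates α from observed contribution counts by ordinary least squares on the log-log rank-size plot. This is a quick diagnostic chosen for illustration, not the fitting procedure used in the study; log-log regression is a rough estimator that can be biased for heavy-tailed data, and the function names are hypothetical.

```python
import math


def zipf_expected(top_count, n_ranks, alpha=1.0):
    """Activity predicted for ranks 1..n_ranks by f(r) = f(1) / r**alpha."""
    return [top_count / r**alpha for r in range(1, n_ranks + 1)]


def fit_rank_size_exponent(counts):
    """Estimate alpha by least squares on log f(r) = log f(1) - alpha * log r.

    Needs at least two contributors with positive activity.
    """
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return -covariance / variance  # the fitted slope is -alpha


print(zipf_expected(120, 4))                                 # [120.0, 60.0, 40.0, 30.0]
print(round(fit_rank_size_exponent([120, 60, 40, 30]), 3))   # 1.0
print(fit_rank_size_exponent([50, 40, 35, 30, 28]))          # flatter hierarchy, alpha < 1
```

A fitted α well below 1 signals a flatter hierarchy than Zipf predicts, while systematic curvature in the log-log plot is one sign that a single power law is too limited a description.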

The Geography of Knowledge: Tracking Researcher Mobility Through Scientific Space

Our paper on the mobility of scientists through the knowledge landscape has been accepted for publication in EPJ Data Science! In this study, we build on our earlier work on the rise and fall of scientific fields in arXiv (see our earlier post), and propose a new lens: what if we studied science like we study human movement?

We constructed a low-dimensional map of scientific knowledge using t-SNE embeddings of 1.5 million arXiv preprints across physics, computer science, and mathematics. This space allows us to track researchers as they “move” through fields via their publications—each trajectory forming a unique scientific path through the landscape.
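The idea of treating publications as positions can be sketched as follows, assuming each paper already has 2-D coordinates in the map (for example, from t-SNE). The code averages a researcher’s papers within each year into one position, measures the distance moved between consecutive years, and computes a radius of gyration, a standard summary in human-mobility studies. The input format, function names, and choice of metrics are illustrative assumptions, not the paper’s pipeline. Note also that t-SNE preserves local neighborhoods better than large-scale distances, so long jumps in such a map should be read with caution.

```python
import math
from collections import defaultdict


def yearly_positions(papers):
    """Average the map coordinates of a researcher's papers within each year.

    `papers` is a list of (year, x, y) tuples; returns (year, x, y) tuples sorted by year.
    """
    by_year = defaultdict(list)
    for year, x, y in papers:
        by_year[year].append((x, y))
    trajectory = []
    for year in sorted(by_year):
        points = by_year[year]
        cx = sum(x for x, _ in points) / len(points)
        cy = sum(y for _, y in points) / len(points)
        trajectory.append((year, cx, cy))
    return trajectory


def step_lengths(trajectory):
    """Euclidean distance moved between consecutive yearly positions."""
    return [math.dist(a[1:], b[1:]) for a, b in zip(trajectory, trajectory[1:])]


def radius_of_gyration(papers):
    """Root-mean-square distance of a researcher's papers from their overall centroid."""
    cx = sum(x for _, x, _ in papers) / len(papers)
    cy = sum(y for _, _, y in papers) / len(papers)
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for _, x, y in papers) / len(papers))


# Hypothetical researcher: two papers in 2015, then one paper per year in a new region.
papers = [(2015, 0.0, 0.0), (2015, 2.0, 0.0), (2016, 4.0, 4.0), (2017, 4.0, 4.0)]
path = yearly_positions(papers)
print(path)                  # [(2015, 1.0, 0.0), (2016, 4.0, 4.0), (2017, 4.0, 4.0)]
print(step_lengths(path))    # [5.0, 0.0]
print(radius_of_gyration(papers))
```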

Rethinking Science from the Ground Up

What kind of science do we need to navigate the crises of the 21st century? In our new article published in Royal Society Open Science, we argue that the answer lies in embracing a richer, more situated understanding of knowledge—one that takes seriously the partiality of perspectives and the challenge of integrating diverse, and sometimes incommensurable, ways of seeing the world.

Drawing on epistemology, complexity science, and our experiences in democratic citizen science, we propose a shift from an industrial model of research—focused on outputs, efficiency, and consensus—toward an ecological model of inquiry. This alternative vision emphasizes diversity, deliberation, and long-term robustness. It reframes citizen science not as an auxiliary to academic research, but as a model for a more inclusive and deliberative scientific practice.

A Laboratory Ethnography at Scale: Lessons from 3,000 Synthetic Biology Teams

This new preprint is the result of a collaboration initiated during my postdoctoral stay at the Barabási Lab in Boston, which I continued at the LPI as an affiliated professor. In this project, we introduce the synthetic biology competition iGEM as a model system for the Science of Science and Innovation, enabling large-scale “laboratory ethnography.” We present the collection and analysis of laboratory notebook data from 3,000 teams, which we have deposited in the open archive Zenodo. We highlight the organizational characteristics of teams, namely their intra- and inter-team collaboration networks, that are associated with learning and success in the competition. In particular, we emphasize how teams overcome coordination costs as they grow in size, as well as how the inter-team collaboration network crystallizes over time, limiting peripheral teams’ access to relational capital. This work is currently funded by an ANR JCJC grant to collect field data and build network models of collaboration and performance.

Analyzing Relational Structures in Educational Forums

We just published a new article in Educational Technology Research and Development: “Forum posts, communication patterns, and relational structures: A multi-level view of discussions in online courses.” This article builds on an approach we published previously, which uses the formalism of Exponential Random Graph Models (ERGMs) to model how relational networks form from the data of online forums used in university courses. Applying these models to bipartite networks before projecting them into weighted networks yields null models that encode different assumptions about how the forum is used. Statistically comparing the projections of these null models with the observed network enabled us to assess the significance of global characteristics such as density, number of communities, and clustering, and to filter links to obtain sparse relational structures whose structural properties can be compared and grouped by similarity.
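The project-then-compare logic can be illustrated with a much simpler null model than an ERGM. The sketch below swaps in a fixed-degree-sequence null model: it randomizes a bipartite user-thread network with degree-preserving swaps, projects each randomized network onto users, and keeps an observed link only when its weight is rarely reached by the randomized projections. The user-thread representation, function names, number of samples, and 0.05 threshold are all assumptions for illustration, not the article’s ERGM-based method.

```python
import random
from collections import Counter
from itertools import combinations


def project(edges):
    """Weighted user-user projection: weight = number of threads two users share."""
    members = {}
    for user, thread in edges:
        members.setdefault(thread, set()).add(user)
    weights = Counter()
    for users in members.values():
        for pair in combinations(sorted(users), 2):
            weights[pair] += 1
    return weights


def degree_preserving_shuffle(edges, n_swaps, rng):
    """Randomize a bipartite edge list with swaps that keep every user and thread degree."""
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (u1, t1), (u2, t2) = edges[i], edges[j]
        # Skip swaps that would change nothing or create a duplicate edge.
        if u1 == u2 or t1 == t2 or (u1, t2) in present or (u2, t1) in present:
            continue
        present -= {(u1, t1), (u2, t2)}
        present |= {(u1, t2), (u2, t1)}
        edges[i], edges[j] = (u1, t2), (u2, t1)
    return edges


def significant_links(edges, n_samples=200, alpha=0.05, seed=0):
    """Keep projected links whose weight is rarely reached under the null model."""
    rng = random.Random(seed)
    edges = sorted(set(edges))  # de-duplicate and fix the order for reproducibility
    observed = project(edges)
    at_least_as_heavy = Counter()
    for _ in range(n_samples):
        null = project(degree_preserving_shuffle(edges, 10 * len(edges), rng))
        for pair, weight in observed.items():
            if null[pair] >= weight:
                at_least_as_heavy[pair] += 1
    # Empirical one-sided p-value with the usual +1 correction.
    return {
        pair: weight
        for pair, weight in observed.items()
        if (at_least_as_heavy[pair] + 1) / (n_samples + 1) < alpha
    }


# Hypothetical forum: (user, thread) pairs meaning "user posted in thread".
posts = [("ana", "t1"), ("ben", "t1"), ("ana", "t2"), ("ben", "t2"), ("cy", "t2")]
print(project(posts))
print(significant_links(posts, n_samples=100))
```

On a toy forum this small, few or no links will pass the filter; the point is the structure of the comparison. In this design the null model fixes only activity levels, so surviving links reflect co-participation beyond what posting volume alone would produce.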