Integrating Behavior, Text, and Networks to Forecast Online Participation

As online platforms increasingly rely on voluntary contributions, from open science to collaborative innovation, anticipating user engagement has become both a scientific and a practical priority. Yet predicting who will stay active, who will disengage, and why remains difficult. Our recent paper, KEGNN: Knowledge-Enhanced Graph Neural Networks for User Engagement Prediction (Fan et al., International Conference on Multimedia Retrieval 2025), introduces a framework that addresses this challenge by integrating behavioral, social, and semantic signals into a unified predictive model.

The Problem: Capturing the Dynamics of Engagement

Most prior research on online user behavior focuses on isolated indicators—likes, posts, log-ins, or social ties—but these are often insufficient to capture the underlying commitment or attentional shifts of participants. In crowd-powered platforms such as Just One Giant Lab (JOGL), where users self-organize to address societal challenges, engagement is both a cognitive and relational phenomenon. We hypothesize that sustained engagement emerges from an interplay between individual behavior over time, position in the social network, and the semantic content of contributions.

The Solution: A Multi-Modal Graph Neural Network

KEGNN (Knowledge-Enhanced Graph Neural Network) is designed to capture this interplay. Our approach models each user as a node in a dynamic graph, with three core input streams:

  1. Behavioral Time Series: Using an RFE (Recency, Frequency, Engagement) model, we quantify participation trajectories—tracking how recently and how often a user contributes, and the intensity of their activity.
  2. Social Network Dynamics: We construct a weekly-evolving follower-following graph to capture social embeddedness.
  3. Textual Semantics: Posts and comments are processed using natural language processing (NLP) methods to extract the thematic and linguistic features of shared knowledge.
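To make the behavioral stream concrete, here is a minimal sketch of how RFE-style features might be derived from an activity log. The event format, the four-week window, and the exact definitions (recency as weeks since the last contribution, frequency as the number of contributions in the window, engagement as summed activity weight) are illustrative assumptions, not the definitions used in the paper.

```python
from collections import defaultdict

def rfe_features(events, current_week, window=4):
    """Return {user: (recency, frequency, engagement)} for the last `window` weeks.

    `events` is a hypothetical log of (user_id, week_index, activity_weight)
    tuples; the feature definitions below are assumptions for illustration.
    """
    per_user = defaultdict(list)
    for user, week, weight in events:
        # Keep only events inside the look-back window.
        if current_week - window < week <= current_week:
            per_user[user].append((week, weight))

    features = {}
    for user, rows in per_user.items():
        last_week = max(week for week, _ in rows)
        recency = current_week - last_week             # weeks since last contribution
        frequency = len(rows)                          # contributions in the window
        engagement = sum(weight for _, weight in rows) # summed activity intensity
        features[user] = (recency, frequency, engagement)
    return features

events = [
    ("alice", 7, 1.0),
    ("alice", 8, 2.0),
    ("bob", 5, 1.0),
    ("bob", 2, 3.0),   # falls outside the 4-week window ending at week 8
]
feats = rfe_features(events, current_week=8, window=4)
```

Computing these features once per week would yield one (recency, frequency, engagement) triple per user per week, which is the kind of sequence a time-series module can consume.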

Each component is embedded via a specialized neural module (an RNN for the time series, a BiLSTM for text, a GCN for graph structure), and the resulting embeddings are fused into a joint representation used to classify future user engagement.
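The fusion step can be sketched in a few lines, under strong simplifying assumptions. Here the per-user behavior and text vectors are hand-written stand-ins for the RNN and BiLSTM outputs, fusion is plain concatenation, and the graph step is a single round of neighbor averaging with self-loops and no learned weights (a simplified relative of GCN propagation). The function names and the logistic scoring head are hypothetical; the actual KEGNN layers are described in the paper.

```python
import math

def fuse_and_propagate(behavior, text, edges):
    """Concatenate per-user vectors, then average each user with its neighbors.

    Assumption: follower links are treated as undirected and unweighted.
    """
    fused = {u: behavior[u] + text[u] for u in behavior}  # list concatenation
    neighbors = {u: {u} for u in fused}                   # self-loops
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    propagated = {}
    for u, nbrs in neighbors.items():
        dim = len(fused[u])
        propagated[u] = [
            sum(fused[v][i] for v in nbrs) / len(nbrs) for i in range(dim)
        ]
    return propagated

def engagement_probability(vector, weights, bias=0.0):
    """Hypothetical logistic head mapping a joint vector to P(engaged)."""
    z = sum(w * x for w, x in zip(weights, vector)) + bias
    return 1.0 / (1.0 + math.exp(-z))

behavior = {"alice": [1.0, 0.0], "bob": [0.0, 1.0], "carol": [0.0, 0.0]}
text = {"alice": [2.0], "bob": [0.0], "carol": [4.0]}
edges = [("alice", "bob")]  # carol has no connections

joint = fuse_and_propagate(behavior, text, edges)
p_carol = engagement_probability(joint["carol"], weights=[0.0, 0.0, 0.0])
```

After propagation, connected users share information (alice and bob end up with the same averaged vector), while an isolated user such as carol keeps only her own signals. This is the intuition behind adding network structure to the individual streams.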

Empirical Validation

We apply KEGNN to a longitudinal dataset of user activity on JOGL, spanning over a year and comprising dynamic behavioral, textual, and network data for 300 users and 143 projects. The model achieves significantly higher predictive accuracy and F1-scores than adapted baselines, including RNN, LSTNet, and ColaGNN—even in long-horizon forecasts (up to three weeks ahead).
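For readers less familiar with the two reported metrics, the sketch below shows how accuracy and F1 are computed for a binary labelling. Treating engagement as a binary label (1 = engaged, 0 = not engaged) is an assumption made for illustration, and the labels are toy values, not results from the paper.

```python
def accuracy_and_f1(y_true, y_pred):
    """Compute accuracy and F1 for binary labels (1 = engaged, 0 = not engaged)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)

    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return accuracy, f1

# Toy labels for five users.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
acc, f1 = accuracy_and_f1(y_true, y_pred)
```

F1 is useful alongside accuracy when engaged and disengaged users are unevenly represented, since accuracy alone can look high for a model that mostly predicts the majority class.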

To understand what drives the model’s success, we ran ablation tests. Removing either the text analysis or the network structure led to clear drops in accuracy, showing that both content and social context are essential for predicting user engagement.

Implications and Future Directions

KEGNN provides a flexible and interpretable architecture for understanding engagement in open communities. Its modular design allows it to be extended to other domains where user behavior is temporally and relationally structured—such as open-source software, online education, or participatory policy platforms.

Beyond prediction, future work could leverage KEGNN to inform interventions—e.g., identifying early signs of disengagement, surfacing under-recognized contributors, or supporting community managers in sustaining collective momentum.
