
08.01.2025

INRIA-Sciences Po Internship Opportunity

Within the framework of the joint Inria-Sciences Po Exploratory Action SALM, we propose a 6-month internship (Spring/Summer 2025) directed towards a fully funded PhD thesis (start: Fall 2025). 

Context

Social media platforms offer new spaces for citizens to voice their concerns, share their emotions, and comment on their daily lives or the news. With the democratization of the Internet, we have observed a growth of mainstream and alternative platforms, which have come to host larger populations and to accommodate all interests and preferences. As a result, new voices have started to be heard.

We now observe a wide array of issues being raised, as well as a large diversity of identities expressing themselves online. 
One may rejoice at this phenomenon as evidence of increased social and cultural diversity in the digital public space. At the same time, some may see it as evidence of a growing polarization in our society, with increasingly homophilic communities isolating themselves in narrow niches composed of like-minded individuals.

The objective of this internship is not to take a position on this debate, but to investigate the consequences of this socio-cultural diversity for various NLP tasks that are supposed to work on any kind of utterance. How accurate are opinion mining, stance detection, or sentiment analysis methods when confronted with data coming from heterogeneous populations? Evaluation metrics usually ignore how the social, demographic, cultural, ideological, or ethnic background of speakers may influence the performance of an NLP algorithm.

Timeline

We will leverage existing longitudinal data from Twitter and detailed information about the socioeconomic status and political preferences of nearly 7000 individuals, available in the newly released SoSweet Corpus [ICAR et al., 2024]. The internship will start with an exploratory stage in which we will study linguistic variation among these individuals, building on previous work by Abitbol et al. (2018).

In a second phase, we will test existing algorithms on this fully informed dataset. We expect toxicity detection to vary across social classes, but also possibly according to regional factors. Stance detection may also depend on the type of issues around which users position themselves. As a consequence, it may also measure stances expressed by right-wingers and left-wingers differently.

Finally, in a third phase, we suggest enriching NLP methods by integrating sociological methods and automated neural model-based techniques to improve the performance, explanatory capacities, and measurement validity of specific models. Prior research has shown that manual classification and annotation for sensitive topics is highly dependent on the annotator’s own cultural assumptions about the identities and characteristics of speakers, as well as their own ideological position. The standard methodology in this area aims at eliminating these "subjective" factors to the greatest extent possible. In contrast, we believe that incorporating external knowledge about speakers’ ideological positions will help to improve the performance of neural models by making the language models more context-sensitive. This extra-linguistic information can even be seen as an additional modality. How best to include it in a language model is an open question for the field at large, which will be investigated in this project.

Work Environment

The internship will take place partly at Inria and partly at Sciences Po in Paris; the formal employer will be the Fondation Nationale des Sciences Politiques.

To apply

Please send your application (CV and cover letter) to Prof. Jean-Philippe Cointet (jeanphilippe.cointet@sciencespo.fr) and Dr. Djamé Seddah (djame.seddah@inria.fr) before 10 February 2025.

Internship and PhD Advisors

Djamé Seddah, Jean-Philippe Cointet and Alexander Kindel