Last month, two of our team members, Jelena and Matthew, attended the 35th Computational Linguistics in the Netherlands (CLIN) conference in Leuven, Belgium. We presented in the panel “Multi-Modal Studies”, which suits our project well, since our analysis of social media data involves both text and images.

Social media communication has become increasingly visual, with users sharing images and videos alongside text. Platforms like Facebook now host large volumes of multi-modal content, for which text-based analysis alone cannot capture the full meaning of what is being communicated. We argued that this shift requires computational approaches that integrate natural language processing (NLP) with image analysis to better understand how narratives are formed, reinforced, and circulated online, especially in contexts of political unrest or conflict.

For the textual data we have obtained, we showed that standard NLP techniques such as collocation analysis help us understand which topics appear in the online discourse, as well as what is often mentioned when these topics come up on social media. We also briefly mentioned our ongoing work on automatic language detection for our data, since existing tools cannot yet deal with Fulfulde.

For the visual data, we employ the CLIP (Contrastive Language–Image Pretraining) model to generate image embeddings, which enables semantic alignment with textual concepts within a shared embedding space. We apply UMAP (Uniform Manifold Approximation and Projection) to the image embeddings to group semantically similar images and identify recurring visual motifs. Although we have collected more than 10,000 images, we are able to find coherent groups of pictures organized by theme, such as violence, floods, and the military. The multi-modal framework allows us to move beyond isolated analysis of text or image content and instead model the interplay between visual and linguistic elements as they contribute to the online construction of meaning.
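To make the idea of collocation analysis more concrete, here is a minimal sketch that scores adjacent word pairs by pointwise mutual information (PMI), one common collocation measure. It is not our actual pipeline: the function name `pmi_collocations`, the `min_count` threshold, the whitespace tokenization, and the example posts are all invented for illustration.

```python
import math
from collections import Counter


def pmi_collocations(posts, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Hypothetical helper for illustration only. Pairs are counted within
    each post, so no pair spans the boundary between two posts.
    """
    unigrams = Counter()
    bigrams = Counter()
    for post in posts:
        tokens = post.lower().split()  # naive whitespace tokenization (assumption)
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))

    # Highest-scoring pairs first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented example posts, not taken from our dataset
posts = [
    "heavy flood hits the northern region",
    "flood victims need urgent help",
    "heavy flood destroys farms in the region",
    "military convoy enters the northern region",
]

results = pmi_collocations(posts)
for pair, score in results:
    print(pair, round(score, 2))
```

A positive score means the two words occur together more often than their individual frequencies would predict. On real data one would normally add proper tokenization and stop-word handling, and possibly look at a wider window than adjacent words.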
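The notion of a shared embedding space can also be illustrated with a small sketch. Because images and textual concepts live in the same space, an image vector can be compared directly with the vector of a theme word using cosine similarity. Note that this is a simpler stand-in for illustration and not the UMAP-based grouping described above: the three-dimensional vectors, the image IDs, and the helper names `cosine` and `assign_themes` are all made up, whereas real CLIP embeddings have several hundred dimensions and come from the model itself.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def assign_themes(image_embeddings, theme_embeddings):
    """Assign each image to the theme whose embedding is most similar.

    Hypothetical helper: both arguments map a name to a vector in the
    same embedding space.
    """
    assignments = {}
    for image_id, image_vec in image_embeddings.items():
        assignments[image_id] = max(
            theme_embeddings,
            key=lambda theme: cosine(image_vec, theme_embeddings[theme]),
        )
    return assignments


# Toy vectors standing in for text embeddings of theme words (assumption)
theme_embeddings = {
    "violence": [1.0, 0.1, 0.0],
    "flood": [0.0, 1.0, 0.1],
    "military": [0.1, 0.0, 1.0],
}

# Toy vectors standing in for image embeddings (assumption)
image_embeddings = {
    "img_001": [0.9, 0.2, 0.1],
    "img_002": [0.1, 0.8, 0.3],
    "img_003": [0.2, 0.1, 0.7],
}

assignments = assign_themes(image_embeddings, theme_embeddings)
for image_id, theme in assignments.items():
    print(image_id, "->", theme)
```

The design choice here is deliberately minimal: nearest-theme assignment needs a predefined list of themes, while an unsupervised projection such as UMAP lets groups emerge from the images without naming them in advance.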

Last but not least, our talk ended with questions and a good deal of interest from the audience.