Paper-Conference

A Study of Nationality Bias in Names and Perplexity using Off-the-Shelf Affect-related Tweet Classifiers

We've developed a method to measure biases in AI models related to named entities from different countries. Our results show that the presence of certain country names can significantly shift predictions, by up to 23% for hate speech detection and up to 60% for emotion analysis. Our findings suggest that these biases are rooted in the pre-training data of language models, and we've uncovered patterns showing how language and country of origin affect model predictions, with English-speaking country names having a particularly strong effect.

Nov 1, 2024

(Preprint) Scrapping The Web For Early Wildfire Detection: A New Annotated Dataset of Images and Videos of Smoke Plumes In-the-wild

PyroNear-2024 is a new dataset for smoke plume detection, featuring 150,000 annotations on 50,000 images and videos of 400 wildfires from France, Spain, and the US. This dataset surpasses existing ones in size and diversity, and experiments show it's a challenging but valuable resource for training models, with potential for improved performance when combined with other datasets.

Oct 1, 2024

Findings of WASSA 2024 Shared Task on Empathy and Personality Detection in Interactions

Findings of the shared task on Empathy, Personality, and Emotion Detection from the WASSA workshop @ ACL.

Jul 1, 2024

The Touché23-ValueEval Dataset for Identifying Human Values behind Arguments

We've created the Touché23-ValueEval dataset, a large collection of over 9,300 arguments annotated with 54 human values, to help develop methods for analyzing the values that make arguments persuasive. Our dataset, which more than doubles the size of its predecessor, has already been used to achieve state-of-the-art results in identifying human values behind arguments, and has shown promising performance with large language models like Llama-2-7B.

May 1, 2024

Are Text Classifiers Xenophobic? A Country-Oriented Bias Detection Method with Least Confounding Variables

Current bias detection methods in machine learning have their own biases and limitations, so we've developed a new approach that directly tests fine-tuned classifiers on real-world data to identify potential biases. Our method, which creates counterfactual examples by modifying named entities in target data, revealed significant biases in multilingual models, including sentiment analysis and stance recognition models, and shed light on the complex interactions between names, languages, and model predictions. Current models tend to prefer names from countries where the language of the sentence is spoken, which motivates the name AI Xenophobia.

May 1, 2024

Targeted Image Data Augmentation Increases Basic Skills Captioning Robustness

TIDA is a new data augmentation method that uses text-to-image generation to create more diverse and realistic training data, helping AI models better understand complex correlations and improve their performance on tasks like gender recognition.

Dec 1, 2023

Deep Natural Language Feature Learning for Interpretable Prediction

A technique for explainability in LLMs that breaks a complex task into subtasks formulated as binary questions in natural language, and represents any sample using the outputs of a binary classifier on these subtasks.

Dec 1, 2023

Findings of WASSA 2023 Shared Task on Empathy, Emotion and Personality Detection in Conversation and Reactions to News Articles

Findings of the shared task on Empathy, Personality, and Emotion Detection from the WASSA workshop @ ACL.

Jul 1, 2023

CoFE: A New Dataset of Intra-Multilingual Multi-target Stance Classification from an Online European Participatory Democracy Platform

A new dataset for stance recognition built from the participatory democracy platform of the Conference on the Future of Europe. The dataset contains highly multilingual interactions: because the platform used machine translation, users interact in their (different) native languages within the same thread.

Nov 1, 2022

Opinions in Interactions : New Annotations of the SEMAINE Database

We've added new opinion annotations to the SEMAINE dataset, which captures dyadic interactions between humans and virtual agents, resulting in a rich dataset with over 73,000 words and 6 hours of conversation. Our proposed baseline model, built on RoBERTa embeddings, achieves an F1-score of 0.72 on these annotations, which makes the dataset a valuable resource for opinion detection in human-computer interactions.

Jun 1, 2022