PyroNear-2024 is a new dataset for smoke plume detection, featuring 150,000 annotations on 50,000 images and videos of 400 wildfires from France, Spain, and the US. This dataset surpasses existing ones in size and diversity, and experiments show it's a challenging but valuable resource for training models, with potential for improved performance when combined with other datasets.
Oct 1, 2024
We've created the Touché23-ValueEval dataset, a large collection of over 9,300 arguments annotated with 54 human values, to help develop methods for analyzing the values that make arguments persuasive. Our dataset, which more than doubles the size of its predecessor, has already been used to achieve state-of-the-art results in identifying human values behind arguments, and has shown promising performance with large language models like Llama-2-7B.
May 1, 2024
A new dataset for stance recognition, built from the Participatory Democracy platform of the Conference for the Future of Europe. Because the platform used machine translation, the interactions are highly multilingual: users write in their own (different) native languages within the same thread.
Nov 1, 2022
We've added new opinion annotations to the SEMAINE dataset, which captures dyadic interactions between humans and virtual agents, resulting in a rich dataset with over 73,000 words and 6 hours of conversation. Our proposed baseline model, which uses RoBERTa embeddings, achieves promising results with an F1-score of 0.72, making the dataset a valuable resource for opinion detection in human-computer interactions.
Jun 1, 2022
A new dataset of 2,600 online debate comments has been created to improve stance classification models. Fine-tuning and semi-supervised learning can boost accuracy by 3.4% over a baseline model.
Jun 1, 2022