IR-for-Good Paper Session III, ECIR 2026
31 March 2026, 10:30–12:30 (Europe/Amsterdam)

Papers:
- AgriIR: A Scalable Framework for Domain-Specific Knowledge Retrieval
- Extending Logic Tensor Networks to Implicit Feedback for Representation-Aware Music Recommendation
- Cultural Analytics for Good: Building Inclusive Evaluation Frameworks for Historical IR
- One LLM to Train Them All: A Multi-Task Learning Framework for Fact-Checking
- How Information Retrieval Systems Construct and Amplify Immigration Narratives
- Towards Reliable Machine Translation: Scaling LLMs for Critical Error Detection and Safety
- Integrating AI and IR paradigms for sustainable and trustworthy accurate access to large scale Biomedical information
- Debiasing CLIP with Neural Interventions
Philipp Mayr, Team Leader, GESIS Leibniz Institute for the Social Sciences
One LLM to Train Them All: A Multi-Task Learning Framework for Fact-Checking
IR for Good · 10:30 AM–12:30 PM (Europe/Amsterdam), 31 March 2026 (08:30–10:30 UTC)
Large language models (LLMs) are reshaping automated fact-checking (AFC) by enabling unified, end-to-end verification pipelines rather than isolated components. While large proprietary models achieve strong performance, their closed weights, complexity, and high costs limit sustainability. Fine-tuning smaller open-weight models for individual AFC tasks can help, but maintaining a separate specialized model per task is itself costly. We propose multi-task learning (MTL) as a more efficient alternative that trains a single model to perform claim detection, evidence ranking, and stance detection jointly. Using small decoder-only LLMs (e.g., Qwen3-4B), we explore three MTL strategies (classification heads, causal language modeling heads, and instruction tuning) and evaluate them across model sizes, task orders, and standard non-LLM baselines. While multi-task models do not universally surpass single-task baselines, they yield substantial improvements over zero-/few-shot settings, achieving up to 44%, 54%, and 31% relative gains for claim detection, evidence re-ranking, and stance detection, respectively. Finally, we provide practical, empirically grounded guidelines to help practitioners apply MTL with LLMs for automated fact-checking.
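The classification-head MTL strategy described above can be sketched as a shared backbone with one lightweight head per AFC task. This is an illustrative toy (the hidden width, label sets, and random weights are assumptions, and a real system would use the pooled hidden states of an actual LLM such as Qwen3-4B), not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 64  # illustrative; real pooled LLM states are much wider

# One shared trunk matrix standing in for the LLM backbone, plus one
# lightweight classification head per AFC task. Dimensions and label
# sets are illustrative, not taken from the paper.
W_trunk = rng.normal(size=(HIDDEN, HIDDEN))
heads = {
    "claim_detection": rng.normal(size=(HIDDEN, 2)),   # check-worthy / not
    "evidence_ranking": rng.normal(size=(HIDDEN, 1)),  # relevance score
    "stance_detection": rng.normal(size=(HIDDEN, 3)),  # support / refute / not enough info
}

def forward(pooled, task):
    """Shared ReLU trunk, then the head selected by the task name."""
    h = np.maximum(pooled @ W_trunk, 0.0)
    return h @ heads[task]

x = rng.normal(size=HIDDEN)  # placeholder for a pooled LLM representation
print(forward(x, "stance_detection").shape)  # (3,)
```

Because all three heads share the trunk, one forward pass plus a task-specific projection serves every task, which is the cost argument the abstract makes for MTL over per-task fine-tunes.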
How Information Retrieval Systems Construct and Amplify Immigration Narratives
Information retrieval systems play a central role in how people access and understand information about complex social issues, including immigration. Yet little is known about how the datasets that underpin these systems represent migrants or structure public narratives about migration. In this paper, we investigate how immigration is framed within a widely used IR benchmark and how ranking models shape the visibility of those frames. Using MS MARCO as our data source, we curate immigration-related queries and annotate retrieved passages using a migration-specific framing taxonomy grounded in social-science research. Our goal is to identify which narratives dominate and to measure how different retrieval models influence their exposure. We find that legality and security frames are far more common than humanitarian or inclusive ones, and that neural reranking amplifies exclusionary portrayals compared to sparse retrieval.
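Measuring how a ranker shifts the visibility of frames can be done with a rank-discounted exposure score, where passages near the top of the list count more. The log discount, cutoff, and frame labels below are illustrative choices, not the paper's exact metric:

```python
import math
from collections import defaultdict

def frame_exposure(ranked_frames, cutoff=10):
    """Rank-discounted exposure per frame for one query's ranked passages.

    ranked_frames: frame labels of retrieved passages, in rank order.
    Uses a DCG-style 1/log2(rank+1) discount (an illustrative choice).
    """
    exposure = defaultdict(float)
    for rank, frame in enumerate(ranked_frames[:cutoff], start=1):
        exposure[frame] += 1.0 / math.log2(rank + 1)
    return dict(exposure)

# Hypothetical frame annotations for one query under two retrievers.
sparse_run = ["legality", "humanitarian", "security", "legality"]
neural_run = ["security", "legality", "legality", "humanitarian"]
print(frame_exposure(sparse_run))
print(frame_exposure(neural_run))
```

Comparing the two exposure dictionaries per query, aggregated over the query set, quantifies the kind of amplification the abstract reports for neural reranking versus sparse retrieval.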
Towards Reliable Machine Translation: Scaling LLMs for Critical Error Detection and Safety
Machine Translation (MT) plays a pivotal role in cross-lingual information access, public policy communication, and equitable knowledge dissemination. However, critical meaning errors, such as factual distortions, intent reversals, or biased translations, can undermine the reliability, fairness, and safety of multilingual systems. In this work, we explore the capacity of instruction-tuned Large Language Models (LLMs) to detect such critical errors, evaluating models across a range of scales (e.g., GPT-4o-mini, LLaMA 3.1 8B, LLaMA 3.3 70B, and GPT-OSS 20B/120B) using WMT-21, WMT-22, and a curated SynCED benchmark. Our findings show that model scaling and adaptation strategies (zero-shot, few-shot, fine-tuning) yield consistent improvements, outperforming encoder-only baselines like XLM-R and ModernBERT. We argue that improving critical error detection in MT contributes to safer, more trustworthy, and socially accountable information systems by reducing the risk of disinformation, miscommunication, and linguistic harm, especially in high-stakes or underrepresented contexts. This work positions error detection not merely as a technical challenge, but as a necessary safeguard in the pursuit of just and responsible multilingual AI.
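The few-shot setting evaluated above amounts to prompting an instruction-tuned LLM with labeled source/translation pairs and parsing a binary verdict. The wording, ERROR/OK label scheme, and example pairs below are illustrative assumptions; the paper's exact prompts and the WMT/SynCED annotation schemes are not reproduced here:

```python
def build_ced_prompt(source, translation, examples):
    """Assemble a few-shot prompt for binary critical-error detection."""
    lines = [
        "Decide whether the translation contains a critical meaning error.",
        "Answer with exactly one word: ERROR or OK.",
        "",
    ]
    for src, tgt, label in examples:  # labeled demonstration pairs
        lines += [f"Source: {src}", f"Translation: {tgt}", f"Answer: {label}", ""]
    lines += [f"Source: {source}", f"Translation: {translation}", "Answer:"]
    return "\n".join(lines)

def parse_label(completion):
    """Map a free-form model completion onto the binary label."""
    return "ERROR" if "ERROR" in completion.upper() else "OK"

# Hypothetical demonstration: an intent-reversal error (denied -> confirmed).
shots = [("He denied the claim.", "Er bestätigte die Behauptung.", "ERROR")]
prompt = build_ced_prompt("Do not take this medicine.",
                          "Nehmen Sie dieses Medikament.", shots)
print(parse_label("error."))  # ERROR
```

The resulting prompt string would be sent to any of the models named in the abstract; only the completion-parsing step differs between zero-shot, few-shot, and fine-tuned variants.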
Integrating AI and IR paradigms for sustainable and trustworthy accurate access to large scale Biomedical information
In high-stakes domains such as health and biology, information retrieval systems must ensure accuracy while also supporting equitable access and protecting sensitive data. However, many state-of-the-art biomedical IR solutions rely on proprietary cloud infrastructures, raising concerns over cost, reproducibility, and patient privacy. We present a fully open-source retrieval-augmented question answering framework that supports accurate QA over the entire PubMed collection (over 38M documents) using modest, local, consumer-grade hardware. Inspired by BioASQ, our system combines sparse and dense retrieval with a lightweight local LLM for evidence-grounded biomedical QA. Experiments show that strong retrieval quality and real-time performance are achievable without reliance on commercial APIs or large GPU clusters. By reducing infrastructure barriers around on-premises data, this work provides a concrete path toward democratizing trustworthy biomedical IR for hospitals, universities, and healthcare organizations worldwide.
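One common way to combine sparse and dense rankings, as the abstract describes, is reciprocal rank fusion (RRF). The sketch below assumes RRF with its usual k=60 constant and made-up PubMed IDs; the system's actual fusion method may differ:

```python
def reciprocal_rank_fusion(runs, k=60):
    """Merge several ranked doc-id lists (e.g. one BM25 run, one dense run).

    Each document scores 1/(k + rank) per run it appears in; k=60 is the
    constant from the original RRF formulation.
    """
    scores = {}
    for run in runs:
        for rank, doc_id in enumerate(run, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical top-3 results from a sparse and a dense retriever.
sparse_run = ["pmid:101", "pmid:202", "pmid:303"]
dense_run = ["pmid:202", "pmid:404", "pmid:101"]
print(reciprocal_rank_fusion([sparse_run, dense_run]))
```

A document ranked well by both retrievers (here pmid:202) rises to the top of the fused list, which is why rank fusion is attractive for hybrid sparse+dense pipelines: it needs no score calibration between the two retrievers, only their ranks.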