IR-for-Good Paper Session III
2026/03/31, 10:30 - 12:30 (Europe/Amsterdam)
Location: Chemie
Papers:
- AgriIR: A Scalable Framework for Domain-Specific Knowledge Retrieval
- Extending Logic Tensor Networks to Implicit Feedback for Representation-Aware Music Recommendation
- Cultural Analytics for Good: Building Inclusive Evaluation Frameworks for Historical IR
- One LLM to Train Them All: A Multi-Task Learning Framework for Fact-Checking
- How Information Retrieval Systems Construct and Amplify Immigration Narratives
- Towards Reliable Machine Translation: Scaling LLMs for Critical Error Detection and Safety
- Integrating AI and IR paradigms for sustainable and trustworthy accurate access to large scale Biomedical information
- Debiasing CLIP with Neural Interventions
ECIR 2026. Contact: conference-secretariat@blueboxevents.nl
AgriIR: A Scalable Framework for Domain-Specific Knowledge Retrieval
IR for good | 10:30 AM - 12:30 PM (Europe/Amsterdam) | 2026/03/31 08:30:00 UTC - 2026/03/31 10:30:00 UTC
This paper introduces AgriIR, a configurable retrieval-augmented generation (RAG) framework designed to deliver grounded, domain-specific answers while maintaining flexibility and low computational cost. Instead of relying on large, monolithic models, AgriIR decomposes the information access process into declarative, modular stages: query refinement, sub-query planning, retrieval, synthesis, and evaluation. This design allows practitioners to adapt the framework to new knowledge verticals without modifying the architecture. Our reference implementation targets Indian agricultural information access, integrating 1B-parameter language models with adaptive retrievers and domain-aware agent catalogues. The system enforces deterministic citation, integrates telemetry for transparency, and includes automated deployment assets to ensure auditable, reproducible operation. By emphasizing architectural design and modular control, AgriIR demonstrates that well-engineered pipelines can achieve domain-accurate, trustworthy retrieval even under constrained resources. We argue that this approach exemplifies "AI for Agriculture" by promoting accessibility, sustainability, and accountability in retrieval-augmented generation systems.
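The staged pipeline described in the abstract can be illustrated with a small, self-contained sketch. The stage functions, toy corpus, and document ids below are illustrative assumptions, not the authors' implementation; the point is only that each stage is a swappable, declaratively ordered module and that answers stay citable.

```python
# A minimal sketch of a modular RAG-style pipeline in the spirit of
# AgriIR. Stage names and the toy corpus are illustrative assumptions.

def refine_query(query):
    # Normalise the raw user query (lowercase, strip punctuation).
    return query.lower().strip(" ?")

def plan_subqueries(query):
    # Split a compound query into independent sub-queries.
    return [q.strip() for q in query.split(" and ")]

def retrieve(subquery, corpus):
    # Toy retriever: return documents sharing any query term,
    # keeping document ids so the synthesis stage can cite them.
    terms = set(subquery.split())
    return [(doc_id, text) for doc_id, text in corpus.items()
            if terms & set(text.lower().split())]

def synthesize(hits):
    # Deterministic citation: every answer fragment carries its source id.
    return " ".join(f"{text} [{doc_id}]" for doc_id, text in hits)

def run_pipeline(query, corpus):
    # Declarative stage order: refine -> plan -> retrieve -> synthesize.
    refined = refine_query(query)
    answers = [synthesize(retrieve(sq, corpus))
               for sq in plan_subqueries(refined)]
    return " | ".join(a for a in answers if a)

corpus = {"d1": "wheat sowing starts in november",
          "d2": "rice needs standing water"}
print(run_pipeline("When does wheat sowing start?", corpus))
```

Swapping the toy retriever for a dense retriever, or the string-joining synthesizer for a small LLM, changes no other stage, which is the adaptability the abstract emphasizes.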
Dwaipayan Roy Assistant Professor, Indian Institute Of Science Education And Research Kolkata
Extending Logic Tensor Networks to Implicit Feedback for Representation-Aware Music Recommendation
Music recommender systems shape how people discover music, yet persistent concerns have been raised regarding fairness and representation. Achieving fairness in recommender systems is challenging because conventional methods rely on rigid quantitative criteria, making it difficult to express nuanced or socially informed fairness goals. We explore the use of Logic Tensor Networks (LTNs) to incorporate nuanced fairness constraints into music recommender systems. LTNs enable the formulation of soft, differentiable constraints in a specific first-order logic, allowing fairness to be expressed through expert knowledge or data-driven insights. We make two main contributions. First, we extend an existing LTN-based recommender framework to the implicit-feedback setting. Second, we propose a procedure leveraging the extended framework to integrate data-informed fairness regularization into matrix factorization (MF)-based music recommendation. We demonstrate the effectiveness of the proposed procedure with a case study on country-level representation bias in music recommendation, where content from hegemonic markets (e.g., the U.S.) is often overrepresented while local music is underexposed. Our analysis reveals that this imbalance disproportionately affects users with high local mainstreaminess (those who prefer music popular within their own country) and low global mainstreaminess (those who prefer less globally popular music). Using LTNs, we design targeted, data-informed fairness constraints and show that our approach allows us to mitigate these disparities while maintaining competitive recommendation quality.
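The core LTN idea in the abstract, turning a logical fairness rule into a differentiable loss term, can be sketched in a few lines. The predicates, the Reichenbach fuzzy implication, and the mean aggregator below are illustrative assumptions about how such a constraint might be grounded, not the paper's actual formulation:

```python
# A minimal sketch of an LTN-style soft fairness constraint.
# Truth values live in [0, 1], so logical rules become smooth
# functions that can regularize a recommender's training loss.

def implies(a, b):
    # Reichenbach fuzzy implication: a -> b == 1 - a + a*b.
    return 1.0 - a + a * b

def sat_level(truth_values):
    # Aggregate per-example truth values into one satisfaction
    # level (mean aggregator, a common LTN choice).
    return sum(truth_values) / len(truth_values)

def fairness_loss(items):
    # Constraint: "if an item is local, its exposure should be high".
    # items: list of (is_local in [0, 1], exposure in [0, 1]).
    truths = [implies(local, exposure) for local, exposure in items]
    return 1.0 - sat_level(truths)  # loss = 1 - satisfaction

# Local items with low exposure violate the constraint strongly...
biased = [(1.0, 0.1), (1.0, 0.2), (0.0, 0.9)]
# ...while well-exposed local items satisfy it.
fair = [(1.0, 0.9), (1.0, 0.8), (0.0, 0.9)]
print(fairness_loss(biased) > fairness_loss(fair))  # True
```

In an MF-based recommender this loss would be added, with a weight, to the usual ranking objective, which is what makes the constraint "soft" rather than a hard filter.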
Hannah Eckert PhD Student, Johannes Kepler University Linz
Cultural Analytics for Good: Building Inclusive Evaluation Frameworks for Historical IR
This work bridges information retrieval and cultural analytics to support equitable access to historical knowledge. Using the British Library's BL19 digital collection (more than 35,000 works from 1700-1899), we construct a benchmark for studying language change and retrieval in 19th-century fiction and non-fiction. Our approach combines expert-driven query design, paragraph-level relevance annotation, and Large Language Model (LLM) assistance to create a scalable evaluation framework grounded in human expertise. Central to our investigation is knowledge transfer from fiction to non-fiction, examining how narrative understanding and semantic richness in fiction can enhance retrieval performance for scholarly and factual materials. This interdisciplinary framework not only improves retrieval accuracy but also fosters interpretability, transparency, and cultural inclusivity in digital archives. Our work provides both practical evaluation resources and a methodological paradigm for developing retrieval systems that support richer, historically aware engagement with digital archives, ultimately working towards more emancipatory knowledge infrastructures.
Philipp Mayr Team Leader, GESIS Leibniz Institute For The Social Sciences
One LLM to Train Them All: A Multi-Task Learning Framework for Fact-Checking
Large language models (LLMs) are reshaping automated fact-checking (AFC) by enabling unified, end-to-end verification pipelines rather than isolated components. While large proprietary models achieve strong performance, their closed weights, complexity, and high costs limit sustainability. Fine-tuning smaller open-weight models for individual AFC tasks can help, but requires multiple specialized models, resulting in high costs. We propose multi-task learning (MTL) as a more efficient alternative that trains a single model to perform claim detection, evidence ranking, and stance detection jointly. Using small decoder-only LLMs (e.g., Qwen3-4b), we explore three MTL strategies: classification heads, causal language modeling heads, and instruction tuning, and evaluate them across model sizes, task orders, and standard non-LLM baselines. While multi-task models do not universally surpass single-task baselines, they yield substantial improvements, achieving up to 44%, 54%, and 31% relative gains for claim detection, evidence re-ranking, and stance detection, respectively, over zero-/few-shot settings. Finally, we also provide practical, empirically grounded guidelines to help practitioners apply MTL with LLMs for automated fact-checking.
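The instruction-tuning MTL strategy mentioned in the abstract amounts to serializing all three tasks into one training stream for a single decoder-only model. The template wording, task tags, and labels below are illustrative assumptions, not the paper's actual prompts:

```python
# A minimal sketch of multi-task instruction tuning data for
# fact-checking: three tasks, one prompt-completion stream.

TEMPLATES = {
    "claim_detection": (
        "### Task: claim detection\n"
        "Sentence: {sentence}\nIs this a check-worthy claim? Answer:"),
    "evidence_ranking": (
        "### Task: evidence ranking\n"
        "Claim: {claim}\nEvidence: {evidence}\n"
        "Is this evidence relevant? Answer:"),
    "stance_detection": (
        "### Task: stance detection\n"
        "Claim: {claim}\nEvidence: {evidence}\n"
        "Stance (supports/refutes/neutral):"),
}

def to_example(task, fields, label):
    # One training example = task-tagged prompt + gold completion.
    # Interleaving examples from all tasks in one stream is what
    # makes the fine-tuning multi-task.
    return {"prompt": TEMPLATES[task].format(**fields),
            "completion": " " + label}

batch = [
    to_example("claim_detection",
               {"sentence": "The earth is flat."}, "yes"),
    to_example("stance_detection",
               {"claim": "The earth is flat.",
                "evidence": "Satellite imagery shows a sphere."},
               "refutes"),
]
print(len(batch), repr(batch[0]["completion"]))
```

The classification-head and causal-LM-head strategies the abstract lists would reuse the same mixed batches but attach task-specific or shared output layers instead of free-text completions.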
Vinay Setty Associate Professor, University Of Stavanger
How Information Retrieval Systems Construct and Amplify Immigration Narratives
Information retrieval systems play a central role in how people access and understand information about complex social issues, including immigration. Yet little is known about how the datasets that underpin these systems represent migrants or structure public narratives about migration. In this paper, we investigate how immigration is framed within a widely used IR benchmark and how ranking models shape the visibility of those frames. Using MS MARCO as our data source, we curate immigration-related queries and annotate retrieved passages using a migration-specific framing taxonomy grounded in social-science research. Our goal is to identify which narratives dominate and to measure how different retrieval models influence their exposure. We find that legality and security frames are far more common than humanitarian or inclusive ones, and that neural reranking amplifies exclusionary portrayals compared to sparse retrieval.
Towards Reliable Machine Translation: Scaling LLMs for Critical Error Detection and Safety
Machine Translation (MT) plays a pivotal role in cross-lingual information access, public policy communication, and equitable knowledge dissemination. However, critical meaning errors, such as factual distortions, intent reversals, or biased translations, can undermine the reliability, fairness, and safety of multilingual systems. In this work, we explore the capacity of instruction-tuned Large Language Models (LLMs) to detect such critical errors, evaluating models across a range of scales (e.g., GPT-4o-mini, LLaMA 3.1 8B, LLaMA 3.3 70B, and GPT-OSS 20B/120B) using WMT-21, WMT-22, and a curated SynCED benchmark. Our findings show that model scaling and adaptation strategies (zero-shot, few-shot, fine-tuning) yield consistent improvements, outperforming encoder-only baselines like XLM-R and ModernBERT. We argue that improving critical error detection in MT contributes to safer, more trustworthy, and socially accountable information systems by reducing the risk of disinformation, miscommunication, and linguistic harm, especially in high-stakes or underrepresented contexts. This work positions error detection not merely as a technical challenge, but as a necessary safeguard in the pursuit of just and responsible multilingual AI.
Integrating AI and IR paradigms for sustainable and trustworthy accurate access to large-scale Biomedical information
In high-stakes domains such as health and biology, information retrieval systems must ensure accuracy while also supporting equitable access and protecting sensitive data. However, many state-of-the-art biomedical IR solutions rely on proprietary cloud infrastructures, raising concerns over cost, reproducibility, and patient privacy. We present a fully open-source retrieval-augmented question answering framework that supports accurate QA over the entire PubMed collection (over 38M documents) using modest, local, consumer-grade hardware. Inspired by BioASQ, our system combines sparse and dense retrieval with a lightweight local LLM for evidence-grounded biomedical QA. Experiments show that strong retrieval quality and real-time performance are achievable without reliance on commercial APIs or large GPU clusters. By reducing infrastructure barriers around on-premises data, this work provides a concrete path toward democratizing trustworthy biomedical IR for hospitals, universities, and healthcare organizations worldwide.
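Combining sparse and dense rankings, as the abstract describes, requires a fusion step. Reciprocal rank fusion (RRF) is one common, training-free choice; the abstract does not specify the fusion method, so both RRF and the document ids below are illustrative assumptions:

```python
# A minimal sketch of fusing a sparse (e.g. BM25) ranking with a
# dense (embedding) ranking via reciprocal rank fusion (RRF).

def rrf(rankings, k=60):
    # rankings: list of ranked doc-id lists.
    # Each list contributes 1 / (k + rank) to a document's score,
    # so documents ranked well by several retrievers rise to the top.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

sparse = ["pmid_3", "pmid_1", "pmid_7"]  # hypothetical BM25 ranking
dense = ["pmid_1", "pmid_9", "pmid_3"]   # hypothetical dense ranking
print(rrf([sparse, dense]))
```

Because RRF uses only ranks, not raw scores, it needs no score normalization between the sparse and dense retrievers, which keeps the fusion cheap enough for consumer-grade hardware.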
Debiasing CLIP with Neural Interventions
This paper presents an inference-time method to mitigate demographic bias in CLIP-like vision-language models through targeted neural interventions in their internal attention mechanisms. We first identify "expert" attention heads that encode demographic information by systematically analyzing CLIP's internal representations in response to labeled inputs. At inference, we intervene on these heads, replacing their activations with demographic prototypes or neutralizing them (zero ablation). We chose to intervene specifically at the CLS token, as it aggregates information globally across image patches and is directly responsible for the final image embedding. Our results across multiple evaluation frameworks show that these targeted interventions can significantly reduce both gender and ethnicity biases in cross-modal retrieval and zero-shot classification, without compromising model performance.
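The zero-ablation intervention at the CLS token can be sketched without any model machinery. The dict-based activation layout, head indices, and vector sizes below are illustrative assumptions, not CLIP's actual configuration:

```python
# A minimal sketch of zero-ablating selected "expert" attention
# heads at the CLS position only, leaving patch tokens untouched.

def zero_ablate_cls(head_outputs, expert_heads, cls_index=0):
    # head_outputs: {head_id: [activation vector per token position]}.
    # Only the CLS position of the chosen heads is zeroed, since the
    # CLS token determines the final image embedding.
    for h in expert_heads:
        vec = head_outputs[h][cls_index]
        head_outputs[h][cls_index] = [0.0] * len(vec)
    return head_outputs

acts = {0: [[0.5, -0.2], [0.1, 0.3]],  # head 0: CLS + one patch token
        1: [[0.9, 0.4], [0.0, 0.7]]}   # head 1: hypothetical expert head
zero_ablate_cls(acts, expert_heads=[1])
print(acts[1][0], acts[0][0])
```

The prototype-replacement variant the abstract mentions would assign a precomputed demographic prototype vector at the CLS position instead of zeros; in a real model this edit would typically be applied via a forward hook on the attention layer.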