IR-for-Good Paper Session III
2026/03/31, 10:30 - 12:30 (Europe/Amsterdam)
Location: Chemie
Papers:
- AgriIR: A Scalable Framework for Domain-Specific Knowledge Retrieval
- Extending Logic Tensor Networks to Implicit Feedback for Representation-Aware Music Recommendation
- Cultural Analytics for Good: Building Inclusive Evaluation Frameworks for Historical IR
- One LLM to Train Them All: A Multi-Task Learning Framework for Fact-Checking
- How Information Retrieval Systems Construct and Amplify Immigration Narratives
- Towards Reliable Machine Translation: Scaling LLMs for Critical Error Detection and Safety
- Integrating AI and IR paradigms for sustainable and trustworthy accurate access to large scale Biomedical information
- Debiasing CLIP with Neural Interventions
ECIR 2026. Contact: conference-secretariat@blueboxevents.nl
AgriIR: A Scalable Framework for Domain-Specific Knowledge Retrieval
IR for good | 10:30 AM - 12:30 PM (Europe/Amsterdam) | 2026/03/31 08:30:00 UTC - 2026/03/31 10:30:00 UTC
This paper introduces AgriIR, a configurable retrieval-augmented generation (RAG) framework designed to deliver grounded, domain-specific answers while maintaining flexibility and low computational cost. Instead of relying on large, monolithic models, AgriIR decomposes the information access process into declarative, modular stages: query refinement, sub-query planning, retrieval, synthesis, and evaluation. This design allows practitioners to adapt the framework to new knowledge verticals without modifying the architecture. Our reference implementation targets Indian agricultural information access, integrating 1B-parameter language models with adaptive retrievers and domain-aware agent catalogues. The system enforces deterministic citation, integrates telemetry for transparency, and includes automated deployment assets to ensure auditable, reproducible operation. By emphasizing architectural design and modular control, AgriIR demonstrates that well-engineered pipelines can achieve domain-accurate, trustworthy retrieval even under constrained resources. We argue that this approach exemplifies "AI for Agriculture" by promoting accessibility, sustainability, and accountability in retrieval-augmented generation systems.
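The staged pipeline described in the abstract can be illustrated with a small, self-contained sketch. The stage functions, toy corpus, and document ids below are illustrative assumptions, not the authors' implementation; the point is only that each stage is a swappable, declaratively ordered module and that answers stay citable.

```python
# A minimal sketch of a modular RAG-style pipeline in the spirit of
# AgriIR. Stage names and the toy corpus are illustrative assumptions.

def refine_query(query):
    # Normalise the raw user query (lowercase, strip punctuation).
    return query.lower().strip(" ?")

def plan_subqueries(query):
    # Split a compound query into independent sub-queries.
    return [q.strip() for q in query.split(" and ")]

def retrieve(subquery, corpus):
    # Toy retriever: return documents sharing any query term,
    # keeping document ids so the synthesis stage can cite them.
    terms = set(subquery.split())
    return [(doc_id, text) for doc_id, text in corpus.items()
            if terms & set(text.lower().split())]

def synthesize(hits):
    # Deterministic citation: every answer fragment carries its source id.
    return " ".join(f"{text} [{doc_id}]" for doc_id, text in hits)

def run_pipeline(query, corpus):
    # Declarative stage order: refine -> plan -> retrieve -> synthesize.
    refined = refine_query(query)
    answers = [synthesize(retrieve(sq, corpus))
               for sq in plan_subqueries(refined)]
    return " | ".join(a for a in answers if a)

corpus = {"d1": "wheat sowing starts in november",
          "d2": "rice needs standing water"}
print(run_pipeline("When does wheat sowing start?", corpus))
```

Swapping the toy retriever for a dense retriever, or the string-joining synthesizer for a small LLM, changes no other stage, which is the adaptability the abstract emphasizes.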
Dwaipayan Roy Assistant Professor, Indian Institute Of Science Education And Research Kolkata
Extending Logic Tensor Networks to Implicit Feedback for Representation-Aware Music Recommendation
Music recommender systems shape how people discover music, yet persistent concerns have been raised regarding fairness and representation. Achieving fairness in recommender systems is challenging because conventional methods rely on rigid quantitative criteria, making it difficult to express nuanced or socially informed fairness goals. We explore the use of Logic Tensor Networks (LTNs) to incorporate nuanced fairness constraints into music recommender systems. LTNs enable the formulation of soft, differentiable constraints in a specific first-order logic, allowing fairness to be expressed through expert knowledge or data-driven insights. We make two main contributions. First, we extend an existing LTN-based recommender framework to the implicit-feedback setting. Second, we propose a procedure leveraging the extended framework to integrate data-informed fairness regularization into matrix factorization (MF)-based music recommendation. We demonstrate the effectiveness of the proposed procedure with a case study on country-level representation bias in music recommendation, where content from hegemonic markets (e.g., the U.S.) is often overrepresented while local music is underexposed. Our analysis reveals that this imbalance disproportionately affects users with high local mainstreaminess (those who prefer music popular within their own country) and low global mainstreaminess (those who prefer less globally popular music). Using LTNs, we design targeted, data-informed fairness constraints and show that our approach allows us to mitigate these disparities while maintaining competitive recommendation quality.
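The core LTN idea in the abstract, turning a logical fairness rule into a differentiable loss term, can be sketched in a few lines. The predicates, the Reichenbach fuzzy implication, and the mean aggregator below are illustrative assumptions about how such a constraint might be grounded, not the paper's actual formulation:

```python
# A minimal sketch of an LTN-style soft fairness constraint.
# Truth values live in [0, 1], so logical rules become smooth
# functions that can regularize a recommender's training loss.

def implies(a, b):
    # Reichenbach fuzzy implication: a -> b == 1 - a + a*b.
    return 1.0 - a + a * b

def sat_level(truth_values):
    # Aggregate per-example truth values into one satisfaction
    # level (mean aggregator, a common LTN choice).
    return sum(truth_values) / len(truth_values)

def fairness_loss(items):
    # Constraint: "if an item is local, its exposure should be high".
    # items: list of (is_local in [0, 1], exposure in [0, 1]).
    truths = [implies(local, exposure) for local, exposure in items]
    return 1.0 - sat_level(truths)  # loss = 1 - satisfaction

# Local items with low exposure violate the constraint strongly...
biased = [(1.0, 0.1), (1.0, 0.2), (0.0, 0.9)]
# ...while well-exposed local items satisfy it.
fair = [(1.0, 0.9), (1.0, 0.8), (0.0, 0.9)]
print(fairness_loss(biased) > fairness_loss(fair))  # True
```

In an MF-based recommender this loss would be added, with a weight, to the usual ranking objective, which is what makes the constraint "soft" rather than a hard filter.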
Hannah Eckert PhD Student, Johannes Kepler University Linz
Cultural Analytics for Good: Building Inclusive Evaluation Frameworks for Historical IR
This work bridges information retrieval and cultural analytics to support equitable access to historical knowledge. Using the British Library's BL19 digital collection (more than 35,000 works from 1700-1899), we construct a benchmark for studying language change and retrieval in 19th-century fiction and non-fiction. Our approach combines expert-driven query design, paragraph-level relevance annotation, and Large Language Model (LLM) assistance to create a scalable evaluation framework grounded in human expertise. Central to our investigation is knowledge transfer from fiction to non-fiction, examining how narrative understanding and semantic richness in fiction can enhance retrieval performance for scholarly and factual materials. This interdisciplinary framework not only improves retrieval accuracy but also fosters interpretability, transparency, and cultural inclusivity in digital archives. Our work provides both practical evaluation resources and a methodological paradigm for developing retrieval systems that support richer, historically aware engagement with digital archives, ultimately working towards more emancipatory knowledge infrastructures.
Philipp Mayr Team Leader, GESIS Leibniz Institute For The Social Sciences
One LLM to Train Them All: A Multi-Task Learning Framework for Fact-Checking
Large language models (LLMs) are reshaping automated fact-checking (AFC) by enabling unified, end-to-end verification pipelines rather than isolated components. While large proprietary models achieve strong performance, their closed weights, complexity, and high costs limit sustainability. Fine-tuning smaller open-weight models for individual AFC tasks can help, but requires multiple specialized models, resulting in high costs. We propose multi-task learning (MTL) as a more efficient alternative that trains a single model to perform claim detection, evidence ranking, and stance detection jointly. Using small decoder-only LLMs (e.g., Qwen3-4b), we explore three MTL strategies: classification heads, causal language modeling heads, and instruction tuning, and evaluate them across model sizes, task orders, and standard non-LLM baselines. While multi-task models do not universally surpass single-task baselines, they yield substantial improvements, achieving up to 44%, 54%, and 31% relative gains for claim detection, evidence re-ranking, and stance detection, respectively, over zero-/few-shot settings. Finally, we also provide practical, empirically grounded guidelines to help practitioners apply MTL with LLMs for automated fact-checking.
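The instruction-tuning MTL strategy mentioned in the abstract amounts to serializing all three tasks into one training stream for a single decoder-only model. The template wording, task tags, and labels below are illustrative assumptions, not the paper's actual prompts:

```python
# A minimal sketch of multi-task instruction tuning data for
# fact-checking: three tasks, one prompt-completion stream.

TEMPLATES = {
    "claim_detection": (
        "### Task: claim detection\n"
        "Sentence: {sentence}\nIs this a check-worthy claim? Answer:"),
    "evidence_ranking": (
        "### Task: evidence ranking\n"
        "Claim: {claim}\nEvidence: {evidence}\n"
        "Is this evidence relevant? Answer:"),
    "stance_detection": (
        "### Task: stance detection\n"
        "Claim: {claim}\nEvidence: {evidence}\n"
        "Stance (supports/refutes/neutral):"),
}

def to_example(task, fields, label):
    # One training example = task-tagged prompt + gold completion.
    # Interleaving examples from all tasks in one stream is what
    # makes the fine-tuning multi-task.
    return {"prompt": TEMPLATES[task].format(**fields),
            "completion": " " + label}

batch = [
    to_example("claim_detection",
               {"sentence": "The earth is flat."}, "yes"),
    to_example("stance_detection",
               {"claim": "The earth is flat.",
                "evidence": "Satellite imagery shows a sphere."},
               "refutes"),
]
print(len(batch), repr(batch[0]["completion"]))
```

The classification-head and causal-LM-head strategies the abstract lists would reuse the same mixed batches but attach task-specific or shared output layers instead of free-text completions.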
Vinay Setty Associate Professor, University Of Stavanger
How Information Retrieval Systems Construct and Amplify Immigration Narratives
Information retrieval systems play a central role in how people access and understand information about complex social issues, including immigration. Yet little is known about how the datasets that underpin these systems represent migrants or structure public narratives about migration. In this paper, we investigate how immigration is framed within a widely used IR benchmark and how ranking models shape the visibility of those frames. Using MS MARCO as our data source, we curate immigration-related queries and annotate retrieved passages using a migration-specific framing taxonomy grounded in social-science research. Our goal is to identify which narratives dominate and to measure how different retrieval models influence their exposure. We find that legality and security frames are far more common than humanitarian or inclusive ones, and that neural reranking amplifies exclusionary portrayals compared to sparse retrieval.
Towards Reliable Machine Translation: Scaling LLMs for Critical Error Detection and Safety
Machine Translation (MT) plays a pivotal role in cross-lingual information access, public policy communication, and equitable knowledge dissemination. However, critical meaning errors, such as factual distortions, intent reversals, or biased translations, can undermine the reliability, fairness, and safety of multilingual systems. In this work, we explore the capacity of instruction-tuned Large Language Models (LLMs) to detect such critical errors, evaluating models across a range of scales (e.g., GPT-4o-mini, LLaMA 3.1 8B, LLaMA 3.3 70B, and GPT-OSS 20B/120B) using WMT-21, WMT-22, and a curated SynCED benchmark. Our findings show that model scaling and adaptation strategies (zero-shot, few-shot, fine-tuning) yield consistent improvements, outperforming encoder-only baselines like XLM-R and ModernBERT. We argue that improving critical error detection in MT contributes to safer, more trustworthy, and socially accountable information systems by reducing the risk of disinformation, miscommunication, and linguistic harm, especially in high-stakes or underrepresented contexts. This work positions error detection not merely as a technical challenge, but as a necessary safeguard in the pursuit of just and responsible multilingual AI.
Integrating AI and IR paradigms for sustainable and trustworthy accurate access to large-scale Biomedical information
In high-stakes domains such as health and biology, information retrieval systems must ensure accuracy while also supporting equitable access and protecting sensitive data. However, many state-of-the-art biomedical IR solutions rely on proprietary cloud infrastructures, raising concerns over cost, reproducibility, and patient privacy. We present a fully open-source retrieval-augmented question answering framework that supports accurate QA over the entire PubMed collection (over 38M documents) using modest, local, consumer-grade hardware. Inspired by BioASQ, our system combines sparse and dense retrieval with a lightweight local LLM for evidence-grounded biomedical QA. Experiments show that strong retrieval quality and real-time performance are achievable without reliance on commercial APIs or large GPU clusters. By reducing infrastructure barriers around on-premises data, this work provides a concrete path toward democratizing trustworthy biomedical IR for hospitals, universities, and healthcare organizations worldwide.
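Combining sparse and dense rankings, as the abstract describes, requires a fusion step. Reciprocal rank fusion (RRF) is one common, training-free choice; the abstract does not specify the fusion method, so both RRF and the document ids below are illustrative assumptions:

```python
# A minimal sketch of fusing a sparse (e.g. BM25) ranking with a
# dense (embedding) ranking via reciprocal rank fusion (RRF).

def rrf(rankings, k=60):
    # rankings: list of ranked doc-id lists.
    # Each list contributes 1 / (k + rank) to a document's score,
    # so documents ranked well by several retrievers rise to the top.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

sparse = ["pmid_3", "pmid_1", "pmid_7"]  # hypothetical BM25 ranking
dense = ["pmid_1", "pmid_9", "pmid_3"]   # hypothetical dense ranking
print(rrf([sparse, dense]))
```

Because RRF uses only ranks, not raw scores, it needs no score normalization between the sparse and dense retrievers, which keeps the fusion cheap enough for consumer-grade hardware.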
Debiasing CLIP with Neural Interventions
This paper presents an inference-time method to mitigate demographic bias in CLIP-like vision-language models through targeted neural interventions in their internal attention mechanisms. We first identify "expert" attention heads that encode demographic information by systematically analyzing CLIP's internal representations in response to labeled inputs. At inference, we intervene on these heads, replacing their activations with demographic prototypes or neutralizing them (zero ablation). We chose to intervene specifically at the CLS token, as it aggregates information globally across image patches and is directly responsible for the final image embedding. Our results across multiple evaluation frameworks show that these targeted interventions can significantly reduce both gender and ethnicity biases in cross-modal retrieval and zero-shot classification, without compromising model performance.
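The zero-ablation intervention at the CLS token can be sketched without any model machinery. The dict-based activation layout, head indices, and vector sizes below are illustrative assumptions, not CLIP's actual configuration:

```python
# A minimal sketch of zero-ablating selected "expert" attention
# heads at the CLS position only, leaving patch tokens untouched.

def zero_ablate_cls(head_outputs, expert_heads, cls_index=0):
    # head_outputs: {head_id: [activation vector per token position]}.
    # Only the CLS position of the chosen heads is zeroed, since the
    # CLS token determines the final image embedding.
    for h in expert_heads:
        vec = head_outputs[h][cls_index]
        head_outputs[h][cls_index] = [0.0] * len(vec)
    return head_outputs

acts = {0: [[0.5, -0.2], [0.1, 0.3]],  # head 0: CLS + one patch token
        1: [[0.9, 0.4], [0.0, 0.7]]}   # head 1: hypothetical expert head
zero_ablate_cls(acts, expert_heads=[1])
print(acts[1][0], acts[0][0])
```

The prototype-replacement variant the abstract mentions would assign a precomputed demographic prototype vector at the CLS position instead of zeros; in a real model this edit would typically be applied via a forward hook on the attention layer.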