
IR-for-Good Paper Session II


Session Information

  • Measuring Political Stance and Consistency in Large Language Models
  • Judiciously Reducing Sub-group Comparisons for Learning Intersectional Fair Representations
  • Modeling Behavioral Patterns in News Recommendations Using Fuzzy Neural Networks
  • Does Reasoning Make Search More Fair? Comparing Fairness in Reasoning and Non-Reasoning Rerankers
Mar 30, 2026, 14:30 - 15:30 (Europe/Amsterdam)
Venue: Centrale (Plenary Room)

Sub Sessions

Measuring Political Stance and Consistency in Large Language Models

IR for good · 02:30 PM - 03:30 PM (Europe/Amsterdam)
With the incredible advancements in Large Language Models (LLMs), many people have started using them to satisfy their information needs. However, utilizing LLMs might be problematic for political issues where disagreement is common and model outputs may reflect training-data biases or deliberate alignment choices. To better characterize such behavior, we assess the stances of nine LLMs on 24 politically sensitive issues using five prompting techniques. We find that models often adopt opposing stances on several issues; some positions are malleable under prompting, while others remain stable. Among the models examined, Grok-3-mini is the most persistent, whereas Mistral-7B is the least. For issues involving countries with different languages, models tend to support the side whose language is used in the prompt. Notably, no prompting technique alters model stances on the Qatar blockade or the oppression of Palestinians. We hope these findings raise users' awareness when they seek political guidance from LLMs and encourage developers to address these concerns.
Presenters
MK
Mucahid Kutlu
TOBB University
Co-Authors
SK
Saban Kardas
Qatar University
SA
Salah Feras Alali
Qatar University
MM
Mohammad Nashat Maasfeh
Qatar University

Judiciously Reducing Sub-group Comparisons for Learning Intersectional Fair Representations

IR for good · 02:30 PM - 03:30 PM (Europe/Amsterdam)
Ensuring fairness in ranking systems is critical to avoid discriminatory outcomes towards minority groups in high-stakes domains such as recruitment. Most fairness interventions only address fairness for one or more binary groups without accounting for intersectional fairness. We study the problem of achieving intersectional fairness in ranking systems, where individuals may face compounded disadvantages. We adapt and extend existing pre-processing fairness intervention methods to optimize for intersectional group fairness. Importantly, as the number of intersectional sub-groups grows exponentially with the number of attributes, optimization becomes computationally expensive and possibly infeasible. To address this challenge, we propose to reduce the number of sub-group comparisons when optimizing for intersectional fairness, based on the highest disparities between sub-groups. Our results show that limiting sub-group comparisons achieves comparable or better intersectional fairness. We validate this on three real-world datasets and a simulated setup designed to test robustness to intersectional fairness challenges.
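The combinatorial growth the abstract describes can be made concrete with a small sketch (an illustration, not the paper's code): the sub-group count is the product of the per-attribute cardinalities, and a naive optimizer that compares every pair of sub-groups faces a quadratic number of comparisons on top of that.

```python
# Illustration of why intersectional sub-group comparisons explode:
# sub-groups = product of attribute cardinalities; pairwise
# comparisons = "sub-groups choose 2".
from math import comb, prod

def n_subgroups(attribute_cardinalities):
    """Number of intersectional sub-groups: product of per-attribute values."""
    return prod(attribute_cardinalities)

def n_pairwise_comparisons(attribute_cardinalities):
    """Number of unordered sub-group pairs a naive optimizer must compare."""
    return comb(n_subgroups(attribute_cardinalities), 2)

# Three binary attributes (e.g., hypothetical gender/age/location splits):
print(n_subgroups([2, 2, 2]))            # 8 sub-groups
print(n_pairwise_comparisons([2, 2, 2])) # 28 comparisons

# Six binary attributes:
print(n_subgroups([2] * 6))              # 64 sub-groups
print(n_pairwise_comparisons([2] * 6))   # 2016 comparisons
```

This is why the paper prunes comparisons down to the pairs with the highest observed disparities rather than optimizing over all of them.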
Presenters
CR
Clara Rus
PhD, University Of Amsterdam
Co-Authors
AY
Andrew Yates
Johns Hopkins University, HLTCOE
MD
Maarten De Rijke
Distinguished University Professor, University Of Amsterdam

Modeling Behavioral Patterns in News Recommendations Using Fuzzy Neural Networks

IR for good · 02:30 PM - 03:30 PM (Europe/Amsterdam)
News recommender systems are increasingly driven by black-box models, offering little transparency for editorial decision-making. In this work, we introduce a transparent recommender system that uses fuzzy neural networks to learn human-readable rules from behavioral data for predicting article clicks. By extracting the rules at configurable thresholds, we can control rule complexity and thus the level of interpretability. We evaluate our approach on two publicly available news datasets (i.e., MIND and EB-NeRD) and show that it predicts click behavior accurately relative to several established baselines, while learning human-readable rules. Furthermore, we show that the learned rules reveal news consumption patterns, enabling editors to align content curation goals with target audience behavior.
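A generic fuzzy-rule sketch (standard Mamdani-style inference, not the authors' network; the features, membership shapes, and weights below are hypothetical) shows the two ideas in the abstract: rules fire via the minimum of their antecedent memberships, and a weight threshold prunes weak rules, trading accuracy for a smaller, more readable rule set.

```python
# Generic fuzzy-rule illustration: a rule like
# "IF recency IS high AND popularity IS high THEN click IS likely"
# fires with the min of its antecedent memberships; only rules whose
# weight exceeds a threshold are kept in the extracted rule set.

def triangular(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

# Hypothetical learned rules: (firing function over two features, weight)
rules = [
    (lambda r, p: min(triangular(r, 0.5, 1.0, 1.5), triangular(p, 0.5, 1.0, 1.5)), 0.9),
    (lambda r, p: min(triangular(r, -0.5, 0.0, 0.5), triangular(p, 0.5, 1.0, 1.5)), 0.4),
    (lambda r, p: min(triangular(r, -0.5, 0.0, 0.5), triangular(p, -0.5, 0.0, 0.5)), 0.05),
]

def predict(recency, popularity, weight_threshold=0.1):
    """Weighted average of firing strengths over rules above the threshold."""
    kept = [(fire, w) for fire, w in rules if w >= weight_threshold]
    num = sum(w * fire(recency, popularity) for fire, w in kept)
    den = sum(w for _, w in kept) or 1.0
    return num / den

print(round(predict(0.9, 0.9), 3))  # high recency & popularity -> high score
```

Raising `weight_threshold` discards more rules, which is the complexity/interpretability dial the abstract refers to.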
Presenters
KI
Kevin Innerebner
PhD Student, Graz University Of Technology
Co-Authors
SB
Stephan Bartl
Graz University Of Technology
MR
Markus Reiter-Haas
Graz University Of Technology
EL
Elisabeth Lex
Graz University Of Technology

Does Reasoning Make Search More Fair? Comparing Fairness in Reasoning and Non-Reasoning Rerankers

IR for good · 02:30 PM - 03:30 PM (Europe/Amsterdam)
While reasoning rerankers, such as Rank1, have demonstrated strong abilities in improving ranking relevance, it is unclear how they perform on other retrieval qualities such as fairness. We conduct the first systematic comparison of fairness between reasoning and non-reasoning rerankers. Using the TREC 2022 Fair Ranking Track dataset, we evaluate six reranking models across multiple retrieval settings and demographic attributes. Our findings demonstrate that reasoning neither improves nor harms fairness compared to non-reasoning approaches. Our fairness metric, Attention-Weighted Rank Fairness (AWRF), remained stable (0.33-0.35) across all models, even as relevance varied substantially (nDCG 0.247-1.000). Demographic breakdown analysis revealed fairness gaps for geographic attributes regardless of model architecture. These results indicate that future work in specializing reasoning models to be aware of fairness attributes could lead to improvements, as current implementations preserve the fairness characteristics of their input ranking.
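The idea behind attention-weighted fairness metrics like AWRF can be sketched as follows (an illustration, not the TREC definition: the log-discount attention model and the L1 distance to the target distribution are assumptions; other attention curves and divergences are used in practice). Each rank gets an attention weight, exposure is aggregated per group, and the realized exposure distribution is compared to a target one.

```python
# Sketch of an attention-weighted group-exposure comparison:
# weight each rank, sum weights per group, compare to a target.
import math

def attention_weighted_exposure(groups, n_groups):
    """Normalized attention weight per group, using a DCG-style
    log discount at each rank (the discount choice is an assumption)."""
    exposure = [0.0] * n_groups
    for rank, g in enumerate(groups, start=1):
        exposure[g] += 1.0 / math.log2(rank + 1)
    total = sum(exposure)
    return [e / total for e in exposure]

def fairness_gap(groups, target, n_groups):
    """L1 distance between realized and target group exposure
    (lower = fairer; AWRF-style metrics make a comparison like this)."""
    realized = attention_weighted_exposure(groups, n_groups)
    return sum(abs(r - t) for r, t in zip(realized, target))

# Group label of the document at each rank (0 or 1), target 50/50:
balanced = [0, 1, 0, 1, 0, 1]
skewed   = [0, 0, 0, 1, 1, 1]
print(fairness_gap(balanced, [0.5, 0.5], 2))  # alternating ranking, small gap
print(fairness_gap(skewed, [0.5, 0.5], 2))    # one group hogs top ranks, larger gap
```

Because the metric depends only on which groups occupy which ranks, a reranker that preserves the input ordering's group layout also preserves its fairness score, consistent with the paper's finding of stable AWRF across models.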
Presenters
SS
Saron Samuel
Johns Hopkins University
Co-Authors
BD
Benjamin Van Durme
Johns Hopkins University
EY
Eugene Yang
Research Scientist, Human Language Technology Center Of Excellence, Johns Hopkins University


