IR-for-Good Paper Session II
ECIR 2026 · 2026/03/30, 14:30 - 15:30 (Europe/Amsterdam) · Centrale (Plenary Room)
Contact: n.fontein@tudelft.nl
Measuring Political Stance and Consistency in Large
Language Models
IR for Good · 02:30 PM - 03:30 PM (Europe/Amsterdam) · 2026/03/30 12:30:00 UTC - 2026/03/30 13:30:00 UTC
With the rapid advancement of Large Language Models (LLMs), many people have started using them to satisfy their information needs. However, relying on LLMs can be problematic for political issues, where disagreement is common and model outputs may reflect training-data biases or deliberate alignment choices. To better characterize such behavior, we assess the stances of nine LLMs on 24 politically sensitive issues using five prompting techniques. We find that models often adopt opposing stances on several issues; some positions are malleable under prompting, while others remain stable. Among the models examined, Grok-3-mini is the most persistent, whereas Mistral-7B is the least. For issues involving countries with different languages, models tend to support the side whose language is used in the prompt. Notably, no prompting technique alters model stances on the Qatar blockade or the oppression of Palestinians. We hope these findings raise users' awareness when they seek political guidance from LLMs and encourage developers to address these concerns.
Judiciously Reducing Sub-group Comparisons for Learning
Intersectional Fair Representations
Ensuring fairness in ranking systems is critical to avoid discriminatory outcomes towards minority groups in high-stakes domains such as recruitment. Most fairness interventions address fairness only for one or more binary groups, without accounting for intersectional fairness. We study the problem of achieving intersectional fairness in ranking systems, where individuals may face compounded disadvantages. We adapt and extend existing pre-processing fairness intervention methods to optimize for intersectional group fairness. Importantly, as the number of intersectional sub-groups grows exponentially with the number of attributes, optimization becomes computationally expensive and possibly infeasible. To address this challenge, we propose to reduce the number of sub-group comparisons when optimizing for intersectional fairness, focusing on the highest disparities between sub-groups. Our results show that limiting sub-group comparisons achieves comparable or better intersectional fairness. We validate this on three real-world datasets and a simulated setup designed to test robustness to intersectional fairness challenges.
Maarten de Rijke, Distinguished University Professor, University of Amsterdam
Modeling Behavioral Patterns in News Recommendations Using
Fuzzy Neural Networks
News recommender systems are increasingly driven by black-box models, offering little transparency for editorial decision-making. In this work, we introduce a transparent recommender system that uses fuzzy neural networks to learn human-readable rules from behavioral data for predicting article clicks. By extracting the rules at configurable thresholds, we can control rule complexity and thus the level of interpretability. We evaluate our approach on two publicly available news datasets (MIND and EB-NeRD) and show that it predicts click behavior accurately compared to several established baselines, while learning human-readable rules. Furthermore, we show that the learned rules reveal news consumption patterns, enabling editors to align content curation goals with target audience behavior.
Presenters: Kevin Innerebner, PhD Student, Graz University of Technology
Does Reasoning Make Search More Fair? Comparing Fairness in
Reasoning and Non-Reasoning Rerankers
While reasoning rerankers, such as Rank1, have demonstrated strong abilities in improving ranking relevance, it is unclear how they perform on other retrieval qualities such as fairness. We conduct the first systematic comparison of fairness between reasoning and non-reasoning rerankers. Using the TREC 2022 Fair Ranking Track dataset, we evaluate six reranking models across multiple retrieval settings and demographic attributes. Our findings demonstrate that reasoning neither improves nor harms fairness compared to non-reasoning approaches. Our fairness metric, Attention-Weighted Rank Fairness (AWRF), remained stable (0.33-0.35) across all models, even as relevance varied substantially (nDCG 0.247-1.000). Demographic breakdown analysis revealed fairness gaps for geographic attributes regardless of model architecture. These results indicate that future work in specializing reasoning models to be aware of fairness attributes could lead to improvements, as current implementations preserve the fairness characteristics of their input ranking.
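For readers unfamiliar with the metric reported in this abstract, AWRF-style measures compare the attention-weighted exposure each demographic group receives in a ranking against a target distribution. The sketch below is an illustrative formulation only, not the authors' or TREC's exact implementation: the geometric attention model (`patience`) and the use of total-variation distance are assumptions.

```python
def awrf(group_labels, target_dist, patience=0.5):
    """Illustrative Attention-Weighted Rank Fairness sketch.

    group_labels: group of the item at each rank, top first.
    target_dist:  desired share of exposure per group (sums to 1).
    Returns 1 minus the total-variation distance between the
    attention-weighted group exposure and the target; higher is fairer.
    """
    exposure = {g: 0.0 for g in target_dist}
    attn = 1.0
    for g in group_labels:
        exposure[g] = exposure.get(g, 0.0) + attn
        attn *= patience  # geometric browsing model (assumption)
    total = sum(exposure.values()) or 1.0
    tv = 0.5 * sum(abs(exposure.get(g, 0.0) / total - p)
                   for g, p in target_dist.items())
    return 1.0 - tv
```

Under this sketch, a ranking showing only one group against a 50/50 target scores 0.5, while an alternating ranking scores higher, matching the intuition that AWRF rewards exposure proportional to the target distribution.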