
Applied Generation, Evaluation & Analysis with LLMs


Session Information

  • Contradictions in Context: Challenges for Retrieval-Augmented Generation in Healthcare
  • Small Models, Big Picture! A Language Model Augmentation for Enhanced Reader-Aware Summarization
  • From Comments to Conclusions: Adaptive Reader-Aware Summary Generation in Low-Resource Languages via Agent Debate
  • Prompt Compression in the Wild: Measuring Latency, Rate Adherence, and Quality for Faster LLM Inference
  • Towards Quantitative Summarization Evaluation: An Integrated Atomic-Based Evaluation Framework and Dataset for Text Summarization
  • ExpertMix: Aspect and Severity Detection in Conversational Complaints
  • MemTool: Optimizing Short-Term Memory Management for Dynamic Tool Retrieval and Invocation in LLM Agent Multi-Turn Conversations
Mar 30, 2026, 10:30 - 12:30 (Europe/Amsterdam)
Venue: Chaos

Sub Sessions

Contradictions in Context: Challenges for Retrieval-Augmented Generation in Healthcare

Full papers | Applications; Machine Learning and Large Language Models | 10:30 AM - 12:30 PM (Europe/Amsterdam) | 2026/03/30 08:30:00 UTC - 2026/03/30 10:30:00 UTC
In high-stakes information domains such as healthcare, where large language models (LLMs) can produce hallucinations or misinformation, retrieval-augmented generation (RAG) has been proposed as a mitigation strategy, grounding model outputs in external, domain-specific documents. Yet, this approach can introduce errors when source documents contain outdated or contradictory information. This work investigates the performance of five LLMs in generating RAG-based responses to medicine-related queries. Our contributions are three-fold: i) the creation of a benchmark dataset using consumer medicine information documents from the Australian Therapeutic Goods Administration (TGA), where headings are repurposed as natural language questions, ii) the retrieval of PubMed abstracts using TGA headings, stratified across multiple publication years, to enable controlled temporal evaluation of outdated evidence, and iii) a comparative analysis of the frequency and impact of outdated or contradictory content on model-generated responses, assessing how LLMs integrate and reconcile temporally inconsistent information. Our findings show that contradictions between highly similar abstracts do, in fact, degrade performance, leading to inconsistencies and reduced factual accuracy in model answers. These results highlight that retrieval similarity alone is insufficient for reliable medical RAG and underscore the need for contradiction-aware filtering strategies to ensure trustworthy responses in high-stakes domains.
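The contradiction-aware filtering the abstract calls for can be sketched independently of the paper's actual pipeline. In this minimal illustration, `contradicts` is a hypothetical stand-in for an NLI-based contradiction classifier (here it merely detects direct negations), and the filter keeps the most recent document from each contradicting pair before the evidence is handed to the generator:

```python
from dataclasses import dataclass

@dataclass
class Doc:
    text: str
    year: int

def contradicts(a: Doc, b: Doc) -> bool:
    # Hypothetical stand-in for an NLI-based contradiction classifier:
    # two texts "contradict" if one is the direct negation of the other.
    return (a.text.replace(" is not ", " is ") == b.text.replace(" is not ", " is ")
            and a.text != b.text)

def filter_contradictions(docs: list[Doc]) -> list[Doc]:
    """Keep the newest document from each contradicting pair, so the prompt
    passed to the LLM contains only temporally consistent evidence."""
    kept: list[Doc] = []
    for doc in sorted(docs, key=lambda d: d.year, reverse=True):
        if not any(contradicts(doc, k) for k in kept):
            kept.append(doc)
    return kept
```

A real system would replace the negation check with a trained NLI model, but the recency-first selection loop stays the same.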
Presenters
Saeedeh Javadi
PhD Candidate, RMIT University
Co-Authors
Sara Mirabi
PhD Candidate, Deakin University
Bahadorreza Ofoghi
Deakin University

Small Models, Big Picture! A Language Model Augmentation for Enhanced Reader-Aware Summarization

Full papers | Applications; Machine Learning and Large Language Models | 10:30 AM - 12:30 PM (Europe/Amsterdam) | 2026/03/30 08:30:00 UTC - 2026/03/30 10:30:00 UTC
Integrating heterogeneous modalities for effective information access remains a central challenge in Information Retrieval (IR), particularly in reader-aware summarization, where user perspectives must be incorporated alongside textual and multimedia content. In this work, we present a novel augmentation framework that combines the strengths of Language Models (LMs) and multimodal models to generate holistic news summaries. Our approach seamlessly integrates textual articles, visual evidence from images, user-generated comments, and distilled insights from video streams. Through extensive experiments, we show that this LM-ensembled multimodal framework consistently surpasses specialized Video Language Models (Video LMs) in terms of coherence, informativeness, and user-sensitivity across multiple benchmarks. To further advance multimodal IR research, we extend the Reader-Aware Multi-Document Summarization (RAMDS) dataset with video components, introducing VARAMDS (Video-Augmented-RAMDS), the first resource to explicitly couple news text, imagery, reader comments, and video content. Our findings demonstrate that LM-driven augmentation not only improves multimodal summarization quality but also sets a new standard for reader-aware, comment-sensitive synthesis, bridging gaps between heterogeneous information sources and supporting richer retrieval-oriented applications in resource-constrained environments.
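The augmentation idea above, distilling each modality into text and fusing everything into a single prompt for a language model, can be sketched as follows. The section labels and instruction wording are illustrative assumptions, not the paper's exact prompt format:

```python
def build_summary_prompt(article: str, image_captions: list[str],
                         comments: list[str], video_insights: str) -> str:
    """Fuse per-modality evidence (article text, image captions, reader
    comments, distilled video notes) into one prompt for a text-only LM."""
    sections = [
        "News article:\n" + article,
        "Image evidence:\n" + "\n".join(f"- {c}" for c in image_captions),
        "Reader comments:\n" + "\n".join(f"- {c}" for c in comments),
        "Video insights:\n" + video_insights,
    ]
    instruction = ("Write a concise news summary that stays faithful to the "
                   "article and addresses the readers' concerns above.")
    return "\n\n".join(sections + [instruction])
```

The design point is that vision and video models only produce intermediate text, so any capable LM can do the final reader-aware synthesis, which is what makes the approach viable in resource-constrained settings.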
Presenters
Raghvendra Kumar
PhD Student 4th Year, Indian Institute Of Technology Patna
Co-Authors
A S Poornash
Indian Institute Of Technology Patna
Sriparna Saha
Associate Professor, Indian Institute Of Technology Patna

From Comments to Conclusions: Adaptive Reader-Aware Summary Generation in Low-Resource Languages via Agent Debate

Full papers | Applications; Machine Learning and Large Language Models | 10:30 AM - 12:30 PM (Europe/Amsterdam) | 2026/03/30 08:30:00 UTC - 2026/03/30 10:30:00 UTC
Reader-aware summarization distills articles while embedding user opinions and contextual grounding, shaping results that resonate with diverse readers and ease the challenge of extracting meaning from abundant news sources. However, research so far has centered on English and Chinese, with the complex multilingual and multimodal ecosystem of Indian news, shaped by articles, images, and user comments, still largely overlooked. Traditional single large language models (LLMs) often fail to integrate such heterogeneous evidence, yielding shallow or biased outputs. We introduce a Multi-Agent Debate (MAD) framework for reader-aware summarization, built on the COSMMIC dataset, a multilingual, multimodal, and comment-sensitive resource for Indian news. MAD employs role-specialized agents (article analyst, comment integrator, image contextualizer, summary planner, and judge) that deliberate to produce a final summary, accompanied by a justification that attributes information to its source modality. This design not only enhances informativeness and factual consistency but also provides interpretability crucial for trustworthy Information Retrieval (IR) systems. Extensive automatic and human evaluations demonstrate that MAD significantly outperforms strong baselines in generating summaries that are more grounded, diverse, and aligned with reader context, especially in low-resource Indian languages.
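The role-specialized structure described above can be illustrated with a minimal sketch. The real agents would be LLM calls deliberating over multiple rounds; here each role is a deterministic stand-in, and the debate rounds are omitted, so only the planner/judge wiring and the per-modality attribution are shown:

```python
def multi_agent_debate(article: str, comments: list[str], caption: str):
    """Each role-specialized agent contributes a note tagged with its source
    modality; a planner fixes the presentation order and a judge emits the
    summary together with a justification attributing content to sources."""
    notes = {
        "article": f"Core facts: {article}",
        "comments": f"Reader concerns: {'; '.join(comments)}",
        "image": f"Visual context: {caption}",
    }
    # Planner: decide the order in which evidence reaches the judge.
    plan = ["article", "image", "comments"]
    # Judge: synthesize the final summary and keep per-modality attribution.
    summary = " ".join(notes[src] for src in plan)
    justification = {src: notes[src] for src in plan}
    return summary, justification
```

The attribution dictionary is what gives the framework its interpretability: every claim in the summary can be traced back to the modality it came from.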
Presenters
Raghvendra Kumar
PhD Student 4th Year, Indian Institute Of Technology Patna
Co-Authors
Mohammed Salman S A
National Institute Of Technology Tiruchirappalli
Jaya Verma
Indian Institute Of Technology Patna
Sriparna Saha
Associate Professor, Indian Institute Of Technology Patna

Prompt Compression in the Wild: Measuring Latency, Rate Adherence, and Quality for Faster LLM Inference

Full papers | Machine Learning and Large Language Models | 10:30 AM - 12:30 PM (Europe/Amsterdam) | 2026/03/30 08:30:00 UTC - 2026/03/30 10:30:00 UTC
With the wide adoption of language models for IR, and specifically RAG systems, the latency of the underlying LLM becomes a crucial bottleneck, since the long contexts of retrieved passages lead to large prompts and, therefore, increased compute. Prompt compression, which reduces the size of input prompts while aiming to preserve performance on downstream tasks, has established itself as a cost-effective and low-latency method for accelerating inference in large language models. However, its usefulness depends on whether the additional preprocessing time during generation is offset by faster decoding. We present the first systematic, large-scale study of this trade-off, with thousands of runs and 30,000 queries across several open-source LLMs and four GPU classes. Our evaluation separates compression overhead from decoding latency while tracking output quality and memory usage. LLMLingua achieves up to 18% end-to-end speed-ups when prompt length, compression ratio, and hardware capacity are well matched, with response quality remaining statistically unchanged across summarization, code generation, and question answering tasks. Outside this operating window, however, the compression step dominates and cancels out the gains. We also show that effective compression can reduce memory usage enough to offload workloads from data-center GPUs to commodity cards, with only a 0.3 s increase in latency. Our open-source profiler predicts the latency break-even point for each model-hardware setup, providing practical guidance on when prompt compression delivers real-world benefits.
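The break-even reasoning in the abstract reduces to simple arithmetic: compression pays off only when the prefill time saved on removed tokens exceeds the compression overhead. The model below is a deliberately simplified sketch (it treats decoding time as independent of prompt length and ignores queuing effects); all numbers in the usage note are illustrative, not the paper's measurements:

```python
def end_to_end_latency(prompt_tokens: int, keep_ratio: float,
                       compress_overhead_s: float, prefill_tok_per_s: float,
                       decode_s: float) -> float:
    """Total latency = compression step + prefill over the kept tokens
    + decoding (assumed independent of prompt length)."""
    prefill_s = (prompt_tokens * keep_ratio) / prefill_tok_per_s
    return compress_overhead_s + prefill_s + decode_s

def break_even_keep_ratio(prompt_tokens: int, compress_overhead_s: float,
                          prefill_tok_per_s: float) -> float:
    """Largest fraction of tokens we may keep and still match the
    uncompressed baseline: the removed tokens must buy back the overhead."""
    tokens_to_remove = compress_overhead_s * prefill_tok_per_s
    return max(0.0, 1.0 - tokens_to_remove / prompt_tokens)
```

For example, with an 8,000-token prompt, a prefill rate of 4,000 tok/s, and 0.5 s of compression overhead, any keep ratio below 0.75 yields a net end-to-end speed-up; above it, the compression step dominates.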
Presenters
Cornelius Kummer
TU Dresden
Co-Authors
Lena Jurkschat
Research Associate, TU Dresden, ScaDS.AI
Michael Färber
ScaDS.AI & TU Dresden
Sahar Vahdati
TU Dresden

ExpertMix: Aspect and Severity Detection in Conversational Complaints

Full papers | Applications; Machine Learning and Large Language Models | 10:30 AM - 12:30 PM (Europe/Amsterdam) | 2026/03/30 08:30:00 UTC - 2026/03/30 10:30:00 UTC
Presenters
Sarmistha Das
Indian Institute Of Technology Patna
Co-Authors
Apoorva Singh
Fondazione Bruno Kessler, Italy
Rishu Kumar Singh
IIT Patna
Navneet Shreya
National Institute Of Technology Patna
Sriparna Saha
Associate Professor, Indian Institute Of Technology Patna

MemTool: Optimizing Short-Term Memory Management for Dynamic Tool Retrieval and Invocation in LLM Agent Multi-Turn Conversations

Full papers | Search and ranking; System aspects | 10:30 AM - 12:30 PM (Europe/Amsterdam) | 2026/03/30 08:30:00 UTC - 2026/03/30 10:30:00 UTC
Presenters
Elias Lumer
Lead AI Researcher, PricewaterhouseCoopers U.S.