ECIR 2026 · Reproducibility I: Recommender Systems
30 March 2026, 14:30 - 15:30 (Europe/Amsterdam)
Contact: conference-secretariat@blueboxevents.nl
Are Multimodal Embeddings Truly Beneficial for Recommendation? A Deep Dive into Whole vs. Individual Modalities
Reproducibility · 02:30 PM - 03:30 PM (Europe/Amsterdam) · 2026/03/30 12:30:00 UTC - 2026/03/30 13:30:00 UTC
Multimodal recommendation has emerged as a mainstream paradigm, typically leveraging text and visual embeddings extracted from pre-trained models such as Sentence-BERT, Vision Transformers, and ResNet. This approach is founded on the intuitive assumption that incorporating multimodal embeddings enhances recommendation performance. However, despite its popularity, this assumption lacks comprehensive empirical verification, which presents a critical research gap. To address it, we pose the central research question of this paper: Are multimodal embeddings truly beneficial for recommendation?
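The setup the abstract describes can be sketched as follows: per-item text and visual embeddings from pre-trained encoders are fused into a single item representation, commonly by concatenation. This is a minimal illustrative sketch, not the paper's actual pipeline; the toy vectors stand in for Sentence-BERT and Vision Transformer outputs, and all names and dimensions are assumptions.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def fuse_modalities(text_emb, visual_emb):
    """Concatenate modality embeddings into one normalized item representation."""
    return l2_normalize(text_emb + visual_emb)

# Toy stand-ins for pre-trained embeddings of a single item.
text_emb = [0.2, 0.5, 0.1]   # e.g. from Sentence-BERT
visual_emb = [0.7, 0.3]      # e.g. from a Vision Transformer

item_repr = fuse_modalities(text_emb, visual_emb)
print(len(item_repr))        # 5 = text dim + visual dim
```

The paper's central question is whether representations like `item_repr` actually improve recommendation over using either modality, or neither, on its own.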
RecRankerEval: A Reproducible Framework for Deploying and Evaluating LLM-based Top-$k$ Recommenders
Large Language Models (LLMs) have shown promising effectiveness in recommender systems. RecRanker, a recent LLM-based recommendation model, has demonstrated strong results on the top-$k$ recommendation task. However, the contribution of each of its core components, namely user sampling, initial ranking list generation, prompt construction, and an instruction tuning strategy, remains underexplored. In this work, we inspect the reproducibility of RecRanker and study the impact and role of its various components on recommendation performance. We begin by reproducing RecRanker's pipeline through the implementation of all its key components. Our reproduction shows that the pairwise and listwise instruction tuning methods achieve performance comparable to that reported in the original paper. For the pointwise method, while we are also able to reproduce the original paper's results, further analysis shows that the abnormally high performance stems from data leakage caused by the inclusion of ground-truth information in the prompts. To enable a fair and comprehensive evaluation of LLM-based top-$k$ recommendation, we propose RecRankerEval, an extensible framework that covers five key dimensions: user sampling strategy, initial recommendation model, LLM backbone, dataset selection, and instruction tuning method. Using the RecRankerEval framework, we show that the original results of RecRanker can be reproduced on the ML-100K and ML-1M datasets, as well as an additional Amazon-Music dataset, but not on BookCrossing, due to the lack of timestamp information in the original RecRanker paper. Furthermore, we demonstrate that RecRanker's performance can be improved by employing alternative user sampling methods (e.g., DBSCAN), stronger initial recommenders (e.g., XSimGCL), and more capable LLMs (e.g., Llama3).
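The data-leakage problem identified above can be illustrated with a toy pointwise prompt builder: if the ground-truth rating is embedded in the prompt, the LLM is handed the very label it is asked to predict. This is a hedged sketch; the function and field names are illustrative assumptions, not RecRanker's actual implementation.

```python
def build_pointwise_prompt(user_history, candidate, true_rating=None):
    """Build a toy pointwise rating prompt; true_rating=None is the clean variant."""
    prompt = (
        f"User history: {', '.join(user_history)}\n"
        f"Candidate item: {candidate}\n"
    )
    if true_rating is not None:
        # Leaky variant: the label the model should predict appears in its input.
        prompt += f"Observed rating: {true_rating}\n"
    prompt += "Predict the user's rating for the candidate item."
    return prompt

history = ["Toy Story", "Heat", "GoldenEye"]
leaky = build_pointwise_prompt(history, "Casino", true_rating=5)
clean = build_pointwise_prompt(history, "Casino")
print("Observed rating" in leaky, "Observed rating" in clean)  # True False
```

Any evaluation run on prompts like `leaky` will report inflated accuracy, which is why the framework separates prompt construction out as an auditable dimension.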
Efficient Optimization of Hierarchical Identifiers for Generative Recommendation
SEATER is a generative retrieval model that improves recommendation inference efficiency and retrieval quality by utilizing balanced tree-structured item identifiers and contrastive training objectives. We reproduce and validate SEATER's reported improvements in retrieval quality over strong baselines across all datasets from the original work, and extend the evaluation to Yambda, a large-scale music recommendation dataset. Our experiments verify SEATER's strong performance, but show that its tree construction step during training becomes a major bottleneck as the number of items grows. To address this, we implement and evaluate two alternative construction algorithms: a greedy method optimized for minimal build time, and a hybrid method that combines greedy clustering at high levels with more precise grouping at lower levels. The greedy method reduces tree construction time to less than 2% of the original with only a minor drop in quality on the dataset with the largest item collection. The hybrid method achieves retrieval quality on par with the original, and even improves on the largest dataset, while cutting construction time to just 5–8%. All data and code are publicly available for full reproducibility at https://anonymous.4open.science/r/re-seater-8003/.
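The idea of balanced tree-structured identifiers can be sketched in miniature: recursively split the item set into k equally sized groups and record the branch taken at each level as one token of the item's identifier. This is a hedged, illustrative sketch only; real SEATER clusters on learned embeddings, whereas here sorting items by name is an assumed stand-in for grouping similar items, and the greedy/fast flavor lies in the cheap equal-size splits.

```python
def assign_identifiers(items, k=2, depth=2):
    """Return {item: identifier}, each identifier a tuple of branch indices."""
    ids = {item: [] for item in items}

    def split(group, level):
        if level == depth or len(group) <= 1:
            return
        size = -(-len(group) // k)  # ceiling division keeps groups balanced
        for branch in range(k):
            chunk = group[branch * size:(branch + 1) * size]
            for item in chunk:
                ids[item].append(branch)  # one identifier token per tree level
            split(chunk, level + 1)

    split(sorted(items), 0)  # sorting stands in for clustering similar items
    return {item: tuple(path) for item, path in ids.items()}

identifiers = assign_identifiers(["a", "b", "c", "d"], k=2, depth=2)
print(identifiers)  # {'a': (0, 0), 'b': (0, 1), 'c': (1, 0), 'd': (1, 1)}
```

Because each split is a constant-time slice rather than an optimization step, build time scales roughly linearly with the catalogue, which is the trade-off the greedy construction above exploits.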
A Reproducible and Fair Evaluation of Partition-aware Collaborative Filtering
Similarity-based collaborative filtering (CF) models have long demonstrated strong offline performance and conceptual simplicity. However, their scalability is limited by the quadratic cost of maintaining dense item–item similarity matrices. Partitioning-based paradigms have recently emerged as an effective strategy to balance effectiveness and efficiency, allowing models to learn local similarities within coherent subgraphs while maintaining limited global context. In this work, we focus on the Fine-tuning Partition-aware Similarity Refinement (FPSR) framework, a prominent representative of this family, and its extension FPSR+. Reproducible evaluation of partition-aware collaborative filtering remains challenging, as prior FPSR/FPSR+ reports often rely on splits of unclear provenance and omit some similarity-based baselines, complicating fair comparison. We present a transparent, fully reproducible benchmark of FPSR and FPSR+. Based on our results, the family of FPSR models does not consistently perform at the highest level. Overall, it remains competitive, validates its design choices, and shows significant advantages in long-tail scenarios. This highlights the accuracy–coverage trade-offs resulting from partitioning, global components, and hub design. Our investigation clarifies when partition-aware similarity modeling is most beneficial and offers actionable guidance for scalable recommender system design under reproducible protocols. Source code at https://split.to/rep_ecir.
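The efficiency argument above can be made concrete with a toy sketch: computing item–item cosine similarities only within partitions replaces the quadratic full matrix with a few small blocks. This is an assumed minimal illustration, not FPSR itself, which additionally refines these local similarities with a global component; the partitions and vectors here are placeholders.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def partitioned_similarities(item_vecs, partitions):
    """Similarity entries (i, j) -> score, restricted to within-partition pairs."""
    sims = {}
    for part in partitions:
        for i in part:
            for j in part:
                if i < j:  # each unordered pair once
                    sims[(i, j)] = cosine(item_vecs[i], item_vecs[j])
    return sims

item_vecs = {"i1": [1, 0], "i2": [1, 1], "i3": [0, 1], "i4": [1, 2]}
partitions = [["i1", "i2"], ["i3", "i4"]]
sims = partitioned_similarities(item_vecs, partitions)
print(len(sims))  # 2 within-partition pairs instead of 6 for the full matrix
```

Pairs that cross partitions (e.g. `("i1", "i3")`) are never materialized, which is exactly where the accuracy–coverage trade-off discussed above comes from.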
A Systematic Reproducibility Study of BSARec for Sequential Recommendation
In sequential recommendation (SR), the self-attention mechanism of Transformer-based models acts as a low-pass filter, limiting their ability to capture high-frequency signals that reflect short-term user interests. To overcome this, BSARec augments the Transformer encoder with a frequency layer that rescales high-frequency components using the Fourier transform. However, the overall effectiveness of BSARec and the roles of its individual components have yet to be systematically validated. We reproduce BSARec and show that it outperforms other SR methods on some datasets. To empirically assess whether BSARec improves performance on high-frequency signals, we propose a metric to quantify user history frequency and evaluate SR methods across different user groups. We compare digital signal processing (DSP) techniques and find that the discrete wavelet transform (DWT) offers only slight improvements over Fourier transforms, and that DSP methods provide no clear advantage over simple residual connections. Finally, we explore padding strategies and find that non-constant padding significantly improves recommendation performance, whereas constant padding hinders the frequency rescaler's ability to capture high-frequency signals.
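A frequency-based view of a user history, in the spirit of the metric described above, can be sketched by taking the Fourier transform of a (toy) scalar interaction signal and measuring the share of spectral energy in its upper frequency bins. The exact metric in the study may differ; this stdlib-only sketch merely illustrates how rapidly switching interests register as high-frequency energy.

```python
import cmath

def dft(signal):
    """Naive discrete Fourier transform of a real-valued sequence."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)) for k in range(n)]

def high_freq_energy_ratio(signal):
    """Fraction of spectral energy in the upper half of the unique frequencies
    (DC bin dropped; mirrored bins beyond n/2 ignored for a real signal)."""
    spectrum = dft(signal)
    energies = [abs(c) ** 2 for c in spectrum[1:len(signal) // 2 + 1]]
    total = sum(energies)
    high = sum(energies[len(energies) // 2:])
    return high / total if total else 0.0

smooth = [1, 2, 3, 4, 5, 6, 7, 8]           # slowly drifting interest signal
alternating = [1, -1, 1, -1, 1, -1, 1, -1]  # rapidly switching interests
print(high_freq_energy_ratio(alternating) > high_freq_energy_ratio(smooth))  # True
```

Under such a metric, users like `alternating` are the ones a low-pass self-attention encoder would serve worst, which is how per-group evaluation can probe whether BSARec's frequency rescaling actually helps them.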