Less is More: Benchmarking LLM-Based Recommendation Agents

Abstract
Large Language Model (LLM)-based recommendation agents have shown promising results but typically rely on large context windows filled with user history. In this paper, we benchmark multiple LLM-based recommendation agents across varying context lengths, investigating the relationship between context size, recommendation quality, and cost. Our experiments with DeepSeek V3, GPT-4o-mini, and Gemini 2.0 Flash on the REGEN and MovieLens datasets demonstrate that minimal context achieves recommendation quality comparable to full-history approaches while yielding substantial cost savings. Our findings suggest that "less is more": carefully curated, minimal context can match or exceed the performance of exhaustive context, with significant efficiency gains.
Type
Publication
In The ACM Web Conference 2026 (WWW 2026) — LARS Workshop
Note
Oral presentation at the LARS Workshop, The ACM Web Conference 2026 (WWW 2026).
This paper investigates the effect of context length on the performance and cost of LLM-based recommendation agents. Using a multi-model, multi-dataset experimental framework spanning DeepSeek V3, GPT-4o-mini, and Gemini 2.0 Flash across the REGEN and MovieLens datasets, we find that minimal context achieves recommendation quality similar to full-history approaches, at significantly lower inference cost.
Key findings:
- Minimal context can match full-history recommendation quality
- Large cost savings are achievable without sacrificing accuracy
- Results are consistent across multiple LLMs and datasets
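As a rough illustration of the core comparison, the sketch below contrasts a full-history prompt with a minimal-context prompt built from only the most recent interactions. All names here (`build_prompt`, the toy MovieLens-style history) are invented for illustration and are not the paper's actual evaluation harness.

```python
# Sketch of the context-truncation idea: build a recommendation prompt
# from either the full user history or only the k most recent items.

def build_prompt(history, k=None):
    """Return a prompt using the last k interactions (all if k is None)."""
    items = history if k is None else history[-k:]
    lines = "\n".join(f"- {title} (rated {rating}/5)" for title, rating in items)
    return (
        "The user recently interacted with:\n"
        f"{lines}\n"
        "Recommend one item they might enjoy next."
    )

# Toy MovieLens-style history, oldest interaction first.
history = [
    ("Toy Story", 4), ("Heat", 5), ("GoldenEye", 3),
    ("Se7en", 5), ("The Usual Suspects", 5),
]

full_prompt = build_prompt(history)          # full-history context
minimal_prompt = build_prompt(history, k=2)  # minimal context: last 2 items

# A crude proxy for inference cost: prompt length in whitespace tokens.
print(len(full_prompt.split()), len(minimal_prompt.split()))
```

Because input tokens dominate per-request cost for these APIs, shrinking the prompt in this way translates directly into the cost savings the paper measures, provided recommendation quality holds.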