Less is More: Benchmarking LLM-Based Recommendation Agents

Abstract
Large Language Model (LLM)-based recommendation agents have shown promising results but typically rely on large context windows filled with user history. In this paper, we benchmark multiple LLM-based recommendation agents across varying context lengths, investigating the relationship between context size, recommendation quality, and cost. Our experiments with DeepSeek V3, GPT-4o-mini, and Gemini 2.0 Flash on the REGEN and MovieLens datasets demonstrate that minimal context achieves recommendation quality comparable to full-history approaches while yielding substantial cost savings. Our findings suggest that "less is more": carefully curated, minimal context can match or exceed the performance of exhaustive context, with significant efficiency gains.
Type
Publication
In The ACM Web Conference 2026 (WWW 2026) — LARS Workshop
Note
Oral presentation at the LARS Workshop, The ACM Web Conference 2026 (WWW 2026).
This paper investigates the effect of context length on the performance and cost of LLM-based recommendation agents. Using a multi-model, multi-dataset experimental framework spanning DeepSeek V3, GPT-4o-mini, and Gemini 2.0 Flash across the REGEN and MovieLens datasets, we find that minimal context achieves recommendation quality similar to full-history approaches, at significantly lower inference cost.
Key findings:
- Minimal context can match full-history recommendation quality
- Large cost savings are achievable without sacrificing accuracy
- Results are consistent across multiple LLMs and datasets
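As a rough illustration of the core comparison, the sketch below contrasts a full-history prompt with a minimal-context prompt built from only the most recent interactions. All names here (`build_prompt`, the toy MovieLens-style history) are invented for illustration and are not the paper's actual evaluation harness.

```python
# Sketch of the context-truncation idea: build a recommendation prompt
# from either the full user history or only the k most recent items.

def build_prompt(history, k=None):
    """Return a prompt using the last k interactions (all if k is None)."""
    items = history if k is None else history[-k:]
    lines = "\n".join(f"- {title} (rated {rating}/5)" for title, rating in items)
    return (
        "The user recently interacted with:\n"
        f"{lines}\n"
        "Recommend one item they might enjoy next."
    )

# Toy MovieLens-style history, oldest interaction first.
history = [
    ("Toy Story", 4), ("Heat", 5), ("GoldenEye", 3),
    ("Se7en", 5), ("The Usual Suspects", 5),
]

full_prompt = build_prompt(history)          # full-history context
minimal_prompt = build_prompt(history, k=2)  # minimal context: last 2 items

# A crude proxy for inference cost: prompt length in whitespace tokens.
print(len(full_prompt.split()), len(minimal_prompt.split()))
```

Because input tokens dominate per-request cost for these APIs, shrinking the prompt in this way translates directly into the cost savings the paper measures, provided recommendation quality holds.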