Small Language Models vs Large Language Models: Energy Benefits and Trade-offs

MSc Thesis Wageningen University & Research Open

Smaller models are often assumed to be greener. But when do SLMs actually provide real energy benefits, and what do we lose or gain in quality and usability?

The Problem

Small Language Models (SLMs) are increasingly promoted as efficient alternatives to Large Language Models (LLMs). However, energy comparisons are often incomplete or context-dependent: gains can vary with task type, prompt design, output length, hardware, and serving setup.

Without rigorous empirical evidence, teams risk making model choices based on assumptions rather than measurable sustainability outcomes.

What This Thesis Is About

This thesis investigates the energy benefits and trade-offs of SLMs versus LLMs under realistic inference scenarios. The objective is to quantify where SLMs provide meaningful energy savings, where they do not, and how those patterns relate to task performance and deployment constraints.

This is a strong fit for students who want to work on practical Green AI questions with direct relevance for industry and research decision-making.

Why This Topic Is Exciting

You work on a timely and high-impact question for sustainable AI deployment.
You build hands-on expertise in benchmarking, model evaluation, and energy-aware experimentation.
You produce practical recommendations teams can use to choose greener model configurations.

Objectives

Design a fair comparison protocol for SLM and LLM evaluation across selected tasks and settings.
Measure and compare energy consumption during inference under controlled workloads.
Analyze energy-quality trade-offs to identify scenarios where SLMs are clearly beneficial or insufficient.
Propose decision guidelines for selecting model size based on sustainability and performance requirements.

What You Will Do

Phase	Work
Literature & setup	Review existing SLM/LLM efficiency work; select tasks, models, and metrics
Benchmark design	Define controlled experiment settings (prompts, decoding, hardware, repetitions)
Data collection	Run SLM and LLM evaluations while logging quality and energy metrics
Analysis	Quantify trade-offs and identify conditions where SLMs offer robust benefits
Outputs	Deliver thesis, reproducible benchmark scripts, and practical model-selection guidance

What You Gain

A portfolio project at the intersection of LLM systems, benchmarking, and sustainability.
Practical skills in reproducible experimentation, measurement tooling, and quantitative trade-off analysis.
Experience producing evidence-based recommendations useful for PhD or industry career paths.

Who Should Apply

Required courses:

Programming in Python (INF-22306)
Machine Learning (FTE-35306)
Big Data (INF-33806)

Required skills & mindset:

Solid programming and data analysis skills
Interest in AI systems and sustainability
Curiosity for empirical experimentation and benchmarking
Motivation to work independently while discussing progress regularly

Key References

Verdecchia, R., Sallou, J., & Cruz, L. (2023). A systematic review of Green AI. WIREs Data Mining and Knowledge Discovery, 13(4), e1507. doi:10.1002/widm.1507
Luccioni, A. S., Jernite, Y., & Strubell, E. (2024). Power Hungry Processing: Watts Driving the Cost of AI Deployment? FAccT ‘24. doi:10.1145/3630106.3658542
Kim, B., Choi, Y., Mei, H., et al. (2025). The ML.ENERGY Benchmark: Toward Automated Inference Energy Measurement and Optimization. arXiv:2505.06371. arXiv:2505.06371
Tripathy, A., Harshit, C. P., & Vaidhyanathan, K. (2025). SWEnergy: An Empirical Study on Energy Efficiency in Agentic Issue Resolution Frameworks with SLMs. arXiv:2512.09543. arXiv:2512.09543

Supervisors

June Sallou