As language models achieve widespread adoption, accurately assessing the quality of the outputs they generate remains a significant challenge. Whilst we’ve seen great success in measuring their logical, mathematical and coding abilities, we still struggle to assess their abilities within artistic domains like writing and painting.
In this talk we introduce “LLM-as-a-judge” as one approach to this challenge: using language models as stand-ins for human raters in the evaluation process.
We will begin by covering the high-level conceptual framework we’ve been using, before moving on to a live demonstration of how these principles can be implemented in practice.
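To make the idea concrete, here is a minimal sketch of the LLM-as-a-judge pattern, assuming an OpenAI-style chat API; the judge prompt, model name, and 1–5 scoring scale are illustrative assumptions, not the exact setup demonstrated in the talk.

```python
# Minimal LLM-as-a-judge sketch: a judge model scores a candidate text
# against a rubric. Prompt wording, model name, and scale are illustrative.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

JUDGE_PROMPT = """You are an impartial judge of writing quality.
Rate the following text from 1 (poor) to 5 (excellent) on clarity,
style, and coherence. Reply with the number only.

Text:
{text}"""

def judge_quality(text: str, model: str = "gpt-4o-mini") -> int:
    """Ask a judge model to score a piece of writing on a 1-5 scale."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(text=text)}],
        temperature=0,  # deterministic judging
    )
    return int(response.choices[0].message.content.strip())

if __name__ == "__main__":
    print(judge_quality("The quick brown fox jumps over the lazy dog."))
```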
This talk is from the “Driving AI 2024” conference.
AI technology leaning toward the nerdy side, open to everyone curious about the future of AI.
The thumbnail is AI-generated.