Evaluation benchmark
Also known as: eval harness, benchmark
A standard set of tasks and metrics used to compare models. Enterprise programs supplement public leaderboards with private evals aligned to internal KPIs.
Contact if you need a term added for a security or procurement review.