Research & Analysis

Benchmark design

Published Oct 22, 2025 · Original by Lilian Weng · Shared by Prompt Ranker
Optimised for: GPT-4
v1.0 Oct 22, 2025 · 20:10 by Prompt Ranker
Design benchmark for [capability]. Define: Tasks, Metrics, Baseline, Data sources, Evaluation protocol, Success criteria.
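The prompt above is a template with one placeholder, `[capability]`, and six sections it asks the model to define. A minimal Python sketch of how it might be filled in and how a response might be checked for those sections follows. The helper names `fill_prompt` and `missing_sections` are hypothetical and not part of the original page, and the section check is a plain case-insensitive substring match, which is an assumption rather than a robust parser.

```python
# Prompt template copied verbatim from the page; [capability] is its only placeholder.
TEMPLATE = (
    "Design benchmark for [capability]. Define: Tasks, Metrics, Baseline, "
    "Data sources, Evaluation protocol, Success criteria."
)

# The six sections the prompt asks the model to define.
REQUIRED_SECTIONS = [
    "Tasks",
    "Metrics",
    "Baseline",
    "Data sources",
    "Evaluation protocol",
    "Success criteria",
]


def fill_prompt(capability: str) -> str:
    """Substitute a capability into the template (hypothetical helper)."""
    capability = capability.strip()
    if not capability:
        raise ValueError("capability must be non-empty")
    return TEMPLATE.replace("[capability]", capability)


def missing_sections(response: str) -> list[str]:
    """Return the required section names that never appear in a response.

    Assumption: a case-insensitive substring match is enough to detect a section.
    """
    lowered = response.lower()
    return [name for name in REQUIRED_SECTIONS if name.lower() not in lowered]


if __name__ == "__main__":
    # Example capability chosen purely for illustration.
    print(fill_prompt("multi-step arithmetic reasoning"))
    print(missing_sections("Tasks: ... Metrics: ..."))
```

A check like `missing_sections` only confirms that each heading is mentioned; it says nothing about whether the proposed tasks, metrics, or success criteria are sound, so a manual review of the response is still needed.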
Version Notes
Benchmark design