HELM
HELM (Holistic Evaluation of Language Models) is an evaluation framework for large language models introduced by Stanford University. Its methodology is built on three main modules: scenarios, adaptation, and metrics. Each evaluation run requires specifying a scenario, a prompt used to adapt the model to that scenario, and one or more metrics.
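
To make the structure of an evaluation run concrete, the following is a minimal sketch of how its three parts (a scenario, a prompt that adapts the model to the scenario, and one or more metrics) could fit together. It is not the HELM library's API: the names Scenario, RunSpec, exact_match, run_evaluation, and toy_model are all hypothetical and chosen only for illustration, and the "model" is a trivial stand-in function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# NOTE: every name below is hypothetical; this is not HELM's actual API.


@dataclass
class Scenario:
    """What is being evaluated: a named set of (input, reference) pairs."""
    name: str
    instances: List[Tuple[str, str]]


@dataclass
class RunSpec:
    """One evaluation run: a scenario, a prompt template, and metrics."""
    scenario: Scenario
    prompt_template: str  # must contain the placeholder "{input}"
    metrics: Dict[str, Callable[[str, str], float]]


def exact_match(prediction: str, reference: str) -> float:
    """Score 1.0 if the stripped strings are identical, else 0.0."""
    return 1.0 if prediction.strip() == reference.strip() else 0.0


def run_evaluation(spec: RunSpec, model: Callable[[str], str]) -> Dict[str, float]:
    """Prompt the model on each instance and average each metric."""
    if not spec.scenario.instances:
        raise ValueError("scenario has no instances")
    totals = {name: 0.0 for name in spec.metrics}
    for text, reference in spec.scenario.instances:
        # The prompt template is what adapts the model to the scenario.
        prompt = spec.prompt_template.format(input=text)
        prediction = model(prompt)
        for name, metric in spec.metrics.items():
            totals[name] += metric(prediction, reference)
    count = len(spec.scenario.instances)
    return {name: total / count for name, total in totals.items()}


def toy_model(prompt: str) -> str:
    """Stand-in for a language model: only knows the answer to '2 + 2'."""
    return "4" if "2 + 2" in prompt else "unknown"


spec = RunSpec(
    scenario=Scenario("toy_arithmetic", [("2 + 2", "4"), ("3 + 5", "8")]),
    prompt_template="Question: {input}\nAnswer:",
    metrics={"exact_match": exact_match},
)
scores = run_evaluation(spec, toy_model)
print(scores)
```

Keeping the three parts as separate objects means the same scenario can be rerun with a different prompt template or a different set of metrics without touching the evaluation loop, which mirrors the modular split described above.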