LLM Evaluation

arize.com

The Arize AI platform focuses on AI and machine learning observability, helping teams monitor, debug, and optimize AI models and large language models in production. It provides real-time monitoring, performance tracking, and LLM evaluation, and supports a wide range of model types and mainstream providers across industries including finance, e-commerce, and autonomous driving.

April 15, 2026 · 372 views, 0 favorites

OpenCompass Sinan - Evaluation Leaderboard

The OpenCompass LLM Leaderboard is an open-source evaluation platform for large language models. It provides benchmark tests on over 100 datasets, covering dimensions such as knowledge, logic, math, and code. The leaderboard is updated in real time and ranks the overall performance of open-source and commercial models such as GPT-4, Claude, and Qwen, giving researchers and developers an objective reference for model selection.

April 15, 2026 · 421 views, 0 favorites