LLM Evaluation - Sesame Pie AI

arize.com

The Arize AI platform focuses on AI and machine learning observability, helping teams monitor, debug, and optimize AI models and large language models (LLMs) in production. It provides real-time monitoring, performance tracking, and LLM evaluation. It also supports a wide range of model types and mainstream model providers, and serves industries including finance, e-commerce, and autonomous driving.

AI Observability, AI Research Institutions, Arize AI, LLM Evaluation

April 15, 2026 372 0
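As a rough illustration of what "real-time monitoring" and "performance tracking" can mean in practice, here is a minimal sketch of a rolling-window accuracy monitor that raises a flag when accuracy on the most recent predictions drops below a threshold. `RollingAccuracyMonitor`, its `window` and `threshold` parameters, and the alert rule are hypothetical names and choices made for this example. They are not part of the Arize SDK, and Arize's actual monitors may work quite differently.

```python
from collections import deque


class RollingAccuracyMonitor:
    """Hypothetical helper (not an Arize API): tracks accuracy over the last `window` predictions."""

    def __init__(self, window=100, threshold=0.9):
        self.window = window
        self.threshold = threshold
        # A bounded deque discards the oldest entry once `window` items are stored.
        self._hits = deque(maxlen=window)

    def record(self, prediction, label):
        # Store 1 for a correct prediction and 0 for an incorrect one.
        self._hits.append(1 if prediction == label else 0)

    def accuracy(self):
        # Accuracy over whatever is currently in the window (0.0 if nothing recorded yet).
        if not self._hits:
            return 0.0
        return sum(self._hits) / len(self._hits)

    def alert(self):
        # Only alert once the window is full, to avoid noisy readings from a few samples.
        return len(self._hits) == self.window and self.accuracy() < self.threshold


monitor = RollingAccuracyMonitor(window=4, threshold=0.75)
for pred, label in [(1, 1), (0, 1), (1, 1), (0, 1)]:
    monitor.record(pred, label)
print(monitor.accuracy(), monitor.alert())  # 0.5 True
```

A fixed-size window keeps the metric focused on recent traffic, so an older run of good predictions cannot mask a fresh degradation. Production observability tools typically also track drift and latency, which this sketch leaves out.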

OpenCompass (Sinan) - Evaluation Leaderboard

The OpenCompass LLM Leaderboard is an open-source evaluation platform for large language models. It offers benchmarks on more than 100 datasets covering dimensions such as knowledge, logic, math, and code. The leaderboard is updated in real time and ranks the overall performance of open-source and commercial models such as GPT-4, Claude, and Qwen, giving researchers and developers an objective reference for model selection.

LLM Evaluation, OpenCompass, Benchmarking, Large Model Evaluation

April 15, 2026 421 0
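To make the idea of an "overall performance ranking" concrete, here is a minimal sketch that combines per-dimension scores into one overall score per model and sorts the models by that score. It assumes an unweighted mean across dimensions. OpenCompass's actual aggregation method may differ, for example by using weights or dataset-level averaging. The model names and scores are hypothetical placeholders, not real leaderboard results.

```python
from statistics import mean

# Hypothetical per-dimension scores on a 0-100 scale; placeholders, not real OpenCompass data.
scores = {
    "model_a": {"knowledge": 80.0, "logic": 70.0, "math": 60.0, "code": 50.0},
    "model_b": {"knowledge": 75.0, "logic": 72.0, "math": 68.0, "code": 65.0},
}


def rank_models(scores):
    """Return (model, overall_score) pairs, highest first, using an unweighted mean (an assumption)."""
    overall = {model: mean(dims.values()) for model, dims in scores.items()}
    return sorted(overall.items(), key=lambda item: item[1], reverse=True)


print(rank_models(scores))  # [('model_b', 70.0), ('model_a', 65.0)]
```

A plain mean treats every dimension as equally important. Here, model_a leads on knowledge but still ranks second overall because its weaker math and code scores pull its average down.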