https://www.ycombinator.com/companies/datacurve
Datacurve: Frontier coding data for training and evaluating LLMs | Y Combinator
Frontier coding data for training and evaluating LLMs. Founded in 2024 by Serena Ge and Charley Lee, Datacurve has 4 employees based in San Francisco, CA, USA.
https://arxiv.org/abs/2403.04132
[2403.04132] Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
Abstract page for arXiv paper 2403.04132: Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
https://tldr.takara.ai/p/2509.22991
ADAM: A Diverse Archive of Mankind for Evaluating and Enhancing LLMs in Biographical Reasoning |...
We introduce ADAM (A Diverse Archive of Mankind), a framework for evaluating and improving multimodal large language models (MLLMs) in biographical reasoning...
https://tldr.takara.ai/p/2506.04557
SSA-COMET: Do LLMs Outperform Learned Metrics in Evaluating MT for Under-Resourced African...
Evaluating machine translation (MT) quality for under-resourced African languages remains a significant challenge, as existing metrics often suffer from limi...
https://sol.sbc.org.br/index.php/webmedia_estendido/article/view/38209
Evaluating Zero-shot Reasoning with Agentic LLMs for Smart Contract Vulnerability Detection | Anais...
https://pure.kfupm.edu.sa/en/publications/evaluating-multi-modal-llms-for-automatically-recognizing-semanti/fingerprints/
Evaluating Multi-Modal LLMs for Automatically Recognizing Semantic Elements in UML Use Case Diagram...