Robuta

https://www.ycombinator.com/companies/datacurve
Datacurve: Frontier coding data for training and evaluating LLMs | Y Combinator
Frontier coding data for training and evaluating LLMs. Founded in 2024 by Serena Ge and Charley Lee, Datacurve has 4 employees based in San Francisco, CA, USA.

https://arxiv.org/abs/2403.04132
[2403.04132] Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
Abstract page for arXiv paper 2403.04132.

https://tldr.takara.ai/p/2509.22991
ADAM: A Diverse Archive of Mankind for Evaluating and Enhancing LLMs in Biographical Reasoning |...
We introduce ADAM (A Diverse Archive of Mankind), a framework for evaluating and improving multimodal large language models (MLLMs) in biographical reasoning...

https://tldr.takara.ai/p/2506.04557
SSA-COMET: Do LLMs Outperform Learned Metrics in Evaluating MT for Under-Resourced African...
Evaluating machine translation (MT) quality for under-resourced African languages remains a significant challenge, as existing metrics often suffer from limi...

https://sol.sbc.org.br/index.php/webmedia_estendido/article/view/38209
Evaluating Zero-shot Reasoning with Agentic LLMs for Smart Contract Vulnerability Detection | Anais...

https://pure.kfupm.edu.sa/en/publications/evaluating-multi-modal-llms-for-automatically-recognizing-semanti/fingerprints/
Evaluating Multi-Modal LLMs for Automatically Recognizing Semantic Elements in UML Use Case Diagram...