benchmark dataset - Robuta Search

https://www.sandia.gov/research/publications/details/binsimdb-benchmark-dataset-construction-for-fine-grained-binary-code-simila-2026-01-01/
BinSimDB: Benchmark Dataset Construction for Fine-Grained Binary Code Similarity Analysis –...

https://ieeexplore.ieee.org/document/8917818/
An Underwater Image Enhancement Benchmark Dataset and Beyond | IEEE Journals & Magazine | IEEE...
Underwater image enhancement has been attracting much attention due to its significance in marine engineering and aquatic robotics. Numerous underwater image en

https://proceedings.neurips.cc/paper_files/paper/2024/hash/69d97a6493fbf016fff0a751f253ad18-Abstract-Datasets_and_Benchmarks_Track.html
NYU CTF Bench: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security

https://aclanthology.org/2026.eacl-long.244/
ConvApparel: A Benchmark Dataset and Validation Framework for User Simulators in Conversational...
Ofer Meshi, Krisztian Balog, Sally Goldman, Avi Caciularu, Guy Tennenholtz, Jihwan Jeong, Amir Globerson, Craig Boutilier. Proceedings of the 19th Conference...

https://www.semanticscholar.org/search?q=ALHD%3A+A+Large-Scale+and+Multigenre+Benchmark+Dataset+for+Arabic+LLM-Generated+Text+Detection.
ALHD: A Large-Scale and Multigenre Benchmark Dataset for Arabic LLM-Generated Text Detection. |...
An academic search engine that utilizes artificial intelligence methods to provide highly relevant results and novel tools to filter them with ease.

https://labs.scale.com/leaderboard/swe_bench_pro_public
SWE-Bench Pro Leaderboard AI Coding Benchmark (Public Dataset) | Scale
Apr 25, 2026 - Compare the resolve rates of GPT-5.4, Muse Spark, Claude Opus 4.6, and Gemini 3.1 Pro on SWE-Bench Pro. A rigorous AI software engineering benchmark for...

https://research.feedzai.com/publication/benchmark-it-yourself-biy-preparing-a-dataset-and-benchmarking-ai-models-for-scatterplot-related-tasks/
Benchmark It Yourself (BIY): Preparing a Dataset and Benchmarking AI Models for Scatterplot-Related...
Nov 11, 2025 - AI models are increasingly used for data analysis and visualization, yet benchmarks rarely address scatterplot-specific tasks, limiting insight into...

https://writer.com/engineering/omniact-dataset-benchmark-multimodal-autonomous-agents/
OmniACT: A dataset and benchmark for enabling multimodal generalist autonomous agents for desktop...
Dec 11, 2024 - Discover OmniACT, a novel dataset and benchmark for evaluating multimodal generalist autonomous agents on desktop and web applications.