Robuta

https://www.sandia.gov/research/publications/details/binsimdb-benchmark-dataset-construction-for-fine-grained-binary-code-simila-2026-01-01/
BinSimDB: Benchmark Dataset Construction for Fine-Grained Binary Code Similarity Analysis –...
Tags: benchmark dataset, fine grained, binary code, construction, similarity

https://ieeexplore.ieee.org/document/8917818/
An Underwater Image Enhancement Benchmark Dataset and Beyond | IEEE Journals & Magazine | IEEE...
Underwater image enhancement has been attracting much attention due to its significance in marine engineering and aquatic robotics. Numerous underwater image en
Tags: ieee journals magazine, image enhancement, benchmark dataset, underwater, beyond

https://proceedings.neurips.cc/paper_files/paper/2024/hash/69d97a6493fbf016fff0a751f253ad18-Abstract-Datasets_and_Benchmarks_Track.html
NYU CTF Bench: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security
Tags: open source benchmark, evaluating llms, offensive security, nyu, ctf

https://aclanthology.org/2026.eacl-long.244/
ConvApparel: A Benchmark Dataset and Validation Framework for User Simulators in Conversational...
Ofer Meshi, Krisztian Balog, Sally Goldman, Avi Caciularu, Guy Tennenholtz, Jihwan Jeong, Amir Globerson, Craig Boutilier. Proceedings of the 19th Conference...
Tags: benchmark dataset, validation, framework, user, simulators

https://www.semanticscholar.org/search?q=ALHD%3A+A+Large-Scale+and+Multigenre+Benchmark+Dataset+for+Arabic+LLM-Generated+Text+Detection.
ALHD: A Large-Scale and Multigenre Benchmark Dataset for Arabic LLM-Generated Text Detection. |...
An academic search engine that utilizes artificial intelligence methods to provide highly relevant results and novel tools to filter them with ease.
Tags: large scale, benchmark dataset, llm generated, text detection, multigenre

https://labs.scale.com/leaderboard/swe_bench_pro_public
SWE-Bench Pro Leaderboard AI Coding Benchmark (Public Dataset) | Scale
Apr 25, 2026 - Compare the resolve rates of GPT-5.4, Muse Spark, Claude Opus 4.6, and Gemini 3.1 Pro on SWE-Bench Pro. A rigorous AI software engineering benchmark for...
Tags: swe bench pro, ai coding, leaderboard, benchmark, public

https://research.feedzai.com/publication/benchmark-it-yourself-biy-preparing-a-dataset-and-benchmarking-ai-models-for-scatterplot-related-tasks/
Benchmark It Yourself (BIY): Preparing a Dataset and Benchmarking AI Models for Scatterplot-Related...
Nov 11, 2025 - AI models are increasingly used for data analysis and visualization, yet benchmarks rarely address scatterplot-specific tasks, limiting insight into...
Tags: benchmarking ai, biy, preparing, dataset, models

https://writer.com/engineering/omniact-dataset-benchmark-multimodal-autonomous-agents/
OmniACT: A dataset and benchmark for enabling multimodal generalist autonomous agents for desktop...
Dec 11, 2024 - Discover OmniACT, a novel dataset and benchmark for evaluating multimodal generalist autonomous agents on desktop and web applications.
Tags: autonomous agents, dataset, benchmark, enabling, multimodal