https://huggingface.co/datasets?language=language%3Aabq
Explore datasets powering machine learning.
hugging face, datasets, abq
https://huggingface.co/datasets/Intel/polite-guard
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
hugging face, intel, polite, guard, datasets
https://nij.ojp.gov/library/datasets-nij-funded-research
Since 1978, NIJ has been accumulating an archive of hundreds of data sets resulting from projects funded through research grant programs. NIJ partners with two...
funded research, national institute, datasets, nij, justice
https://mlcommons.org/working-groups/data/datasets/
Aug 15, 2025 - The Datasets working group creates new datasets to fuel innovation in machine learning.
datasets
https://huggingface.co/datasets/google-research-datasets/mbpp
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
google research, hugging face, datasets
https://huggingface.co/datasets/HiDream-ai/ReCo-Data
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
hugging face, ai, reco, data
https://huggingface.co/datasets/Pageshift-Entertainment/LongPage
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
hugging face, entertainment, datasets
https://huggingface.co/datasets/facebook/crv
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
hugging face, facebook, crv, datasets
https://laravel-news.com/handling-large-datasets-with-pagination-and-cursors-in-laravel-mongodb
Feb 10, 2026 - Learn the difference between offset and cursor pagination in Laravel with MongoDB. Explore performance trade-offs, implementation examples, and when to use...
handling, large, datasets, pagination, cursors
https://huggingface.co/datasets/google/LoraxBench
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
hugging face, google, datasets
https://huggingface.co/datasets/sentence-transformers/stsb
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
hugging face, sentence, transformers, datasets
https://www.thinkautonomous.ai/blog/lidar-datasets/
A lot of things impress me about nature. The perfect symmetry of humans. The food chain and how every single animal or even insect is useful to the entire...
learn, best, lidar, datasets, process
https://huggingface.co/datasets/AI-companionship/INTIMA
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
hugging face, ai, companionship, datasets
https://huggingface.co/datasets/Nanbeige/ToolMind
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
hugging face, datasets
https://data.gov.au/data/dataset/?organization=city-of-greater-bendigo
datasets, gov, au
https://data.gov.au/data/dataset/?organization=glenelg-shire-council
datasets, gov, au
https://www.interline.io/blog/geojsonl-extracts/
GeoJSON Lines (geojsonl) is a simple, newline-delimited variant of GeoJSON that allows large datasets to be loaded with a much lower memory footprint and...
optimized, format, large, geographic, datasets
https://docs.letta.com/guides/evals/concepts/datasets/
Create and manage evaluation datasets with test cases for systematic agent testing.
datasets, letta, docs
https://huggingface.co/datasets/allenai/dolma3_longmino_mix-100B-1125
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
allenai, mix, datasets, hugging
https://huggingface.co/datasets/fka/awesome-chatgpt-prompts
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
chatgpt prompts, hugging face, awesome, datasets
https://huggingface.co/datasets/nickrosh/Evol-Instruct-Code-80k-v1
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
evol, instruct, code, datasets, hugging
https://epi2me.nanoporetech.com/simulating-datasets/
Bioinformatics tools and pipelines, as with any analysis methods, need to be tested to ensure they…
sequencing, datasets, blog
https://huggingface.co/datasets/allenai/dolma3_dolmino_pool
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
hugging face, allenai, pool, datasets
https://agenthunt.io/agent/detail/huggingface-co/
Discover Hugging Face, the ultimate AI platform for machine learning innovation. Access thousands of pre-trained models, collaborate with a global community,...
open source ai, hugging face, tools, models, datasets
https://huggingface.co/datasets/facebook/jepa-wms
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
hugging face, facebook, wms, datasets
https://www.edgeaifoundation.org/edgeai-content/leveling-the-playing-field-for-edge-ai-research-through-high-quality-datasets
Published by EDGE AI FOUNDATION Datasets & Benchmarks Working Group: Adam Fuks – NXP, Chair Petrut Bogdan – Innatera Vijay Janappa Reddi – Harvard...
edge ai, leveling, playing, field, research
https://data.gov.au/data/dataset/?organization=surf-coast-shire-council
datasets, gov, au
https://data.gov.au/data/dataset/?organization=city-of-boroondara
datasets, gov, au
https://huggingface.co/datasets/microsoft/ChatBench
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
hugging face, microsoft, datasets
https://hix-compare.org/
HIX Compare datasets provide information on nearly all plans offered in the health insurance marketplaces. The data provide information on premiums,...
health insurance plans, datasets
https://syntheticaidata.com/blog/introducing-visiondatasets/
syntheticAIdata helps businesses overcome the challenge of acquiring high-quality synthetic data for training their vision AI models.
computer vision, launch, synthetic, datasets
https://huggingface.co/datasets/ma-xu/fine-t2i
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
hugging face, xu, fine, datasets
https://huggingface.co/datasets/grammarly/medit
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
hugging face, grammarly, medit, datasets
https://huggingface.co/datasets/MiniMaxAI/OctoCodingBench
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
hugging face, minimaxai, datasets
https://thegradient.pub/new-datasets-to-democratize-speech-recognition-technology-2/
MLCommons.org introduces two new public datasets for speech recognition. The People’s Speech is the first large-scale, permissively licensed ASR dataset that...
speech recognition, new, datasets, technology
https://data.gov.au/data/dataset/?organization=macedon-ranges-shire-council
datasets, gov, au
https://www.bcorporation.net/en-us/news/blog/research-fellowship-call-for-proposals/
Apply to become a 1-year virtual research “fellow” at B Lab global, exploring our datasets on corporate sustainability and business governance efforts.
apply, become, year, virtual, research
https://www.amazon.science/code-and-datasets?trk=7c626d53-99ef-45c5-a7a1-ab10a9203963&sc_channel=el
Find the latest code and datasets from Amazon scientists and researchers, which have been released across GitHub and other platforms.
amazon science, code, datasets
https://www.404media.co/archivists-work-to-identify-and-save-the-thousands-of-datasets-disappearing-from-data-gov/
More than 2,000 datasets have disappeared from data.gov since Trump was inaugurated. But analyzing exactly what happened and where it went is going to take...
archivists, work, identify, save, thousands
https://huggingface.co/collections/amazon/chronos-models-and-datasets
Collection of artifacts related to Chronos pretrained models for time series forecasting.
chronos, models, datasets, amazon
https://huggingface.co/datasets/Gourieff/ReActor
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
hugging face, reactor, datasets
https://huggingface.co/datasets/facebook/sam-3d-body-dataset
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
hugging face, facebook, sam, body, dataset
https://mlops.community/%f0%9f%94%ad-improving-your-ml-datasets-with-galileo/
Apr 3, 2023 - The MLOps Community fills the swiftly growing need to share real-world Machine Learning Operations best practices from engineers in the field.
improving, ml, datasets, galileo, community
https://huggingface.co/datasets/cais/hle
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
hugging face, cais, hle, datasets
https://ru-brightdata.com/products/datasets/for-journalists
Sep 28, 2025 - Bright Data offers a free program to support journalists’ needs to retrieve public web data.
datasets, journalists
https://huggingface.co/datasets/Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
gpt oss, alibaba, superior, reasoning, sft
https://huggingface.co/datasets/Daniellesry/TransPhy3D
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
hugging face, datasets
https://mlcommons.org/datasets/
Dec 4, 2024 - Evaluating AI systems depends on rigorous, standardized test datasets. MLCommons builds open, large-scale, and diverse datasets. View more.
view, datasets, provided
https://ourworldindata.org/energy-missing-data
Nov 8, 2021 - What are the key datasets on global energy that the world needs, but are not publicly available?
missing, data, energy, list
https://huggingface.co/datasets/roneneldan/TinyStories
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
hugging face, datasets
https://huggingface.co/datasets/bigai/TongSIM-Asset
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
hugging face, asset, datasets
https://data.gov.au/data/dataset/?organization=city-of-port-phillip
datasets, gov, au
https://langfuse.com/docs/evaluation/experiments/datasets
Use Langfuse Datasets to create structured experiments to test and benchmark LLM applications.
datasets, langfuse
https://huggingface.co/datasets/Idavidrein/gpqa
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
hugging face, datasets
https://humansignal.com/
Full-service dataset creation and enterprise data annotation software to build novel, compliant AI systems.
build, datasets, one, else
https://huggingface.co/datasets/OpenDataArena/ODA-Mixture-500k
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
hugging face, oda, mixture, datasets
https://brightdata.co.kr/products/datasets/real-estate
Dec 14, 2025 - Buy real estate datasets. Tens of millions of records available from websites such as Zillow and Realtor. Gather accurate real estate data.
real estate, datasets, buy, records
https://www.bankofengland.co.uk/statistics/research-datasets
We have published a selection of datasets to crowdsource answers to our key research questions and support collaboration between our staff and external...
research datasets, bank, england
https://huggingface.co/datasets/microsoft/mediflow
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
hugging face, microsoft, datasets
https://dev.ubiqsecurity.com/docs/datasets
Step-by-step instructions for creating Datasets
datasets
https://brightdata.es/products/datasets/for-journalists
Sep 28, 2025 - Bright Data offers a free program to support journalists’ needs to retrieve public web data.
datasets, journalists
https://data.gov.au/data/dataset/?organization=yarra-ranges-council
datasets, gov, au
https://data.gov.au/data/dataset/?organization=warrnambool-city-council
datasets, gov, au
https://www.kdnuggets.com/tips-handling-large-datasets-python
Working with large datasets is common but challenging. Here are some tips to make working with large datasets in Python simpler.
tips, handling, large, datasets, python
https://data.gov.au/data/dataset/?organization=central-goldfields-shire-council
datasets, gov, au
https://pharma.molecularconnections.com/gold-standard-datasets/
Nov 20, 2023 - Gold Standard Datasets are proven to be highly accepted, accurate and reliable references with advanced data. Gold Standard Datasets can be successfully used
gold standard, datasets, molecular, connections
https://imerit.net/resources/blog/from-edge-cases-to-exploits-why-red-teaming-needs-expert-vetted-datasets/
Oct 30, 2025 - Discover why expert-vetted red-teaming datasets are key to AI safety. Learn how iMerit empowers AI red-teaming with Ango Hub and Scholars.
red teaming, expert, vetted, datasets
https://help.nightfall.ai/nightfall_policy_templates/sample_data
Use sample data sets provided by Nightfall to test Nightfall's detection capabilities.
sample, datasets, nightfall, documentation
https://sdtimes.com/ai/sonar-announces-new-solution-to-optimize-training-datasets-for-coding-llms/
Oct 21, 2025 - Software Development News
sonar, announces, new, solution, optimize
https://huggingface.co/datasets/huggingface/CADS-dataset
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
hugging face, huggingface, dataset
https://data.gov.au/data/dataset/?organization=wyndham-city-council
datasets, gov, au
https://voxel51.com/customers/kitro
FiftyOne is a key ingredient for Kitro in developing their ML models and curating datasets for training.
key, ingredient, developing, ml
https://huggingface.co/datasets?language=language%3Aab
Explore datasets powering machine learning.
hugging face, datasets, ab
https://gowinston.ai/setting-new-standards-in-ai-content-detection/
Dec 19, 2024 - Detailed analysis of Winston AI's latest model, highlighting our industry-leading approach in accurately identifying AI-generated texts.
full, transparency, releasing, accuracy, rate
https://data.gov.au/data/dataset/?organization=alpine-shire-council
datasets, gov, au
https://generated.photos/datasets
Discover high-quality image datasets for machine learning (ML). Optimized for accurate and efficient model training, free for academic research.
machine learning, image, datasets
https://huggingface.co/datasets/microsoft/SWE-Sharp-Bench
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
hugging face, microsoft, swe, sharp, bench
https://huggingface.co/datasets/Anthropic/hh-rlhf
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
hugging face, anthropic, hh, rlhf, datasets
https://mlcommons.org/datasets/unsupervised-peoples-speech/
Mar 4, 2025 - The MLCommons People’s Speech Dataset contains 30,000 hours of conversational English speech recognition licensed for academic and commercial machine...
people, speech, dataset
https://huggingface.co/datasets?language=language%3Aen
Explore datasets powering machine learning.
hugging face, datasets, en
https://data.gov.au/data/dataset/?organization=horsham-rural-city-council
datasets, gov, au
https://data.gov.au/data/dataset/?organization=loddon-shire-council
datasets, gov, au
https://data.gov.au/data/dataset/?organization=brimbank-city-council
datasets, gov, au
https://huggingface.co/datasets?modality=modality%3Aimage
Explore datasets powering machine learning.
hugging face, image, datasets