https://commoncrawl.org/
Common Crawl - Open Repository of Web Crawl Data
We build and maintain an open repository of web crawl data that can be accessed and analyzed by anyone.
common crawl, open repository, web, data
https://commoncrawl.github.io/cc-crawl-statistics/
Statistics of Common Crawl Monthly Archives by commoncrawl
common crawl, monthly archives, statistics
https://data.commoncrawl.org/crawl-data/CC-MAIN-2022-33/index.html
Common Crawl August 2022 Crawl Archive (CC-MAIN-2022-33)
common crawl, august, archive, ccmain
https://commoncrawl.github.io/cc-webgraph-statistics/
Common Crawl Web Graph Statistics
Interactive visualisations, domain and host rankings, and graph metrics from the Common Crawl Web Graph dataset. Explore PageRank, harmonic centrality, and...
common crawl, web, graph, statistics
https://huggingface.co/commoncrawl
commoncrawl (Common Crawl Foundation)
Crawled data and metadata
common crawl, foundation
https://groups.google.com/g/common-crawl
Common Crawl - Google Groups
common crawl, google, groups
https://data.commoncrawl.org/crawl-data/CC-MAIN-2020-29/index.html
Common Crawl July 2020 Crawl Archive (CC-MAIN-2020-29)
common crawl, july, archive, ccmain
https://commoncrawl.org/blog/november-2025-crawl-archive-now-available
Common Crawl - Blog - November 2025 Crawl Archive Now Available
We are pleased to announce that the crawl archive for November 2025 is now available, containing 2.29 billion web pages or 378 TiB of uncompressed content.
common crawl, blog, november, archive, available
https://commoncrawl.org/blog/providing-authenticity-data-provenance-for-common-crawl-using-blockchain-our-work-with-constellation-network
Common Crawl - Blog - Providing Authenticity & Data Provenance for Common Crawl Using Blockchain:...
In 2024, the Common Crawl Foundation and Constellation Network announced a groundbreaking partnership to enhance data integrity and transparency across the...
common crawl, data provenance, blog, providing, authenticity
https://data.commoncrawl.org/crawl-data/CC-MAIN-2018-34/index.html
Common Crawl August 2018 Crawl Archive (CC-MAIN-2018-34)
common crawl, august, archive, ccmain
https://data.commoncrawl.org/crawl-data/CC-MAIN-2025-38/index.html
Common Crawl September 2025 Crawl Archive (CC-MAIN-2025-38)
common crawl, september, archive, ccmain
https://blog.commoncrawl.org/errata/warc-content-type-header-in-revisit-records
Common Crawl - Erratum - WARC Content-Type header in revisit records
Common Crawl's WARC revisit records use Content-Type: message/http (following the WARC 1.1 spec's example), but per iipc/warc-specifications#55 it should be...
common crawl, content type, erratum, warc, header
https://www.johnswaterproofing.com/crawl-space-repair/crawl-space-problems.html
Common Crawl Space Issues in Greater Portland | Wood Rot & Uneven Floors
John's Waterproofing can solve a variety of crawl space problems, including wood damage, uneven floors, and more in Portland, Vancouver, Salem, and nearby....
crawl space issues, greater portland, wood rot, common
https://data.commoncrawl.org/crawl-data/CC-MAIN-2016-18/index.html
Common Crawl April 2016 Crawl Archive (CC-MAIN-2016-18)
common crawl, april, archive, ccmain
https://data.commoncrawl.org/crawl-data/CC-MAIN-2019-39/index.html
Common Crawl September 2019 Crawl Archive (CC-MAIN-2019-39)
common crawl, september, archive, ccmain
https://data.commoncrawl.org/crawl-data/CC-MAIN-2021-21/index.html
Common Crawl May 2021 Crawl Archive (CC-MAIN-2021-21)
common crawl, may, archive, ccmain
https://www.commoncrawl.org/blog/august-2017-crawl-archive-now-available
Common Crawl - Blog - August 2017 Crawl Archive Now Available
The crawl archive for August 2017 is now available! The archive contains 3.28 billion+ web pages and over 280 TiB of uncompressed content.
common crawl, blog, august, archive, available
https://data.commoncrawl.org/crawl-data/CC-MAIN-2019-26/index.html
Common Crawl June 2019 Crawl Archive (CC-MAIN-2019-26)
common crawl, june, archive, ccmain
https://commoncrawl.github.io/cc-crawl-statistics/plots/domains.html
Statistics of Common Crawl Monthly Archives by commoncrawl
common crawl, monthly archives, statistics
https://data.commoncrawl.org/crawl-data/CC-MAIN-2022-21/index.html
Common Crawl May 2022 Crawl Archive (CC-MAIN-2022-21)
common crawl, may, archive, ccmain
https://www.commoncrawl.org/blog/february-2019-crawl-archive-now-available
Common Crawl - Blog - February 2019 crawl archive now available
The crawl archive for February 2019 is now available! It contains 2.9 billion web pages or 225 TiB of uncompressed content, crawled between February 15th and...
common crawl, blog, february, archive, available
https://data.commoncrawl.org/crawl-data/CC-MAIN-2022-05/index.html
Common Crawl January 2022 Crawl Archive (CC-MAIN-2022-05)
common crawl, january, archive, ccmain
https://data.commoncrawl.org/crawl-data/CC-MAIN-2021-43/index.html
Common Crawl October 2021 Crawl Archive (CC-MAIN-2021-43)
common crawl, october, archive, ccmain
https://data.commoncrawl.org/crawl-data/CC-MAIN-2018-30/index.html
Common Crawl July 2018 Crawl Archive (CC-MAIN-2018-30)
common crawl, july, archive, ccmain
https://data.commoncrawl.org/crawl-data/CC-MAIN-2024-46/index.html
Common Crawl November 2024 Crawl Archive (CC-MAIN-2024-46)
common crawl, november, archive, ccmain
https://commoncrawl.org/blog/february-2020-crawl-archive-now-available
Common Crawl - Blog - February 2020 crawl archive now available
The crawl archive for February 2020 is now available! It contains 2.6 billion web pages or 240 TiB of uncompressed content, crawled between February 16th and...
common crawl, blog, february, archive, available
https://data.commoncrawl.org/crawl-data/CC-MAIN-2021-04/index.html
Common Crawl January 2021 Crawl Archive (CC-MAIN-2021-04)
common crawl, january, archive, ccmain
https://commoncrawl.org/blog
Common Crawl - Blog
Explore Common Crawl's latest updates, insights, and stories. Stay informed on web data trends and our community's impact.
common crawl, blog
https://data.commoncrawl.org/crawl-data/CC-MAIN-2026-12/index.html
Common Crawl March 2026 Crawl Archive (CC-MAIN-2026-12)
common crawl, march, archive, ccmain
https://commoncrawl.org/web-graphs
Common Crawl - Web Graphs
Detailing Common Crawl's Web Graph releases, the technology behind them, and how to use them.
common crawl, web, graphs
https://data.commoncrawl.org/crawl-data/CC-MAIN-2018-22/index.html
Common Crawl May 2018 Crawl Archive (CC-MAIN-2018-22)
common crawl, may, archive, ccmain
https://data.commoncrawl.org/crawl-data/CC-MAIN-2020-50/index.html
Common Crawl November/December 2020 Crawl Archive (CC-MAIN-2020-50)
common crawl, november december, archive, ccmain
https://www.commoncrawl.org/blog/june-2017-crawl-archive-now-available
Common Crawl - Blog - June 2017 Crawl Archive Now Available
The crawl archive for June 2017 is now available! The archive contains 3.16 billion+ web pages and over 260 TiB of uncompressed content.
common crawl, blog, june, archive, available
https://www.redeemersgroup.com/crawl-space-repair/crawl-space-problems.html
Common Crawl Space Issues in Tri-State Area | Wood Rot & Uneven Floors
Redeemers Group can solve a variety of crawl space problems, including wood damage, uneven floors, and more in Memphis, Jonesboro, Little Rock, and nearby....
crawl space issues
https://www.heliumscraper.com/forum/viewtopic.php?f=18&t=37787&p=42450&sid=b82589235c718b95ff0472bb3ff37c8d
Common Crawl (URL Finder) - Helium Scraper
common crawl, url, finder, helium, scraper
https://commoncrawl.github.io/cc-examples/
Examples & Resources | Common Crawl
Browse tools, code examples, articles, videos and presentations for working with Common Crawl open web crawl data.
examples, resources, commoncrawl
https://www.digisaurier.de/common-crawl-der-wirklich-komplette-web-kosmos-in-einer-datenbank/cc_google/
Google also offers tools for analyzing the Common Crawl database (screenshot) - Der Digisaurier
Aug 15, 2023 - Google also offers tools for analyzing the Common Crawl database (screenshot)
for analysis
https://commoncrawl.org/blog/host--and-domain-level-web-graphs-may-june-and-july-2024
Common Crawl - Blog - Host- and Domain-Level Web Graphs May, June, and July 2024
We are pleased to announce a new release of host-level and domain-level Web Graphs based on the crawls of May, June, and July 2024.