Common Crawl URL index for August 2019 with Last-Modified timestamps
Additional Info
Field | Value |
---|---|
Contact point | ht@inf.ed.ac.uk |
Dataset privacy | Public |
Dataset access requirements | |
Landing Page | https://doi.org/10.48550/arXiv.2404.09770 |
Creator | |
Tags | |
Publisher | |
Geographical coverage | |
Start of time period covered by this dataset | |
End of time period covered by this dataset | |
Theme / Category | |
Access rights | |
Conforms To | |
Documentation | |
Publishing frequency | |
Language | |
Other identifiers | |
Provenance | Combines two components of the Common Crawl dataset CC-MAIN-2019-35 (see the 'related dataset' entry below): (a) the columnar index and (b) the Last-Modified header values of those response records in the WARC files that carry one |
Qualified Attribution | |
Qualified Relation | |
Related resources | |
Release or publication Date | |
Sample distribution of the dataset | |
A related dataset from which this dataset is derived | https://commoncrawl.org/blog/august-2019-crawl-archive-now-available |
Minimum spatial separation resolvable in the dataset (measured in metres) | |
Minimum time period | |
Dataset type | |
The most recent date on which the dataset was changed or modified | |
Version | |
A description of the differences between this version and a previous version of this dataset | |
Activity that generated the dataset | |
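The Provenance entry describes pairing the columnar index with Last-Modified values taken from WARC response records. The Python sketch below illustrates that kind of join under stated assumptions; it is not the creators' pipeline. It assumes each record has already been extracted and decompressed (Common Crawl WARC files are gzip-compressed per record). It also assumes that the columnar index columns `warc_filename` and `warc_record_offset` serve as the join key, which is an illustrative choice. The function names `parse_last_modified` and `add_last_modified` are hypothetical.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Optional


def _header_fields(block: bytes) -> dict[str, str]:
    """Parse a CRLF-separated header block, skipping its first line
    (the WARC version line or the HTTP status line). Names are lower-cased
    because WARC and HTTP header names are case-insensitive."""
    fields: dict[str, str] = {}
    for line in block.decode("latin-1").split("\r\n")[1:]:
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip().lower()] = value.strip()
    return fields


def parse_last_modified(record: bytes) -> tuple[str, Optional[datetime]]:
    """Return (WARC-Target-URI, Last-Modified) for one uncompressed WARC
    response record; the timestamp is None if the header is missing or
    cannot be parsed."""
    # WARC headers end at the first blank line; the rest is the payload,
    # which for a response record is the raw HTTP response.
    warc_head, _, payload = record.partition(b"\r\n\r\n")
    http_head, _, _body = payload.partition(b"\r\n\r\n")

    warc = _header_fields(warc_head)
    if warc.get("warc-type") != "response":
        raise ValueError("not a WARC response record")

    raw = _header_fields(http_head).get("last-modified")
    stamp: Optional[datetime] = None
    if raw:
        try:
            stamp = parsedate_to_datetime(raw)
        except (TypeError, ValueError):
            stamp = None  # malformed date string
    return warc.get("warc-target-uri", ""), stamp


def add_last_modified(index_rows, lookup):
    """Copy each columnar-index row, adding a 'last_modified' column looked up
    by the assumed key (warc_filename, warc_record_offset); None if absent."""
    return [
        {**row, "last_modified": lookup.get((row["warc_filename"], row["warc_record_offset"]))}
        for row in index_rows
    ]
```

Rows whose response record has no parseable Last-Modified header keep `None`, which matches the Provenance note that only some response records carry that header.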