Master index
URL: https://s3.eidf.ac.uk/eidf125-cc-main-2019-35-augmented-index/cluster.idx
Dataset description:
This dataset consists of a complete set of augmented index files for CC-MAIN-2019-35 [1]. This version of the index contains one additional field, lastmod, in about 18% of the entries,...
Source: Common Crawl URL index for August 2019 with Last-Modified timestamps
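As a rough illustration of how a master index of this kind might be read, the sketch below parses a single line of `cluster.idx`. It assumes the file follows the usual Common Crawl layout: a SURT URL key and capture timestamp separated by a space, then tab-separated shard file name, byte offset, byte length and block number. This page only states that the format is TSV, so treat that layout as an assumption. The names `ClusterEntry`, `parse_cluster_line` and `byte_range` are hypothetical helpers, and the sample line is made up rather than taken from the real file.

```python
from typing import NamedTuple


class ClusterEntry(NamedTuple):
    surt_key: str   # SURT-form URL key of the first record in the block
    timestamp: str  # capture timestamp of that record (14 digits)
    cdx_file: str   # name of the gzipped index shard holding the block
    offset: int     # byte offset of the compressed block within the shard
    length: int     # byte length of the compressed block
    block_no: int   # running block number


def parse_cluster_line(line: str) -> ClusterEntry:
    """Parse one line, ASSUMING the usual Common Crawl cluster.idx layout."""
    key_and_ts, cdx_file, offset, length, block_no = line.rstrip("\n").split("\t")
    # The first tab-separated column holds "<surt key> <timestamp>".
    surt_key, timestamp = key_and_ts.rsplit(" ", 1)
    return ClusterEntry(
        surt_key, timestamp, cdx_file, int(offset), int(length), int(block_no)
    )


def byte_range(entry: ClusterEntry) -> str:
    """Build an HTTP Range header value covering exactly one compressed block."""
    return f"bytes={entry.offset}-{entry.offset + entry.length - 1}"


# Made-up sample line for illustration only, not copied from the dataset.
sample = "com,example)/index.html 20190820000000\tcdx-00000.gz\t0\t188224\t1"
entry = parse_cluster_line(sample)
print(entry.cdx_file, byte_range(entry))
```

Under the same assumption, the `Range` value could be sent with a request for the named shard to fetch just that block, rather than downloading the whole shard.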
Additional Information
| Field | Value | 
|---|---|
| Data last updated | April 30, 2024 | 
| Metadata last updated | October 28, 2024 | 
| Created | April 30, 2024 | 
| Has views | True | 
| Id | 7e485f0c-d480-43e9-8cb7-9540a3d3dbc9 | 
| Package id | 51508b86-8bbe-43aa-aa07-4980baea1af6 | 
| Position | 0 | 
| Resource:access url | https://s3.eidf.ac.uk/eidf125-cc-main-2019-35-augmented-index/cluster.idx | 
| Resource:description | Master index for CC-MAIN-2019-35 augmented with lastmod timestamp | 
| Resource:documentation | https://doi.org/10.48550/arXiv.2404.09770 | 
| Resource:download url | https://s3.eidf.ac.uk/eidf125-cc-main-2019-35-augmented-index/cluster.idx | 
| Resource:format | TSV | 
| Resource:identifier | cluster.idx | 
| Resource:licence | CC-BY 2024 Henry S. Thompson | 
| State | active | 