Master index
URL: https://s3.eidf.ac.uk/eidf125-cc-main-2019-35-augmented-index/cluster.idx
Dataset description:
This dataset consists of a complete set of augmented index files for CC-MAIN-2019-35 [1]. This version of the index contains one additional field, lastmod, in about 18% of the entries,...
Source: Common Crawl URL index for August 2019 with Last-Modified timestamps
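
The resource is listed as TSV, so a line of cluster.idx can be split on tabs. The following is a minimal sketch only. It assumes the file follows the usual Common Crawl master-index layout (a key made of a SURT URL and a timestamp, then the shard file name, byte offset, byte length, and cluster number), which this page does not confirm. The names `ClusterEntry` and `parse_cluster_line` are hypothetical, and the sample line is invented for illustration.

```python
from typing import NamedTuple


class ClusterEntry(NamedTuple):
    """One row of the master index (column layout is an assumption)."""
    surt_key: str    # SURT-form URL key, e.g. "com,example)/"
    timestamp: str   # capture timestamp that follows the key
    shard: str       # name of the index shard file
    offset: int      # byte offset of the block within the shard
    length: int      # byte length of the block
    cluster_id: int  # sequential cluster number


def parse_cluster_line(line: str) -> ClusterEntry:
    """Split one tab-separated line into its assumed five columns."""
    key, shard, offset, length, cluster_id = line.rstrip("\n").split("\t")
    # The first column is assumed to hold "<surt> <timestamp>".
    surt_key, timestamp = key.rsplit(" ", 1)
    return ClusterEntry(surt_key, timestamp, shard,
                        int(offset), int(length), int(cluster_id))


# Invented sample line, not taken from the real file.
sample = "com,example)/ 20190817000000\tcdx-00000.gz\t0\t188224\t1"
entry = parse_cluster_line(sample)
print(entry.shard, entry.offset, entry.length)
```

If the real file uses a different column count, the tuple unpacking raises `ValueError`, which makes a layout mismatch visible immediately rather than silently producing wrong fields.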
Additional Information
| Field | Value |
|---|---|
| Data last updated | April 30, 2024 |
| Metadata last updated | October 28, 2024 |
| Created | April 30, 2024 |
| Has views | True |
| Id | 7e485f0c-d480-43e9-8cb7-9540a3d3dbc9 |
| Package id | 51508b86-8bbe-43aa-aa07-4980baea1af6 |
| Position | 0 |
| Resource:access url | https://s3.eidf.ac.uk/eidf125-cc-main-2019-35-augmented-index/cluster.idx |
| Resource:description | Master index for CC-MAIN-2019-35 augmented with lastmod timestamp |
| Resource:documentation | https://doi.org/10.48550/arXiv.2404.09770 |
| Resource:download url | https://s3.eidf.ac.uk/eidf125-cc-main-2019-35-augmented-index/cluster.idx |
| Resource:format | TSV |
| Resource:identifier | cluster.idx |
| Resource:licence | CC-BY 2024 Henry S. Thompson |
| State | active |