Index shards directory
URL: https://s3.eidf.ac.uk/eidf125-cc-main-2019-35-augmented-index?prefix=idx
Dataset description:
This dataset consists of a complete set of augmented index files for CC-MAIN-2019-35 [1]. This version of the index contains one additional field, lastmod, in about 18% of the entries,...
Source: Common Crawl URL index for August 2019 with Last-Modified timestamps
Additional Information
| Field | Value |
|---|---|
| Data last updated | April 30, 2024 |
| Metadata last updated | October 28, 2024 |
| Created | April 30, 2024 |
| Has views | True |
| Id | d32f9d4e-1cc5-42f1-91f5-db3eb35a1577 |
| Package id | 51508b86-8bbe-43aa-aa07-4980baea1af6 |
| Position | 1 |
| Resource:access url | https://s3.eidf.ac.uk/eidf125-cc-main-2019-35-augmented-index?prefix=idx |
| Resource:description | Index shard 000 through 299 for CC-MAIN-2019-35 augmented with lastmod timestamp |
| Resource:documentation | https://doi.org/10.48550/arXiv.2404.09770 |
| Resource:download url | https://s3.eidf.ac.uk/eidf125-cc-main-2019-35-augmented-index?prefix=idx |
| Resource:format | gzipped space-separated + json |
| Resource:identifier | idx/ |
| Resource:licence | CC-BY 2024 Henry S. Thompson |
| State | active |
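
The format listed above, "gzipped space-separated + json", can be read with a few lines of Python. The following is a minimal sketch, not the author's tooling. It assumes each line follows the usual Common Crawl index layout (a URL key, a capture timestamp, then a JSON object) and that the additional `lastmod` field appears as a key inside that JSON object. The function names, the sample lines, and the sample `lastmod` value are made up for illustration.

```python
import gzip
import io
import json


def parse_index_line(line):
    """Split one index line into (urlkey, timestamp, fields).

    Assumed layout: two space-separated fields followed by a JSON object.
    The JSON itself may contain spaces, so split at most twice.
    """
    urlkey, timestamp, payload = line.rstrip("\n").split(" ", 2)
    return urlkey, timestamp, json.loads(payload)


def iter_shard(source):
    """Yield parsed records from a gzipped shard (a path or binary file object)."""
    with gzip.open(source, mode="rt", encoding="utf-8") as lines:
        for line in lines:
            if line.strip():
                yield parse_index_line(line)


def lastmod_share(records):
    """Fraction of records whose JSON fields include a 'lastmod' key (assumed name)."""
    total = with_lastmod = 0
    for _urlkey, _timestamp, fields in records:
        total += 1
        if "lastmod" in fields:
            with_lastmod += 1
    return with_lastmod / total if total else 0.0


# Made-up sample data standing in for a real shard; the values are illustrative only.
SAMPLE = (
    'com,example)/ 20190817000000 '
    '{"url": "https://example.com/", "status": "200", "lastmod": "1565000000"}\n'
    'com,example)/about 20190817000100 '
    '{"url": "https://example.com/about", "status": "200"}\n'
)

if __name__ == "__main__":
    shard = io.BytesIO(gzip.compress(SAMPLE.encode("utf-8")))
    for urlkey, timestamp, fields in iter_shard(shard):
        print(urlkey, timestamp, fields.get("lastmod"))
```

To use it on a downloaded shard, pass the local file path to `iter_shard` in place of the in-memory sample. `lastmod_share` gives a quick way to compare a shard against the roughly 18% figure quoted in the dataset description.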