This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

403 Forbidden when downloading common crawl data #34

Open
velocityCavalry opened this issue Jul 11, 2022 · 3 comments


@velocityCavalry

Bug description
Hi, I was trying to download the supporting documents by running wget https://commoncrawl.s3.amazonaws.com/crawl-data/CC-MAIN-2018-34/wet.paths.gz, but it keeps failing with:

Resolving commoncrawl.s3.amazonaws.com (commoncrawl.s3.amazonaws.com)... 52.217.87.76
Connecting to commoncrawl.s3.amazonaws.com (commoncrawl.s3.amazonaws.com)|52.217.87.76|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2022-07-11 15:01:56 ERROR 403: Forbidden.

I've tried on different machines and none of them work.

Expected behavior
Successful downloads.

Thank you in advance!

@SunYuanKang

Hi, I have the same problem. Could you please tell me how to deal with it? Thanks a lot.

@llllooong

Same problem.

@yidong72

yidong72 commented Feb 6, 2023
