Downloads failed as "No content-length header" #8
Comments
I'm not sure, to be honest. The setup is a bit janky. I would check the extension's console and see if it gives more insight; the Content-Length header is usually handled by the extension itself. Also, what is the URL of the direct download link that the ZIP file resolves to? I was able to whitelist EU and US in the worker, but I am a bit uncertain about AUS, so this message could come from the proxy blocking it. You can neuter the URL yourself, or just give it to me after 30 minutes, either by email or by posting it here. Right now it's whitelisting these.
Thanks for your response! I tailed the Cloudflare logs and I can see it is trying to HEAD a Google endpoint that starts with
The HEAD is supposed to come from the extension itself; it doesn't need to go through the proxy. Hold up, are you trying to transload from Drive?
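For context, the size probe the extension performs could look roughly like this sketch (the function name is illustrative, not the actual GTR code). A proxy or origin that blocks the HEAD with a 403, or strips the header, surfaces as exactly the error in this issue's title:

```javascript
// Hypothetical sketch of a HEAD-based size probe, roughly what an
// extension needs before staging a transload. Not the actual GTR code.
async function getContentLength(url) {
  const res = await fetch(url, { method: "HEAD" });
  const len = res.headers.get("content-length");
  if (len === null) {
    // This is the failure mode reported in this issue's title
    throw new Error("No content-length header");
  }
  return Number(len);
}
```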
I can see the Cloudflare logs showing a 403 response to a HEAD request that is sent to it. As for where I am transloading from: I did a Takeout archive (of Google Photos). I understand that ultimately photos in Google Photos are stored in Drive, but I didn't do it directly via Drive.
I did, and the extension successfully intercepts the download.
Hmm, I can reproduce the issue. I'm not sure what's going on at the moment, but I'll see what I can find out and fix it.
Thanks! Just a bit more information, I guess: I tried downloading another Takeout archive from another account, and got the same 403 response and the same error message. However, I can see that the HEAD request was sent to a different endpoint:
The format seems to have changed slightly. There's now a note: these are both expired, of course.
Hmm, an initial hack based on that assumption doesn't work. I'll have to test this more over the next weekend with https://github.com/nelsonjchen/put-block-from-url-esc-issue-demo-server and get down to exactly how Azure is mangling Google's URL.
It's definitely something to do with the encoding. Observe the RawPath values the demo server reports for each case below; with percent encoding, note what the server sees.
Proxy with no encoding: https://gtr-proxy.677472.xyz/p/put-block-from-url-esc-issue-demo-server-3vngqvvpoq-uc.a.run.app/Me+You.txt
Proxy with encoding: https://gtr-proxy.677472.xyz/p/put-block-from-url-esc-issue-demo-server-3vngqvvpoq-uc.a.run.app/Me%2BYou.txt
Proxy with "azure armoring": https://gtr-proxy.677472.xyz/p/put-block-from-url-esc-issue-demo-server-3vngqvvpoq-uc.a.run.app/Me%252BYou.txt
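The "azure armoring" variant above can be produced mechanically: escape each % as %25, so that an intermediary that decodes the path once still hands the origin the originally-encoded form. A minimal sketch (the helper name is mine, not from the proxy code):

```javascript
// Double-encode percent signs ("azure armoring"): after one decode pass
// by an intermediary, the original percent-encoding survives intact.
function armorPath(path) {
  return path.replace(/%/g, "%25");
}

// armorPath("Me%2BYou.txt") -> "Me%252BYou.txt"
// decodeURIComponent of that -> "Me%2BYou.txt" (the original encoding)
```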
Hmm, this is a bit frustrating. I'll have to experiment a bit with CF and see what can be done to work around this CF bug.
addEventListener("fetch", event => {
  event.respondWith(handleRequest(event.request))
})

async function handleRequest(request) {
  // A URL whose path contains a percent-encoded "+" (%2B)
  const url = "https://put-block-from-url-esc-issue-demo-server-3vngqvvpoq-uc.a.run.app/Me%2BYou.txt"
  // Round-tripping through the URL object is what clobbers the encoding
  const originalResponse = await fetch((new URL(url)).toString())
  return originalResponse
}

https://cloudflareworkers.com/#a4c100303434d48b75d700b7adf8e794:http://about:blank/

Apparently the URL object will cause this: it preserves %2F but will clobber %2B.
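A quick way to probe whether a given runtime's URL implementation clobbers a particular escape in the path (a sketch; example.com is a placeholder host). A spec-following implementation such as Node's leaves %2B in the pathname untouched, which is what makes the Workers behavior described above surprising:

```javascript
// Returns true if this runtime's URL normalization preserves the
// percent-encoding in a path segment, false if it rewrites it.
function preservesEncoding(encodedSegment) {
  const u = new URL("https://example.com/" + encodedSegment);
  return u.pathname === "/" + encodedSegment;
}
```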
Probably closes nelsonjchen/gargantuan-takeout-rocket#8
Going to reopen this issue. I queued up a Takeout for myself, but it's like "Start in 2 days". OK, whatever, I'll test it then, and if it works, this will close for sure.
You'll need to pull or update your GTR proxy.
I have my own Google Takeout in progress right now on schedule. I'm traveling, but I'll check it out when I can.
gtr-proxy is returning a 403 in background.js
Appreciate it, let me know if you can reproduce. If it helps, this is a Google Workspace account, in this case for colorado.edu
I think it has to do with this function in gtr-proxy/src/handler.ts:

export function validGoogleTakeoutUrl(url: URL): boolean {
  return (
    url.hostname.endsWith('apidata.googleusercontent.com') &&
    (url.pathname.startsWith('/download/storage/v1/b/dataliberation/o/') ||
      url.pathname.startsWith('/download/storage/v1/b/takeout'))
  )
}

Perhaps you could instead use a regex, something like:

export function validGoogleTakeoutUrl(url: URL): boolean {
  return /^.*\.(googleusercontent\.com|googleapis\.com|google\.com)$/.test(url.hostname)
}

or:

export function validGoogleTakeoutUrl(url: URL): boolean {
  return (
    /^.*\.(googleusercontent\.com|googleapis\.com|google\.com)$/.test(url.hostname) &&
    /^((\/download\/storage\/v1\/b\/.*)|\/takeout-eu\/.*)/.test(url.pathname)
  )
}

I did not run these expressions through regexr to validate them, but I hope you get the gist. A broader approach could simply check that the certificate authority organization is Google Trust Services LLC, though I could think of a few reasons why you wouldn't want to proxy any Google URL.
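For what it's worth, here is a runnable sketch along the lines of the combined suggestion, with the host check written so that lookalike suffixes such as notgoogle.com don't slip through (the domains and path prefixes are taken from this thread; the real handler.ts may differ):

```javascript
// Sketch of a combined hostname + path allowlist check (assumed prefixes).
// `(^|\.)` matches a whole domain label boundary, so a hostname like
// "notgoogle.com" is rejected even though it ends in "google.com".
function validGoogleTakeoutUrl(url) {
  const hostOk = /(^|\.)(googleusercontent\.com|googleapis\.com|google\.com)$/.test(url.hostname);
  const pathOk = /^\/(download\/storage\/v1\/b\/|takeout)/.test(url.pathname);
  return hostOk && pathOk;
}
```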
I'm handy with regex, but I'm also very aware of how their terseness can hide holes, so I reach for a larger, more obvious, self-documenting version. Yeah, the few reasons why you wouldn't want to proxy any Google URL are right, and those are just the ones one could think of. Was this a Takeout that worked in the past, or is this the first time with a Workspace account?
Also, I thought, and your profile says, you're in Colorado. "takeout-eu"?
Yeah, no clue why that is the case. Basically CU Boulder finally cracked down on the once-'unlimited' Google Drive storage, so here I am trying to download 2 TB over the past 3 weeks lol
Yes, it worked in the past. I exceeded the retry attempts when using the extension. If you used something like Preact to inject into the DOM on any page when a download is intercepted (and/or fails), it would be fairly informative; I forgot I installed this extension earlier and was very confused when all of my unrelated downloads weren't working. Here you can see Takeout downloads working, except when Chrome uses 40% of my Ryzen 5 5600X to write files and crashes when I drag a tab out of a window. Sorry for the trouble. I would have just patched the Cloudflare worker myself, but I am kind of averse to using Cloudflare's dev tools as I used to work for a competitor 😅
Yeah, the archive downloads have a lifespan of 5 attempts. This ephemerality is one of the reasons I made GTR too: out of danger ASAP, please, thanks. OK, I'll see about adding those URLs as allowable to the proxy. It must be unique to Workspace accounts. I'll probably make it allow anything starting with "takeout" on that domain. Can you check whether those URLs have all that is needed to download the archives with a small Takeout? Like, could you copy a URL and download an archive with it in an incognito window shortly after it is generated?
Yeah, I could do more, but I explicitly elected to drop that requirement since I didn't feel like fighting a heavily post-processed page and RE-ing it; intercepting the downloads with the downloads API in Chromium felt much more stable. @AskAlice I've updated the gtr-proxy I host to include those domains and path prefixes. Please give it a try and let me know if that works. I can't see the full URLs from your screenshot, but if they're signed and don't need cookies, then we should be good. Fingers crossed! 🤞
After a few tries, and changing nothing, I got a download to start! But other files are giving me 502s:

takeout-20230609T011058Z-026.tgz - failed - Failed to stage block: 502

<html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>cloudflare</center>
</body>
</html>
<!-- a padding to disable MSIE and Chrome friendly error page -->

And on others I see:

takeout-20230609T011058Z-016.tgz - failed - Failed to stage block: 400

<?xml version="1.0" encoding="utf-8"?>
<Error>
<Code>CannotVerifyCopySource</Code>
<Message>Bad Request RequestId:81365698-301e-0032-4752-9d31c6000000 Time:2023-06-12T17:22:31.2050574Z</Message>
</Error>

also (might be from retrying, not sure) and also

I hope I can just retry those and see if it works. I've got a good portion of these downloaded, though, which should supplement the ones that I downloaded before Chrome died under the sheer load of writing files to my hard drive. So thank you for this awesome tool!
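Retrying is reasonable here, since 502s from an intermediary are often transient. A hedged sketch of a bounded retry with exponential backoff (withRetries is an illustrative name, not a GTR API):

```javascript
// Retry an async operation up to `attempts` times with exponential backoff.
// Suitable for transient failures like the 502s above; a persistent 400
// (e.g. CannotVerifyCopySource) will simply exhaust the attempts.
async function withRetries(fn, attempts = 5, baseDelayMs = 1000) {
  let lastErr;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      // Back off: baseDelayMs, 2x, 4x, ... between attempts
      await new Promise(r => setTimeout(r, baseDelayMs * 2 ** i));
    }
  }
  throw lastErr;
}
```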
Ok, probably as good as I can get it.
Hi Nelson,
I wanted to thank you for your amazing work on the [project name] extension. I'm also interested in using R2 and/or B2 as potential destinations.
I followed your tutorial using a self-hosted Cloudflare proxy and tried to download a 50G archive from Google Takeout. However, I received an error message from the extension that said:
takeout.zip - failed - No content-length header.
Based on my initial assessment, I believe this issue might be related to the Cloudflare worker. Could you please provide me with some insights on what could be causing this issue and what steps I can take to resolve it?
Thank you again for all your hard work!
Cheers!
Adam