S3 Targets e.g. Cloudflare R2 for Staging, S3 for AWS Deep Archive, Backblaze for lukewarm, Wasabi for lukewarm, and etc. #7

Open
nelsonjchen opened this issue Feb 18, 2023 · 12 comments

Comments

@nelsonjchen
Owner

nelsonjchen commented Feb 18, 2023

General Issues to tackle:

Targets:

  • Hot
    • R2
      • For staging ahead of a local backup. No archive tier is available, unfortunately. Blocked on lifecycle rules; I will personally use it for local backup staging once they are available. $15/mo/TB
      • Genuinely free download/upload (no egress fees)
  • Lukewarm
  • Cold
    • S3
      • For Deep Archive Tier (Equivalent Pricing to Azure)
      • Probably more comfy for AWS-natives
      • $1/mo/TB
@nelsonjchen changed the title from "Cloudflare R2 Staging" to "S3 Targets e.g. Cloudflare R2 for Staging, Wasabi" on Feb 21, 2023
@mderazon

mderazon commented Mar 5, 2023

What is the reason for waiting for lifecycle rules? These destinations are quite cheap as they are, no?
Also, uploading to R2 seems like a small step since a CF proxy is used anyway, no?

@nelsonjchen
Owner Author

The biggest concern for me is that hosting 1 TB of data on R2 costs $15 a month. That blows my budget by quite a lot. I want to make sure Cloudflare offers some safeguard, which a guide can walk people through setting up, to prevent that bill if someone forgets to delete their staging area.
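
To put rough numbers on that concern, here is a minimal sketch of the arithmetic. It uses only the per-TB prices quoted in this issue ($15/mo/TB for R2, $1/mo/TB for S3 Deep Archive). The function name and the "forgotten for 6 months" scenario are hypothetical, and current pricing should be checked before relying on any of it.

```typescript
// Prices as quoted in this issue (USD per TB per month), not checked against current price lists.
const R2_USD_PER_TB_MONTH = 15;
const S3_DEEP_ARCHIVE_USD_PER_TB_MONTH = 1;

// Hypothetical helper: flat storage cost, ignoring request fees and minimum-duration charges.
function storageCostUsd(terabytes: number, months: number, usdPerTbMonth: number): number {
  return terabytes * months * usdPerTbMonth;
}

// Hypothetical scenario: 1 TB of staging data left in place for 6 months.
const forgottenOnR2 = storageCostUsd(1, 6, R2_USD_PER_TB_MONTH);
const sameInDeepArchive = storageCostUsd(1, 6, S3_DEEP_ARCHIVE_USD_PER_TB_MONTH);

console.log({ forgottenOnR2, sameInDeepArchive });
```

A lifecycle rule that expires staged objects after a few days would bound the first number, which is why the safeguard matters more for the hot tier than for the cold one.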

@nelsonjchen
Owner Author

I fleshed out the issue description a lot, @mderazon.

@nelsonjchen changed the title from "S3 Targets e.g. Cloudflare R2 for Staging, Wasabi" to "S3 Targets e.g. Cloudflare R2 for Staging, S3 for AWS Deep Archive, Backblaze for lukewarm, Wasabi for lukewarm, and etc." on Mar 5, 2023
@mderazon

mderazon commented Mar 5, 2023

fwiw, this is what I was trying to do with Workers:
https://community.cloudflare.com/t/backup-directly-from-google-drive-to-r2/440132/5

@nelsonjchen
Owner Author

Quoting @mderazon: "fwiw, this is what I was trying to do with Workers: https://community.cloudflare.com/t/backup-directly-from-google-drive-to-r2/440132/5"

Hmm, that's such a weird usage of some APIs. You pass in a body which is just a ReadableStream, but then there's also queue size and part size. Doesn't that require some sort of seekable buffer or something? Maybe it blew up because those aren't compatible things you can do with a simple byte stream or a representation of a byte stream.

@nelsonjchen
Owner Author

nelsonjchen commented Mar 6, 2023

You're also doing a lot more orchestration in the worker than I did in my approach. In the prototype GTR Azure transload from Cloudflare Workers, where the worker itself does the transloading, most of the orchestration happens in the extension, where it isn't bound by the silly 10ms CPU limit. The worker, or each of the many worker instances, is pretty much just given two fetches and sticks the response body from one into the other. No fat libraries handling things like part sizes and queues are used; the worker stays very dumb.
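
A minimal sketch of what "two fetches, body from one into the other" could look like. This is not the actual GTR worker code: `pipeRange`, its parameters, and the injectable `fetchImpl` are hypothetical. The destination URL is assumed to be already authorized (for example pre-signed) and to accept a streamed PUT body, which this sketch does not verify.

```typescript
// Hypothetical helper: copy one byte range from sourceUrl to destUrl without buffering it.
// The caller (e.g. the extension) decides the range; the worker only pipes bytes.
async function pipeRange(
  sourceUrl: string,
  destUrl: string,
  start: number,
  endInclusive: number,
  fetchImpl: typeof fetch = fetch, // injectable so the sketch can be exercised outside a Worker
): Promise<number> {
  // First fetch: ask the source for just this byte range.
  const source = await fetchImpl(sourceUrl, {
    headers: { Range: `bytes=${start}-${endInclusive}` },
  });
  if (!source.ok || source.body === null) {
    throw new Error(`source fetch failed with status ${source.status}`);
  }

  // Second fetch: hand the response stream straight to the destination.
  const dest = await fetchImpl(destUrl, { method: "PUT", body: source.body });
  if (!dest.ok) {
    throw new Error(`destination PUT failed with status ${dest.status}`);
  }
  return dest.status;
}
```

The point of the shape is that nothing here parses, chunks, or queues data, so very little CPU time is spent in the worker. Runtimes other than Workers may need extra request options for streamed bodies (Node's fetch requires `duplex: "half"`, for instance).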

@nelsonjchen
Owner Author

On that note about fat libraries, if I do try to tackle this, I'll probably be using https://github.com/mhart/aws4fetch and maybe just the raw stuff in there.

@mderazon

mderazon commented Mar 6, 2023

I don't think the size of the library makes any difference, since a single line in a library that burns CPU would be enough to hit the limit.
In the case of the library I used, the culprit might be somewhere around these lines of code:
https://github.com/aws/aws-sdk-js-v3/blob/ce7cc58b15fd7ba0bd2b10c7a471b4c8ce95b7d9/lib/lib-storage/src/Upload.ts#L309-L355

There's also this:
https://community.cloudflare.com/t/streaming-large-remote-files/14501/3

I will try the lib you mentioned in my code to see if it makes a difference

@nelsonjchen
Owner Author

Just noting this down here: https://developers.cloudflare.com/workers/platform/limits/#simultaneous-open-connections

There is a limit of 6 simultaneous connections. Theoretically, one worker call can reach 3/10 of the speed of the current Azure transloading.
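
One possible reading of where the "3" comes from, as a sketch only. The limit of 6 is from the linked docs. The assumption that each in-flight transfer holds two connections (one reading the source, one writing the destination) is mine and is not confirmed in this thread; the function name is hypothetical.

```typescript
// Hypothetical helper: how many transfers fit under a per-invocation connection limit.
function maxParallelTransfers(connectionLimit: number, connectionsPerTransfer: number): number {
  return Math.floor(connectionLimit / connectionsPerTransfer);
}

const WORKER_CONNECTION_LIMIT = 6; // from the Workers limits page linked above
const CONNECTIONS_PER_TRANSFER = 2; // assumption: one source GET plus one destination PUT

const parallel = maxParallelTransfers(WORKER_CONNECTION_LIMIT, CONNECTIONS_PER_TRANSFER);
console.log(parallel);
```

Under that assumption a single worker call pipes three ranges at a time. The denominator of the 3/10 figure is not derived here.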

@nelsonjchen
Owner Author

https://developers.cloudflare.com/r2/buckets/object-lifecycles/

Lifecycle rules have been added.

@mderazon

I'm keeping an eye on this project and wanted to ask: now that lifecycle rules have been added, is the last missing piece for sending to any S3-compatible storage a remote fetch feature like the one Azure Storage has?

@nelsonjchen
Owner Author

The last missing piece is acceptable performance. The 100MB POST limit inside Workers was extremely annoying. Is it still there? It caps throughput at about 3/10 of Azure's and makes the request count spike until it smashes into the free account's limit.
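
To illustrate why a 100MB cap makes the request count spike, here is a rough sketch of the chunk planning the orchestrating side would have to do. `planRanges`, the decimal-megabyte unit, and the 50 GB archive size are all assumptions for illustration, not values taken from the project.

```typescript
interface ByteRange {
  start: number;
  endInclusive: number;
}

// Hypothetical helper: split a file into consecutive ranges no larger than maxChunkBytes.
function planRanges(totalBytes: number, maxChunkBytes: number): ByteRange[] {
  if (totalBytes < 0 || maxChunkBytes <= 0) {
    throw new RangeError("totalBytes must be >= 0 and maxChunkBytes must be > 0");
  }
  const ranges: ByteRange[] = [];
  for (let start = 0; start < totalBytes; start += maxChunkBytes) {
    // The last range is shortened so it ends on the final byte of the file.
    ranges.push({ start, endInclusive: Math.min(start + maxChunkBytes, totalBytes) - 1 });
  }
  return ranges;
}

const MB = 1000000; // assumption: decimal megabytes
const CHUNK_LIMIT_BYTES = 100 * MB; // the 100MB limit mentioned above
const archiveBytes = 50000 * MB; // hypothetical 50 GB archive

const ranges = planRanges(archiveBytes, CHUNK_LIMIT_BYTES);
console.log(ranges.length); // one worker request per range
```

The request count grows linearly with archive size: at this chunk size every additional terabyte costs another 10,000 worker requests, before counting retries.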

I haven't touched this issue in some time. I might resurrect it now that I have a new 8 TB drive to back up to.
