Split file each 4GB for BigQuery Quota Policy #6

Open
sakama opened this issue Apr 23, 2015 · 3 comments

@sakama
Contributor

sakama commented Apr 23, 2015

BigQuery enforces the following quota policy on load jobs (see the table below), so it would be better to split the output into files of at most 4 GB each.

| File Type | Compressed | Uncompressed |
|-----------|------------|--------------|
| CSV | 4 GB | With new-lines in strings: 4 GB; without new-lines in strings: 5 TB |
| JSON | 4 GB | 5 TB |

Problems

  • Files have to be split at a line ending (CRLF/LF/CR), not purely by file size, so that no record is cut in half.
  • Splitting the data before it is written is better than splitting the output file afterwards, because Embulk runs multiple tasks in parallel on multiple CPU cores.
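To illustrate the first point, here is a minimal sketch of size-limited splitting that only breaks at line endings. It is not code from this plugin: `split_at_line_endings`, `dest_pattern`, and `BIGQUERY_MAX_BYTES` are hypothetical names, and it splits an already written file only because that is the simplest way to show the boundary rule. It counts raw (uncompressed) bytes, and it would cut records in two if a CSV contains new-lines inside quoted strings.

```python
from typing import List

# 4 GB expressed in bytes (4294967296), the per-file limit discussed in this issue.
BIGQUERY_MAX_BYTES = 4 * 1024 ** 3


def split_at_line_endings(src_path: str, dest_pattern: str,
                          max_bytes: int = BIGQUERY_MAX_BYTES) -> List[str]:
    """Copy src_path into numbered parts, each at most max_bytes long.

    A new part is only started between lines, so no line is ever cut.
    Binary iteration splits on b"\n", which covers LF and CRLF files;
    CR-only files would need a custom line reader.
    """
    paths: List[str] = []
    out = None
    size = 0
    try:
        with open(src_path, "rb") as src:
            for line in src:
                if len(line) > max_bytes:
                    raise ValueError("a single line is larger than max_bytes")
                # Start a new part when this line would push us over the limit.
                if out is None or size + len(line) > max_bytes:
                    if out is not None:
                        out.close()
                    path = dest_pattern.format(len(paths))
                    out = open(path, "wb")
                    paths.append(path)
                    size = 0
                out.write(line)
                size += len(line)
    finally:
        if out is not None:
            out.close()
    return paths
```

A call such as `split_at_line_endings("out.csv", "out.{:03d}.csv")` would produce `out.000.csv`, `out.001.csv`, and so on. Inside Embulk the same rule would have to be applied per task while writing, as the second point notes.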
@sakama changed the title from "Split file each 4GB for BigQuery Quota" to "Split file each 4GB for BigQuery Quota Policy" on Apr 23, 2015
@kosukekurimoto

I have encountered this problem.

Caused by: org.jruby.exceptions.RaiseException: (Error) failed during waiting a Load job, get_job(myproject, embulk_load_job_513c2da9-2e73-498d-b57a-493ab53860af), errors:[{:reason=>"invalid", :message=>"Error while reading table: XXXX, error message: Input CSV files are not splittable and at least one of the files is larger than the maximum allowed size. Size is: 7505312411. Max allowed size is: 4294967296."}]
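The two numbers in this error message give a quick lower bound on how many files the output would need to be split into. The helper below is only an illustrative sketch (`min_parts` is a hypothetical name, not part of Embulk or BigQuery). The real count can be higher, because parts must end at line boundaries.

```python
def min_parts(total_bytes: int, max_bytes: int = 4294967296) -> int:
    """Return the smallest number of parts that keeps each part <= max_bytes."""
    if total_bytes < 0 or max_bytes <= 0:
        raise ValueError("total_bytes must be >= 0 and max_bytes must be > 0")
    # Ceiling division without floating point; an empty file still needs one part.
    return max(1, -(-total_bytes // max_bytes))


# Size reported in the error above: 7505312411 bytes against a 4294967296-byte limit.
print(min_parts(7505312411))  # prints 2
```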

@hiroyuki-sato
Member

Hello, @kosukekurimoto
Have you tried uncompressed mode?
Its limit is up to 5 TB.

@kosukekurimoto

@hiroyuki-sato

> Hello, @kosukekurimoto
> Have you ever tried uncompress mode?
> It limits up to 5TB.

Thank you for the advice. I found the relevant documentation.
https://cloud.google.com/bigquery/quotas?hl=ja#load_jobs

I will try again with compression: NONE.

giwa added a commit to giwa/embulk-output-bigquery that referenced this issue May 25, 2020