
Uploading with AsyncWriter can lead to OOM errors #172

Open
Kaldie opened this issue Jun 23, 2021 · 3 comments

Comments

Kaldie commented Jun 23, 2021

Currently, when using the AsyncWriter, it is possible to hit an OOM error because the internal queue can grow without bound.

For instance, this snippet fills the queue faster than the data can be sent over HTTPS to HDFS:

import csv
import random
import string

import hdfs

client = hdfs.InsecureClient(<valid arguments>)

with client.write("filename", encoding="utf-8") as file_handle:
  writer = csv.writer(file_handle)

  # writes 25 batches of 25 pseudo-random lines of CSV junk
  for element in [["".join(random.choice(string.ascii_letters) for _ in range(100)) for _ in range(25)] for _ in range(25)]:
    writer.writerows(element)

This leads to unmanageably large memory usage.

Would it be possible to put a limit on the queue size when creating a file_handle?
If you like, I could open a PR with a possible solution.
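The limit asked for here amounts to backpressure: a bounded queue makes the producer block once the queue is full, instead of letting memory grow. A minimal sketch of that idea, using only the standard library — the `BoundedAsyncWriter` class, its `consume` callback, and `max_queue_size` parameter are hypothetical names for illustration, not the hdfs library's actual API:

```python
import queue
import threading
import time

class BoundedAsyncWriter:
    """Hypothetical writer whose internal queue is bounded."""

    def __init__(self, consume, max_queue_size=16):
        # maxsize caps memory: put() blocks once the queue is full,
        # slowing the producer down to the consumer's pace.
        self._queue = queue.Queue(maxsize=max_queue_size)
        self._consume = consume
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def _drain(self):
        while True:
            chunk = self._queue.get()
            if chunk is None:  # sentinel: stop draining
                break
            self._consume(chunk)

    def write(self, chunk):
        # Blocks when the queue is full, applying backpressure.
        self._queue.put(chunk)

    def close(self):
        self._queue.put(None)
        self._thread.join()

# Usage: a deliberately slow consumer; the producer never has more
# than ~4 chunks queued, no matter how fast it writes.
received = []
writer = BoundedAsyncWriter(
    lambda c: (time.sleep(0.001), received.append(c)),
    max_queue_size=4,
)
for i in range(100):
    writer.write(i)
writer.close()
```

Because the queue is FIFO and `close()` joins the drain thread, all 100 chunks arrive in order; memory held in the queue is bounded by `max_queue_size` chunks.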

mtth (Owner) commented Jun 28, 2021

Hi @Kaldie. A PR for this would be welcome.

Kaldie (Author) commented Jul 6, 2021

Created a PR, but I can't seem to link it here 😢

Kaldie (Author) commented Sep 1, 2021

Hi @mtth, could you have a look at the corresponding PR?
