
Uploading with AsyncWriter can lead to OOM errors #172

Open
Kaldie opened this issue Jun 23, 2021 · 3 comments

Comments

Kaldie commented Jun 23, 2021

Currently, when using the AsyncWriter, it is possible to hit an OOM error because the internal queue can grow without bound.

For instance, this snippet fills the queue faster than the data can be sent over HTTPS to HDFS:

import csv
import random
import string

import hdfs

client = hdfs.InsecureClient(<valid arguments>)

with client.write("filename", encoding="utf-8") as file_handle:
  writer = csv.writer(file_handle)

  # writes 25 batches of 25 pseudo-random lines of CSV junk
  for element in [["".join(random.choice(string.ascii_letters) for _ in range(100)) for _ in range(25)] for _ in range(25)]:
    writer.writerows(element)

This leads to unmanageably large memory usage.

Would it be possible to put a limit on the queue size when creating a file_handle?
If you like, I could open a PR with a possible solution.
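The limit asked for here amounts to backpressure: a bounded queue makes the producer block once the queue is full, instead of letting memory grow. A minimal sketch of that idea, using only the standard library — the `BoundedAsyncWriter` class, its `consume` callback, and `max_queue_size` parameter are hypothetical names for illustration, not the hdfs library's actual API:

```python
import queue
import threading
import time

class BoundedAsyncWriter:
    """Hypothetical writer whose internal queue is bounded."""

    def __init__(self, consume, max_queue_size=16):
        # maxsize caps memory: put() blocks once the queue is full,
        # slowing the producer down to the consumer's pace.
        self._queue = queue.Queue(maxsize=max_queue_size)
        self._consume = consume
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def _drain(self):
        while True:
            chunk = self._queue.get()
            if chunk is None:  # sentinel: stop draining
                break
            self._consume(chunk)

    def write(self, chunk):
        # Blocks when the queue is full, applying backpressure.
        self._queue.put(chunk)

    def close(self):
        self._queue.put(None)
        self._thread.join()

# Usage: a deliberately slow consumer; the producer never has more
# than ~4 chunks queued, no matter how fast it writes.
received = []
writer = BoundedAsyncWriter(
    lambda c: (time.sleep(0.001), received.append(c)),
    max_queue_size=4,
)
for i in range(100):
    writer.write(i)
writer.close()
```

Because the queue is FIFO and `close()` joins the drain thread, all 100 chunks arrive in order; memory held in the queue is bounded by `max_queue_size` chunks.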

mtth (Owner) commented Jun 28, 2021

Hi @Kaldie. A PR for this would be welcome.

Kaldie (Author) commented Jul 6, 2021

Created a PR, but I can't seem to link it here 😢

Kaldie (Author) commented Sep 1, 2021

Hi @mtth, could you have a look at the corresponding PR?
