Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should write with an END_TOKEN call finalize on the stream to prevent memory leaks? #117

Open
KristofferC opened this issue Feb 10, 2022 · 2 comments

Comments

@KristofferC
Copy link
Member

KristofferC commented Feb 10, 2022

The desire here is to close the TranscodingStream without closing the underlying buffer. This is documented in https://juliaio.github.io/TranscodingStreams.jl/latest/examples/#Explicitly-finish-transcoding-by-writing-TOKEN_END-1 and says that you should write a TOKEN_END token to the stream. However, an issue with that is that it only flushes the stream but it doesn't finalize it which leads to memory leaks in code written like:

using CodecZlib
using TranscodingStreams

function leak()
    buf = IOBuffer()
    data = rand(10^6)
    while true
        zWriter = ZlibCompressorStream(buf)
        write(zWriter, data)
        write(zWriter, TranscodingStreams.TOKEN_END)
        flush(zWriter)
    end
end

leak()

which will indefinitely leak. Manually calling finalize on the zWriter fixes the issue but it is not clear from the documentation that this is required. There are a few possible solutions:

  • Attach a finalizer to the stream that calls finalize. This is not ideal because you want a more eager cleanup than whenever the GC gets to it.
  • Make writing a TOKEN_END call finalize on the stream.
  • Make writing a TOKEN_END set the stream mode to :closed and thereby allowing close on the wrapper stream to not close the underlying wrapped stream :
    function Base.close(stream::TranscodingStream)
    stopped = stream.state.mode == :stop
    if stream.state.mode != :panic
    changemode!(stream, :close)
    end
    if !stopped
    close(stream.stream)
    end
    return nothing
    end

Alternatively, it is also possible that the code that shows the leak above is "faulty" but generally, normal Julia code shouldn't leak like this so at least a finalizer might be a good idea.

@nhz2
Copy link
Member

nhz2 commented Mar 16, 2024

The stream is expected to still be writable after writing TOKEN_END. For example, https://github.com/BioJulia/FASTX.jl/blob/v2.1.4/src/fastq/writer.jl#L53 uses TOKEN_END in a flush function.

With #178 you can do:

using CodecZlib
using TranscodingStreams

function no_leak()
    buf = IOBuffer()
    data = rand(10^6)
    while true
        zWriter = ZlibCompressorStream(seekstart(buf); stop_on_end=true)
        write(zWriter, data)
        close(zWriter)
    end
end

no_leak()

Adding a finalizer is still a good idea.

@nhz2
Copy link
Member

nhz2 commented Dec 18, 2024

I am adding finalizers for CodecBzip2 in JuliaIO/CodecBzip2.jl#43

@KristofferC what do you mean by "This is not ideal because you want a more eager cleanup than whenever the GC gets to it."

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants