Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ByteString.flushLastBuffer() only resets buffer to zero length if it was full #19747

Open
archiecobbs opened this issue Dec 19, 2024 · 0 comments
Assignees
Labels

Comments

@archiecobbs
Copy link

What version of protobuf and what language are you using?

Version: v3.29.2
Language: Java

What operating system (Linux, Windows, ...) and version?

MacOS 15.1.1

What runtime / compiler are you using (e.g., python version or gcc version)

OpenJDK 23

What did you do?

Code inspection of ByteString.java.

What did you expect to see

Bug-free code.

What did you see instead?

A bug.

Consider the method flushLastBuffer() that is in the class ByteString.Output:

    /**
     * Internal function used by {@link #toByteString()}. The current buffer may or may not be full,
     * but it needs to be flushed.
     */
    private void flushLastBuffer() {
      if (bufferPos < buffer.length) {
        if (bufferPos > 0) {
          byte[] bufferCopy = Arrays.copyOf(buffer, bufferPos);
          flushedBuffers.add(new LiteralByteString(bufferCopy));
        }
        // We reuse this buffer for further writes.
      } else {
        // Buffer is completely full.  Huzzah.
        flushedBuffers.add(new LiteralByteString(buffer));
        // 99% of the time, we're not going to use this OutputStream again.
        // We set buffer to an empty byte stream so that we're handling this
        // case without wasting space.  In the rare case that more writes
        // *do* occur, this empty buffer will be flushed and an appropriately
        // sized new buffer will be created.
        buffer = EMPTY_BYTE_ARRAY;
      }
      flushedBuffersTotalBytes += bufferPos;
      bufferPos = 0;
    }

Note the long comment near the bottom. Basically what this is saying is:

  • This method flushLastBuffer() is only called from toByteString()
  • toByteString() is almost always the last method invoked on a ByteString.Output object
  • Therefore, after flushLastBuffer() returns, the buffer field is most likely just going to be wasting space
  • So reset buffer to a zero length array to free up some space
  • But in the rare instance where the output is reused, things will still function properly

OK, fine, that makes sense... although I have a question: If the Output is no longer going to be used, wouldn't the entire object be garbage collectable just as soon as the old buffer array? So how does this optimization actually help anything?

So it's possible this optimization does nothing useful, in which case it should be removed.

But let's suppose that the logic is sound. Now notice how the code is only actually applying that logic when the buffer just happens to be exactly full, which presumably is a small percentage of overall cases where you get "lucky".

In most cases you aren't going to get lucky, so this optimization is actually mostly doing nothing. Therefore, it should either be corrected or removed.

To correct this, we should reset the buffer field to zero length in all cases, like this:

diff --git a/java/core/src/main/java/com/google/protobuf/ByteString.java b/java/core/src/main/java/com/google/protobuf/ByteString.java
index 1903a4deb..c18e7cd5c 100644
--- a/java/core/src/main/java/com/google/protobuf/ByteString.java
+++ b/java/core/src/main/java/com/google/protobuf/ByteString.java
@@ -1201,15 +1201,15 @@ public abstract class ByteString implements Iterable<Byte>, Serializable {
       } else {
         // Buffer is completely full.  Huzzah.
         flushedBuffers.add(new LiteralByteString(buffer));
-        // 99% of the time, we're not going to use this OutputStream again.
-        // We set buffer to an empty byte stream so that we're handling this
-        // case without wasting space.  In the rare case that more writes
-        // *do* occur, this empty buffer will be flushed and an appropriately
-        // sized new buffer will be created.
-        buffer = EMPTY_BYTE_ARRAY;
       }
       flushedBuffersTotalBytes += bufferPos;
       bufferPos = 0;
+      // 99% of the time, we're not going to use this OutputStream again.
+      // We set buffer to an empty byte stream so that we're handling this
+      // case without wasting space.  In the rare case that more writes
+      // *do* occur, this empty buffer will be flushed and an appropriately
+      // sized new buffer will be created.
+      buffer = EMPTY_BYTE_ARRAY;
     }
   }
 

Anything else we should know about your project / environment

@archiecobbs archiecobbs added the untriaged auto added to all issues by default when created. label Dec 19, 2024
@zhangskz zhangskz added java and removed untriaged auto added to all issues by default when created. labels Dec 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants