Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extremely slow PointCloud2 message creation in Python #177

Closed
Flova opened this issue Mar 13, 2022 · 2 comments
Closed

Extremely slow PointCloud2 message creation in Python #177

Flova opened this issue Mar 13, 2022 · 2 comments

Comments

@Flova
Copy link
Contributor

Flova commented Mar 13, 2022

The message serialization is very slow (> hundred seconds for large point clouds), when setting the data property of a PointCloud2 message in Python by passing a byte string (e.g. by NumPy asbytes()).

This does not happen if the PointCloud2 message constructor is used (see #176 for the inconsistent behavior) or if a array.array is passed (does not work with the constructor because of #176).

Bug report

Required Info:

  • Operating System:
    • Ubuntu 20.04
  • Installation type:
    • apt
  • Version or commit hash:
    • rolling
  • Client library (if applicable):
    • rclpy

Steps to reproduce issue

  • Create a large Numpy Array called nparr
  • Create a matching PointCloud2 message called msg with parameters matching the NumPy array format
  • Set the data msg.data = nparr.tobytes()
  • Time this process and compare it to setting the data directly in the constructor

The second one is much slower:

  • 2.7484822273254395 seconds # byte string set via the constructor
  • 412.08923530578613 seconds # byte string set via the property

Expected behavior

Both should be equally fast

Actual behavior

The second one is much slower, because an additional assertion is performed.
This assertion is skipped in the constructor because of the bug in #176. It therefore performs much better.

The constructor directly casts the data and sends it to the setter

self.data = array.array('B', kwargs.get('data', []))

The setter itself checks if the cast is already done and skips the casing, including the following assertion

assert \
                ((isinstance(value, Sequence) or
                  isinstance(value, Set) or
                  isinstance(value, UserList)) and
                 not isinstance(value, str) and
                 not isinstance(value, UserString) and
                 all(isinstance(v, int) for v in value) and
                 all(val >= 0 and val < 256 for val in value)), \
                "The 'data' field must be a set or sequence and each value of type 'int' and each unsigned integer in [0, 255]"

The following part of the assertion is critical because it massively slows it down by iterating >2x over all values.

all(isinstance(v, int) for v in value)  and
all(val >= 0 and val < 256 for val in value))

This happens only in __debug__ is set, but that is the case for many dev machines, and this effectively crashes the code by freezing for many seconds.

Implementation considerations

Drop the check for byte strings. As it does not make any sense in the case of this datatype ether way.

I honestly don't know how to change this since the code seems to be auto generated. In this case it would be even worse as the issue is not only happening for the PointCloud2.

@Flova
Copy link
Contributor Author

Flova commented Mar 13, 2022

This issue was discovered in the following PR #175 have a look at it if you want to perform the tests yourself.

@Flova
Copy link
Contributor Author

Flova commented Mar 20, 2022

Closed in favor of ros2/rosidl_python#156

@Flova Flova closed this as completed Mar 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant