Add Camera Benchmark Tool and Allow Correct Unprojection of distance_to_camera depth image #976

Merged

merged 54 commits on Sep 25, 2024

Changes from 47 commits

Commits
7391a79
add method to convert distance_to_camera data to distance_to_image_pl…
glvov-bdai Sep 10, 2024
e3d2889
first draft
glvov-bdai Sep 10, 2024
bc8b72a
small fixes
glvov-bdai Sep 10, 2024
6e5e743
small fixes to meet checklists: fix docstring, and add to contributors
glvov-bdai Sep 10, 2024
bcd2acf
Merge branch 'main' of https://github.com/glvov-bdai/IsaacLab into fe…
glvov-bdai Sep 10, 2024
0c79677
change docstring to be consistent
glvov-bdai Sep 10, 2024
247add6
formatting
glvov-bdai Sep 11, 2024
95b744e
add comment
glvov-bdai Sep 11, 2024
44875e2
add change to log
glvov-bdai Sep 11, 2024
9c66b6c
include changelog and docstrings
glvov-bdai Sep 11, 2024
86483cf
spelling
glvov-bdai Sep 11, 2024
21819e1
some more details
glvov-bdai Sep 11, 2024
134ba01
pull main into fork
glvov-bdai Sep 11, 2024
ba62753
Update docs/source/how-to/estimate_how_many_cameras_can_run.rst
glvov-bdai Sep 17, 2024
1995022
Update source/extensions/omni.isaac.lab/docs/CHANGELOG.rst
glvov-bdai Sep 17, 2024
08624d9
Update source/extensions/omni.isaac.lab/omni/isaac/lab/utils/math.py
glvov-bdai Sep 17, 2024
216ad85
Update source/standalone/tutorials/04_sensors/benchmark_cameras.py
glvov-bdai Sep 17, 2024
63d47a6
Update source/extensions/omni.isaac.lab/omni/isaac/lab/utils/math.py
glvov-bdai Sep 17, 2024
8c3ccb3
first restructuring
glvov-bdai Sep 17, 2024
9fd0aa4
remove reps from doc string
glvov-bdai Sep 17, 2024
b3d9103
shorted conversion method name
glvov-bdai Sep 17, 2024
c4b627e
tiny docstring tweak
glvov-bdai Sep 17, 2024
2ad7835
add warning about more than one camera type at once
glvov-bdai Sep 17, 2024
c0094b7
add whitespace for list to render correctly
glvov-bdai Sep 17, 2024
4f7db7f
Merge branch 'main' into feature/tiled_camera_examples
glvov-bdai Sep 17, 2024
1187c6c
Allow preserving last single dim
glvov-bdai Sep 17, 2024
6ebf5ff
Update docs/source/how-to/estimate_how_many_cameras_can_run.rst
glvov-bdai Sep 18, 2024
24a184a
Update source/extensions/omni.isaac.lab/omni/isaac/lab/utils/math.py
glvov-bdai Sep 18, 2024
f3932df
Update source/extensions/omni.isaac.lab/docs/CHANGELOG.rst
glvov-bdai Sep 18, 2024
cc9224a
Update docs/source/how-to/estimate_how_many_cameras_can_run.rst
glvov-bdai Sep 18, 2024
ebc2baf
Update source/standalone/tutorials/04_sensors/benchmark_cameras.py
glvov-bdai Sep 18, 2024
1c1b41c
Update docs/source/how-to/estimate_how_many_cameras_can_run.rst
glvov-bdai Sep 18, 2024
69b5b79
Update docs/source/how-to/estimate_how_many_cameras_can_run.rst
glvov-bdai Sep 18, 2024
773ef65
changelog clarity
glvov-bdai Sep 18, 2024
18aa427
remove list of replicator types from benchmark
glvov-bdai Sep 18, 2024
3065c80
change raycaster default
glvov-bdai Sep 18, 2024
3c8e43f
remove viz
glvov-bdai Sep 18, 2024
a3c410a
add injection into scene
glvov-bdai Sep 18, 2024
a44f15d
load scene
glvov-bdai Sep 19, 2024
c47ebb5
fix autotune; update docs ;)
garylvov Sep 20, 2024
97ab5a6
CI
glvov-bdai Sep 20, 2024
f001216
Merge branch 'main' into feature/tiled_camera_examples
glvov-bdai Sep 20, 2024
9bef60c
Merge branch 'main' into feature/tiled_camera_examples
glvov-bdai Sep 20, 2024
f51a1cc
Update benchmark_cameras.py for 1.2
garylvov Sep 21, 2024
3649952
Merge branch 'isaac-sim:main' into feature/tiled_camera_examples
glvov-bdai Sep 23, 2024
712e3bd
formatting
glvov-bdai Sep 23, 2024
e98514f
Merge branch 'isaac-sim:main' into feature/tiled_camera_examples
glvov-bdai Sep 23, 2024
17c74f3
Update source/extensions/omni.isaac.lab/omni/isaac/lab/utils/math.py
glvov-bdai Sep 24, 2024
4f68566
Update source/extensions/omni.isaac.lab/omni/isaac/lab/utils/math.py
glvov-bdai Sep 24, 2024
3e749e6
Merge branch 'main' into feature/tiled_camera_examples
glvov-bdai Sep 24, 2024
fbfb191
formatting
glvov-bdai Sep 25, 2024
5ec0967
Merge branch 'main' into feature/tiled_camera_examples
glvov-bdai Sep 25, 2024
0e62b8b
get rid of docs warning
glvov-bdai Sep 25, 2024
eb0de81
Merge branch 'feature/tiled_camera_examples' of https://github.com/gl…
glvov-bdai Sep 25, 2024
1 change: 1 addition & 0 deletions CONTRIBUTORS.md
@@ -40,6 +40,7 @@ Guidelines for modifications:
* Calvin Yu
* Chenyu Yang
* David Yang
* Gary Lvov
* HoJin Jeon
* Jean Tampon
* Jia Lin Yuan
119 changes: 119 additions & 0 deletions docs/source/how-to/estimate_how_many_cameras_can_run.rst
@@ -0,0 +1,119 @@
.. _how-to-estimate-how-cameras-can-run:


Find How Many/What Cameras You Should Train With
================================================

.. currentmodule:: omni.isaac.lab

Currently in Isaac Lab, there are several camera types: USD Cameras (standard), Tiled Cameras,
and Ray Caster Cameras. These camera types differ in functionality and performance. The ``benchmark_cameras.py``
script can be used to understand the differences between camera types, as well as to characterize their relative
performance under different parameters such as camera quantity, image dimensions, and data types.

This utility is provided so that one can easily find the camera type and parameters that are the most performant
while meeting the requirements of the user's scenario. It also helps estimate
the maximum number of cameras one can realistically run, assuming that one wants to maximize the number
of environments while minimizing step time.

This utility can inject cameras into an existing task from the gym registry,
which can be useful for benchmarking cameras in a specific scenario. In addition,
if you install ``pynvml``, you can let this utility automatically find the maximum
number of cameras that can run in your task environment up to a
specified system resource utilization threshold (without training; taking zero actions
at each timestep).

This guide accompanies the ``benchmark_cameras.py`` script in the ``IsaacLab/source/standalone/tutorials/04_sensors``
directory.

.. dropdown:: Code for benchmark_cameras.py
:icon: code

.. literalinclude:: ../../../source/standalone/tutorials/04_sensors/benchmark_cameras.py
:language: python
:linenos:


Possible Parameters
-------------------

First, run

.. code-block:: bash

./isaaclab.sh -p source/standalone/tutorials/04_sensors/benchmark_cameras.py -h

to see all possible parameters you can vary with this utility.


See the command line parameters related to ``autotune`` for more information about
automatically determining maximum camera count.


Compare Performance in Task Environments and Automatically Determine Task Max Camera Count
------------------------------------------------------------------------------------------

Currently, tiled cameras are the most performant camera type that can handle multiple dynamic objects.

For example, to see how your system could handle 100 tiled cameras in
the cartpole environment, with 2 cameras per environment (so 50 environments total),
in RGB mode only, run:

.. code-block:: bash

   ./isaaclab.sh -p source/standalone/tutorials/04_sensors/benchmark_cameras.py \
   --task Isaac-Cartpole-v0 --num_tiled_cameras 100 \
   --task_num_cameras_per_env 2 \
   --tiled_camera_data_types rgb

If you have ``pynvml`` installed (``./isaaclab.sh -p -m pip install pynvml``), you can also
find the maximum number of cameras that you could run in the specified environment up to
a certain performance threshold (specified by max CPU utilization percent, max RAM utilization percent,
max GPU compute percent, and max GPU memory percent). For example, to find the maximum number of cameras
you can run with cartpole, you could run:

.. code-block:: bash

   ./isaaclab.sh -p source/standalone/tutorials/04_sensors/benchmark_cameras.py \
   --task Isaac-Cartpole-v0 --num_tiled_cameras 100 \
   --task_num_cameras_per_env 2 \
   --tiled_camera_data_types rgb --autotune \
   --autotune_max_percentage_util 100 80 50 50
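
Under the hood, the autotuner relies on ``pynvml`` to read GPU utilization. The snippet below is a
minimal sketch of the kind of query involved (an illustrative approximation, not the script's actual
implementation; the 50% thresholds simply mirror the example above):

.. code-block:: python

   import pynvml

   pynvml.nvmlInit()
   handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU

   # Percentage of time the GPU was busy over the last sampling interval
   gpu_util = pynvml.nvmlDeviceGetUtilizationRates(handle).gpu

   # Percentage of GPU memory currently in use
   mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
   mem_util = 100.0 * mem.used / mem.total

   # Keep increasing the camera count only while both stay under budget
   within_budget = gpu_util < 50.0 and mem_util < 50.0

   pynvml.nvmlShutdown()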

Autotuning may lead to the program crashing, which means that it tried to run too many cameras at once.
However, the maximum-percentage-utilization parameters are meant to prevent this from happening.

The output of the benchmark does not include the overhead of training a network, so consider
decreasing the maximum utilization percentages to account for this overhead. The final output camera
count is for all cameras combined, so to get the total number of environments, divide the output camera count
by the number of cameras per environment (for example, 100 cameras at 2 cameras per environment
corresponds to 50 environments).


Compare Camera Type and Performance (Without a Specified Task)
--------------------------------------------------------------

This tool can also assess performance without a task environment.
For example, to view 100 random objects with 2 standard cameras, one could run:

.. code-block:: bash

./isaaclab.sh -p source/standalone/tutorials/04_sensors/benchmark_cameras.py \
--height 100 --width 100 --num_standard_cameras 2 \
--standard_camera_data_types instance_segmentation_fast normals --num_objects 100 \
--experiment_length 100

If your system cannot handle the requested cameras, the process will be killed for performance reasons.
It is recommended to monitor CPU/RAM utilization and GPU utilization while running this script to get
an idea of how many resources rendering the desired cameras requires. On Ubuntu, you can use tools like ``htop`` and ``nvtop``
to monitor resources live while running this script; on Windows, you can use the Task Manager.

If your system has a hard time handling the desired cameras, you can try the following:

- Switch to headless mode (supply ``--headless``)
- Ensure you are using the GPU pipeline, not the CPU pipeline
- If you aren't using Tiled Cameras, switch to Tiled Cameras
- Decrease camera resolution
- Decrease the number of data types requested for each camera
- Decrease the number of cameras
- Decrease the number of objects in the scene

If your system is able to handle the requested number of cameras, the timing statistics will be printed to the terminal.
After the simulation stops, it can be closed with CTRL+C.
11 changes: 11 additions & 0 deletions docs/source/how-to/index.rst
@@ -46,6 +46,17 @@ This guide explains how to save the camera output in Isaac Lab.

save_camera_output

Estimate How Many Cameras Can Run On Your Machine
-------------------------------------------------

This guide demonstrates how to estimate the number of cameras that can run on your machine under the desired parameters.

.. toctree::
:maxdepth: 1

estimate_how_many_cameras_can_run


Drawing Markers
---------------

12 changes: 12 additions & 0 deletions source/extensions/omni.isaac.lab/docs/CHANGELOG.rst
@@ -2,6 +2,18 @@ Changelog
---------


0.24.14 (2024-09-20)
~~~~~~~~~~~~~~~~~~~~

Reviewer comment (Contributor): FYI: You need to update BOTH extension.toml and CHANGELOG.rst

Added
^^^^^

* Added :meth:`convert_perspective_depth_to_orthogonal_depth`. :meth:`unproject_depth` assumes
that the input depth image is orthogonal. The new :meth:`convert_perspective_depth_to_orthogonal_depth`
can be used to convert a perspective depth image into an orthogonal depth image, so that the point cloud
can be unprojected correctly with :meth:`unproject_depth`.
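
For reference, the conversion implemented by the new method can be read off its implementation in
``math.py`` below: for a pixel :math:`(u, v)` with focal lengths :math:`f_x, f_y` and principal
point :math:`(c_x, c_y)`,

.. math::

   d_{\text{orthogonal}}(u, v) = \frac{d_{\text{perspective}}(u, v)}{\sqrt{1 + \left(\frac{u - c_x}{f_x}\right)^2 + \left(\frac{v - c_y}{f_y}\right)^2}}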


0.24.13 (2024-09-08)
~~~~~~~~~~~~~~~~~~~~

106 changes: 105 additions & 1 deletion source/extensions/omni.isaac.lab/omni/isaac/lab/utils/math.py
@@ -988,7 +988,12 @@ def transform_points(

@torch.jit.script
def unproject_depth(depth: torch.Tensor, intrinsics: torch.Tensor) -> torch.Tensor:
r"""Unproject depth image into a pointcloud.
r"""Unproject depth image into a pointcloud. This method assumes that depth
Reviewer comment (Contributor): Please follow google doc-style. The first line here is a one-line summary. Everything else moves to a new para.

is provided orthogonally relative to the image plane, as opposed to absolutely relative to the camera's
principal point (perspective depth). To unproject a perspective depth image, use
:meth:`convert_perspective_depth_to_orthogonal_depth` to convert
to an orthogonal depth image prior to calling this method, as otherwise the
created point cloud will be distorted, especially around the edges.

This function converts depth images into points given the calibration matrix of the camera.

@@ -1059,6 +1064,105 @@ def unproject_depth(depth: torch.Tensor, intrinsics: torch.Tensor) -> torch.Tensor:
return points_xyz


@torch.jit.script
def convert_perspective_depth_to_orthogonal_depth(
perspective_depth: torch.Tensor, intrinsics: torch.Tensor
) -> torch.Tensor:
r"""Provided depth image(s) where depth is provided as the distance to the principal
Reviewer comment (Contributor): r""" is only used when you have math equations. In all other cases, please resort to double ticks.

point of the camera (perspective depth), this function converts it so that depth
is provided as the distance to the camera's image plane (orthogonal depth).

This is helpful because `unproject_depth` assumes that depth is expressed in
the orthogonal depth format.

If `perspective_depth` is a batch of depth images and `intrinsics` is a single intrinsic matrix,
the same calibration matrix is applied to all depth images in the batch.

The function assumes that the width and height are both greater than 1.

Args:
perspective_depth: The depth measurement obtained with the distance_to_camera replicator.
Reviewer comment (Contributor): Since it is a math util, any mention of replicator does not make sense.

Shape is (H, W) or (H, W, 1) or (N, H, W) or (N, H, W, 1).
intrinsics: A tensor providing camera's calibration matrix. Shape is (3, 3) or (N, 3, 3).

Returns:
The depth image as if obtained by the distance_to_image_plane replicator. Shape
matches the input shape of depth.

Raises:
ValueError: When depth is not of shape (H, W) or (H, W, 1) or (N, H, W) or (N, H, W, 1).
ValueError: When intrinsics is not of shape (3, 3) or (N, 3, 3).
"""

# Clone inputs to avoid in-place modifications
perspective_depth_batch = perspective_depth.clone()
intrinsics_batch = intrinsics.clone()

# Check if inputs are batched
is_batched = perspective_depth_batch.dim() == 4 or (
perspective_depth_batch.dim() == 3 and perspective_depth_batch.shape[-1] != 1
)

# Track whether the last dimension was singleton
add_last_dim = False
if perspective_depth_batch.dim() == 4 and perspective_depth_batch.shape[-1] == 1:
add_last_dim = True
perspective_depth_batch = perspective_depth_batch.squeeze(dim=3) # (N, H, W, 1) -> (N, H, W)
if perspective_depth_batch.dim() == 3 and perspective_depth_batch.shape[-1] == 1:
add_last_dim = True
perspective_depth_batch = perspective_depth_batch.squeeze(dim=2) # (H, W, 1) -> (H, W)

if perspective_depth_batch.dim() == 2:
perspective_depth_batch = perspective_depth_batch[None] # (H, W) -> (1, H, W)

if intrinsics_batch.dim() == 2:
intrinsics_batch = intrinsics_batch[None] # (3, 3) -> (1, 3, 3)

if is_batched and intrinsics_batch.shape[0] == 1:
intrinsics_batch = intrinsics_batch.expand(perspective_depth_batch.shape[0], -1, -1) # (1, 3, 3) -> (N, 3, 3)

# Validate input shapes
if perspective_depth_batch.dim() != 3:
raise ValueError(f"Expected perspective_depth to have 2, 3, or 4 dimensions; got {perspective_depth.shape}.")
if intrinsics_batch.dim() != 3:
raise ValueError(f"Expected intrinsics to have shape (3, 3) or (N, 3, 3); got {intrinsics.shape}.")

# Image dimensions
im_height, im_width = perspective_depth_batch.shape[1:]

# Get the intrinsics parameters
fx = intrinsics_batch[:, 0, 0].view(-1, 1, 1)
fy = intrinsics_batch[:, 1, 1].view(-1, 1, 1)
cx = intrinsics_batch[:, 0, 2].view(-1, 1, 1)
cy = intrinsics_batch[:, 1, 2].view(-1, 1, 1)

# Create meshgrid of pixel coordinates
u_grid = torch.arange(im_width, device=perspective_depth.device, dtype=perspective_depth.dtype)
v_grid = torch.arange(im_height, device=perspective_depth.device, dtype=perspective_depth.dtype)
u_grid, v_grid = torch.meshgrid(u_grid, v_grid, indexing="xy")

# Expand the grids for batch processing
u_grid = u_grid.unsqueeze(0).expand(perspective_depth_batch.shape[0], -1, -1)
v_grid = v_grid.unsqueeze(0).expand(perspective_depth_batch.shape[0], -1, -1)

# Compute the squared terms for efficiency
x_term = ((u_grid - cx) / fx) ** 2
y_term = ((v_grid - cy) / fy) ** 2

# Calculate the orthogonal (normal) depth
normal_depth = perspective_depth_batch / torch.sqrt(1 + x_term + y_term)

# Restore the last dimension if it was present in the input
if add_last_dim:
normal_depth = normal_depth.unsqueeze(-1)

# Return to original shape if input was not batched
if not is_batched:
normal_depth = normal_depth.squeeze(0)

return normal_depth
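
For illustration, a minimal usage sketch combining the new helper with :meth:`unproject_depth` might
look like the following (the image resolution and intrinsics values here are hypothetical):

import torch

import omni.isaac.lab.utils.math as math_utils

# Hypothetical perspective ("distance_to_camera") depth image and pinhole intrinsics.
perspective_depth = torch.full((480, 640), 2.0)  # (H, W), constant 2 m depth
intrinsics = torch.tensor([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])

# Convert to orthogonal ("distance_to_image_plane") depth before unprojecting.
orthogonal_depth = math_utils.convert_perspective_depth_to_orthogonal_depth(perspective_depth, intrinsics)

# Unprojecting the converted depth yields an undistorted point cloud (one 3D point per pixel).
points_xyz = math_utils.unproject_depth(orthogonal_depth, intrinsics)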


@torch.jit.script
def project_points(points: torch.Tensor, intrinsics: torch.Tensor) -> torch.Tensor:
r"""Projects 3D points into 2D image plane.
18 changes: 18 additions & 0 deletions source/extensions/omni.isaac.lab/test/utils/test_math.py
@@ -376,6 +376,24 @@ def iter_old_quat_rotate_inverse(q: torch.Tensor, v: torch.Tens
iter_old_quat_rotate_inverse(q_rand, v_rand),
)

def test_depth_perspective_conversion(self):
Reviewer comment (Contributor): Please always add a docstring for a test so the description also is visible when you run the test.

# Create a sample perspective depth image (N, H, W)
perspective_depth = torch.tensor([[[10.0, 0.0, 100.0], [0.0, 3000.0, 0.0], [100.0, 0.0, 100.0]]])

# Create sample intrinsic matrix (3, 3)
intrinsics = torch.tensor([[500.0, 0.0, 5.0], [0.0, 500.0, 5.0], [0.0, 0.0, 1.0]])

# Convert perspective depth to orthogonal depth
orthogonal_depth = math_utils.convert_perspective_depth_to_orthogonal_depth(perspective_depth, intrinsics)

# Manually compute expected orthogonal depth based on the formula for comparison
expected_orthogonal_depth = torch.tensor(
[[[9.9990, 0.0000, 99.9932], [0.0000, 2999.8079, 0.0000], [99.9932, 0.0000, 99.9964]]]
)

# Assert that the output is close to the expected result
torch.testing.assert_close(orthogonal_depth, expected_orthogonal_depth)


if __name__ == "__main__":
run_tests()