
Unable to Replicate Depth Estimation Results on NYUv2 Depth Dataset (Including After Applying Eigen Crop) #484

Open
erikjagnandan opened this issue Nov 22, 2024 · 0 comments

erikjagnandan commented Nov 22, 2024

I am trying to run depth estimation with the small backbone size and DPT decoder on the NYUv2 Depth Dataset, using the general framework presented in the depth_estimation.ipynb notebook. I downloaded the NYUv2 dataset from Kaggle at https://www.kaggle.com/datasets/soumikrakshit/nyu-depth-v2?resource=download and ran depth estimation on the ~50k training images by adding the code below to the end of depth_estimation.ipynb.

At first, I was simply computing RMSE on the entire image (without Eigen crop or min/max thresholding) and got an RMSE of 0.433. After reading issue #227, I looked into the pre_eval and evaluate methods in the Monocular-Depth-Estimation-Toolbox repository and added min/max thresholding (with min threshold 1e-3 and max threshold 10, the default values in that repository), as well as the Eigen crop over the same pixel range [45:471, 41:601] used there. Even after making this change, the RMSE is 0.410, still far from the 0.356 reported in the paper. For clarity, I calculate the RMSE by computing the MSE for each image separately, averaging the per-image MSEs, and then taking the square root of that average, which is (to my understanding) the correct implementation of RMSE.
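To make that aggregation explicit, here is a minimal sketch of the two conventions I am aware of (the numbers are purely illustrative, and `per_image_mse` stands in for the `mse` tensor computed in my loop below). I am using the first one; if the paper or the toolbox instead averages per-image RMSEs, that alone would change the reported number:

```python
import torch

# Illustrative per-image MSEs (stand-in for the `mse` tensor computed in the loop below).
per_image_mse = torch.tensor([0.10, 0.25, 0.16])

# Convention A (what I am doing): average the per-image MSEs, then take the square root.
rmse_a = torch.sqrt(per_image_mse.mean())

# Convention B: take the square root per image first, then average the per-image RMSEs.
rmse_b = torch.sqrt(per_image_mse).mean()

print(rmse_a.item(), rmse_b.item())  # the two values generally differ
```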

I understand that I am running on training data, whereas the performance reported in the paper is for the validation data, but the disparity should not be this severe (~15% increase in RMSE), especially as validation performance should, in general, be no better than training performance.

Now that I have incorporated the Eigen crop and min/max thresholding and the results still do not match, is there any step used in the paper that I have left out here? From looking into the Monocular-Depth-Estimation-Toolbox repository, it appears that I have performed all of the steps included there. Alternatively, is there some simple way that I could import code from the Monocular-Depth-Estimation-Toolbox repository and use it to evaluate the DINOv2 depth estimator? From looking at their README, it seems that this would be quite nontrivial, as DINOv2 is not listed as a supported backbone.

Some notes about my code:

`data_list` is a list containing one element per training sample. Each element is a two-element list in which the first element is the name of the folder (e.g. `basement_0001a_out`) and the second element is the index of the sample within that folder.

I multiply by 10.0 when loading the ground truth depth map since the ground truth depth maps are provided at 1/10th scale. When I load the ground truth depth maps as is, without the multiplication, the depth predictions are on average almost exactly 10 times the ground truth depth; after multiplying by 10, the histograms of depth predictions and ground truth depths line up accurately. A short sketch of both of these assumptions is included after these notes.
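For concreteness, here is a minimal sketch of the data layout and rescaling I am assuming (the folder name, index, and path are illustrative placeholders, not values from my actual run):

```python
import numpy as np
from PIL import Image

# Illustrative layout: [folder name, zero-based index within that folder].
data_list = [["basement_0001a_out", 0], ["basement_0001a_out", 1]]
data_directory = "/path/to/nyu_data"  # placeholder path

# Loading one ground truth depth map: the PNG is read, scaled to [0, 1] by
# dividing by 255, and multiplied by 10.0 to recover depth in metres.
folder, idx = data_list[0]
gt_raw = np.array(Image.open(data_directory + "/" + folder + "/" + str(idx + 1) + ".png")).astype(float) / 255.0
gt_depth = 10.0 * gt_raw
print(gt_raw.max(), gt_depth.max())
```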

```python
import numpy as np
import torch
import torch.nn.functional as F
from PIL import Image

# data_directory, data_list, transform, and model are defined earlier in the notebook.

mse = torch.zeros(len(data_list))
print_increment = 100

for i in range(len(data_list)):
    selected_dataset, selected_image = data_list[i]

    # Ground truth PNGs are provided at 1/10th scale, so rescale to metres.
    ground_truth_depth = 10.0 * torch.tensor(
        np.array(Image.open(data_directory + "/" + selected_dataset + "/" +
                            str(selected_image + 1) + ".png")).astype(float) / 255.0,
        dtype=torch.float32)

    image = Image.open(data_directory + "/" + selected_dataset + "/" +
                       str(selected_image + 1) + ".jpg")

    rescaled_image = image.resize((image.width, image.height))  # resize to original dimensions
    transformed_image = transform(rescaled_image)
    batch = transformed_image.unsqueeze(0).cuda()  # make a batch of one image

    with torch.inference_mode():
        result = model.whole_inference(batch, img_meta=None, rescale=True).squeeze()

    eigen_crop = True
    if eigen_crop:
        # Min/max depth thresholding plus the Eigen crop used by the
        # Monocular-Depth-Estimation-Toolbox for NYUv2.
        min_threshold = 1e-3
        max_threshold = 10
        valid_mask = (ground_truth_depth > min_threshold) & (ground_truth_depth < max_threshold)
        eigen_mask = torch.zeros(ground_truth_depth.shape, dtype=torch.bool)
        eigen_mask[45:471, 41:601] = True
        eval_mask = torch.logical_and(valid_mask, eigen_mask)
        mse[i] = F.mse_loss(result.cpu()[eval_mask], ground_truth_depth[eval_mask]).item()
    else:
        mse[i] = F.mse_loss(result.cpu(), ground_truth_depth).item()

    if i % print_increment == print_increment - 1:
        # Report the running MSE over the last `print_increment` images
        # (slice ends at i + 1 so the current image is included).
        print("Images " + str(i - print_increment + 2) + "-" + str(i + 1) + ": MSE = " +
              str(mse[i - print_increment + 1:i + 1].mean().item()))

print("Avg MSE = " + str(mse.mean().item()))
print("Avg RMSE = " + str(torch.sqrt(mse.mean()).item()))
```
