
Clarification on Wavelength Units and Metadata Representation for SAR Data #338

Open · mmarks13 opened this issue Dec 5, 2024 · 4 comments


mmarks13 commented Dec 5, 2024

Description

I am seeking to fine-tune the Clay model using NASA's UAVSAR L-Band radar. While reviewing the metadata for Sentinel-1, I noticed the following inconsistencies:

1. Wavelength Units

  • The Sentinel-1 metadata lists VV and VH polarization wavelengths as 3.5 and 4.0, which do not align with the standard C-Band wavelength (~5.6 cm or 56,000 µm).
  • These units also appear inconsistent with other platforms like Sentinel-2, where wavelengths are expressed in micrometers (µm).

2. Handling UAVSAR Data

  • UAVSAR data contains four polarizations (HH, HV, VH, VV), all at the same wavelength of 23.84 cm (238,400 µm).
  • The Sentinel-1 metadata assigns distinct wavelength values to its polarizations, which raises questions about how to represent UAVSAR’s consistent wavelength in the metadata.

Key Questions

  1. What units or scaling were used for the Sentinel-1 wavelengths (3.5 and 4.0) in the metadata?
  2. How should UAVSAR’s four polarizations and shared wavelength be represented in the metadata for compatibility with the Clay model?

Any guidance on these points would be greatly appreciated to ensure proper metadata representation and fine-tuning alignment.


weiji14 commented Dec 5, 2024

  1. What units or scaling were used for the Sentinel-1 wavelengths (3.5 and 4.0) in the metadata?

Hi @mmarks13, you are correct that Sentinel-1 C-band indeed has a central wavelength of ~5.6cm. In my review of #240, I did make sure that this was expressed correctly (vv and vh both were 55465.76µm), based on the formula (speed of light / frequency = wavelength):

$$ \frac{299792458\ \text{m s}^{-1}}{5.405\ \text{GHz}} = 0.05546576\ \text{m} = 55465.76\ \mu\text{m} $$
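The same arithmetic in plain Python, for anyone who wants to double-check:

```python
c = 299_792_458                    # speed of light, m/s
f = 5.405e9                        # Sentinel-1 C-band centre frequency, Hz
wavelength_um = c / f * 1e6
print(f"{wavelength_um:.2f} µm")   # -> 55465.76 µm
```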

However, #253, the PR which superseded #240, used 3.5 and 4.0. Looking at the code, around here:

```python
waves = torch.tensor(list(self.metadata[platform].bands.wavelength.values()))
```

model/src/model.py, lines 39 to 40 in 7fe5e68:

```python
self.patch_embedding = DynamicEmbedding(
    wave_dim=128,
```

model/src/factory.py, lines 86 to 87 in 7fe5e68:

```python
self.weight_generator = WavesTransformer(
    wave_dim,
```

You'll notice that a WavesTransformer class is used, which was adapted from DOFA, somewhere here if I'm not mistaken: https://github.com/zhu-xlab/DOFA/blob/50995d749b39532afbbcaf2529f4dea512212a93/pretraining/wave_dynamic_layer.py#L43-L51
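For anyone new to the mechanism being referenced, here is a deliberately simplified, hypothetical sketch of a wavelength-conditioned patch embedding (this is not the actual Clay DynamicEmbedding or DOFA WavesTransformer code; ToyDynamicEmbedding and wave_encoding are made-up names): each band's central wavelength is turned into a sine/cosine encoding, and a small generator network maps that encoding to the patch-embedding weights for that band, which is why the wavelength values in the metadata matter.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def wave_encoding(waves: torch.Tensor, dim: int, temperature: float = 10000.0) -> torch.Tensor:
    """Sine/cosine encoding of central wavelengths, one row per band (simplified)."""
    omega = 1.0 / temperature ** (torch.arange(dim // 2, dtype=torch.float32) / (dim // 2))
    angles = waves[:, None] * omega[None, :]                  # (n_bands, dim/2)
    return torch.cat([angles.sin(), angles.cos()], dim=1)     # (n_bands, dim)


class ToyDynamicEmbedding(nn.Module):
    """Hypothetical stand-in for a wavelength-conditioned patch embedding."""

    def __init__(self, wave_dim: int = 128, patch_size: int = 8, embed_dim: int = 256):
        super().__init__()
        self.wave_dim, self.patch_size, self.embed_dim = wave_dim, patch_size, embed_dim
        # A tiny MLP standing in for the transformer-based weight generator.
        self.weight_generator = nn.Sequential(
            nn.Linear(wave_dim, wave_dim),
            nn.GELU(),
            nn.Linear(wave_dim, embed_dim * patch_size * patch_size),
        )

    def forward(self, x: torch.Tensor, waves: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) image chips; waves: (C,) one central wavelength per band.
        emb = wave_encoding(waves, self.wave_dim)              # (C, wave_dim)
        w = self.weight_generator(emb)                         # (C, D * p * p)
        w = w.view(x.shape[1], self.embed_dim, self.patch_size, self.patch_size)
        kernel = w.permute(1, 0, 2, 3)                         # (D, C, p, p) conv kernel
        return F.conv2d(x, kernel, stride=self.patch_size)     # (B, D, H/p, W/p)


# Example: two Sentinel-1 bands with the 3.5 / 4.0 proxy values from the metadata.
patches = ToyDynamicEmbedding()(torch.randn(1, 2, 64, 64), torch.tensor([3.5, 4.0]))
```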

If you read the DOFA paper, there is this sentence:

For SAR images from Sentinel-1, the wavelength is uniquely larger than other bands, and therefore the μm unit is not reasonable. We thus set the λ to 3.75 to distinguish it from different bands.

and you can see in the DOFA README.md at https://github.com/zhu-xlab/DOFA/blob/50995d749b39532afbbcaf2529f4dea512212a93/README.md?plain=1#L170-L171 that they do use 3.75 and 3.75 for Sentinel-1 VV and VH. As for why we used 3.5 and 4.0 in Clay, I'm guessing these are just values +/- 0.25 of 3.75 to differentiate between the VV and VH bands. @srmsoumya or @yellowcap, could you maybe elaborate on this a little bit more?

  2. How should UAVSAR’s four polarizations and shared wavelength be represented in the metadata for compatibility with the Clay model?

Before jumping in to use Clay with UAVSAR, I will just caution that the v1.5 model itself has not been trained on L-band quad-pol SAR data, and the wavelength/frequency of L-band is quite far from that of C-band. We started with C-band because Sentinel-1 was readily available, but X-band and L-band would be nice to get into too (somewhat hinted at in #19 (comment)), especially with so many commercial X-band SAR companies putting out open data now, and NISAR due to launch 'soon'.

All that said, if you're still keen to experiment with this, you could try setting the wavelength to an arbitrarily large value like 9.0 or something, and see how the finetuning results look. If you're keen to get the Clay model pre-trained on UAVSAR L-band, P-band, Ka-band, etc., we could also discuss that 😄
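As a purely hypothetical illustration of that suggestion, a UAVSAR entry could look something like the sketch below. The platform key and the proxy values are placeholders (nothing like this exists in the Clay repo); only the bands → wavelength layout is implied by the code reference above.

```yaml
uavsar:                   # hypothetical platform key
  bands:
    wavelength:           # proxy values to keep SAR distinct from the optical range, not physical µm
      hh: 9.0
      hv: 9.5
      vh: 10.0
      vv: 10.5
```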

@yellowcap (Member)

Great summary @weiji14, I think that was the rationale. To avoid too large a gap between systems, we chose values that "stand out" compared to values in the optical range. So I agree with using large values and testing the model that way. This should nudge the model into looking at this more through the Sentinel-1 / SAR lens. Keen to hear if that works, but I think it could, as a lot of the learning is in the spatial patterns, which are hopefully fairly similar to S1 in your data as well. Ensure that the data is properly normalized before passing it to the model.
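On the normalization point, here is a minimal sketch, assuming UAVSAR backscatter in linear power units and per-band statistics computed over your own training chips (the function name and constants are placeholders, not anything from the Clay repo):

```python
import numpy as np


def normalize_uavsar(band: np.ndarray, mean_db: float, std_db: float) -> np.ndarray:
    """Convert linear backscatter to dB, then z-score with per-band statistics."""
    db = 10.0 * np.log10(np.clip(band, 1e-6, None))  # clip to avoid log(0) on masked pixels
    return (db - mean_db) / std_db
```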


mmarks13 commented Dec 6, 2024

Thank you both for the detailed explanations!

I’ll likely start by testing proxy wavelengths like 8.5, 9, 9.5, and 10 for the four UAVSAR polarizations. I appreciate the rationale behind using larger values for SAR wavelengths to distinguish them from optical ranges, and I’ll see how these adjustments work during fine-tuning.

While these thoughts are fresh, I wanted to share a couple of ideas for potential future development (for documentation purposes):

  • Standardize wavelength units and apply logarithmic scaling to normalize values across sensors. This would allow the actual values for both SAR and optical bands to be used while keeping them on a consistent, reasonable scale. (Note: if micrometers are kept as the unit, some wavelengths would become negative after logarithmic scaling; see the sketch after this list.)
  • Represent polarization directly as a metadata argument, which could help ensure clarity and consistency across sensors.
  • Include the incidence, zenith, and/or azimuth angles as metadata arguments, as they could play a critical role in certain applications, especially for SAR data.
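A quick illustration of the first idea (the wavelength values are approximate, and the band names are just examples):

```python
import numpy as np

wavelengths_um = {
    "sentinel2_blue": 0.49,        # optical, sub-micrometre
    "sentinel1_c_band": 55465.76,  # C-band SAR
    "uavsar_l_band": 238400.0,     # L-band SAR
}
log_scaled = {name: np.log10(w) for name, w in wavelengths_um.items()}
print(log_scaled)  # the optical band comes out negative (about -0.31), as noted above
```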

Thanks again for the insights, and I’ll follow up in this thread with any observations or results as I experiment further!


brunosan commented Dec 6, 2024

These are great ideas, thanks @mmarks13.

Adding metadata that is not that relevant for the majority of the data, like polarization, zenith, sun angle, DEM, ... is a great idea, and also one where we probably need to be smart, so that the model is not only using the instrument specs to "translate inputs" but also relating across instruments (e.g. can we expect something in SAR if we know RGB? That way other bands serve as "soft labels" across instruments whenever we have co-observations of the same place and time on different instruments). We tried to make the model "cross-modal" in that way (i.e. to train it to reconstruct one instrument from another, given the metadata of both), but we didn't implement it. I believe that should help the model pay more attention to the instrument specs. In the end, instruments are lenses to see different aspects of the same location [and time] (that's why we also drop the entire input 10% of the time, to force the model to form an expectation based only on location and time). It's also true that different bands can see wildly different things, like in shallow waters...
