Atoms at the origin cause incorrect res_atom_elements results #296

wanggaa · 2024-09-26T07:06:00Z

Hello lucidrains, first of all, thank you very much for the PyTorch reproduction of AlphaFold 3.

The problem is in the function extract_canonical_molecules_from_biomolecule_chains in inputs.py.

In some CIF files, the coordinates of certain key ions are set to the origin, so this line of code
res_atom_positions = atom_positions[res_ligand_atom_mask]
does not return the corresponding result correctly, and the resulting res_atom_elements is empty.

When this is later used in the function create_mol_from_atom_positions_and_types, it raises an exception:
ValueError: The length of atom_elements and xyz_coordinates must be the same.

You can reproduce this problem using the 1qyl_assembly1.cif file as input, which has two vanadium ions that are each at the origin with 25% probability.
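
The failure mode reported here can be sketched in plain Python without the real data pipeline. This is only an illustration under stated assumptions: nonzero_coord_mask and check_lengths are hypothetical helpers, and the idea that the mask is derived from non-zero coordinates and applied to the element list is an assumption, not a quote of inputs.py.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def nonzero_coord_mask(positions: Sequence[Coord]) -> List[bool]:
    """Hypothetical mask: an atom counts as present only if some coordinate is non-zero."""
    return [any(c != 0.0 for c in xyz) for xyz in positions]


def check_lengths(atom_elements: Sequence[str], xyz_coordinates: Sequence[Coord]) -> None:
    """Stand-in for the length check that produces the error quoted above."""
    if len(atom_elements) != len(xyz_coordinates):
        raise ValueError("The length of atom_elements and xyz_coordinates must be the same.")


# A single-atom ion residue whose only atom sits exactly at the origin.
elements = ["V"]
positions = [(0.0, 0.0, 0.0)]

# The coordinate-derived mask treats the ion as absent, so its element is dropped.
mask = nonzero_coord_mask(positions)
kept_elements = [e for e, keep in zip(elements, mask) if keep]

try:
    check_lengths(kept_elements, positions)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

In this sketch the element list ends up empty while one coordinate row remains, which reproduces the same kind of length mismatch as the ValueError above.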


amorehead · 2024-09-29T17:38:58Z

Hi, @wanggaa. Thanks for your kind words on this project!

My intuition tells me that this issue is caused by said ions having a zero vector for their coordinates, as one can see is possible in the construction of Biomolecule objects here (which are subsequently used to build PDBInputs -> MoleculeInputs -> AtomInputs):

alphafold3-pytorch/alphafold3_pytorch/common/biomolecule.py

Line 813 in a04e0cc

pos[residue_constants.atom_order[atom_name]] = atom.coord

For such ions, their single atom_mask value is 1, even though their coordinates may be all zeros. Subsequently, as you noticed, in extract_canonical_molecules_from_biomolecule_chains, we keep only the atom elements in a given molecule that are associated with an atom possessing non-null coordinates. This usually produces a mismatch between the element count and the coordinate count that gets caught later on, which then causes the PDB structure to be "rejected" by our dataloader and replaced with another example for training/validation.
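
To make the point about atom_mask concrete, here is a minimal sketch of how a per-residue position table and mask could be filled. It is not the code in biomolecule.py: ATOM_ORDER and build_residue_arrays are hypothetical names, and plain lists stand in for arrays.

```python
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]

# Hypothetical atom-name-to-slot table, standing in for residue_constants.atom_order.
ATOM_ORDER: Dict[str, int] = {"N": 0, "CA": 1, "C": 2, "V": 3}


def build_residue_arrays(
    atoms: Sequence[Tuple[str, Coord]]
) -> Tuple[List[Coord], List[int]]:
    """Fill zero-initialized position slots and set mask to 1 for every atom seen."""
    pos: List[Coord] = [(0.0, 0.0, 0.0)] * len(ATOM_ORDER)
    mask: List[int] = [0] * len(ATOM_ORDER)
    for name, coord in atoms:
        idx = ATOM_ORDER[name]
        pos[idx] = coord  # analogous to: pos[atom_order[atom_name]] = atom.coord
        mask[idx] = 1     # the atom was observed, whatever its coordinates are
    return pos, mask


# An ion recorded exactly at the origin in the input file.
pos, mask = build_residue_arrays([("V", (0.0, 0.0, 0.0))])
```

After this call the slot for "V" has mask 1 but coordinates identical to the unfilled slots, so any later test based on coordinates alone cannot tell the observed ion from padding.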

I've noticed this occurrence for other PDB IDs, and in such cases, it's an open question how best to handle such PDB structures. Ideas or pull requests for better ways to handle such edge cases are very much welcome.

Best,
Alex
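
One possible direction for the edge case discussed above, offered only as a sketch and not as the project's adopted fix, is to decide atom presence from the explicit atom_mask rather than from the coordinates. The helper select_by_mask below is hypothetical and uses plain Python lists.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def select_by_mask(
    elements: Sequence[str],
    positions: Sequence[Coord],
    atom_mask: Sequence[int],
) -> Tuple[List[str], List[Coord]]:
    """Keep elements and positions together, based only on the explicit mask."""
    kept_elements: List[str] = []
    kept_positions: List[Coord] = []
    for element, xyz, present in zip(elements, positions, atom_mask):
        if present:
            kept_elements.append(element)
            kept_positions.append(xyz)
    return kept_elements, kept_positions


# An observed ion at the origin (mask 1) next to an unobserved padding slot (mask 0).
elements = ["V", "O"]
positions = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
atom_mask = [1, 0]

kept_elements, kept_positions = select_by_mask(elements, positions, atom_mask)
```

Because both lists are filtered by the same mask, their lengths always agree, and an atom at the origin is kept as long as its mask value is 1.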

lucidrains added the "good first issue" (Good for newcomers) label on Sep 29, 2024
