Error while reading old trained models and dimer descriptors issue #627

OlgaChalykh · 2023-12-16T21:39:56Z

Dear QUIP developers,

I am working through the details of Max Veit's paper https://pubs.acs.org/doi/abs/10.1021/acs.jctc.8b01242 and exploring the archive of trained models, datasets, and training scripts published at https://www.repository.cam.ac.uk/items/f8cfd6c4-4323-4d29-b05d-177928150a45.

It looks like the QUIP commands have changed since the paper was published. I have some questions about this:

When I try to use any of the trained models, I get the error:
SYSTEM ABORT: Potential_read_params_xml: could not initialize potential from xml_label
Can I modify the trained model files so that I can use them?
I tried to modify the training script for the 6-D dimer GAP. I removed core_param_file and replaced the general_dimer descriptor with A2_dimer, keeping the other parameters and the training set the same, but got the error:

SYSTEM ABORT: Traceback (most recent call last)
File "/project/src/libAtoms/Topology.f95", line 2536 kind unspecified
Cannot find pair for atom index 2           
STOP 1

This looks like a problem with the monomer cutoffs, but increasing monomer_one_cutoff and monomer_two_cutoff did not solve it. Unfortunately, I could not find detailed instructions on how to work with dimer descriptors. How can I solve this problem?

Training script I use:

!gap_fit at_file=./bulk-methane-fit-dimer/repo-fit-dimer/me-rigid-shortaug3-gscc.xyz \
gap={A2_dimer cutoff=6.0 \
     cutoff_transition_width=1.0 \
     signature_one={6 1 1 1 1} \
     signature_two={6 1 1 1 1} \
     monomer_one_cutoff=1.5 \
     monomer_two_cutoff=1.5 \
     atom_ordercheck=F \
     strict=F \
     mpifind=T \
     theta_uniform=1.0 \
     covariance_type=ARD_SE \
     n_sparse=2000 \
     delta=0.02 \
     sparse_method=CUR_COVARIANCE } \
default_sigma={0.0002 0.002 0.0 0.0} \
sparse_jitter=1e-10 \
energy_parameter_name=energy \
force_parameter_name=force \
e0=0.0 \
gp_file=./models/gp.xml \
do_copy_at_file=F

The original training script from the paper's supplementary material:

teach_sparseat_file=me-rigid-shortaug3-mp2-avqz-intnonan.xyz core_param_file={../empirical-pots/ljrep_quip_params.xml} core_ip_args={IPLJ} gap={ general_dimercutoff=6.0 cutoff_transition_width=1.0 signature_one={{61 11 1}} signature_two={{6 11 1 1}} monomer_one_cutoff=1.5monomer_two_cutoff=1.5atom_ordercheck=F strict=Fmpifind=T theta_uniform=1.0covariance_type=ARD_SE n_sparse=2000delta=0.02 sparse_method=CUR_COVARIANCE } default_sigma={0.0002 0.0020.0 0.0}sparse_jitter=1e-10 energy_parameter_name=energyforce_parameter_name=forcee0=0.0 gp_file=gp-merig-mp2-gendim-shortaug3.xml do_copy_at_file=F

gabor1 · 2023-12-16T22:23:25Z

the original command with correct spaces is this:

teach_sparse at_file=$AT_FILE core_param_file={ljrep_quip_params.xml} ip_args={IP LJ} descriptor_str={ \
    general_dimer cutoff=6.0 cutoff_transition_width=1.0 monomer_one_cutoff=1.5 monomer_two_cutoff=1.5 atom_ordercheck=F strict=F signature_one={{6 1 1 1 1}} signature_two={{6 1 1 1 1}} \
        theta_uniform=1.0 covariance_type=ARD_SE n_sparse=2000 delta=0.02} \
    default_sigma={0.002 0.02 0.0} sparse_jitter=1e-10 gp_file=$GP_FILE do_copy_at_file=F sparse_separate_file=T\
    energy_parameter_name=energy force_parameter_name=force e0=0.0

I think this should work if you place the ljrep_quip_params.xml file in the current directory. It is quite possible that the missing spaces were the problem. For example, the core_ip_args has to be {IP LJ}, not {IPLJ}.

The error you got says that although you asked to look for methane dimers, in which each of the 4 Hs is within 1.5 Å of the C, it could not find one of the Hs. Are you sure your initial geometry file is correct? Can you post your initial structure here?

OlgaChalykh · 2023-12-17T14:12:04Z

When I run the original command with corrected spaces (replacing teach_sparse with gap_fit, because I got the error param_read_line: unknown key teach_sparse),

!gap_fit at_file=./datasets/dimer_dataset.xyz core_param_file={ljrep_quip_params.xml} ip_args={IP LJ} descriptor_str={ \
general_dimer cutoff=6.0 cutoff_transition_width=1.0 monomer_one_cutoff=1.5 monomer_two_cutoff=1.5 atom_ordercheck=F strict=F signature_one={{6 1 1 1 1}} signature_two={{6 1 1 1 1}} \
theta_uniform=1.0 covariance_type=ARD_SE n_sparse=2000 delta=0.02} \
default_sigma={0.002 0.02 0.0} sparse_jitter=1e-10 gp_file=gap.xml do_copy_at_file=F sparse_separate_file=T \
energy_parameter_name=energy force_parameter_name=force e0=0.0

I get the error:

param_read_line: unknown key ip_args
SYSTEM ABORT: Exit: Mandatory argument(s) missing...   
STOP 1

With the ljrep_quip_params.xml potential used as core_param_file in the command above (as well as with any other trained model I got from here), when I run:

from quippy.potential import Potential
gap = Potential(param_filename='./bulk-methane-fit-dimer/repo-fit-dimer/ljrep_quip_params.xml')

I get the error:
Potential_read_params_xml: could not initialise potential from xml_label. param_str= <LJREP> <Potential label="ljrep" init_args="IP LJ"/> <LJ_params n_types="2" only_inter_resid="T" label="ljrep-lj"> <per_type_data type="1" atomic_num="1" /> <per_type_data type="2" atomic_num="6" /> <per_pair_data type1="2" type2="2" sigma="3.52608449" eps6="0.0540000776" eps12="0.0540000776" cutoff="14.0" energy_shift="F" linear_force_shift="F" /> <per_pair_data type1="1" type2="2" sigma="1.0" eps6="0.0" eps12="517.030" cutoff="10.0" energy_shift="F" linear_force_shift="F" /> <per_pair_data type1="1" type2="1" sigma="1.0" eps6="0.0" eps12="23.4878" cutoff="10.0" energy_shift="F" linear_force_shift="F" /> </LJ_params> </LJREP>

One of the structures from the dataset I used to fit the GAP with the A2_dimer descriptor:

10
energy=-0.00998655 ediff_cc=-0.00109638 cutoff=-1.00000000 nneightol=1.20000000 pbc="T T T" Lattice="30.00000000       0.00000000       0.00000000       0.00000000      30.00000000       0.00000000       0.00000000       0.00000000      30.00000000" Properties=species:S:1:pos:R:3:Z:I:1:resid:I:1:force:R:3
C               0.67655467      1.53558322     -1.64751401       6     141      0.00152822     -0.00057094     -0.00049239
H               0.62038120      1.03628756     -0.68231701       1     141     -0.00331393     -0.00627386      0.00891613
H               0.38843306      0.83959819     -2.43258845       1     141     -0.00145188     -0.00151616     -0.00092832
H               1.69528466      1.87629793     -1.82028809       1     141      0.00139596      0.00108906     -0.00065731
H               0.00216686      2.38948973     -1.65395806       1     141     -0.00122184      0.00083637     -0.00013244
C              -0.67655467     -1.53558322      1.64751401       6     148      0.00278940      0.00189125      0.00103092
H              -0.15292844     -1.79565316      2.56522617       1     148      0.00099199      0.00006716      0.00114083
H               0.00113054     -1.63653049      0.80215612       1     148      0.00310865     -0.00134432     -0.00668266
H              -1.02845731     -0.50776289      1.70952859       1     148     -0.00314982      0.00507783     -0.00042915
H              -1.52591635     -2.20229213      1.51312161       1     148     -0.00067677      0.00074360     -0.00176559

The dataset itself is probably fine, because I can successfully train this model on it:

!gap_fit at_file=./bulk-methane-fit-dimer/repo-fit-dimer/me-rigid-shortaug3-gscc.xyz \
gap={distance_Nb order=2 \
                 cutoff=10.0 \
                 covariance_type=ARD_SE \
                 theta_uniform=1.0 \
                 n_sparse=15 \
                 delta=1.0} \
e0=0.0 \
default_sigma={0.01 0.5 0.0 0.0} \
do_copy_at_file=F sparse_separate_file=F \
gp_file=./models/gap_2b.xml

gabor1 · 2023-12-17T16:35:50Z

Yes, that is a change: replace "ip_args" with "core_ip_args".
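
The two renames mentioned in this thread (teach_sparse to gap_fit, and ip_args to core_ip_args) could be applied to an old command string with a small helper like the one below. This is only a sketch: the function name update_command and the RENAMED table are made up for illustration, the table covers only the renames reported here, and other options may also have changed between QUIP versions.

```python
# Renames reported in this thread only; the table is probably incomplete.
RENAMED = {
    "teach_sparse": "gap_fit",   # program name
    "ip_args": "core_ip_args",   # argument key
}


def update_command(command):
    """Rename old program/key names in a space-separated command string.

    Only the part of each token before the first '=' is compared, so an
    existing 'core_ip_args=' is left alone and values are never touched.
    """
    tokens = []
    for token in command.split(" "):
        key, sep, value = token.partition("=")
        tokens.append(RENAMED.get(key, key) + sep + value)
    return " ".join(tokens)
```

Matching whole keys rather than substrings matters here, because "ip_args" is itself a substring of "core_ip_args" and a plain text replace would produce "core_core_ip_args" on a command that is already up to date.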

A single frame of your training data doesn't help, because ONE of your frames triggered the error you reported, but we don't know which. Can you send your entire training set (or a subset of it that triggers the error about the atom not being found)?
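
One way to narrow down which frames are affected is to check, outside of QUIP, whether every C atom has exactly four H atoms within the monomer cutoff. The sketch below does that in plain Python. It is not how QUIP builds the dimers: it assumes the element symbol and x, y, z are the first four columns of each atom line (as in the frame posted above), it ignores periodic images (assumed safe for isolated dimers in a 30 Å box), and all function names are hypothetical.

```python
import math


def parse_xyz_frames(text):
    """Split multi-frame XYZ text into frames of (symbol, (x, y, z)) atoms."""
    lines = text.splitlines()
    frames, i = [], 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # skip blank lines between frames
            continue
        n_atoms = int(lines[i].split()[0])
        # Line i + 1 is the comment line; the atoms follow it.
        frame = []
        for line in lines[i + 2 : i + 2 + n_atoms]:
            fields = line.split()
            frame.append((fields[0], tuple(float(v) for v in fields[1:4])))
        frames.append(frame)
        i += 2 + n_atoms
    return frames


def incomplete_carbons(frame, cutoff=1.5, n_h=4):
    """Return indices of C atoms without exactly n_h H atoms within cutoff."""
    bad = []
    for i, (sym_i, pos_i) in enumerate(frame):
        if sym_i != "C":
            continue
        count = sum(
            1
            for sym_j, pos_j in frame
            if sym_j == "H" and math.dist(pos_i, pos_j) < cutoff
        )
        if count != n_h:
            bad.append(i)
    return bad


def suspicious_frames(text, cutoff=1.5):
    """Map frame index -> incomplete C atom indices, for frames that have any."""
    result = {}
    for k, frame in enumerate(parse_xyz_frames(text)):
        bad = incomplete_carbons(frame, cutoff)
        if bad:
            result[k] = bad
    return result
```

For example, suspicious_frames(open("me-rigid-shortaug3-gscc.xyz").read()) would return a dictionary mapping 0-based frame indices to the 0-based indices of the offending C atoms, or an empty dictionary if every frame passes this check. An empty result would suggest the geometries are fine and the cause lies elsewhere.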

gabor1 · 2023-12-17T16:36:23Z

The reason the simple training with the distance_Nb descriptor works is that it is not trying to find methane molecules.

OlgaChalykh · 2023-12-17T17:47:57Z

Full dataset : https://github.com/libAtoms/QUIP/files/13697218/me-rigid-shortaug3-gscc.txt
