gap_fit config_file with carriage returns breaks writing of potential #664

Open
bernstei opened this issue Aug 13, 2024 · 18 comments
@bernstei
Contributor

If you create a config_file that contains carriage returns, generating the potential seems to work, but when gap_fit tries to write the command line into the tmp_... XML file, FoX fails with an error:

At line 221 of file m_common_buffer.F90 (unit = 10, file = 'tmp_GAP_2024_8_13_-240_17_2_28_661.xml')
Fortran runtime error: End of record

Error termination. Backtrace:
#0  0x1456a43ac0c0 in ???
#1  0x1456a43acb65 in ???
#2  0x1456a43ad51b in ???
#3  0x1456a45c38e8 in ???
#4  0x1456a45cf635 in ???
#5  0x1456a45c6e84 in ???
#6  0x1456a45c703b in ???
#7  0x1456a45c37c5 in ???
#8  0xaa00d5 in ???
#9  0xa83878 in ???
#10  0x417faf in __gap_fit_module_MOD_gap_fit_print_xml
        at /home/cluster/bernstei/src/work/QUIP/source/QUIP_github/src/GAP/gap_fit_module.f95:1622
#11  0x405b5a in gap_fit_program
        at /home/cluster/bernstei/src/work/QUIP/source/QUIP_github/src/GAP/gap_fit.f95:108
#12  0x40518e in main
        at /home/cluster/bernstei/src/work/QUIP/source/QUIP_github/src/GAP/gap_fit.f95:38

The end of the tmp_... file is

      <sparseX i="999" alpha="181.57367096770884" sparseCutoff="1.0000000000000000"/>
      <sparseX i="1000" alpha="3078.4077507996994" sparseCutoff="1.0000000000000000"/>
    </gpCoordinates>
  </gpSparse>
  <command_line><![CDATA[atoms_filename=../old/fitting_database.combined.GAP_iter_10.stage_1.extxyz

I can work around this by running gap_fit $(cat config_file) instead of gap_fit config_file=config_file, but it would be nice if gap_fit removed the carriage returns from the command line it saves in the XML file.
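A minimal sketch of another way to strip the line breaks before gap_fit ever sees them, assuming (as this thread suggests) that the embedded line breaks are the only trigger: collapse config_file onto a single line and pass that file instead. The sample config contents and the file name config_file.oneline are hypothetical, and the gap_fit call is left commented out.

```shell
#!/bin/sh
# Hypothetical sample config with embedded line breaks (stand-in for a real config_file)
cat << 'EOF' > config_file
atoms_filename=train.extxyz
default_sigma="0.0025 0.0625 0.125 0.125"
gp_file=GAP.xml
EOF

# Map every line feed (and any stray carriage return) to a space, so the
# command line that gets stored in the XML contains no line breaks at all
tr '\r\n' '  ' < config_file > config_file.oneline

# Then, assuming gap_fit accepts the single-line file the same way (not run here):
# gap_fit config_file=config_file.oneline
```

Replacing rather than deleting the line breaks keeps neighbouring key=value pairs separated by whitespace.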

@jameskermode
Member

@albapa or @Sideboard - would one of you be willing to take a look at this?

@bernstei
Contributor Author

bernstei commented Aug 14, 2024

I only have one example, but I'd expect any config_file=... input that contains a carriage return to reproduce it. In case the problem is more specific, here is the one I'm using:

atoms_filename=../old/fitting_database.combined.GAP_iter_10.stage_1.extxyz
gap=" Z={{13 13}} compact_clusters=T cutoff=4.3 cutoff_transition_width=0.87 distance_Nb=T order=2 covariance_type=ard_se f0=0.0 n_sparse=15 sparse_method=uniform theta_uniform=0.87 delta=0.26079262946888204 add_species=F : Z={{13 29}} compact_clusters=T cutoff=4.1 cutoff_transition_width=0.82 distance_Nb=T order=2 covariance_type=ard_se f0=0.0 n_sparse=15 sparse_method=uniform theta_uniform=0.82 delta=0.26079262946888204 add_species=F : Z={{29 29}} compact_clusters=T cutoff=3.9 cutoff_transition_width=0.78 distance_Nb=T order=2 covariance_type=ard_se f0=0.0 n_sparse=15 sparse_method=uniform theta_uniform=0.78 delta=0.26079262946888204 add_species=F : alpha_max={{8 8}} amplitude_scaling={{1.0 1.0}} atom_sigma_r={{0.55 0.55}} atom_sigma_r_scaling={{0.0 0.0}} atom_sigma_t={{0.55 0.55}} atom_sigma_t_scaling={{0.0 0.0}} central_index=1 central_weight={{1.0 1.0}} compress_mode=trivial l_max=4 n_species=2 radial_enhancement=1 rcut_hard=4.4 rcut_soft=3.3 soap_turbo=T species_Z={{13 29}} covariance_type=dot_product f0=0.0 n_sparse=1000 print_sparse_index=T sparse_method=cur_points zeta=6 delta=0.08681690690518018 add_species=F : alpha_max={{8 8}} amplitude_scaling={{1.0 1.0}} atom_sigma_r={{0.8 0.8}} atom_sigma_r_scaling={{0.0 0.0}} atom_sigma_t={{0.8 0.8}} atom_sigma_t_scaling={{0.0 0.0}} central_index=1 central_weight={{1.0 1.0}} compress_mode=trivial l_max=4 n_species=2 radial_enhancement=1 rcut_hard=6.5 rcut_soft=4.9 soap_turbo=T species_Z={{13 29}} covariance_type=dot_product f0=0.0 n_sparse=1000 print_sparse_index=T sparse_method=cur_points zeta=6 delta=0.08681690690518018 add_species=F : alpha_max={{8 8}} amplitude_scaling={{1.0 1.0}} atom_sigma_r={{0.49 0.49}} atom_sigma_r_scaling={{0.0 0.0}} atom_sigma_t={{0.49 0.49}} atom_sigma_t_scaling={{0.0 0.0}} central_index=2 central_weight={{1.0 1.0}} compress_mode=trivial l_max=4 n_species=2 radial_enhancement=1 rcut_hard=3.9 rcut_soft=2.9 soap_turbo=T species_Z={{13 29}} covariance_type=dot_product f0=0.0 n_sparse=1000 
print_sparse_index=T sparse_method=cur_points zeta=6 delta=0.08681690690518018 add_species=F : alpha_max={{8 8}} amplitude_scaling={{1.0 1.0}} atom_sigma_r={{0.75 0.75}} atom_sigma_r_scaling={{0.0 0.0}} atom_sigma_t={{0.75 0.75}} atom_sigma_t_scaling={{0.0 0.0}} central_index=2 central_weight={{1.0 1.0}} compress_mode=trivial l_max=4 n_species=2 radial_enhancement=1 rcut_hard=5.9 rcut_soft=4.4 soap_turbo=T species_Z={{13 29}} covariance_type=dot_product f0=0.0 n_sparse=1000 print_sparse_index=T sparse_method=cur_points zeta=6 delta=0.08681690690518018 add_species=F"
default_sigma="0.0025 0.0625 0.125 0.125"
energy_parameter_name=REF_energy
force_parameter_name=REF_forces
stress_parameter_name=REF_stress
rnd_seed=1106622296
gp_file=CuAl_GAP.xml

in case some other feature in there (e.g. the quotes in the gap field) triggers this issue.

@albapa albapa self-assigned this Aug 14, 2024
@albapa
Member

albapa commented Aug 14, 2024

I tried to reproduce it with a small Ti training XYZ file, keeping the format of the config_file you provided above, and even added carriage returns within the gap string. It worked fine and produced a final XML file, both on my Mac and on Linux.

I am wondering if there is something specific about the carriage returns you are using - could it be some sort of DOS/Windows CR problem?

@Sideboard
Contributor

Should I look at it as well?

@bernstei
Contributor Author

No Windows was involved. I create the file on Linux with:

cat << EOF > config_file
some
options
here
.
.
EOF

and then run gap_fit config_file=config_file, which fails, whereas gap_fit $(cat config_file) works.
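To rule out DOS/Windows line endings directly, one can check whether the file contains any carriage-return bytes at all. A minimal sketch, assuming a POSIX shell with grep and od; the file contents are illustrative, mirroring the heredoc above. A file written this way on Linux should contain only line feeds.

```shell
#!/bin/sh
# Recreate a small config_file the same way as above (illustrative contents)
cat << 'EOF' > config_file
some
options
here
EOF

cr=$(printf '\r')   # a literal carriage-return character

if grep -q "$cr" config_file; then
    echo "config_file contains CR bytes (DOS/Windows line endings)"
else
    echo "config_file has LF-only (Unix) line endings"
fi

# Byte-level view: carriage returns would appear as \r, line feeds as \n
od -c config_file | head -n 5
```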

@bernstei
Contributor Author

It should be easy enough to add "-g" to the FoX compilation; that is where the error happens, and those frames are currently missing from the stack trace.

@albapa
Member

albapa commented Aug 16, 2024

From my tests, this%config_string holds the contents of the file (or the command line) together with any line-end characters. The command line in my XML still has the line ends and mirrors the format of the config_file.


I agree that FoX could be causing the problem (though for some reason it doesn't on my computers), so trying it with -g would be very useful. Alternatively, could you try changing src/GAP/gap_fit_module.f95:235 to set keep_lf=.false.?

@bernstei
Contributor Author

Here is a more complete stack trace:

Error termination. Backtrace:
#0  0x152761e270c0 in ???
#1  0x152761e27b65 in ???
#2  0x152761e2851b in ???
#3  0x15276203e8e8 in ???
#4  0x15276204a635 in ???
#5  0x152762041e84 in ???
#6  0x15276204203b in ???
#7  0x15276203e7c5 in ???
#8  0xaa0105 in __m_common_buffer_MOD_add_to_buffer
        at /home/cluster/bernstei/src/work/QUIP/source/QUIP_github/src/fox/common/m_common_buffer.F90:221
#9  0xa838a8 in __m_wxml_core_MOD_xml_addcharacters_ch
        at /home/cluster/bernstei/src/work/QUIP/source/QUIP_github/src/fox/wxml/m_wxml_core.F90:1174
#10  0x417faf in __gap_fit_module_MOD_gap_fit_print_xml
        at /home/cluster/bernstei/src/work/QUIP/source/QUIP_github/src/GAP/gap_fit_module.f95:1622
#11  0x405b5a in gap_fit_program
        at /home/cluster/bernstei/src/work/QUIP/source/QUIP_github/src/GAP/gap_fit.f95:108
#12  0x40518e in main
        at /home/cluster/bernstei/src/work/QUIP/source/QUIP_github/src/GAP/gap_fit.f95:38

I'll investigate more.

@bernstei
Contributor Author

The deepest frame (m_common_buffer.F90:221) is indeed in a block of code that deals with writing a newline.

@bernstei
Contributor Author

Judging from my debug prints, I think there's some memory corruption. Maybe a gfortran version issue? The literal strings in my debugging prints are replaced with other text in the output.

@bernstei
Contributor Author

I found my actual debugging statements, and they're OK. There's definitely something fishy with FoX, though. The code makes a big deal of keeping a maximum record length between newlines, but the part that dies seems to ignore all that reasoning and just dumps an arbitrarily long string if it finds a newline inside the string.
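For illustration only (this is not FoX's code): the constraint described here, a maximum record length counted from each newline, is exactly what the standard fold utility enforces. It resets its column count at every newline and wraps any segment longer than the limit. A sketch, with an arbitrary limit of 20 characters standing in for FoX's real record length and an illustrative string containing an embedded newline:

```shell
#!/bin/sh
max_len=20   # arbitrary stand-in for the real record-length limit

# A string with an embedded newline, where the segment after the
# newline is much longer than max_len
text='atoms_filename=train.extxyz
gap="Z={{13 13}} cutoff=4.3 n_sparse=15 sparse_method=uniform"'

# fold counts columns from each newline and wraps anything longer,
# so no output record exceeds max_len characters
printf '%s\n' "$text" | fold -w "$max_len" > records.txt

# Report the longest record written
awk '{ if (length($0) > m) m = length($0) } END { print "longest record:", m }' records.txt
```

Code that writes the text after an embedded newline in a single piece, without this per-segment wrapping, can exceed the fixed record length of the output unit, which would be consistent with the "End of record" runtime error above.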

@bernstei
Contributor Author

See also andreww/fox#82

@bernstei
Contributor Author

bernstei commented Aug 16, 2024

That issue has my proposed fix to FoX, which seems to work. I'm trying keep_lf=.false. right now, just for completeness, but I'm 99% sure it'll also work.

@bernstei
Contributor Author

Setting keep_lf=.false. also works around the FoX bug.

@albapa
Member

albapa commented Aug 17, 2024

I see... So it's not just the line breaks, but also the line lengths.

Rather than relying on a FoX PR, I am happy to push the keep_lf=.false. workaround, unless there is an obvious disadvantage.

@bernstei
Contributor Author

I'm happy with the keep_lf workaround. @jameskermode, do you think it's likely enough to be fixed upstream that we should avoid the workaround?

@jameskermode
Member

I haven't had much contact with the FoX author, so I think we should make our own fix independently.

@bernstei
Contributor Author

Should we just fork it, if the original repo is basically inactive? I'll think about it a bit more, but I'm now reasonably confident of the fix I suggested in that issue.
