Quick prediction of partition functions, Q, using trained machine learning estimators.
This repository contains the final products of the work: [DOI]
To use the software, install and activate the conda environment, and then install the package.
conda env create --file environment.yml --name <new_env_name>
conda activate <new_env_name>
pip install .
Once installed, the package can be run from the command line as python quickq.py <arguments>:
usage: quickq.py [-h] (-q | -t | -d) files_dir
predict partition functions
positional arguments:
files_dir path to data directory containing structure files or reaction directories
optional arguments:
-h, --help show this help message and exit
-q, --qest use Qest to predict partition functions of molecules
-t, --qests use QesTS to predict partition functions of unknown transition states
-d, --double use Qest and QesTS to predict partition functions of unknown transition states
Note that structures may only contain carbon, hydrogen, nitrogen and/or oxygen atoms. The predictor will run for systems with more than 7 C, N, and O atoms because the EncodedBonds featurizer [1] is size independent, but it was not tested on such systems. See the sections below for details of the accepted input data formats.
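The element restriction can be checked before running the predictor. The following is a minimal sketch and not part of quickq: check_elements is a hypothetical helper, and it assumes the usual XYZ layout (atom count on the first line, a comment on the second, then one atom per line with the element symbol first).

```python
ALLOWED = {"C", "H", "N", "O"}

def check_elements(xyz_text):
    """Return (unsupported_symbols, heavy_atom_count) for XYZ-formatted text."""
    lines = xyz_text.strip().splitlines()
    n_atoms = int(lines[0].split()[0])
    # Atom records start on the third line; the symbol is the first field.
    symbols = [line.split()[0] for line in lines[2:2 + n_atoms]]
    unsupported = sorted(set(symbols) - ALLOWED)
    # Heavy atoms here means C, N and O, matching the note above.
    heavy = sum(1 for s in symbols if s in {"C", "N", "O"})
    return unsupported, heavy

# Illustrative water geometry in angstrom (values chosen for the example).
water = """3
example molecule
O 0.000 0.000 0.117
H 0.000 0.757 -0.469
H 0.000 -0.757 -0.469
"""
print(check_elements(water))
```

An empty first element means every atom is supported. A heavy-atom count above 7 means the system lies outside the range the models were tested on.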
Qest is a predictor of partition functions for arbitrary structures.
Input data: two files are required for each system:
- an extended XYZ file with the extension .extxyz, containing atom types and positions in units of [angstrom]
- a comma-separated value file with the extension .csv, containing the temperatures at which to predict the partition function in units of [Kelvin]. This column must either have the label "T" or start with "T ". Other columns will not be modified.
Both files must share the same base file name, e.g. molecule1.extxyz and molecule1.csv.
Place both files for each system in a directory whose path we call files_dir.
The directory structure should then look like the following, where XX and YY represent arbitrary names. No name should be repeated between systems:
/files_dir/
|-XX.extxyz
|-XX.csv
|-YY.extxyz
|-YY.csv
|-...
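The layout above can be verified before running a prediction. This is a minimal sketch under the rules stated in this section. check_qest_dir and is_temperature_label are hypothetical helpers, not part of quickq, and they only inspect file names and the csv header row.

```python
import csv
from pathlib import Path

def is_temperature_label(label):
    """True if a csv column label is "T" or starts with "T "."""
    return label == "T" or label.startswith("T ")

def check_qest_dir(files_dir):
    """Return a list of human-readable problems found in files_dir."""
    problems = []
    root = Path(files_dir)
    for xyz in sorted(root.glob("*.extxyz")):
        table = xyz.with_suffix(".csv")
        if not table.exists():
            problems.append(f"{xyz.name}: missing {table.name}")
            continue
        # Only the header row is needed to find the temperature column.
        with table.open(newline="") as handle:
            header = next(csv.reader(handle), [])
        if not any(is_temperature_label(col) for col in header):
            problems.append(f"{table.name}: no temperature column")
    for table in sorted(root.glob("*.csv")):
        xyz = table.with_suffix(".extxyz")
        if not xyz.exists():
            problems.append(f"{table.name}: missing {xyz.name}")
    return problems
```

Calling check_qest_dir on a correctly prepared directory returns an empty list.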
The predictions can then be executed with the command python quickq.py <files_dir> -q, after which each csv file will have a new column, log_qpart_predicted, containing the natural log of the predicted partition functions.
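Because the output column holds the natural log of Q, recovering Q itself requires exponentiation. The sketch below shows one way to read a result file. read_predictions is a hypothetical helper, and the numbers in the example table are invented for illustration, not model output.

```python
import csv
import io
import math

def read_predictions(handle):
    """Yield (temperature, Q) pairs from an open csv file of predictions."""
    reader = csv.DictReader(handle)
    # Locate the temperature column by the labelling rule used for input.
    t_col = next(c for c in reader.fieldnames if c == "T" or c.startswith("T "))
    for row in reader:
        yield float(row[t_col]), math.exp(float(row["log_qpart_predicted"]))

# Stand-in for an annotated csv file; values are made up for the example.
example = io.StringIO("T,log_qpart_predicted\n300.0,0.0\n500.0,2.0\n")
for temperature, q in read_predictions(example):
    print(temperature, q)
```

Note that math.exp raises OverflowError for arguments above roughly 709, so for very large predicted values it may be preferable to keep working with the logarithm.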
QesTS is a predictor of unknown transition state partition functions.
Input data: four files are required for each reaction, where XX is an arbitrary name:
- an extended XYZ file named rXX.extxyz, containing atom types and positions for the reactant in units of [angstrom]
- an extended XYZ file named pXX.extxyz, containing atom types and positions for the product in units of [angstrom]
- a comma-separated value file named rXX.csv, containing at least two columns. One column holds the temperatures in Kelvin and must be in the first position. This column must either have the label "T" or start with "T ". The second column must have the label "log_qpart" and hold the natural logarithm of the reactant partition functions at the specified temperatures. Other columns will not be modified.
- a comma-separated value file named pXX.csv, with the same format as rXX.csv except that it holds the logged partition functions of the product.
Note that the temperatures in the two csv files must be identical. This will not be checked.
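Since this condition is not checked, it may be worth verifying by hand. A minimal sketch, assuming the temperature column is the first column as required above; temperatures and same_temperatures are hypothetical helpers, not part of quickq.

```python
import csv
import io

def temperatures(handle):
    """Return the first-column values of an open csv file as floats."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    return [float(row[0]) for row in reader if row]

def same_temperatures(reactant_handle, product_handle):
    """True if both files list identical temperatures in the same order."""
    return temperatures(reactant_handle) == temperatures(product_handle)

# Stand-ins for rXX.csv and pXX.csv; values are made up for the example.
reactant = io.StringIO("T,log_qpart\n300,10.0\n400,11.0\n")
product = io.StringIO("T,log_qpart\n300,12.0\n400,13.0\n")
print(same_temperatures(reactant, product))
```

With real files, pass handles opened with open(path, newline="") in place of the io.StringIO objects.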
Place these files alone in a directory named "rxnXX". This directory represents the reaction with identifier XX. Place as many reaction directories as desired, and nothing else, in a directory whose path we call files_dir.
The directory structure should then look like the following, where XX and YY represent arbitrary names. No name should be repeated between reactions:
/files_dir/
|-rxnXX/
| |-rXX.extxyz
| |-pXX.extxyz
| |-rXX.csv
| |-pXX.csv
|-rxnYY/
| |-rYY.extxyz
| |-pYY.extxyz
| |-rYY.csv
| |-pYY.csv
...
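The reaction layout can be checked with a short script before running a prediction. This is a sketch under the naming rules described above. missing_reaction_files is a hypothetical helper, not part of quickq, and it checks only that the four expected files exist, not their contents.

```python
from pathlib import Path

def missing_reaction_files(files_dir):
    """Map each rxn* directory name to the expected files it lacks."""
    missing = {}
    for rxn in sorted(Path(files_dir).glob("rxn*")):
        if not rxn.is_dir():
            continue
        # The identifier is whatever follows the "rxn" prefix.
        ident = rxn.name[len("rxn"):]
        expected = [
            f"r{ident}.extxyz",
            f"p{ident}.extxyz",
            f"r{ident}.csv",
            f"p{ident}.csv",
        ]
        absent = [name for name in expected if not (rxn / name).is_file()]
        if absent:
            missing[rxn.name] = absent
    return missing
```

An empty dictionary means every reaction directory contains its four input files.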
The predictions can then be executed with the command python quickq.py <files_dir> -t, after which each reaction directory will contain a csv file named "tsXX.csv", where XX is that reaction's identifier. This csv file contains the temperatures used to predict the partition function and a column named "log_qpart_predicted" holding the logarithm of the partition functions of the unknown transition state at those temperatures.
The Double model prediction uses Qest and QesTS to predict partition functions of unknown transition states from structure and temperature alone. The required directory structure is identical to that for QesTS prediction (see 2. QesTS Usage), except that the "log_qpart" columns in the reactant and product csv files are no longer necessary. Note that the temperature column must still be present.
The predictions can then be executed with the command python quickq.py <files_dir> -d, after which each reaction directory will contain a csv file named "tsXX.csv", where XX is that reaction's identifier. This csv file contains the temperatures used to predict the partition function and a column named "log_qpart_predicted" holding the logarithm of the partition functions of the unknown transition state at those temperatures.
The repository contains a directory "toy_data" with three datasets: "qest_test", "qests_test" and "double_test". Each dataset contains the minimum amount of information needed to make predictions with the corresponding model. Note that the extra information in the extended XYZ files is not necessary; only the atoms and positions are required.
[1] C. R. Collins, G. J. Gordon, O. A. von Lilienfeld, and D. J. Yaron, J. Chem. Phys. 148, 241718 (2018).