From d7ada0215316995bf87de89667d8609fd4106633 Mon Sep 17 00:00:00 2001
From: yaniguan <168690533+yaniguan@users.noreply.github.com>
Date: Wed, 6 Nov 2024 16:51:13 -0800
Subject: [PATCH] add OpenLAM Q2 and Q3 (#113)

Co-authored-by: Yani Guan
---
 source/_posts/OpenLAM-2024Q2.md | 65 +++++++++++++++++++++++++++
 source/_posts/OpenLAM-2024Q3.md | 79 +++++++++++++++++++++++++++++++++
 2 files changed, 144 insertions(+)
 create mode 100644 source/_posts/OpenLAM-2024Q2.md
 create mode 100644 source/_posts/OpenLAM-2024Q3.md

diff --git a/source/_posts/OpenLAM-2024Q2.md b/source/_posts/OpenLAM-2024Q2.md
new file mode 100644
index 0000000..fe748ba
--- /dev/null
+++ b/source/_posts/OpenLAM-2024Q2.md
@@ -0,0 +1,65 @@
+---
+title: "OpenLAM | 2024 Q2 Report: Infrastructure Upgrade and Release of Pre-trained Models Compatible with DeePMD-kit v3"
+date: 2024-11-06
+categories:
+- OpenLAM
+---
+
+On the journey toward developing a Large Atomic Model (LAM), the core Deep Potential development team has launched the OpenLAM initiative for the community. OpenLAM’s slogan is "Conquer the Periodic Table!" The project aims to create an open-source ecosystem centered on microscale large models, providing new infrastructure for microscopic scientific research and driving transformative advancements in microscale industrial design across fields such as materials, energy, and biopharmaceuticals.
+
+
+
+## Codes
+
+The DeePMD-kit v3 beta (v3.0.0b3) has been released. Compared with the earlier alpha release, the beta has undergone more comprehensive code refactoring, supports a broader range of model modules, improves DPA-2 support in LAMMPS, and adds many new features. v3.0.0b3 also fixes known bugs in v3.0.0b0, v3.0.0b1, and v3.0.0b2, has passed systematic testing, and is accompanied by pre-trained DPA-2 models compatible with it (see below). Detailed feature support and optimizations can be found in the release notes:
+
+https://github.com/deepmodeling/deepmd-kit/releases/tag/v3.0.0b0
+
+https://github.com/deepmodeling/deepmd-kit/releases/tag/v3.0.0b1
+
+https://github.com/deepmodeling/deepmd-kit/releases/tag/v3.0.0b2
+
+https://github.com/deepmodeling/deepmd-kit/releases/tag/v3.0.0b3 (recommended)
+
+ABACUS v3.7 has been released. The new version has been comprehensively tested for alloy applications to verify calculation accuracy and pseudopotential reliability. In addition, to support the OpenLAM large atomic model initiative, it has been extensively optimized for domestic DCU hardware, with improvements in memory usage efficiency and calculation speed that further expand its range of application in first-principles calculations for large atomic models. For more details, please refer to the ABACUS 3.7 Release.
+
+## Data
+High-Pressure Hydrogen-Rich Compound Data: This dataset contains 140,000 data points on ternary hydrogen-rich compounds involving 29 elements, covering a pressure range of 150 to 250 GPa. The high-pressure hydrogen-rich compound database can be accessed via the following link: https://www.aissquare.com/datasets/detail?pageType=datasets&name=High-Pressure-Hydrogen-Rich-Compound&id=257
+
+Solid-State Electrolyte Data (SSE-ABACUS): This dataset comprises 127,000 data points on solid-state electrolytes involving 27 elements and 365 systems (including sulfides, halides, doped materials, and interfaces). The SSE-ABACUS data can be accessed via the following link: https://www.aissquare.com/datasets/detail?pageType=datasets&name=SSE-abacus&id=260
+
+## Models
+### Pre-trained Models Based on DeePMD-kit v3-beta Version
+
+Updated multi-task DPA-2 pre-trained models compatible with the DeePMD-kit beta version (v3.0.0b3) have been released (model link: https://www.aissquare.com/models/detail?pageType=models&name=DPA-2.2.0-beta3&id=272), along with single-task DPA-2 pre-trained models.
+
+### Domain Models
+
+Free Energy Perturbation Model (https://arxiv.org/pdf/2406.09817)
+
+For the neural network potential (NNP), combining DPA-2 with a semi-empirical method achieves accuracy comparable to DFT. A new workflow that uses the NNP to tune molecular force fields was designed, retaining NNP-level accuracy of the potential energy surface at MM-level computational cost.
+
+### Solid-State Batteries
+
+DPA-SSE, a pre-trained large model for solid-state electrolytes trained on the SSE-ABACUS dataset, has been constructed. Its energy prediction accuracy is an order of magnitude higher than that of the CHGNet and M3GNet force fields, and its force prediction accuracy is improved by a factor of 1 to 5. For dynamic properties, the Li-ion diffusion coefficients and conductivities predicted by DPA-SSE are in good agreement with experimental results. The pre-trained model for bulk materials (chalcogenides) is now available on arXiv (http://arxiv.org/abs/2406.18263). A dedicated app for developing pre-trained models for solid-state electrolytes has also been developed (https://bohrium.dp.tech/apps/voltcraft).
+
+### Alloys
+
+A pre-trained large atomic model for alloys covering 53 elements has been constructed. Its training data, 24,000 data points, were generated by first-principles calculations using the domestic ABACUS software on DCU machines. The configurations include elemental, compound, and high-entropy alloy crystal structures, as well as defects such as vacancies, interstitials, and surfaces, covering a pressure range of -0.5 to 5 GPa and a temperature range of 50 to 3000 K. The trained model achieves an energy error below 35 meV/atom and a force error below 0.25 eV/Å. Compared to the existing MACE pre-trained large atomic model, the alloy domain model performs better on properties such as elastic constants, moduli, surface energies, and point defect formation energies.
+
+### High-Pressure Hydrogen-Rich Compounds
+
+An atomic-scale model for predicting the structures of high-pressure hydrogen-rich superconducting compounds has been constructed. It covers ternary hydrogen-rich compounds involving 29 elements over a pressure range of 150 to 250 GPa. This DPA-1 model, based on the 2024Q1 version, can stably optimize the structures of 28 known hydrogen-rich superconducting compounds, preserving the space group before and after optimization (https://www.aissquare.com/models/detail?pageType=models&name=High-Pressure-Hydrogen-Rich-Compound&id=258).
+
+### Dynamic Catalysis
+
+A preliminary universal potential model, DPA-DynaCat, has been constructed for dynamic catalytic elementary reactions on cluster surfaces. Its training set covers common elementary reactions of the small molecules H2/O2/H2O/CO/CO2/CH4 on the surfaces of pure-metal and binary-alloy clusters of Au/Ag/Cu/Pt/Pd/Ni. Benchmarked against DFT calculations, the model accurately predicts potential energy surfaces and yields reaction free energies close to those from machine learning potentials trained for single systems. The model can be combined with molecular dynamics for enhanced sampling, accounting for contributions from the various isomers present during the reaction and thus enabling the calculation of reaction barriers and reaction free energies (https://www.aissquare.com/models/detail?pageType=models&name=DPA-DynaCat&id=262).
+
+## Community
+
+The "Crystal Structure Collection Competition Across the Periodic Table" officially kicked off on July 1st.
+
+This will be a long-term competition that provides a platform for testing and iterating crystal structure search and generation algorithms, while also enriching the chemical space for OpenLAM updates. For more details, please visit: https://bohrium.dp.tech/competitions/8821838186?tab=introduce.
+
+Based on the Crystal Structure Collection Competition, the OpenLAM Crystal Structure Database has been constructed and currently contains over 500,000 stable structures. All structures in the database can be accessed through the official API (https://github.com/deepmodeling/openlam). To bridge the language and perspective of experimental scientists and the current advances in AI technology and databases, we have developed the Crystal Craft APP (https://bohrium.dp.tech/apps/crystalcraft). This app gives experimental scientists more options for structure retrieval and generation, in particular element substitution and template structures based on crystal symmetry.
+
diff --git a/source/_posts/OpenLAM-2024Q3.md b/source/_posts/OpenLAM-2024Q3.md
new file mode 100644
index 0000000..fe69a6a
--- /dev/null
+++ b/source/_posts/OpenLAM-2024Q3.md
@@ -0,0 +1,79 @@
+---
+title: "2024 Q3 OpenLAM Report | Release of More Accurate and Faster Pre-trained Model, Crystalline Structure Competition in Full Swing"
+date: 2024-11-07
+categories:
+- OpenLAM
+---
+
+On the journey toward developing a Large Atomic Model (LAM), the core Deep Potential development team has launched the OpenLAM initiative for the community. OpenLAM’s slogan is "Conquer the Periodic Table!" The project aims to create an open-source ecosystem centered on microscale large models, providing new infrastructure for microscopic scientific research and driving transformative advancements in microscale industrial design across fields such as materials, energy, and biopharmaceuticals.
+
+
+
+## Codes
+
+**DeePMD-kit V3.0.0 beta4 Release: New Features and Optimizations**
+
+1. **Comprehensive Optimization of DPA-2:**
+
+   - **Model Structure Enhancements:**
+     - Introduced three-body encoding information and optimized the message-passing update process, significantly improving model training accuracy.
+     - Added three model configurations (small, medium, and large) so that users can choose the appropriate model for their application scenario. For details, visit: [DPA-2 example](https://github.com/deepmodeling/deepmd-kit/tree/v3.0.0b4/examples/water/dpa2).
+
+   - **Accuracy and Performance Improvements:**
+     - Compared to the DPA-2 model in beta3, the beta4 DPA-2-medium model shows approximately a 30% improvement in energy accuracy and about a 14% improvement in force accuracy when trained from scratch. Training and inference efficiency have nearly doubled.
+     - **Accuracy Benchmark (DPA-2-b4-medium vs. DPA-2-b3):** Average test error after training on 27 datasets for 1 million steps, with batch size set to "auto:256". (Table 1)
+     - **Training Speed:** (Example: 192-atom water system, single 40 GB A800 GPU) (Table 2)
+
+Table 1.
+|                        | Energy Weighted RMSE (meV/atom) | Force Weighted RMSE (meV/Å) |
+|------------------------|---------------------------------|-----------------------------|
+| **DPA-2-b4-medium**    | **13.1**                        | **113.1**                   |
+| **DPA-2-b3**           | 18.5                            | 130.8                       |
+
+Table 2.
+| Model                | DPA-2-b4-medium | DPA-2-b3 |
+|----------------------|-----------------|----------|
+| Training time (s/100 steps) | **8.4**  | 15.9     |
+
+
+2. **New Property Prediction Feature:**
+   - Supports direct training and prediction on data containing various properties, expanding the model’s application range. Detailed examples can be found at: [Property example](https://github.com/deepmodeling/deepmd-kit/tree/v3.0.0b4/examples/property).
+
+3. **New Descriptor:**
+   - Introduced "three-body type embedding" (se_t_tebd). Usage examples are available at: [three-body embedding example](https://github.com/deepmodeling/deepmd-kit/tree/v3.0.0b4/examples/water/se_e3_tebd).
+
+4. **Code Structure Optimization and Stability Improvements:**
+   - Fixed known issues encountered in daily use, enhancing the stability and maintainability of the code.
+
+For detailed release notes, visit: [DeePMD-kit v3.0.0 beta4 release notes](https://github.com/deepmodeling/deepmd-kit/releases/tag/v3.0.0b4).
+
+## Data
+
+**New Materials Project Trajectory Dataset (MPtraj) in DeepMD Mixed Type Format:**
+- **Dataset Link:** [MPtraj Dataset](https://www.aissquare.com/datasets/detail?pageType=datasets&name=MPtraj&id=2781)
+
+## Models
+**1. Pre-trained Model:**
+   - **a. New Version:** The DPA-2 medium multi-task pre-trained model (2.3.0-b4-medium), adapted for the V3.0.0 beta4 code. [Model Link](https://www.aissquare.com/models/detail?pageType=models&name=DPA-2.3.0-v3.0.0b4&id=279).
+   - **b. Performance Improvement:** Compared to the Q2-released pre-trained model (2.2.0-b3), the new model, which incorporates the MPtraj dataset, runs inference about 1.7 times faster (3.7 s versus 6.3 s per 100 inferences, Table 3) while maintaining model accuracy.
+
+     - **Inference Speed Comparison** (192-atom water system, single 40 GB A800 GPU): (Table 3)
+
+   - **c. Upcoming Releases:** Pre-trained DPA-2 small and large models adapted for V3.0.0 beta4 will be released soon. More information is available on the [AIS Square homepage](https://www.aissquare.com/openlam).
+
+Table 3.
+| Pretrained Model      | 2.3.0-b4-medium | 2.2.0-b3 |
+|-----------------------|-----------------|----------|
+| Python inference time (s/100 inferences) | **3.7** | 6.3 |
+
+**2. Alloy Domain Model:**
+   - A property-driven fine-tuning workflow has been developed for large models. It supports fine-tuning on multiple properties to create customized potential models for specific applications. The corresponding Bohrium Notebook, *Finetune Alloy Property using APEX + DPGen2*, is available. Details of the DPA-1&2 alloy large model are at: [Alloy Multi-task Model](https://www.aissquare.com/models/detail?pageType=models&name=DPA-1%262-53-alloy-multitask-400w&id=280).
+
+## Community
+**OpenLAM Crystal Structure Competition:**
+   - The competition has held more than 20 sessions and produced many strong structure generation algorithms, including Con-CDVAE, which has led the rankings in several sessions ([Con-CDVAE competition page](https://bohrium.dp.tech/competitions/8821838186?tab=discuss&postId=8415617783)). To encourage algorithm diversity, we adjusted the evaluation standards based on an analysis of the submitted data ([Evaluation Standards Update](https://bohrium.dp.tech/competitions/8821838186?tab=discuss&postId=4170270451)).
+   - The competition database now contains over 13 million crystal structures, more than 5 million of them contributed by participants. All structure information in the database is open-source, with two access methods available:
+     - **Python API:** Full access to structures and predicted energy data ([GitHub - OpenLAM API](https://github.com/deepmodeling/openlam)).
+     - **App:** Supports multiple search functions and structure analysis ([CrystalCraft App](https://bohrium.dp.tech/apps/crystalcraft)).
+
+We invite community members to help refine this database as a shared resource. We welcome feedback, feature requests, and data contributions from the community.
\ No newline at end of file