Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update trust_region_step_exact_ solvers #1363

Merged
merged 42 commits into from
Dec 18, 2024
Merged

Update trust_region_step_exact_ solvers #1363

merged 42 commits into from
Dec 18, 2024

Conversation

YigitElma
Copy link
Collaborator

@YigitElma YigitElma commented Nov 13, 2024

  • Updates the initial guess for $\alpha$ to be 0 instead of 0.001* $\alpha$ _upper. I found out that the solution of $\alpha$ is usually pretty small. This reduces the total amount of QR's in the while loop.
  • Removes the last QR outside the while loop of trust_region_step_exact_qr. The loop condition already satisfies the step norm to be around trust radius.

Misc

  • Adds execute_on_cpu flag to desc.examples.get

@YigitElma YigitElma added performance New feature or request to make the code faster enhancement General label for enhancement. Please also tag with "Speed", "Interface", "Functionality", etc labels Nov 13, 2024
@YigitElma YigitElma self-assigned this Nov 13, 2024
Copy link
Contributor

github-actions bot commented Nov 13, 2024

|             benchmark_name             |         dt(%)          |         dt(s)          |        t_new(s)        |        t_old(s)        | 
| -------------------------------------- | ---------------------- | ---------------------- | ---------------------- | ---------------------- |
 test_build_transform_fft_lowres         |     +0.40 +/- 4.70     | +2.13e-03 +/- 2.49e-02 |  5.33e-01 +/- 2.2e-02  |  5.31e-01 +/- 1.1e-02  |
 test_equilibrium_init_medres            |     -0.00 +/- 1.82     | -2.87e-05 +/- 7.53e-02 |  4.14e+00 +/- 6.4e-02  |  4.14e+00 +/- 4.0e-02  |
 test_equilibrium_init_highres           |     +0.69 +/- 1.06     | +3.73e-02 +/- 5.73e-02 |  5.46e+00 +/- 3.9e-02  |  5.42e+00 +/- 4.2e-02  |
 test_objective_compile_dshape_current   |     +0.11 +/- 1.31     | +4.34e-03 +/- 5.16e-02 |  3.95e+00 +/- 3.4e-02  |  3.94e+00 +/- 3.9e-02  |
 test_objective_compute_dshape_current   |     +0.11 +/- 1.31     | +5.55e-06 +/- 6.72e-05 |  5.13e-03 +/- 4.2e-05  |  5.13e-03 +/- 5.2e-05  |
 test_objective_jac_dshape_current       |     -2.76 +/- 5.23     | -1.21e-03 +/- 2.28e-03 |  4.24e-02 +/- 1.3e-03  |  4.36e-02 +/- 1.9e-03  |
 test_perturb_2                          |     +0.13 +/- 1.75     | +2.57e-02 +/- 3.44e-01 |  1.97e+01 +/- 2.7e-01  |  1.97e+01 +/- 2.2e-01  |
 test_proximal_freeb_jac                 |     -0.17 +/- 1.19     | -1.25e-02 +/- 8.80e-02 |  7.40e+00 +/- 7.6e-02  |  7.42e+00 +/- 4.5e-02  |
 test_solve_fixed_iter                   |     -4.16 +/- 1.77     | -1.38e+00 +/- 5.86e-01 |  3.18e+01 +/- 3.6e-01  |  3.31e+01 +/- 4.6e-01  |
 test_LinearConstraintProjection_build   |     +0.75 +/- 2.75     | +7.66e-02 +/- 2.82e-01 |  1.03e+01 +/- 2.0e-01  |  1.03e+01 +/- 2.0e-01  |
 test_build_transform_fft_midres         |     +0.62 +/- 2.91     | +3.80e-03 +/- 1.79e-02 |  6.21e-01 +/- 1.5e-02  |  6.17e-01 +/- 9.4e-03  |
 test_build_transform_fft_highres        |     +0.18 +/- 2.79     | +1.75e-03 +/- 2.74e-02 |  9.83e-01 +/- 1.4e-02  |  9.81e-01 +/- 2.4e-02  |
 test_equilibrium_init_lowres            |     -0.65 +/- 1.75     | -2.54e-02 +/- 6.88e-02 |  3.92e+00 +/- 4.0e-02  |  3.94e+00 +/- 5.6e-02  |
 test_objective_compile_atf              |     +0.72 +/- 3.91     | +5.86e-02 +/- 3.18e-01 |  8.19e+00 +/- 6.2e-02  |  8.13e+00 +/- 3.1e-01  |
 test_objective_compute_atf              |     +0.44 +/- 3.08     | +7.02e-05 +/- 4.90e-04 |  1.60e-02 +/- 3.7e-04  |  1.59e-02 +/- 3.2e-04  |
 test_objective_jac_atf                  |     +0.91 +/- 2.53     | +1.79e-02 +/- 4.97e-02 |  1.99e+00 +/- 4.0e-02  |  1.97e+00 +/- 2.9e-02  |
 test_perturb_1                          |     +0.05 +/- 1.07     | +8.17e-03 +/- 1.61e-01 |  1.50e+01 +/- 8.0e-02  |  1.49e+01 +/- 1.4e-01  |
 test_proximal_jac_atf                   |     -0.32 +/- 0.72     | -2.65e-02 +/- 5.97e-02 |  8.26e+00 +/- 4.4e-02  |  8.29e+00 +/- 4.1e-02  |
 test_proximal_freeb_compute             |     +0.18 +/- 1.14     | +3.71e-04 +/- 2.29e-03 |  2.02e-01 +/- 2.0e-03  |  2.01e-01 +/- 1.2e-03  |
+test_solve_fixed_iter_compiled          |     -7.13 +/- 1.19     | -1.58e+00 +/- 2.63e-01 |  2.05e+01 +/- 1.1e-01  |  2.21e+01 +/- 2.4e-01  |

@f0uriest
Copy link
Member

isn't this the same as the cho option but using LU instead of cholesky?

@YigitElma
Copy link
Collaborator Author

YigitElma commented Nov 13, 2024

isn't this the same as the cho option but using LU instead of cholesky?

Yes, pretty similar. jax.scipy.linlag.solve with symmetric A should call LAPACK ?sysv (docs), and it is something like LU. For finding alpha, I used Secant method.

Cholesky is also really fast (maybe faster than this one), but maybe this is more accurate, that's why I wanted to test it. If I remember correctly the problem with Cholesky was the accuracy.

@f0uriest
Copy link
Member

Generally the innaccuracy comes from explitly forming J^T J when J is poorly conditioned, so I don't think you'll see any major improvements with this. Also cholesky should be faster.

@dpanici
Copy link
Collaborator

dpanici commented Nov 20, 2024

run on multiple threads

Copy link

codecov bot commented Nov 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.61%. Comparing base (93d2564) to head (fad3062).
Report is 43 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1363      +/-   ##
==========================================
- Coverage   95.61%   95.61%   -0.01%     
==========================================
  Files          98       98              
  Lines       25420    25417       -3     
==========================================
- Hits        24306    24303       -3     
  Misses       1114     1114              
Files with missing lines Coverage Δ
desc/examples/__init__.py 100.00% <100.00%> (ø)
desc/optimize/tr_subproblems.py 99.43% <100.00%> (-0.02%) ⬇️

... and 3 files with indirect coverage changes

@@ -1414,7 +1414,8 @@ def test_optimize_coil_currents(DummyCoilSet):
verbose=2,
copy=True,
)
# check that optimized coil currents changed by more than 15% from initial values
# check that on average optimized coil currents changed by more than
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#1406 made the same change I used here, the change of comment is from my version.

dpanici
dpanici previously approved these changes Dec 13, 2024
@YigitElma YigitElma marked this pull request as draft December 14, 2024 19:03
@YigitElma YigitElma marked this pull request as ready for review December 17, 2024 03:51
@YigitElma
Copy link
Collaborator Author

YigitElma commented Dec 17, 2024

It is a bit hard to give proper speed improvement from this PR. It depends on the case. For example, currently our longest regression test, test_ATF_results used to take 4.5 mins on my laptop with GPU, now it takes 3mins. The final equilibrium is slightly different around the axis but the force error is lower.

eq0 = get("ATF")
eq = Equilibrium(
    Psi=eq0.Psi,
    NFP=eq0.NFP,
    L=eq0.L,
    M=eq0.M,
    N=eq0.N,
    L_grid=eq0.L_grid,
    M_grid=eq0.M_grid,
    N_grid=eq0.N_grid,
    pressure=eq0.pressure,
    iota=eq0.iota,
    surface=eq0.get_surface_at(rho=1),
    sym=eq0.sym,
    spectral_indexing=eq0.spectral_indexing,
)
eqf = EquilibriaFamily.solve_continuation_automatic(
    eq, verbose=2, jac_chunk_size=500
)

ATF_continuation_pr_vs_master

Force error after pr: 1.1110e-04
Force error after master: 2.7513e-04

However, the HELIOTRON test is not affected as much as ATF. 2min 42 sec to 2min 30sec.

@YigitElma YigitElma changed the title Update trust_region_step_exact_qr Update trust_region_step_exact_ solvers Dec 17, 2024
@dpanici
Copy link
Collaborator

dpanici commented Dec 17, 2024

It is a bit hard to give proper speed improvement from this PR. It depends on the case. For example, currently our longest regression test, test_ATF_results used to take 4.5 mins on my laptop with GPU, now it takes 3mins. The final equilibrium is a bit different but the force error is lower. ATF_continuation_pr_vs_master

Force error after pr: 1.1110e-04
Force error after master: 2.7513e-04

However, the HELIOTRON test is not affected as much as ATF. 2min 42 sec to 2min 30sec.

was the difference in ATF due to some bug?

@YigitElma
Copy link
Collaborator Author

was the difference in ATF due to some bug?

I don't think it is a bug. Just slight numerical differences cause small discrepancy around axis region of ATF.

@YigitElma YigitElma merged commit 34dbac0 into master Dec 18, 2024
25 checks passed
@YigitElma YigitElma deleted the yge/tr_direct branch December 18, 2024 23:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance New feature or request to make the code faster skip_changelog No need to update changelog on this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants