Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Restore bit repro using FMA in selected runs
This patch modifies select calculations of PR#1616 in order to preserve bit reproducibility when FMA optimization is enabled. We add parentheses and reorder terms in selected expressions which either direct or suppress FMAs, ensuring equivalence with the previous release. We address two specific equations in the PR. The first is associated with vertical friction coupling coupling coefficient. The diff is shown below. - a_cpl(i,K) = Kv_tot(i,K) / (h_shear*GV%H_to_Z + I_amax*Kv_tot(i,K)) + a_cpl(i,K) = Kv_tot(i,K) / (h_shear + I_amax*Kv_tot(i,K)) The denominator is of the form `a*b + c*d`. A compiler may favor an FMA of the form `a*b + (c*d)`. However, the modified equation is of form which favors the `a + c*d` FMA. Each form gives different results in the final bits. We resolve this by expliciting wrapping the RHS in parentheses: a_cpl(i,K) = Kv_tot(i,K) / (h_shear + (I_amax*Kv_tot(i,K))) Although this disables the FMA, it produces the same bit-equivalent answer as the original expression. ---- The second equation for TKE due to kappa shear is shown below. - tke_src = dz_Int(K) *(((kappa(K) + kappa0)*S2(K) - kappa(k)*N2(K)) - & - (TKE(k) - q0)*TKE_decay(k)) - & + tke_src = h_Int(K) * (dz_h_Int(K)*((kappa(K) + kappa0)*S2(K) - kappa(k)*N2(K)) - & + (TKE(k) - q0)*TKE_decay(k)) - & ... The outer equation was of the form `b + c` but is promoted to `a*b + c`, transforming it to an FMA. We resolve this by suppressing this FMA optimization: tke_src = h_Int(K) * ((dz_h_Int(K) * ((kappa(K) + kappa0)*S2(K) - kappa(k)*N2(K))) - & (TKE(k) - q0)*TKE_decay(k)) - & ... ---- The following two changes are intended to be the smallest modification which preserves answers for known testing on target compilers. It does not encompass all equation changes in this PR. If needed, we could extend these changes to similar modifications of PR#1616. We do not expect to support bit reproducibility when FMAs are enabled. But this is an ongoing conversation, and the rules around FMAs should be expected to change as we learn more and agree on rules of reproducibility.
- Loading branch information