% !TEX root = thesis.tex
\section{Towards a Provably Sound On-Stack Replacement}
\label{se:osr-a-la-carte}
We have seen that OSR is employed in modern adaptive compilation systems to dynamically switch between different versions of a function depending on the program's run-time state. Traditionally, code optimizers are responsible for marking the points where such transitions can take place and for generating the metadata or ad-hoc code required to bring the program state to a correct resumption point. OSR is usually found at the core of the large and complex JIT compilers employed by popular production virtual machines. The engineering effort needed to implement OSR in a language runtime can be daunting, making the technique rarely accessible to the research community.
\paragraph*{Contributions.} In this thesis we investigate how to provide VM builders with a rich ``menu'' of possible program points where OSR can safely occur, relieving code optimizers of the burden of generating all the machinery required to realign the program state during an OSR transition.
To capture OSR in its full generality, we define a notion of {\em multi-program}, which is a collection of different versions of a program along with support for dynamically transferring execution between them. Execution in a multi-program starts from a designated base version. At any time, an oracle decides whether execution should continue in the current version or an OSR transition should divert it to a different version, thus modeling any conceivable OSR-firing strategy.
One of the goals of our work is to characterize sufficient conditions for a multi-program to be {\em deterministic}, i.e., to yield the same result regardless of the oracle's decisions. This captures the intuitive idea that a sequence of OSR transitions is {\em correct} if it does not alter the intended semantics of the program, as is the case when the different versions are generated by applying semantics-preserving compiler transformations.
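To fix intuition, using notation that is merely illustrative here (the formal definitions come later in this chapter): writing $\mysem{\Pi}_{O}(\sigma)$ for the outcome of running a multi-program $\Pi$ from its base version on an initial store $\sigma$ under oracle $O$, determinism amounts to requiring
\[
\mysem{\Pi}_{O}(\sigma) = \mysem{\Pi}_{O'}(\sigma) \qquad \text{for every pair of oracles } O, O' \text{ and every store } \sigma.
\]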
In our formalization, we distill the essence of OSR to an abstract program morphing problem over a simple imperative calculus with an operational semantics.
Using program bisimulation, we prove that an OSR can correctly divert execution from one program version to another if the two versions are {\em live-variable bisimilar}, i.e., the live variables they have in common hold equal values at corresponding execution states. As prominent examples of how bisimulation can be used to prove this property, we consider classic optimizations that eliminate or move code around, such as dead code elimination, constant propagation, and code hoisting.
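As a minimal illustration, written in an informal imperative syntax, consider dead code elimination applied to a fragment where $\mathtt{x}$ is never used:
\[
\pi:\ \mathtt{x := 1;\ y := 2;\ return\ y}
\qquad\qquad
\pi':\ \mathtt{y := 2;\ return\ y}
\]
At the corresponding states reached after the assignment to $\mathtt{y}$, the only live variable the two versions share is $\mathtt{y}$, and it holds the value $2$ in both; hence $\pi$ and $\pi'$ are live-variable bisimilar even though their stores differ on the dead variable $\mathtt{x}$.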
We show how to construct the OSR machinery by devising an algorithm that automatically generates compensation code to reconstruct the values of the variables that are live at the OSR target but not at the source. We make single transformations OSR-aware in isolation, and then flexibly combine them by exploiting the composability of compensation code. Finally, we discuss an implementation of these ideas in LLVM.
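The composability property can be previewed as follows, again with illustrative notation: if a first transformation rewrites $\pi$ into $\pi'$ with compensation code $\chi_1$ for a transition from a point in $\pi$ to its counterpart in $\pi'$, and a second transformation rewrites $\pi'$ into $\pi''$ with compensation code $\chi_2$, then, wherever both transitions are defined, a direct transition from $\pi$ to $\pi''$ can use compensation code whose semantics is the functional composition
\[
\mysem{\chi_2} \circ \mysem{\chi_1},
\]
so that each pass only needs to be made OSR-aware with respect to the version it receives as input.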
\input{osr-language}
\input{osr-language-reasoning}
\subsection{OSR Framework}
OSR consists in dynamically transferring execution from a point $l$ in a program $\pi$ to a point $l'$ in a program $\pi'$ so that execution can transparently continue in $\pi'$ without altering the original intended semantics of $\pi$. To model this behavior, we assume there exists a function that maps each point $l$ in $\pi$ where OSR can safely be fired to the corresponding point $l'$ in $\pi'$ from which execution can continue.
As we observed in \mysection\ref{ss:osr-llvm-approach}, OSR implementations often make the conservative assumption that $\pi'$ can always continue from the very same memory store as $\pi$. However, this assumption may reduce the number of points where sound OSR transitions can be fired. To overcome this limitation and support more aggressive OSR transitions, our model includes a {\em store compensation code} $\chi$ to be executed during an OSR transition from point $l$ in $\pi$ to point $l'$ in $\pi'$. The goal of the compensation code is to fix the memory store of $\pi$ at $l$ so that execution can safely continue in $\pi'$ from $l'$ with the fixed store. Note that, if no compensation is needed for an OSR transition, $\mysem{\chi}$ is simply the identity function. We formalize these concepts in the next sections.
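To fix intuition before the formal treatment begins, a small example in informal syntax conveys the idea. Suppose constant propagation rewrites $\pi:\ \mathtt{x := 4;\ y := x+1;\ \ldots;\ z := x*2}$ into $\mathtt{x := 4;\ y := 5;\ \ldots;\ z := 8}$, after which dead code elimination removes the now-useless $\mathtt{x := 4}$, yielding $\pi':\ \mathtt{y := 5;\ \ldots;\ z := 8}$. Consider an OSR transition fired in $\pi'$ after the assignment to $\mathtt{y}$ toward the corresponding point in $\pi$: there, $\mathtt{x}$ is live (it is later read by $\mathtt{z := x*2}$), yet the store of $\pi'$ holds no value for it. The compensation code $\chi = \mathtt{x := 4}$ reconstructs the missing value, after which execution can safely continue in $\pi$.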
\input{osr-mapping}
\input{osr-mapping-correctness}
\input{osr-lve-examples}
\input{osr-composition}
\input{osr-multiprogram}
\input{buildcomp-llvm}
\subsection{Discussion}
The techniques described in the previous pages represent a first step towards a provably sound methodological framework for on-stack replacement. OSR is not only a great engineering problem, but also an intellectually challenging endeavor. We think that our formalization, by distilling OSR to an abstract program morphing problem, may make it easier to prototype better continuous optimizations.
A key innovation we introduce is the ability to make single passes OSR-aware in isolation, and then flexibly combine them in any order by exploiting the composability of compensation code. Think, for instance, of applying a speculative optimization (e.g., aggressive inlining), followed by further passes enabled by that optimization. Without a principled approach to composing compensation code, dynamically jumping to a safe version by undoing all the applied optimizations when an assumption fails becomes a daunting engineering task that only production runtimes such as HotSpot dare to pursue. Demystifying OSR would allow a broader research community to contribute to this area.
Ideally, the presence of an OSR point should be completely transparent, without slowing down the code in any way. However, this may not always be possible, as compensation code may require portions of the program state to be logged during execution. Our work lies at the performance-preserving end of the spectrum: we do not impose any barriers to optimizations, which run unhindered, and we do not require any state logging. We assess the practical impact of this design choice in \mysection\ref{se:eval-OSR-alC}, where we experimentally analyze, on prominent benchmarks, the fraction of program locations where OSR can be efficiently fired in the presence of typical optimization passes from the LLVM compiler toolchain.
A deep understanding of OSR's flexibility/performance trade-offs nonetheless remains a compelling goal. How can we perform fine-grained OSR transitions across transformations that significantly change the structure of a program, as aggressive loop optimizations do? How can we handle situations where the landing location of an OSR transition may not be unique, as in software pipelining~\cite{Kundu09}?
\input{osr-correctness-related}