-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathintroduction.tex
64 lines (37 loc) · 11.1 KB
/
introduction.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
% !TEX root = thesis.tex
\chapter{Introduction}
Translating programming languages into a form that can {\em efficiently} execute on a target platform is a very challenging problem for computer scientists. Historically, there are two approaches to translation: interpretation and compilation. An interpreter reads the source code of a program, stepping through its expressions to determine which operation to perform next. A compiler instead translates a program into a form that is more amenable to execution, analyzing its source code only once and generating code that would give the same effects as interpreting it.
The two approaches have different benefits in terms of execution speed, portability, footprint, and optimization opportunities. Compiled programs typically execute faster, as a compiler can devote an arbitrary amount of time to {\em static} (i.e., prior to run-time) code analysis and optimization. On the other hand, an interpreter can access run-time information such as taken control-flow, input parameter values, and variable types, thus enabling optimizations that static compilation would miss. Indeed, this information may be subject to changes across different runs, or may not be obtainable in general solely through source code inspection.
Additionally, the evolution of programming languages over the years has provided software developers with a plethora of useful features such as dynamic typing and class loading, reflection, and closures that may hinder efficient code generation in a static compiler. In response, industry and academia have significantly invested in {\em adaptive optimization} technology, which consists in observing the run-time behavior of a program in order to drive optimization decisions.
\section{Context and Motivations}
The past two decades have witnessed the widespread adoption of programming languages designed to run on {\em application virtual machines} (VMs). Compared to statically compiled software, these execution environments provide several advantages from a software engineering perspective, including portability, automatic memory and concurrency management, safety, and ease of implementation for {\em dynamic} features of a programming language such as adding new code, extending object definitions, and modifying the type system.
Application virtualization technology has been brought to the mainstream market by the Java programming language and later by the Common Language Runtime for the execution of .NET programs. Virtual machines are nowadays available for many popular languages, including JavaScript, MATLAB, Python, R, and Ruby.
Modern virtual machines typically implement a mixed-mode execution environment, in which an interpreter is used for executing portions of a program until it becomes profitable to compile them through {\em just-in-time} (JIT) compilation and continue the execution in native code. For efficiency reasons, source code is usually translated into an {\em intermediate representation} (IR) - also known as {\em bytecode} - that is easier to analyze and process. Multiple levels of JIT compilation are possible, each with a different trade-off between compilation time and expected code quality.
Adaptive optimization technology is a central element for the performance of runtime systems. JIT compilation indeed does not come for free: a virtual machine should be able to exploit run-time information to tailor optimized code generation to the current workload, so that the expected performance gains can counterbalance the overhead coming from collecting the profile and performing the optimizations.
%a virtual machine should be able to exploit run-time information to tailor optimized code generation to the current workload in order to achieve effective performance improvements.
Analyzing the run-time behavior of a program is useful also in the context of statically compiled code. {\em Profile-guided optimization} (PGO) techniques adopt a dual-compilation model in which a program is compiled and executed on representative input sets during an initial training stage, and is eventually recompiled using feedback information to generate the final optimized version.
\section{Addressed Problems}
Collecting accurate profiling information with a minimal impact on a running program is a key factor for an effective deployment of adaptive optimization techniques. In principle, developers and VM builders may leverage hardware performance counters provided by modern processors to collect low-level profiling information with no impact on the performance of a running program. However, the difficulty in mapping low-level counter data to high-level constructs such as classes and objects discourages their use for implementing complex analyses in runtime systems.
In the past three decades many sophisticated software-based techniques have been proposed for collecting fine-grained information regarding individual statements, objects, or control-flow paths. These techniques are typically based on program instrumentation, sampling, or a combination of both. For some problems, however, the size of the domain can be particularly large and extant techniques do not scale well or may even run out of space when analyzing real-world programs. In this thesis we investigate how data structure-based techniques from the algorithmic community can be used to devise efficient performance profiling tools. % dire dell'aspetto di focalizzare l'attenzione sul codice da ottimizzare
Another key ingredient for adaptive optimization is the ability of a runtime to divert the execution to the newly generated optimized code {\em while} the original function is executing. In fact, in the presence of long-running methods it is not sustainable for a VM to wait for the original function to complete and let only subsequent invocations run the optimized version. The problem of handling transitions between different compiled versions is formally known as {\em On-Stack Replacement} (OSR). Modern VMs implement OSR techniques to dynamically replace code with more optimized one, and also to invalidate aggressive, speculative optimizations and continue in a safe code version when an assumption made at compile time no longer holds.
\noindent Supporting OSR in a runtime system raises a number of fundamental questions. What is the set of program points at which OSR can safely occur, and how is it affected by compiler optimizations? Can we guarantee the soundness of an OSR transition? What is the impact of OSR machinery on running code?
%In this thesis, we address these questions from both a methodological and a practical perspective.
\section{Contributions of the Thesis}
The contributions of this thesis aim at covering both methodological and practical aspects of the adaptive optimization cycle, ranging from performance profiling to continuous program optimization through online code generation. Methodological contributions of this thesis include:
\begin{itemize}
\item an {\em interprocedural} analysis to identify the most frequently encountered calling context across function invocations, based on data streaming algorithms that enable a reduction of space usage by orders of magnitude without sacrificing accuracy or performance;
\item an {\em intraprocedural} analysis to identify cyclic paths taken across the control-flow graph of a function, overcoming the limitations of previous approaches and enabling the profiling of very long cyclic paths using efficient data structures;
%\item a new abstraction for on-stack replacement based on compensation code, to enable OSR at places that previous approaches do not support as they expect a new function to resume from the very same state of the original function;
\item a new abstraction for on-stack replacement based on compensation code, to enable OSR at places that previous approaches do not support as they expect a new function to resume execution from the very same program state;
\item a first step towards a provably sound methodological framework for OSR, identifying sufficient conditions to determine the set of points where OSR can safely occur and devising an algorithm to automatically generate compensation code in the presence of certain classes of compiler optimizations.
\end{itemize}
\noindent We evaluate the practicability of all our techniques through extensive experimental studies on both industrial-strength benchmark and real-world applications. We also present three case studies that explore the end-to-end utility of the presented ideas.
All our techniques have been implemented as libraries for mainstream systems, including the \gcc\ compiler, Jikes RVM, and the LLVM compiler infrastructure, and their source code is publicly available. To back our results and to empower other researchers to build upon the contributions of our work, we submitted two software artifacts that have been reviewed and endorsed by the scientific community.
%We believe that some techniques presented in this thesis might have some technology transfer potential in VM construction; private conversations we had with developers from major ICT players about our OSR library for LLVM have been quite encouraging in this sense.
We believe that some techniques presented in this thesis might have some technology transfer potential in VM construction; private conversations we had with developers from major ICT players about our OSR library for LLVM have been quite encouraging in this sense. We are aware that our tools are currently being used in a joint academic-industrial research project for the optimization of the runtime for the R language~\cite{vitek16}.
\section{Structure of the Thesis}
The rest of this thesis is structured as follows. \mychapter\ref{ch:literature} reviews the state of the art for adaptive program optimization technology. \mychapter\ref{ch:profiling} presents our intra- and inter-procedural techniques for performance profiling, while \mychapter\ref {ch:continuous} tackles the on-stack replacement problem to support better continuous program optimizations. \mychapter\ref{ch:experiments} illustrates the results of our experimental studies. \mychapter\ref{ch:case-studies} presents examples of applications of our techniques in program optimization and debugging. Conclusions and possible directions for future work are discussed in \mychapter\ref{ch:conclusions}.
\paragraph*{Declaration} This thesis is a presentation of original work of its author. The work was done under the guidance of Prof. Camil Demetrescu, in conjunction with Prof. Irene Finocchi in the early stage of the doctoral program, at Sapienza University of Rome. The ideas and the results presented in this work, with the exception of \Cref{se:osr-a-la-carte,se:eval-OSR-alC,se:CS-debug} which are yet unpublished, have appeared in conference proceedings and scientific journals~\cite{Delia11,Delia13,Delia15,Delia16}.
%This thesis tackles problems arising while analyzing the behavior of a running program, within and across the boundaries of a procedure,
%especially when the target form is directly executable on hardware. Interpreters on the other hand can
%Modern interpreters translate source code into an intermediate representation that is easier to work with, and can optionally perform optimizations based on the current workload.