Key points extraction | Understand JVM program compilation & code optimization

Key points extraction | Understand JVM program compilation & code optimization

This article will introduce the code optimization methods during program compilation, divided into two stages:

  • Overview
  • Early (compile time) optimization
  • Late (runtime) optimization

1. Overview

a. Due to different understandings of the compilation period of the Java language, several periods can be divided:

  • Front-end compiler
    • Role: Convert Java code into bytecode
    • Representative: Sun's Javac, Incremental Compiler (ECJ) in Eclipse JDT
    • The optimization during this period is mainly used to improve the coding efficiency of the program
  • Back-end runtime compiler/JIT compiler
    • Role: Convert bytecode to local machine code
    • Representative: C1, C2 compiler of HotSpot VM
    • The optimization during this period is mainly used to improve the operating efficiency of the program
  • Static Ahead Compiler/AOT Compiler
    • Function: directly compile Java code into local machine code
    • Representative: GNU Compiler for the Java (GCJ), Excelsior JET

b. Comparison of Java Just-in-Time Compiler and C/C++ Static Compiler

  • Just-in-time compilers need to take up the running time of the program, making optimization methods subject to compilation costs, otherwise the user will notice a significant delay in starting the program; the compilation time cost of static compilers is not the focus
  • All optimizations of static compilers are completed during compile time, while the dynamic nature of just-in-time compilers is a double-edged sword. On the one hand, it requires frequent dynamic checks of the virtual machine, which consumes a lot of running time, and it is difficult to optimize globally and can only be achieved by radical optimization. Completed, on the other hand, it has optimization measures for runtime performance monitoring, such as call frequency prediction, branch frequency prediction, and clipping of unselected branches, etc.
  • The frequency of using virtual methods in Java is much greater than that of C/C++, which means that the frequency of polymorphic selection of method receivers is greater at runtime, so it will be more difficult to perform certain optimizations
  • Java allocates memory for objects on the heap, while C/C++ can allocate on the heap and stack, reducing the pressure of memory recovery; and in C/C++, the memory is mainly recovered by user program code, and there is no screening of useless objects. It is more efficient than the garbage collection mechanism

2. Early (compile time) optimization

Almost all languages provide some syntactic sugar to facilitate development, or to improve efficiency, or to improve the rigor of syntax, or to reduce the chance of coding errors. Here are a few common syntactic sugars:

  • Generics and type erasure
    • C#'s generics are true generics : they exist in the program source code, compiled IL, or runtime CLR. List<int> and List<String> are generated during system runtime and have their own virtual Method table and type data belong to different types. This realization is called type expansion
    • Java's generics are pseudo-generics : they only exist in the source code of the program, and they have been replaced with native types in the compiled bytecode file, and the mandatory conversion code has been inserted in the corresponding place, so ArrayList<int> and ArrayList <String> is the same class, this implementation is called type erasure
  • Automatic packing and unpacking
  • Traversal loop
  • Conditional compilation : Use if statements with constant conditions

More Java Syntactic Sugar Series


3. Late (runtime) optimization

a. The HotSpot virtual machine adopts an architecture in which an interpreter and a compiler coexist , and the interaction situation:

  • When the program needs to be started and executed quickly, the interpreter can play a role first, thereby saving compilation time
  • After the program runs, as time goes by, the compiler gradually comes into play, compiling more code into local code, so as to obtain higher execution efficiency
  • If the program running environment is greatly restricted by memory resources, you can use interpreted execution to save memory, otherwise you can use compilation and execution to improve efficiency
  • The interpreter can be used as an escape door for the compiler's radical optimization . When the radical optimization is not established, such as the type inheritance structure changes after loading a new class, and a rare trap occurs, it can be returned to the interpreted state through reverse optimization to continue execution. As shown in the figure:

As can be seen from the above figure, there are two built-in just-in-time compilers in the HotSpot virtual machine: Client Compiler (C1 compiler and) and Server Compiler (C2 compiler), with mode:

  • Mixed mode (Mixed Mode): defaults interpreter and wherein a compiler cooperating virtual automatically select the operating mode and a compiler according to its hardware performance version of the host machine, the user can use -clientor -serverparameters to enforce the virtual machine is running In Client mode or Server mode.
  • Interpretation mode (Interpreted Mode): parameter -Xint, the compiler does not work, are performed using interpreted.
  • Compilation mode (Compiled Mode): parameter -Xcomp, give priority to the implementation of the compiler way, but the interpretation is still involved in the implementation process in the case of compilation can not be carried out.

b. HotSpot just-in-time compiler compilation target: hot code

  • classification:
    • Method that is called multiple times : JIT compilation method is adopted, and the entire method is used as the compilation object
    • Loop body that has been executed multiple times : using OSR compilation method, which occurs during the execution of the method, and the entire method is still used as the compilation object
  • Judgment method: through hot spot detection
    • Sampling-based hot probe (Sample Based Hot Spot Detection): periodic check of each thread stack, often appear in the top of the stack method is to hot
      • Benefits: simple, efficient, and easy to obtain method call relationships
      • Disadvantages: It is difficult to accurately confirm the popularity of a method, and it is susceptible to thread blocking or external influences to disrupt hot spot detection
    • Based on Hot Spot Detection counter establish the number of executions counter statistics for each method, the number of executions exceeds a certain threshold value is the hot method: (Counter Based Hot Spot Detection)
      • Advantages: precise and precise
      • Disadvantages: the implementation is troublesome, and the calling relationship of the method cannot be directly obtained
      • Counter type:
        • Method invocation counter (Invocation Counter): count the number of times a method is called, when the counter exceeds the threshold, it will trigger JIT compilation
        • Back side counter (Back Edge Counter): the number of times the loop body code execution statistical methods, when the counter exceeds the threshold triggers OSR compilation

c. The compilation process of HotSpot Just-in-Time Compiler

  • Client Compiler: Mainly perform local optimization and give up the time-consuming global optimization. Use simple and fast three-stage compilation:
    • The first stage: A platform-independent front-end constructs the bytecode into a high-level intermediate code representation (HIR). Before that, some basic optimizations are completed on the bytecode, such as method inlining and constant propagation
    • The second stage: A platform-related backend generates a low-level intermediate code representation (LIR) from HIR. Before that, other optimizations will be completed on HIR, such as null check elimination, range check elimination, etc., so that HIR can reach More efficient code representation
    • The third stage: the platform-related back-end uses a linear scan algorithm to allocate registers on the LIR, and do peephole optimization on the LIR, and then generate machine code. The general execution process is shown in the figure:

  • Server Compiler: It is specifically oriented to typical server-side applications and has been specifically adjusted for server-side performance configuration. It is a fully optimized high-level compiler, which is reflected in:
    • Will perform all classic optimization actions: such as useless code elimination, loop unrolling, loop expression extraction, elimination of common sub-expressions, constant propagation, basic block reordering, etc.
    • Will implement optimization techniques closely related to Java features: such as range check elimination, null check elimination, etc.
    • According to the performance monitoring information provided by the interpreter or Client Compiler, some unstable radical optimizations may be carried out: such as guarding inline, branch frequency prediction, etc.

In addition, Server Compiler's register allocator is a global graph shading allocator, which can make full use of the large register set on some processor architectures. Although the compilation time of Server Compiler is relatively slow, its compilation speed is much higher than that of traditional static optimization compilers, and the quality of the compiled output code is higher than that of Client Compiler, which can reduce the execution time of native code, thereby offsetting the additional compilation time Overhead.

d. Code optimization technology used by HotSpot virtual machine just-in-time compiler when generating code :

Among the most representative optimization techniques:

  • One of the classic language-independent optimization technology: common subexpression elimination (Common Subexpression Elimination)
    • Meaning: If an expression E has been calculated and all the variable values in E have not changed in any way, then E is called a common sub-expression. At this time, there is no need to spend time calculating again, and directly replace E with the result of the previously calculated expression Can
    • Types of:
      • Local common sub-expression elimination: optimization is limited to the basic block of the program
      • Global common sub-expression elimination: the scope of optimization covers multiple basic blocks
  • Language related to one of the classic optimization techniques: array bounds checking elimination (Array Bounds Checking Elimination)
    • If the array subscript is a constant, as long as the length of the array is determined according to data flow analysis during compile time, and it is judged that the array subscript is not out of bounds, then there is no need to check again at runtime
    • If the array access occurs in a loop and the loop variable is used for array access, as long as the value range of the loop variable is always in the interval [0, array length) according to the data flow analysis during compile time, then there is no need to do it in the entire loop Check multiple times
  • One of the most important optimization techniques: method inlining (Method Inlining)
    • Meaning: Copy the code of the target method
    • In the method that initiates the call, avoid the actual method call
    • The main purpose: to remove the cost of method invocation, such as the establishment of a stack frame, etc.; to establish a good foundation for other optimizations, so that subsequent optimization methods can be adopted on a larger scale to obtain better optimization results
  • One of the most cutting-edge optimization techniques: escape analysis (Escape Analysis)
    • Basic behavior: Analyze the dynamic scope of the object
    • Types of:
      • Method escape : After an object is defined in a method, it may be referenced by an external method. Such as passing as a call parameter to other methods
      • Thread escape : After an object is defined in a method, it can be accessed by an external thread. Such as assigning to class variables or instance variables that can be accessed in other threads
    • Optimization methods that can be performed on objects that can prove that they will not escape to methods or threads:
      • Allocated on the stack (Stack Allocation): memory allocation for the object on the stack, at which point the memory space occupied by the object from the stack with the stack frame would be destroyed, reducing the pressure of the refuse collection system
      • Synchronization elimination (Synchronization Elimination): on the object will not have reading and writing competition, can eliminate the object of synchronous measures to reduce the consumption of resources
      • Scalar replacement (Scalar Replacement): If the object can be further broken down, then create it directly to several of the methods to be used to replace the member variables