
Get Your Hands Dirty With Scala JVM Bytecode

Sergey Reznick

Sergey is an experienced software developer with a background in computer science and basic algorithms and over 20 years of experience.


The Scala language has continued to gain popularity over the last several years, thanks to its excellent combination of functional and object-oriented software development principles, and its implementation on top of the proven Java Virtual Machine (JVM).

Although Scala compiles to Java bytecode, it is designed to improve on many of the perceived shortcomings of the Java language. Offering full functional programming support, Scala’s core syntax contains many implicit structures that have to be built explicitly by Java programmers, some involving considerable complexity.

Creating a language that compiles to Java bytecode requires a deep understanding of the inner workings of the Java Virtual Machine. To appreciate what Scala’s developers have accomplished, it is necessary to go under the hood, and explore how Scala’s source code is interpreted by the compiler to produce efficient and effective JVM bytecode.

Let’s take a look at how all this stuff is implemented.

Prerequisites

Reading this article requires some basic understanding of Java Virtual Machine bytecode. The complete virtual machine specification can be obtained from Oracle’s official documentation. Reading the whole spec is not critical for understanding this article, so, for a quick introduction to the basics, I’ve prepared a short guide at the bottom of the article.

A utility is needed to disassemble the Java bytecode to reproduce the examples provided below, and to proceed with further investigation. The Java Development Kit provides its own command line utility, javap, which we will use here. A quick demonstration of how javap works is included in the guide at the bottom.

And of course, a working install of the Scala compiler is necessary for readers who want to follow along with the examples. This article was written using Scala 2.11.7. Different versions of Scala may produce slightly different bytecode.

Default Getters and Setters

Although Java convention calls for getter and setter methods rather than public attributes, Java programmers are required to write these themselves, even though the pattern for each has not changed in decades. Scala, in contrast, provides default getters and setters.

Let’s look at the following example:

class Person(val name:String) {
}

Let’s take a look inside the class Person. If we compile this file with scalac, then running $ javap -p Person.class gives us:

Compiled from "Person.scala"
public class Person {
  private final java.lang.String name;   // field
  public java.lang.String name();        // getter method
  public Person(java.lang.String);       // constructor
}

We can see that for each val declared in the Scala class, a backing field and its getter method are generated. The field is private and final, while the method is public.

If we replace val with var in the Person source and recompile, then the field’s final modifier is dropped, and the setter method is added, as well:

Compiled from "Person.scala"
public class Person {
  private java.lang.String name;            // field
  public java.lang.String name();           // getter method
  public void name_$eq(java.lang.String);   // setter method
  public Person(java.lang.String);          // constructor
}

If any val or var is defined inside the class body, then the corresponding private field and accessor methods are created, and initialized appropriately upon instance creation.

Note that such an implementation of class level val and var fields means that if some variables are used at the class level to store intermediate values, and are never accessed directly by the programmer, initialization of each such field will add one to two methods to the class footprint. Adding a private modifier for such fields does not mean the corresponding accessors will be dropped. They will just become private.
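To make the accessor naming concrete, here is a minimal sketch. The class MutablePerson and the object AccessorDemo are hypothetical names introduced only for this illustration. It shows that field-style reads and writes in Scala source go through the generated methods, and that the setter can also be called under its Scala name, name_=, which appears as name_$eq in the bytecode.

```scala
// Hypothetical class for illustration: the article's Person, declared with var.
class MutablePerson(var name: String)

object AccessorDemo {
  def main(args: Array[String]): Unit = {
    val p = new MutablePerson("Alice")

    // A field-style read is compiled to a call to the getter name().
    println(p.name)

    // A field-style assignment is compiled to a call to the setter name_$eq(String).
    p.name = "Bob"

    // The setter can also be invoked explicitly under its Scala name, name_=.
    p.name_=("Carol")
    println(p.name)
  }
}
```

Running javap -p on the compiled class should list the same three members shown above: the private field, the getter, and the name_$eq setter.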

Variable and Function Definitions

Let’s assume that we have a method, m(), and create three different Scala-style references to this function:

class Person(val name:String) {
    def m(): Int = {
      // ...
      return 0
    }

    val m1 = m
    var m2 = m
    def m3 = m
}

How is each of these references to m constructed? When does m get executed in each case? Let’s take a look at the resulting bytecode. The following output shows the results of javap -v Person.class (omitting a lot of superfluous output):

Constant pool:
  #22 = Fieldref           #2.#21         // Person.m1:I
  #24 = Fieldref           #2.#23         // Person.m2:I
  #30 = Methodref          #2.#29         // Person.m:()I
  #35 = Methodref          #4.#34         // java/lang/Object."<init>":()V

  // ...

  public int m();
    Code:
         // other methods refer to this method
         // ...

  public int m1();
    Code:
         // get the value of field m1 and return it
         0: aload_0
         1: getfield      #22                 // Field m1:I
         4: ireturn

  public int m2();
    Code:
         // get the value of field m2 and return it
         0: aload_0
         1: getfield      #24                 // Field m2:I
         4: ireturn

  public void m2_$eq(int);
    Code:
         // get the value of this method's input argument
         0: aload_0
         1: iload_1

         // write it to the field m2 and return
         2: putfield      #24                 // Field m2:I
         5: return

  public int m3();
    Code:
         // execute the instance method m(), and return
         0: aload_0
         1: invokevirtual #30                 // Method m:()I
         4: ireturn

  public Person(java.lang.String);
    Code:
        // instance constructor ...

        // execute the instance method m(), and write the result to field m1
         9: aload_0
        10: aload_0
        11: invokevirtual #30                 // Method m:()I
        14: putfield      #22                 // Field m1:I

        // execute the instance method m(), and write the result to field m2
        17: aload_0
        18: aload_0
        19: invokevirtual #30                 // Method m:()I
        22: putfield      #24                 // Field m2:I

        25: return   

In the constant pool, we see that the reference to method m() is stored at index #30. In the constructor code, we see that this method is invoked twice during initialization, with the instruction invokevirtual #30 appearing first at byte offset 11, then at offset 19. The first invocation is followed by the instruction putfield #22 which assigns the result of this method to the field m1, referenced by index #22 in the constant pool. The second invocation is followed by the same pattern, this time assigning the value to the field m2, indexed at #24 in the constant pool.

In other words, assigning a method to a variable defined with val or var only assigns the result of the method to that variable. We can see that the methods m1() and m2() that are created are simply getters for these variables. In the case of var m2, we also see that the setter m2_$eq(int) is created, which behaves just like any other setter, overwriting the value in the field.

However, using the keyword def gives a different result. Rather than fetching a field value to return, the method m3() also includes the instruction invokevirtual #30. That is, each time this method is called, it then calls m(), and returns the result of this method.

So, as we can see, Scala provides three ways to work with class fields, and these are easily specified via the keywords val, var, and def. In Java, we would have to implement the necessary setters and getters explicitly, and such manually written boilerplate code would be much less expressive and more error-prone.
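The difference in evaluation timing can also be observed at runtime. The following is a small sketch, not taken from the compiler output above: the class EvalCounter and its call counter are assumptions added for illustration. The method m() counts its own invocations, so we can see that val and var call it once during construction, while def calls it on every access.

```scala
// Hypothetical class for illustration: m() counts how often it is invoked.
class EvalCounter {
  private var calls = 0

  def m(): Int = {
    calls += 1
    calls
  }

  val m1 = m()  // evaluated once, in the constructor; the result is stored in a field
  var m2 = m()  // evaluated once, in the constructor; the field can be reassigned later
  def m3 = m()  // no field at all: m() is invoked again on every access

  def callCount: Int = calls
}

object EvalCounterDemo {
  def main(args: Array[String]): Unit = {
    val c = new EvalCounter
    println(c.m1)         // value captured by the first constructor call to m()
    println(c.m2)         // value captured by the second constructor call to m()
    println(c.m3)         // triggers a fresh call to m()
    println(c.m3)         // triggers another fresh call to m()
    println(c.callCount)  // total number of calls so far
  }
}
```

The sketch writes m() with explicit parentheses, which newer Scala versions require for methods declared with an empty parameter list.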

Lazy Values

More complicated code is produced when declaring a lazy value. Assume we’ve added the following field to the previously defined class:

lazy val m4 = m

Running javap -p -v Person.class will now reveal the following:

Constant pool:
  #20 = Fieldref           #2.#19         // Person.bitmap$0:Z
  #23 = Methodref          #2.#22         // Person.m:()I
  #25 = Fieldref           #2.#24         // Person.m4:I
  #31 = Fieldref           #27.#30        // scala/runtime/BoxedUnit.UNIT:Lscala/runtime/BoxedUnit;
  #48 = Methodref          #2.#47         // Person.m4$lzycompute:()I

  // ...

  private volatile boolean bitmap$0;

  private int m4$lzycompute();
    Code:
        // acquire the monitor (lock) on this object
         0: aload_0
         1: dup
         2: astore_1
         3: monitorenter

        // check the flag for whether this field has already been set
         4: aload_0
         5: getfield      #20                 // Field bitmap$0:Z

        // if it has, skip to position 24 (release the monitor and return)
         8: ifne          24

        // if it hasn't, execute the method m()
        11: aload_0
        12: aload_0
        13: invokevirtual #23                 // Method m:()I

        // write the result to the field m4
        16: putfield      #25                 // Field m4:I

        // set the flag indicating the field has been set
        19: aload_0
        20: iconst_1
        21: putfield      #20                 // Field bitmap$0:Z

        // release the monitor
        24: getstatic     #31                 // Field scala/runtime/BoxedUnit.UNIT:Lscala/runtime/BoxedUnit;
        27: pop
        28: aload_1
        29: monitorexit

        // get the value of field m4 and return it
        30: aload_0
        31: getfield      #25                 // Field m4:I
        34: ireturn

        // ...

  public int m4();
    Code:
        // check the flag for whether this field has already been set
         0: aload_0
         1: getfield      #20                 // Field bitmap$0:Z

        // if it hasn't, skip to position 14 (invoke lazy method and return)
         4: ifeq          14

        // if it has, get the value of field m4, then skip to position 18 (return)
         7: aload_0
         8: getfield      #25                 // Field m4:I
        11: goto          18

        // execute the method m4$lzycompute() to set the field
        14: aload_0
        15: invokespecial #48                 // Method m4$lzycompute:()I

        // return
        18: ireturn

In this case, the value of the field m4 is not calculated until it is needed. The special private method m4$lzycompute() is produced to calculate the lazy value, along with the field bitmap$0 to track its state. The method m4() checks whether bitmap$0 is 0 (false), indicating that m4 has not been initialized yet, in which case m4$lzycompute() is invoked, populating m4 and returning its value. This private method also sets bitmap$0 to 1 (true), so that the next time m4() is called it skips the initialization method and simply returns the value of m4.

(Figure: the results of the first call to a Scala lazy value.)

The bytecode Scala produces here is designed to be both thread safe and efficient. To be thread safe, the lazy compute method uses the monitorenter/monitorexit pair of instructions. The method remains efficient because the overhead of this synchronization is incurred only on the first read of the lazy value.
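For comparison, here is roughly what the same pattern looks like when written by hand. This is a sketch under stated assumptions, not the compiler's actual output: the class ManualLazy and the names initialized, value, and lzyCompute are hypothetical stand-ins for bitmap$0, m4, and m4$lzycompute().

```scala
// Hypothetical hand-written counterpart of `lazy val m4 = m()`.
class ManualLazy(compute: () => Int) {
  @volatile private var initialized = false  // plays the role of bitmap$0
  private var value = 0                      // plays the role of the field m4

  // Slow path, analogous to m4$lzycompute(): take the lock, re-check the flag,
  // compute the value only if nobody else has done so yet.
  private def lzyCompute(): Int = this.synchronized {
    if (!initialized) {
      value = compute()
      initialized = true
    }
    value
  }

  // Fast path, analogous to m4(): once the flag is set, no lock is taken.
  def get: Int = if (initialized) value else lzyCompute()
}

object ManualLazyDemo {
  def main(args: Array[String]): Unit = {
    val answer = new ManualLazy(() => { println("computing..."); 42 })
    println(answer.get)  // first read runs the computation
    println(answer.get)  // later reads return the stored value
  }
}
```

The flag is marked volatile for the same reason bitmap$0 is: a thread that sees the flag set must also see the value written before it.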

Only one bit is needed to indicate the state of the lazy value. So if there are no more than 32 lazy values, a single int field can track them all. If more than one lazy value is defined in the source code, the above bytecode will be modified by the compiler to implement a bitmask for this purpose.
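The bitmask idea can be sketched in plain Scala as well. The names below (LazyBits, TwoLazyValues, AFlag, BFlag) are assumptions made for illustration, and the layout the compiler actually chooses may differ. The point is only that one integer field can track several lazy values, one bit each.

```scala
// Hypothetical flag constants: one bit per lazy value.
object LazyBits {
  final val AFlag = 1 << 0  // bit 0 tracks the first lazy value
  final val BFlag = 1 << 1  // bit 1 tracks the second lazy value
}

// Hypothetical class with two hand-written lazy values sharing one bitmap.
class TwoLazyValues(computeA: () => Int, computeB: () => Int) {
  import LazyBits._

  @volatile private var bitmap = 0  // stand-in for the compiler's bitmap field
  private var a = 0
  private var b = 0

  def getA: Int = {
    if ((bitmap & AFlag) == 0) this.synchronized {
      // re-check under the lock, then compute and set this value's bit
      if ((bitmap & AFlag) == 0) { a = computeA(); bitmap = bitmap | AFlag }
    }
    a
  }

  def getB: Int = {
    if ((bitmap & BFlag) == 0) this.synchronized {
      if ((bitmap & BFlag) == 0) { b = computeB(); bitmap = bitmap | BFlag }
    }
    b
  }

  // Exposed only so the sketch can show which bits are set.
  def initializedBits: Int = bitmap
}
```

Reading getB first sets only bit 1, and a later getA sets bit 0 as well, so each value is initialized independently of the other.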

Again, Scala allows us to easily take advantage of a specific kind of behavior that would have to be implemented explicitly in Java, saving effort and reducing the risk of typos.

Function as Value

Now let’s take a look at the following Scala source code:

class Printer(val output: String => Unit) {
}

object Hello {
    def main(arg: Array[String]): Unit = {
        val printer = new Printer( s => println(s) )
        printer.output("Hello")
    }
}

The Printer class has one field, output, with the type String => Unit: a function that takes a String and returns an object of type Unit (similar to void in Java). In the main method, we create one of these objects, and assign this field to be an anonymous function that prints a given string.

Compiling this code generates four class files: Hello.class, Hello$.class, Hello$$anonfun$1.class, and Printer.class.

Hello.class is a wrapper class whose main method simply calls Hello$.main():

public final class Hello

  // ...

  public static void main(java.lang.String[]);
    Code:
         0: getstatic     #16                 // Field Hello$.MODULE$:LHello$;
         3: aload_0
         4: invokevirtual #18                 // Method Hello$.main:([Ljava/lang/String;)V
         7: return

The hidden Hello$.class contains the real implementation of the main method. To take a look at its bytecode, make sure that you correctly escape $ according to the rules of your command shell, so that it is not interpreted as a special character:

public final class Hello$

// ...

  public void main(java.lang.String[]);
    Code:
         // initialize Printer and anonymous function
         0: new           #16                 // class Printer
         3: dup
         4: new           #18                 // class Hello$$anonfun$1
         7: dup
         8: invokespecial #19                 // Method Hello$$anonfun$1."<init>":()V
        11: invokespecial #22                 // Method Printer."<init>":(Lscala/Function1;)V
        14: astore_2

        // load the anonymous function onto the stack
        15: aload_2
        16: invokevirtual #26                 // Method Printer.output:()Lscala/Function1;

        // execute the anonymous function, passing the string "Hello"
        19: ldc           #28                 // String Hello
        21: invokeinterface #34,  2           // InterfaceMethod scala/Function1.apply:(Ljava/lang/Object;)Ljava/lang/Object;

        // return
        26: pop
        27: return

The method creates a Printer. It then creates a Hello$$anonfun$1, which contains our anonymous function s => println(s). The Printer is initialized with this object as the output field. This field is then loaded onto the stack, and executed with the operand "Hello".

Let’s take a look at the anonymous function class, Hello$$anonfun$1.class, below. We can see that it extends Scala’s Function1 (as AbstractFunction1) by implementing the apply() method. Actually, it creates two apply() methods, one wrapping the other, which together perform type checking (in this case, that the input is a String), and execute the anonymous function (printing the input with println()).

public final class Hello$$anonfun$1 extends scala.runtime.AbstractFunction1<java.lang.String, scala.runtime.BoxedUnit> implements scala.Serializable

  // ...

  // Takes an argument of type String. Invoked second.
  public final void apply(java.lang.String);
    Code:
        // execute Scala's built-in method println(), passing the input argument
         0: getstatic     #25                 // Field scala/Predef$.MODULE$:Lscala/Predef$;
         3: aload_1
         4: invokevirtual #29                 // Method scala/Predef$.println:(Ljava/lang/Object;)V

         7: return

  // Takes an argument of type Object. Invoked first.
  public final java.lang.Object apply(java.lang.Object);
    Code:
         0: aload_0

        // check that the input argument is a String (throws exception if not)
         1: aload_1
         2: checkcast     #36                 // class java/lang/String

        // invoke the method apply( String ), passing the input argument
         5: invokevirtual #38                 // Method apply:(Ljava/lang/String;)V

        // return the Unit value, BoxedUnit.UNIT (Scala's counterpart of void)
         8: getstatic     #44                 // Field scala/runtime/BoxedUnit.UNIT:Lscala/runtime/BoxedUnit;
        11: areturn

Looking back at the Hello$.main() method above, we can see that, at offset 21, execution of the anonymous function is triggered by a call to its apply( Object ) method.

Finally, for completeness, let’s look at the bytecode for Printer.class:

public class Printer

  // ...

  // field
  private final scala.Function1<java.lang.String, scala.runtime.BoxedUnit> output;

  // field getter
  public scala.Function1<java.lang.String, scala.runtime.BoxedUnit> output();
    Code:
         0: aload_0
         1: getfield      #14                 // Field output:Lscala/Function1;
         4: areturn

  // constructor
  public Printer(scala.Function1<java.lang.String, scala.runtime.BoxedUnit>);
    Code:
         0: aload_0
         1: aload_1
         2: putfield      #14                 // Field output:Lscala/Function1;
         5: aload_0
         6: invokespecial #21                 // Method java/lang/Object."<init>":()V
         9: return

We can see that the anonymous function here is treated just like any val variable. It is stored in the class field output, and the getter output() is created. The only difference is that this variable must now implement the Scala interface scala.Function1 (which AbstractFunction1 does).

So, the cost of this elegant Scala feature is the underlying utility classes, created to represent and execute a single anonymous function that can be used as a value. You should take into account the number of such functions, as well as details of your VM implementation, to figure out what it means for your particular application.
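To see what the compiler is doing for us, we can write the function class by hand. The sketch below is an illustration under assumptions, not the generated code: RecordingFunction and FnPrinter are hypothetical names, and the function appends to a buffer instead of calling println() so that its effect is easy to inspect.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical hand-written counterpart of a generated anonymous function class:
// a named class that implements Function1[String, Unit] through apply().
class RecordingFunction(log: ArrayBuffer[String]) extends Function1[String, Unit] {
  def apply(s: String): Unit = {
    log += s
    ()
  }
}

// Same shape as the article's Printer: the function is stored like any other val.
class FnPrinter(val output: String => Unit)

object FunctionValueDemo {
  def main(args: Array[String]): Unit = {
    val log = ArrayBuffer.empty[String]
    val printer = new FnPrinter(new RecordingFunction(log))

    // Calls the getter output(), then apply() on the returned Function1.
    printer.output("Hello")
    println(log.mkString(","))
  }
}
```

Whether the function is written as a lambda or as an explicit class, the call site is the same: fetch the Function1 from the getter and invoke apply() on it.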

Scala Traits

Scala’s traits are similar to interfaces in Java. The following trait defines two method signatures, and provides a default implementation of the second one. Let’s see how it is implemented:

trait Similarity {
  def isSimilar(x: Any): Boolean
  def isNotSimilar(x: Any): Boolean = !isSimilar(x)
}

Two entities are produced: Similarity.class, the interface declaring both methods, and the synthetic class, Similarity$class.class, providing the default implementation:

public interface Similarity {
  public abstract boolean isSimilar(java.lang.Object);
  public abstract boolean isNotSimilar(java.lang.Object);
}
public abstract class Similarity$class

  public static boolean isNotSimilar(Similarity, java.lang.Object);
    Code:
         0: aload_0

        // execute the instance method isSimilar()
         1: aload_1
         2: invokeinterface #13,  2           // InterfaceMethod Similarity.isSimilar:(Ljava/lang/Object;)Z

        // if the returned value is 0, skip to position 14 (return with value 1)
         7: ifeq          14

        // otherwise, return with value 0
        10: iconst_0
        11: goto          15

        // return the value 1
        14: iconst_1
        15: ireturn

  public static void $init$(Similarity);
    Code:
         0: return

When a class implements this trait and calls the method isNotSimilar, the Scala compiler generates the bytecode instruction invokestatic to call the static method provided by the accompanying class.
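The same split can be imitated in source code. In the following sketch, SimilarityApi, SimilarityImpl, and Celsius are hypothetical names: the trait stands in for the interface, the object for the synthetic helper class with its static method, and the forwarder in Celsius for the method the compiler adds to an implementing class. It mirrors the encoding described here for Scala 2.11; other compiler versions may encode traits differently.

```scala
// Stand-in for the interface Similarity.class: abstract methods only.
trait SimilarityApi {
  def isSimilar(x: Any): Boolean
  def isNotSimilar(x: Any): Boolean
}

// Stand-in for Similarity$class: the default implementation lives in a helper
// that receives the instance as an explicit first argument.
object SimilarityImpl {
  def isNotSimilar(self: SimilarityApi, x: Any): Boolean = !self.isSimilar(x)
}

// Hypothetical implementing class.
class Celsius(val degrees: Int) extends SimilarityApi {
  def isSimilar(x: Any): Boolean = x match {
    case other: Celsius => math.abs(other.degrees - degrees) <= 1
    case _              => false
  }

  // Forwarder: delegates to the helper, passing `this` explicitly,
  // much as the generated code does with invokestatic.
  def isNotSimilar(x: Any): Boolean = SimilarityImpl.isNotSimilar(this, x)
}
```

The helper has no state of its own, which is why a static method taking the instance as a parameter is enough.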

Complex polymorphism and inheritance structures may be created from traits. For example, multiple traits, as well as the implementing class, may all override a method with the same signature, calling super.methodName() to pass control to the next trait. When the Scala compiler encounters such calls, it:

  • Determines exactly which trait the call refers to.
  • Determines the name of the accompanying class that provides static method bytecode defined for the trait.
  • Produces the necessary invokestatic instruction.
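A small example of such a chain of super calls may help. The traits Greeter, Bracket, and Exclaim below are hypothetical and exist only for this illustration. Each override calls super.greet(), and the order in which the traits are mixed in decides which implementation each super call resolves to.

```scala
// Hypothetical base trait with a default implementation.
trait Greeter {
  def greet(): String = "hello"
}

// Wraps whatever the next trait in the chain returns in brackets.
trait Bracket extends Greeter {
  override def greet(): String = "[" + super.greet() + "]"
}

// Appends an exclamation mark to whatever the next trait in the chain returns.
trait Exclaim extends Greeter {
  override def greet(): String = super.greet() + "!"
}

// The trait mixed in last is called first.
class BracketThenExclaim extends Greeter with Bracket with Exclaim
class ExclaimThenBracket extends Greeter with Exclaim with Bracket

object StackedTraitsDemo {
  def main(args: Array[String]): Unit = {
    println(new BracketThenExclaim().greet())  // prints "[hello]!"
    println(new ExclaimThenBracket().greet())  // prints "[hello!]"
  }
}
```

In BracketThenExclaim, the super call in Exclaim resolves to Bracket; in ExclaimThenBracket, the super call in Bracket resolves to Exclaim. This is the resolution the compiler has to perform for each class before it can emit the call.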

Thus we can see that the powerful concept of traits is implemented at the JVM level in a way that does not lead to significant overhead, and Scala programmers may enjoy this feature without worrying that it will be too expensive at runtime.

Singletons

Scala provides for the explicit definition of singleton classes using the keyword object. Let’s consider the following singleton class:

object Config {
   val home_dir = "/home/user"
}

The compiler produces two class files, Config.class and Config$.class.

Config.class is a pretty simple one:

public final class Config

  public static java.lang.String home_dir();
    Code:
      // execute the method Config$.home_dir()
       0: getstatic     #16                 // Field Config$.MODULE$:LConfig$;
       3: invokevirtual #18                 // Method Config$.home_dir:()Ljava/lang/String;
       6: areturn

This is just a decorator for the synthetic Config$ class that embeds the singleton’s functionality. Examining that class with javap -p -c produces the following bytecode:

public final class Config$

  public static final Config$ MODULE$;        // a public reference to the singleton object

  private final java.lang.String home_dir;

  // static initializer
  public static {};
    Code:
         0: new           #2                  // class Config$
         3: invokespecial #12                 // Method "<init>":()V
         6: return

  public java.lang.String home_dir();
    Code:
        // get the value of field home_dir and return it
         0: aload_0
         1: getfield      #17                 // Field home_dir:Ljava/lang/String;
         4: areturn

  private Config$();
    Code:
        // initialize the object
         0: aload_0
         1: invokespecial #19                 // Method java/lang/Object."<init>":()V

        // expose a public reference to this object in the synthetic variable MODULE$
         4: aload_0
         5: putstatic     #21                 // Field MODULE$:LConfig$;

        // load the value "/home/user" and write it to the field home_dir
         8: aload_0
         9: ldc           #23                 // String /home/user
        11: putfield      #17                 // Field home_dir:Ljava/lang/String;

        14: return

It consists of the following:

  • The synthetic variable MODULE$, through which other objects access this singleton object.
  • The static initializer {} (also known as <clinit>, the class initializer) and the private constructor Config$(), used to initialize MODULE$ and set its fields to their initial values.
  • A getter method for the field home_dir. In this case, it is just one method. If the singleton has more fields, it will have more getters, as well as setters for mutable fields.

The singleton is a popular and useful design pattern. The Java language does not provide a direct way to specify it at the language level; rather, it is the responsibility of the developer to implement it in Java source. Scala, on the other hand, provides a clear and convenient way to declare a singleton explicitly using the object keyword. As we can see looking under the hood, it is implemented in an affordable and natural way.
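From the Scala side, the effect is easy to observe. In this minimal sketch, AppConfig and SingletonDemo are hypothetical names. Every reference to the object resolves to the same instance, and the runtime class of that instance is the synthetic class, whose name is expected to end with $.

```scala
// Hypothetical singleton, shaped like the article's Config.
object AppConfig {
  val homeDir = "/home/user"
}

object SingletonDemo {
  def main(args: Array[String]): Unit = {
    val first = AppConfig
    val second = AppConfig

    // Both names refer to the single instance held in MODULE$.
    println(first eq second)

    // The instance's class is the synthetic one, so its name should end with '$'.
    println(AppConfig.getClass.getName)

    // Access goes through the getter on that single instance.
    println(AppConfig.homeDir)
  }
}
```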

Conclusion

We’ve now seen how Scala compiles several implicit and functional programming features into sophisticated Java bytecode structures. With this glimpse into the inner workings of Scala, we can gain a deeper appreciation of Scala’s power, helping us to get the most out of this powerful language.

We also now have the tools to explore the language ourselves. There are many useful features of the Scala syntax that are not covered in this article, such as case classes, currying, and list comprehensions. I encourage you to investigate Scala’s implementation of these structures yourself, so you can learn how to be a next-level Scala ninja!


The Java Virtual Machine: A Crash Course

Just like the Java compiler, the Scala compiler converts source code into .class files, containing Java bytecode to be executed by the Java Virtual Machine. In order to understand how the two languages differ under the hood, it is necessary to understand the system they are both targeting. Here, we present a brief overview of some major elements of the Java Virtual Machine architecture, class file structure, and assembler basics.

Note that this guide covers only the minimum needed to follow along with the above article. Although many major components of the JVM are not discussed here, complete details can be found in the official JVM specification.

Decompiling Class Files with javap
Constant Pool
Field and Method Tables
JVM Bytecode
Method Calls and the Call Stack
Execution on the Operand Stack
Local Variables

Decompiling Class Files with javap

Java ships with the javap command line utility, which disassembles .class files into a human-readable form. Since Scala and Java class files both target the same JVM, javap can be used to examine class files compiled by Scala.

Let’s compile the following source code:

// RegularPolygon.scala
class RegularPolygon( val numSides: Int ) {

  def getPerimeter( sideLength: Double ): Double = {
    println( "Calculating perimeter..." )
    return sideLength * this.numSides
  }
}

Compiling this with scalac RegularPolygon.scala will produce RegularPolygon.class. If we then run javap RegularPolygon.class we will see the following:

$ javap RegularPolygon.class
Compiled from "RegularPolygon.scala"
public class RegularPolygon {
  public int numSides();
  public double getPerimeter(double);
  public RegularPolygon(int);
}

This is a very simple breakdown of the class file that simply shows the names and types of the public members of the class. Adding the -p option will include private members:

$ javap -p RegularPolygon.class
Compiled from "RegularPolygon.scala"
public class RegularPolygon {
  private final int numSides;
  public int numSides();
  public double getPerimeter(double);
  public RegularPolygon(int);
}

This is still not a lot of information. To see how the methods are implemented in Java bytecode, let’s add the -c option:

$ javap -p -c RegularPolygon.class
Compiled from "RegularPolygon.scala"
public class RegularPolygon {
  private final int numSides;

  public int numSides();
    Code:
       0: aload_0
       1: getfield      #13                 // Field numSides:I
       4: ireturn

  public double getPerimeter(double);
    Code:
       0: getstatic     #23                 // Field scala/Predef$.MODULE$:Lscala/Predef$;
       3: ldc           #25                 // String Calculating perimeter...
       5: invokevirtual #29                 // Method scala/Predef$.println:(Ljava/lang/Object;)V
       8: dload_1
       9: aload_0
      10: invokevirtual #31                 // Method numSides:()I
      13: i2d
      14: dmul
      15: dreturn

  public RegularPolygon(int);
    Code:
       0: aload_0
       1: iload_1
       2: putfield      #13                 // Field numSides:I
       5: aload_0
       6: invokespecial #38                 // Method java/lang/Object."<init>":()V
       9: return
}

That’s a little more interesting. However, to really get the whole story, we should use the -v or -verbose option, as in javap -p -v RegularPolygon.class:

(Figure: the complete contents of a Java class file.)

Here we finally see what’s really in the class file. What does all this mean? Let’s take a look at some of the most important parts.

Constant Pool

The development cycle for C++ applications includes compilation and linkage stages. The development cycle for Java skips an explicit linkage stage because linkage happens at runtime. The class file must support this runtime linking. This means that when the source code refers to any field or method, the resulting bytecode must keep relevant references in symbolic form, ready to be dereferenced once the application has loaded into memory and actual addresses can be resolved by the runtime linker. This symbolic form must contain:

  • class name
  • field or method name
  • type information

The class file format specification includes a section of the file called the constant pool, a table of all the references needed by the linker. It contains entries of different types.

// ...
Constant pool:
   #1 = Utf8               RegularPolygon
   #2 = Class              #1             // RegularPolygon
   #3 = Utf8               java/lang/Object
   #4 = Class              #3             // java/lang/Object
   // ...

The first byte of each entry is a numeric tag indicating the type of entry. The remaining bytes provide information about the value of the entry. The number of bytes and the rules for their interpretation depend on the type indicated by the first byte.

For example, a Java class that uses a constant integer 365 may have a constant pool entry with the following bytecode:

x03 00 00 01 6D

The first byte, x03, identifies the entry type, CONSTANT_Integer. This informs the linker that the next four bytes contain the value of the integer. (Note that 365 in hexadecimal is x16D.) If this is the 14th entry in the constant pool, javap -v will render it like this:

#14 = Integer            365
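The decoding of such an entry can be sketched in a few lines. The object ConstantIntegerDemo and the function decodeIntegerEntry are hypothetical names for this illustration. The sketch handles only this one entry type and assumes it is given exactly the five bytes shown above.

```scala
import java.nio.ByteBuffer

object ConstantIntegerDemo {
  // Tag value for CONSTANT_Integer entries, as in the example above.
  val TagInteger = 3

  // Decodes one CONSTANT_Integer entry: 1 tag byte followed by 4 value bytes.
  def decodeIntegerEntry(entry: Array[Byte]): Int = {
    require(entry.length == 5, "expected 1 tag byte plus 4 value bytes")
    val buf = ByteBuffer.wrap(entry)  // ByteBuffer reads big-endian by default
    val tag = buf.get() & 0xFF
    require(tag == TagInteger, s"unexpected tag $tag")
    buf.getInt()
  }

  def main(args: Array[String]): Unit = {
    val entry = Array(0x03, 0x00, 0x00, 0x01, 0x6D).map(_.toByte)
    println(decodeIntegerEntry(entry))  // prints 365
  }
}
```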

Many constant types are composed of references to more “primitive” constant types elsewhere in the constant pool. For instance, our example code contains the statement:

println( "Calculating perimeter..." )

Usage of a string constant will produce two entries in the constant pool: one entry of type CONSTANT_String, and another entry of type CONSTANT_Utf8. The entry of type CONSTANT_Utf8 contains the actual UTF-8 representation of the string value. The entry of type CONSTANT_String contains a reference to the CONSTANT_Utf8 entry:

#24 = Utf8               Calculating perimeter...
#25 = String             #24            // Calculating perimeter...

Such complication is necessary because there are other types of constant pool entries that refer to entries of type Utf8 and that are not entries of type String. For example, any reference to a class attribute will produce a CONSTANT_Fieldref type, which contains a series of references to the class name, attribute name, and attribute type:

 #1 = Utf8               RegularPolygon
 #2 = Class              #1             // RegularPolygon
 #9 = Utf8               numSides
#10 = Utf8               I
#12 = NameAndType        #9:#10         // numSides:I
#13 = Fieldref           #2.#12         // RegularPolygon.numSides:I
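To illustrate how these references chain together, here is a toy model of the six entries above. The case classes and the resolve function are assumptions made for this sketch and are not part of any JVM API. Following the indexes from the Fieldref entry reproduces the comment that javap prints next to it.

```scala
// Hypothetical model of the constant pool entry kinds used in the listing above.
sealed trait CpEntry
final case class Utf8(value: String) extends CpEntry
final case class ClassRef(nameIndex: Int) extends CpEntry
final case class NameAndType(nameIndex: Int, descriptorIndex: Int) extends CpEntry
final case class FieldRef(classIndex: Int, nameAndTypeIndex: Int) extends CpEntry

object ConstantPoolDemo {
  // The same entries, keyed by their constant pool index.
  val pool: Map[Int, CpEntry] = Map(
    (1, Utf8("RegularPolygon")),
    (2, ClassRef(1)),
    (9, Utf8("numSides")),
    (10, Utf8("I")),
    (12, NameAndType(9, 10)),
    (13, FieldRef(2, 12))
  )

  // Follows index references until only Utf8 text remains.
  def resolve(index: Int): String = pool(index) match {
    case Utf8(value)            => value
    case ClassRef(name)         => resolve(name)
    case NameAndType(name, tpe) => resolve(name) + ":" + resolve(tpe)
    case FieldRef(cls, nameTpe) => resolve(cls) + "." + resolve(nameTpe)
  }

  def main(args: Array[String]): Unit = {
    println(resolve(13))  // prints RegularPolygon.numSides:I
  }
}
```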

For more details about the constant pool, see the JVM documentation.

Field and Method Tables

A class file contains a field table that contains information about each field (i.e., attribute) defined in the class. These are references to constant pool entries that describe the field’s name and type as well as access control flags and other relevant data.

A similar method table is present in the class file. However, in addition to name and type information, for each non-abstract method, it contains the actual bytecode instructions to be executed by the JVM, as well as data structures used by the method’s stack frame, described below.

JVM Bytecode

The JVM uses its own internal instruction set to execute compiled code. Running javap with the -c option includes the compiled method implementations in the output. If we examine our RegularPolygon.class file this way, we will see the following output for our getPerimeter() method:

public double getPerimeter(double);
  Code:
     0: getstatic     #23                 // Field scala/Predef$.MODULE$:Lscala/Predef$;
     3: ldc           #25                 // String Calculating perimeter...
     5: invokevirtual #29                 // Method scala/Predef$.println:(Ljava/lang/Object;)V
     8: dload_1
     9: aload_0
    10: invokevirtual #31                 // Method numSides:()I
    13: i2d
    14: dmul
    15: dreturn

The actual bytecode might look something like this:

xB2 00 17
x12 19
xB6 00 1D
x27
...

Each instruction starts with a one-byte opcode identifying the JVM instruction, followed by zero or more instruction operands to be operated on, depending on the format of the specific instruction. These are typically either constant values, or references into the constant pool. javap helpfully translates the bytecode into a human-readable form displaying:

  • The offset, or position of the first byte of the instruction within the code.
  • The human-readable name, or mnemonic, of the instruction.
  • The value of the operand, if any.

Operands that are displayed with a pound sign, such as #23, are references to entries in the constant pool. As we can see, javap also produces helpful comments in the output, identifying what exactly is being referenced from the pool.
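As a rough illustration of what javap does here, the following Java sketch decodes the four instructions whose raw bytes were shown earlier. It covers only those four opcodes (getstatic, ldc, invokevirtual, dload_1); the class name MiniDisassembler and its methods are made up for this example, and a real disassembler handles the whole instruction set.

```java
import java.util.ArrayList;
import java.util.List;

final class MiniDisassembler {
    // Raw bytes of the first four instructions of getPerimeter(), as shown above.
    static final byte[] EXAMPLE = {
        (byte) 0xB2, 0x00, 0x17, // getstatic #23
        0x12, 0x19,              // ldc #25
        (byte) 0xB6, 0x00, 0x1D, // invokevirtual #29
        0x27                     // dload_1
    };

    // Reads an unsigned two-byte, big-endian constant pool index.
    private static int readU2(byte[] code, int position) {
        return ((code[position] & 0xFF) << 8) | (code[position + 1] & 0xFF);
    }

    // Produces one "offset: mnemonic operand" line per instruction.
    static List<String> disassemble(byte[] code) {
        List<String> lines = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int offset = pc;
            int opcode = code[pc++] & 0xFF;
            switch (opcode) {
                case 0xB2: // getstatic takes a two-byte constant pool index
                    lines.add(offset + ": getstatic #" + readU2(code, pc));
                    pc += 2;
                    break;
                case 0x12: // ldc takes a one-byte constant pool index
                    lines.add(offset + ": ldc #" + (code[pc++] & 0xFF));
                    break;
                case 0xB6: // invokevirtual takes a two-byte constant pool index
                    lines.add(offset + ": invokevirtual #" + readU2(code, pc));
                    pc += 2;
                    break;
                case 0x27: // dload_1 has no operand; the slot is encoded in the opcode
                    lines.add(offset + ": dload_1");
                    break;
                default:
                    throw new IllegalArgumentException(
                        "Opcode not covered by this sketch: 0x" + Integer.toHexString(opcode));
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        disassemble(EXAMPLE).forEach(System.out::println);
    }
}
```

The offsets it prints (0, 3, 5, 8) line up with the javap output because each offset is simply the position of the instruction's opcode byte within the method's code.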

We’ll discuss a few of the common instructions below. For detailed information about the complete JVM instruction set, see the documentation.

Method Calls and the Call Stack

Each method call must be able to run with its own context, which includes things such as locally-declared variables, or arguments that were passed to the method. Together, these make up a stack frame. Upon invocation of a method, a new frame is created and placed on top of the call stack. When the method returns, the current frame is removed from the call stack and discarded, and the frame that was in effect before the method was called is restored.

A stack frame includes a few distinct structures. Two important ones are the operand stack and the local variable table, discussed next.

The JVM call stack.
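The push-and-pop behavior of frames can be sketched in a few lines of Java. This is a toy model, not how the JVM is implemented: CallStackSketch, Frame, invoke, and returnValue are hypothetical names, and each frame simply carries its own local variables and its own operand stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class CallStackSketch {
    // One frame per method invocation, with private locals and a private operand stack.
    static final class Frame {
        final String method;
        final Object[] locals;
        final Deque<Object> operandStack = new ArrayDeque<>();

        Frame(String method, Object... locals) {
            this.method = method;
            this.locals = locals;
        }
    }

    private final Deque<Frame> frames = new ArrayDeque<>();

    // Invoking a method creates a new frame and places it on top of the call stack.
    Frame invoke(String method, Object... arguments) {
        Frame frame = new Frame(method, arguments);
        frames.push(frame);
        return frame;
    }

    // Returning discards the current frame and hands the (non-null) result
    // to the operand stack of the frame that is restored, if there is one.
    void returnValue(Object result) {
        frames.pop();
        Frame caller = frames.peek();
        if (caller != null) {
            caller.operandStack.push(result);
        }
    }

    Frame current() {
        return frames.peek();
    }

    int depth() {
        return frames.size();
    }

    public static void main(String[] args) {
        CallStackSketch calls = new CallStackSketch();
        calls.invoke("getPerimeter", "this", 2.5); // the string "this" stands in for an object reference
        calls.invoke("numSides", "this");
        System.out.println("depth during numSides: " + calls.depth());
        calls.returnValue(4); // hypothetical number of sides
        System.out.println("back in " + calls.current().method
            + ", top of operand stack: " + calls.current().operandStack.peek());
    }
}
```

While numSides is running there are two frames on the call stack; once it returns, getPerimeter's frame is current again and finds the result on its own operand stack.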

Execution on the Operand Stack

Many JVM instructions operate on their frame’s operand stack. Rather than specifying a constant operand explicitly in the bytecode, these instructions instead take the values on the top of the operand stack as input. Typically, these values are removed from the stack in the process. Some instructions also place new values on top of the stack. In this way, JVM instructions can be combined to perform complex operations. For example, the expression:

sideLength * this.numSides

is compiled to the following in our getPerimeter() method:

 8: dload_1
 9: aload_0
10: invokevirtual #31                 // Method numSides:()I
13: i2d
14: dmul

JVM instructions can operate on the operand stack to perform complex functions.

  • The first instruction, dload_1, pushes the double value from slot 1 of the local variable table (discussed next) onto the operand stack. In this case, this is the method argument sideLength.
  • The next instruction, aload_0, pushes the object reference at slot 0 of the local variable table onto the operand stack. In an instance method, this is almost always the reference to this, the current object.
  • This sets up the stack for the next call, invokevirtual #31, which executes the instance method numSides(). invokevirtual pops the top operand (the reference to this) off the stack to identify the object on which the method is invoked. Once the method returns, its result is pushed onto the stack.
  • In this case, the value returned (numSides) is in integer format. It must be converted to a double floating point format in order to multiply it with another double value. The instruction i2d pops the integer value off the stack, converts it to floating point format, and pushes it back onto the stack.
  • At this point, the stack contains the floating point result of this.numSides on top, followed by the value of the sideLength argument that was passed to the method. dmul pops these top two values from the stack, performs floating point multiplication on them, and pushes the result onto the stack.
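The steps above can be replayed on a simulated operand stack. The Java sketch below is only an approximation for illustration: the class OperandStackSketch is hypothetical, the call to numSides() is replaced by a value passed in as a parameter, and every value occupies a single stack entry, whereas the real JVM gives a double two slots.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class OperandStackSketch {
    // Replays dload_1, aload_0, invokevirtual, i2d, dmul, dreturn step by step.
    static double perimeter(double sideLength, int numSides) {
        Object thisRef = "this";                 // placeholder for the RegularPolygon reference
        Object[] locals = {thisRef, sideLength}; // slot 0 = this, slot 1 = sideLength
        Deque<Object> stack = new ArrayDeque<>();

        stack.push(locals[1]);                   // dload_1: push the double argument
        stack.push(locals[0]);                   // aload_0: push the reference to this

        stack.pop();                             // invokevirtual: pop the receiver...
        stack.push(numSides);                    // ...and push the int that numSides() returned

        int sides = (Integer) stack.pop();       // i2d: pop the int,
        stack.push((double) sides);              //      push it back as a double

        double right = (Double) stack.pop();     // dmul: pop both doubles,
        double left = (Double) stack.pop();
        stack.push(left * right);                //       push their product

        return (Double) stack.pop();             // dreturn: hand the top value to the caller
    }

    public static void main(String[] args) {
        // Hypothetical polygon: 4 sides of length 2.5.
        System.out.println(perimeter(2.5, 4));
    }
}
```

With a side length of 2.5 and four sides, the simulated stack ends with the single value 10.0, which is what dreturn would pass back to the caller.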

When a method is called, a new operand stack is created as part of its stack frame, where operations will be performed. We must be careful with terminology here: the word “stack” may refer to the call stack, the stack of frames providing context for method execution, or to a particular frame’s operand stack, upon which JVM instructions operate.

Local Variables

Each stack frame keeps a table of local variables. This typically includes a reference to the this object, any arguments that were passed when the method was called, and any local variables declared within the method body. Running javap with the -v option will include information about how each method’s stack frame should be set up, including its local variable table:

public double getPerimeter(double);

// ...

Code:
     0: getstatic     #23                 // Field scala/Predef$.MODULE$:Lscala/Predef$;
     3: ldc           #25                 // String Calculating perimeter...

     // ...

  LocalVariableTable:
    Start  Length  Slot  Name   Signature
        0      16     0  this   LRegularPolygon;
        0      16     1 sideLength   D

In this example, there are two local variables. The variable in slot 0 is named this, with the type RegularPolygon. This is the reference to the object on which the method was invoked. The variable in slot 1 is named sideLength, with the type D (indicating a double). This is the argument that is passed to our getPerimeter() method.

Instructions such as iload_1, fstore_2, or aload [n] transfer different types of local variables between the operand stack and the local variable table. Since the first item in the table is usually the reference to this, the instruction aload_0 is commonly seen in any method that operates on its own instance.

This concludes our walkthrough of JVM basics.

Sergey Reznick

Verified Expert in Engineering

Saint Petersburg, Russia

Member since October 21, 2015

About the author

Sergey is an experienced software developer with a background in computer science and basic algorithms and over 20 years of experience.
