Get Your Hands Dirty With Scala JVM Bytecode
Sergey is an experienced software developer with a background in computer science and basic algorithms and over 20 years of experience.
The Scala language has continued to gain popularity over the last several years, thanks to its excellent combination of functional and object-oriented software development principles, and its implementation on top of the proven Java Virtual Machine (JVM).
Although Scala compiles to Java bytecode, it is designed to improve on many of the perceived shortcomings of the Java language. Scala offers full support for functional programming, and its core syntax contains many implicit structures that Java programmers have to build explicitly, some of them involving considerable complexity.
Creating a language that compiles to Java bytecode requires a deep understanding of the inner workings of the Java Virtual Machine. To appreciate what Scala’s developers have accomplished, it is necessary to go under the hood and explore how the compiler translates Scala source code into efficient and effective JVM bytecode.
Let’s take a look at how all this stuff is implemented.
Prerequisites
Reading this article requires some basic understanding of Java Virtual Machine bytecode. The complete virtual machine specification is available in Oracle’s official documentation. Reading the whole spec is not critical for understanding this article, so, for a quick introduction to the basics, I’ve prepared a short guide at the bottom of the article.
A utility is needed to disassemble the Java bytecode to reproduce the examples provided below, and to proceed with further investigation. The Java Development Kit provides its own command line utility, javap, which we will use here. A quick demonstration of how javap works is included in the guide at the bottom.
And of course, a working installation of the Scala compiler is necessary for readers who want to follow along with the examples. This article was written using Scala 2.11.7. Different versions of Scala may produce different bytecode.
Default Getters and Setters
Although Java convention calls for getter and setter methods to expose attributes, Java programmers are required to write these themselves, despite the fact that the pattern for each has not changed in decades. Scala, in contrast, provides default getters and setters.
Let’s look at the following example:
class Person(val name:String) {
}
Let’s take a look inside the class Person. If we compile this file with scalac, then running javap -p Person.class gives us:
Compiled from "Person.scala"
public class Person {
private final java.lang.String name; // field
public java.lang.String name(); // getter method
public Person(java.lang.String); // constructor
}
We can see that for each field in the Scala class, a field and its getter method are generated. The field is private and final, while the method is public.
If we replace val with var in the Person source and recompile, then the field’s final modifier is dropped, and a setter method is added as well:
Compiled from "Person.scala"
public class Person {
private java.lang.String name; // field
public java.lang.String name(); // getter method
public void name_$eq(java.lang.String); // setter method
public Person(java.lang.String); // constructor
}
If any val or var is defined inside the class body, then the corresponding private field and accessor methods are created, and the field is initialized appropriately upon instance creation.
Note that this implementation of class-level val and var fields means that if some variables are used at the class level to store intermediate values and are never accessed directly by the programmer, each such field still adds one or two accessor methods to the class footprint. Adding a private modifier to such fields does not mean the corresponding accessors will be dropped; they will just become private.
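The generated accessors can also be observed without javap. The following sketch (standard library only; AccessorDemo is an illustrative name) declares a class with a var parameter, calls the setter both through assignment syntax and explicitly as name_=, and uses Java reflection to list the methods declared on the class, where the setter shows up under its encoded bytecode name, name_$eq.

```scala
// The compiler generates a private field plus a getter `name` and a setter
// `name_=`, which is encoded in bytecode as `name_$eq`.
class Person(var name: String)

object AccessorDemo {
  // Names of the methods declared directly on Person, sorted for stable output.
  def declaredMethodNames: List[String] =
    classOf[Person].getDeclaredMethods.map(_.getName).toList.sorted

  def main(args: Array[String]): Unit = {
    val p = new Person("Ann")
    p.name = "Bob"   // assignment syntax: compiled as a call to the setter
    p.name_=("Cid")  // the same setter, called by its Scala name
    println(p.name)  // prints Cid
    println(declaredMethodNames.mkString(", "))
  }
}
```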
Variable and Function Definitions
Let’s assume that we have a method, m(), and create three different Scala-style references to this function:
class Person(val name:String) {
def m(): Int = {
// ...
return 0
}
val m1 = m
var m2 = m
def m3 = m
}
How is each of these references to m constructed? When does m get executed in each case? Let’s take a look at the resulting bytecode. The following output shows the results of javap -v Person.class (omitting a lot of superfluous output):
Constant pool:
#22 = Fieldref #2.#21 // Person.m1:I
#24 = Fieldref #2.#23 // Person.m2:I
#30 = Methodref #2.#29 // Person.m:()I
#35 = Methodref #4.#34 // java/lang/Object."<init>":()V
// ...
public int m();
Code:
// other methods refer to this method
// ...
public int m1();
Code:
// get the value of field m1 and return it
0: aload_0
1: getfield #22 // Field m1:I
4: ireturn
public int m2();
Code:
// get the value of field m2 and return it
0: aload_0
1: getfield #24 // Field m2:I
4: ireturn
public void m2_$eq(int);
Code:
// get the value of this method's input argument
0: aload_0
1: iload_1
// write it to the field m2 and return
2: putfield #24 // Field m2:I
5: return
public int m3();
Code:
// execute the instance method m(), and return
0: aload_0
1: invokevirtual #30 // Method m:()I
4: ireturn
public Person(java.lang.String);
Code:
// instance constructor ...
// execute the instance method m(), and write the result to field m1
9: aload_0
10: aload_0
11: invokevirtual #30 // Method m:()I
14: putfield #22 // Field m1:I
// execute the instance method m(), and write the result to field m2
17: aload_0
18: aload_0
19: invokevirtual #30 // Method m:()I
22: putfield #24 // Field m2:I
25: return
In the constant pool, we see that the reference to method m() is stored at index #30. In the constructor code, we see that this method is invoked twice during initialization, with the instruction invokevirtual #30 appearing first at byte offset 11, then at offset 19. The first invocation is followed by the instruction putfield #22, which assigns the result of this method to the field m1, referenced by index #22 in the constant pool. The second invocation is followed by the same pattern, this time assigning the value to the field m2, indexed at #24 in the constant pool.
In other words, assigning a method to a variable defined with val or var only assigns the result of the method to that variable: m() is called once, while the object is being constructed, and its return value is stored in the field. We can see that the methods m1() and m2() that are created are simply getters for these fields. In the case of var m2, we also see that the setter m2_$eq(int) is created, which behaves just like any other setter, overwriting the value in the field.
However, using the keyword def gives a different result. Rather than fetching a field value to return, the method m3() contains the instruction invokevirtual #30. That is, each time m3() is called, it calls m() and returns the result of that call.
So, as we can see, Scala provides three ways to work with class fields, and these are easily specified via the keywords val, var, and def. In Java, we would have to implement the necessary setters and getters explicitly, and such manually written boilerplate code would be much less expressive and more error-prone.
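The same timing difference can be observed from Scala code. In this sketch, the calls counter and the body of m() are hypothetical additions that record how often m() runs: the initializers of m1 and m2 each call it once during construction, while every access to m3 calls it again. The sketch writes m() with explicit parentheses, since newer Scala versions discourage calling such a method without them.

```scala
class Person(val name: String) {
  var calls = 0                         // declared first, so it is initialized before m1 and m2
  def m(): Int = { calls += 1; calls }  // returns the running call count

  val m1 = m()  // evaluated once, in the constructor
  var m2 = m()  // evaluated once, in the constructor; can be reassigned later
  def m3 = m()  // evaluated on every access
}

object EvaluationDemo {
  def main(args: Array[String]): Unit = {
    val p = new Person("Ann")
    println(s"after construction: calls = ${p.calls}") // calls = 2
    println(s"m1 = ${p.m1}, m1 = ${p.m1}")              // m1 = 1, m1 = 1
    println(s"m3 = ${p.m3}, m3 = ${p.m3}")              // m3 = 3, m3 = 4
    p.m2 = 10                                           // calls the setter m2_=
    println(s"m2 = ${p.m2}, calls = ${p.calls}")        // m2 = 10, calls = 4
  }
}
```

The values in the comments follow from declaration order: fields are initialized top to bottom, so calls is already 0 when the initializers of m1 and m2 run.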
Lazy Values
More complicated code is produced when declaring a lazy value. Assume we’ve added the following field to the previously defined class:
lazy val m4 = m
Running javap -p -v Person.class will now reveal the following:
Constant pool:
#20 = Fieldref #2.#19 // Person.bitmap$0:Z
#23 = Methodref #2.#22 // Person.m:()I
#25 = Fieldref #2.#24 // Person.m4:I
#31 = Fieldref #27.#30 // scala/runtime/BoxedUnit.UNIT:Lscala/runtime/BoxedUnit;
#48 = Methodref #2.#47 // Person.m4$lzycompute:()I
// ...
private volatile boolean bitmap$0;
private int m4$lzycompute();
Code:
// acquire this object's monitor (dup/astore_1 keep a copy of this for the later monitorexit)
0: aload_0
1: dup
2: astore_1
3: monitorenter
// check the flag for whether this field has already been set
4: aload_0
5: getfield #20 // Field bitmap$0:Z
// if it has, skip to position 24 (release the monitor and return)
8: ifne 24
// if it hasn't, execute the method m()
11: aload_0
12: aload_0
13: invokevirtual #23 // Method m:()I
// write the result to the field m4
16: putfield #25 // Field m4:I
// set the flag indicating the field has been set
19: aload_0
20: iconst_1
21: putfield #20 // Field bitmap$0:Z
// release the monitor (BoxedUnit.UNIT is the synchronized block's Unit result, immediately discarded)
24: getstatic #31 // Field scala/runtime/BoxedUnit.UNIT:Lscala/runtime/BoxedUnit;
27: pop
28: aload_1
29: monitorexit
// get the value of field m4 and return it
30: aload_0
31: getfield #25 // Field m4:I
34: ireturn
// ...
public int m4();
Code:
// check the flag for whether this field has already been set
0: aload_0
1: getfield #20 // Field bitmap$0:Z
// if it hasn't, skip to position 14 (invoke lazy method and return)
4: ifeq 14
// if it has, get the value of field m4, then skip to position 18 (return)
7: aload_0
8: getfield #25 // Field m4:I
11: goto 18
// execute the method m4$lzycompute() to set the field
14: aload_0
15: invokespecial #48 // Method m4$lzycompute:()I
// return
18: ireturn
In this case, the value of the field m4 is not calculated until it is needed. The compiler produces a special private method, m4$lzycompute(), to calculate the lazy value, and a field, bitmap$0, to track its state. Method m4() checks whether bitmap$0 is still false (0), which means that m4 has not been initialized yet; if so, it invokes m4$lzycompute(), which populates m4 and returns its value. This private method also sets bitmap$0 to true (1), so that the next time m4() is called it skips the initialization method and simply returns the value of m4.
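The bytecode Scala produces here is designed to be both thread safe and efficient. To be thread safe, the lazy compute method uses the monitorenter/monitorexit pair of instructions and re-checks the flag after acquiring the lock, so m() runs only once even if several threads race on the first read. The method remains efficient because this synchronization overhead only occurs while the lazy value is being initialized; afterwards, m4() just reads the volatile flag and the field.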
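To make the pattern concrete, here is a hand-written Scala counterpart of lazy val m4 = m that mirrors the 2.11 bytecode above: a volatile flag standing in for bitmap$0, a slow path that takes the object's monitor and re-checks the flag, and a fast path that only reads the flag. The names initialized, m4Value, and m4LzyCompute are illustrative, not compiler output, and later compilers (Scala 3 in particular) generate different code for lazy values.

```scala
// Hand-written counterpart of `lazy val m4 = m()`, modeled on the
// Scala 2.11 bytecode above. The names are illustrative, not compiler output.
class Person(val name: String) {
  var calls = 0                         // counts how many times m() runs
  def m(): Int = { calls += 1; 42 }

  // Plays the role of bitmap$0. @volatile makes the unlocked read in m4
  // see the value written inside the synchronized block.
  @volatile private var initialized: Boolean = false
  private var m4Value: Int = 0          // plays the role of the field m4

  // Like m4$lzycompute(): take the lock, re-check the flag, compute at most once.
  private def m4LzyCompute(): Int = {
    this.synchronized {
      if (!initialized) {
        m4Value = m()
        initialized = true              // published after m4Value is written
      }
    }
    m4Value
  }

  // Like m4(): no locking once the flag is set.
  def m4: Int = if (initialized) m4Value else m4LzyCompute()
}

object LazyDemo {
  def main(args: Array[String]): Unit = {
    val p = new Person("Ann")
    println(p.calls) // 0: nothing computed yet
    println(p.m4)    // 42: first access runs m()
    println(p.m4)    // 42: served from the field
    println(p.calls) // 1
  }
}
```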
Only one bit is needed to indicate the state of each lazy value, so a single int field can track up to 32 of them. If more than one lazy value is defined in the source code, the compiler replaces the boolean flag in the above bytecode with a bitmask, testing and setting a separate bit for each lazy value.
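A bitmask version can be sketched the same way. Here two hypothetical lazy values, a and b, share one volatile Int: bit 0 records whether a is initialized and bit 1 records b. The bit assignments and names are assumptions for illustration, not the exact layout any particular compiler version emits; every update of the mask happens under the lock, so no bit is lost.

```scala
class TwoLazies {
  var calls = 0                                         // counts computations
  private def compute(v: String): String = { calls += 1; v }

  // One volatile Int tracks both lazy values: bit 0 for a, bit 1 for b.
  @volatile private var bitmap: Int = 0
  private var aValue: String = null
  private var bValue: String = null

  private def aCompute(): String = {
    this.synchronized {
      if ((bitmap & 1) == 0) {
        aValue = compute("a")
        bitmap = bitmap | 1                             // set bit 0 under the lock
      }
    }
    aValue
  }

  private def bCompute(): String = {
    this.synchronized {
      if ((bitmap & 2) == 0) {
        bValue = compute("b")
        bitmap = bitmap | 2                             // set bit 1 under the lock
      }
    }
    bValue
  }

  // Fast paths: test the bit, and only fall back to locking when it is clear.
  def a: String = if ((bitmap & 1) != 0) aValue else aCompute()
  def b: String = if ((bitmap & 2) != 0) bValue else bCompute()

  def state: Int = bitmap                               // exposed only to show the bits
}

object BitmaskDemo {
  def main(args: Array[String]): Unit = {
    val t = new TwoLazies
    println(t.state)       // 0
    println(t.b)           // b
    println(t.state)       // 2
    println(t.a)           // a
    println(t.state)       // 3
  }
}
```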
Again, Scala allows us to easily take advantage of a specific kind of behavior that would have to be implemented explicitly in Java, saving effort and reducing the risk of typos.
Function as Value
Now let’s take a look at the following Scala source code:
class Printer(val output: String => Unit) {
}
object Hello {
def main(args: Array[String]): Unit = {
val printer = new Printer(s => println(s))
printer.output("Hello")
}
}
The Printer class has one field, output, with the type String => Unit: a function that takes a String and returns an object of type Unit (similar to void in Java). In the main method, we create one of these objects and initialize this field with an anonymous function that prints a given string.
Compiling this code generates four class files:
Hello.class is a wrapper class whose main method simply calls Hello$.main():
public final class Hello
// ...
public static void main(java.lang.String[]);
Code:
0: getstatic #16 // Field Hello$.MODULE$:LHello$;
3: aload_0
4: invokevirtual #18 // Method Hello$.main:([Ljava/lang/String;)V
7: return
The hidden Hello$.class contains the real implementation of the main method. To take a look at its bytecode, make sure that you correctly escape $ according to the rules of your command shell, to avoid its interpretation as a special character (for example, javap -p -c 'Hello$.class' in a POSIX shell):
public final class Hello$
// ...
public void main(java.lang.String[]);
Code:
// initialize Printer and anonymous function
0: new #16 // class Printer
3: dup
4: new #18 // class Hello$$anonfun$1
7: dup
8: invokespecial #19 // Method Hello$$anonfun$1."<init>":()V
11: invokespecial #22 // Method Printer."<init>":(Lscala/Function1;)V
14: astore_2
// load the anonymous function onto the stack
15: aload_2
16: invokevirtual #26 // Method Printer.output:()Lscala/Function1;
// execute the anonymous function, passing the string "Hello"
19: ldc #28 // String Hello
21: invokeinterface #34, 2 // InterfaceMethod scala/Function1.apply:(Ljava/lang/Object;)Ljava/lang/Object;
// return
26: pop
27: return
The method creates a Printer. It then creates a Hello$$anonfun$1, which contains our anonymous function s => println(s). The Printer is initialized with this object as the output field. This field is then loaded onto the stack, and executed with the operand "Hello".
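The compiler’s work here can be imitated by hand. The sketch below defines a named subclass of scala.runtime.AbstractFunction1 (the base class that the generated Hello$$anonfun$1 extends, as its listing shows) and passes an instance to Printer next to an ordinary function literal. The class name PrintFunction and the buffer that captures output instead of printing it are illustrative assumptions. Note that Scala 2.12 and later no longer generate a separate class file per lambda; they create function objects at run time through invokedynamic, so newer compilers will not produce Hello$$anonfun$1.class.

```scala
import scala.collection.mutable.ListBuffer
import scala.runtime.AbstractFunction1

class Printer(val output: String => Unit)

// Roughly what Hello$$anonfun$1 is: a Function1[String, Unit] whose apply
// runs the lambda body. It records lines instead of printing them.
final class PrintFunction(sink: ListBuffer[String]) extends AbstractFunction1[String, Unit] {
  def apply(s: String): Unit = sink += s
}

object Hello {
  def main(args: Array[String]): Unit = {
    val lines = ListBuffer.empty[String]
    val explicit = new Printer(new PrintFunction(lines)) // hand-written function class
    val literal  = new Printer(s => lines += s)          // compiler-generated equivalent
    explicit.output("Hello")
    literal.output("World")
    println(lines.mkString(" "))                         // prints Hello World
  }
}
```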
Let’s take a look at the anonymous function class, Hello$$anonfun$1.class, below. We can see that it extends Scala’s AbstractFunction1 (and thus implements Function1) by implementing the apply() method. Actually, it creates two apply() methods, one wrapping the other, which together perform type checking (in this case, that the input is a String) and execute the anonymous function (printing the input with println()).
public final class Hello$$anonfun$1 extends scala.runtime.AbstractFunction1<java.lang.String, scala.runtime.BoxedUnit> implements scala.Serializable
// ...
// Takes an argument of type String. Invoked second.
public final void apply(java.lang.String);
Code:
// execute Scala's built-in method println(), passing the input argument
0: getstatic #25 // Field scala/Predef$.MODULE$:Lscala/Predef$;
3: aload_1
4: invokevirtual #29 // Method scala/Predef$.println:(Ljava/lang/Object;)V
7: return
// Takes an argument of type Object. Invoked first.
public final java.lang.Object apply(java.lang.Object);
Code:
0: aload_0
// check that the input argument is a String (throws exception if not)
1: aload_1
2: checkcast #36 // class java/lang/String
// invoke the method apply( String ), passing the input argument
5: invokevirtual #38 // Method apply:(Ljava/lang/String;)V
// return BoxedUnit.UNIT, the boxed form of the Unit value
8: getstatic #44 // Field scala/runtime/BoxedUnit.UNIT:Lscala/runtime/BoxedUnit;
11: areturn
Looking back at the Hello$.main() method above, we can see that, at offset 21, execution of the anonymous function is triggered by a call to its apply(Object) method.
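The checkcast in the generic apply(Object) method is observable from Scala. If a String => Unit function is cast to an erased Any => Unit and called with a non-String, the cast inside the generic apply fails with a ClassCastException. This sketch uses only the standard library; the unchecked asInstanceOf is deliberately unsafe and exists only to reach that method with a wrong argument, and BridgeDemo and printLength are illustrative names.

```scala
object BridgeDemo {
  val printLength: String => Unit = s => println(s.length)

  // Casting to Any => Unit only changes the static type. At run time the call
  // still goes through apply(Object), whose checkcast rejects non-Strings.
  def callWithAny(arg: Any): Either[String, Unit] =
    try Right(printLength.asInstanceOf[Any => Unit].apply(arg))
    catch { case e: ClassCastException => Left(e.getClass.getSimpleName) }

  def main(args: Array[String]): Unit = {
    println(callWithAny("Hello")) // prints 5, then Right(())
    println(callWithAny(42))      // Left(ClassCastException)
  }
}
```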
Finally, for completeness, let’s look at the bytecode for Printer.class:
public class Printer
// ...
// field
private final scala.Function1<java.lang.String, scala.runtime.BoxedUnit> output;
// field getter
public scala.Function1<java.lang.String, scala.runtime.BoxedUnit> output();
Code:
0: aload_0
1: getfield #14 // Field output:Lscala/Function1;
4: areturn
// constructor
public Printer(scala.Function1<java.lang.String, scala.runtime.BoxedUnit>);
Code:
0: aload_0
1: aload_1
2: putfield #14 // Field output:Lscala/Function1;
5: aload_0
6: invokespecial #21 // Method java/lang/Object."<init>":()V
9: return
We can see that the anonymous function here is treated just like any val variable. It is stored in the class field output, and the getter output() is created. The only difference is that the field’s type is now the Scala interface scala.Function1 (which AbstractFunction1 implements).
So the cost of this elegant Scala feature is the underlying utility classes: one class is created to represent and execute each anonymous function that is used as a value. You should take into account the number of such functions, as well as details of your VM implementation, to figure out what it means for your particular application.
Scala Traits
Scala’s traits are similar to interfaces in Java. The following trait defines two method signatures, and provides a default implementation of the second one. Let’s see how it is implemented:
trait Similarity {
def isSimilar(x: Any): Boolean
def isNotSimilar(x: Any): Boolean = !isSimilar(x)
}
Two entities are produced: Similarity.class, the interface declaring both methods, and the synthetic class, Similarity$class.class, providing the default implementation:
public interface Similarity {
public abstract boolean isSimilar(java.lang.Object);
public abstract boolean isNotSimilar(java.lang.Object);
}
public abstract class Similarity$class
public static boolean isNotSimilar(Similarity, java.lang.Object);
Code:
0: aload_0
// execute the instance method isSimilar()
1: aload_1
2: invokeinterface #13, 2 // InterfaceMethod Similarity.isSimilar:(Ljava/lang/Object;)Z
// if the returned value is 0, skip to position 14 (return with value 1)
7: ifeq 14
// otherwise, return with value 0
10: iconst_0
11: goto 15
// return the value 1
14: iconst_1
15: ireturn
public static void $init$(Similarity);
Code:
0: return
When a class mixes in this trait, the Scala compiler adds an isNotSimilar method to that class, whose body uses the bytecode instruction invokestatic to call the static method provided by the accompanying class, passing this as the first argument.
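The following sketch reproduces that 2.11 scheme by hand: an interface-only trait, a stand-in for Similarity$class that holds the default implementation as a method receiving the instance explicitly, and a class whose forwarder delegates to it. The names SimilarityImpl and Point are hypothetical, and since methods of a Scala object are not themselves JVM static methods, the stand-in only approximates invokestatic. Since Scala 2.12, traits compile to interfaces with default methods, so Similarity$class is no longer generated.

```scala
// Interface part: what Similarity.class declares (both methods abstract).
trait Similarity {
  def isSimilar(x: Any): Boolean
  def isNotSimilar(x: Any): Boolean
}

// Stand-in for the synthetic Similarity$class: the default body becomes a
// method that receives the implementing instance as an explicit parameter.
object SimilarityImpl {
  def isNotSimilar(self: Similarity, x: Any): Boolean = !self.isSimilar(x)
}

// A class mixing in the trait gets a forwarder to the shared implementation.
class Point(val x: Int, val y: Int) extends Similarity {
  def isSimilar(other: Any): Boolean = other match {
    case p: Point => p.x == x
    case _        => false
  }
  def isNotSimilar(other: Any): Boolean = SimilarityImpl.isNotSimilar(this, other)
}

object TraitDemo {
  def main(args: Array[String]): Unit = {
    val p = new Point(1, 2)
    println(p.isNotSimilar(new Point(1, 5))) // false: same x
    println(p.isNotSimilar(new Point(3, 2))) // true: different x
  }
}
```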
Complex polymorphism and inheritance structures may be created from traits. For example, multiple traits, as well as the implementing class, may all override a method with the same signature, calling super.methodName() to pass control to the next trait. When the Scala compiler encounters such calls, it:
- Determines exactly which trait the call refers to.
- Determines the name of the accompanying class that provides the static method bytecode defined for that trait.
- Produces the necessary invokestatic instruction.
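Here is a small example of such a chain of super calls, using hypothetical Greeter, Polite, and Loud traits. The linearization of LoudPolite (LoudPolite, Loud, Polite, Greeter) determines where each super.greet call goes: Loud’s call reaches Polite, and Polite’s call reaches Greeter.

```scala
trait Greeter {
  def greet(name: String): String = s"Hello, $name"
}

trait Polite extends Greeter {
  override def greet(name: String): String = super.greet(name) + ", pleased to meet you"
}

trait Loud extends Greeter {
  override def greet(name: String): String = super.greet(name).toUpperCase
}

// Linearization: LoudPolite -> Loud -> Polite -> Greeter
class LoudPolite extends Greeter with Polite with Loud

object StackDemo {
  def main(args: Array[String]): Unit =
    println(new LoudPolite().greet("Ann")) // HELLO, ANN, PLEASED TO MEET YOU
}
```

Reversing the mixin order (Greeter with Loud with Polite) puts Polite first in the chain, so the result becomes "HELLO, ANN, pleased to meet you".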
Thus we can see that the powerful concept of traits is implemented at the JVM level in a way that does not lead to significant overhead, and Scala programmers may enjoy this feature without worrying that it will be too expensive at runtime.
Singletons
Scala provides for the explicit definition of singleton classes using the keyword object. Let’s consider the following singleton class:
object Config {
val home_dir = "/home/user"
}
The compiler produces two class files:
Config.class is a pretty simple one:
public final class Config
public static java.lang.String home_dir();
Code:
// execute the method Config$.home_dir()
0: getstatic #16 // Field Config$.MODULE$:LConfig$;
3: invokevirtual #18 // Method Config$.home_dir:()Ljava/lang/String;
6: areturn
This is just a static forwarder to the synthetic Config$ class, which embeds the singleton’s functionality. Examining that class with javap -p -c produces the following bytecode:
public final class Config$
public static final Config$ MODULE$; // a public reference to the singleton object
private final java.lang.String home_dir;
// static initializer
public static {};
Code:
0: new #2 // class Config$
3: invokespecial #12 // Method "<init>":()V
6: return
public java.lang.String home_dir();
Code:
// get the value of field home_dir and return it
0: aload_0
1: getfield #17 // Field home_dir:Ljava/lang/String;
4: areturn
private Config$();
Code:
// initialize the object
0: aload_0
1: invokespecial #19 // Method java/lang/Object."<init>":()V
// expose a public reference to this object in the synthetic variable MODULE$
4: aload_0
5: putstatic #21 // Field MODULE$:LConfig$;
// load the value "/home/user" and write it to the field home_dir
8: aload_0
9: ldc #23 // String /home/user
11: putfield #17 // Field home_dir:Ljava/lang/String;
14: return
It consists of the following:
- The synthetic variable MODULE$, through which other objects access this singleton object.
- The static initializer static {} (also known as <clinit>, the class initializer) and the private constructor Config$(), which together create the instance, store it in MODULE$, and set its fields to their initial values.
- A getter method for the field home_dir. In this case, there is just one such method. If the singleton has more fields, it will have more getters, as well as setters for mutable fields.
The singleton is a popular and useful design pattern. The Java language does not provide a direct way to specify it at the language level; rather, it is the responsibility of the developer to implement it in Java source. Scala, on the other hand, provides a clear and convenient way to declare a singleton explicitly using the object keyword. As we can see looking under the hood, it is implemented in an affordable and natural way.
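The MODULE$ field can also be inspected from Scala with Java reflection. This sketch assumes Config is compiled as a top-level object, as in the example above; it looks the field up on Config’s runtime class (the synthetic Config$) and checks that it holds the very same instance that Scala code reaches by writing Config. ModuleDemo is an illustrative name.

```scala
object Config {
  val home_dir = "/home/user"
}

object ModuleDemo {
  // Runtime class of the singleton: the synthetic Config$ class.
  val moduleClass = Config.getClass

  // MODULE$ is a public static field, so Field.get ignores its argument; null is conventional.
  def moduleInstance: AnyRef = moduleClass.getField("MODULE$").get(null)

  def main(args: Array[String]): Unit = {
    println(moduleClass.getName)      // Config$ when Config is in the default package
    println(moduleInstance eq Config) // true: MODULE$ holds the singleton itself
  }
}
```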
Conclusion
We’ve now seen how Scala compiles several implicit and functional programming features into sophisticated Java bytecode structures. With this glimpse into the inner workings of Scala, we can gain a deeper appreciation of its power, helping us get the most out of this language.
We also now have the tools to explore the language ourselves. There are many useful features of the Scala syntax that are not covered in this article, such as case classes, currying, and for comprehensions. I encourage you to investigate Scala’s implementation of these structures yourself, so you can learn how to be a next-level Scala ninja!
The Java Virtual Machine: A Crash Course
Just like the Java compiler, the Scala compiler converts source code into .class files, containing Java bytecode to be executed by the Java Virtual Machine. In order to understand how the two languages differ under the hood, it is necessary to understand the system they are both targeting. Here, we present a brief overview of some major elements of the Java Virtual Machine architecture, class file structure, and assembler basics.
Note that this guide will only cover the minimum needed to follow along with the above article. Although many major components of the JVM are not discussed here, complete details can be found in the official JVM specification.
Decompiling Class Files with javap
Constant Pool
Field and Method Tables
JVM Bytecode
Method Calls and the Call Stack
Execution on the Operand Stack
Local Variables
Decompiling Class Files with javap
Java ships with the javap command line utility, which disassembles .class files into a human-readable form. Since Scala and Java class files both target the same JVM, javap can be used to examine class files compiled by Scala.
Let’s compile the following source code:
// RegularPolygon.scala
class RegularPolygon( val numSides: Int ) {
def getPerimeter( sideLength: Double ): Double = {
println( "Calculating perimeter..." )
return sideLength * this.numSides
}
}
Compiling this with scalac RegularPolygon.scala will produce RegularPolygon.class. If we then run javap RegularPolygon.class, we will see the following:
$ javap RegularPolygon.class
Compiled from "RegularPolygon.scala"
public class RegularPolygon {
public int numSides();
public double getPerimeter(double);
public RegularPolygon(int);
}
This is a very simple breakdown of the class file that shows only the names and types of the public members of the class. Adding the -p option will include private members:
$ javap -p RegularPolygon.class
Compiled from "RegularPolygon.scala"
public class RegularPolygon {
private final int numSides;
public int numSides();
public double getPerimeter(double);
public RegularPolygon(int);
}
This is still not a lot of information. To see how the methods are implemented in Java bytecode, let’s add the -c option:
$ javap -p -c RegularPolygon.class
Compiled from "RegularPolygon.scala"
public class RegularPolygon {
private final int numSides;
public int numSides();
Code:
0: aload_0
1: getfield #13 // Field numSides:I
4: ireturn
public double getPerimeter(double);
Code:
0: getstatic #23 // Field scala/Predef$.MODULE$:Lscala/Predef$;
3: ldc #25 // String Calculating perimeter...
5: invokevirtual #29 // Method scala/Predef$.println:(Ljava/lang/Object;)V
8: dload_1
9: aload_0
10: invokevirtual #31 // Method numSides:()I
13: i2d
14: dmul
15: dreturn
public RegularPolygon(int);
Code:
0: aload_0
1: iload_1
2: putfield #13 // Field numSides:I
5: aload_0
6: invokespecial #38 // Method java/lang/Object."<init>":()V
9: return
}
That’s a little more interesting. However, to really get the whole story, we should use the -v or -verbose option, as in javap -p -v RegularPolygon.class. Its output is long, but here we finally see what’s really in the class file. What does all this mean? Let’s take a look at some of the most important parts.
Constant Pool
The development cycle for C++ applications includes compilation and linkage stages. The development cycle for Java skips an explicit linkage stage because linkage happens at runtime. The class file must support this runtime linking. This means that when the source code refers to any field or method, the resulting bytecode must keep relevant references in symbolic form, ready to be dereferenced once the application has loaded into memory and actual addresses can be resolved by the runtime linker. This symbolic form must contain:
- class name
- field or method name
- type information
The class file format specification includes a section of the file called the constant pool, a table of all the references needed by the linker. It contains entries of different types.
// ...
Constant pool:
#1 = Utf8 RegularPolygon
#2 = Class #1 // RegularPolygon
#3 = Utf8 java/lang/Object
#4 = Class #3 // java/lang/Object
// ...
The first byte of each entry is a numeric tag indicating the type of entry. The remaining bytes provide information about the value of the entry. The number of bytes and the rules for their interpretation depend on the type indicated by the first byte.
For example, a Java class that uses a constant integer 365 may have a constant pool entry with the following bytecode:
x03 00 00 01 6D
The first byte, x03, identifies the entry type, CONSTANT_Integer. This informs the linker that the next four bytes contain the value of the integer. (Note that 365 in hexadecimal is x16D.) If this is the 14th entry in the constant pool, javap -v will render it like this:
#14 = Integer 365
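This entry layout is simple enough to produce by hand. The sketch below encodes a CONSTANT_Integer entry as the tag byte 3 followed by the value as a big-endian four-byte integer, using java.nio.ByteBuffer, which is big-endian by default. IntegerConstant is an illustrative name, not part of any class-file library.

```scala
import java.nio.ByteBuffer

object IntegerConstant {
  val ConstantIntegerTag: Byte = 3 // tag value for CONSTANT_Integer

  // Encodes a CONSTANT_Integer entry: one tag byte, then the value as a
  // four-byte big-endian integer (ByteBuffer's default byte order).
  def encode(value: Int): Array[Byte] =
    ByteBuffer.allocate(5)
      .put(ConstantIntegerTag)
      .putInt(value)
      .array()

  // Renders bytes as upper-case hex pairs, e.g. "03 00 00 01 6D".
  def hex(bytes: Array[Byte]): String =
    bytes.map(b => f"${b & 0xff}%02X").mkString(" ")

  def main(args: Array[String]): Unit =
    println(hex(encode(365))) // 03 00 00 01 6D
}
```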
Many constant types are composed of references to more “primitive” constant types elsewhere in the constant pool. For instance, our example code contains the statement:
println( "Calculating perimeter..." )
Usage of a string constant will produce two entries in the constant pool: one entry with type CONSTANT_String, and another entry of type CONSTANT_Utf8. The entry of type CONSTANT_Utf8 contains the actual string value, encoded in a slightly modified form of UTF-8. The entry of type CONSTANT_String contains a reference to the CONSTANT_Utf8 entry:
#24 = Utf8 Calculating perimeter...
#25 = String #24 // Calculating perimeter...
This indirection is necessary because there are other types of constant pool entries that refer to entries of type Utf8 and that are not entries of type String. For example, any reference to a class attribute will produce a CONSTANT_Fieldref entry, which refers (through CONSTANT_Class and CONSTANT_NameAndType entries) to the class name, attribute name, and attribute type:
#1 = Utf8 RegularPolygon
#2 = Class #1 // RegularPolygon
#9 = Utf8 numSides
#10 = Utf8 I
#12 = NameAndType #9:#10 // numSides:I
#13 = Fieldref #2.#12 // RegularPolygon.numSides:I
For more details about the constant pool, see the JVM documentation.
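This chain of references can be followed mechanically, which is essentially what javap does when it prints the comment next to each entry. The following sketch models only the entry kinds used above with hypothetical case classes (Utf8, ClassRef, StringRef, NameAndType, FieldRef), fills a pool with the RegularPolygon entries shown in this section, and resolves an index to readable text. It illustrates symbolic references; it is not a class-file parser.

```scala
// A tiny model of constant pool entries: just the kinds used above.
sealed trait Entry
final case class Utf8(text: String) extends Entry
final case class ClassRef(nameIndex: Int) extends Entry
final case class StringRef(utf8Index: Int) extends Entry
final case class NameAndType(nameIndex: Int, descriptorIndex: Int) extends Entry
final case class FieldRef(classIndex: Int, nameAndTypeIndex: Int) extends Entry

object ConstantPool {
  // Follows references down to Utf8 entries, building the same text that
  // javap -v prints as a comment next to each entry.
  def render(pool: Map[Int, Entry], index: Int): String = pool(index) match {
    case Utf8(text)              => text
    case ClassRef(name)          => render(pool, name)
    case StringRef(utf8)         => render(pool, utf8)
    case NameAndType(name, desc) => s"${render(pool, name)}:${render(pool, desc)}"
    case FieldRef(cls, nt)       => s"${render(pool, cls)}.${render(pool, nt)}"
  }

  // Entries copied from the RegularPolygon listings above.
  val regularPolygon: Map[Int, Entry] = Map(
    1  -> Utf8("RegularPolygon"),
    2  -> ClassRef(1),
    9  -> Utf8("numSides"),
    10 -> Utf8("I"),
    12 -> NameAndType(9, 10),
    13 -> FieldRef(2, 12),
    24 -> Utf8("Calculating perimeter..."),
    25 -> StringRef(24)
  )

  def main(args: Array[String]): Unit = {
    println(render(regularPolygon, 13)) // RegularPolygon.numSides:I
    println(render(regularPolygon, 25)) // Calculating perimeter...
  }
}
```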
Field and Method Tables
A class file contains a field table with information about each field (i.e., attribute) defined in the class. Its entries reference constant pool entries that describe the field’s name and type, and also hold access control flags and other relevant data.
A similar method table is present in the class file. However, in addition to name and type information, for each non-abstract method, it contains the actual bytecode instructions to be executed by the JVM, as well as data structures used by the method’s stack frame, described below.
JVM Bytecode
The JVM uses its own internal instruction set to execute compiled code. Running javap with the -c option includes the compiled method implementations in the output. If we examine our RegularPolygon.class file this way, we will see the following output for our getPerimeter() method:
public double getPerimeter(double);
Code:
0: getstatic #23 // Field scala/Predef$.MODULE$:Lscala/Predef$;
3: ldc #25 // String Calculating perimeter...
5: invokevirtual #29 // Method scala/Predef$.println:(Ljava/lang/Object;)V
8: dload_1
9: aload_0
10: invokevirtual #31 // Method numSides:()I
13: i2d
14: dmul
15: dreturn
In raw form, the first four of these instructions are stored as the following bytes:
xB2 00 17
x12 19
xB6 00 1D
x27
...
Each instruction starts with a one-byte opcode identifying the JVM instruction, followed by zero or more operands, depending on the format of the specific instruction. These operands are typically either constant values or references into the constant pool. javap helpfully translates the bytecode into a human-readable form displaying:
- The offset, or position of the first byte of the instruction within the code.
- The human-readable name, or mnemonic, of the instruction.
- The value of the operand, if any.
Operands that are displayed with a pound sign, such as #23, are references to entries in the constant pool. As we can see, javap also produces helpful comments in the output, identifying what exactly is being referenced from the pool.
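Decoding works just as described: read an opcode, look up how many operand bytes it takes, and advance past them. The following sketch is a hypothetical mini-disassembler that knows only the four opcodes in the byte listing above (0xB2 getstatic, 0x12 ldc, 0xB6 invokevirtual, 0x27 dload_1) and prints them roughly the way javap -c does; a real disassembler covers the whole instruction set, including variable-length formats.

```scala
object MiniDisassembler {
  final case class OpInfo(mnemonic: String, operandBytes: Int)

  // Only the opcodes that appear in the byte listing above.
  private val opcodes: Map[Int, OpInfo] = Map(
    0xB2 -> OpInfo("getstatic", 2),     // operand: 2-byte constant pool index
    0x12 -> OpInfo("ldc", 1),           // operand: 1-byte constant pool index
    0xB6 -> OpInfo("invokevirtual", 2), // operand: 2-byte constant pool index
    0x27 -> OpInfo("dload_1", 0)        // no operand: the slot number is part of the opcode
  )

  // Turns raw code bytes into "offset: mnemonic #index" lines.
  def disassemble(code: Array[Byte]): List[String] = {
    val out = List.newBuilder[String]
    var pc = 0
    while (pc < code.length) {
      val opcode = code(pc) & 0xff
      val info = opcodes.getOrElse(opcode,
        throw new IllegalArgumentException(f"unsupported opcode 0x$opcode%02X at offset $pc"))
      val operand = info.operandBytes match {
        case 0 => ""
        case 1 => s" #${code(pc + 1) & 0xff}"
        case _ => s" #${((code(pc + 1) & 0xff) << 8) | (code(pc + 2) & 0xff)}" // big-endian u2
      }
      out += s"$pc: ${info.mnemonic}$operand"
      pc += 1 + info.operandBytes
    }
    out.result()
  }

  def main(args: Array[String]): Unit = {
    val bytes = Array(0xB2, 0x00, 0x17, 0x12, 0x19, 0xB6, 0x00, 0x1D, 0x27).map(_.toByte)
    disassemble(bytes).foreach(println) // 0: getstatic #23, 3: ldc #25, 5: invokevirtual #29, 8: dload_1
  }
}
```

The offsets and constant pool indexes it prints match the javap listing of getPerimeter() above.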
We’ll discuss a few of the common instructions below. For detailed information about the complete JVM instruction set, see the documentation.
Method Calls and the Call Stack
Each method call must be able to run with its own context, which includes things such as locally-declared variables, or arguments that were passed to the method. Together, these make up a stack frame. Upon invocation of a method, a new frame is created and placed on top of the call stack. When the method returns, the current frame is removed from the call stack and discarded, and the frame that was in effect before the method was called is restored.
A stack frame includes a few distinct structures. Two important ones are the operand stack and the local variable table, discussed next.
Execution on the Operand Stack
Many JVM instructions operate on their frame’s operand stack. Rather than specifying a constant operand explicitly in the bytecode, these instructions instead take the values on the top of the operand stack as input. Typically, these values are removed from the stack in the process. Some instructions also place new values on top of the stack. In this way, JVM instructions can be combined to perform complex operations. For example, the expression:
sideLength * this.numSides
is compiled to the following in our getPerimeter() method:
8: dload_1
9: aload_0
10: invokevirtual #31 // Method numSides:()I
13: i2d
14: dmul
- The first instruction, dload_1, pushes the double value from slot 1 of the local variable table (discussed next) onto the operand stack. In this case, this is the method argument sideLength.
- The next instruction, aload_0, pushes the object reference at slot 0 of the local variable table onto the operand stack. In an instance method such as this one, slot 0 holds the reference to this, the current object.
- This sets up the stack for the next call, invokevirtual #31, which executes the instance method numSides(). invokevirtual pops the top operand (the reference to this) off the stack and uses it as the receiver: the method is invoked on that object, and the object’s runtime class determines which implementation runs. Once the method returns, its result is pushed onto the stack.
- In this case, the value returned (numSides) is an integer. It must be converted to a double in order to be multiplied by another double value. The instruction i2d pops the integer value off the stack, converts it to double, and pushes the result back onto the stack.
- At this point, the stack contains the floating point result of this.numSides on top, followed by the value of the sideLength argument that was passed to the method. dmul pops these top two values from the stack, performs floating point multiplication on them, and pushes the result onto the stack.
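The following sketch replays those five steps on an explicit operand stack, modeled as an immutable List whose head is the top of the stack. The instruction types are hypothetical, and InvokeNumSides stands in for invokevirtual #31 by reading numSides directly instead of performing a real method lookup.

```scala
// The receiver class from the example above.
class RegularPolygon(val numSides: Int)

// Just the five instructions used by `sideLength * this.numSides`.
sealed trait Insn
final case class DLoad(slot: Int) extends Insn // push a double from a local slot
final case class ALoad(slot: Int) extends Insn // push a reference from a local slot
case object InvokeNumSides extends Insn        // stands in for invokevirtual #31
case object I2D extends Insn                   // convert the int on top to a double
case object DMul extends Insn                  // pop two doubles, push their product

object OperandStack {
  // Runs the instructions against an operand stack (head of the List = top).
  def run(code: List[Insn], locals: Vector[Any]): List[Any] =
    code.foldLeft(List.empty[Any]) { (stack, insn) =>
      (insn, stack) match {
        case (DLoad(n), s)                                   => locals(n).asInstanceOf[Double] :: s
        case (ALoad(n), s)                                   => locals(n) :: s
        case (InvokeNumSides, (p: RegularPolygon) :: rest)   => p.numSides :: rest
        case (I2D, (i: Int) :: rest)                         => i.toDouble :: rest
        case (DMul, (top: Double) :: (below: Double) :: rest) => (below * top) :: rest
        case (other, s) => throw new IllegalStateException(s"cannot execute $other with stack $s")
      }
    }

  val getPerimeterTail: List[Insn] = List(DLoad(1), ALoad(0), InvokeNumSides, I2D, DMul)

  def main(args: Array[String]): Unit = {
    // Slot 0 holds `this`, slot 1 holds the sideLength argument.
    val locals = Vector[Any](new RegularPolygon(4), 2.5)
    println(run(getPerimeterTail, locals)) // List(10.0)
  }
}
```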
When a method is called, a new operand stack is created as part of its stack frame, where operations will be performed. We must be careful with terminology here: the word “stack” may refer to the call stack, the stack of frames providing context for method execution, or to a particular frame’s operand stack, upon which JVM instructions operate.
Local Variables
Each stack frame keeps a table of local variables. This typically includes the reference to this, any arguments that were passed when the method was called, and any local variables declared within the method body. Running javap with the -v option will include information about how each method’s stack frame should be set up, including its local variable table:
public double getPerimeter(double);
// ...
Code:
0: getstatic #23 // Field scala/Predef$.MODULE$:Lscala/Predef$;
3: ldc #25 // String Calculating perimeter...
// ...
LocalVariableTable:
Start Length Slot Name Signature
0 16 0 this LRegularPolygon;
0 16 1 sideLength D
In this example, there are two local variables. The variable in slot 0 is named this, with the type RegularPolygon. This is the reference to the object on which the method was invoked. The variable in slot 1 is named sideLength, with the type D (indicating a double). This is the argument that is passed to our getPerimeter() method.
Instructions such as iload_1, fstore_2, or aload n transfer values of different types between the operand stack and the local variable table. Since the first item in the table of an instance method is the reference to this, the instruction aload_0 is commonly seen in any method that accesses the fields or methods of its own object.
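Slot numbers are assigned in order, starting with this in slot 0 for an instance method. One detail the table above does not show is that, according to the JVM specification, values of type long (J) and double (D) occupy two consecutive slots, so sideLength actually uses slots 1 and 2. The hypothetical helper below assigns slots to a method’s parameters under that rule.

```scala
object LocalSlots {
  // Assigns local variable slots the way the JVM lays out an instance method's
  // parameters: slot 0 holds `this`; long (J) and double (D) values take two
  // consecutive slots, and every other type takes one.
  def assign(params: List[(String, Char)]): List[(String, Int)] = {
    val withThis = ("this", 'L') :: params
    val starts = withThis.scanLeft(0) { case (slot, (_, tpe)) =>
      slot + (if (tpe == 'J' || tpe == 'D') 2 else 1)
    }
    withThis.map(_._1).zip(starts) // zip drops the final "next free slot" value
  }

  def main(args: Array[String]): Unit = {
    println(assign(List("sideLength" -> 'D')))                  // List((this,0), (sideLength,1))
    println(assign(List("sideLength" -> 'D', "count" -> 'I'))) // List((this,0), (sideLength,1), (count,3))
  }
}
```

For getPerimeter(double), it reproduces the table above: this in slot 0 and sideLength in slot 1. For a hypothetical method taking a double and then an int, the int lands in slot 3, because the double occupies slots 1 and 2.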
This concludes our walkthrough of JVM basics.
About the author
Sergey Reznick
Saint Petersburg, Russia
Member since October 21, 2015