JVM Internals

In this series i want to give you some insides about the JVM.

I will put together some tutorials and docs from the web and explain them and give some code examples.

So let's start:

Virtual Machine



The JRE is composed of the Java API and the JVM. The role of the JVM is to read the Java application through the Class Loader and execute it along with the Java API.
A virtual machine (VM) is a software implementation of a machine (i.e. a computer) that executes programs like a physical machine. Originally, Java was designed to run based on a virtual machine separated from a physical machine for implementing WORA (Write Once Run Anywhere), although this goal has been mostly forgotten. Therefore, the JVM runs on all kinds of hardware to execute the Java Bytecode without changing the Java execution code.
The features of JVM are as follows:

  • Stack-based virtual machine: The most popular computer architectures such as Intel x86 Architecture and ARM Architecture run based on a register. However, JVM runs based on a stack.
  • Symbolic reference: All types (class and interface) except for primitive data types are referred to through symbolic reference, instead of through explicit memory address-based reference. 
  • Garbage collection: A class instance is explicitly created by the user code and automatically destroyed by garbage collection.
  • Guarantees platform independence by clearly defining the primitive data type: A traditional language such as C/C++ has different int type size according to the platform. The JVM clearly defines the primitive data type to maintain its compatibility and guarantee platform independence.
  • Network byte order: The Java class file uses the network byte order. To maintain platform independence between the little endian used by Intel x86 Architecture and the big endian used by the RISC Series Architecture, a fixed byte order must be kept. Therefore, JVM uses the network byte order, which is used for network transfer. The network byte order is the big endian.

Sun Microsystems developed Java. However, any vendor can develop and provide a JVM by following the Java Virtual Machine Specification. For this reason, there are various JVMs, including Oracle Hotspot JVM and IBM JVM. The Dalvik VM in Google's Android operating system is a kind of JVM, though it does not follow the Java Virtual Machine Specification. Unlike Java VMs, which are stack machines, the Dalvik VM is a register-based architecture. Java bytecode is also converted into an register-based instruction set used by the Dalvik VM.

Java bytecode

To implement WORA, the JVM uses Java bytecode, a middle-language between Java (user language) and the machine language. This Java bytecode is the smallest unit that deploys the Java code.
Before explaining the Java bytecode, let's take a look at it. This case is a summary of a real example that has occurred in development process.

Symptom

An application that had been running successfully no longer runs. Moreover, returns the following error after the library has been updated.

Exception in thread "main" java.lang.NoSuchMethodError: com.nhn.user.UserAdmin.addUser(Ljava/lang/String;)V
    at com.nhn.service.UserService.add(UserService.java:14)
    at com.nhn.service.UserService.main(UserService.java:19)


The application code is as follows, and no changes to it have been made.

// UserService.java
public void add(String userName) {
    admin.addUser(userName);
}


The updated library source code and the original source code are as follows.

// UserAdmin.java - Updated library source code
public User addUser(String userName) {
    User user = new User(userName);
    User prevUser = userMap.put(userName, user);
    return prevUser;
}
// UserAdmin.java - Original library source code
public void addUser(String userName) {
    User user = new User(userName);
    userMap.put(userName, user);
}

In short, the addUser() method which has no return value has been changed to a method that returns the User class instance. However, the application code has not been changed, since it does not use the return value of the addUser() method.
At first glance, the com.nhn.user.UserAdmin.addUser() method seems to still exist, but if so, why does NoSuchMethodError occur?

Reasons

The reason is that the application code has not been compiled to a new library. In other words, the application code seems to invoke methods regardless of the return value. However, the compiled class file indicates the method that has a return value.
You will see this through the following error message.

java.lang.NoSuchMethodError: com.nhn.user.UserAdmin.addUser(Ljava/lang/String;)V

NoSuchMethodError has occurred since the "com.nhn.user.UserAdmin.addUser(Ljava/lang/String;)V" method could not be found. Take a look at "Ljava/lang/String;" and the last "V". In the expression of Java Bytecode, "L<classname>;" is the class instance. This means that the addUser() method returns one java/lang/String object as a parameter. In the library of this case, the parameter has not been changed, so it is normal. The last "V" of the message stands for the return value of the method. In the expression of Java Bytecode, "V" means that it has no return value. In short, the error message means that one java.lang.String object has been returned as a parameter and the com.nhn.user.UserAdmin.addUser method without any return value has not been found.
Since the application code has been compiled to the previous library, the class file defined that a method that returns "V" should be invoked. However, in the changed library, the method that returned "V" did not exist, but the method that returned "Lcom/nhn/user/User;" has been added. Therefore, a NoSuchMethodError occurred.
Note
The error has occurred since the developer did not compile a new library again. However, in this case, the library provider is mostly responsible for that. There was no return value of the method as public, but it later has been changed to return the user class instance. This is an obvious method signature change. This means that the backward compatibility of the library has been broken. Therefore, the library provider must have reported to the users that the method has been changed.
Let's go back to the Java Bytecode. Java Bytecode is the essential element of JVM. The JVM is an emulator that emulates the Java Bytecode. Java compiler does not directly convert high-level language such as C/C++ to the machine language (direct CPU instruction); it converts the Java language that the developer understands to the Java Bytecode that the JVM understands. Since Java bytecode has no platform-dependent code, it is executable on the hardware where the JVM (accurately, the JRE of the same profile) has been installed, even when the CPU or OS is different (a class file developed and compiled on the Windows PC can be executed on the Linux machine without additional change.) The size of the compiled code is almost identical to the size of the source code, making it easy to transfer and execute the compiled code via  the network.
The class file itself is a binary file that cannot be understood by a human. To manage this file, JVM vendors provide javap, the disassembler. The result of using javap is called Java assembly. In the above case, the Java assembly below is obtained by disassembling the UserService.add() method of the application code with the javap -c option.

public void add(java.lang.String);
  Code:
   0:   aload_0
   1:   getfield        #15; //Field admin:Lcom/nhn/user/UserAdmin;
   4:   aload_1
   5:   invokevirtual   #23; //Method com/nhn/user/UserAdmin.addUser:(Ljava/lang/String;)V
   8:   return

In this Java assembly, the addUser() method is invoked by the fourth row, "5: invokevirtual #23;". This means that the method corresponding to the 23rd index should be invoked. The method of the 23rd index is annotated by the javap program. The invokevirtual is the OpCode (operation code) of the most basic command that invokes a method in the Java Bytecode. For reference, there are four OpCodes that invoke a method in the Java Bytecode: invokeinterface, invokespecial, invokestatic, and invokevirtual. The meaning of each OpCode is as follows.

  • invokeinterface: Invokes an interface method
  • invokespecial: Invokes an initializer, private method, or superclass method
  • invokestatic: Invokes static methods
  • invokevirtual: Invokes instance methods

The instruction set of Java Bytecode consists of OpCode and Operand. The OpCode such as invokevirtual requires a 2-byte Operand.
By compiling the application code above with the updated library and then disassembling it, the following result will be obtained.

public void add(java.lang.String);
  Code:
   0:   aload_0
   1:   getfield        #15; //Field admin:Lcom/nhn/user/UserAdmin;
   4:   aload_1
   5:   invokevirtual   #23; //Method com/nhn/user/UserAdmin.addUser:(Ljava/lang/String;)Lcom/nhn/user/User;
   8:   pop
   9:   return


You can see that the method corresponding to the 23rd has been converted to the method that returns "Lcom/nhn/user/User;".
In the disassembled result above, what does the number in front of the code mean?
It is the byte number. Perhaps this is the reason why the code executed by the JVM is called Java "Byte"code. In short, the bytecode instruction OpCodes such as aload_0getfield, and invokevirtual are expressed as a 1-byte byte number. (aload_0 = 0x2a, getfield = 0xb4, invokevirtual = 0xb6) Therefore, the maximum number of Java Bytecode instruction OpCodes is 256.
OpCodes such as aload_0 and aload_1 do not need any Operand. Therefore, the next byte of aload_0 is the OpCode of the next instruction. However, getfield and invokevirtual need the 2-byte Operand. Therefore, the next instruction of getfield on the first byte is written on the fourth byte by skipping two bytes. The bytecode shown through Hex Editor is as follows.

2a b4 00 0f 2b b6 00 17 57 b1

In the Java Bytecode, the class instance is expressed as "L;" and void is expressed as "V". In this way, other types have their own expressions. The table below summarizes the expressions.

Table 1: Type Expression in Java Bytecode

Java BytecodeTypeDescription
Bbytesigned byte
CcharUnicode character
Ddoubledouble-precision floating-point value
Ffloatsingle-precision floating-point value
Iintinteger
Jlonglong integer
L<classname>referencean instance of class <classname>
Sshortsigned short
Zbooleantrue or false
[referenceone array dimension
The table below shows examples of Java Bytecode expressions.

Table 2: Examples of Java Bytecode Expressions

Java CodeJava Bytecode Expression
double d[][][];[[[D
Object mymethod(int I, double d, Thread t)(IDLjava/lang/Thread;)Ljava/lang/Object;
For more details, see "4.3 Descriptors" in "The Java Virtual Machine Specification, Second Edition". For various Java Bytecode instruction sets, see "6. The Java Virtual Machine Instruction Set" in "The Java Virtual Machine Specification, Second Edition".

Source:
https://www.cubrid.org/blog/understanding-jvm-internals/

Comments

Popular Posts