Joeq has many similarities to another virtual machine written in Java, called Jalapeno [1,8]. Before Joeq, the author of this paper worked on Jalapeno, and many of its ideas were reimplemented in Joeq. In particular, the bootstrapping technique and the compiler and garbage collection infrastructures were heavily influenced by Jalapeno's designs.
However, the two systems differ in focus, and the difference shows up in their designs. Jalapeno is heavily geared towards being a virtual machine for server-side Java, and many of its design decisions reflect that philosophy. For example, Jalapeno forgoes an interpreter entirely and takes a compile-only approach. Its runtime, data structures, and IR are fairly Java-specific. All virtual machine data structures, including code, are treated as objects; because code may be moved like any other object, all code must be compiled to be relocatable, so Jalapeno avoids storing absolute memory references. Jalapeno also offers only limited support for use as a static analysis engine or compiler. Joeq, by contrast, was designed from the start to be language-independent and to include significant support for static analysis and compilation. Recently, Jalapeno independently began using faux address classes, similar to our treatment of addresses, although they do not distinguish among addresses to code, stack, and heap locations.
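To make the last distinction concrete, the following is a minimal Java sketch of faux address classes that separate code, stack, and heap addresses at the type level. The class and method names (`CodeAddress`, `StackAddress`, `HeapAddress`, `isWordAligned`) are hypothetical and are not taken from Joeq or Jalapeno; the sketch only shows how distinct types let the Java compiler reject, for example, passing a code address where a heap address is expected. It also assumes 4-byte words for the alignment check.

```java
// Hypothetical sketch: distinct "faux" address types for code, stack, and heap.
// None of these names come from Joeq or Jalapeno; they only illustrate the idea.
public final class FauxAddresses {

    /** Common base: wraps a raw machine address as a long. */
    abstract static class Address {
        final long value;
        Address(long value) { this.value = value; }
        long toLong() { return value; }
    }

    /** Address of a location in compiled code. */
    static final class CodeAddress extends Address {
        CodeAddress(long v) { super(v); }
        CodeAddress offset(int bytes) { return new CodeAddress(value + bytes); }
    }

    /** Address of a slot on a thread stack. */
    static final class StackAddress extends Address {
        StackAddress(long v) { super(v); }
        StackAddress offset(int bytes) { return new StackAddress(value + bytes); }
    }

    /** Address of a location in the garbage-collected heap. */
    static final class HeapAddress extends Address {
        HeapAddress(long v) { super(v); }
        HeapAddress offset(int bytes) { return new HeapAddress(value + bytes); }
    }

    // Accepts only heap addresses; passing a CodeAddress or StackAddress
    // is rejected at compile time. Assumes 4-byte words (hypothetical).
    static boolean isWordAligned(HeapAddress a) {
        return (a.toLong() & 3L) == 0;
    }

    public static void main(String[] args) {
        HeapAddress obj = new HeapAddress(0x1000);
        HeapAddress field = obj.offset(8);   // offsetting keeps the heap type
        System.out.println(isWordAligned(field));   // true
        // isWordAligned(new CodeAddress(0x2000)); // would not compile
    }
}
```

Because `offset` returns the same address kind it was called on, arithmetic never silently turns a stack address into a heap address; a single untyped address class would accept all of these mixes without complaint.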
The design of the compiler infrastructure drew heavily from two intermediate representations with which the author of this paper has extensive experience: the Jalapeno optimizing compiler IR and the MIT Flex compiler IR. We tried to extract the good ideas from these systems while leaving out difficult, ineffectual, or counterintuitive pieces. Both Jalapeno and Flex use an explicitly typed, pseudo-register-based representation with high-level and low-level operations. Both also support the same notion of factored control flow as our Quad format. The differences are more interesting than the similarities. In Jalapeno, the IR is simply a list of instructions, and the CFG is kept separately. We made the CFG part of the core IR in Joeq because almost every use of the IR requires control flow information, and keeping the ordering of the instruction list consistent with a separate control flow graph made control-flow transformations more difficult than necessary. Flex uses the notion of code factories to generate code and perform compiler passes. We dropped this idea in Joeq in favor of the visitor pattern, which (to us) is simpler, easier to understand, and easier to program correctly. Flex also includes, in every instruction, a pointer back to its context. Although this can be useful, we found that its space overhead outweighed the benefit.
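The two Joeq design choices described above, making the CFG part of the core IR and writing passes as visitors, can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not Joeq's actual API: the classes `Quad`, `Move`, `Add`, `BasicBlock`, `QuadVisitor`, and `Counter` are hypothetical. Each basic block owns both its instructions and its successor edges, so there is no separate instruction list to keep in sync, and a pass is just a visitor with one method per instruction kind.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: an IR whose CFG is part of the core representation,
// with passes written as visitors. Names are illustrative, not Joeq's API.
public final class CfgVisitorSketch {

    /** A three-address instruction that dispatches to a visitor. */
    interface Quad { void accept(QuadVisitor v); }

    static final class Move implements Quad {
        final String dst, src;
        Move(String dst, String src) { this.dst = dst; this.src = src; }
        public void accept(QuadVisitor v) { v.visitMove(this); }
    }

    static final class Add implements Quad {
        final String dst, lhs, rhs;
        Add(String dst, String lhs, String rhs) { this.dst = dst; this.lhs = lhs; this.rhs = rhs; }
        public void accept(QuadVisitor v) { v.visitAdd(this); }
    }

    /** One visit method per instruction kind. */
    interface QuadVisitor {
        void visitMove(Move q);
        void visitAdd(Add q);
    }

    /** A basic block owns its instructions and its CFG edges directly. */
    static final class BasicBlock {
        final List<Quad> quads = new ArrayList<>();
        final List<BasicBlock> successors = new ArrayList<>();
    }

    /** Example pass: count instructions of each kind. */
    static final class Counter implements QuadVisitor {
        int moves, adds;
        public void visitMove(Move q) { moves++; }
        public void visitAdd(Add q) { adds++; }
    }

    // Applies the visitor to every quad in each reachable block exactly once,
    // following CFG edges from the entry block.
    static void visitAll(BasicBlock entry, QuadVisitor v) {
        Set<BasicBlock> seen = new HashSet<>();
        Deque<BasicBlock> work = new ArrayDeque<>();
        work.push(entry);
        while (!work.isEmpty()) {
            BasicBlock b = work.pop();
            if (!seen.add(b)) continue;          // already visited
            for (Quad q : b.quads) q.accept(v);
            for (BasicBlock s : b.successors) work.push(s);
        }
    }

    public static void main(String[] args) {
        BasicBlock entry = new BasicBlock(), body = new BasicBlock();
        entry.quads.add(new Move("t0", "a"));
        body.quads.add(new Add("t1", "t0", "b"));
        body.quads.add(new Move("r", "t1"));
        entry.successors.add(body);
        body.successors.add(body);               // loop back-edge
        Counter c = new Counter();
        visitAll(entry, c);
        System.out.println(c.moves + " moves, " + c.adds + " adds");  // 2 moves, 1 adds
    }
}
```

Adding a new pass means writing one more `QuadVisitor` implementation; the traversal and the CFG itself stay unchanged, which is the simplicity the visitor approach is meant to buy over a factory-based design.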
Several other virtual machines have been written in Java. JavaInJava is an implementation of a Java virtual machine written entirely in Java. Rivet, also written in Java, is an extensible tool platform for debugging and testing that is structured as a Java virtual machine. Both JavaInJava and Rivet run on a host Java virtual machine, using a technique similar to hosted execution in the Joeq virtual machine.
Marmot is an optimizing compiler infrastructure for Java that is written almost entirely in Java. It includes an optimizing native-code compiler, a runtime system, and libraries for a large subset of Java. The compiler implements many standard scalar optimizations, along with a few object-oriented optimizations. It uses a multi-level IR and a strongly typed SSA form. Marmot supports several garbage collectors written in C++. It is difficult to evaluate the design of Marmot because its source code is not available. Intel's virtual machine is written in C++ and also uses a typed intermediate language. Like Joeq, it supports garbage collection at every instruction.
SableVM is a portable Java virtual machine written in C. Its goals are to be small, fast, and efficient, as well as to provide a platform for conducting research. It implements several interesting techniques, such as bidirectional object layouts, a threaded interpreter, and efficient locking. Soot is a framework for analyzing and optimizing Java. It includes three intermediate representations: Baf, a streamlined bytecode representation; Jimple, a typed 3-address representation; and Grimp, a version of Jimple with aggregated high-level information. The first two are similar to our bytecode and Quad IRs, respectively. In the future, we plan to extend our IR to include high-level information, as Grimp does.
The Microsoft .NET Framework shares the goal of supporting multiple languages. Its Common Language Infrastructure (CLI) is a development platform that includes a virtual machine specification, the Virtual Execution System (VES), which provides a full-featured runtime environment with garbage collection and threading, along with a comprehensive class library. The CLI also includes a general language specification, the Common Language Specification (CLS), which compiler writers can target if they want to generate classes and code that interoperate with other programming languages. The intermediate representation it uses is called the Common Intermediate Language (CIL). Frontends that output CIL exist for many languages: Managed C++, JavaScript, Eiffel, Component Pascal, APL, Cobol, Oberon, Perl, Python, Scheme, Smalltalk, Standard ML, Haskell, and Mercury. In the future, we plan to add a CIL loader to Joeq so that we can leverage these frontends and other work on CIL. The upcoming Microsoft Phoenix project has similar goals: using a single framework for static and dynamic compilation of safe and unsafe languages, with support for both manual and automatic storage management.