In
the last post I wrote about how to retrieve information on function arguments and return values using DWARF debugging information. You might now wonder what that was all about, but no worries: I'll explain everything in this post. Some of you might already have guessed that it's related to
Cibyl, my binary translator. The main issue at hand is this:
- Translated C function calls are expensive to make in Java
and the rest of the post will explain the problem and a way of improving that situation. First, since Cibyl translates MIPS binaries into java bytecode, we'll need some background on MIPS and Java bytecode, focusing on the function calling conventions.
MIPS has 32 registers which, like for most microprocessors, are fixed (I.e., it doesn't use SPARC-style register windows). In the MIPS ABI, four registers are used for passing arguments to functions. These arguments, and any additional arguments, are backed by the stack, as illustrated in the figure below.

So to call a function with five 32-bit arguments, the
caller would put the first four in registers
a0..
a3 (the blue boxes in the figure) and the fifth on the stack (the pink box). If there are fewer arguments, they are all placed in registers. Function return values are also passed via registers, using the
v0/
v1 register pair. For 32-bit return values, only
v0 is used, and if more than 64-bits are needed, the rest will be copied via the stack.
The ABI also specifies a set of
callee-saved registers (
s0..
s7) and a stack pointer (
sp). The other registers are basically used for temporaries and are allowed to be overwritten by the callee. (I'm ignoring some special cases here, which can be handled with compiler options).
How does this translate into Java bytecode then? I've explained how Cibyl works
in more details in other articles, but a recap can be good. The important part here is how MIPS registers are handled. A naïve implementation would use static integer class members in Java to represent registers, which would correspond nicely with how actual MIPS registers are implemented. All C functions can then be translated into Java prototypes
void fn_name(void), since all register values are always available statically. In fact, this is how the very first Cibyl release was implemented. I say it's naïve because it's quite inefficient as well. It tends to generate code such as
iconst_1
putstatic CRunTime/t0 I
iconst_2
putstatic CRunTime/t1 I
...
getstatic CRunTime/t0 I
getstatic CRunTime/t1 I
iadd
putstatic CRunTime/t2 Iand
getstatics and
putstatics are both slow and big (bytecode-wise).
So instead, Cibyl uses Java local variables to represent MIPS registers. This allows for much more efficient and compact bytecode to be generated, but instead gives us a problem with functions: You now need to pass all registers which pass values between
caller and
callee to the
callee function. Luckily, this actually boils down to only the argument registers, and the stack pointer. All the other registers are either local temporaries, which we can just throw away, or callee-saved registers, which are saved automatically since function-locals are used.
Let's look at how a function call in Java bytecode works. Java bytecode is a stack-oriented architecture, i.e., all operations are done on values on an operand stack. For example, an integer addition is done by pushing two values on the operand stack, and executing an
iadd instruction which will pop them and push back the result. The figure below shows how a function is called:

i.e., the
caller will push some values on the operand stack and then invoke the method. Since Java is typesafe, the correct number of arguments
must be on the operand stack, or the JVM will not load the program. On the call, the
callee will receive the arguments not on the operand stack, but placed in local variables (the first
n), which is where they would typically end up anyway. To return a value, it's again pushed on the operand stack and returned via the
ireturn instruction (for integer return values). When control returns to the caller, it can find the value at the top of the stack. In the figure, the blue boxes represent the arguments and the code shows an example function call.
Fine. So most of you can now guess how Cibyl handles MIPS argument registers:
a0...
a3 is simply pushed on the operand stack and ends up as local variables in the function. In addition, the stack pointer
sp is also pushed, as it's value is needed by both caller and callee. Other registers are simply allocated and used locally.
But how does Cibyl know which of the argument registers to pass? Now we're getting to the point! Trivially it could pass all argument registers to all functions, but that's unnecessary since not all functions have that many parameters. More function arguments also mean higher overhead, and we'd like to avoid that. So far, Cibyl has simply checked if the argument registers and the stack pointer are used by the function, and then pass the needed ones. Unfortunately,
a0..
a3 can sometimes be used as local temporaries as well - even though they are not function parameters. In this case, Cibyl would pass them in vain, inducing extra overhead for no gain. Luckily, through DWARF there is a better way.
The idea comes from Marcus Groeber, who noticed this while playing with Cibyl. The DWARF debugging information actually contains all information we need to find out which arguments registers are
actually used as parameters. The
last post describes how to look this up. So what's left to do in Cibyl is simply to note this down, and only pass the actual parameters. The same goes for the return values.
In effect, the prototypes in the generated Java methods now become basically the same as the original C functions. Some cases are still tricky, namely parameters and return values that are larger than 32 bits. For these, I simply revert back to the old behavior. These are fortunately pretty uncommon, in C and C++, mostly either small values or pointers are passed around.
I've not yet done any real performance measurements with the new approach, and indeed, I've managed to introduce a regression which I haven't yet tracked down. But around 3% of the instructions in a large program (
Frodo, the C64 emulator) are pruned with this approach. I would guess that the reduced overhead should be measurable for most applications, although probably not dramatic.
So in conclusion: DWARF debugging information makes for an interesting way of optimizing function calls for Cibyl. It also brings the Java bytecode translation a bit close to the original C program, which should be nice for those that like to read Java bytecode assembly!
Update: The regression has been found and fixed. I've also
moved the entire Cibyl source to github. Git is simply better than svn...