www. Jacob   Zimmerman .com



Bootstrapping Impcore

Spring 2023
Published: January 26, 2024

In my school's 'COMP105: Programming Languages' class, the first unit introduces formal language semantics using a primitive language called Impcore (imperative-core). Impcore combines the syntax of Lisp, the performance of Python, and the computational complexity of a potato. What joy! In this ill-advised journey, I decided to bootstrap it.

I had recently come across PiMaker's impressive "Linux in a Pixel Shader", in which he emulates a Linux kernel inside of VRChat using a fragment shader, and I wanted to give my own shot at running Linux in someplace it truly doesn't belong. So during an office hours early in the semester I bet my TA (hi Faith!), that I could get Linux to run within Impcore. Then I would put the original Impcore on the Linux VM and run it inside.

Sadly, I pretty quickly remembered Impcore has no array type, so I couldn't even run a turing machine[0]. I modified the C interpreter to add a special operator to allow "direct RAM access" and I got to work... for about 10 minutes before I realized that not only did Impcore leaks all of it's memory and blows up the stack (even though recursion is the only control flow). It seemed like Linux truly did not belong in the Impcore interpreter.

I don't give up that easily

Building a JIT Compiler

I spent the next month teaching myself to write a compiler. I picked Rust as the language, for performance reasons, and got to work. The compiler used a PEG grammar, which I no longer endorse over LR, and LLVM for codegen.

Tracking down countless segfaults in the Rust LLVM bindings forced me to finally learn a proper debugger. But eventually, I was able to execute my previous homework assignments.

To get the project to work I had to add bitwise operators, arrays and pointers, and IO to Impcore. But finally I had a working compiler. I was able to benchmark results that it was about 14000x faster than Python on my homework assignments.

Impcore doesn't have semantics that normal compiled languages, usually out of convenience. For example referencing an undefined variable in a function definition will only cause an error when you run the function. So I had to lazily compile functions only when they had no free variables.

I quickly realized that Impcore's syntax was not suitable for large programs, so I added a macro system to the compiler. I still didn't really know about Lisp macros, but realized the power of metaprogramming pretty quickly.

Building a RISC-V VM

If there is a bug in the VM, it simply hangs, and I didn't know enough about RISC-V to debug it. I realized there was no chance I could debug both the emulator (which hangs or segfaults immediately during a bug), and the compiler if the Impcore program broke. So I decided to build a working VM in C, then iteratively modify it to only use a subset of C that maps one-to-one with Impcore. For the last step I would use my eyes to make sure everything looked the same (famous last words). I scrounged together the manual and a few other people's implementations and after a little while I had a working C version of the emulator.

Impcore only has a signed 32-bit integer type, so I had to implement the 64 bit RISC-V instructions using integers. I also had to change the C program so that the only data type was a 32-bit signed integer. Impcore doesn't have any control flow other than recursion and conditionals, it also has no statements, only expressions. I had to modify the C program to use only ternary operators and recursion. At last, after hundreds of tiny commits and dozens of git bisects, it C subset worked.

Translating to Impcore

Here I learned a powerful lesson about what makes a good programming language. It's not just features, but tooling too! I had to completely overhaul the compiler error messages so that my dozens of nested S-expressions could be fixed.

I finished the translation, and was quite sure that it was correct. When I ran it, it just didn't work. In the end I decided 2000 lines of Impcore isn't worth debugging so I gave up. I lost the bet

During the project I'd occasionally approach my professor who also taught compilers to get LLVM advice. In the end he asked me to TA, not CS105, but CS107: Compilers, the next semester. I may have lost the bet, but I learned a ton and did won a job, so not as huge of a time-waste as bootstrapping Impcore sound.

Message to future Jacob

It's not over, you've built more compilers and it really wouldn't take you more than a week in a faster (development-wise) language like OCaml. Also you could use DrRacket and Racket to test the emulator first. Maybe someday you should return.