Reading: Reflections on Trusting Trust

Table of Contents

Paper

Digest

This paper first presents the Quine program (https://gist.github.com/yzh119/b4fa97645b155418969f75a682032397), and describe the principle of designing such programs; most importantly, we can add arbitrary amount of characters to such programs and they are still fixpoints of compilers.

The paper next show how can we add features to compilers. Say we already have a compiler A (in binary executable format), and we want it to recognize some new symbol, we add this feature to compiler's source code and use A to compile the new codebase and get compiler B. We now regard B as the standard compiler, and we can write programs with the new symbols without any difficulty and our compiler could recognize the symbol. This is what we called bootstrap of compilers: first we have a compiler with very limited functionalities and optimizations, and iteratively improves codebase and updates the compiler by compiling itself.

The last part is the most excited part, it reveals that we can not trust the behavior of our compiler (and programs), because we cannot guarantee developers have not inserted trojan horses(some deliberate behaviors) to the compilers, and there is no way to detect it. As described in stage 2, we can insert a backdoor to the compiler as a feature (feature 1) but it's too obvious, experienced programmers can easily find it in code review. What if we only release the compiled binary without updating codebase? It doesn't work as well because the feature would be lost during the bootstrap: When user compile the original codebase with new compiler, the code snippet to insert the backdoor is not there.

We than come up with a new idea: add another new feature (feature 2) to the compiler. When we found the input code is the compiler source code where we inserted feature 1, we append the backdoor snippet there. Note that we also want to generate the code corresponding to feature 2, this is exactly the Quine program problem, by doing the same trick we devise a "fork" of compiler code source that can produce compilers with backdoors (compiler-X), and even if we compile the original compiler codebase with compiler-X, the backdoor still exists. Without careful scrutiny of assembly code, it's hard to detect these backdoors.

Such tricks are not only applicable to compilers, but also works for assemblers, hardwares, etc.

Reference

Author: expye(Zihao Ye)

Email: expye@outlook.com

Date: 2021-04-10

Last modified: 2022-12-27 Tue 07:18

Licensed under CC BY-NC 4.0