Introduction

Static program analysis aims to automatically answer questions about the possible behaviors of programs. In this chapter, we explain why this is useful and interesting, and we discuss the basic characteristics of analysis tools.

1.1 Applications of Static Program Analysis

Static program analysis has been used since the early 1960s in optimizing compilers. More recently, it has also proven useful as a tool for bug finding and verification, and it is built into IDEs to support program development. In the following, we give some examples of the kinds of questions about program behavior that arise in these different applications.

Analysis for program optimization Optimizing compilers (including just-in-time compilers in interpreters) need to know various properties of the program being compiled in order to generate efficient code. A few examples of such properties are listed below, followed by a small fragment that illustrates some of them:

  • Does the program contain dead code, or more specifically, a function \(f\) that is unreachable from main? If so, that code can be removed.

  • Does an expression inside a loop evaluate to the same value in every iteration? If so, the expression can be moved outside the loop to avoid redundant computation.

  • Does the value of variable \(x\) depend on the program input? If not, it can be precomputed at compile time.

  • What are the lower and upper bounds of the integer variable \(x\)? The answer may guide the choice of runtime representation for the variable.

  • Do \(p\) and \(q\) point to disjoint data structures in memory? If so, parallel processing may be possible.
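
As a small illustration, consider the following contrived C fragment (invented for this discussion, not taken from any particular program). A compiler that can answer the questions above could remove unused, precompute limit, and move argc * argc out of the loop:

#include <stdio.h>

int unused(int a) {          // unreachable from main: dead code
  return a + 1;
}

int main(int argc, char *argv[]) {
  int limit = 10 * 10;       // does not depend on the input: can be precomputed
  int sum = 0;
  for (int i = 0; i < limit; i++)
    sum += argc * argc;      // same value in every iteration: can be moved out of the loop
  printf("%d\n", sum);
  return 0;
}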

Analysis for program correctness The most successful analysis tools that have been designed to detect errors (or verify absence of errors) target generic correctness properties that apply to all programs written in a given programming language. In unsafe languages such as C, such generic problems are sometimes the cause of critical security vulnerabilities. In safer languages such as Java, such errors are generally less severe, but they may still cause program crashes. Examples of such properties include the following (a small example program appears after the list):

  • Does there exist an input that leads to a null pointer dereference, division by zero, or arithmetic overflow?

  • Are all variables initialized before they are read?

  • Are arrays always accessed within their bounds?

  • Can there be dangling references, for example, pointers to memory that has already been freed?

  • Does the program terminate on every input? Even in reactive systems such as operating systems, the individual software components, for example device driver routines, are expected to always terminate.
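
To make such questions concrete, here is a small contrived C program (invented for illustration); whether a run crashes depends entirely on the input, so an analysis has to reason about all possible inputs:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
  int a[10];
  int k = argc > 1 ? atoi(argv[1]) : 0;
  a[k] = 1;                    // out of bounds unless 0 <= k <= 9
  printf("%d\n", 100 / k);     // division by zero if k == 0
  return 0;
}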

Other correctness properties depend on specifications of the individual programs (or libraries) rather than on the language. For example (see the sketch after the list):

  • Are all assertions guaranteed to succeed? Assertions express program-specific correctness properties that are expected to hold in every execution.

  • Is the hasNext function always called before the next function, and is open always called before read? Many libraries have such requirements, sometimes called typestate correctness properties.

  • Can the program throw an ActivityNotFoundException or a SQLiteException for some input?
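
The following small C sketch illustrates such program-specific properties (the file name data.txt and the program itself are invented for illustration): the read must only happen after a successful open, and the assertion is expected to hold in every execution.

#include <assert.h>
#include <stdio.h>

int main(void) {
  char buf[64];
  FILE *f = fopen("data.txt", "r");    // "open" must precede "read"
  if (f != NULL) {
    size_t n = fread(buf, 1, sizeof(buf) - 1, f);
    buf[n] = '\0';
    assert(n < sizeof(buf));           // program-specific assertion expected to always hold
    fclose(f);
  }
  return 0;
}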

For web and mobile software, information flow correctness properties are particularly important (a small sketch follows the list):

  • Can input values provided by untrusted users flow unchecked into file system operations? If so, this would be an integrity violation.

  • Can secret information become publicly accessible? Such situations are violations of confidentiality.
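
As a sketch of an integrity question (the program is invented for illustration): an untrusted command-line argument flows directly into a file system operation without any validation, which an information flow analysis should flag.

#include <stdio.h>

int main(int argc, char *argv[]) {
  if (argc > 1) {
    // untrusted user input used directly in a file system operation:
    remove(argv[1]);    // potential integrity violation if argv[1] is not validated
  }
  return 0;
}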

The increasing use of concurrency (parallel or distributed computing) and event-driven execution models gives rise to additional questions about program behavior (a small example follows the list):

  • Are data races possible? Many errors in multithreaded programming are caused by two threads using a shared resource without proper synchronization.

  • Can the program (or a part of it) deadlock? This is often a concern for multithreaded programs that use locks for synchronization.
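
A minimal sketch of a data race, using POSIX threads (the counter and the thread structure are invented for illustration): two threads update a shared variable without synchronization, so the final value depends on the interleaving.

#include <pthread.h>
#include <stdio.h>

static int counter = 0;              // shared resource

static void *worker(void *arg) {
  (void)arg;
  for (int i = 0; i < 100000; i++)
    counter++;                       // unsynchronized update: a data race
  return NULL;
}

int main(void) {
  pthread_t t1, t2;
  pthread_create(&t1, NULL, worker, NULL);
  pthread_create(&t2, NULL, worker, NULL);
  pthread_join(t1, NULL);
  pthread_join(t2, NULL);
  printf("%d\n", counter);           // result depends on the interleaving
  return 0;
}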

Analysis for program development Modern IDEs perform various kinds of program analysis to support debugging, refactoring, and program understanding. This involves questions such as the following (a small example appears after the list):

  • Which functions may possibly be called on line 117, or conversely, where can function \(f\) possibly be called from? Function inlining and other refactorings rely on such information.

  • At which program points could \(x\) be assigned its current value? Can the value of variable \(x\) affect the value of variable \(y\)? Such questions often arise when programmers are trying to understand large codebases and during debugging when investigating why a certain bug appears.

  • What types of values can variable \(x\) have? This kind of question often arises with programming languages where type annotations are optional or entirely absent, for example OCaml, JavaScript, or Python.
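
As a small illustration (the functions inc and dec are invented): at the indirect call below, answering "which functions may be called here?" requires determining the possible targets of the function pointer, and answering "can the value of x affect the value of y?" requires tracking how values flow through the program.

#include <stdio.h>

static int inc(int v) { return v + 1; }
static int dec(int v) { return v - 1; }

int main(int argc, char *argv[]) {
  int (*op)(int) = (argc > 1) ? inc : dec;   // which functions may op point to?
  int x = argc;
  int y = op(x);                             // can the value of x affect y?
  printf("%d\n", y);
  return 0;
}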

1.2 Approximative Answers

Regarding correctness, programmers routinely use testing to gain confidence that their programs work as intended, but as famously stated by Dijkstra [Dij70]: “Program testing can be used to show the presence of bugs, but never to show their absence.” Ideally we want guarantees about what our programs may do for all possible inputs, and we want these guarantees to be provided automatically, that is, by programs. A program analyzer is such a program that takes other programs as input and decides whether or not they have a certain property.

Reasoning about the behavior of programs can be extremely difficult, even for small programs. As an example, does the following program code terminate on every integer input n (assuming arbitrary-precision integers)?

while (n > 1) {
  if (n % 2 == 0) // if n is even, divide it by two
    n = n / 2;
  else // if n is odd, multiply by three and add one
    n = 3 * n + 1;
}

In 1937, Collatz conjectured that the answer is “yes”. As of 2017, the conjecture has been checked for all inputs up to \(87 \cdot 2^{60}\), but nobody has been able to prove it for all inputs [Roo19]. Even straight-line programs can be difficult to reason about. Does the following program output true for some integer inputs?

x = input; y = input; z = input;
output x*x*x + y*y*y + z*z*z == 42;

This had been an open problem since 1954; it was settled in 2019, when the answer was found after over a million hours of computing [BS19].

Rice’s theorem [Ric53] is a general result from 1953 which informally states that all interesting questions about the input/output behavior of programs (written in Turing-complete programming languages 1 ) are undecidable. This is easily seen for any special case. Assume for example the existence of an analyzer that decides if a variable in a program has a constant value in any execution. In other words, the analyzer is a program A that takes as input a program \(T\), one of \(T\)’s variables \(x\), and some value \(k\), and decides whether or not \(x\)’s value is always equal to \(k\) whenever \(T\) is executed.

1 From this point on, we only consider Turing-complete languages.

We could then exploit this analyzer to also decide the halting problem by using as input the following program where \(TM(j)\) simulates the \(j\)’th Turing machine on empty input:

x = 17; if (TM(j)) x = 18;

Here \(x\) has a constant value \(17\) if and only if the \(j\)’th Turing machine does not halt on empty input. If the hypothetical constant-value analyzer \(A\) exists, then we have a decision procedure for the halting problem, which is known to be impossible [Tur37].

At first this seems like a discouraging result; however, it does not prevent approximative answers. While it is impossible to build an analysis that correctly decides a property for every analyzed program, it is often possible to build analysis tools that give useful answers for most realistic programs. As the ideal analyzer does not exist, there is always room for building more precise approximations (which is colloquially called the full employment theorem for static program analysis designers).

Approximative answers may be useful for finding bugs in programs, which may be viewed as a weak form of program verification. As a case in point, consider programming with pointers in the C language. This is fraught with dangers such as null dereferences, dangling pointers, leaking memory, and unintended aliases. Ordinary compilers offer little protection from pointer errors. Consider the following small program which may perform every kind of error:

int main(int argc, char *argv[]) {
  if (argc == 42) {
    char *p,*q;
    p = NULL;
    printf("%s",p);
    q = (char *)malloc(100);
    p = q;
    free(q);
    *p = 'x';
    free(p);
    p = (char *)malloc(100);
    p = (char *)malloc(100);
    q = p;
    strcat(p,q);
    assert(argc > 87);
  }
}

Standard compiler tools such as gcc -Wall detect no errors in this program. Testing might also fail to reveal the errors (for this program, no errors are encountered unless we happen to have a test case that runs the program with exactly 42 arguments). However, if we had even approximative answers to questions about null values, pointer targets, and branch conditions, then many of the above errors could be caught statically, without actually running the program.

Exercise 1.1

Describe all the pointer-related errors in the above program.

Ideally, the approximations we use are conservative (or safe), meaning that all errors lean to the same side, which is determined by our intended application. As an example, approximating the memory usage of programs is conservative if the estimates are never lower than what is actually possible when the programs are executed. Conservative approximations are closely related to the concept of soundness of program analyzers. We say that a program analyzer is sound if it never gives incorrect results (but it may answer maybe). Thus, the notion of soundness depends on the intended application of the analysis output, which may cause some confusion. For example, a verification tool is typically called sound if it never misses any errors of the kinds it has been designed to detect, but it is allowed to produce spurious warnings (also called false positives), whereas an automated testing tool is called sound if all reported errors are genuine, but it may miss errors.

Program analyses that are used for optimizations typically require soundness. If given false information, the optimization may change the semantics of the program. Conversely, if given trivial information, then the optimization fails to do anything.

Consider again the problem of determining if a variable has a constant value. If our intended application is to perform constant propagation optimization, then the analysis may only answer yes if the variable really is a constant and must answer maybe if the variable may or may not be a constant. The trivial solution is of course to answer maybe all the time, so we are facing the engineering challenge of answering yes as often as possible while obtaining a reasonable analysis performance.
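
As a small sketch of this situation (a made-up fragment, not tied to any particular compiler): for x below a constant propagation analysis may answer yes, because x is 7 on every execution, so the use of x can safely be replaced by 7; for z it must answer maybe, since z depends on the input.

#include <stdio.h>

int f(int input) {
  int x;
  if (input > 0)
    x = 7;
  else
    x = 7;              // x is 7 on every path: the analysis may answer yes
  int z = input + 1;    // z depends on the input: the analysis must answer maybe
  return x * z;         // the use of x can safely be replaced by the constant 7
}

int main(void) {
  printf("%d\n", f(3));
  return 0;
}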

In the following chapters we focus on techniques for computing approximations that are conservative with respect to the semantics of the programming language. The theory of semantics-based abstract interpretation presented in Chapter 11 provides a solid mathematical framework for reasoning about analysis soundness and precision. Although soundness is a laudable goal in analysis design, modern analyzers for real programming languages often cut corners by sacrificing soundness to obtain better precision and performance, for example when modeling reflection in Java [LSS+15].

1.3 Undecidability of Program Correctness

(This section requires familiarity with the concept of universal Turing machines; it is not a prerequisite for the following chapters.)

The reduction from the halting problem presented above shows that some static analysis problems are undecidable. However, halting is often the least of the concerns programmers have about whether their programs work correctly. For example, if we wish to ensure that the programs we write cannot crash with null pointer errors, we may be willing to assume that the programs do not also have problems with infinite loops.

Using a diagonalization argument we can show a very strong result: It is impossible to build a static program analysis that can decide whether a given program may fail when executed. Moreover, this result holds even if the analysis is only required to work for programs that halt on all inputs. In other words, the halting problem is not the only obstacle; approximation is inevitably necessary.

If we model programs as deterministic Turing machines, program failure can be modeled using a special fail state 2. That is, on a given input, a Turing machine will eventually halt in its accept state (intuitively returning “yes”), in its reject state (intuitively returning “no”), in its fail state (meaning that the correctness condition has been violated), or the machine diverges (i.e., never halts). A Turing machine is correct if its fail state is unreachable.

We can show the undecidability result using an elegant proof by contradiction. Assume \(P\) is a program that can decide whether or not any given total Turing machine is correct. (If the input to \(P\) is not a total Turing machine, \(P\)’s output is unspecified – we only require it to correctly analyze Turing machines that always halt.) Let us say that \(P\) halts in its accept state if and only if the given Turing machine is correct, and it halts in the reject state otherwise. Our goal is to show that \(P\) cannot exist.

2 Technically, we here restrict ourselves to safety properties; liveness properties can be addressed similarly using other models of computability.

If \(P\) exists, then we can also build another Turing machine, let us call it \(M\), that takes as input the encoding \(e(T)\) of a Turing machine \(T\) and then builds the encoding \(e(S_T)\) of yet another Turing machine \(S_T\) , which behaves as follows: \(S_T\) is essentially a universal Turing machine that is specialized to simulate \(T\) on input \(e(T)\). Let \(w\) denote the input to \(S_T\) . Now \(S_T\) is constructed such that it simulates \(T\) on input \(e(T)\) for at most \(|w|\) moves. If the simulation ends in \(T\)’s accept state, then \(S_T\) goes to its fail state. It is obviously possible to create \(S_T\) in such a way that this is the only way it can reach its fail state. If the simulation does not end in \(T\)’s accept state (that is, \(|w|\) moves have been made, or the simulation reaches \(T\)’s reject or fail state), then \(S_T\) goes to its accept state or its reject state (which one we choose does not matter). This completes the explanation of how \(S_T\) works relative to \(T\) and \(w\). Note that \(S_T\) never diverges, and it reaches its fail state if and only if T accepts input \(e(T)\) after at most \(|w|\) moves. After building \(e(S_T)\), \(M\) passes it to our hypothetical program analyzer \(P\). Assuming that \(P\) works as promised, it ends in accept if \(S_T\) is correct, in which case we also let \(M\) halt in its accept state, and in reject otherwise, in which case \(M\) similarly halts in its reject state.

We now ask: Does \(M\) accept input \(e(M)\)? That is, what happens if we run \(M\) with \(T\) = \(M\)? If \(M\) does accept input \(e(M)\), it must be the case that \(P\) accepts input \(e(S_T)\), which in turn means that \(S_T\) is correct, so its fail state is unreachable. In other words, for any input \(w\), no matter its length, \(S_T\) does not reach its fail state. This in turn means that \(T\) does not accept input \(e(T)\). However, we have \(T\) = \(M\), so this contradicts our assumption that \(M\) accepts input \(e(M)\). Conversely, if \(M\) rejects input \(e(M)\), then \(P\) rejects input \(e(S_T)\), so the fail state of \(S_T\) is reachable for some input \(v\). This means that the simulation of \(T\) on input \(e(T)\) reaches \(T\)’s accept state within \(|v|\) moves, so \(T\) must accept input \(e(T)\), and again we have a contradiction. By construction \(M\) halts in either accept or reject on any input, but neither is possible for input \(e(M)\). In conclusion, the ideal program correctness analyzer \(P\) cannot exist.

Exercise 1.2

In the above proof, the hypothetical program analyzer \(P\) is only required to correctly analyze programs that always halt. Show how the proof can be simplified if we want to prove the following weaker property: There exists no Turing machine \(P\) that can decide whether or not the fail state is reachable in a given Turing machine. (Note that the given Turing machine is now not assumed to be total.)