1 Introduction

1.1 Static Analysis

Static analysis has to account for every possible execution path in order to show that a program is safe.

Static analysis = Abstraction + Over-approximation

Abstraction
Abstraction keeps only the properties of the concrete program that the analysis cares about and discards the rest.

Over-approximation – Transfer Functions
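Transfer functions define how each statement transforms abstract values, according to the semantics of the operation.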

Over-approximation – Control Flows

Control flow graph

T: a point where data flows merge
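
To make abstraction, transfer functions, and merging concrete, here is a minimal sketch of a sign analysis. It is only an illustration under assumptions: the names Sign, abstract, add, and merge are made up for this example, and TOP is simply my label for "sign unknown" (whether it corresponds to the T in the note above is an assumption).

```python
from enum import Enum


class Sign(Enum):
    NEG = "-"
    ZERO = "0"
    POS = "+"
    TOP = "unknown"  # hypothetical label: the value could have any sign


def abstract(n: int) -> Sign:
    """Abstraction: keep only the sign of a concrete integer."""
    if n > 0:
        return Sign.POS
    if n < 0:
        return Sign.NEG
    return Sign.ZERO


def add(a: Sign, b: Sign) -> Sign:
    """Transfer function for `a + b`, defined on signs instead of integers."""
    if a is Sign.ZERO:
        return b
    if b is Sign.ZERO:
        return a
    if a is b and a is not Sign.TOP:
        return a  # (+) + (+) = (+), (-) + (-) = (-)
    return Sign.TOP  # (+) + (-), or anything involving TOP: over-approximate


def merge(a: Sign, b: Sign) -> Sign:
    """Where two control-flow paths meet, keep a fact only if both paths agree."""
    return a if a is b else Sign.TOP


# if (cond) x = 1; else x = -1;  -> after the branch the sign of x is unknown
x_after_branch = merge(abstract(1), abstract(-1))
```

The result TOP is imprecise but safe: the analysis never claims a sign that some execution path could violate.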

1.2 Qs

What are the differences between static analysis and (dynamic) testing?
- Timing
  Static analysis is performed during the development phase, while the code is being written, reviewed, and edited.
  Dynamic testing, on the other hand, is conducted after the code has been compiled and executed.
- Methodology
  Static analysis uses automated tools to analyze the source code, looking for potential issues such as coding errors, security vulnerabilities, and code smells.
  Dynamic testing involves running the software and testing its behavior under different conditions to find bugs, errors, and other issues.
- Coverage
  Static analysis can detect potential issues in code that may not be executed during dynamic testing, such as unreachable code or unused variables.
  Dynamic testing is better suited for detecting issues that only manifest under certain conditions, such as race conditions or memory leaks.
- Scope
  Static analysis can analyze the entire codebase.
  Dynamic testing can only test the code that is actually executed during testing.
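Understand soundness, completeness, false negatives, and false positives.
- soundness
  The analysis does not miss any real error in the program: every real problem is reported, possibly together with some spurious ones.
- completeness
  Every problem the analysis reports is a real error: nothing spurious is reported, although real errors may be missed.
- false negatives
  Missed reports: real errors that the analysis fails to report.
- false positives
  False alarms: reported problems that are not real errors.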
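
The four terms can be related with plain set operations. The sketch below is illustrative only: classify and the bug labels are hypothetical, and it checks a single analysis run, whereas soundness and completeness are properties of an analysis over all programs.

```python
def classify(real_bugs: set[str], reported: set[str]) -> dict[str, object]:
    """Compare what an analysis reported against the real bugs of one program."""
    false_positives = reported - real_bugs  # reported, but not real (false alarms)
    false_negatives = real_bugs - reported  # real, but not reported (missed)
    return {
        "false_positives": false_positives,
        "false_negatives": false_negatives,
        "sound_on_this_run": not false_negatives,    # nothing was missed
        "complete_on_this_run": not false_positives,  # nothing spurious was reported
    }


# Hypothetical data: an over-approximating analysis reports one extra warning.
result = classify(
    real_bugs={"null-deref@10", "div-by-zero@42"},
    reported={"null-deref@10", "div-by-zero@42", "bad-cast@7"},
)
```

Here the run misses nothing but raises one false alarm, which is the trade-off a sound static analysis usually accepts.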
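Why is soundness usually required by static analysis?
- Soundness is especially important in safety-critical systems, such as those used in aerospace, automotive, medical, and other industries. In these domains a software defect can have severe, even life-threatening, consequences, so it is essential to identify and eliminate any potential risk before the software is deployed. A sound analysis may raise false alarms, but it does not miss real errors.
How to understand abstraction and over-approximation?
- See Section 1.1 above.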

2 Intermediate Representation

2.1 Compilers and Static Analyzers

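Compiler pipeline

Lexical analysis, based on regular expressions
Syntax analysis -> abstract syntax tree (AST), based on context-free grammars
Semantic analysis (for example, "the apple eats a person" is grammatical but meaningless) -> decorated AST, based on attribute grammars
Translator -> IR (Intermediate Representation): translates the decorated AST into an intermediate form such as three-address code; static analysis (and work such as code optimization) is performed on the IR
Code generator

How the two relate: the compiler front end carries out the steps above to turn the source into a form that is easy to work with, and static analysis then operates on that form.

Why not run static analysis directly on the source code?

Because we first need to make sure the code is well-formed, and only then analyze it.
Checking whether code is well-formed is the trivial part, and the earlier compiler phases already take care of it. Static analysis is there to do the non-trivial part.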

2.2 AST vs. IR

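I did not fully understand this part.

Why does static analysis use IR rather than the AST?

The AST is high-level and close to the grammar structure, while IR is low-level and close to machine code.
The AST is language-dependent, while IR is usually language-independent: analyzers focus on three-address code because many front-end languages can be translated into the same IR and then optimized.
The AST is suitable for fast type checking, while IR is more compact and uniform: the AST contains many nodes that only represent non-terminals (body, assign, and so on), which IR does not need.
The AST lacks control-flow information, while IR contains it: an AST node only says that a construct is a do-while loop and does not show how control flows, whereas the goto instructions in IR make the control flow easy to see.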

2.3 IR: Three-Address Code (3AC)

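Three-address code (3AC) has no single standard format. Each instruction has at most one operator on its right-hand side.
Each 3AC instruction contains at most three addresses, where an address is a name, a constant, or a compiler-generated temporary.

For example, the statement t2 = a + b + 3 becomes the two instructions t1 = a + b and t2 = t1 + 3.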
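
As a rough illustration of how an expression is flattened into 3AC, the sketch below lowers a single assignment using Python's standard ast module. The function name to_3ac and the temporary names t1, t2, ... are assumptions made for this example; it handles only names, constants, and the four basic binary operators.

```python
import ast
import itertools


def to_3ac(source: str) -> list[str]:
    """Lower one assignment such as 'x = a + b * 3' to three-address code."""
    stmt = ast.parse(source).body[0]
    if not (isinstance(stmt, ast.Assign) and isinstance(stmt.targets[0], ast.Name)):
        raise ValueError("expected a simple assignment like 'x = a + b'")
    ops = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}
    temps = (f"t{i}" for i in itertools.count(1))  # fresh temporaries t1, t2, ...
    code: list[str] = []

    def lower(node: ast.expr) -> str:
        """Emit instructions for node and return the address holding its value."""
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return repr(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in ops:
            left = lower(node.left)
            right = lower(node.right)
            temp = next(temps)
            # Exactly one operator on the right-hand side of each instruction.
            code.append(f"{temp} = {left} {ops[type(node.op)]} {right}")
            return temp
        raise NotImplementedError(type(node).__name__)

    code.append(f"{stmt.targets[0].id} = {lower(stmt.value)}")
    return code


lowered = to_3ac("x = a + b * 3")
# lowered == ["t1 = b * 3", "t2 = a + t1", "x = t2"]
```

Every emitted instruction has at most one operator and at most three addresses, which is the defining property described above.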

2.4 3AC in Real Static Analyzer: Soot

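Soot: a static analysis framework for Java. Its main IR, Jimple, is a typed three-address code.

Example:

A StringBuilder is used to build the string inside foo(); the three-address code below covers only the foo method.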

2.5 Static Single Assignment (SSA)

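In static single assignment (SSA) form, every assignment to a variable x defines a fresh variable xi, and every later use refers to the most recent version.

When several versions could reach the same use, for example where control-flow paths merge, SSA introduces a phi function to select among them.

Why not always use SSA?

SSA may introduce too many variables and phi functions.
It may cause inefficiency when the code is translated to machine code.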
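
The renaming idea can be sketched for straight-line code, where no phi functions are needed. This is a simplified illustration, not a full SSA construction: to_ssa is a hypothetical helper, variables that are never assigned in the snippet get version 0, and the scheme assumes source names do not already end in digits.

```python
import re


def to_ssa(lines: list[str]) -> list[str]:
    """Rename variables in straight-line 'x = expr' code into SSA form."""
    version: dict[str, int] = {}  # latest version number of each variable
    ident = re.compile(r"[A-Za-z_]\w*")

    def use(match: re.Match) -> str:
        name = match.group(0)
        # Inputs that were never assigned here keep version 0.
        return f"{name}{version.get(name, 0)}"

    out = []
    for line in lines:
        target, expr = (part.strip() for part in line.split("=", 1))
        new_expr = ident.sub(use, expr)  # uses refer to the latest versions
        version[target] = version.get(target, 0) + 1  # each definition is fresh
        out.append(f"{target}{version[target]} = {new_expr}")
    return out


ssa = to_ssa(["x = a + 1", "x = x * 2", "y = x + a"])
# ssa == ["x1 = a0 + 1", "x2 = x1 * 2", "y1 = x2 + a0"]
```

Even this tiny example turns two source variables into three SSA names, which hints at the cost mentioned above.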

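2.6 Basic Blocks (BB) & Control Flow Graphs (CFG)

Basic blocks (BB): a basic block is a maximal sequence of consecutive three-address instructions with the following properties:

It has a single exit: control leaves only at the last instruction of the block.
It has a single entry: control enters only at the first instruction of the block.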

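Control flow graph (CFG):

Steps for turning 3AC into a CFG:

Input: the sequence of 3AC instructions of program P

Output: the basic blocks of P

Method

Determine the leaders of P
- The first instruction of P is a leader
- The target instruction of a jump is a leader
- The instruction immediately after a jump is also a leader
- Why does instruction (3) of the example program have to start its own block? If (3) were placed in the same block as (1) and (2), that block would have two entries: one at (1) and one from (11).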
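Build the basic blocks of P
- A basic block consists of a leader and all the instructions that follow it up to, but not including, the next leader.
The resulting basic blocks are as follows:
Build the edges
- There is an edge from block A to block B if and only if:
  - The last instruction of A is a jump to the first instruction of B.
  - B immediately follows A in the instruction order, and the last instruction of A is not an unconditional jump.
Besides the basic blocks, two extra nodes are added: Entry and Exit
- These two nodes do not correspond to any IR instruction
- Entry has an edge to the block that contains the first IR instruction
- If the last instruction of a basic block makes the program leave this piece of IR, that block has an edge to Exit
This completes the construction of the control flow graph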
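
The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: build_cfg is a hypothetical helper, instructions are numbered from 1, jumps are written as "goto N" or "if ... goto N", and "return" is only recognized as the last instruction of a block. The sample program is also an assumption, chosen so that instruction (11) jumps back to (3).

```python
import re


def build_cfg(code: list[str]):
    """Return (blocks, edges) for 3AC whose jumps end in 'goto N' (1-based N)."""
    n = len(code)
    target = {}  # instruction number -> number of the instruction it jumps to
    for i, ins in enumerate(code, start=1):
        m = re.search(r"goto\s+(\d+)$", ins)
        if m:
            target[i] = int(m.group(1))

    # Step 1: leaders.
    leaders = {1} if n else set()
    for i, t in target.items():
        leaders.add(t)  # the target of a jump
        if i + 1 <= n:
            leaders.add(i + 1)  # the instruction right after a jump
    starts = sorted(leaders)

    # Step 2: each block runs from a leader up to the next leader.
    blocks = [list(range(s, e)) for s, e in zip(starts, starts[1:] + [n + 1])]
    block_of = {b[0]: k for k, b in enumerate(blocks)}  # leader -> block index

    # Step 3: edges, plus the extra Entry and Exit nodes.
    edges = set()
    if blocks:
        edges.add(("Entry", 0))
    for k, b in enumerate(blocks):
        last = b[-1]
        ins = code[last - 1]
        unconditional = ins.startswith("goto")
        if last in target:
            edges.add((k, block_of[target[last]]))  # jump edge
        if ins.startswith("return") or (last == n and not unconditional):
            edges.add((k, "Exit"))  # control leaves this piece of IR
        elif not unconditional:
            edges.add((k, k + 1))  # fall through to the next block
    return blocks, edges


program = [
    "x = input",          # (1)
    "y = x - 1",          # (2)
    "z = x * y",          # (3)
    "if z < x goto 7",    # (4)
    "p = x / y",          # (5)
    "q = p + y",          # (6)
    "a = q",              # (7)
    "b = x + a",          # (8)
    "c = a - b",          # (9)
    "if p == q goto 12",  # (10)
    "goto 3",             # (11)
    "return",             # (12)
]
blocks, edges = build_cfg(program)
# blocks == [[1, 2], [3, 4], [5, 6], [7, 8, 9, 10], [11], [12]]
```

In this sample, (3) is a leader because (11) jumps to it, so (1) and (2) form their own block and every block keeps a single entry.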
