5 CALL


Interpretation and Compilation

When efficiency is not critical, interpret the high-level language; when better performance is needed, translate it into a lower-level language.

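Interpreter: runs the source code directly; can give better error messages; slower, but the code is smaller; platform-independent, so the program can run on any machine
Translator (compiler): translates the high-level language into a lower-level language; can embed extra information during translation to help debugging, e.g. gcc -g; more efficient, with better performance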

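Input: high-level language (foo.c)
Output: assembly language (foo.s for RISC-V), which may contain pseudo-instructions (instructions the assembler can handle but that are not part of the machine's instruction set)
Intermediate steps: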
- Lexer: Turns the input into "tokens", recognizes problems with the tokens
- Parser: Turns the tokens into an "Abstract Syntax Tree", recognizes problems in the program structure
- Semantic Analysis and Optimization: Checks for semantic errors, may reorganize the code to make it better
- Code generation: Output the assembly code
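
The four stages above can be sketched on a toy expression language. This is only an illustration, not how gcc is organized: the grammar (integers, variable names, `+`, `*`, parentheses), the function names `lex`, `parse`, `fold`, `codegen`, and the naive use of `t0`, `t1`, ... as temporaries are all assumptions made for the example.

```python
def lex(src):
    """Lexer: split source text into (kind, value) tokens; reject bad characters."""
    tokens, i = [], 0
    while i < len(src):
        ch = src[i]
        if ch.isspace():
            i += 1
        elif ch.isdigit():
            j = i
            while j < len(src) and src[j].isdigit():
                j += 1
            tokens.append(("NUM", int(src[i:j])))
            i = j
        elif ch.isalpha():
            j = i
            while j < len(src) and src[j].isalnum():
                j += 1
            tokens.append(("ID", src[i:j]))
            i = j
        elif ch in "+*()":
            tokens.append((ch, ch))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r}")
    return tokens

def parse(tokens):
    """Parser: build an AST of nested tuples; reject malformed structure."""
    pos = 0
    def peek():
        return tokens[pos][0] if pos < len(tokens) else None
    def take(kind):
        nonlocal pos
        if peek() != kind:
            raise SyntaxError(f"expected {kind}, got {peek()}")
        pos += 1
        return tokens[pos - 1][1]
    def factor():
        if peek() == "(":
            take("(")
            node = expr()
            take(")")
            return node
        if peek() == "ID":
            return ("var", take("ID"))
        return ("num", take("NUM"))
    def term():
        node = factor()
        while peek() == "*":
            take("*")
            node = ("*", node, factor())
        return node
    def expr():
        node = term()
        while peek() == "+":
            take("+")
            node = ("+", node, term())
        return node
    tree = expr()
    if pos != len(tokens):
        raise SyntaxError("trailing tokens")
    return tree

def fold(node):
    """Optimization: constant folding of subtrees whose operands are both numbers."""
    if node[0] in ("num", "var"):
        return node
    op, left, right = node[0], fold(node[1]), fold(node[2])
    if left[0] == "num" and right[0] == "num":
        return ("num", left[1] + right[1] if op == "+" else left[1] * right[1])
    return (op, left, right)

def codegen(node, out, depth=0):
    """Code generation: emit RISC-V-style lines, result in t<depth> (toy allocation)."""
    reg = f"t{depth}"
    if node[0] == "num":
        out.append(f"li {reg}, {node[1]}")       # pseudo-instruction
    elif node[0] == "var":
        out.append(f"lw {reg}, {node[1]}")       # pseudo-instruction: load a global
    else:
        codegen(node[1], out, depth)
        codegen(node[2], out, depth + 1)
        out.append(f"{'add' if node[0] == '+' else 'mul'} {reg}, {reg}, t{depth + 1}")
    return out

print("\n".join(codegen(fold(parse(lex("x + 2 * 3"))), [])))
```

Note that the emitted `li` and `lw reg, symbol` lines are themselves pseudo-instructions, which is why the compiler's output still needs the assembler to expand them.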

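Input: assembly code (foo.s)
Output: object file (foo.o), containing
- Machine code: the binary instructions the CPU executes directly
- Symbol table: records address information for symbols such as function names (labels) and global variables (variables after .data, and variables that may be accessed from multiple files)
- Relocation table: marks the addresses the linker has to fix up (e.g. unresolved external functions), as well as data in the static section (e.g. data referenced by an la instruction)
- DeepSeek's explanation is a useful reference here
The assembler reads and uses directives, and replaces pseudo-instructions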
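
Replacing pseudo-instructions can be pictured as a text-to-text rewrite before encoding. The sketch below covers only a handful of well-known RISC-V pseudo-instructions and assumes RV32 with immediates written as integer literals; the helper names `expand` and `expand_li` are made up for this example and are not part of any real assembler.

```python
def expand_li(rd, imm):
    """Expand `li rd, imm` for a 32-bit immediate (RV32 assumed)."""
    if -2048 <= imm < 2048:
        return [f"addi {rd}, x0, {imm}"]       # fits in a 12-bit signed immediate
    upper = (imm + 0x800) >> 12                # round so the low part fits in 12 signed bits
    lower = imm - (upper << 12)
    lines = [f"lui {rd}, {upper & 0xFFFFF}"]
    if lower:
        lines.append(f"addi {rd}, {rd}, {lower}")
    return lines

def expand(line):
    """Return the real instruction(s) for one assembly line."""
    op, *args = line.replace(",", " ").split()
    if op == "nop":
        return ["addi x0, x0, 0"]
    if op == "mv":
        return [f"addi {args[0]}, {args[1]}, 0"]
    if op == "j":
        return [f"jal x0, {args[0]}"]
    if op == "ret":
        return ["jalr x0, x1, 0"]
    if op == "li":
        return expand_li(args[0], int(args[1], 0))
    return [line.strip()]                      # already a real instruction

for source_line in ["mv a0, a1", "li t0, 4095", "ret"]:
    print(source_line, "->", expand(source_line))
```

One pseudo-instruction may become more than one real instruction (as with a large `li`), so the assembler has to expand them before it can assign final addresses to labels.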

Directives give directions to the assembler but do not produce machine instructions.

.text: Subsequent items put in user text segment (machine code)
.data: Subsequent items put in user data segment (binary rep of data in source file)
.global sym: declares sym global so that it can be referenced from other files
.string str: Store the string str in memory and null-terminate it
.word w1…wn: Store the n 32-bit quantities in successive memory words
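
To make the effect of these directives concrete, here is a minimal sketch of an assembler front end that only tracks where bytes go. It is a simplification under stated assumptions: every non-directive line is treated as one 4-byte instruction (a zero placeholder, no real encoding), comments start with `#`, strings have no escape sequences, and the name `layout` and the returned dictionary shape are invented for the example.

```python
import re
import struct

def layout(source):
    """Walk assembly text and let directives decide where bytes are placed."""
    segments = {"text": bytearray(), "data": bytearray()}
    symbols, exported = {}, set()
    current = "text"
    for raw in source.splitlines():
        line = raw.split("#")[0].strip()           # naive comment stripping
        label = re.match(r"(\w+):\s*(.*)", line)
        if label:                                  # label = current offset in current segment
            symbols[label.group(1)] = (current, len(segments[current]))
            line = label.group(2)
        if not line:
            continue
        directive, _, rest = line.partition(" ")
        if directive in (".text", ".data"):
            current = directive[1:]                # switch segment
        elif directive in (".global", ".globl"):
            exported.add(rest.strip())             # visible to other files
        elif directive == ".string":
            segments[current] += rest.strip().strip('"').encode() + b"\0"
        elif directive == ".word":
            for word in rest.split(","):           # 32-bit little-endian words
                segments[current] += struct.pack("<I", int(word, 0) & 0xFFFFFFFF)
        else:
            segments[current] += bytes(4)          # placeholder for one instruction
    return {"segments": segments, "symbols": symbols, "exported": exported}

EXAMPLE = """
.data
nums: .word 1, 2
msg:  .string "hi"
.text
.global main
main:
    addi a0, x0, 1
    jalr x0, x1, 0
"""

result = layout(EXAMPLE)
print(result["symbols"])
print(bytes(result["segments"]["data"]))
```

The directives themselves contribute no instructions: `.data`, `.text` and `.global` only change assembler state, while `.word` and `.string` emit data bytes.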

Among the pseudo-instructions, tail can be used for tail-recursion optimization (covered in CS61A).

object file header: size and position of the other pieces of the object file
text segment: the machine code
data segment: binary representations of the static data in the source file
relocation information: identifies lines of the code that need to be fixed up later
symbol table: list of this file's labels and static data that can be referenced
debugging information
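
The role of the header can be illustrated with a made-up container format. The layout below is hypothetical and is not ELF: it assumes a header of five (position, size) pairs stored as 32-bit little-endian integers, followed by the pieces in the order listed above. The class name `ObjectFile` and its fields are assumptions for the sketch.

```python
import struct
from dataclasses import dataclass

PIECES = ("text", "data", "relocations", "symbols", "debug")
HEADER_SIZE = struct.calcsize("<10I")   # five (position, size) pairs, 40 bytes

@dataclass
class ObjectFile:
    text: bytes = b""
    data: bytes = b""
    relocations: bytes = b""
    symbols: bytes = b""
    debug: bytes = b""

    def header(self):
        """Map each piece to its (position, size) inside the file."""
        table, position = {}, HEADER_SIZE
        for name in PIECES:
            piece = getattr(self, name)
            table[name] = (position, len(piece))
            position += len(piece)
        return table

    def to_bytes(self):
        """Serialize as header followed by the pieces, back to back."""
        numbers = [n for pair in self.header().values() for n in pair]
        body = b"".join(getattr(self, name) for name in PIECES)
        return struct.pack("<10I", *numbers) + body

# One nop (addi x0, x0, 0 = 0x00000013, little-endian) and a short string.
obj = ObjectFile(text=b"\x13\x00\x00\x00", data=b"hi\x00")
print(obj.header())
```

A tool that reads such a file only needs the header to locate every other piece, which is the point of storing sizes and positions up front.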

The process above is the statically linked approach.

The alternative is dynamically linked libraries.
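
For the statically linked case, the linker's job can be sketched as two passes over the object files: place every text segment and record absolute symbol addresses, then patch each relocation entry. The representation here is an assumption chosen for readability: text is a list of instruction strings (4 bytes each), `?` marks an unresolved PC-relative offset, and the names `link`, `MAIN_O` and `HELPER_O` are hypothetical.

```python
def link(objects, base=0):
    """Combine object files into one image and resolve symbol references."""
    # Pass 1: lay text segments out back to back and record symbol addresses.
    addresses, starts, cursor = {}, [], base
    for obj in objects:
        starts.append(cursor)
        for name, offset in obj["symbols"].items():
            if name in addresses:
                raise ValueError(f"duplicate symbol {name}")
            addresses[name] = cursor + offset
        cursor += 4 * len(obj["text"])
    # Pass 2: copy the text and patch every relocation entry.
    image = []
    for obj, start in zip(objects, starts):
        text = list(obj["text"])
        for index, symbol in obj["relocations"]:
            if symbol not in addresses:
                raise ValueError(f"undefined symbol {symbol}")
            pc = start + 4 * index                       # address of the instruction
            text[index] = text[index].replace("?", str(addresses[symbol] - pc))
        image.extend(text)
    return image, addresses

MAIN_O = {
    "text": ["jal ra, ?", "jalr x0, ra, 0"],
    "symbols": {"main": 0},
    "relocations": [(0, "helper")],      # instruction 0 needs helper's address
}
HELPER_O = {
    "text": ["addi a0, a0, 1", "jalr x0, ra, 0"],
    "symbols": {"helper": 0},
    "relocations": [],
}

image, addresses = link([MAIN_O, HELPER_O])
print(addresses)
print(image)
```

In this static scheme the code of `helper` is copied into the final image. With a dynamically linked library, that resolution would instead be deferred until load time or run time.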