LLVM (originally short for Low Level Virtual Machine) is a compiler infrastructure that can represent and optimize code in a low-level, assembly-like intermediate representation (IR).
To compile to LLVM we can use:
clang -S -emit-llvm {input} -o {output}.ll
It will produce human-readable IR ("LLVM assembly"), with registers, calls, and functions.
clang -c -emit-llvm {input} -o {output}.bc
Will produce LLVM bitcode, a binary encoding of the same IR. Now we can use one of the LLVM tools, called opt, to convert the LLVM assembly file into another format, called dot:
opt -p dot-cfg {input} -disable-output
We can use a tool like Graphviz to render the dot file as a PDF; it will contain a graph representing the program.
And finally we can convert the intermediate representation into machine code using another LLVM tool called llc; for that use:
llc {input} -march={architecture} -o {output}.{arch, ie: .x86}
Here “architecture” can be ARM, AArch64, x86, and so on; you can see the list of architectures that llc can compile to using:
llc --version
There are a lot of tools to optimize both the intermediate representation and the generated binary code. To enable that, we can use:
clang -Xclang -disable-O0-optnone -S -emit-llvm {input} -o {output}
It allows the program to be optimized: at -O0, clang normally marks every function optnone, which would block later passes. One of those optimizations is mem2reg, which means “from memory to registers”, run using:
opt -S -p mem2reg {input} -o {output}.ll
With that, statically allocated variables (allocas) will be moved to registers. At this point these registers can be called virtual registers, since they aren't yet physical registers; later on, when we generate machine code, they will be assigned to physical registers.
Now we can compile it to LLVM bitcode using llvm-as:
llvm-as {input} -o {output}.bc
and then translate this bitcode to assembly using:
llc {input} -o {output}.x86
and finally use the Linux assembler as to assemble the x86 assembly, for instance, into machine code:
$LINUX/as {input} -o {output}
You can visualize your program as a graph in PDF format using some of the LLVM tools:
clang -S -emit-llvm {input} -o {output}
opt -p dot-cfg {input}.ll -disable-output
opt will output a dot file. Now, to generate a PDF from it, use:
dot -Tpdf {input} -o {output}
In this graph, vertices are called basic blocks, and each represents a sequence of instructions that always execute together. The edges represent possible paths through the program, created by constructs like loops and conditional statements.
To simplify this representation, you can use:
opt -p dot-cfg-only {input}
It will show only the header (label) of each block, without its instructions.
We also can see regions in this graph using:
opt -p dot-regions-only {input}
It helps a lot when studying optimizations in your code. To compute regions, a data structure called a dominator tree is used; we can see this dominator tree using:
opt -p dot-dom {input}
and to simplify, you can use:
opt -p dot-dom-only {input}
There are a lot of ways to visualize it, to see all options, use:
opt --help | grep dot
With opt you can also learn a lot about your code, like how many loops or how many pointers it has, and even find vulnerabilities in it.
dot isn't the only program that can render this graph; Graphviz ships several alternative layout engines:
- neato
- twopi
- circo
- fdp
- sfdp
An LLVM pass is a transformation or analysis over the code; a transformation pass produces an optimized version of the code with the same semantics.
Passes are organized by the unit of code they run on:
- Loop Pass – As the name implies, this will analyse loops.
- Function Pass – Will run on individual functions in the code.
- Module Pass – Will run on whole modules of the code. Others:
- Call Graph SCC Pass
- Region Pass
- Machine Function Pass
Analysis Examples:
int total = 0;
for (int i = 0; i < 100; ++i)
    total += i;

int f(int n) {
    int ans = 0;
    if (n)
        ans = 1;
    else
        ans = 2;
    return ans;
}
- Range Analysis – Collects information about the range of values that a variable may assume. In the code above, the variable i can only be in the range [0;99].
- Scalar Evolution – Tracks how variables evolve inside a loop. It's really useful because it can, for instance, replace a loop with the final value a variable will assume at its end.
- Dominator Tree – Shows the domination relation between blocks: a block A dominates a block B when every path from the entry to B passes through A.
Transformations Examples:
- Dead Code Elimination – Removes parts of the code that will never be executed. In the code below, the condition 1 < 2 is always true, so the v = 1 branch is removed:
int v;
if (1 < 2)
    v = 0;
else
    v = 1;
- Constant Propagation – Since a value is known to be constant, it can be substituted in other parts of the code and the resulting expressions computed at compile time. For example, the code below compiles to b = 30 and c = 150:
int a = 10;
int b = a + 20;
int c = b * 5;
- Loop Invariant Code Motion – Moves code out of a loop when it has the same value in every iteration. In the code below, int x = size * 2 has the same value in all iterations, so it will be moved to just before the loop starts:
int sum = 0;
for (int i = 0; i < size; ++i) {
    int x = size * 2;    /* loop-invariant */
    sum += x + i * size;
}
CFG - Control Flow Graph