Smalli Cheat-Sheet
A little help in Smali
(To be supplemented)
#
general information
#
Smali
Types
Dalvik bytecode has two main classes of types: primitive types and reference types. Reference types are objects and arrays; everything else is primitive.
Primitives are represented by a single letter.
V - void - can only be used for return types
Z - boolean
B - byte
S - short
C - char
I - int
J - long (64 bits)
F - float
D - double (64 bits)
Objects take the form Lpackage/name/ObjectName; where the leading "L" indicates that this is an object type, package/name/ is the package that contains the object, ObjectName is the name of the object, and ";" marks the end of the object name. This is equivalent to package.name.ObjectName in Java. For a more concrete example, Ljava/lang/String; is equivalent to java.lang.String.
Arrays take the form [I - this is a one-dimensional array of ints, i.e. int[] in Java. For multidimensional arrays, you simply add more "[" characters: [[I = int[][], [[[I = int[][][], etc. (Note: the maximum number of dimensions you can have is 255.)
You can also have arrays of objects: [Ljava/lang/String; is an array of Strings.
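The type rules above can be sketched as a small Python helper that turns a Dalvik type descriptor into the matching Java type name. This is only an illustration: the function name descriptor_to_java and the PRIMITIVES table are made up for this sketch and are not part of smali or any tool.

```python
# Hypothetical helper: map a Dalvik type descriptor to a Java type name.
PRIMITIVES = {
    "V": "void", "Z": "boolean", "B": "byte", "S": "short", "C": "char",
    "I": "int", "J": "long", "F": "float", "D": "double",
}

def descriptor_to_java(desc):
    # Every leading "[" adds one array dimension.
    dims = len(desc) - len(desc.lstrip("["))
    base = desc[dims:]
    if base.startswith("L") and base.endswith(";"):
        # Object type: drop "L" and ";" and turn "/" into "."
        name = base[1:-1].replace("/", ".")
    elif base in PRIMITIVES:
        name = PRIMITIVES[base]
    else:
        raise ValueError("not a valid type descriptor: " + desc)
    return name + "[]" * dims

print(descriptor_to_java("[[I"))                  # int[][]
print(descriptor_to_java("[Ljava/lang/String;"))  # java.lang.String[]
```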
Methods
Methods are always specified in a very verbose form, which includes the type that contains the method, the name of the method, the types of the parameters, and the return type. All this information is needed so that the virtual machine can find the correct method and perform static analysis on the bytecode.
They take the form Lpackage/name/ObjectName;->MethodName(III)Z
In this example, you should recognize Lpackage/name/ObjectName; as a type. MethodName is the name of the method. (III)Z is the signature of the method: III are the parameters (in this case, 3 ints), and Z is the return type (boolean).
Method parameters are listed one after the other, with no separators between them.
Here's a more complex example:
Lpackage/name/ObjectName;->MethodName(I[[IILjava/lang/String;[Ljava/lang/Object;)Ljava/lang/String;
In Java, this would be
String MethodName(int, int[][], int, String, Object[])
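Because parameters are written back to back with no separators, splitting a signature takes a little care. The following Python sketch shows one way to do it; split_signature is a hypothetical name chosen for this example, and it assumes the input is a well-formed signature such as (III)Z.

```python
# Hypothetical helper: split "(params)return" into a parameter list and a return type.
def split_signature(signature):
    if not signature.startswith("("):
        raise ValueError("signature must start with '('")
    close = signature.index(")")
    params_str, return_type = signature[1:close], signature[close + 1:]
    params = []
    i = 0
    while i < len(params_str):
        start = i
        while params_str[i] == "[":       # array dimensions belong to the next type
            i += 1
        if params_str[i] == "L":          # object type runs up to and including ";"
            i = params_str.index(";", i) + 1
        else:                             # primitive type is a single letter
            i += 1
        params.append(params_str[start:i])
    return params, return_type

params, ret = split_signature(
    "(I[[IILjava/lang/String;[Ljava/lang/Object;)Ljava/lang/String;")
print(params)  # ['I', '[[I', 'I', 'Ljava/lang/String;', '[Ljava/lang/Object;']
print(ret)     # Ljava/lang/String;
```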
Fields
Fields are also always specified in a verbose form that includes the type containing the field, the field name, and the field type. Again, this allows the VM to find the correct field and to perform static analysis on the bytecode.
They take the form Lpackage/name/ObjectName;->FieldName:Ljava/lang/String;
This should be fairly self-explanatory: it is the package and object name, the field name, and the field type, respectively.
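As a quick illustration of the three parts of a field reference, here is a minimal Python sketch that takes one apart. The name parse_field_ref is invented for this example, and the sketch assumes the reference is written without embedded spaces.

```python
# Hypothetical helper: split "Lowner;->name:type" into its three parts.
def parse_field_ref(ref):
    owner, rest = ref.split("->", 1)        # containing type, then "name:type"
    name, field_type = rest.split(":", 1)   # field name, then field type
    return owner, name, field_type

print(parse_field_ref("Lpackage/name/ObjectName;->FieldName:Ljava/lang/String;"))
# ('Lpackage/name/ObjectName;', 'FieldName', 'Ljava/lang/String;')
```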
#
Registers
Introduction
In Dalvik bytecode, registers are always 32 bits wide and can hold any type of value. Two registers are used to hold 64-bit types (long and double).
Specifying the number of registers in a method
There are two ways to specify how many registers are available in a method. The .registers directive specifies the total number of registers in the method, while the alternative .locals directive specifies the number of non-parameter registers in the method. The total number of registers then also includes however many registers are needed to hold the method parameters.
How method parameters are passed to a method
When a method is called, its parameters are placed in the last n registers. If a method has 2 arguments and 5 registers (v0-v4), the arguments are placed in the last 2 registers, v3 and v4.
The first parameter of a non-static method is always the object on which the method is called (the this reference).
For example, let's say you are writing a non-static method LMyObject;->callMe(II)V. This method has 2 int parameters, but it also has an implicit LMyObject; parameter before both of them, so the method has 3 arguments in total.
Suppose you specify that the method has 5 registers (v0-v4), either with .registers 5 or with .locals 2 (i.e. 2 local registers + 3 parameter registers). When the method is called, the object on which it is invoked (i.e. the this reference) will be in v2, the first int parameter in v3, and the second int parameter in v4.
Static methods work the same way, except that there is no implicit this argument.
Register names
There are two naming schemes for registers: the normal v# scheme and the p# scheme for parameter registers. The first register in the p# scheme is the first parameter register of the method. So, going back to the previous example of a method with 3 arguments and 5 registers in total, the following table shows the normal v# name of each register, followed by the p# name for parameter registers.
v0      First local register
v1      Second local register
v2  p0  First parameter register
v3  p1  Second parameter register
v4  p2  Third parameter register
You can refer to a parameter register by either name; it makes no difference.
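The relationship between the two naming schemes is simple arithmetic: parameters occupy the last registers, so p0 sits at (total registers - parameter registers). A small Python sketch, with the hypothetical name param_to_v, makes this explicit.

```python
# Hypothetical helper: translate a p# index into the equivalent v# index.
def param_to_v(total_registers, param_registers, p_index):
    if param_registers > total_registers:
        raise ValueError("a method cannot have more parameter registers than registers")
    if not 0 <= p_index < param_registers:
        raise ValueError("p%d is not a parameter register of this method" % p_index)
    # Parameters live in the LAST param_registers registers of the method.
    return total_registers - param_registers + p_index

# 5 registers in total, 3 of them parameters: p0 is v2, p2 is v4.
print(param_to_v(5, 3, 0))  # 2
print(param_to_v(5, 3, 2))  # 4
# After raising the total to 7, the same p0 now lives in v4.
print(param_to_v(7, 3, 0))  # 4
```

The last call shows why code written with p# names survives a change of the register count, while code written with v# names for parameters does not.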
Motivation for parameter registers
The p# naming scheme was introduced to solve a practical problem.
Let's say you have an existing method with several parameters, you add some code to that method, and you find that you need an extra register. You think: "It's okay, I'll just increase the number of registers specified in the .registers directive!"
Unfortunately, it is not that easy. Keep in mind that method parameters are stored in the last registers of the method. If you increase the number of registers, you change which registers the method arguments land in. You would therefore have to change the .registers directive and renumber every parameter register.
But if the p# naming scheme is used to refer to parameter registers throughout the method, you can easily change the number of registers in the method without worrying about renumbering any existing registers.
Long / Double values
As mentioned earlier, long and double primitives (J and D respectively) are 64-bit values and require 2 registers. This is important to keep in mind when referring to method arguments. For example, suppose you have a (non-static) method LMyObject;->MyMethod(IJZ)V. The method parameters are LMyObject;, int, long, boolean. All of its parameters together therefore require 5 registers.
p0 this
p1 I
p2, p3 J
p4 Z
Also, when you call the method later, you need to specify both registers of any double-wide argument in the register list of the invoke instruction.
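To make the counting rule for wide values concrete, here is a short Python sketch that lays out the parameter registers for a list of parameter types. The function name param_layout and its return shape are assumptions made for this illustration only.

```python
# Hypothetical helper: assign p# registers to parameters, two for each J or D.
def param_layout(param_types, is_static):
    slots = []
    next_reg = 0
    if not is_static:
        slots.append(("this", ["p0"]))   # implicit first parameter
        next_reg = 1
    for t in param_types:
        width = 2 if t in ("J", "D") else 1   # long and double take a register pair
        slots.append((t, ["p%d" % (next_reg + k) for k in range(width)]))
        next_reg += width
    return slots, next_reg

slots, count = param_layout(["I", "J", "Z"], is_static=False)
for type_name, regs in slots:
    print(", ".join(regs), type_name)
print("parameter registers needed:", count)  # 5
```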
#
arrays
array-length vA, vB
A: Destination register (4 bits)
B: Array reference-bearing register (4 bits)
Stores the length (number of entries) of the array referenced by vB in vA
fill-array-data vAA, :target
A: Register holding the array reference (8 bits)
B: Target label of the array data table
Fills the array referenced by vAA with the data at the target. The reference must be to an array of primitives, and the data table must match it in type and size. The element width is defined in the table.
In this cheat-sheet, a trailing + (as in vA+) marks a register pair, which occupies vX and vX+1, for example v1, v2.
Example data table:
:target
.array-data 0x2
0x01 0x02
0x03 0x04
.end array-data
new-array vA, vB, type
A: Destination register (4 bits)
B: Size register (4 bits)
C: Type reference
Creates a new array of the specified type and size. The type must be an array type.
filled-new-array {vA, v.., vX}, type
vA-vX: Argument registers (4 bits each)
B: Type reference
Creates a new array of the specified type and size, filling it with the supplied contents. The type must be an array type. A reference to the newly created array can be obtained with a move-result-object instruction immediately after the filled-new-array instruction.
filled-new-array/range {vA .. vX}, type
vA .. vX: Range of registers holding the array contents (the first register is encoded in 16 bits)
B: Type reference (16 bits)
Creates a new array of the specified type, filling it with the supplied contents. The type must be an array type. A reference to the newly created array can be obtained with a move-result-object instruction immediately after the filled-new-array/range instruction.
#
array accessors
Legend:
A (aget): Destination register
A (aput): Source register
B: Array reference
C: Index in the array
aget vA, vB, vC
Retrieves the int value at index vC from the array referenced by vB and stores it in vA
aput vA, vB, vC
Stores the int value from vA in the array referenced by vB at index vC
There are also other aget / aput variants; adding a suffix changes the value type. For example: aget-object (gets an object).
-boolean
-byte
-char
-object
-short
-wide
#
comparison
Legend:
A: Destination register
B: First source register
C: Second source register
B+: First source register pair
C+: Second source register pair
cmp-long vA, vB+, vC+
Compares the long values in the source registers and stores the result in vA:
If vB+ == vC+, stores 0;
If vB+ > vC+, stores 1;
If vB+ < vC+, stores -1.
cmpg-double vA, vB+, vC+
Compares the double values in the source registers and stores the result in vA:
If vB+ == vC+, stores 0;
If vB+ > vC+, stores 1;
If vB+ < vC+, stores -1.
If vB+ or vC+ is not a number (NaN), stores 1.
cmpg-float vA, vB, vC
Compares the float values in the source registers and stores the result in vA:
If vB == vC, stores 0;
If vB > vC, stores 1;
If vB < vC, stores -1.
If vB or vC is not a number (NaN), stores 1.
cmpl-double vA, vB+, vC+
Compares the double values in the source registers and stores the result in vA:
If vB+ == vC+, stores 0;
If vB+ > vC+, stores 1;
If vB+ < vC+, stores -1.
If vB+ or vC+ is not a number (NaN), stores -1.
cmpl-float vA, vB, vC
Compares the float values in the source registers and stores the result in vA:
If vB == vC, stores 0;
If vB > vC, stores 1;
If vB < vC, stores -1.
If vB or vC is not a number (NaN), stores -1.
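The result convention of the cmp family, as defined by the Dalvik bytecode format, is 0 for equal operands, 1 when the first operand is greater, and -1 when it is smaller; the only difference between the cmpl and cmpg variants is the result for NaN. The Python sketch below models this; the function names are invented for the example and are not real APIs.

```python
import math

# Hypothetical models of the comparison instructions (result goes to vA).
def cmp_long(b, c):
    return (b > c) - (b < c)          # 1, 0 or -1

def cmpl(b, c):
    # "l" = less-than bias: NaN makes the result -1
    if math.isnan(b) or math.isnan(c):
        return -1
    return (b > c) - (b < c)

def cmpg(b, c):
    # "g" = greater-than bias: NaN makes the result 1
    if math.isnan(b) or math.isnan(c):
        return 1
    return (b > c) - (b < c)

print(cmp_long(2, 2), cmp_long(3, 2), cmp_long(1, 2))    # 0 1 -1
print(cmpl(float("nan"), 1.0), cmpg(float("nan"), 1.0))  # -1 1
```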
#
const
const vAA, #+BBBBBBBB
A: Destination register (8 bits)
B: 32-bit signed integer constant
Moves the specified integer constant into the vAA register.
const/16 vAA, #+BBBB
A: Destination register (8 bits)
B: Signed integer (16 bits)
Moves #+BBBB, sign-extended to 32 bits, into the vAA register
const/4 vA, #+B
A: Destination register (4 bits)
B: Signed integer (4 bits)
Moves the specified 4-bit integer constant, sign-extended to 32 bits, into the destination register vA.
const/high16 vAA, #+BBBB
A: Destination register (8 bits)
B: Integer (16 bits)
Places the 16-bit constant in the upper 16 bits of the vAA register and zeroes the lower 16 bits. Commonly used to initialize float values.
const-class vAA, Lclass;
A: Destination register (8 bits)
class: Class reference
Moves a reference to the specified class into the destination register vAA. If the specified type is primitive, this stores a reference to the special class that represents the primitive type.
const-string vAA, "BBBB"
A: Destination register (8 bits)
B: String value
Moves a reference to the specified string into the destination register vAA
const-string/jumbo vAA, "BBBBBBBB"
A: Destination register (8 bits)
B: String value
Moves a reference to the specified string into the destination register vAA
jumbo - indicates that the string index is "large" (32 bits instead of 16)
const-wide/16 vA+, #+BBBB
Moves #+BBBB, sign-extended to 64 bits, into the register pair vA+
const-wide/high16 vA+, #+BBBB
Places the 16-bit constant in the upper 16 bits of the register pair vA+ and zeroes the remaining bits. Commonly used to initialize double values.
const-wide vA+, #+BBBBBBBBBBBBBBBB
Moves the specified 64-bit constant into the register pair vA+
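Why const/high16 is handy for floats: many common float values have all their significant bits in the upper half of the 32-bit pattern. The Python sketch below reinterprets a const/high16 literal as an IEEE 754 float; const_high16_as_float is a made-up name used only for this illustration.

```python
import struct

# Hypothetical helper: the float seen in vAA after "const/high16 vAA, literal".
def const_high16_as_float(literal16):
    bits = (literal16 & 0xFFFF) << 16          # literal goes to the upper 16 bits
    # Reinterpret the 32-bit pattern as an IEEE 754 single-precision float.
    return struct.unpack(">f", struct.pack(">I", bits))[0]

print(const_high16_as_float(0x3F80))  # 1.0
print(const_high16_as_float(0x4000))  # 2.0
print(const_high16_as_float(0xBF80))  # -1.0
```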
#
goto
goto - Unconditional jump to :target.
goto :target
goto/16 :target # 16-bit offset
goto/32 :target # 32-bit offset
Note: goto actually uses a +/- offset from the current instruction; APKTool converts these offsets to labels for readability. If the offset needs a 16-bit value, goto/16 must be used, and if it needs a 32-bit value, goto/32 must be used. When adding new code it is almost impossible to tell whether goto/16 or goto/32 is required (unless you know the offset for sure). If you are not sure, goto/16 can replace any goto, and goto/32 can replace any goto/16 or goto.
The substitution does not work in the other direction: goto cannot safely replace goto/16, which in turn cannot safely replace goto/32.
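The choice between the three variants comes down to how large the branch offset is. In the Dalvik bytecode format the plain goto carries a signed 8-bit offset, goto/16 a signed 16-bit offset and goto/32 a signed 32-bit offset, all counted in 16-bit code units. The Python sketch below picks the smallest variant that fits; smallest_goto is a hypothetical name, and in practice you rarely know the offset when editing smali by hand.

```python
# Hypothetical helper: smallest goto variant able to encode a given offset.
def smallest_goto(offset):
    if -0x80 <= offset <= 0x7F:
        return "goto"        # signed 8-bit offset
    if -0x8000 <= offset <= 0x7FFF:
        return "goto/16"     # signed 16-bit offset
    if -0x80000000 <= offset <= 0x7FFFFFFF:
        return "goto/32"     # signed 32-bit offset
    raise ValueError("offset does not fit in 32 bits")

print(smallest_goto(100))    # goto
print(smallest_goto(-129))   # goto/16
print(smallest_goto(40000))  # goto/32
```

A wider variant always covers the range of a narrower one, which is why substituting upwards is safe and substituting downwards is not.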
#
if
Legend:
A: First register to check (integer)
B: Second register to check (integer)
target: Target label
Note: != means not equal
if-eq vA, vB, :target
Execution jumps to :target if vA == vB
if-eqz vA, :target
:target if vA == 0
if-ge vA, vB, :target
:target if vA >= vB
if-gez vA, :target
:target if vA >= 0
if-gt vA, vB, :target
:target if vA > vB
if-gtz vA, :target
:target if vA > 0
if-le vA, vB, :target
:target if vA <= vB
if-lez vA, :target
:target if vA <= 0
if-lt vA, vB, :target
:target if vA < vB
if-ltz vA, :target
:target if vA < 0
if-ne vA, vB, :target
:target if vA != vB
if-nez vA, :target
:target if vA != 0
#
invoke
Legend:
vA-vX: Arguments passed to the method
class: The name of the class containing the method
method: The name of the method to call
R: Return type.
invoke-direct {vA, v.., vX}, Lclass;->method()R
Calls a non-static direct method (that is, an instance method that by its nature cannot be overridden, namely either a private instance method or a constructor).
invoke-interface {vA, v.., vX}, Lclass;->method()R
Calls an interface method (that is, a method called on an object whose concrete class is unknown, through a method reference that refers to an interface).
invoke-static {vA, v.., vX}, Lclass;->method()R
Calls a static method (which is always considered a direct method).
invoke-super {vA, v.., vX}, Lclass;->method()R
Calls the virtual method of the immediate parent class.
invoke-virtual {vA, v.., vX}, Lclass;->method()R
Calls a normal virtual method (a method that is not private, static or final, and is not a constructor).
Note:
If the method returns a value (R is not "V" for void), the value must be captured on the very next line with one of the move-result instructions, or it will be lost.
Instead of listing all the vA-vX arguments, you can also pass a range of registers by adding the /range suffix. For example: invoke-direct/range {vA .. vX}, Lclass;->method()R. This can be done with any of the invoke instructions above.
invoke-direct {v1, v2, v3} is the same as invoke-direct/range {v1 .. v3}
invoke-direct {v0} is the same as invoke-direct/range {v0 .. v0}
A common source of errors is using invoke-virtual {vX} instead of invoke-virtual/range {vX .. vX} in methods with a large number of local registers (v1, v2, ... v22): the non-range form can only address registers v0-v15.
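The rule behind the /range pitfall can be written down directly. In the Dalvik bytecode format the non-range invoke encodes at most 5 argument registers with 4 bits each (so only v0-v15), while the /range form takes a run of consecutive registers with a 16-bit starting index. The Python sketch below classifies an argument list accordingly; invoke_form and its three result strings are invented for this illustration.

```python
# Hypothetical helper: decide which invoke encoding an argument list can use.
def invoke_form(registers):
    # Plain form: at most 5 registers, each one within v0-v15.
    if len(registers) <= 5 and all(0 <= r <= 15 for r in registers):
        return "plain"
    # Range form: registers must be consecutive and ascending.
    if all(b == a + 1 for a, b in zip(registers, registers[1:])):
        return "range"
    # Otherwise the values must first be moved into suitable registers.
    return "needs-moves"

print(invoke_form([1, 2, 3]))           # plain
print(invoke_form([22]))                # range
print(invoke_form([0, 1, 2, 3, 4, 5]))  # range
print(invoke_form([1, 22]))             # needs-moves
```

Here the list holds register numbers, with both halves of a long or double counted separately.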
#
misc
check-cast vAA, Lclass;
A: Reference register (8 bits)
B: Type reference (16 bits)
Checks whether the object reference in vAA can be cast to an instance of the type referenced by class.
Throws a ClassCastException if this is not possible; otherwise execution continues.
instance-of vA, vB, Lclass;
A: Destination register (4 bits)
B: Reference register (4 bits)
C: Class reference (16 bits)
Stores 1 in vA if the reference in vB is an instance of the given type, or 0 if it is not.
new-instance vAA, Lclass;
A: Destination register (8 bits)
B: Type reference
Creates a new instance of the specified type and places a reference to the newly created instance in vAA.
The type must be a non-array class.
nop
Empty instruction / no operation
throw vAA
A: Exception-bearing register (8 bits)
Throws the specified exception. The exception object reference is in vAA.
#
move
Legend:
A: Destination register (4, 8, 16 bits)
B: Source register (4, 16 bits)
The trailing "#A: x bits. B: x bits" annotations are not part of the code; they only indicate how many bits each register index uses.
move vA, vB #A: 4 bits. B: 4 bits
Moves the contents of one non-object register to another.
move/16 vAAAA, vBBBB #A: 16 bits. B: 16 bits
Does the same as move, except that the source and destination register indexes are 16 bits.
move/from16 vAA, vBBBB #A: 8 bits. B: 16 bits
Does the same as move/16, except that the destination register index is only 8 bits.
move-exception vAA #A: 8 bits
Saves the just-caught exception into vAA. This must be the first instruction of any exception handler whose caught exception is not to be ignored, and it may only ever occur as the first instruction of an exception handler. (PS: there is no getting around the tautology.)
move-object vA, vB #A: 4 bits. B: 4 bits
Moves the contents of one object-bearing register to another.
move-object/16 vAAAA, vBBBB #A: 16 bits. B: 16 bits
Does the same as move-object, except that the source and destination register indexes are 16 bits.
move-object/from16 vAA, vBBBB #A: 8 bits. B: 16 bits
Does the same as move-object/16, except that the destination register index is only 8 bits.
move-result vAA #A: 8 bits.
Moves the single-word non-object result of the most recent invoke into vAA. This must be the instruction immediately after an invoke whose (single-word, non-object) result is not to be ignored.
move-result-object vAA #A: 8 bits.
Moves the object result of the most recent invoke into vAA. This must be the instruction immediately after an invoke or filled-new-array whose (object) result is not to be ignored.
move-result-wide vA+ #A: 8 bits.
Moves the double-word (long or double) result of the most recent invoke into the register pair vA+. This must be the instruction immediately after the invoke.
move-wide vA+, vB+ #A: 4 bits. B: 4 bits
Moves the contents of one register pair to another.
move-wide/16 vA+, vB+ #A: 16 bits. B: 16 bits
Does the same as move-wide, except that the source and destination register indexes are 16 bits.
move-wide/from16 vA+, vBBBB #A: 8 bits. B: 16 bits
Does the same as move-wide/16, except that the destination register index is only 8 bits.
#
operations
ADD operator - adds the values on either side of the operator
#
add-double vA+, vB+, vC+
A: Destination register pair (8 bits)
B: Source register pair 1 (8 bits)
C: Source register pair 2 (8 bits)
Calculates vB+ + vC+ and stores the result in vA+
add-double/2addr vA+, vB+
A: Source register pair 1 / destination register pair (4 bits)
B: Source register pair 2 (4 bits)
Calculates vA+ + vB+ and stores the result in vA+
add-float vA, vB, vC
A: Destination register (8 bits)
B: Source register 1 (8 bits)
C: Source register 2 (8 bits)
Calculates vB + vC and stores the result in vA
add-float/2addr vA, vB
A: Source register 1 / destination register (4 bits)
B: Source register 2 (4 bits)
Calculates vA + vB and stores the result in vA
add-int vA, vB, vC
A: Destination register (8 bits)
B: Source register 1 (8 bits)
C: Source register 2 (8 bits)
Calculates vB + vC and stores the result in vA
add-int/lit8 vA, vB, 0xC
A: Destination register (8 bits)
B: Source register (8 bits)
C: Signed constant value (8 bits)
Calculates vB + 0xC and stores the result in vA
add-int/lit16 vA, vB, 0xC
A: Destination register (4 bits)
B: Source register (4 bits)
C: Signed constant value (16 bits)
Calculates vB + 0xC and stores the result in vA
add-int/2addr vA, vB
A: Source register 1 / destination register (4 bits)
B: Source register 2 (4 bits)
Calculates vA + vB and stores the result in vA
AND operator - a bitwise operator that copies a bit into the result if it is set in both operands.
#
# Empty for now
DIV operator - divides the left operand by the right operand
#
# Empty for now
MUL operator - multiplies the values on either side of the operator
#
# Empty for now
OR operator - copies a bit into the result if it is set in either operand.
#
# Empty for now
REM operator - divides the left operand by the right operand and returns the remainder
#
# Empty for now
SHL operator - shifts the value of the left operand to the left by the number of bits specified by the right operand.
#
# Empty for now
SHR operator - shifts the value of the left operand to the right by the number of bits specified by the right operand, preserving the sign.
#
# Empty for now
SUB operator - subtracts the right operand from the left operand
#
# Empty for now
USHR operator - unsigned shift right: shifts the value of the left operand to the right by the number of bits specified by the right operand, filling the vacated bits with zeros.
#
# Empty for now
XOR operator - copies a bit into the result if it is set in one operand, but not in both.
#
# Empty for now
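The difference between SHR and USHR only shows up for negative values, so a small model helps. The Python sketch below imitates the 32-bit int versions (shl-int, shr-int, ushr-int), assuming the usual Dalvik/Java rule that only the low 5 bits of the shift distance are used; the function names are invented for this example.

```python
# Hypothetical models of the 32-bit shift instructions.
def to_int32(x):
    # Wrap an arbitrary Python int to a signed 32-bit value.
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def shl_int(b, c):
    return to_int32(b << (c & 0x1F))

def shr_int(b, c):
    # Arithmetic shift: the sign bit is copied into the vacated bits.
    return to_int32(b) >> (c & 0x1F)

def ushr_int(b, c):
    # Logical shift: the vacated bits are filled with zeros.
    return to_int32((b & 0xFFFFFFFF) >> (c & 0x1F))

print(shr_int(-8, 1))   # -4
print(ushr_int(-8, 1))  # 2147483644
print(shl_int(1, 31))   # -2147483648
```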
#
return
The return instruction is used to return explicitly from a method; that is, it hands control back to the caller of the method. It tells the interpreter to stop executing the current method. If the method returns a value, the return instruction names the register that holds it, and that value becomes the return value of the method.
return vAA
A: Return value register (8 bits)
Returns from a method that returns a single-word non-object value, using the value in vAA.
return-object vAA
A: Return value register (8 bits)
Returns from an object-returning method, using the object reference in vAA.
return-void
Returns from a void method, with no value.
return-wide vA+
A: Return value register pair (8 bits)
Returns a double / long (64-bit) value from vA+.
#
switch
Legend:
A: The register that is being tested
target: Target label of the switch table
packed-switch vAA, :target
Implements a switch statement where the case constants are sequential. The instruction uses an index table; vAA indexes into this table to find the offset of the instruction for a particular case. If vAA falls outside the index table, execution continues with the next instruction (the default case). packed-switch is used when the possible vAA values are sequential, regardless of the lowest value.
Example switch table:
:target
.packed-switch 0x1 # 0x1 = lowest case value of vAA
:pswitch_0 # Jump to pswitch_0 if vAA == 0x1
:pswitch_1 # Jump to pswitch_1 if vAA == 0x2
.end packed-switch
sparse-switch vAA, :target
Implements a switch statement where the case constants are not sequential. The instruction uses a lookup table with the case constants and the offset for each case constant. If there is no match in the table, execution continues with the next instruction (the default case).
:target
.sparse-switch
0x3 -> :sswitch_1 # Jump to sswitch_1 if vAA == 0x3
0x65 -> :sswitch_2 # Jump to sswitch_2 if vAA == 0x65
.end sparse-switch
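The two lookup strategies can be modelled in a few lines of Python, using the label names from the example tables above. The functions packed_switch and sparse_switch and the "fallthrough" marker are hypothetical; they stand for "continue with the next instruction".

```python
# Hypothetical models of the two switch lookups.
def packed_switch(value, first_key, targets, default="fallthrough"):
    # Sequential cases: the table is indexed by (value - lowest case value).
    index = value - first_key
    if 0 <= index < len(targets):
        return targets[index]
    return default

def sparse_switch(value, table, default="fallthrough"):
    # Non-sequential cases: the table maps each case constant to a target.
    return table.get(value, default)

print(packed_switch(0x2, 0x1, ["pswitch_0", "pswitch_1"]))  # pswitch_1
print(packed_switch(0x3, 0x1, ["pswitch_0", "pswitch_1"]))  # fallthrough
print(sparse_switch(0x65, {0x3: "sswitch_1", 0x65: "sswitch_2"}))  # sswitch_2
```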