Bytecode Verifier

Bytecode Verifier: Checking Safety of Stack Usage, Types, Resources, and References.

The body of each function in a compiled module is verified separately while trusting the correctness of function signatures in the module. Checking that each function signature matches its definition is a separate responsibility. The body of a function is a sequence of bytecode instructions. This instruction sequence is checked in several phases described below.

CFG Construction

A control-flow graph is constructed by decomposing the instruction sequence into a collection of basic blocks. Each basic block contains a contiguous sequence of instructions; the set of all instructions is partitioned among the blocks. Each block ends with a branch or return instruction. The decomposition into blocks guarantees that branch targets land only at the beginning of some block. The decomposition also attempts to ensure that the generated blocks are maximal. However, the soundness of the analysis does not depend on maximality.

Stack Safety

The execution of a block happens in the context of an array of local variables and a stack. The parameters of the function are a prefix of the array of local variables. Passing arguments and return values across function calls is done via the stack. When a function starts executing, its arguments are already loaded into its parameters. Suppose the stack height is n when a function starts executing; then valid bytecode must enforce the invariant that when execution lands at the beginning of a basic block, the stack height is n. Furthermore, at a return instruction, the stack height must be n+k where k, s.t. k>=0 is the number of return values. The first phase of the analysis checks that this invariant is maintained by analyzing each block separately, calculating the effect of each instruction in the block on the stack height, checking that the height does not go below n, and that is left either at n or n+k (depending on the final instruction of the block and the return type of the function) at the end of the block.

Type Safety

The second phase of the analysis checks that each operation, primitive or defined function, is invoked with arguments of appropriate types. The operands of an operation are values located either in a local variable or on the stack. The types of local variables of a function are already provided in the bytecode. However, the types of stack values are inferred. This inference and the type checking of each operation can be done separately for each block. Since the stack height at the beginning of each block is n and does not go below n during the execution of the block, we only need to model the suffix of the stack starting at n for type checking the block instructions. We model this suffix using a stack of types on which types are pushed and popped as the instruction stream in a block is processed. The type stack and the statically-known types of local variables are enough to type check each instruction.

Resource Safety

Resources represent assets of the blockchain. As such, there are certain restrictions over these types that do not apply to normal values. Intuitively, resource values cannot be copied and must be used by the end of the transaction (this means moved to global storage or destroyed). Concretely, the following restrictions apply:

ยท CopyLoc and StLoc require the type of local is not of resource kind.

ยท WriteRef, Eq, and Neq require the type of the reference is not of resource kind.

ยท At the end of a function (when Ret is reached), no any local whose type is of resource kind must be empty, i.e., the value must have been moved out of the local.

As mentioned above, this last rule around Ret implies that the resource must have been either:

ยท Moved to global storage via MoveToSender.

ยท Destroyed via Unpack.

Both of MoveToSender and Unpack are internal to the module in which the resource is declared.

Last updated