Available under a Creative Commons CC-BY-NC-ND license and readable at Mastering Ethereum
Ethereum is:
- unbounded (Turing complete)
- deterministic
- a state machine
- a globally accessible singleton

Smart contracts are computer programs that provide:
- high availability
- auditability
- transparency
- neutrality
- reduced counterparty risk
- censorship resistance
- P2P network that propagates transactions and blocks
- Messages that represent state transitions
- Consensus rules governing validity of state transitions
- A state machine processing the transactions
- Chain of cryptographically secure blocks
- Consensus algorithm with decentralized control
- Game theoretic incentive scheme to maintain the ledger
- Multiple open source client implementations
- Ethereum is general purpose: it tracks changes in code and data over time
- Ethereum runs a virtual machine to execute code, called the Ethereum Virtual Machine (EVM)
- The state data for Ethereum is stored in a serialized, hashed data structure called a Merkle Patricia Tree
- Ethereum uses a metered resource called gas to avoid infinite loops
- Each instruction has a cost in gas
- Transactions specify a maximum gas limit; if the EVM reaches it, execution halts
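
To make the gas metering idea concrete, here is a minimal sketch of a toy stack machine that charges gas per instruction and halts once the limit is exceeded. The opcode names and costs are hypothetical and chosen only for illustration; they are not the real EVM gas schedule.

```python
class OutOfGas(Exception):
    """Raised when execution needs more gas than the transaction's limit."""

# Hypothetical per-instruction costs (NOT the real EVM gas schedule).
GAS_COST = {"PUSH": 3, "ADD": 3, "JUMP": 8}

def run(program, gas_limit):
    """Run (opcode, arg) pairs on a tiny stack machine; return (stack, gas_used)."""
    stack, pc, gas_used = [], 0, 0
    while pc < len(program):
        op, arg = program[pc]
        gas_used += GAS_COST[op]          # every instruction is metered
        if gas_used > gas_limit:
            raise OutOfGas(f"needed more than {gas_limit} gas")
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            stack.append(stack.pop() + stack.pop())
        elif op == "JUMP":
            pc = arg                      # jump to an absolute position
            continue
        pc += 1
    return stack, gas_used

print(run([("PUSH", 2), ("PUSH", 3), ("ADD", None)], gas_limit=100))  # ([5], 9)

try:
    run([("JUMP", 0)], gas_limit=100)     # an infinite loop
except OutOfGas as err:
    print("halted:", err)
```

The infinite loop cannot run forever: each jump consumes gas, so the limit is eventually hit and execution stops.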
- Tokens can represent:
- Currency
- Resource Claims
- Equity
- Voting
- Access
- Collectibles
- Identity
- Attestation (Signatures)
- Utility (Payment)
- Can be fungible (non-unique) or non-fungible (unique)
- Some tokens are native to the blockchain and therefore have the lowest counterparty risk
- e.g. purely blockchain-based tokens, as opposed to tokens that bridge the chain to the real world
- One critical role of tokens is converting extrinsic assets into intrinsic (native) assets
- ETH transactions are handled at the protocol level whereas Tokens are processed at the contract level
- Utility Tokens: Use of token is required to access a resource
- Equity Tokens: Represents a control or ownership of something
- A fungible token standard
- Is a common interface for tokens
- (functions): `totalSupply`, `balanceOf`, `transfer`, `transferFrom`, `approve`, `allowance`
- (events): `Transfer`, `Approval`
- A token contract is essentially two internal mappings:
- Balances: `address => uint`
- Allowances: `address => (address => uint)` (used for delegation)
- Developers should be careful when sending ERC20 tokens to a contract: if the receiving contract has no ERC20 functionality, the tokens are lost forever (approximately $2.5 million USD lost this way so far)
- Nuances:
- Unlike an ETH transfer, the token recipient is not the transaction's destination; the transaction is sent to the token contract
- The token contract manages all the mappings and state of accounts
- Tokens are sent via `TokenContract.transfer`, not `send`
- Sending tokens requires ETH
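
The two-mapping design can be sketched as follows. This is a toy Python model under stated assumptions, not contract code: the class name `Token` and the explicit `sender`/`spender` arguments are hypothetical stand-ins for `msg.sender`, and events are omitted.

```python
class Token:
    """Toy fungible token: one balances mapping and one allowances mapping."""

    def __init__(self, owner, supply):
        self.total_supply = supply
        self.balances = {owner: supply}   # address => amount
        self.allowances = {}              # owner => {spender => amount}

    def balance_of(self, owner):
        return self.balances.get(owner, 0)

    def transfer(self, sender, to, amount):
        if self.balance_of(sender) < amount:
            raise ValueError("insufficient balance")
        self.balances[sender] -= amount
        self.balances[to] = self.balance_of(to) + amount

    def approve(self, owner, spender, amount):
        # The owner delegates spending of up to `amount` to `spender`.
        self.allowances.setdefault(owner, {})[spender] = amount

    def allowance(self, owner, spender):
        return self.allowances.get(owner, {}).get(spender, 0)

    def transfer_from(self, spender, owner, to, amount):
        if self.allowance(owner, spender) < amount:
            raise ValueError("allowance exceeded")
        self.transfer(owner, to, amount)
        self.allowances[owner][spender] -= amount
```

Note that a transfer only edits the contract's own mappings; nothing is delivered to the recipient, which is why a recipient contract that does not understand tokens cannot move them again.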
- Another standard to prevent sending tokens to ERC20 incompatible contracts
- A check to determine whether the destination address is a contract or not
- e.g. does it have a `tokenFallback` function
- Support for an extended ERC20 interface, with support for ERC820 token registries
- Includes
- various "hooks" for receiving notifications
- proxy support
- Metadata
- "Deed" contracts representing ownership of a unique thing
- Mapping has an "owner" rather than a balance
- Optional: `MetaDataInterface` (indexable interface)
- Standards exist to promote interoperability
- Additional extensions include:
- Burning
- Minting
- White / Black listing
- Ownership
- Crowd funding
- Caps
- Recovery
- Oracles provide an external data source to smart contracts
- Oracles are needed because EVM execution must be deterministic
- There is no randomness internally
- External data can only be included in the `data` payload
- Oracles "act as a bridge" between the real world and the chain world
- Data uses include:
- Randomness
- Exchange Rates
- Political Events
- Geolocations
- Flight Stats
- Functionally:
- Collecting data from an off-chain source
- Transferring data on chain
- Making data available by placing it in storage for contracts to access
- Immediate Request: a one time query for a value (e.g. is over 21?)
- Pub-Sub: An RSS-feed-style subscription that pulls data from off chain
- Request Response:
- Data set is too large to be stored on chain
- Can work via requesting the oracle and observing the result as an EVM state change
There are a few ways to prove authenticity of an oracle's data
- Authenticity Proofs: Cryptographic guarantees that data has not been tampered with
- Oraclize's TLSNotary Proof (proof of HTTPS communication between client and server)
- Oracles that perform computation that is infeasible on chain
- e.g. Oraclize deploys a user-configured Docker container to AWS, which performs the computation and writes values to `stdout` to return to the requesting DApp
- Centralized but auditable
- Cryptlets are a standard
- Web3 encompasses and attempts to decentralize: messaging, storage and naming
- Core benefits of DApps
- Resiliency
- Transparency
- Censorship Resistance
- Business Logic = Smart contracts
- Messaging = Whisper
- Storage = IPFS/Swarm
- Domain Name System
- Operates on nodes created by the namehash algorithm
- EIP137
- Only nodes can set their names and subdomains
- Nodes are traversed by recursively hashing and resolving the higher domain: `keccak(keccak(0x00 + keccak("eth")) + keccak("example"))`
- Currently the root TLD (which is required to resolve anything) is owned by a 4-of-7 multisig
- Resolver contracts are user-controlled and settable by domain owners to resolve metadata and information about a domain
- Names are distributed through a Vickrey auction (sealed bids)
- Winner pays second highest bid
- In ENS:
- Users lock funds
- Must submit two transactions (submit bid + commit/reveal)
- Reveal bid or risk fund loss
- Domain names lock an amount of ETH as a commitment for one year
- Resolver contract addresses are returned after ENS hashes the node (if it exists)
- There is a default resolver if none is supplied, which can point to wallets or Swarm addresses
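
The recursive hashing used by namehash can be sketched as below. One loud assumption: Python's standard library has no Keccak-256, so `hashlib.sha3_256` (NIST SHA-3, which produces different digests) is used as a stand-in. The outputs therefore do not match real ENS nodes; only the recursion structure is illustrated. The function names are hypothetical.

```python
import hashlib

def stand_in_hash(data: bytes) -> bytes:
    # ASSUMPTION: sha3_256 is NOT Ethereum's keccak256. It is used here only
    # so the sketch runs with the standard library.
    return hashlib.sha3_256(data).digest()

def namehash(name: str, h=stand_in_hash) -> bytes:
    """Hash labels from the TLD inward: node = h(parent_node + h(label))."""
    node = b"\x00" * 32                    # the root node is 32 zero bytes
    if name:
        for label in reversed(name.split(".")):
            node = h(node + h(label.encode("utf-8")))
    return node

print(namehash("example.eth").hex())
```

Passing a real Keccak-256 implementation as `h` would be needed to reproduce on-chain values.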
- EVM is a "quasi" Turing-complete state machine
- Stack Based
- 256 bit words
- ROM for code storage
- Has no scheduling capacity (single threaded)
- Role of EVM is to update state by valid state transitions
- Ethereum World State is a mapping of 160-bit addresses to accounts
- Accounts represent:
- Balance (ETH)
- Nonce
- (Contracts) Storage
- (Contracts) Code
- When a contract is executed, a "sandboxed" EVM is instantiated and performs the computation
- In the event of a revert, the EVM is destroyed and no change is made to the global state
- If the code completes, the global state machine copies the state changes from the sandboxed EVM
- When creating a new contract, the code in the `data` field is used to instantiate an EVM instance with the data loaded into program ROM
- When a transaction is sent to an ABI compatible contract
- Dispatcher is called to read the data field
- The first 4 bytes are extracted
- The stack machine then compares them against each function selector (first 4 bytes of the keccak hash of the function prototype)
- Jumps to the matching function if it exists, otherwise exits with `STOP`
- Ethereum fees are calculated based on computation
- Gas PRICE: the exchange rate, in wei per unit of gas
- Gas COST: the total number of gas units consumed by the operation
- Deleting stored values is encouraged and therefore gas is refunded for doing so
- The block gas limit is the maximum amount of gas that can be consumed in a single block
- ~8 million, or about 380 send transactions (21k gas each)
- Miners can vote to increase or decrease the gas limit based on network demands
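
As a quick check of the arithmetic above, the following sketch computes a transaction fee and the number of simple sends that fit in a block. The 20 gwei gas price is a hypothetical value picked for the example, and the helper names are not from any library.

```python
GWEI = 10**9      # wei per gwei
ETHER = 10**18    # wei per ether

def tx_fee_wei(gas_used: int, gas_price_wei: int) -> int:
    """Fee paid = gas units consumed * price per gas unit (in wei)."""
    return gas_used * gas_price_wei

def simple_sends_per_block(block_gas_limit: int, gas_per_send: int = 21_000) -> int:
    """How many plain payment transactions fit under the block gas limit."""
    return block_gas_limit // gas_per_send

fee = tx_fee_wei(21_000, 20 * GWEI)        # hypothetical 20 gwei gas price
print(fee)                                 # 420000000000000 (wei)
print(simple_sends_per_block(8_000_000))   # 380
```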
- Proof of Work (POW): ETHash
- Dagger-Hashimoto algorithm
- Generates a large DAG structure to make hashing memory hard (ASIC resistance)
- Proof of Stake (POS): Casper
- Small deposit of ETH is made to become a validator and propose blocks
- Lose stake if block is rejected
- Earn rewards proportional to stake if block accepted
- Ethereum has a native currency called ether (ETH, Ξ, the Greek letter Xi)
- The smallest unit is wei (1 ETH = 10<sup>18</sup> wei), which is used everywhere internally
- gwei = gigawei = 10<sup>9</sup> wei
- Ethereum is account based rather than UTXO based (Bitcoin)
- Two types of accounts in Ethereum:
- Externally Owned Accounts: Does not have contract code, has a private key
- Contract Accounts: Contain executable code, does not have a private key
- Contracts are owned by their logic (e.g. they can set whoever is the owner)
- They have no private key so they cannot initiate state transitions, but can react to them
- When a transaction is sent to a contract account, the EVM is run using the transaction's data payload.
- Transaction objects specify the function to call in their data field, which is used to run the contract
- The msg object is one of the inputs accessible by a smart contract
- it is part of the transaction that triggers execution of the contract
- fallback function
- this function is executed when a transaction is made without specifying the function to be called
- it acts as the "fallback" and is used to accept ether deposits (as long as it's marked payable)
- Contract creation is registered by creating a transaction whose destination is the zero address (0x0).
- When a contract sends some of its balance to an EoA, it is considered an "internal transaction" message (you can see the difference on Etherscan)
- Ethereum clients are interoperable so long as they follow the same reference specification (e.g. the Yellow Paper)
- Remote Clients: MyCrypto, MetaMask communicate with existing networks
- Light Clients: Simplified Payment Verification (SPV), validate block headers and Merkle proofs to verify a transaction is included in a block
- Full nodes: Download and validate the entire blockchain
- The full node prunes by default and is therefore ~80-100GB
- A fully archival node is 1TB
- Clients include: Parity (Rust), Geth (Go)
- Due to a DoS attack at block 2283397, full node syncing will stall until block 2700000
- Fast syncing is available via `--fast`
- HTTP service offered on port `8545`
- Client libraries essentially stub and construct RPC calls in JSON format under the hood
- RPC calls include:
- `jsonrpc`: version (usually 2.0)
- `method`: method to call
- `params`: arguments to pass along
- `id`: used to correlate calls (usually for batching calls)
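
A minimal sketch of what a client library builds under the hood: a JSON-RPC 2.0 request body with the four fields above. The helper `rpc_request` is hypothetical, the zero address is a placeholder, and nothing is sent over the network; the payload would be POSTed to the node's HTTP endpoint.

```python
import json
from itertools import count

_ids = count(1)   # monotonically increasing request ids

def rpc_request(method, params=()):
    """Build a JSON-RPC 2.0 request body as a dict."""
    return {
        "jsonrpc": "2.0",
        "method": method,
        "params": list(params),
        "id": next(_ids),     # lets the caller match responses to requests
    }

request = rpc_request(
    "eth_getTransactionCount",
    ["0x0000000000000000000000000000000000000000", "latest"],
)
print(json.dumps(request))
```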
- Ethereum uses digital signatures to authorize the movement of funds (the "crypto" in cryptocurrencies)
- Fundamental property of cryptography: easy to go one way, hard to compute the inverse
- This enables digital signatures and secrets
- Trapdoor functions: crypto functions that are hard to invert unless one has a secret piece of information
- Discrete logarithm problem: a difficult problem that must be solved to undo elliptic curve multiplication in Elliptic Curve Cryptography (ECC)
- This gives a useful property: elliptic curve multiplication is easy, but the opposite is hard
- The search space of Ethereum private keys is 1 to 2<sup>256</sup> (256 bits or 64 hexadecimal digits)
- To create a `privKey` we feed a large random number (entropy) into the `keccak256` hashing function
- Aside: `keccak256` preceded `sha3`, but NIST did not finish standardizing `sha3` during Ethereum's development, which is why `keccak256` is used
- Creating a public key: `K (pubKey) = k (privKey) * G (generator point constant)`
- Note: `*` denotes elliptic curve multiplication, not arithmetic multiplication; it is one way only
- The reverse action is to find the discrete logarithm
- Ethereum uses the `secp256k1` curve (generator point `G`)
- Note that `G` is constant in Ethereum, therefore the same `privKey` always generates the same `pubKey`
- ECC Addition: adding two points on the curve gives a third point on the curve
- ECC Multiplication:
- Arithmetically: `d x P = P + P + ... + P` (i.e. P added to itself d times)
- The public key (`K`) is a point on the curve, e.g. K = (01234567, 09876542); the private key (`k`) is a number
- Ethereum only uses uncompressed keys (0x04 prefix)
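
The relation `K = k * G` can be sketched with plain integer arithmetic. This is an illustrative, non-constant-time implementation for study only (never for real keys); the function names are hypothetical, and it assumes Python 3.8+ for modular inverses via `pow(x, -1, p)`. Instead of adding the point d times, it uses double-and-add, which gives the same result in far fewer steps.

```python
# secp256k1 domain parameters (curve: y^2 = x^3 + 7 over integers mod P)
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141  # order of G
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def on_curve(point):
    """Check the curve equation; None represents the point at infinity."""
    if point is None:
        return True
    x, y = point
    return (y * y - x * x * x - 7) % P == 0

def add(a, b):
    """ECC addition of two points."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None                                    # a + (-a) = infinity
    if a == b:
        slope = 3 * x1 * x1 * pow(2 * y1 % P, -1, P)   # tangent (doubling)
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P, -1, P)  # chord
    x3 = (slope * slope - x1 - x2) % P
    y3 = (slope * (x1 - x3) - y1) % P
    return (x3, y3)

def multiply(k, point):
    """ECC multiplication k * point using double-and-add."""
    result, addend = None, point
    while k:
        if k & 1:
            result = add(result, addend)
        addend = add(addend, addend)
        k >>= 1
    return result

K = multiply(12345, G)   # hypothetical toy private key k = 12345
print(on_curve(K))       # True
```

Going from `k` to `K` takes a few hundred point operations; recovering `k` from `K` is the discrete logarithm problem.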
- Addresses are unique identifiers derived from public keys by applying the `keccak256` hashing function and taking the last 20 bytes
- `pubKey: (x,y) -> concatenated xy -> keccak256 -> last 20 bytes`
- By default addresses are not checksummed
- EIP55 introduces checksums
- Addresses are compatible with the ICAP protocol (which is compatible with IBAN, the International Bank Account Number format)
- Wallets are a basic data structure to manage a user's keys
- ! Wallets only hold keys, not tokens
- Just a bunch of keys (JBOK) in a store
- geth generates a `keystore.json` file: a JSON payload of the encrypted private key
- uses a password stretching function ("KDF", key derivation function)
- repeatedly hashes to stretch, thereby preventing brute force / rainbow table attacks
- (e.g. need to hash `n` times for each password attempt)
- Keys are related from a common seed.
- Use a key derivation algorithm to derive, the most common being BIP32/44
- HD Wallet (BIP32)
- A tree data structure where parents derive child keys using a key derivation algorithm
- Allows an organized structure for keys
- Allows child public keys to be generated from parent public keys (creating watch-only addresses from xpub keys)
- BIP39: Mnemonics
- Mnemonic words are used to generate a seed
- A word represents a random number used to construct a seed (entropy is split into 11-bit groups that map to words)
Ex. (not accurate)
00000000000 army
00000000001 abandon
- Entropy of mnemonic (128-256 bits) is stretched to 512 bits via PBKDF2 using arguments `mnemonic` and `salt`
- Seed is completed after 2048 rounds of hashing with HMAC-SHA512 to prevent brute forcing
- the `salt` is an optional passphrase which, if incorrect (different from the original), leads to a different set of keys
- Good for second-factor security
- An "under duress" wallet
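
The two BIP39 steps described above can be sketched with the standard library. Assumptions: the function names are hypothetical, the word list itself is omitted (only the 11-bit indices are computed), and the Unicode NFKD normalization that BIP39 requires is skipped, so this is only appropriate for plain ASCII input.

```python
import hashlib

def entropy_to_word_indices(entropy: bytes) -> list:
    """Append a checksum to the entropy and split into 11-bit word indices."""
    ent_bits = len(entropy) * 8                  # 128 to 256 bits
    cs_bits = ent_bits // 32                     # 1 checksum bit per 32 entropy bits
    checksum = hashlib.sha256(entropy).digest()[0] >> (8 - cs_bits)
    bits = (int.from_bytes(entropy, "big") << cs_bits) | checksum
    n_words = (ent_bits + cs_bits) // 11         # 12 words for 128 bits of entropy
    return [(bits >> (11 * (n_words - 1 - i))) & 0x7FF for i in range(n_words)]

def mnemonic_to_seed(mnemonic: str, passphrase: str = "") -> bytes:
    """Stretch the mnemonic to a 512-bit seed: PBKDF2-HMAC-SHA512, 2048 rounds."""
    salt = ("mnemonic" + passphrase).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha512", mnemonic.encode("utf-8"), salt, 2048)

print(entropy_to_word_indices(bytes(16)))        # indices into a 2048-word list
seed = mnemonic_to_seed("hypothetical example words", passphrase="second factor")
print(len(seed) * 8)                             # 512
```

Because the passphrase is part of the salt, a different passphrase yields a different, equally valid seed, which is what makes the "under duress" wallet possible.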
- BIP43
- A standard to include a "purpose" as the first hardened child key in derivation paths
- Allows organization and specific types of HD wallet trees
- e.g. `m/43'` is a different tree structure from `m/44'`
- BIP44
- Extension of BIP43: `m / purpose' / coin' / account' / is_change / address_index`
- `m` = master private key, `M` = master public key
- Ethereum is `m/44'/60'/0'/0/index` (the change level is always 0: Ethereum is always receiving)
- The structure lets us create extended public keys too, allowing downstream derivation of child keys
Parent Pub -
\
Child Priv --> Child public
- Great for Cold Storage
- Transactions are signed messages originating from an Externally Owned Account (EoA)
- They are the only way to mutate the state of Ethereum
- Structure
- Nonce: A sequence number issued by the originating EoA (for replay protection)
- Gas Price: The price (in wei) the sender is willing to pay per unit of gas
- Gas Limit: The maximum gas to buy for this transaction
- Recipient: The destination of the transaction
- Value: The amount of ether to send
- Data: A variable-length binary payload
- v, r, s: ECDSA signature components, from which the public key can be derived
- Transactions are serialized with the Recursive Length Prefix (RLP) format, which has no delimiters or labels.
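
To show what "no delimiters or labels" means, here is a minimal RLP encoder sketch. It follows the published RLP rules (single bytes below 0x80 encode as themselves, short and long string prefixes start at 0x80, list prefixes at 0xc0); the helper names are hypothetical, and accepting Python `int` and `str` as inputs is a convenience of this sketch.

```python
def to_bytes(n: int) -> bytes:
    """Big-endian bytes with no leading zeros (0 becomes the empty string)."""
    return n.to_bytes((n.bit_length() + 7) // 8, "big")

def _length_prefix(length: int, offset: int) -> bytes:
    if length < 56:
        return bytes([offset + length])            # short form: one prefix byte
    length_bytes = to_bytes(length)
    return bytes([offset + 55 + len(length_bytes)]) + length_bytes

def rlp_encode(item) -> bytes:
    """Encode bytes, str, non-negative int, or nested lists of those."""
    if isinstance(item, int):
        item = to_bytes(item)
    if isinstance(item, str):
        item = item.encode("utf-8")
    if isinstance(item, bytes):
        if len(item) == 1 and item[0] < 0x80:
            return item                            # a single low byte is its own encoding
        return _length_prefix(len(item), 0x80) + item
    if isinstance(item, (list, tuple)):
        payload = b"".join(rlp_encode(x) for x in item)
        return _length_prefix(len(payload), 0xC0) + payload
    raise TypeError(f"cannot RLP-encode {type(item).__name__}")

print(rlp_encode("dog").hex())            # 83646f67
print(rlp_encode(["cat", "dog"]).hex())   # c88363617483646f67
```

A transaction is encoded as a list of its fields in a fixed order, so the position of each field, not a label, tells the decoder what it is.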
The nonce is a dynamically calculated value: the number of confirmed transactions sent from the originating EoA
- Allows proper transaction ordering → a transaction whose nonce is too high is stuck in the mempool and ignored
- Prevents replay attacks on transactions
- You can fetch the transaction count via `getTransactionCount`
- Beware of nonce gaps:
- e.g. two transactions, Tx (nonce 0) & Tx (nonce 2).
- Tx (nonce 2) is stuck in the mempool until the node receives the preceding Tx (nonce 1).
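
The nonce gap behaviour can be sketched as a small function over the set of pending nonces. This is a simplified model with a hypothetical function name; real clients also consider gas price and pool limits.

```python
def executable(pending_nonces, confirmed_count):
    """Return the pending nonces that can be mined now.

    Only a contiguous run starting at the account's confirmed transaction
    count is executable; anything after a gap has to wait.
    """
    ready, expected = [], confirmed_count
    while expected in pending_nonces:
        ready.append(expected)
        expected += 1
    return ready

print(executable({0, 2}, confirmed_count=0))      # [0]  (nonce 2 waits for nonce 1)
print(executable({0, 1, 2}, confirmed_count=0))   # [0, 1, 2]
```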
Gas is a separate virtual currency, NOT ether, that is paid to miners as fees for computation
- Gas has its own exchange rate per ether
- It allows separation from the volatility of ether value
- Gas Limit: the maximum number of gas units the sender will buy (the order size)
- Gas Price: wei per gas unit
- Simple payment transactions require 21,000 gas
- Contract executions are variable and not preset
- A good analogy:
- A credit account for the gas station with payment only sent on completion
There are three basic types of transactions:
- Value only (Payment)
- Data only (Invocation)
- Value & Data (Payment and Invocation)
- `value` to EoA: Updates the state of the destination address with the value sent
- `value` to Contract:
- Calls function in `data` payload
- Fallback function if no function specified in `data` payload
- Increase the contract balance if no fallback function is defined
- `data` to EoA: Usually not interpreted
- `data` to Contract: calls function specified in `data` payload.
- `data` transactions are subject to consensus rules
- The `data` payload of a transaction is a hex serialization of:
- Function selector → first 4 bytes of the `keccak256` hash of the function prototype [e.g. `withdraw(uint amount)`, hashed in its canonical form `withdraw(uint256)`]
- Arguments → hex encoded per the Application Binary Interface (ABI) spec, e.g.
|_ _ _ _ _ _ _ _|_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _|_ _ ... _ _|
4 bytes (8 hex chars). 32 bytes variable
1. Padding 2.
- Contract creation is a special kind of transaction sent to destination address `0x0` (zero address)
- The transaction contains the compiled bytecode as `data`
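
The layout of the `data` payload for a function call (4-byte selector followed by 32-byte padded arguments) can be sketched as follows. Because the standard library has no Keccak-256, the selector for `withdraw(uint256)` is hard-coded as a known constant rather than computed; the helper names are hypothetical, and only `uint256` arguments are handled.

```python
# First 4 bytes of keccak256("withdraw(uint256)"), supplied as a constant.
WITHDRAW_SELECTOR = bytes.fromhex("2e1a7d4d")

def encode_uint256(value: int) -> bytes:
    """ABI-encode an unsigned integer: big-endian, left-padded to 32 bytes."""
    if not 0 <= value < 2**256:
        raise ValueError("value out of uint256 range")
    return value.to_bytes(32, "big")

def encode_call(selector: bytes, *uint_args: int) -> bytes:
    """Build a data payload: 4-byte selector followed by the padded arguments."""
    if len(selector) != 4:
        raise ValueError("selector must be exactly 4 bytes")
    return selector + b"".join(encode_uint256(a) for a in uint_args)

payload = encode_call(WITHDRAW_SELECTOR, 10**16)   # withdraw 0.01 ether, in wei
print("0x" + payload.hex())
```

The dispatcher in the contract reads the first 4 bytes of this payload to decide which function to run, then decodes the remaining 32-byte words as arguments.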
Digital signatures perform three functions:
- Prove ownership of the private key (and, as a result, the authorization to spend or execute functions)
- Proof of authorization is undeniable
- Proof that data has not been / cannot be modified
- The signature in Ethereum is:
- computed over the `keccak256` hash of the RLP encoded `data` for the transaction
- `Sig = Fsig(Fkeccak256(m), k) = (r, s)`, where `Fsig` is the signature algorithm and `Fkeccak256` is the keccak256 hashing function
- `m` is the message to hash (transaction)
- `k` is the private key
- `r` is the resulting `x` coordinate of the ephemeral public key created during the signature process
- To verify signatures one needs:
- The public key that "signed" the message
- The serialized transaction to verify
- The signature parameters `(r,s)`
Algorithm::create_ethereum_signed_tx
1. Construct a transaction, tx, with:
data
gasLimit
gasPrice
nonce
to
value
chainID
0,0
2. Serialize the transaction in RLP format:
rlp_encoded_tx = rlp_encode(tx)
3. Take the keccak256 hash of the payload in (2):
hash = keccak256(rlp_encoded_tx)
4. Sign the hash with an EoA private key
sig = sign(hash, privKey)
5. Append the v, r, and s values to tx
(The tx is now signed)
- EIP155: Replay Protection
- Including the parameters `chainID`, `0`, `0` in the original unsigned `tx` prevents `chainID` tampering and attests to the network on which the transaction was intended to be broadcast
- Public Key Recovery (`v`) Parameter
- When we recover a signer's public key, the curve is symmetric, so two possible keys can be computed: `R` and `R'`
- To avoid this duplicate work, the signing algorithm returns `v` where:
- if `v` is odd, use `R'`
- else use `R`
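
Under EIP155 the `v` value carries both the chain identifier and a one-bit recovery identifier. The sketch below uses the formula from the EIP (`v = chain_id * 2 + 35 + recovery_id`) and the pre-EIP155 values 27 and 28; the function names are hypothetical.

```python
def eip155_v(chain_id: int, recovery_id: int) -> int:
    """Combine the chain id and the recovery bit (0 or 1) into v."""
    if recovery_id not in (0, 1):
        raise ValueError("recovery_id must be 0 or 1")
    return chain_id * 2 + 35 + recovery_id

def split_v(v: int):
    """Return (chain_id, recovery_id); chain_id is None for legacy v of 27/28."""
    if v in (27, 28):
        return None, v - 27
    return (v - 35) // 2, (v - 35) % 2

print(eip155_v(1, 0), eip155_v(1, 1))   # 37 38  (chain id 1)
print(split_v(38))                      # (1, 1)
```

A signature produced for one chain id yields a `v` that does not validate on a chain with a different id, which is the replay protection.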
- A smart contract was originally defined by Nick Szabo to be:
A set of promises specified in digital form in which the parties perform on each other's promises
- In Ethereum:
Immutable deterministic computer programs that run in the limited context of the Ethereum Virtual Machine (EVM) on a world computer (e.g. propagated to a global state)
- Unlike EoAs, contracts do not have private keys and can only run when invoked by a transaction.
- Contracts cannot run in parallel since Ethereum is a single threaded state machine
- Transactions are atomic with state changes recorded only if execution is successful (e.g. there is no error)
- Errors trigger state rollbacks, reverting all state changes except for gas fees
- Contracts cannot be modified but can be deleted via the opcode `SELFDESTRUCT`
- The original program must have this functionality programmed in, however.
- The EVM runs EVM bytecode
- Coding in Ethereum should prefer functional (declarative) programming over imperative (procedural)
- Solidity is procedural and the de facto language today
- Application Binary Interface: defines the contract for how data structures and functions are accessed by machine code
- Defines the functions in a contract that can be invoked
- Compiler directive: `pragma solidity ^0.4.24` tells what the acceptable compiler is
- `^0.4.24` indicates that any minor version at or above `0.4.24` is acceptable but no major versions (e.g. `0.5` is disallowed)
- `msg.sender`: Originating caller
- `msg.data`: data payload
- `msg.value`: Ether sent in wei
- `msg.sig`: first 4 bytes of data
- `gasleft()` (formerly `msg.gas`): gas remaining
- `tx.gasprice`: Gas price for the transaction
- `tx.origin`: Originating EoA
- `block.blockhash(blockNumber)`: Hash of the given block (only for recent blocks)
- `block.coinbase`: Coinbase address for fees and rewards
- `block.difficulty`: Current PoW difficulty of block
- `block.gaslimit`: Max gas for all transactions to fit in block
- `block.number`: Current block height
- `block.timestamp`: Unix epoch timestamp in seconds
- `address.balance`: Gets balance of address in wei
- `address.transfer`: Attempts to transfer the value passed, throws an error if it fails
- `address.call`: Low level CALL, arbitrary msg with `data` payload
- `address.delegatecall`: Like `callcode` but preserving the msg context (libraries)
- `address.send`: Attempts to send the value passed in, returns false if it fails
- `address.callcode`: Low level CALLCODE, runs the code at this address in the calling contract's context
- `public`: Callable from any EoA or contract
- `private`: Not callable from derived contracts
- `external`: Callable from outside the contract; internally only with explicit `this`
- `internal`: Only callable from within the contract and derived contracts (`protected`)
- `constant`/`view`: Promises not to write (modify) state but can read
- `payable`: Function can accept incoming payments
- `pure`: Function cannot read or write to state (declarative)
- Created via the `constructor` keyword
- Destruction with `SELFDESTRUCT`
- Modifiers place constraints on the function execution
- They substitute code before execution of a function body:
modifier onlyOwner {
require(msg.sender == owner);
_; // This placeholder is replaced by the body of the modified function
}
- Errors revert all state changes
- `require` is used as a gateway condition
- `assert`/`throw` is used to halt execution and revert state changes
- `transfer` will automatically throw if not enough ETH is present
- A transaction receipt is given when a tx completes
- these contain `logs` which can be watched
- Can create contracts via the `new` operator, which returns an object
- Specify value of the contract on creation: `faucet = (new Faucet).value(0.5 ether)()`
- You can cast an address as a contract
- `faucet = Faucet(_f)`
- Can directly call methods via `call`
- Uses direct opcodes of EVM
- `_faucet.call("withdraw", 0.1 ether)`
- Risk of re-entrancy attack
- `delegatecall` keeps the `msg` context constant so that `msg.sender` is the same (for libraries)
- `call` in comparison modifies the `msg.sender` (e.g. different execution context)
- Vyper takes one step closer to declarative programming than Solidity
- Types of Problematic Contracts Vyper tries to address
- Suicidal Contracts: Contracts that can at times be arbitrarily destroyed
- Greedy Contracts: Contracts that can get into "unreleasable" or "unusable" states
- Prodigal Contracts: Contracts that can release funds to arbitrary addresses
- Vyper removes modifiers, as they change the context of code execution
- Uses inline confirmation and assertion checks instead
- Also forces state modifications to be explicit in the contracts
- Vyper has no inheritance
- Vyper has no inline assembly
- Vyper allows explicit type casting via `convert()`, which calls through a conversion table, but no implicit casting
- Vyper outputs LLL (Low-level Lisp-like Language) code to be compiled into bytecode for the EVM
- Defensive programming
- Use well tested code libraries
- Prefer minimalistic and simple implementations
- There is a high quality bar due to immutability
- Write in a readable and auditable way
- Focus on extensive test coverage
- Sending Ether to an unknown address.
- a `payable` fallback function from a malicious contract address can execute a vulnerable execution path causing it to "re-enter" the contract
- Use the Checks-Effects-Interactions pattern
- Use mutexes to lock state
- Prefer the `transfer` function over `send`
- a fixed size variable is used to store data but the value is outside the range, causing wraparound for numbers and unexpected behaviour
- Underflow: Value is under the storage range
uint8 my_var = 0
my_var - 1 // = 255
- Overflow: Value is over the storage range
- Using `SafeMath`
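
Python integers do not overflow, so the wraparound has to be simulated to see it. The sketch below models an 8-bit unsigned integer and contrasts wrapping arithmetic with SafeMath-style checked arithmetic; the function names are hypothetical and the 8-bit width is chosen to match the example above.

```python
def wrapping_sub(a: int, b: int, bits: int = 8) -> int:
    """Subtract the way a fixed-width unsigned integer would (mod 2^bits)."""
    return (a - b) % 2**bits

def wrapping_add(a: int, b: int, bits: int = 8) -> int:
    return (a + b) % 2**bits

def safe_sub(a: int, b: int) -> int:
    """SafeMath-style subtraction: fail instead of wrapping."""
    if b > a:
        raise ArithmeticError("underflow")
    return a - b

def safe_add(a: int, b: int, bits: int = 8) -> int:
    """SafeMath-style addition: fail instead of wrapping."""
    result = a + b
    if result >= 2**bits:
        raise ArithmeticError("overflow")
    return result

print(wrapping_sub(0, 1))     # 255 (underflow wraps to the maximum)
print(wrapping_add(255, 1))   # 0   (overflow wraps to zero)
```

In a contract, the checked versions cause the transaction to revert rather than silently storing a wrapped balance.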
- `SELFDESTRUCT` opcode forcibly sends ether to a contract (e.g. malicious contract destruction sends ether to target)
- Contract relies on "invariant" values like `this.balance`
- Use a self-defined variable to track balance rather than relying on `this.balance`
- Context preserving `delegatecall` is misused, particularly to exploit how Ethereum manages storage with slots.
- Contracts store data in `slot`s, e.g. a library's first variable is `slot[0]`
- Malicious contract sets its address as a receiving address or library address for a critical point of business logic.
- Because `delegatecall` preserves context, slot values can be overwritten
- Using the `library` keyword explicitly for contracts
- Avoiding state-bearing, self-destructible library contracts
- Performing floating point arithmetic with integers, causing issues like underflows, overflows, or lack of precision
- Allow large numerators in fractions
- Be mindful of order of operations
- Convert to higher precision, do the math, then convert down
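
The effect of operation order can be sketched with integer division, which truncates just as Solidity's does. The function names and the numbers are hypothetical.

```python
def share_divide_first(amount: int, numerator: int, denominator: int) -> int:
    """Divide before multiplying: the truncation error gets multiplied."""
    return (amount // denominator) * numerator

def share_multiply_first(amount: int, numerator: int, denominator: int) -> int:
    """Multiply before dividing: only one truncation, at the very end."""
    return (amount * numerator) // denominator

# Take 5/10 of 99 units (the exact answer would be 49.5).
print(share_divide_first(99, 5, 10))     # 45
print(share_multiply_first(99, 5, 10))   # 49
```

Multiplying first keeps the numerator large, which is the same idea as converting to a higher precision before doing the math.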
- Visibility is not specified, resulting in default `public` and accessible functions
- Explicitly setting visibility modifiers in code
- Contracts require some sort of randomness to function and mistakenly use pseudo-random values like block hashes or numbers
- There is no uncertainty in Ethereum since execution is deterministic
- External sources of randomness (like oracles)
- An address is cast to a Solidity contract type, but the address might not contain the intended functions implemented correctly
- Malicious contracts could implement the same interface and inject themselves through the `constructor`
- Initializing contracts with `new` at deployment time
- Hardcoding contract addresses
- Third party apps do not validate input to the contracts
- EVM fills short addresses with zeros which can multiply the trailing value, e.g.:
|<-- address -->|<-- value -->|
|<-- addr -->|<-- value -->000| // zeros filled in, resulting in a much larger value
- Validating input parameters
- Developer does not revert execution based on a failed `send` call
- `send` returns `false` if it fails and does not revert, unlike `transfer`
- Prefer the `transfer` function where possible
- Use the `withdraw` pattern
- Users must call the `withdraw` function, which handles sending and reverts on failure
- A malicious user duplicates a transaction payload with a higher `gasPrice` to "front-run" existing transactions
- Think exchanges, lottery guessing games, etc
- Be wary of both malicious users and miners
- Add explicit upper bounds to `gasPrice`
- Implement "commit-reveal schemes"
- Send a hidden data value and only reveal once included in a block
- Use "submarine sends" via `CREATE2` opcode
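
A commit-reveal scheme can be sketched as two phases: first publish a hash of the value plus a secret salt, then later reveal both so anyone can check them against the commitment. Assumptions: `hashlib.sha256` stands in for the `keccak256` a contract would use, the class and function names are hypothetical, and block timing (when reveals are allowed) is left out.

```python
import hashlib
import secrets

def commitment(value: bytes, salt: bytes) -> bytes:
    # ASSUMPTION: sha256 stands in for keccak256; the salt has a fixed
    # length of 32 bytes so that salt + value is unambiguous.
    return hashlib.sha256(salt + value).digest()

class CommitReveal:
    """Toy model of a contract that accepts commitments, then reveals."""

    def __init__(self):
        self.commits = {}    # sender => commitment hash
        self.revealed = {}   # sender => revealed value

    def commit(self, sender: str, digest: bytes) -> None:
        if sender in self.commits:
            raise ValueError("already committed")
        self.commits[sender] = digest

    def reveal(self, sender: str, value: bytes, salt: bytes) -> None:
        if self.commits.get(sender) != commitment(value, salt):
            raise ValueError("reveal does not match commitment")
        self.revealed[sender] = value

game = CommitReveal()
salt = secrets.token_bytes(32)                        # kept secret until reveal
game.commit("alice", commitment(b"guess=42", salt))   # observers see only a hash
game.reveal("alice", b"guess=42", salt)
print(game.revealed["alice"])
```

A front-runner watching the mempool during the commit phase sees only the hash, so there is nothing useful to copy.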
- A contract loops through a large mapping or array, or a malicious user intentionally creates a huge array (can sometimes exceed the block's `gasLimit`)
- Keys to a privileged account are lost (e.g. the EoA of the `owner` account is lost)
- Provide "pull" rather than "push" operations to withdraw funds (e.g. `withdraw` vs. `distribute`)
- Add lock times or multisig so funds are not lost forever
- Malicious miners change the timestamp to exploit a contract depending on it
- Do not use timestamps for entropy; use block numbers for time-dependent logic
- Old-style constructors from before Solidity `0.4.22`, which use the contract name instead of a keyword, can end up as publicly accessible and invokable functions
- Using the Solidity `0.4.22` (or later) compiler, which provides the `constructor` keyword
- Solidity defaults local variables of complex types to `storage` (as opposed to `memory`), so an uninitialized object references `slot[0]` and can overwrite values
- Use a Solidity linter to flag storage pointers
- Explicit `memory` use