Notes for db impl
Feb 19, Week 7 | |
Review: extendible hashing
global depth (gd) = 2
directory slot = h(k) % 2^gd

directory                   pages
h(k)  ptr (page address)    page  ld  contents
0     0                     0     2   4 12
1     1                     1     2   17
2     3                     2     2   11 15 19 3
3     2                     3     2   2 10 14

=> When do we split? A page splits when it overflows; the directory doubles when the overflowing page's local depth (ld) equals the global depth (gd).
How many directory entries can point to one page in the hash file? 2^(gd - ld)
Note: still need to do something like chaining / linear probing
(a page can become a linked chain of overflow pages)
Why?
1. Repeated keys are an issue; in general, hashing does not like skew
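A minimal sketch (in Python) of the split/doubling logic reviewed above; the bucket capacity, class name, and use of Python's hash() are illustrative assumptions, not from the notes:

class ExtendibleHash:
    """Toy extendible hash: a directory of 2^gd slots pointing at buckets (pages)."""

    def __init__(self, bucket_capacity=4):
        self.capacity = bucket_capacity
        self.gd = 1                                   # global depth
        self.directory = [{"ld": 1, "keys": []},      # slot 0
                          {"ld": 1, "keys": []}]      # slot 1

    def _slot(self, key):
        return hash(key) % (2 ** self.gd)             # h(k) % 2^gd, as in the review

    def insert(self, key):
        bucket = self.directory[self._slot(key)]
        bucket["keys"].append(key)
        if len(bucket["keys"]) > self.capacity:       # overflow -> split the page
            if bucket["ld"] == self.gd:               # ld == gd: double the directory
                self.directory = self.directory + self.directory
                self.gd += 1
            ld = bucket["ld"] + 1
            img0 = {"ld": ld, "keys": []}
            img1 = {"ld": ld, "keys": []}
            for k in bucket["keys"]:                  # redistribute on one more hash bit
                (img1 if (hash(k) >> (ld - 1)) & 1 else img0)["keys"].append(k)
            for i, b in enumerate(self.directory):    # repoint the 2^(gd-ld) sharing slots
                if b is bucket:
                    self.directory[i] = img1 if (i >> (ld - 1)) & 1 else img0
            # repeated keys can still overflow one page (the skew issue below);
            # a real implementation would chain overflow pages in that case

One split per overflowing insert here; a production version would loop or chain overflow pages.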
Pros:
1. No downtime to increase the table size
2. Just 1 I/O for point lookups (assuming the directory is pinned in memory and there is no overflow chain)
Cons:
1. Range queries are terrible
2. Need to pin the directory in memory
   e.g. 1M pages, best case: ~20 B per directory entry => ~20 MB,
   but the directory (and so its memory footprint) can grow well beyond that
3. Hashing is vulnerable to skew; in nasty cases (Zipf's law: ~80% of the data landing in ~20% of the buckets) one page's overflow chain can fill the disk
4. (Compare: linear hashing, covered later)
B-tree + space-filling curve: using a B-tree for multidimensional (MD) data
# How to handle keys composed of 2+ attributes?
A: 1. "Arbitrarily" say one attribute goes "first"... use the second to break ties... (i.e. concatenate them)
      Problem: can only search on the first attribute; scanning on the second is not efficient
   2. Use a so-called "spatial index" to organize the data
      Designed to handle point (and range) queries on any subset of the key attributes
# What is a space-filling curve?
A: A fractal curve that hits every point in the plane (or in a higher-dimensional space)
# What curves are commonly used in DB systems?
A: The Z-order curve
[sketch: Z-order curve - trace small Z shapes, then connect the small Zs into one large Z, recursively]
Hilbert curve
[sketch: recursive U shapes, rotated and connected end to end]
Note: Z-order and Hilbert curves are also defined for > 2 dimensions
# So, how do we use this to store multidimensional data?
A: 1. k = sfc(key)
   2. store the (k, record) pair in a B-tree
[sketch: 4x4 grid, coordinates 0-3 on each axis, with a rectangular query range covering a block of cells]
To perform a range find:
1) Break the query into a set of 1-D ranges along the curve
2) Query each 1-D range using the B-tree
# When does this work well?
A: When the query results in few lookups on the underlying B-tree
   In other words, the curve should preserve "spatial locality" (points close in the high-dimensional space stay close in the 1-D ordering)
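A minimal sketch of the k = sfc(key) step for the Z-order curve, assuming 2-D non-negative integer coordinates (the function name and the 16-bit width are illustrative assumptions):

def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Z-order (Morton) code: interleave the bits of x and y into one 1-D key."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)        # even bit positions come from x
        z |= ((y >> i) & 1) << (2 * i + 1)    # odd bit positions come from y
    return z

# store (interleave_bits(x, y), record) pairs in an ordinary B-tree;
# nearby (x, y) points tend to get nearby Z-order keys (spatial locality)
print(interleave_bits(2, 3))   # x=0b10, y=0b11 -> 0b1110 = 14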
# PRO:
1. Easy - it is just a B-tree; nice for point data
CON:
1. Does not handle volume data well
   volume data: e.g. adding a house - an object with extent (and its own attributes) that covers a region rather than a point => a "volume"; space-filling curves are not good for that kind of spatial data
   (volume data and range queries are quite opposite situations)
Q3 Monday: B-tree, extendible hashing
Linear Hashing

top
------
1   hashed using gd+1 bits   (buckets before the cursor: already split)
2   <- cursor
3
------   hashed using gd bits   (buckets from the cursor on: not yet split)
4
5
6   hashed using gd+1 bits   (split images appended at the end of the file)
7
-------
# Do I need to handle over-full pages?
A: YES - need chaining or linear probing.
# How does this handle "hot spots"?
A: It doesn't. (Extendible hashing naturally handles hot spots by splitting them.)
* PRO
1. Hashing with dynamic growth and no directory
* CON
1. If the data does not hash well (hot spots), overflow chains / probing grow; bad for hot spots
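A minimal sketch of the bucket-addressing rule implied by the picture above (Python; the parameter names and the split policy in the comment are illustrative assumptions):

def bucket_for(key_hash: int, gd: int, cursor: int) -> int:
    """Linear hashing: which bucket does a hashed key belong to?

    Buckets [0, cursor) have already been split, so they (and their images at
    [2^gd, 2^gd + cursor)) are addressed with gd+1 bits; buckets
    [cursor, 2^gd) are not yet split and still use gd bits.
    """
    b = key_hash % (2 ** gd)              # first try gd bits
    if b < cursor:                        # this bucket has already been split...
        b = key_hash % (2 ** (gd + 1))    # ...so address it with gd+1 bits instead
    return b

# when a split is triggered (e.g. the overall load factor gets too high), bucket
# `cursor` is split into itself and bucket cursor + 2^gd, then cursor += 1;
# once cursor reaches 2^gd, reset cursor = 0 and gd += 1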
Q4 Friday | |
For volume data: the R-tree, a file organization for spatial data with volume (extent)
# Looks like a B-tree, but rather than (key, ptr) pairs it has (MBR, ptr) pairs   // MBR: minimum bounding rectangle
# R-tree range queries
  - recurse down the tree
  - whenever an MBR intersects Q (the query range), recurse into the child subtree via the corresponding ptr
# What makes a "good" R-tree?
A: 1) all nodes quite full
   2) we also want MBRs with small perimeter (we don't want overlap)
   * 1) and 2) are often in conflict
# Insert into an R-tree:
1) Start at the root
2) While not at a leaf:
   a) descend into the child whose MBR needs the smallest increase in perimeter to cover the object
   b) extend MBRs as needed
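A minimal sketch of step 2a - choosing the child by smallest perimeter increase - assuming 2-D rectangles stored as (xmin, ymin, xmax, ymax) tuples (the representation and function names are my assumptions):

def expand(mbr, obj):
    """Smallest rectangle covering both an MBR and a new object's rectangle."""
    return (min(mbr[0], obj[0]), min(mbr[1], obj[1]),
            max(mbr[2], obj[2]), max(mbr[3], obj[3]))

def perimeter(r):
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))

def choose_child(children, obj):
    """children: list of (MBR, ptr) pairs; pick the one whose MBR grows the least."""
    return min(children,
               key=lambda c: perimeter(expand(c[0], obj)) - perimeter(c[0]))

# example: two children, insert an object near the first one
kids = [((0, 0, 2, 2), "pageA"), ((5, 5, 8, 8), "pageB")]
print(choose_child(kids, (1, 1, 3, 3)))   # -> ((0, 0, 2, 2), 'pageA')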
# How to handle splits? | |
PRO: | |
1. R-tree is the only way to nicely index volume data | |
CON: | |
1. Unusable in >= 10 dimensions (not a big worry: truly high-dimensional data is not that frequent)
- Indexing vs. storing data in a primary file organization
  * Issue: a primary file organization can arrange the data in only one way
  * But we may query it in many ways
  * How to handle this? Indexing
- Indexing
  * Rather than storing the data itself, store (key, ptr) pairs
* concepts & defs | |
1) sparse index
   - only one (key, ptr) pair per data page in the index
     (key, ptr): (2,0)          (19,1)           (32,2)           (49,3)
     page:          0               1                2                3
              | 2 4 6 12 |  | 19 20 27 30 |  | 32 36 40 42 |  | 49 57 75 80 |
   Note: we have seen these before - they are directories
2) Dense index | |
- (key,ptr) pair for every record in the data file | |
3) Primary index
   - the index is used to organize the data in pages
     (the index is the same as the primary file organization)
4) Secondary index
   - the data file and the index are independent
     ex: data stored in a B-tree keyed on NetID; a secondary index on these data is keyed on GPA
Notes: | |
1) All secondary indices are dense | |
2) Primary indices may be dense or sparse | |
3) Usually a primary index is sparse, but a dense index saves going to a data page when the key value alone answers the query
4) Taken to the extreme, a "dense index on everything" gives us a "column-oriented" DB.
5) Limited to one primary index (without replication) | |
6) Can have unlimited secondary indices | |
pro: queries can be faster | |
con: update more expensive | |
7) Even if an index is available, question "should I use it?" is hard to answer | |
========================================================== | |
idea:
a bunch of data, stored sorted (primary organization):
| 2 4 6 12 | | 14 14 23 | | 48 78 98 |
      0            1           2
we want to find things quickly
extra map (index): (2,0) (14,1) (48,2)  => sparse, one entry per page
if instead the pages are not in key order, e.g. | 6 4 7 | | 2 3 8 | | 1 9 10 |,
we need one entry per record: (1,2) (2,1) ...  => dense (this is what a secondary index looks like)
queries get faster with a secondary index, but updates get slower
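A minimal sketch of a sparse-index lookup over sorted pages, matching the example above (Python; the page layout and function name are illustrative):

import bisect

# sorted data pages and a sparse index: one (first_key, page#) entry per page
pages = [[2, 4, 6, 12], [19, 20, 27, 30], [32, 36, 40, 42], [49, 57, 75, 80]]
sparse = [(2, 0), (19, 1), (32, 2), (49, 3)]

def lookup(key):
    """Binary-search the sparse index, then read only the one candidate page."""
    keys = [k for k, _ in sparse]
    pos = bisect.bisect_right(keys, key) - 1           # last index entry with k <= key
    if pos < 0:
        return None
    page_no = sparse[pos][1]
    return page_no if key in pages[page_no] else None  # one page read instead of a scan

print(lookup(27))   # -> 1   (found on page 1)
print(lookup(13))   # -> None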
Query compilation & Optimization | |
Overview: | |
* Given an SQL query | |
* Assume an abstract Relational Algebra (RA) "machine": join, projection, selection, ...
* Assume the machine has multiple implementations of each RA operation
Goal: * Compile the SQL into an RA expression annotated with an implementation for each RA op
# There are many ways to compile SQL into annotated RA
  * Which one to choose?
    - The one with the best "performance" (wall-clock time and throughput are the common measures; also memory usage, amount of network traffic)
# There are really 2 sub-problems in optimization:
  * logical opt'n:  choose the best RA expression
  * physical opt'n: choose the best annotation
1) Compilation - build the initial RA for the SQL
   Today: 1)
   a) scan / tokenize the SQL
   b) parse the SQL
   c) translate into RA
2) Logical opt - improve the RA
3) Physical opt - annotate the RA with the best physical ops
Compilation - usually two steps: scanning and parsing
  scan: break the program into tokens, i.e. the terminals of a formal grammar
  parse: reverse-engineer which production rules of the formal grammar were used to produce the sequence of tokens
Virtually all programs in modern PLs correspond to strings formed using a formal grammar - a CFG
* CFG (context-free grammar): basically a machine that can be used to produce all strings in a particular language
* A CFG is a set of rules of the form
    Left-Hand Side => Right-Hand Side
  where the LHS is a non-terminal and the RHS is a string of terminals & non-terminals
* terminal: a symbol that appears in the program
* non-terminal: an intermediate symbol, not expected in the program
EX: CFG for all binary strings with the same # of 0's & 1's
    start -> 0 start 1 |
             1 start 0 |
             start start |
             epsilon (empty)
EX: all palindromes
    start -> 0 start 0 |
             1 start 1 |
             0 |
             1 |
             epsilon (empty)
Issue: we typically don't write CFGs whose terminals are individual characters
  - such a CFG is complicated & slow to parse
Instead, we use regular expressions to specify the terminals
  - ex: floating-point numbers (as a regular expression):
    [+-]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?
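A quick check of that terminal definition - a minimal sketch using Python's re module (the test strings are just illustrative):

import re

# the FP-number terminal from the notes; fullmatch() requires the whole token to match
FP = re.compile(r"[+-]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?")

for tok in ["3.14", "-0.5e10", "42", "+.5", "abc", "1e"]:
    print(tok, bool(FP.fullmatch(tok)))
# 3.14 True, -0.5e10 True, 42 True, +.5 True, abc False, 1e False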
A CFG is strictly more powerful than a regular expression
Issue: during compilation we are not producing strings; we are reversing the production of a string - this is called "parsing"
  - How does parsing work? We don't care: in modern times no one writes a parser by hand; use a "compiler compiler", ex: YACC
How do we search this graph (of equivalent plans)?
1) Exhaustive - works up to 5 or 6 joins
2) Greedy - walk the graph, always traversing to the best adjacent node
What are the various transformations?
(1) On selections
    sigma_B(pi_A(R)) => pi_A(sigma_B(R))
      always applies if A is simply a list of attrs;
      if A computes non-identity functions over attrs of R whose results are referenced by B, it won't apply
    pi_A(sigma_B(R)) => sigma_B(pi_A(R))
      always applies when B only refers to attrs actually returned by A
    sigma_B(R xjoin_B' S) => sigma_B1(R) xjoin_(B2 and B') sigma_B3(S)
      assuming B is in CNF; this is valid when:
      B1: the CNF made of those clauses of B referring only to atts in R
      B3: the CNF made of those clauses of B referring only to atts in S
      B2: the CNF made of those clauses of B referring to atts from both R and S
      (see the sketch below)
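A minimal sketch of splitting a CNF predicate into B1/B2/B3 by which relation's attributes each clause mentions (Python; representing a clause as the set of attribute names it references is my assumption):

def split_cnf(clauses, attrs_R, attrs_S):
    """clauses: list of sets of attribute names mentioned by each CNF clause.
    Returns (B1, B2, B3): clauses over R only, over both R and S, and over S only."""
    B1, B2, B3 = [], [], []
    for c in clauses:
        if c <= attrs_R:
            B1.append(c)          # push down onto R
        elif c <= attrs_S:
            B3.append(c)          # push down onto S
        else:
            B2.append(c)          # mentions both relations: stays with the join condition
    return B1, B2, B3

# ex: sigma_{R.a=1 AND S.b=2 AND R.c=S.d}(R join S)
print(split_cnf([{"R.a"}, {"S.b"}, {"R.c", "S.d"}], {"R.a", "R.c"}, {"S.b", "S.d"}))
# -> ([{'R.a'}], [{'R.c', 'S.d'}], [{'S.b'}])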
(2) On joins
    R xjoin_B (S xjoin_B' T) => (R xjoin_B1 S) xjoin_B T
      applies assuming B1 is the CNF made of the clauses referring only to attrs in R and S
Parsing: task of analyzing a string to determine CFG rules used to produce that string | |
No one writes parsers by hand | |
* instead use compiler compiler | |
* classic tools lex/flex yacc/bison | |
compiler compiler: | |
* Write down a CFG | |
* Annotate CFG with code to exec when rule recognized | |
* then comp-comp generates a parser | |
In the case of SQL, the comp-comp is used to obtain:
1) the expressions in the SELECT clause
2) the list of tables + aliases in the FROM clause
3) the boolean expression from the WHERE clause
Most modern compilers just use the parser to extract and store data in C/Java/C++ data structs
What do I need to do with those data structs?
1) analysis & verification of the program (ex. semantic checking)
2) emit target code (RA, in a DB compiler)
In the case of SQL, how do I build my output RA expression?

SELECT E1, E2             pi_(E1, E2)
FROM   T1, T2, ...   =>       |
WHERE  B                  sigma_B
                              |
                          T1 x T2 x ...   (cross product of the FROM tables)
Parsing: how does it work? We don't care.
But we need to know enough to tell when a comp-comp will fail.
- When will it fail?
1) the grammar is ambiguous (ex: a string can be derived in multiple ways)
2) the grammar requires too much lookahead
Logical Optimization
- takes an RA expression as input and converts it to a "better" RA expression
  "better" = lower cost, e.g.:
    add up the total # of bytes produced while evaluating the RA expression,
    or simply add up the # of tuples produced
One view of logical opt:
  * We have a large graph
  * Each node in the graph is a different RA expression ("plan") that is equivalent to the original SQL
  * There is an edge from plan A to plan B if there is an algebraic transform from A to B guaranteed not to alter the semantics
xjoin_B(R, xjoin_B'(S, T)) => xjoin_B(xjoin_B'(S, T), R)   (commutativity: swap the inputs)
- we have covered pi, sigma, join - the entire core of RA
- we didn't cover antijoin / semi-join (which appear when you have correlated subqueries), or outer join, intersection, union
Costing RA expressions
- always based on the ability to estimate the # of tuples returned from an operation
  ex:   sigma_B2      <- guess the # of tuples coming out here
           |
       xjoin_B1
        /     \
       R       S
2 methods:
1) sampling: run the query over a sample of the input relations => not used; reasons:
   1) hard to maintain the samples  2) slow  3) poor accuracy with small samples
2) statistical synopses (counts, histograms, etc.), e.g. the number of tuples in each relation
What we want: for each RA op, to take statistics such as T(R) (# of tuples in R) and V(R, A1), V(R, A2), ... (# of distinct values of each attribute) as input and estimate the same statistics for its output
1) selection
   R' <- sigma_(A=v)(R)
     T(R') = T(R) / V(R, A)
     V(R', A') = 1                       if A' == A
               = min( T(R'), V(R, A') )  if A' != A
   R' <- sigma_(A!=v)(R)
     T(R') = T(R) - T(R) / V(R, A)
     V(R', A') = V(R, A) - 1             if A' == A
               = min( T(R'), V(R, A') )  if A' != A
   R' <- sigma_(A>v)(R)
     T(R') = T(R) / 3
     V(R', A') = V(R, A) / 3             if A' == A
               = min( T(R'), V(R, A') )  if A' != A
R' <- sigma_(B1 or B2)(R)
  T(R') = T(sigma_B1(R)) + T(sigma_B2(R))   , problem: double-counts the intersection
    => instead use  T(R') = min( T(sigma_B1(R)) + T(sigma_B2(R)), T(R) )
  V(R', A) = min( V(sigma_B1(R), A) + V(sigma_B2(R), A), V(R, A) )
R' <- sigma_(B1 and B2)(R)
  T(R') = T(sigma_B1(sigma_B2(R)))          , assuming independence
  V(R', A) = V(sigma_B1(sigma_B2(R)), A)
  under correlation this may over- OR under-estimate
R' <- sigma_(!B)(R)
  T(R') = T(R) - T(sigma_B(R))
  V(R', A) = V(R, A)
2) Joins
R' <- R xjoin_(R.A=S.A) S     (R join S on R.A = S.A)
  intuition:
  1. sort both R and S on A
  2. every "bucket" (group of tuples sharing a join-key value) on the left (R) has a match on the right (S)
  3. more precisely, assume every bucket in the relation with the smaller # of distinct join keys matches a bucket in the other relation
     => this will hold in the case of a foreign-key join
  given this, the output size is
     SUM_i  n_i * f(i)
  where n_i is the size of bucket i in the relation with fewer distinct keys,
  and f(i) returns the size of the matching bucket in the other relation
  to approximate this with T() and V(), use
     T(R') = T(R)/V(R,A) * T(S)/V(S,A) * min( V(R,A), V(S,A) )
     V(R', A') = min( T(R'), V(R,A') )   if A' is from R
               = min( T(R'), V(S,A') )   if A' is from S and A' != A
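A minimal sketch of these estimators as code (Python; the function names are mine, the formulas are the ones above):

def est_select_eq(T_R, V_RA):
    """T and V estimates for sigma_(A=v)(R)."""
    T_out = T_R / V_RA
    return T_out, 1                      # V(R', A) = 1 for the selected attribute

def est_join_eq(T_R, V_RA, T_S, V_SA):
    """Size estimate for R join S on R.A = S.A."""
    return T_R / V_RA * T_S / V_SA * min(V_RA, V_SA)

# ex: T(R)=10000, V(R,A)=100, T(S)=2000, V(S,A)=50   (a plausible foreign-key join)
print(est_join_eq(10000, 100, 2000, 50))   # -> 10000/100 * 2000/50 * 50 = 200000.0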
How to handle more complex (multi-attribute) join predicates?
=> Two reasonable ways:
1. Treat it like:  R' <- sigma_(R.B==S.B)( R xjoin_(R.A==S.A) S )
   benefit: easy, if we already have costing for single-attribute joins and for selections
2. Treat it like:  R' <- R xjoin_((R.A concat R.B) == (S.A concat S.B)) S
   if we choose 2, we need to estimate V(R, R.A concat R.B); a reasonable estimate is min( T(R), V(R,R.A) * V(R,R.B) )
Physical optimization
- annotating an RA expression with implementation details.
What are we concerned about when choosing an implementation?
1) the implementation of each RA operation
2) side effects of RA implementations, ex: the output is sorted, or hash-partitioned across machines on some key, etc.
3) "pipelining" - the consumer of an op's output takes its data directly from the producer as it is produced... avoiding trips through the memory hierarchy
Implementing RA Ops
Let's cover them in order from fast to slowww...
1) Selection with an index.
2) Selection without an index: just apply the predicate to each tuple.
3) "Scan" join: applicable to equi-joins where one input relation fits into RAM
   alg for R' <- R xjoin_(R.A==S.A) S:
   -1- hash S (the smaller relation) on S.A - store the result in an in-RAM hash table
   -2- scan R... for each r in R, probe the hash table to find matches, send them to the output stream.
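A minimal sketch of that scan (in-memory hash) join in Python (the record representation and key functions are illustrative assumptions):

from collections import defaultdict

def scan_join(R, S, r_key, s_key):
    """Equi-join: build a hash table on the smaller relation S, then probe it with R."""
    table = defaultdict(list)
    for s in S:                      # pass 1: hash S on its join key
        table[s_key(s)].append(s)
    out = []
    for r in R:                      # pass 2: scan R, probe, emit matching pairs
        for s in table.get(r_key(r), []):
            out.append((r, s))
    return out

R = [("alice", 1), ("bob", 2), ("carol", 1)]
S = [(1, "eng"), (2, "sales")]
print(scan_join(R, S, r_key=lambda r: r[1], s_key=lambda s: s[0]))
# -> [(('alice', 1), (1, 'eng')), (('bob', 2), (2, 'sales')), (('carol', 1), (1, 'eng'))]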
4) Scan-based aggregation
   - keep a hash table with one entry per group (ex: SELECT SUM(att1) FROM TB GROUP BY att2)
   - each hash-table entry is keyed by the grouping attributes
   - to process a tuple:
     1. hash it to its group key
     2. update that entry in the table
5) Union
   - two types of union: UNION (set-based) and UNION ALL (bag-based)
   - bag-based is easy: scan R, then scan S
   - set-based (assume the union's hash table fits into RAM):
     1) scan R; for each r in R:
        - hash r
        - add it to the hash table if not already there
     2) do the same scan over S
     3) scan the hash table and send its contents to the output stream
6) Intersection and Difference | |
- | |
10) 2-pass, hash-based set ops (intersection, union, difference)
   - If R is big, the one-pass hash-based approach won't work
- Can use algorithm similar to hash join | |
11) Sort-merge join
    first, the simple "sort join":
    a) TPMMS over R (the sort key is the join key)
    b) TPMMS over S
    c) merge
    cost: 5(|R|+|S|)
    but this is wasteful: the final merge can be combined with the TPMMS merge phase
    - Sort-merge join idea: never materialize the sorted R, S
    - Instead, do the join merge directly after the sort phase of TPMMS
      => cost 3(|R|+|S|)
    Note: often R or S is already sorted <- we can then avoid 2 of the passes over that relation
      Why might it be sorted? it may be stored sorted on disk (e.g. sorted on the primary key)
      - or - it may be the output of a prior join
12) sort-based duplicate removal - like TPMMS; remove duplicates in the merge phase; cost: 3|R|
    | 1 2 4 | | 6 7 9 | | 10 11 12 |      | 2 4 5 | | 6 7 9 | | 13 14 15 |
               Run 1                                  Run 2
    PQueue: values come out in sorted order: 1 2 2 4 4 ...
    keep polling while we see the same value
    key difference from the normal merge:
      when removing the low value, repeatedly peek at the front of the PQ and discard duplicates
    (see the sketch at the end of this section)
13) hash-based duplicate removal
    cost: 3|R|
    after the hash (partition) pass, all copies of any given record land in the same bucket:
      bucket:    0        1        2       3       4      ...
      records:   r      rrrrrr   rrrrr     r       r
    in the merge phase: do in-RAM duplicate removal on each bucket separately
Compare sort- vs. hash-based duplicate removal:
  => hash-based is better:
     - O(1) time per record to add to the hash table, compared to O(lg n) for sorting
     - early duplicate removal: duplicates can be dropped already in the add stage
  => sort-based wins when:
     - the output needs to be sorted (you get that for free)
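A minimal sketch of the duplicate-dropping merge from item 12, using a priority-queue merge over sorted runs (Python's heapq; the run contents are the example values above):

import heapq

def merge_dedup(runs):
    """Merge already-sorted runs, discarding duplicates as they leave the priority queue."""
    out, last = [], None
    for v in heapq.merge(*runs):        # pulls values out in globally sorted order
        if v != last:                   # same as the value just emitted? discard it
            out.append(v)
            last = v
    return out

run1 = [1, 2, 4, 6, 7, 9, 10, 11, 12]
run2 = [2, 4, 5, 6, 7, 9, 13, 14, 15]
print(merge_dedup([run1, run2]))
# -> [1, 2, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15]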
CCR - concurrency control & recovery
Recovery from failure... what types of failure are there?
(1) Bad data:
    - to handle (from the programmer's point of view):
      * constraints
      * triggers
(2) Media failures
    - what tools do we have as system designers to handle them?
      * RAID and similar schemes
      * backups (off-site, in the extreme case)
(3) System- and user-initiated failures, handled via transactions
Transactions in SQL: BEGIN TRANSACTION; code; COMMIT or ROLLBACK
A transaction provides "all or nothing" semantics - "atomicity":
  - if it commits, all of its actions commit
  - if it fails / rolls back, it is as if nothing happened
Why is atomicity hard?
  - failures can happen at any time
  - some changes may already be on disk, some may still be in the buffer
- typical mechanism to recover from system- and user-initiated failures:
  LOGGING!
- idea: keep a log that stores info about updates... enough info to recover from a failure
What is a "log"?
  - a sequence of special records....
  - writes to the log are append-only
  - no updates; deletion only via "truncation"
  - what types of records are written to a log?
    depends on the type of logging, but all logging schemes use: <START T>, <COMMIT T>, <ABORT T>
LOG: | |
|rrrrrrrrrrrrrrrrrrr||head tail||rrrrrrrrrrrrrrrrrrr| | |
We need some basic DB operations:
INPUT(X)   - load X from disk into the buffer
READ(X,t)  - load X from the buffer into variable t
WRITE(X,v) - put v into X, stored in the buffer
OUTPUT(X)  - write the contents of X from the buffer to disk
What are we writing/reading? That is, what is X?
* Relations | |
* pages | |
* records | |
* atts | |
(1) Undo logging
2 key rules:
  - if WRITE(X, v) is done by transaction T, then <T, X, v> must be on disk before OUTPUT(X)... where v is the OLD value of X
  - if T commits, then <COMMIT T> goes to disk AFTER all of T's OUTPUT(X) calls
Say we fail... recall that <START T> is written to the log when transaction T begins
1) look at the log... find all transactions with a START but no COMMIT
2) for each such transaction, use its <T, X, v> records to restore value v to X... going from the head of the log back toward the tail
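A minimal sketch of that undo-recovery scan in Python (the log is a list of tuples whose layout is an illustrative assumption):

def undo_recover(log):
    """log: oldest-to-newest records like ('START','T1'), ('UPDATE','T1','X',old_v),
    ('COMMIT','T1'), ('ABORT','T1'). Returns {object: value to restore}."""
    finished = {rec[1] for rec in log if rec[0] in ("COMMIT", "ABORT")}
    restored = {}
    for rec in reversed(log):                 # scan from the head of the log backward
        if rec[0] == "UPDATE" and rec[1] not in finished:
            _, t, x, old = rec
            restored[x] = old                 # older records overwrite: we end at the original value
    return restored

log = [("START", "T1"), ("UPDATE", "T1", "X", 5),
       ("START", "T2"), ("UPDATE", "T2", "Y", 7), ("COMMIT", "T2")]
print(undo_recover(log))   # T1 never committed -> {'X': 5}; committed T2 is left alone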
Issue: suppose we have
  <START T1>
  <START T2>
  <T1, X, v1>
  <T2, X, v2>
  <COMMIT T2>   ... crash ...
=> T1 did not commit, so it is rolled back; but restoring X to v1 because of that also undoes committed T2's write
How to truncate the log in undo logging?
Easy way to "checkpoint":
1) stop accepting new transactions
2) wait for all active transactions to commit/abort
3) set tail = head
Problem: stops new transactions for a (long?) time
Better solution: "nonquiescent" checkpointing
1) write <CHECKPT T1, T2, ..., Tk> where T1...Tk are all of the active transactions
2) wait until T1...Tk have all committed or aborted
3) write <END CHECKPT> to the log
Recovery:
1) locate the last <CHECKPT> / <END CHECKPT> pair.... run the recovery alg only on the log after that <CHECKPT>
   Implies: we can truncate the log at the <CHECKPT> record once we have written the <END CHECKPT> record

  tail   ST1   ST2   CHECKPT   ST3   CT1   CT2   END   ST4   CHECKPT   crash
  |------|-----|-----|---------|-----|-----|-----|-----|-----|---------|------------
- Why is undo logging not really used?
  Recall the undo rules:
  1) if WRITE(X, v) by T, then <T, X, v> goes to the log before OUTPUT(X)
  2) if T commits, <COMMIT T> goes to the log after all of its OUTPUT(X) calls
  * this forces a flush of every written DB object before commit
  * which hurts the utility of buffering
- Redo logging
  - idea: the log stores the info needed to redo committed transactions
  - rules:
    1) write <START T> before T begins: this helps with checkpointing
    2) before OUTPUT(X), the corresponding <T, X, v> (v is the NEW value) must be in the log and <COMMIT T> must be on disk
    3) <COMMIT T> must follow all of the corresponding <T, X, v> records in the log
  - why does this work?
    1) a transaction will write a bunch of stuff
    2) before the written data reaches disk, the redo info is in the log
    3) also, we know that if a transaction successfully committed, all of its redo info is in the log
  Recovery:
    1) identify all of the committed transactions in the log
    2) scan the log from the beginning; for each <T, X, v> with T committed, write v to X
  checkpointing in redo logging:
      st T1     st T2     end T1     CHECKPT     st T3
    ----|---------|---------|-----------|----------|--------------------------------
    at the CHECKPT there could still be log entries for T1 and T2 whose written data is in the buffer
    it is OK (the checkpoint can complete) once all of T1's values have been OUTPUT(); likewise for T2
    at the point where we have done OUTPUT(X) for all transactions that had data in the buffer at the CHECKPT
    (or the transactions that wrote that data have aborted), write <END CHECKPT> and the log may be truncated
  recovery with a checkpointed log:
    - scan from back to front... if we hit <END CHECKPT>, just run recovery from the corresponding <CHECKPT>
    - if we instead hit a <CHECKPT T1, T2, ...> (with no END yet), then:
      T1, T2, ... are the transactions with data in the buffer at the start of the CHECKPT;
      they have already begun, so run recovery from their <START> records onward
Drawbacks of redo logging:
  - still messes with the buffer manager... must keep (pin) written pages in the buffer until COMMIT
    bad, because this can require more RAM than is available
  - also affects LRU performance
Undo-redo logging
  rule:
  - before OUTPUT(X), after a WRITE(X, v) due to T, we must have <T, X, oldval, v> on disk
  Recovery:
  1) from the start of the log, redo all committed transactions
  2) from the end of the log, undo all transactions with no COMMIT or ABORT
  checkpointing in undo-redo logging:
  1) write <CHECKPT T1, T2, ...> where T1, T2, ... are all transactions with data in the buffer
  2) wait until T1, T2, ... have no more data in the buffer - not until they have all committed or aborted
  3) write <END CHECKPT>
Concurrency Control
  - facilitates the illusion that no one else is in the DB concurrently
  - the gold standard for isolation is serializability (SR)
  - SR: a schedule is SR if the state of the DB after running the schedule is the same as if we had run the transactions in some serial order
    * schedule: a list of reads and writes
    * serial: no overlap among transactions
  - issue: SR is really hard to reason about
  - so in practice we strive for CSR (C = "conflict")
  - CSR: we say two DB ops conflict if they involve the same data object and at least one is a write.
    Then a schedule is CSR if it can be transformed into a serial schedule by flipping the order of adjacent non-conflicting ops.
  - We want CSR; how do we get it?
  - classic mechanism: locking
    * when using locking, a transaction must lock any object before reading/writing it
    * locks are granted as follows (held vs. requested):
             S   X
         S   Y   N
         X   N   N
  - the classic locking alg to guarantee CSR is 2PL (two-phase locking)
    * each transaction must obtain all of its locks before it begins releasing any
    * in practice, this means a transaction has a growing phase and a shrinking phase
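A minimal sketch of the S/X compatibility matrix above as a tiny grant check (Python; the lock-manager shape is an illustrative assumption):

# S/X compatibility matrix from the notes: only S-S is compatible
COMPAT = {("S", "S"): True, ("S", "X"): False,
          ("X", "S"): False, ("X", "X"): False}

def can_grant(requested, held):
    """Grant a lock only if it is compatible with every lock already held on the object."""
    return all(COMPAT[(h, requested)] for h in held)

print(can_grant("S", ["S", "S"]))   # True: shared readers can coexist
print(can_grant("X", ["S"]))        # False: a writer must wait for readers
print(can_grant("S", ["X"]))        # False: readers wait for the writer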