Skip to content

Instantly share code, notes, and snippets.

@Robert076
Last active June 4, 2025 04:36
Show Gist options
  • Save Robert076/f57d76e450779cc3c35ebe5b9309f466 to your computer and use it in GitHub Desktop.
Save Robert076/f57d76e450779cc3c35ebe5b9309f466 to your computer and use it in GitHub Desktop.
DBMS tips/notes for the exam

My personal notes regarding the upcoming DBMS exam

Hope this helps.


⚙️ Two-Phase locking protocol

This basically means that the transaction acquires as many locks as it needs, and releases them at the end. It will not acquire new locks after it releases the first one.

It contains two phases: "Growing phase" and "Shrinking phase". In the growing phase it acquires locks and releases them in the shrinking phase.


🚀 A solved exercise about indexes

The reduction factor for conditioning Age > 19, assuming the data is evenly distributed and there is an index I on age, can be estimated by:

a) 1 / NKeys(I) ❌

b) NKeys(I) ❌

c) (19 - ILow(I)) / (IHigh(I) - ILow(I)) ❌

d) (IHigh(I) - 19) / (ILow(I) - IHigh(I)) ❌

e) None of the answers are correct ✅

Let's first clarify some notations:

NKeys(I) => the number of distinct keys (unique values) on column I (Age in our case)
IHigh(I) => the highest value in column I
ILow(I) => the lowest value in column I

Why is e) the correct answer?

Well let's do some basic maths:

a) 1 / NKeys(I) translates to "the reduction factor is 1 divided by the number of total different ages in our table" => it would actually be the answer to Age = 19 and NOT Age > 19

b) NKeys(I) translates to "the reduction factor is the total different ages" Not only does it not make sense, but the reduction factor must be > 0 and < 1 so it cannot be.

c) (19 - ILow(I)) / (IHigh(I) - ILow(I)) translates to Age < 19 and NOT Age > 19. It's the opposite of what we want

d) (IHigh(I) - 19) / (ILow(I) - IHigh(I)). This would actually be correct but the denominator is reversed. It should be IHigh(I) - ILow(I) because it cannot be negative


🌎 Anomalies that can appear

  1. Dirty read *(when a transaction reads uncomitted data)
  2. Unrepeatable read *(when a transaction reads the same resource twice but with different results)
  3. Phantom read *(when reading the same resource twice gives extra rows the second time)
  4. Lost update *(two transactions update the same resource, but one update overrides the other)

🧩 Isolation levels

  1. Read uncommitted (best performing, lowest security. does not prevent any anomaly)
  2. Read committed (2nd best performing, prevents dirty reads only)
  3. Repeatable read (3rd best performing, prevents dirty reads and non repeatable reads)
  4. Serializable (4th best performing, most secure, prevents everything: dirty reads, unrepeatable reads, phantom reads, lost updates)

💻 Encoding (another example is in lecture 5)

Suppose we want to encode the phrase: "encode encode encode"

Using the secret key: "secret"

And the table of codes:

a b c d e f g h i j k l m n o p q r s t u v w x y z
01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26

To each letter in the initial phrase we assign the value in the table of codes above.

e n c o d e e n c o d e e n c o d e
05 14 03 15 04 05 05 14 03 15 04 05 05 14 03 15 04 05

Then we add the secret key under them, repeating it until the end:

e n c o d e e n c o d e e n c o d e
05 14 03 15 04 05 05 14 03 15 04 05 05 14 03 15 04 05
19 05 03 18 05 20 19 05 03 18 05 20 19 05 03 18 05 20

Note that secret is: 19 05 03 18 05 20 in our table of codes. It is a coincidence that it fits perfectly 3 times but if it doesn't just fill the bottom row up with as many letters from the secret key as you can.

Now compute the sum % 27 (we have 27 values in our table of codes):

e n c o d e e n c o d e e n c o d e
05 14 03 15 04 05 05 14 03 15 04 05 05 14 03 15 04 05
19 05 03 18 05 20 19 05 03 18 05 20 19 05 03 18 05 20
24 19 06 06 09 25 24 19 06 06 09 25 24 19 06 06 09 25

The bottom row is our encoded message. Translate it to letters using the table of codes.

To decode just go backwards.


😀 Joins

E, S - 2 relations. E is the outer relation. E has pE records per page, S has pS records per page. E has M pages, S has N pages.

1. Simple nested loops join:

Cost: M + pE * M * N

2. Page oriented nested loops join:

Cost: M + M * N

3. Block oriented nested loops join

If we have nbuffer_count buffer pages, then:

Cost: M + ((M / (nbuffer_count - 2)) * N)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment