1. Database Internals#

1.1. How are Tables stored and queried on Disk#

  • For Instance :

  • Table with only one integer col of 32 bits (4 Bytes)

  • Insert one row->4 bytes into disk

  • Insert another row-> another 4 bytes into disk

1.2. But how this querying works internally?#

  • This is how a disk looks like from inside

  • Disk is divided into Tracks,sectors,blocks

    • Eg, Track no 7,Sector no 5,Block number 2=>Gives block of disk

  • Each Block is a fixed size memory block, 512 Bytes lets say

image.png

  • So,while inserting data into memory:

  • Insert row 1-> 4 bytes->Stored in that Given block

    • Block storage left=(512-4) Bytes

  • Insert row 2-> 4 bytes->Stored in that same block

  • That block will get filled until complete

1.3. How will database remeber which row is where?#

Database stores Metadata,like Row one is at Track no 7,Sector no 5,Block number 2=>Gives block of disk

  • So,when the IO req will come for this row,disk will start rotating till that particular block is found by the spin head and then db will pull that entire block of 513bytes you wanted rows but you will receive all the rows

1.4. What in case of SSDs?#

  • There are page numbers instead of Block numbers

1.4.1. In Depth:#

  • Database needs an identifier to identify each row

  • For that,it uses Primary key,if its present else it create a new column,row_id in the backend for its reference

  • Now these rows are stored in memory as heap

  • So Heap memory contains something called Pages(fixed size memory locations) and rows goes into these pages

  • First,rows will go into page one till this page is full then other rows will go to page 2 and so on

  • All these pages are in Heap.

  • So, Hierarchy becomes-> Rows->Pages(Part of Heap)

image.png

  • By default,It will scan all the pages,all the rows,all the data

image.png

  • But with indexing on one or more column,Lets say like here we did indexing on just on one column,employee id,it will store diff pages with just that column and the row identifier and the page identifier.

  • So now,It will allready know to which specific page he has to search for a particular column

image.png

1.5. Are the blocks or Pages stored simultaneausly in disk??#

  • Like if block one has last row no. 128 , Will the row no. 129 in next block?

    • Maybe or maybe not

    • It depends on How the data is fragmented