DATABASE MANAGEMENT SYSTEM: Sorting Concept

Sorted File

An approach to keep a level of organization to the tuples and permit some more efficient searching
Rows are sorted based on some attribute(s)

Access path is a binary search
An equality or range query based on that attribute has cost log₂F to retrieve page containing first row
Successive rows are in same (or successive) page(s) and cache hits are likely
By storing all pages on the same track, seek time can be minimized

Example - Transcript sorted on StudId :

SELECT T.Course, T.Grade
FROM Transcript T
WHERE T.StudId = 123456

SELECT T.Course, T.Grade
FROM Transcript T
WHERE T.StudId BETWEEN 111111 AND 199999

Transcript Stored as a Sorted File

123456 CS305  S1996 4.0 
234567 CS305  S1999 4.0

515151 EE101  F1995 3.0

page 0

586609 EE101  S1998 3.0 
666666 MGT123 F1994 4.0
717171 CS315  S1997 4.0
765432 MAT123 S1996 2.0

page 1


878787 MGT123 S1996 3.0    
987654 CS305  F1995 2.0

page 2

keep space on the page for new insertions.

Maintaining Sorted Order

Problem: After the correct position for an insert has been determined, inserting the row requires (on average) F/2 reads and F/2 writes
(Partial) Solution 1: Leave empty space in each page: fillfactor or utilization
(Partial) Solution 2: Use overflow pages (chains) but this has disadvantages:

Successive pages no longer stored contiguously

Overflow chain not sorted, hence cost no longer log₂F

Overflow

123456 CS305 S1996 4.0 234567 CS305 S1999 4.0 515151 EE101 F1995 3.0	3	page 0
586609 EE101 S1998 3.0 666666 MGT123 F1994 4.0 717171 CS315 S1997 4.0 765432 MAT123 S1996 2.0	-	page 1
878787 MGT123 S1996 3.0 987654 CS305 F1995 2.0	-	page 2
111654 CS305 F1995 2.0 111233 PSY 220 S2001 3.0		page3

Eventually degrades to a linked list and has to be reorganized. A 7/24 database cannot afford to be taken offline for reorganization.