Sorted File
An approach to keep a level of organization to the tuples and permit some more efficient searchingRows are sorted based on some attribute(s)
- Access path is a binary search
- An equality or range query based on that attribute has cost log2F to retrieve page containing first row
- Successive rows are in same (or successive) page(s) and cache hits are likely
- By storing all pages on the same track, seek time can be minimized
SELECT T.Course, T.Grade FROM Transcript T WHERE T.StudId = 123456 | SELECT T.Course, T.Grade FROM Transcript T WHERE T.StudId BETWEEN 111111 AND 199999 |
Transcript Stored as a Sorted File
123456 CS305 S1996 4.0
234567 CS305 S1999 4.0
515151 EE101 F1995 3.0
| page 0 |
586609 EE101 S1998 3.0
666666 MGT123 F1994 4.0
717171 CS315 S1997 4.0
765432 MAT123 S1996 2.0
| page 1 |
878787 MGT123 S1996 3.0
987654 CS305 F1995 2.0
| page 2 |
Maintaining Sorted Order
Problem: After the correct position for an insert has been determined, inserting the row requires (on average) F/2 reads and F/2 writes(Partial) Solution 1: Leave empty space in each page: fillfactor or utilization
(Partial) Solution 2: Use overflow pages (chains) but this has disadvantages:
Overflow
123456 CS305 S1996 4.0
234567 CS305 S1999 4.0
515151 EE101 F1995 3.0
| 3 | page 0 |
586609 EE101 S1998 3.0
666666 MGT123 F1994 4.0
717171 CS315 S1997 4.0
765432 MAT123 S1996 2.0
| - | page 1 |
878787 MGT123 S1996 3.0
987654 CS305 F1995 2.0
| - | page 2 |
111654 CS305 F1995 2.0
111233 PSY 220 S2001 3.0
| page3 |
Eventually degrades to a linked list and has to be reorganized. A 7/24 database cannot afford to be taken offline for reorganization.