Subtopic Notes
13.2 File organisation and access
13. Data Representation
File Organization Methods
- Serial
- Records are stored in the order they are entered, one after another (chronological order)
- There is no specific order
- Record is appended in the next available spot
- Example: Logging events, Transaction records, Text File
- Advantage
- Simple
- Cheap
- No need to resort data each time data added
- Easy to append
- Best for small files or when only adding new data
- Disadvantage
- Search requires required checking each record one by one
- Cannot support modern high speed requirements for quick record access
- Sequential
- Records are stored in a sorted order based on a key field (eg ID, name)
- A logical order is maintained
- To Update: A new version of file has to be created
- High hit rate: A situation where the desired record is found on the first access attempt most of the time.
- Example: Storing student records
- Advantage
- Easier and faster to search
- Easier to sort data
- Allows batch processing
- Disadvantage
- All records must be structurally identical
- Hard to accommodate updates
- Random access not possible
- Random (Direct)
- Records are stored at specific locations using a hashing algorithm based on a record key
- If a record already exists in that key we perform a form of collision handling
- Fast access while reading and writing
- Useful when frequent searching is needed
- More complex setup
- Low hit rate
- Example: Banking System, Inventory System
- Advantage
- Records can be retrieved easily
- Record can have different sizes
- Sorting of records not required
- Updates several files quickly
- Disadvantage
- Data can be accidentally erased or overwritten without precautions
- Direct access files lack backup facilities
- Require costly hardware, software, and programming
- Less storage-efficient than sequential files
File Access Methods
- Sequential Access:
- Used to access serial and sequential files
- Reads the file from start to end, one record at a time until the record is found or key value exceeded
- Random/Direct Access
- Used to access random and sequential files (with indexing)
- Jumps directly to the location of the required record using the key and possibly a hash function
Hashing Algorithms
- Convert a key field (eg. ID) into an address or index in a file or memory location
- Process
- A hash function is applied to the key.
- This generates a unique address
- The record is stored or accessed at that address
- Collision Handling
- Collision Occurs when two keys produce the same address using a hash function
- Methods
- Check the next available spot (Linear Probing)
- Each address holds a linked list of records with same hash (Chaining)
- Apply a second hash function to find a new address when first address is taken (Rehashing)
