Subtopic Notes

13.2 File organisation and access

13. Data Representation

File Organization Methods

Serial
- Records are stored in the order they are entered, one after another (chronological order)
- There is no specific order
- Record is appended in the next available spot
- Example: Logging events, Transaction records, Text File
- Advantage
  - Simple
  - Cheap
  - No need to resort data each time data added
  - Easy to append
  - Best for small files or when only adding new data
- Disadvantage
  - Search requires required checking each record one by one
  - Cannot support modern high speed requirements for quick record access
Sequential
- Records are stored in a sorted order based on a key field (eg ID, name)
- A logical order is maintained
- To Update: A new version of file has to be created
- High hit rate: A situation where the desired record is found on the first access attempt most of the time.
- Example: Storing student records
- Advantage
  - Easier and faster to search
  - Easier to sort data
  - Allows batch processing
- Disadvantage
  - All records must be structurally identical
  - Hard to accommodate updates
  - Random access not possible
Random (Direct)
- Records are stored at specific locations using a hashing algorithm based on a record key
- If a record already exists in that key we perform a form of collision handling
- Fast access while reading and writing
- Useful when frequent searching is needed
- More complex setup
- Low hit rate
- Example: Banking System, Inventory System
- Advantage
  - Records can be retrieved easily
  - Record can have different sizes
  - Sorting of records not required
  - Updates several files quickly
- Disadvantage
  - Data can be accidentally erased or overwritten without precautions
  - Direct access files lack backup facilities
  - Require costly hardware, software, and programming
  - Less storage-efficient than sequential files

File Access Methods

Sequential Access:
- Used to access serial and sequential files
- Reads the file from start to end, one record at a time until the record is found or key value exceeded
Random/Direct Access
- Used to access random and sequential files (with indexing)
- Jumps directly to the location of the required record using the key and possibly a hash function

Hashing Algorithms

Convert a key field (eg. ID) into an address or index in a file or memory location
Process
- A hash function is applied to the key.
- This generates a unique address
- The record is stored or accessed at that address
Collision Handling
- Collision Occurs when two keys produce the same address using a hash function
- Methods
  - Check the next available spot (Linear Probing)
  - Each address holds a linked list of records with same hash (Chaining)
  - Apply a second hash function to find a new address when first address is taken (Rehashing)

Previous:User-defined data types Next:Floating-point numbers, representation and manipulation