Subtopic Notes

13.2 File organisation and access

13. Data Representation

File Organization Methods

  • Serial
    • Records are stored in the order they are entered, one after another (chronological order)
    • There is no specific order
    • Record is appended in the next available spot
    • Example: Logging events, Transaction records, Text File
    • Advantage
      • Simple
      • Cheap
      • No need to resort data each time data added
      • Easy to append
      • Best for small files or when only adding new data
    • Disadvantage
      • Search requires required checking each record one by one
      • Cannot support modern high speed requirements for quick record access
  • Sequential
    • Records are stored in a sorted order based on a key field (eg ID, name)
    • A logical order is maintained
    • To Update: A new version of file has to be created
    • High hit rate: A situation where the desired record is found on the first access attempt most of the time.
    • Example: Storing student records
    • Advantage
      • Easier and faster to search
      • Easier to sort data
      • Allows batch processing
    • Disadvantage
      • All records must be structurally identical
      • Hard to accommodate updates
      • Random access not possible
  • Random (Direct)
    • Records are stored at specific locations using a hashing algorithm based on a record key
    • If a record already exists in that key we perform a form of collision handling
    • Fast access while reading and writing
    • Useful when frequent searching is needed
    • More complex setup
    • Low hit rate
    • Example: Banking System, Inventory System
    • Advantage
      • Records can be retrieved easily
      • Record can have different sizes
      • Sorting of records not required
      • Updates several files quickly
    • Disadvantage
      • Data can be accidentally erased or overwritten without precautions
      • Direct access files lack backup facilities
      • Require costly hardware, software, and programming
      • Less storage-efficient than sequential files

File Access Methods

  • Sequential Access:
    • Used to access serial and sequential files
    • Reads the file from start to end, one record at a time until the record is found or key value exceeded
  • Random/Direct Access
    • Used to access random and sequential files (with indexing)
    • Jumps directly to the location of the required record using the key and possibly a hash function

Hashing Algorithms

  • Convert a key field (eg. ID) into an address or index in a file or memory location
  • Process
    • A hash function is applied to the key.
    • This generates a unique address
    • The record is stored or accessed at that address
  • Collision Handling
    • Collision Occurs when two keys produce the same address using a hash function
    • Methods
      • Check the next available spot (Linear Probing)
      • Each address holds a linked list of records with same hash (Chaining)
      • Apply a second hash function to find a new address when first address is taken (Rehashing)