Subtopic Notes

16.2 Translation Software

16. System Software

Interpreter

  • Reads line by line and checks for error
  • Executes the program line by line and stops when error is found
  • Can execute program without creating a translated version
  • Analysis and code generation run for each code line
  • Each line is executed as soon as the intermediate code is generated

Stages of Compilation

  1. Lexical analysis
    • Converts source code into tokens (smallest meaningful units).
    • Comments removed
    • The symbol table stores identifiers and their attributes
    • The keyword table stores reserved words and operators
  2. Syntax analysis/Parsing
    • Checks tokens against the grammar of the language
    • Generates a parse tree (or abstract syntax tree)
    • Reports syntax errors if rules are violated
  3. Code generation
    • Translates the parse tree/intermediate representation into object code or machine code
    • Produces intermediate code (e.g., reverse polish notation) before final output
  4. Optimization
    • Improves the generated code for efficiency
    • Aims to reduce execution time and/or memory usage without changing program behavior

Expressing Grammar of language

Syntax Diagram

  • Visual way to represent the grammar of a programming language
  • Flow: Start → Sequence of symbols/constructs → End.
  • Helps understand valid sequences of statements or expressions
  • The two methods of representing syntax (Diagram and BNF) directly maps onto one another

Backus-Naur Form (BNF)

  • Context-free grammar commonly used by developers of programming languages to specify the syntax rules of a language.
  • Used to represent the rules of a grammar.
  • Usefulness of BNF
    • BNF is a general approach which can be used to describe any assembly of data
    • BNF can be used as the basis for an algorithm.
    • Uses a range of symbols and expressions to create production rules
  • It consists of:
    • A set of terminal symbols
    • A set of non-terminal symbols (<> used to represent)
    • A set of production rules
  • Symbols
    • <> Non terminal symbols
    • | Denotes alternative
    • ::= Equal to

Format:
LHS ::= RHS

Examples
<digit> ::= 0|1|2|3|4|5|6|7|8|9
<fullname> ::= <title><name><name>
<title> ::= Mr|Mrs|Ms|Miss|Dr
<addition> ::= <number>+<number>
<number> ::= <sign><integer>|<integer>
<integer> ::= <digit>|<digit><integer>
<sign> ::= +|-

Example of Recursion
<number> ::= <digit>|<digit><number>
<digit> ::= 0|1|2|3|4|5|6|7|8|9

Reverse Polish Notation (RPN)

  • A method of representing an arithmetic or logical expression without brackets or special punctuation.
  • Uses postfix notation, where an operator is placed after the variables it acts on. For example, A + B would be written as A B +
  • Compilers use RPN because any expression can be processed from left to right without backtracking.
  • Advantages of RPN
    • Do not need brackets, and there is no need for the precedence of operators
    • Simpler for a machine to evaluate
    • No need for backtracking in evaluation as the operators appear in the order required for computation and can be evaluated from left to right
  • Reading RPN
    • RPN expression is read left to right.
    • When you find a number or operands: Push onto the stack.
    • When you find an operator: Pop the last two values, apply the operator, and push the result back
    • Continue until the end of the expression
    • The final value in the stack is the result
  • Infix to Reverse Polish Notation
    • Operands (numbers/variables) → write directly to the output.
    • Operators → push onto a stack.
      • Pop operators from the stack to output if they have higher or equal precedence than the current operator.
    • Parentheses
      • ‘(’ → push onto the stack.
      • ‘)’ → pop operators from the stack to output until '(' is found, then discard ‘(’.
    • At the end, pop any remaining operators from the stack to the output
  • Examples
    • ( a - b ) * ( a + c ) / 7 → a b - a c + * 7 /
    • ( a / b ) * 4 - ( a + b ) → a b / 4 * a b + -