Subtopic Notes
13.3 Floating-point numbers, representation and manipulation
13. Data Representation
Mantissa
- Stores the actual digits of the number
- Increasing the number of bits in mantissa increases accuracy
Exponent
- Tells you where the decimal point is placed by shifting
- Increasing the number of bits in exponent increases range
Floating Point Number Representation
- $\pm\,\text{mantissa} \times 2^{\text{exponent}}$
- Can represent fractions and negative numbers
- In scientific notation a number is represented in the following way: $2.15 \times 10^{4}$
- Here 2.15 acts like the mantissa (the significant digits) while the power 4 acts like the exponent (shifts the decimal point 4 places to the right)
- In a binary number the decimal point is called the binary point
Converting a binary floating-point number to denary (Method 1)
- You will be given the value of mantissa and exponent
Mantissa: 01110100 Exponent: 0010
- Write the mantissa with its binary point: by default the point is after the first digit of the mantissa
0.1110100
- Evaluate the value of the exponent
0010 = 2
- Shift the binary point to the right by the value of the exponent
011.10100
- Evaluate the value
| -4 | 2 | 1 | . | 1/2 | 1/4 | 1/8 | 1/16 | 1/32 |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | . | 1 | 0 | 1 | 0 | 0 |
Value: 2 + 1 + (½) + (⅛) = 3.625
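The shifting steps of Method 1 can be sketched in Python. This is a minimal illustration, not a definitive implementation: the function names `twos_complement` and `decode_by_shifting` are made up for these notes, and the sketch assumes the exponent is a two's complement value between 0 and the number of bits after the point.

```python
def twos_complement(bits: str) -> int:
    """Interpret a bit string as a two's complement integer."""
    value = int(bits, 2)
    if bits[0] == "1":              # sign bit set: subtract 2^n
        value -= 1 << len(bits)
    return value


def decode_by_shifting(mantissa: str, exponent: str) -> float:
    """Method 1: shift the binary point right, then add up the place values."""
    shift = twos_complement(exponent)
    if not 0 <= shift < len(mantissa):
        raise ValueError("this sketch only handles shifts that stay inside the mantissa")
    # The point starts after the first bit, so 1 + shift bits end up left of it.
    whole, fraction = mantissa[:1 + shift], mantissa[1 + shift:]
    total = float(twos_complement(whole))
    for position, bit in enumerate(fraction, start=1):
        if bit == "1":
            total += 2.0 ** -position   # 1/2, 1/4, 1/8, ...
    return total


print(decode_by_shifting("01110100", "0010"))  # 3.625
```

The left-most bit of the whole part keeps its negative place value, so the same function also works for a negative mantissa.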
Converting a binary floating-point number to denary (Method 2)
- You will be given the value of mantissa and exponent
Mantissa: 01110100 Exponent: 0010
- Answer will be $\text{mantissa} \times 2^{\text{exponent}}$
| sign | . | 1/2 | 1/4 | 1/8 | 1/16 | 1/32 | 1/64 | 1/128 |
|---|---|---|---|---|---|---|---|---|
| 0 | . | 1 | 1 | 1 | 0 | 1 | 0 | 0 |
= 0.1110100 × $2^{0010}$
= (½ + ¼ + ⅛ + 1/32) × $2^{2}$ = 0.90625 × 4 = 3.625
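Method 2 translates almost directly into code. The sketch below is illustrative only: `twos_complement` and `decode` are names invented for these notes, and both bit strings are assumed to be in two's complement with the binary point after the first mantissa bit.

```python
def twos_complement(bits: str) -> int:
    """Interpret a bit string as a two's complement integer."""
    value = int(bits, 2)
    if bits[0] == "1":
        value -= 1 << len(bits)
    return value


def decode(mantissa: str, exponent: str) -> float:
    """Method 2: value = mantissa x 2^exponent."""
    # Dividing by 2^(n-1) puts the binary point after the first bit.
    fraction = twos_complement(mantissa) / 2 ** (len(mantissa) - 1)
    return fraction * 2.0 ** twos_complement(exponent)


print(decode("01110100", "0010"))  # 0.90625 * 4 = 3.625
```

Unlike the shifting method, this version also copes with negative exponents, because multiplying by a negative power of 2 moves the point to the left.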
Converting denary to binary floating-point number (example)
- You will be given a denary number and the number of bits for mantissa and exponent (the worked example here is -102.75 with a 10-bit mantissa and a 6-bit exponent)
- Convert to binary putting the decimal point in the correct place
| -128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 | . | 1/2 | 1/4 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | . | 0 | 1 |
- Move the decimal point until the number is normalized (The value represented is the mantissa)
| 1 | . | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 |
|---|---|---|---|---|---|---|---|---|---|---|
- Calculate the exponent.
Since the binary point moved 7 places to the left, the value of the exponent will be 7, which is 000111 as a 6-bit exponent
Mantissa: 1001100101 Exponent: 000111
Note: If a given number cannot be represented fully, represent the closest number possible (Precision may be lost)
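The conversion can be checked with a short Python sketch. It is a simplified illustration under stated assumptions: `encode` and `to_bits` are hypothetical helper names, the standard-library function `math.frexp` does the normalizing, and bits that do not fit in the mantissa are simply dropped (rounded down) instead of being rounded to the closest value.

```python
import math


def to_bits(value: int, width: int) -> str:
    """Two's complement bit string of the given width."""
    return format(value & ((1 << width) - 1), f"0{width}b")


def encode(value: float, mantissa_bits: int, exponent_bits: int):
    """Return (mantissa, exponent) bit strings for a normalized number."""
    if value == 0:
        return "0" * mantissa_bits, "0" * exponent_bits
    # frexp gives value == fraction * 2**exp with 0.5 <= |fraction| < 1
    fraction, exp = math.frexp(value)
    if fraction == -0.5:            # 1.1000... is not normalized; use 1.0000...
        fraction, exp = -1.0, exp - 1
    if not -(1 << (exponent_bits - 1)) <= exp < (1 << (exponent_bits - 1)):
        raise ValueError("exponent does not fit (overflow or underflow)")
    # Assumption: extra bits are truncated, so precision may be lost here.
    scaled = math.floor(fraction * (1 << (mantissa_bits - 1)))
    return to_bits(scaled, mantissa_bits), to_bits(exp, exponent_bits)


print(encode(-102.75, 10, 6))  # ('1001100101', '000111')
```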
Normalization
- Normalization is done to ensure uniqueness of number representation
- Makes calculations more straightforward
- Stores the maximum range of numbers in the minimum number of bits
- Enables very large and very small numbers to be stored accurately
- A normalized positive number starts with 01 from the left and a normalized negative number starts with 10
- Normalization Steps:
- Example: Normalize 0010110110 0010
- Step 1: Adjust the decimal point by shifting it left or right until the number begins with 01 or 10
- 0.010110110 becomes 0.101101100
- Step 2: Adjust the exponent as you shift the decimal point: moving it to the left increases the exponent, while moving it to the right decreases the exponent.
- Decimal point moves 1 to the right, so 1 is deducted from the exponent. Answer becomes 0101101100 0001
- Normalizing Negative Floating Point Values
- Example: 11110010 0111
- By default the number is in the format
1.1110010 × $2^{0111}$ = 1.1110010 × $2^{7}$
- Move the binary point until normalized
1111.0010000
- Since the binary point moved 3 places to the right, deduct 3 from the exponent
= 1.0010000 × $2^{4}$
= 10010000 0100
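Both normalization examples above follow the same rule: while the first two mantissa bits are equal, shift the mantissa left by one place and deduct 1 from the exponent. A minimal sketch, assuming two's complement bit strings (the names `normalize`, `twos_complement` and `to_bits` are made up for illustration):

```python
def twos_complement(bits: str) -> int:
    """Interpret a bit string as a two's complement integer."""
    value = int(bits, 2)
    if bits[0] == "1":
        value -= 1 << len(bits)
    return value


def to_bits(value: int, width: int) -> str:
    """Two's complement bit string of the given width."""
    return format(value & ((1 << width) - 1), f"0{width}b")


def normalize(mantissa: str, exponent: str):
    """Shift until the mantissa starts with 01 or 10, adjusting the exponent."""
    if "1" not in mantissa:
        return mantissa, exponent       # zero has no normalized form
    exp = twos_complement(exponent)
    while mantissa[0] == mantissa[1]:
        mantissa = mantissa[1:] + "0"   # binary point moves one place right
        exp -= 1                        # so 1 is deducted from the exponent
    if exp < -(1 << (len(exponent) - 1)):
        raise ValueError("exponent too small for the available bits")
    return mantissa, to_bits(exp, len(exponent))


print(normalize("0010110110", "0010"))  # ('0101101100', '0001')
print(normalize("11110010", "0111"))    # ('10010000', '0100')
```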
Overflow
- Happens when a number is too large to be represented within the available number of bits.
- The exponent exceeds its maximum value.
- Precision may be lost
- Excess values cut off
- Example: Trying to store $2^{20}$ with only a 4-bit exponent.
Underflow
- Happens when a number is too small (close to 0) to be represented within the available bits.
- The exponent is too negative (less than the minimum allowed).
- Result: The number may be stored as zero, losing precision.
- Example: 0.0000001 with limited mantissa and exponent bits.
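The overflow and underflow conditions described above can be tested by comparing a value against the limits of the format. The sketch below is a simplification under stated assumptions: `classify` is a hypothetical helper, the exponent is in two's complement, and anything closer to zero than the smallest normalized positive number is treated as underflow.

```python
def classify(value: float, mantissa_bits: int = 8, exponent_bits: int = 4) -> str:
    """Say whether a value overflows, underflows or fits the given format."""
    max_exp = 2 ** (exponent_bits - 1) - 1        # e.g. 0111 = 7
    min_exp = -(2 ** (exponent_bits - 1))         # e.g. 1000 = -8
    largest = (1 - 2.0 ** -(mantissa_bits - 1)) * 2.0 ** max_exp
    most_negative = -1.0 * 2.0 ** max_exp
    smallest_normalized = 0.5 * 2.0 ** min_exp
    if value > largest or value < most_negative:
        return "overflow"
    if value != 0 and abs(value) < smallest_normalized:
        return "underflow"
    return "in range"


print(classify(2 ** 20))     # overflow
print(classify(0.0000001))   # underflow
print(classify(3.625))       # in range
```

With the default 8-bit mantissa and 4-bit exponent, the limits work out to 127 for the largest positive value and -128 for the most negative one.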
Rounding Error
- When a number cannot be represented exactly in binary, it is approximated to the closest possible value.
- This small difference can cause inaccuracies, especially when such numbers are used in multiple calculations.
- Over time, these tiny errors accumulate and become noticeable
- Example: 0.1 + 0.7 might give 0.7999999999999 instead of 0.8.
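Rounding error is easy to observe in Python, whose `float` type is a 64-bit binary floating-point number. The snippet is only a demonstration; the variable names are arbitrary.

```python
import math
from fractions import Fraction

# 0.1 has no exact binary representation, so the stored value is only close to 1/10.
print(Fraction(0.1) == Fraction(1, 10))   # False

total = 0.1 + 0.7
print(total == 0.8)                       # False: the sum is slightly below 0.8

# Errors accumulate when the approximation is reused many times.
running = 0.0
for _ in range(10):
    running += 0.1
print(running == 1.0)                     # False

# A common workaround is to compare with a tolerance instead of ==.
print(math.isclose(total, 0.8))           # True
```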
Important values
Consider the following for an 8-bit mantissa and a 4-bit exponent:
| Value | Mantissa and exponent |
|---|---|
| Largest positive number | 0111 1111 0111 |
| Smallest positive number | 0000 0001 1000 |
| Smallest normalized positive number | 0100 0000 1000 |
| Largest magnitude negative number | 1000 0000 0111 |
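The bit patterns in the table follow a regular shape, so they can be generated for any mantissa and exponent size. A small sketch, assuming two's complement for both parts (`important_values`, `decode` and `twos_complement` are illustrative names, not a standard API):

```python
def twos_complement(bits: str) -> int:
    """Interpret a bit string as a two's complement integer."""
    value = int(bits, 2)
    return value - (1 << len(bits)) if bits[0] == "1" else value


def decode(mantissa: str, exponent: str) -> float:
    """value = mantissa x 2^exponent, binary point after the first mantissa bit."""
    fraction = twos_complement(mantissa) / 2 ** (len(mantissa) - 1)
    return fraction * 2.0 ** twos_complement(exponent)


def important_values(mantissa_bits: int, exponent_bits: int) -> dict:
    """Build the (mantissa, exponent) patterns for the limiting values."""
    max_exp = "0" + "1" * (exponent_bits - 1)
    min_exp = "1" + "0" * (exponent_bits - 1)
    return {
        "Largest positive number": ("0" + "1" * (mantissa_bits - 1), max_exp),
        "Smallest positive number": ("0" * (mantissa_bits - 1) + "1", min_exp),
        "Smallest normalized positive number": ("01" + "0" * (mantissa_bits - 2), min_exp),
        "Largest magnitude negative number": ("1" + "0" * (mantissa_bits - 1), max_exp),
    }


table = important_values(8, 4)
for name, (mantissa, exponent) in table.items():
    print(f"{name}: {mantissa} {exponent} = {decode(mantissa, exponent)}")
```

For the 8-bit mantissa and 4-bit exponent used here, the decoded values are 127, 2^-15, 2^-9 and -128 respectively.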
