Week 06 Tutorial Questions

Objectives

introduce the IEEE-754 standard for representing floating-point numbers in binary, as an example of a more complex binary representation of data
introduce Unicode characters
understand the representation of pointers, structs and unions
practice pointer arithmetic
introduce function pointers

Code Review

Someone will be selected to present their print_bigger.s code. The reviewee should give a brief description of their code, and the class should ask questions, comment on the quality of the code, and suggest improvements. Each review should take about 10 minutes.

Note: The Code Review will take place in the second hour of the tutorials.

What decimal numbers do the following single-precision IEEE754-encoded bit-strings represent?
1. 0 00000000 00000000000000000000000
2. 1 00000000 00000000000000000000000
3. 0 01111111 10000000000000000000000
4. 0 01111110 00000000000000000000000
5. 0 01111110 11111111111111111111111
6. 0 10000000 01100000000000000000000
7. 0 10010100 10000000000000000000000
8. 0 01101110 10100000101000001010000
Each of the above is a single 32-bit bit-string, but partitioned to show the sign, exponent and fraction parts.
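To check your answers, here is a minimal sketch in C that decodes a 32-bit pattern two ways: by copying the bits into a float with memcpy (assuming float is IEEE-754 single precision, as on common platforms), and by applying the sign/exponent/fraction formula by hand. The function names bits_to_float and decode_ieee754 are hypothetical.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float; assumes float is IEEE-754 binary32. */
float bits_to_float(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Multiply x by 2^e; repeated doubling/halving is exact in double for these ranges. */
static double scale_pow2(double x, int e) {
    for (; e > 0; e--) x *= 2.0;
    for (; e < 0; e++) x /= 2.0;
    return x;
}

/* Decode the sign, exponent and fraction fields by hand. */
double decode_ieee754(uint32_t bits) {
    int sign = (bits >> 31) & 1;
    int exponent = (bits >> 23) & 0xFF;    /* 8-bit biased exponent */
    uint32_t fraction = bits & 0x7FFFFF;   /* 23-bit fraction */
    double value;

    if (exponent == 0xFF) {
        value = fraction ? NAN : INFINITY;                   /* special values */
    } else if (exponent == 0) {
        value = scale_pow2(fraction, -126 - 23);             /* 0.fraction x 2^-126 */
    } else {
        /* 1.fraction x 2^(exponent - 127): put back the hidden leading 1 */
        value = scale_pow2(fraction | 0x800000, exponent - 127 - 23);
    }
    return sign ? -value : value;
}
```

For example, decode_ieee754(0x40400000) is the pattern 0 10000000 10000000000000000000000, giving 1.1 (binary) x 2^(128 - 127) = 1.5 x 2 = 3.0.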
Convert the following decimal numbers into IEEE754-encoded bit-strings:
1. 2.5
2. 0.375
3. 27.0
4. 100.0
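A hand conversion can be checked by printing a float's bits in the same partitioned layout used in the previous question. A minimal sketch, again assuming a 32-bit IEEE-754 float; float_to_bitstring is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Write f's bits into buf as "S EEEEEEEE FFFFFFFFFFFFFFFFFFFFFFF"
   (34 characters plus '\0'). Assumes float is a 32-bit IEEE-754 value. */
void float_to_bitstring(float f, char buf[35]) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* copy the raw bits without conversion */

    int pos = 0;
    for (int i = 31; i >= 0; i--) {
        buf[pos++] = ((bits >> i) & 1) ? '1' : '0';
        if (i == 31 || i == 23) {
            buf[pos++] = ' ';         /* separate sign | exponent | fraction */
        }
    }
    buf[pos] = '\0';
}
```

For example, float_to_bitstring(-1.0f, buf) should produce 1 01111111 followed by 23 zeros: sign 1, biased exponent 127 (2^0), and an all-zero fraction.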
Show the complete Unicode (UTF-8) bit-string for each of the following Unicode characters (code points written in hexadecimal). If the character is ASCII, show its representation as a C char.

```
Symbol   Code
{        0x0007B
ë        0x000EB
ф        0x00444
≤        0x02264
𝔶        0x1D536
```

Note that the above codes do not include the extra bits that Unicode (UTF-8) needs to mark each character as a 1-, 2-, 3- or 4-byte sequence.
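The four sequence layouts can be written directly as shifts and masks. A minimal sketch (utf8_encode is a hypothetical name) that assumes cp is a valid Unicode code point and does not reject surrogates:

```c
#include <stdint.h>

/* Encode code point cp as UTF-8 into out[]; return the number of bytes (1-4),
   or 0 if cp is beyond the Unicode range. */
int utf8_encode(uint32_t cp, unsigned char out[4]) {
    if (cp < 0x80) {                 /* 0xxxxxxx */
        out[0] = (unsigned char) cp;
        return 1;
    } else if (cp < 0x800) {         /* 110xxxxx 10xxxxxx */
        out[0] = 0xC0 | (cp >> 6);
        out[1] = 0x80 | (cp & 0x3F);
        return 2;
    } else if (cp < 0x10000) {       /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = 0xE0 | (cp >> 12);
        out[1] = 0x80 | ((cp >> 6) & 0x3F);
        out[2] = 0x80 | (cp & 0x3F);
        return 3;
    } else if (cp < 0x110000) {      /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = 0xF0 | (cp >> 18);
        out[1] = 0x80 | ((cp >> 12) & 0x3F);
        out[2] = 0x80 | ((cp >> 6) & 0x3F);
        out[3] = 0x80 | (cp & 0x3F);
        return 4;
    }
    return 0;
}
```

For instance, U+21AB (↫) encodes as the three bytes E2 86 AB, the same bytes used in the unicodeNbytes example in the next question.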
Write C functions that determine the number of bytes and the number of symbols in a Unicode string. Use the function headers:
```
int unicodeNbytes(unsigned char *str) { ... }
int unicodeNsymbols(unsigned char *str) { ... }
```
Do not include the trailing '\0' in the count.

An example of use:
```
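unicodeNbytes("abc\xE2\x86\xAB" "def")  returns  9
unicodeNsymbols("abc\xE2\x86\xAB" "def")  returns  7
```
Each \xNN gives a single byte value in hexadecimal. The literal is split after \xAB because a C hex escape consumes every hex digit that follows it, so "\xABdef" would be read as one out-of-range escape. The three bytes \xE2\x86\xAB together encode a single Unicode symbol, ↫.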
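One possible sketch, assuming the input is valid UTF-8: every byte counts towards the byte total, and every byte that is not a continuation byte (10xxxxxx) starts a new symbol.

```c
/* Number of bytes before the terminating '\0'. */
int unicodeNbytes(unsigned char *str) {
    unsigned char *p = str;
    while (*p != '\0') {
        p++;
    }
    return (int) (p - str);
}

/* Number of symbols: each byte that is not a continuation byte (10xxxxxx)
   begins a new symbol, whether it is a 1-, 2-, 3- or 4-byte sequence. */
int unicodeNsymbols(unsigned char *str) {
    int count = 0;
    for (unsigned char *p = str; *p != '\0'; p++) {
        if ((*p & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}
```

Counting non-continuation bytes avoids decoding each sequence length; an alternative is to inspect each lead byte's prefix and skip 1 to 4 bytes at a time. Since a string literal has type char *, a cast such as unicodeNsymbols((unsigned char *) "abc") avoids a pointer-signedness warning.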
HTML entity notation provides a convenient way of writing Unicode characters and having them rendered in a Web browser. The web page https://dev.w3.org/html5/html-author/charref gives a (large) table of characters and shows how each entity can be written, using both symbolic and numeric notation. For example, the loopy arrow ↫ can be expressed either as &larrlp; or &#x21AB;, where 21AB is a hexadecimal value giving the significant bits from the Unicode encoding.

Write a C function to convert strings of Unicode characters into HTML entities. Use the function header below and write the results to standard output.
```
void unicode2html(unsigned char *str) { ... }
```
Note that you do not need to produce symbolic names for any character, except &amp; for the & character. Regular ASCII characters can be printed as themselves.

Hint: if you see a byte that looks like 11100010, then you know that it is the start of a 3-byte Unicode sequence. The following example shows how you need to process the 3 bytes:
```
11100010 10000110 10101011
```
which are interpreted as
```
1110 0010   10 000110   10 101011
```
where the leading 1110 and 10 prefixes are the Unicode "boilerplate" marking a 3-byte sequence and its continuation bytes, and the remaining bits (0010, 000110 and 101011) are the significant bits in the code. These bits are placed in a bit-string by themselves to produce the hex value for the symbol:
```
0010000110101011
i.e.
0010 0001 1010 1011
i.e.
   2    1    A    B
```
which is written as an HTML entity as &#x21AB;

You will need similar strategies for the 2-byte and 4-byte Unicode sequences.
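A sketch of the whole conversion, assuming valid UTF-8 input. decode_utf8 is a hypothetical helper that applies the strategy above to 1-, 2-, 3- and 4-byte sequences and returns how many bytes it consumed.

```c
#include <stdint.h>
#include <stdio.h>

/* Decode one UTF-8 sequence starting at s; store the code point in *cp and
   return the number of bytes consumed. Assumes the input is valid UTF-8. */
int decode_utf8(unsigned char *s, uint32_t *cp) {
    if (s[0] < 0x80) {                       /* 0xxxxxxx */
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {      /* 110xxxxx 10xxxxxx */
        *cp = ((uint32_t) (s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    } else if ((s[0] & 0xF0) == 0xE0) {      /* 1110xxxx 10xxxxxx 10xxxxxx */
        *cp = ((uint32_t) (s[0] & 0x0F) << 12) | ((uint32_t) (s[1] & 0x3F) << 6)
            | (s[2] & 0x3F);
        return 3;
    } else {                                 /* 11110xxx + three continuation bytes */
        *cp = ((uint32_t) (s[0] & 0x07) << 18) | ((uint32_t) (s[1] & 0x3F) << 12)
            | ((uint32_t) (s[2] & 0x3F) << 6) | (s[3] & 0x3F);
        return 4;
    }
}

void unicode2html(unsigned char *str) {
    unsigned char *p = str;
    while (*p != '\0') {
        uint32_t cp;
        int len = decode_utf8(p, &cp);
        if (cp == '&') {
            printf("&amp;");                 /* the one symbolic name required */
        } else if (cp < 0x80) {
            putchar((int) cp);               /* plain ASCII is printed as itself */
        } else {
            printf("&#x%X;", (unsigned) cp); /* numeric entity in hexadecimal */
        }
        p += len;
    }
}
```

Called as unicode2html((unsigned char *) "a \xE2\x86\xAB b"), it should print a &#x21AB; b.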
Draw diagrams to show the difference between the following two data structures:
```
struct {                 union {
   int   a;                 int   a;
   float b;                 float b;
} x1;                    } x2;
```
If x1 were located at &x1 == 0x1000 and x2 at &x2 == 0x2000, what would be the values of &x1.a, &x1.b, &x2.a and &x2.b?
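You can compare your diagrams with the real layout on your machine. A sketch, in which the tag names pair and either are hypothetical (the question's types are anonymous); the helper functions return each b member's byte offset from the start of its enclosing object.

```c
#include <stddef.h>
#include <stdio.h>

struct pair   { int a; float b; };   /* like x1: members laid out one after another */
union  either { int a; float b; };   /* like x2: members share the same bytes */

/* Byte offset of member b from the start of each kind of object. */
ptrdiff_t pair_offset_b(void) {
    struct pair x1;
    return (char *) &x1.b - (char *) &x1;
}

ptrdiff_t either_offset_b(void) {
    union either x2;
    return (char *) &x2.b - (char *) &x2;
}

/* Print the actual addresses of each object and its members. */
void show_addresses(void) {
    struct pair x1;
    union either x2;
    printf("&x1 = %p, &x1.a = %p, &x1.b = %p\n", (void *) &x1, (void *) &x1.a, (void *) &x1.b);
    printf("&x2 = %p, &x2.a = %p, &x2.b = %p\n", (void *) &x2, (void *) &x2.a, (void *) &x2.b);
}
```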

How large (#bytes) is each of the following C unions?

```
union { int a; int b; } u1;
union { unsigned short a; char b; } u2;
union { int a; char b[12]; } u3;
union { int a; char b[14]; } u4;
union { unsigned int a; int b; struct { int x; int y; } c; } u5;
```

You may assume sizeof(char) == 1, sizeof(short) == 2, sizeof(int) == 4.
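Your answers can be checked with sizeof. A sketch that declares the five unions at file scope and prints their sizes; print_union_sizes is a hypothetical name, and the results depend on the platform's alignment rules (on common ABIs an int must be 4-byte aligned, so a union containing an int is padded to a multiple of 4).

```c
#include <stdio.h>

/* The five unions from the question, at file scope so sizeof works anywhere. */
union { int a; int b; } u1;
union { unsigned short a; char b; } u2;
union { int a; char b[12]; } u3;
union { int a; char b[14]; } u4;
union { unsigned int a; int b; struct { int x; int y; } c; } u5;

/* A union is at least as large as its largest member, plus any trailing
   padding needed to keep its most strictly aligned member aligned. */
void print_union_sizes(void) {
    printf("u1: %zu\n", sizeof u1);
    printf("u2: %zu\n", sizeof u2);
    printf("u3: %zu\n", sizeof u3);
    printf("u4: %zu\n", sizeof u4);
    printf("u5: %zu\n", sizeof u5);
}
```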

Consider the following C union
```
union _all {
   int   ival;
   char  cval;
   char  sval[4];
   float fval;
   unsigned int uval;
};
```
If we define a variable with union _all var; and then assign var.uval = 0x00313233;, what will each of the following printf calls produce?
1. printf("%x\n", var.uval);
2. printf("%d\n", var.ival);
3. printf("%c\n", var.cval);
4. printf("%s\n", var.sval);
5. printf("%f\n", var.fval);
6. printf("%e\n", var.fval);
You can assume that bytes are arranged from right to left in increasing address order (i.e. little-endian order).
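To see which byte each member reads, it helps to dump the union's bytes in increasing address order. A sketch with a hypothetical helper dump_bytes; the union is copied from the question.

```c
#include <stddef.h>
#include <stdio.h>

union _all {
   int   ival;
   char  cval;
   char  sval[4];
   float fval;
   unsigned int uval;
};

/* Print each byte of an object, lowest address first. */
void dump_bytes(const void *obj, size_t n) {
    const unsigned char *p = obj;
    for (size_t i = 0; i < n; i++) {
        printf("offset %zu: 0x%02x\n", i, p[i]);
    }
}
```

For example, after union _all var; var.uval = 0x00313233; the call dump_bytes(&var, sizeof var) on a little-endian machine shows 0x33, the least significant byte of uval, at offset 0.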

Revision

Write a C function, six_middle_bits, which, given a uint32_t, extracts and returns the middle six bits.
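A minimal sketch, assuming "the middle six bits" means bits 13 to 18, so that 13 bits lie on either side of them:

```c
#include <stdint.h>

/* Return bits 13..18 of u (13 bits below, 6 in the middle, 13 above). */
uint32_t six_middle_bits(uint32_t u) {
    return (u >> 13) & 0x3F;   /* shift the field down, then mask 6 bits */
}
```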
What does the following printf statement display?
```
printf("%c%c%c%c%c%c", 72, 101, 0x6c, 108, 111, 0x0a);
```
Try to work it out without simply compiling and running the code. The command "man 7 ascii" will help with this. Then check your answer by compiling and running it.

Consider the following C program:

```
#include <string.h>

int main(void)
{
   int x = 100;
   char s[8];
   int y = 200;
   ...
   strcpy(s, "a long name");
   ...
}
```

If the memory looks like

```
Address       Value
0x7ffee32c    100
0x7ffee324    ????????
0x7ffee320    200
```

at the start of the program, show what it looks like after the strcpy() is executed.
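To make the overwrite concrete, here is a sketch that models the 16 bytes from 0x7ffee320 to 0x7ffee32f as an array, assuming 4-byte ints and the layout in the question (y at offset 0, s at offsets 4 to 11, x at offset 12). model_strcpy_overflow is a hypothetical helper, and memcpy into the model stands in for the real strcpy so the model itself stays within bounds.

```c
#include <string.h>

/* Model of memory 0x7ffee320..0x7ffee32f:
   offset 0 = y, offsets 4..11 = s[0..7], offset 12 = x. */
void model_strcpy_overflow(unsigned char mem[16]) {
    int x = 100;
    int y = 200;

    memcpy(&mem[0], &y, sizeof y);    /* y at 0x7ffee320 */
    memset(&mem[4], '?', 8);          /* s[8] at 0x7ffee324: contents unknown */
    memcpy(&mem[12], &x, sizeof x);   /* x at 0x7ffee32c */

    /* strcpy(s, "a long name") copies 11 characters plus '\0' = 12 bytes,
       so the last 4 of them land on top of x. */
    const char *src = "a long name";
    memcpy(&mem[4], src, strlen(src) + 1);
}
```

After the call, offsets 4 to 11 hold "a long n" and the final four bytes of the string ('a', 'm', 'e', '\0') sit where x used to be, while y is untouched.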

Consider the following generic code which sets the value of a pointer (to a hopefully legal address on the stack), and subsequently increments it:
```
Type *ptr = (Type *) 0x7ffff000;
...
ptr = ptr + 1;
// what value does ptr have here?
```
For each of the following variants of Type, show what the value of ptr would be after it is incremented:
1. int
2. short int
3. char
4. double
5. struct xyz { int x; int y; int z; }
6. struct abc { char a; int b; float c; }
State explicitly any assumptions you make about the sizes of the various types.
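Since ptr + 1 advances by sizeof(Type) bytes, the answers follow from the type sizes. A sketch that computes the new addresses with integer arithmetic on uintptr_t, rather than forming pointers to 0x7ffff000 (which is not a real object in this program); next_address and show_increments are hypothetical names.

```c
#include <stdint.h>
#include <stdio.h>

struct xyz { int x; int y; int z; };
struct abc { char a; int b; float c; };

/* For Type *ptr, ptr + 1 is sizeof(Type) bytes after ptr. */
uintptr_t next_address(uintptr_t addr, size_t elem_size) {
    return addr + elem_size;
}

void show_increments(void) {
    const uintptr_t start = 0x7ffff000;
    printf("int:        0x%lx\n", (unsigned long) next_address(start, sizeof(int)));
    printf("short int:  0x%lx\n", (unsigned long) next_address(start, sizeof(short int)));
    printf("char:       0x%lx\n", (unsigned long) next_address(start, sizeof(char)));
    printf("double:     0x%lx\n", (unsigned long) next_address(start, sizeof(double)));
    printf("struct xyz: 0x%lx\n", (unsigned long) next_address(start, sizeof(struct xyz)));
    printf("struct abc: 0x%lx\n", (unsigned long) next_address(start, sizeof(struct abc)));
}
```

Note that struct abc may be larger than the sum of its members, because padding after char a keeps int b aligned.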
Consider the following struct definition defining a type for points in a three-dimensional space:
```
typedef struct _Coord {
   unsigned int x;
   unsigned int y;
   unsigned int z;
} Coord;
```
and the following array definition
```
{
   Coord coords[10];
}
```
Write code that iterates over the coords array using only a pointer variable p (of type Coord *), setting each item in the array to (0,0,0). Do not use an index variable.
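One possible sketch, written as a hypothetical function zero_coords that receives the array. p starts at the first element and stops one past the last, so no index is needed.

```c
typedef struct _Coord {
   unsigned int x;
   unsigned int y;
   unsigned int z;
} Coord;

/* Set all 10 elements of coords to (0,0,0) by walking a pointer. */
void zero_coords(Coord coords[10]) {
    Coord *p;
    for (p = coords; p < coords + 10; p++) {   /* coords + 10 is one past the end */
        p->x = 0;
        p->y = 0;
        p->z = 0;
    }
}
```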

Consider the following small C program:

```
#include <stdio.h>

int main(void)
{
   int n = 1234;
   int *p;

   p = &n;
   n++;
   printf("%d\n", *p);
   printf("%p\n", (void *) p);
   p++;
   printf("%p\n", (void *) p);
   return 0;
}
```

If we assume that the variable n has address 0x7654, then what values will the program print?

What is the output from the following program and how does it work? Try to work out the output without copy-paste-compile-execute.

```
#include <stdio.h>

int main(void)
{
   char *str = "abc123\n";
   char *c;

   for (c = str; *c != '\0'; c++)
      putchar(*c);

   return 0;
}
```

Without using any index variables in your function, implement a function myStrCmp(a,b) that behaves like the standard strcmp(a,b) function:
- if a appears before b in the dictionary, return a negative value
- if a appears after b in the dictionary, return a positive value
- if a and b are the same string return 0
You can assume that both strings are terminated by a '\0' character. Make sure you handle the case where the strings are different lengths, and one string is a sub-string of the other.

Use the following function template:
```
int myStrCmp(char *a, char *b) { ... }
```
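One possible solution sketch: advance both pointers while the characters match and a has not ended, then return the difference of the first pair that differs.

```c
/* Compare a and b like strcmp, walking pointers instead of using indexes. */
int myStrCmp(char *a, char *b) {
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    /* Compare as unsigned char, as strcmp does. */
    return (unsigned char) *a - (unsigned char) *b;
}
```

When a is a prefix of b, the loop stops at a's '\0', so the result is negative; when b is a prefix of a, the loop stops where *b is '\0' (and therefore differs from *a), so the result is positive.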
