So. I know Nick has thrown a bunch of ASCII and serial stuff at you guys,
but I don’t know if anyone has put it together for you with the binary
stuff in Luke’s handout from the start of semester. So this is a $7
guide to binary encoding, which I will develop into ASCII and printing in C.
Firstly, let’s get some bullshit out of the way. People say computers
only understand 1s and 0s. This is, of course, ridiculous. Computers
don’t understand anything. They can store 1s and 0s like a
motherfucker, but they don’t have the faintest idea what they “mean”.
That’s because what they “mean” is entirely dependent on context. The
8 bits 1011 0011 don’t “mean” 179. They don’t “mean” -77, either, for
those who are up to speed with 2’s complement. They might be 8 bits
from the middle of a longer number, or a decimal number. Who knows?
So the computer stores 1s and 0s, and we assign meaning to that
depending on context. They are stored in chunks of 8 bits because
that’s a nice round number if you’re working in binary. People have
played with 7 bits and 9 bits, or 10 bits, but these days 8 is
standard. Note again, though, that it’s a chosen standard – 8 bits to a
byte is not actually any inherent limit.
One of the things humans really like doing with computers, it turns out, is
typing and printing words. And the thing about words is that they have
letters. So how do we store words as ones and zeros? Well, we can
pretty easily see that letters map to numbers. There are 26 letters in
the alphabet. But when we print them, we need a different number to
say “this is a capital”. So we need 52 numbers. Plus some punctuation.
Turns out we need about a hundred characters to pretty much cover
written English. Then, of course, we’re printing this on a screen – we
might want some “characters” to tell the screen what to do (‘tab’, for
example, to shortcut printing 4 spaces, or “new line”).
We can still fit all that handily in 128 characters. Numbering 128 things
takes 7 bits, which is less than a byte, so let’s round up and say we’ll
store one character per byte.
There are technical reasons for that, but basically, it’s as easy to
get a full byte from memory as it is to get half a byte (or a nybble,
if you prefer), and it’s easier to convert a byte to a number than to
split it in half and convert each 4 bits into a number. So now if we
have some standard way to assign numbers to letters, we can just store
those numbers, and then when we retrieve them from memory, we can look
up the letter, and print that. So back in the day, a bunch of
boffins got together and agreed on a standard for converting letters
to numbers. Actually, because this is computing, 2 separate bunches of
guys agreed on 2 separate standards, and then the market decided. So
what we have now is ASCII – the American Standard Code for Information
Interchange. Because EBCDIC was even worse.
Which brings me to C. I’ve been talking about storing and printing,
and perhaps rather than moving on to serial transmission, I’ll look at
how that actually works, and what the implications are.
Let’s start with the following:
#include <stdio.h>
int main(void){
    char Alice = 'A';        /* a character in single quotes */
    char Bob = 66;           /* a plain number */
    printf("%c\n", Alice);
    printf("%c\n", Bob);
}
The next $7 guide will look at why main has int and void, because that
seems to be a popular question, but for now, focus on what’s in main.
So we say to The Machine: “I would like enough memory to store a character,
and I will call that chunk of memory Alice. In that chunk of memory,
please store the value of capital A.”
The single quotes around ‘A’ are important. If we just wrote
char Alice = A;
then The Machine would think we had another chunk of memory
somewhere called A, and get confused because it couldn’t find it.
So the single quotes say “Store the literal thing inside these
quotes”. The Machine is just clever enough to know that it can’t store
‘A’, so it goes away and looks up the ASCII table and gets a number
for A. That will be 65.
Reasonably enough, capital B is the letter after A. Normally, it’s
easier to put ‘B’ and make The Machine look it up, but this
demonstrates that we can skip that step and put a number in directly.
Note that we don’t need quotes around numbers. The Machine is smrt
enough to recognise a number.
[Bonus points: digits have ASCII codes that don’t match the “number”
represented by the digit. If you have “Alice = ‘1’;”, then the quotes
say “store the value of the thing in the quotes”, and the ASCII value
for the digit “1” is 49. Yay?]
So now we move on to printing those. printf() is an amazingly clever
routine. We put what we want to print in double quotes this time, and
we say to printf “I want you to print a character here”. That’s the
%c. %c just says “I will give you a number. Go look the number up in
the ASCII table, and print the letter that matches.”
Of course, we have lied to poor printf – what we actually give it is the
name of a chunk of memory called Alice. But printf can just about
manage to pull a number out of that, and that number is 65, which
matches A.
Likewise B.
So this is all very fascinating, and I know you’re all thrilled, but
this is a hell of a big deal to be making out of looking up letters.
Turns out, of course, that it’s more interesting than that. Try this:
#include <stdio.h>
int main(void){
    char Alice = 65;
    char Bob = Alice + 1;
    char Zarathustra = Alice + 25;
    char Littlebob = Bob + 32;
    char Zero = 0x30;        /* hexadecimal for 48 */
    /* %c prints the letter, %d prints the number stored for it */
    printf("%c %d \n", Alice, Alice);
    printf("%c %d \n", Bob, Bob);
    printf("%c %d \n", Zarathustra, Zarathustra);
    printf("%c %d \n", Littlebob, Littlebob);
    printf("%c %d \n", Zero, Zero);
}
Yow. Mind. Blown. Now you see why this is the most exciting way I
could think of to spend my Friday night.
Lookit:
#include <stdio.h>
int main(void){
    int i = 0;
    for ( i = 0; i < 10; i++){
        printf("%c \n", 48 + i);     /* 48 is the ASCII code for '0' */
    }
    for ( i = 1; i < 27; i++){
        int Alice = 64 + i;          /* 64 + 1 is 65, the code for 'A' */
        printf(" Big %c ! little %c \n", Alice, Alice + 32);
    }
}
Now we’ve made Alice an int. The only difference between a char and an int
is how much memory is put aside. “char” says “give me one byte”. “int” says
“give me enough memory to store an int”. That’s technically dependent on the
system, but on most desktops it is 4 bytes. So I’m using more memory, but
since a “char” is just “a number that fits in 1 byte”, it also fits in 4
bytes. And since it’s just a number, there’s no difference between 01000001 and
00000000 00000000 00000000 01000001.
And if that’s not exciting enough:
#include <stdio.h>
int main(void){
    int i = 0;
    for ( i = 0; i < 26; i++){
        /* 'A' is just a number, so 'A' + i walks along the alphabet */
        printf(" Big %c ! little %c \n", 'A' + i, 'A' + i + 32);
    }
}
The value of ‘A’ is a number, and we can add it to other numbers and do math
with it directly.
Now, if you’ll excuse me, I think I need a cold shower. Any questions?
There are 2 more $7 guides in the works. One of them addresses why “main()”
works even though you should write “int main(void)” anyway (or perhaps why
you should write “int main(void)” when “main()” works just as well). That will
include what the hell “(int argc, char *argv[])” means, and probably also a
rant about why the Pugh book has issues in the 21st century.
The other one will be on serial transmission, honest, and how to know when
01000110 01110101 01100011 01101011 means a very large number, and when it
means a very rude word (short answer, you can’t. Longer answer, it’s up to you!)
But I’m open to suggestions if anyone has requests.