 |
Unnecessary Whitespace
|
In a programming language such as C, programmers have a tendency to use a lot of extra whitespace in their programs. The effect? Thousands of bytes of wasted disk space, storing useless 0x09 and 0x20 characters. Take, for example, the following program, which calculates the denominations of change (in US currency) for a transaction entered by the user:
| Listing 1 |
|---|
#include <stdio.h>
struct DENOM
{
  char *NameS, *NameP;
  int value;
}
denom[] =
{
  "100 dollar bill", "100 dollar bills", 10000,
  "50 dollar bill", "50 dollar bills", 5000,
  "20 dollar bill", "20 dollar bills", 2000,
  "10 dollar bill", "10 dollar bills", 1000,
  "5 dollar bill", "5 dollar bills", 500,
  "1 dollar bill", "1 dollar bills", 100,
  "quarter", "quarters", 25,
  "dime", "dimes", 10,
  "nickel", "nickels", 5,
  "penny", "pennies", 1,
};
#define denoms (sizeof(denom)/sizeof *denom)
int
main (void)
{
  float cash, price, change;
  int c, num, intchange;
  printf ("Enter cash tendered: ");
  scanf ("%f", &cash);
  printf ("Enter price of item: ");
  scanf ("%f", &price);
  change = cash - price;
  if (change < 0)
    {
      printf ("Not enough money, you idiot!\n");
      return -1;
    }
  else if (change == 0)
    {
      printf ("No change.\n");
      return -1;
    }
  else
    {
      printf ("The change is: $%0.2f\n\n", change);
      intchange = (int) ((change + .005) * 100);
      for (c = 0; c < denoms; c++)
        {
          num = intchange / denom[c].value;
          if (num)
            {
              printf ("%d %s\n", num, num == 1 ? denom[c].NameS : denom[c].NameP);
              intchange -= denom[c].value * num;
            }
        }
      return 0;
    }
}
|
As you can see, this program is several pages long and uses 1,297 bytes of storage. Now take a look at the more efficient version, which does exactly the same thing:
| Listing 2 |
|---|
#include<stdio.h>
struct DENOM{char*NameS,*NameP;int value;}denom[]={"100 dollar bill",
"100 dollar bills",10000,"50 dollar bill","50 dollar bills",5000,
"20 dollar bill","20 dollar bills",2000,"10 dollar bill","10 dollar bills",1000,
"5 dollar bill","5 dollar bills",500,"1 dollar bill","1 dollar bills",100,
"quarter","quarters",25,"dime","dimes",10,"nickel","nickels",5,"penny",
"pennies",1,};
#define denoms (sizeof(denom)/sizeof *denom)
int main(void){float cash,price,change;int c,num,intchange;printf(
"Enter cash tendered:");scanf("%f",&cash);printf("Enter price of item:");scanf(
"%f",&price);change=cash-price;if(change<0){printf(
"Not enough money, you idiot!\n");return-1;}else if(change==0){printf(
"No change.\n");return-1;}else{printf("The change is: $%0.2f\n\n",change);
intchange=(int)((change+.005)*100);for(c=0;c<denoms;c++){num=intchange/
denom[c].value;if(num){printf("%d %s\n",num,num==1?denom[c].NameS:denom[c].NameP)
;intchange-=denom[c].value*num;}}return 0;}}
|
The new version of the code is only 986 bytes. That's a significant saving, and it even includes the linefeed at the end of each line that makes the code fit in an 80-column editor. In fact, there are many advantages to compressing your code in this fashion (which I will call source-level compression):

|
When using one-character variable
names, remember that variables are
case-sensitive, and the underscore
character is also valid. That gives
you 53 different one-character
variable names to work with!
|
- The most obvious advantage is that the code takes up less disk space. By removing unnecessary whitespace, we decreased the file size by 311 bytes (from 1,297 to 986). That's a saving of about 24%.
- The resultant code also takes up fewer pages. If you printed out both versions of the code, the second would waste far less paper than the first.
- Because the new code fits on a page, it is easier to debug, since you can see the entire program at once. There is no need to page up and down to try to follow the flow of execution to look for bugs.
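For anyone who wants to check the arithmetic in the first bullet, here is a minimal sketch. The function names `bytes_saved` and `percent_saved` are invented for this illustration and appear nowhere in the listings, and the code is laid out with whitespace purely for legibility.

```c
#include <stddef.h>

/* Hypothetical helper: bytes removed when a file shrinks from
   `before` bytes to `after` bytes. Returns 0 if it did not shrink. */
size_t bytes_saved(size_t before, size_t after)
{
    return before > after ? before - after : 0;
}

/* Hypothetical helper: the saving as a percentage of the original size.
   Returns 0.0 for an empty original to avoid dividing by zero. */
double percent_saved(size_t before, size_t after)
{
    if (before == 0)
        return 0.0;
    return 100.0 * (double)bytes_saved(before, after) / (double)before;
}
```

With the sizes quoted above, `bytes_saved(1297, 986)` is 311 and `percent_saved(1297, 986)` comes out just under 24.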
That's all useful stuff, but we haven't yet made the program optimal. We can make the program take up even less space by using shorter variable names:
| Listing 3 |
|---|
#include<stdio.h>
struct D{char*s,*p;int v;}d[]={"100 dollar bill","100 dollar bills",10000,
"50 dollar bill","50 dollar bills",5000,"20 dollar bill","20 dollar bills",2000,
"10 dollar bill","10 dollar bills",1000,"5 dollar bill","5 dollar bills",500,
"1 dollar bill","1 dollar bills",100,"quarter","quarters",25,"dime","dimes",10,
"nickel","nickels",5,"penny","pennies",1,};
#define ds (sizeof(d)/sizeof *d)
int main(void){float m,p,r;int c,n,i;printf("Enter cash tendered:");scanf("%f",
&m);printf("Enter price of item:");scanf("%f",&p);r=m-p;if(r<0){printf(
"Not enough money, you idiot!\n");return-1;}else if(r==0){printf("No change.\n")
;return-1;}else{printf("The change is: $%0.2f\n\n",r);i=(int)((r+.005)*100);for(
c=0;c<ds;c++){n=i/d[c].v;if(n){printf("%d %s\n",n,n==1?d[c].s:d[c].p);i-=d[c].v*
n;}}return 0;}}
|
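The sidebar's figure of 53 one-character names (26 lowercase letters, 26 uppercase letters, and the underscore) can be checked with a short sketch. The function names below are hypothetical, and the sketch assumes the default "C" locale and the 7-bit ASCII range.

```c
#include <ctype.h>

/* Hypothetical helper: nonzero if `c` can serve as a one-character
   C identifier, i.e. a letter of either case or the underscore.
   Digits are excluded because an identifier cannot start with one. */
int is_one_char_name(int c)
{
    return c == '_' || isalpha(c);
}

/* Count the usable one-character names among the 7-bit ASCII codes. */
int count_one_char_names(void)
{
    int c, n = 0;

    for (c = 0; c < 128; c++)
        if (is_one_char_name(c))
            n++;
    return n;
}
```

Under those assumptions `count_one_char_names()` returns 53, matching the sidebar.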
And even more can be gained by eliminating the unnecessary repetition of the string "dollar bill" in the section that declares the strings for all the denominations:
| Listing 4 |
|---|
#include<stdio.h>
#define B "dollar bill"
struct D{char*s,*p;int v;}d[]={"100 "B,"100 "B"s",10000,"50 "B,"50 "B"s",5000,
"20 "B,"20 "B"s",2000,"10 "B,"10 "B"s",1000,"5 "B,"5 "B"s",500,"1 "B,"1 "B"s",
100,"quarter","quarters",25,"dime","dimes",10,"nickel","nickels",5,"penny",
"pennies",1,};
#define ds (sizeof(d)/sizeof *d)
int main(void){float m,p,r;int c,n,i;printf("Enter cash tendered:");scanf("%f",
&m);printf("Enter price of item:");scanf("%f",&p);r=m-p;if(r<0){printf(
"Not enough money, you idiot!\n");return-1;}else if(r==0){printf("No change.\n")
;return-1;}else{printf("The change is: $%0.2f\n\n",r);i=(int)((r+.005)*100);for(
c=0;c<ds;c++){n=i/d[c].v;if(n){printf("%d %s\n",n,n==1?d[c].s:d[c].p);i-=d[c].v*
n;}}return 0;}}
|
The final code is only 734 bytes, which is 43% smaller than the original file. And we didn't use any kind of fancy compression algorithm, nor did the program change in operation at all.
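The `B` macro in Listing 4 works because the C compiler joins adjacent string literals into one. The sketch below isolates that trick. The names `singular`, `plural`, and `bill_name` are made up for the illustration, and the spaces around `B` are optional in C (Listing 4 omits them to save bytes).

```c
#include <string.h>

#define B "dollar bill"   /* the same macro as in Listing 4 */

/* After macro expansion each initializer is a run of adjacent string
   literals, which the compiler concatenates into a single constant. */
static const char *const singular = "100 " B;      /* "100 dollar bill"  */
static const char *const plural   = "100 " B "s";  /* "100 dollar bills" */

/* Hypothetical helper: pick the singular or plural name by count. */
const char *bill_name(int count)
{
    return count == 1 ? singular : plural;
}
```

The concatenation happens at compile time, so the running program sees ordinary string constants and pays nothing extra for the shorter source.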
Now you might be saying, "I just compress my files using gzip or PKZIP. I don't need to compress my code any other way." Well, I've got some more thoughts for you:

|
Save yourself some
typing by doing this:
#define p printf
|
- If you use gzip to compress Listing 1 and Listing 4, the results are 556 and 448 bytes respectively. So source-level compression still saves almost 20%, even after gzip has done its work.
- Algorithmically compressed files can't be compiled. Programs compressed at the source level can still be compiled and edited as-is, without any annoying decompression stage.
|