Vinay's C Tutorial

Variables and Basic Data Types

What Are Variables?

Variables are places where the program stores data. More precisely, a variable is a memory location, which your program refers to using a name called an identifier. As the name suggests, the value of a variable can change (vary) as the program executes. Variables are central to most programming languages, so if you program in any language, you are probably already familiar with the idea. Variables can hold several types of data, including numbers, characters and strings.

Introduction to Variables

Now look at the following program to add two numbers.

#include <stdio.h>

int main(void)
{
     int Num1, Num2;
     int Ans;

     Num1 = 5;
     Num2 = 2;
     Ans = Num1 + Num2;

     printf ("The answer is %d\n", Ans);
     return 0;
}

When you run the program, it says 'The answer is 7'.

The program includes stdio.h and declares main like the hello program in the previous chapter. The first two lines of the main function declare three variables, Num1, Num2 and Ans, to be integers. Later in this chapter you'll learn about other types of variables, but for now we'll stick to integers. Every variable must be declared before you can use it. By declaring the variables, we inform the compiler of our intent to use them, and tell it what type of data they hold.

A declaration begins with a type specifier, int in this case, which tells the compiler what type of variable we're declaring. It is followed by a list of one or more variable names, separated by commas. The first line declares two variables, Num1 and Num2; the second declares only one, Ans. Like all other statements in C, a declaration is terminated with a semicolon. We could just as well have used three lines to declare the three variables, like this:

int Num1;
int Num2;
int Ans;

Or we could declare all three variables in one line, like this

int Num1, Num2, Ans;

In the original ANSI C standard (C89), all declarations in a block must precede any other statements. So the following would be illegal in C89:

main()
{
     int x = 10, y;
     y = x * 10;

     int z;
}

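It's illegal in C89 because z is declared after the statement y = x * 10. Since the C99 standard, declarations and statements may be mixed, so modern compilers accept this code; many programmers still put declarations first, as this tutorial does.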

Variable names mean nothing to the compiler; we could just as well have named the three variables Apple, Orange and Grape. However, it is better to use variable names that tell the reader what each variable is used for. The rules for naming variables are given a little later.

'Integer' has the same meaning as in mathematics (0, 1, 2, -1, -2, -3 etc., but not 0.1, 1.5 etc.), except that in C its range is limited. Don't worry about the range of integers for now; we'll get to that later in this chapter.

Numbers like 5, 2 and strings like "Hello, world.\n" have fixed values and are called constants.

The meaning of the next three lines should be fairly obvious. Num1 = 5 and Num2 = 2 assign the values of 5 and 2 to the variables Num1 and Num2 respectively. The line

Ans = Num1 + Num2;

adds the values of Num1 and Num2 and assigns the result to Ans. The + sign has much the same meaning as it does in mathematics. Symbols like + and = are called operators in C. Num1 + Num2 is called an expression; so is the whole assignment Ans = Num1 + Num2, and adding the semicolon turns it into a statement.

This chapter only introduces the fundamentals of operators that you'll need to properly study variables and data types. Operators are dealt with in detail in a later chapter on Operators and Expressions.

For now just remember that +, - and () have their usual mathematical meanings, while = assigns the value on its right to the variable on its left. The operator * represents multiplication and / represents division. As in mathematics, 2 + 3 * 5 equals 17 rather than 25, but (2 + 3) * 5 equals 25.

Variable Naming Rules

Variable names can use the letters A-Z and a-z, the digits 0-9 and the underscore character _. Variable names are case sensitive, so ABC, abc, Abc, aBc and ABc are all different variables. However, a variable name can't begin with a digit: 123ABC is an illegal name, but ABC123 is legal. Following are some legal variable names:

a
abc
ABC
ABC123
abc123
_123
GrandTotal

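Note that _123 is legal because it starts with an underscore, not a digit. Even so, it's best not to begin your own names with an underscore, because the C standard reserves many such names for the compiler and its library.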

Words like int and return, which have a special meaning in the C language, are called reserved keywords and can't be used as variable names. However, Int and RETURN are both legal variable names, because C is case sensitive and int is different from Int. (Note that include, as in #include, is not a keyword but a preprocessor directive name.)

The printf Syntax - A Little More Detail

That leaves only the printf statement at the end to be explained. Here we've passed not one argument (as in the hello program of the last chapter) but two arguments to printf: the first is the string "The answer is %d\n" and the second is the variable Ans. Everything in the string except the %d is printed verbatim on the screen, and \n, as before, acts as a newline character and moves the cursor to the next line. But what about the %d?

The % sign followed by a letter (here, d) is called a format specifier, and tells printf how to interpret the next argument. The letter d tells it that the next argument is an integer.

The d stands for decimal: it tells printf to print the integer in decimal format, i.e. the normal number format we use to count, with the digits 0 to 9. (printf also accepts %i, which means the same as %d when printing.) We could instead use %x to ask printf to print the integer in hexadecimal format, i.e. using sixteen digits, 0 to 9 and a to f. If you don't understand the various number formats (binary, octal, decimal, hexadecimal etc.), don't worry about it at the moment; you won't need them in the immediate future.

There is more to the printf syntax than is explained here. What is explained here is enough for the moment. The printf syntax is dealt with in detail in the chapter on Basic I/O. (Yes, the f in printf does stand for Formatted).

The first argument we pass to printf is always a string (enclosed in double quotes), called the format string. It tells printf how to format the text and how to interpret the rest of the arguments (by way of format specifiers). The format string can contain any number of format specifiers. When printf displays the string, each format specifier is replaced with the value of the corresponding argument. However, you must be careful to pass the correct number of arguments, of the correct types and in the correct order; otherwise printf is likely to print junk on the screen. For example, try replacing the printf statement in the above program with:

printf ("%d + %d = %d", Num1, Num2, Ans);

The program now prints '5 + 2 = 7'. The first %d is replaced by the value of the second argument (Num1), the second %d by the value of the third argument (Num2), and the third %d by the value of the fourth argument (Ans). The rest of the string is printed as is.

Have you guessed how to print the % sign using printf? Just as you use "\\" to print '\', you use "%%" to print '%'. Note that there is a fundamental difference between format specifiers (beginning with %) and escape sequences (beginning with \). Format specifiers are part of the printf syntax only: a % sign behaves like any other character in strings that are not used as printf format strings (we'll deal with other uses of strings in later chapters). Escape sequences, however, are part of the C syntax itself, and each one represents a single character in any string.

Integer Math

Now in the program given at the beginning of this chapter, replace the line

Ans = Num1 + Num2;

by

Ans = Num1 / Num2;

The program responds with "The answer is 2". But 5 / 2 is 2.5, not 2! Because Num1 and Num2 are both of type int, C performs integer division, and the result is an integer (as is Ans, which we also declared as int). When we convert fractions to integers by hand, we usually round them, so 2.5 would become 3. Integer division in C, however, simply discards all digits to the right of the decimal point. Thus, the answer becomes 2.

You've got to be a little careful when using integer math; otherwise you may get unacceptably inaccurate results. For example, consider a short program that converts a few temperatures from Fahrenheit to Celsius. Recall that:

Celsius = 5/9 * (Fahrenheit - 32);

is the equation used to convert from Fahrenheit to Celsius. The following program uses this equation.

#include <stdio.h>

int main(void)
{
     printf (" 32 F is %d C\n", 5 / 9 * (32 - 32));
     printf ("100 F is %d C\n", 5 / 9 * (100 - 32));
     printf ("212 F is %d C\n", 5 / 9 * (212 - 32));
     return 0;
}

If you've understood all the previous programs in this chapter, this one should be fairly easy to understand. The only new point is that here the second argument passed to printf is not a variable but an expression. The program first evaluates (i.e. finds the value of) the expression and then passes the result to the function (here, printf).

When you run the program, you'll find that the program gives the following result:

 32 F is 0 C
100 F is 0 C
212 F is 0 C

All three conversions come out as zero! Why is that? Look carefully at the expression. As in mathematics, multiplication and division have the same priority (called precedence), and operators of equal precedence are evaluated from left to right. So 5 / 9 is evaluated first, and its result is then multiplied by the value in parentheses. Now 5 / 9 is actually 0.5555..., but since we are doing integer math, it is truncated to 0, and zero multiplied by anything is zero. When using integer math, the multiplication should be done first. For example, the second printf should be replaced by:

printf ("100 F is %d C\n", 5 * (100 - 32) / 9);

This yields 37 which is still inaccurate (correct value is 37.7777...) but a lot better than zero!

More Accurate Math - Floating Point

You'll often find that integer arithmetic is too inaccurate for your needs. It's then time to move to floating point, which is a way of representing real numbers that are not necessarily integers, such as 0.5, 1.25 or 3.142. It's called floating point because the decimal point can 'float': the number of digits after the decimal point is not fixed.

The following program is similar to the division program you wrote earlier (dividing 5 by 2, remember?), but it's now written using floating point instead of integers. Also, this program no longer uses the Num1 and Num2 variables, because you're probably reasonably familiar with expressions by now (the first program of this chapter used them only for clarity).

#include <stdio.h>

int main(void)
{
     float Ans;

     Ans = 5.0 / 2.0;
     printf ("5 / 2 is %f\n", Ans);
     return 0;
}

Now, the program says "5 / 2 is 2.500000". All those zeroes after the decimal point may look unsightly, but this program shows you how to use floating point and thus serves its purpose. You'll learn how to format the text better in the chapter on Formatted Output.

Notice that Ans is now declared as type float. This tells the compiler that Ans is a floating point variable. Moreover, in printf the %d is replaced by %f.

Finally, the expression in the line following the declaration of Ans is written as 5.0 / 2.0 to tell the compiler that the numbers are floating point. If it were written as 5 / 2, the compiler would calculate it using integer arithmetic and only then convert the result to floating point, so you'd end up with 2.0 instead of 2.5.

After reading the above program, you'll no doubt conclude that there is nothing particularly complicated about using floating point. So you're probably wondering why anyone bothers with integer arithmetic. There are several reasons. Integer arithmetic is exact within its range, whereas many floating point values are only approximations (0.1, for example, can't be stored exactly). Integer arithmetic has also traditionally been faster than floating point, even on processors like the 486 and Pentium, which have a floating point unit (FPU, also called math co-processor or numeric processor) built in. The difference is too small to affect any program you'll write in the near future, but it can become noticeable in calculations repeated thousands or tens of thousands of times, especially on 386 and earlier processors. Programmers writing high-performance programs like animation and game software often go to great lengths to avoid floating point. In this tutorial, however, floating point is used freely. You can learn to use integer-only calculations once you know C thoroughly and begin to write really demanding applications.


Now, let's rewrite the temperature-conversion program to use floating point.

#include <stdio.h>

int main(void)
{
     printf (" 32 F is %f C\n", 5.0 / 9.0 * (32 - 32));
     printf ("100 F is %f C\n", 5.0 / 9.0 * (100 - 32));
     printf ("212 F is %f C\n", 5.0 / 9.0 * (212 - 32));
     return 0;
}

The program now yields fairly accurate results. Moreover, there is no longer a need to do the multiplication first, because with floating point, 5.0 / 9.0 will not be truncated to zero.

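For our purposes, float, which is usually 32 bits long, is precise enough. But there are also the double and long double types, which are typically 64 and 80 bits long respectively on PC compilers. The exact sizes depend on the compiler; on some, long double is the same as double.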

The need for the double and long double floating point formats is usually felt only in scientific and mathematical operations. Apart from the declaration, there really is very little difference in using any type of floating point.

Signed and Unsigned Integers

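Try writing a program to multiply 1024 by 64 (integers, not floating point). You should get 65536, right? With a compiler whose int is 16 bits, such as the DOS compilers this tutorial assumes, you'll find that you actually end up with zero! (Most modern compilers use a 32-bit int and will give 65536.)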

From now on, I'll assume that you're reasonably comfortable with writing simple programs like these. If you're not, you should reread the previous chapter and the earlier parts of this chapter. I won't provide such simple programs any more, to encourage you to start writing your own; after all, it's the only way to learn to program! In some cases I'll give only the part of a program that illustrates the topic we're studying. Such a part is called a code fragment, since the text of a program (as opposed to the compiled .EXE file) is called source code, or simply code.

Now try multiplying 1024 by 32. Instead of 32768, you get -32768. If you try other additions or multiplications, you'll find that you get rather strange results with operations whose results are larger than 32767 or smaller than -32768. That is because with 16-bit compilers on the PC, integers are 16 bits wide. For signed integers, which we've been using, the highest bit (bit 15; bits are numbered 0 to 15 from right to left) is the sign bit, so only 15 bits are left to represent the value. Thus the largest positive integer is 2^15 - 1, i.e. 32767, and the smallest negative integer is -32768. (Strictly speaking, the C standard says the result of a signed integer overflow is undefined. The wrap-around you see is simply what typical compilers do, so don't rely on it.)

The computer represents numbers (and indeed, all data) as sequences of zeroes and ones. Each zero or one is called a bit, short for BInary digiT. Since only two digits exist, it's called the binary system.

If you don't know about the binary number system, you can go to the Binary/Hexadecimal Tutorial (also in the Beginners' Section) to learn about binary and hexadecimal number representations. However, it's not very important that you learn it now, since you won't need it in the immediate future. All the talk of bits and sign bits in this chapter is only to help you understand where odd numbers like 32767 and -32768 come from. I'll let you know when you reach a stage where you really need to learn the binary number system.

There may be times when you are only interested in dealing with positive numbers. In such a case you can use unsigned integers, which you declare by prefixing the int in a variable declaration with unsigned. Now all the bits are used to represent the value (there is no sign bit), so with a 16-bit int the range becomes 0 to 65535. Constants can be suffixed with u (or U) to tell the compiler that they are unsigned integers.

The following program will help you understand how to use unsigned integers.

#include <stdio.h>

int main(void)
{
     unsigned int InRange, OutRange, Neg;

     InRange = 50000U + 15000U;
     OutRange = 1024 * 64;
     Neg = 1000 - 2000;
     printf ("InRange = %u\n", InRange);
     printf ("OutRange = %u\n", OutRange);
     printf ("Neg = %u\n", Neg);
     return 0;
}

Note that the %d in printf's format string has been replaced by %u. You can omit the int in the declaration. Thus saying just unsigned means the same thing as unsigned int.

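With a 16-bit compiler, you'll find that only the first printf gives the expected answer. The names of the variables should make it obvious why the second and third values are wrong. (With a 32-bit int, 1024 * 64 fits, so OutRange comes out as 65536, but Neg is still wrong.)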

Long Integers

Does a range of 0 to 65535 still seem rather restrictive to you? Or do you need integers smaller than -32768? It's time to move to long integers. You can have both signed and unsigned long integers. Signed long integers are declared by replacing int with long, and unsigned long integers by replacing unsigned with unsigned long. Like the u suffix used with unsigned integers, the L suffix is used with long constants (and UL with unsigned long ones).

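With a 32-bit long, as on the compilers this tutorial assumes, the range of signed long integers is -2,147,483,648 to 2,147,483,647, about +/- 2.1 billion, and the range of unsigned long integers is 0 to 4,294,967,295, almost 4.3 billion. (The C standard only guarantees that long is at least 32 bits; many 64-bit compilers use a 64-bit long.)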

The following program illustrates the use of long integers

#include <stdio.h>

int main(void)
{
     long Long;
     unsigned long UnsignedLong;

     Long = -20000L * 50000L;
     UnsignedLong = 10000000UL / 100UL;   /* UL marks unsigned long constants */
     printf ("Long = %ld\n", Long);
     printf ("UnsignedLong = %lu\n", UnsignedLong);
     return 0;
}

The only new thing to note in this program is that an l prefix is used in the printf format specifiers for long values: %d is replaced by %ld for long, and %u is replaced by %lu for unsigned long.

Recall that long is a reserved keyword and would make an illegal variable name, but Long is legal because C is case-sensitive.

It's generally bad programming practice to use potentially confusing variable names like these. I've used them here only to emphasize that variable names are case sensitive.

Character Variables

In this chapter, we've covered integers (signed, unsigned, long and unsigned long) and floating point variables. There is one more basic data type: the character. A character variable is declared with the keyword char. While a special notation is used to represent characters like letters or digits (which we shall discuss shortly), character variables can also hold 8-bit integers, on which you can perform all mathematical operations. Like ints, characters can be signed or unsigned: signed char has the range -128 to 127, and unsigned char has the range 0 to 255. Whether a plain char is signed or unsigned depends on the compiler.

Actually, unsigned and long are prefixes used with declarations, called type modifiers. There are only four basic data types: char, int, float and double. The type modifiers just change the size (and hence the range) of the basic data types, or whether they can hold negative values. As long double shows, the long prefix can also be used with a floating point type.

Note that the long prefix cannot be used with char.

Character Constants

All variables we've dealt with so far have been number variables (int, float etc.). Character variables, too, can hold integers, as explained above. However, character variables are usually used to hold characters like letters, digits and symbols (such as %, $ and #). Each character is assigned a number by the ASCII code, which defines the values 0 to 127; values from 128 to 255 are used by various extended character sets. (If you don't know about the ASCII code, you should briefly visit the ASCII Info Page.) Escape sequences like \n are also valid characters. Once again, remember that \n, \\ and the other escape sequences each represent a single character. A character constant is written by enclosing the character in single quotes (''). For example:

/* This is a code fragment, not a complete program
   you should be able to write a complete program, if
   you wish to run this code
*/

char ExChar, ExDigit, ExEscape, ExNewLine;
ExChar = 'A';
ExDigit = '0';
ExEscape = '\\';
ExNewLine = '\n';

printf ("The characters are %c, %c, %c%c", ExChar, ExDigit, ExEscape, ExNewLine);

What you will get is "The characters are A, 0, \", and the cursor will be on the next line, even though we did not include any \n in the format string. That is because the ExNewLine variable has been assigned the value '\n'.

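You can also specify a character by its ASCII value in hexadecimal or octal format, using the escape sequences \xhh, where hh is one or more hexadecimal digits, and \ooo, where ooo is one to three octal digits. For example, '\x41' and '\101' both represent 'A'.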

A character is a small integer, as mentioned earlier. C does not make an artificial distinction between a character and its ASCII value, so you can directly assign a value to a character variable, without using escape sequences, e.g.

char c;

c = 65;
printf ("%c\n", c);

This prints the letter A, because the ASCII value of (capital) A is 65.

It is important to realize the difference between '0' and 0. The former is the character representing the digit zero, its value being 48, the ASCII value of the digit zero. The latter is the character with ASCII value zero: the non-printable null character.

An interesting consequence of the fact that C lets you freely interchange a character and its ASCII value, and perform arithmetic on it like any other integer, is that a character representing a digit can easily be converted into an integer with that value, by subtracting '0' from it. This works because the C standard guarantees that the digits '0' to '9' have consecutive values. For example:

'5' - '0' equals five.

Initializing Variables

You must have noticed that in all programs, the first thing we do after declaring a variable is to assign a value to it. This is necessary because variables declared inside a function have undefined values until the program assigns them one. In other words, you can't rely on the value of such a variable before you assign one. For example, an int variable won't necessarily start out with the value zero; it can have any value (whatever value happens to be present in the memory area now used by the variable you've declared). Variables declared outside any function are an exception: they start out as zero.

Assigning the first value to a variable is often called initializing it. A variable can be initialized in the declaration itself, like this:

int Num1 = 5, Num2 = 2;

This declares two variables, Num1 and Num2, and assigns them the values five and two respectively. Inside a function, the initial value can be any expression, so the following is also legal there:

int Sum = Num1 + Num2;

For variables declared outside any function, however, the initial value must be a constant expression, so the line above would be illegal there. The following is legal even outside a function, because the value of the expression is constant:

int MagicNum = 1024 * 25;

While compiling, the compiler evaluates 1024 * 25 = 25600 and initializes MagicNum with the value 25600.

Hexadecimal and Octal Values

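It is usually convenient to specify values in normal decimal format. However, there are sometimes situations where it is more convenient to use hexadecimal or octal values. This can easily be done: prefix a number with 0x (or 0X) to write it in hexadecimal, or with a leading 0 to write it in octal. For example, 0x1F and 037 both equal 31.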

The usual L and U suffixes still apply. You can use hexadecimal or octal numbers wherever you can use decimal numbers.

You usually won't want to specify values in hexadecimal or octal. However, there are times when they are useful, particularly in hardware and system programming.

Type Conversion

It's not always convenient to have all terms in an expression of the same type. What type results if you perform an operation on, say, an int and a float? The answer is float, because float is the broader type: it can hold fractions and a far larger range of values than int. In general, when an operator's operands have different types, the compiler converts the narrower one to the broader type before performing the operation. (With a 32-bit int, very large int values can lose some precision when converted to float.) For example:

5 + 1.5 yields a floating point value, not an int (strictly, a double, since an unsuffixed constant like 1.5 is of type double). The same, of course, applies to variables and more complicated expressions.

What You've Learnt

In this chapter you've learnt:

Moving Ahead

Now that you've completed the chapter on Variables and Basic Data Types, you should go through the above list and make sure you understand each aspect of the chapter completely. If not, you should read the chapter again, several times if necessary. Variables are very important and will be used in practically every program in the future. Before moving on to the next chapter, you should try out the following:

The Next Step

If you've tried writing the above programs and have understood everything in this chapter, you're now ready to move on to the next chapter.

Chapter 3: Loops



If you have any suggestions, contributions or comments, E-mail me at vinaypai@crosswinds.net