Format Strings
The printf() function can be used to print more than just fixed strings. This
function can also use format strings to print variables in many different formats.
A format string is just a character string with special escape sequences
that tell the function to insert variables printed in a specific format in place
of the escape sequence. The way the printf() function has been used in the
previous programs, the "Hello, world!\n" string technically is the format string;
however, it is devoid of special escape sequences. These escape sequences are
also called format parameters, and for each one found in the format string, the
function is expected to take an additional argument. Each format parameter
begins with a percent sign (%) and uses a single-character shorthand very
similar to formatting characters used by GDB’s examine command.
Parameter   Output Type
%d          Decimal
%u          Unsigned decimal
%x          Hexadecimal
All of the preceding format parameters receive their data as values,
not pointers to values. There are also some format parameters that expect
pointers, such as the following.
Parameter   Output Type
%s          String
%n          Number of bytes written so far
The %s format parameter expects to be given a memory address; it prints
the data at that memory address until a null byte is encountered. The %n
format parameter is unique in that it actually writes data. It also expects to be
given a memory address, and it writes the number of bytes that have been
written so far into that memory address.
For now, our focus will just be the format parameters used for displaying
data. The fmt_strings.c program shows some examples of different format
parameters.
fmt_strings.c
#include <stdio.h>
#include <string.h>
int main() {
   char string[10];
   int A = -73;
   unsigned int B = 31337;
   strcpy(string, "sample");
   // Example of printing with different format strings
   printf("[A] Dec: %d, Hex: %x, Unsigned: %u\n", A, A, A);
   printf("[B] Dec: %d, Hex: %x, Unsigned: %u\n", B, B, B);
   printf("[field width on B] 3: '%3u', 10: '%10u', '%08u'\n", B, B, B);
   printf("[string] %s Address %08x\n", string, string);
   // Example of unary address operator (address-of) and a %x format string
   printf("variable A is at address: %08x\n", &A);
}
In the preceding code, additional variable arguments are passed to each
printf() call for every format parameter in the format string. The final printf()
call uses the argument &A, which will provide the address of the variable A.
The program’s compilation and execution are as follows.
reader@hacking:~/booksrc $ gcc -o fmt_strings fmt_strings.c
reader@hacking:~/booksrc $ ./fmt_strings
[A] Dec: -73, Hex: ffffffb7, Unsigned: 4294967223
[B] Dec: 31337, Hex: 7a69, Unsigned: 31337
[field width on B] 3: '31337', 10: '     31337', '00031337'
[string] sample Address bffff870
variable A is at address: bffff86c
reader@hacking:~/booksrc $
The first two calls to printf() demonstrate the printing of variables A and B,
using different format parameters. Since there are three format parameters
in each line, the variables A and B need to be supplied three times each. The
%d format parameter allows for negative values, while %u does not, since it is
expecting unsigned values.
When the variable A is printed using the %u format parameter, it appears
as a very high value. This is because A is a negative number stored in two’s
complement, and the format parameter is trying to print it as if it were an
unsigned value. Since two’s complement flips all the bits and adds one, the
very high bits that used to be zero are now one.
The third line in the example, labeled [field width on B], shows the use
of the field-width option in a format parameter. This is just an integer that
designates the minimum field width for that format parameter. However,
this is not a maximum field width—if the value to be outputted is greater
than the field width, the field width will be exceeded. This happens when 3 is
used, since the output data needs 5 bytes. When 10 is used as the field width,
5 bytes of blank space are outputted before the output data. Additionally, if a
field width value begins with a 0, this means the field should be padded with
zeros. When 08 is used, for example, the output is 00031337.
The fourth line, labeled [string], simply shows the use of the %s format
parameter. Remember that the variable string is actually a pointer containing
the address of the string, which works out wonderfully, since the %s format
parameter expects its data to be passed by reference.
The final line just shows the address of the variable A, obtained with the
unary address-of operator (&). This value is displayed as eight
hexadecimal digits, padded by zeros.
As these examples show, you should use %d for decimal, %u for unsigned,
and %x for hexadecimal values. Minimum field widths can be set by putting a
number right after the percent sign, and if the field width begins with 0, it
will be padded with zeros. The %s parameter can be used to print strings and
should be passed the address of the string. So far, so good.
Format strings are used by an entire family of standard I/O functions,
including scanf(), which basically works like printf() but is used for input
instead of output. One key difference is that the scanf() function expects all
of its arguments to be pointers, so the arguments must actually be variable
addresses—not the variables themselves. This can be done using pointer
variables or by using the unary address operator to retrieve the address of the
normal variables. The input.c program and execution should help explain.
input.c
#include <stdio.h>
#include <string.h>
int main() {
char message[20]; // Room for "Hello, world!" plus the terminating null byte.
int count, i;
strcpy(message, "Hello, world!");
printf("Repeat how many times? ");
scanf("%d", &count);
for(i=0; i < count; i++)
printf("%3d - %s\n", i, message);
}
In input.c, the scanf() function is used to set the count variable. The output
below demonstrates its use.
reader@hacking:~/booksrc $ gcc -o input input.c
reader@hacking:~/booksrc $ ./input
Repeat how many times? 3
  0 - Hello, world!
  1 - Hello, world!
  2 - Hello, world!
reader@hacking:~/booksrc $ ./input
Repeat how many times? 12
  0 - Hello, world!
  1 - Hello, world!
  2 - Hello, world!
  3 - Hello, world!
  4 - Hello, world!
  5 - Hello, world!
  6 - Hello, world!
  7 - Hello, world!
  8 - Hello, world!
  9 - Hello, world!
 10 - Hello, world!
 11 - Hello, world!
reader@hacking:~/booksrc $
Format strings are used quite often, so familiarity with them is valuable.
In addition, the ability to output the values of variables allows for debugging in
the program, without the use of a debugger. Having some form of immediate
feedback is fairly vital to the hacker’s learning process, and something as
simple as printing the value of a variable can allow for lots of exploitation.
0x265 Typecasting
Typecasting is simply a way to temporarily change a variable’s data type, despite
how it was originally defined. When a variable is typecast into a different
type, the compiler is basically told to treat that variable as if it were the
new data type, but only for that operation. The syntax for typecasting is
as follows:
(typecast_data_type) variable
This can be used when dealing with integers and floating-point variables,
as typecasting.c demonstrates.
typecasting.c
#include <stdio.h>
int main() {
int a, b;
float c, d;
a = 13;
b = 5;
c = a / b; // Divide using integers.
d = (float) a / (float) b; // Divide integers typecast as floats.
printf("[integers]\t a = %d\t b = %d\n", a, b);
printf("[floats]\t c = %f\t d = %f\n", c, d);
}
The results of compiling and executing typecasting.c are as follows.
reader@hacking:~/booksrc $ gcc typecasting.c
reader@hacking:~/booksrc $ ./a.out
[integers] a = 13 b = 5
[floats] c = 2.000000 d = 2.600000
reader@hacking:~/booksrc $
As discussed earlier, dividing the integer 13 by 5 will round down to the
incorrect answer of 2, even if this value is being stored into a floating-point
variable. However, if these integer variables are typecast into floats, they will
be treated as such. This allows for the correct calculation of 2.6.
This example is illustrative, but where typecasting really shines is when it
is used with pointer variables. Even though a pointer is just a memory address,
the C compiler still demands a data type for every pointer. One reason for
this is to try to limit programming errors. An integer pointer should only
point to integer data, while a character pointer should only point to character
data. Another reason is for pointer arithmetic. An integer is four bytes
in size, while a character only takes up a single byte. The pointer_types.c program
will demonstrate and explain these concepts further. This code uses the
format parameter %p to output memory addresses. This is shorthand meant
for displaying pointers and, on this 32-bit system, is basically equivalent to 0x%08x.
pointer_types.c
#include <stdio.h>
int main() {
int i;
char char_array[5] = {'a', 'b', 'c', 'd', 'e'};
int int_array[5] = {1, 2, 3, 4, 5};
char *char_pointer;
int *int_pointer;
char_pointer = char_array;
int_pointer = int_array;
for(i=0; i < 5; i++) { // Iterate through the int array with the int_pointer.
printf("[integer pointer] points to %p, which contains the integer %d\n",
int_pointer, *int_pointer);
int_pointer = int_pointer + 1;
}
for(i=0; i < 5; i++) { // Iterate through the char array with the char_pointer.
printf("[char pointer] points to %p, which contains the char '%c'\n",
char_pointer, *char_pointer);
char_pointer = char_pointer + 1;
}
}
In this code two arrays are defined in memory—one containing integer
data and the other containing character data. Two pointers are also defined,
one with the integer data type and one with the character data type, and they
are set to point at the start of the corresponding data arrays. Two separate for
loops iterate through the arrays using pointer arithmetic to adjust the pointer
to point at the next value. In the loops, when the integer and character values
are actually printed with the %d and %c format parameters, notice that the
corresponding printf() arguments must dereference the pointer variables.
This is done using the unary * operator and has been marked above
in bold.
reader@hacking:~/booksrc $ gcc pointer_types.c
reader@hacking:~/booksrc $ ./a.out
[integer pointer] points to 0xbffff7f0, which contains the integer 1
[integer pointer] points to 0xbffff7f4, which contains the integer 2
[integer pointer] points to 0xbffff7f8, which contains the integer 3
[integer pointer] points to 0xbffff7fc, which contains the integer 4
[integer pointer] points to 0xbffff800, which contains the integer 5
[char pointer] points to 0xbffff810, which contains the char 'a'
[char pointer] points to 0xbffff811, which contains the char 'b'
[char pointer] points to 0xbffff812, which contains the char 'c'
[char pointer] points to 0xbffff813, which contains the char 'd'
[char pointer] points to 0xbffff814, which contains the char 'e'
reader@hacking:~/booksrc $
Even though the same value of 1 is added to int_pointer and char_pointer
in their respective loops, the compiler increments the pointer’s addresses by
different amounts. Since a char is only 1 byte, the pointer to the next char
would naturally also be 1 byte over. But since an integer is 4 bytes, a pointer
to the next integer has to be 4 bytes over.
In pointer_types2.c, the pointers are juxtaposed such that the int_pointer
points to the character data and vice versa. The major changes to the code
are marked in bold.
pointer_types2.c
#include <stdio.h>
int main() {
int i;
char char_array[5] = {'a', 'b', 'c', 'd', 'e'};
int int_array[5] = {1, 2, 3, 4, 5};
char *char_pointer;
int *int_pointer;
char_pointer = int_array; // The char_pointer and int_pointer now
int_pointer = char_array; // point to incompatible data types.
for(i=0; i < 5; i++) { // Iterate through the int array with the int_pointer.
printf("[integer pointer] points to %p, which contains the char '%c'\n",
int_pointer, *int_pointer);
int_pointer = int_pointer + 1;
}
for(i=0; i < 5; i++) { // Iterate through the char array with the char_pointer.
printf("[char pointer] points to %p, which contains the integer %d\n",
char_pointer, *char_pointer);
char_pointer = char_pointer + 1;
}
}
The output below shows the warnings spewed forth from the compiler.
reader@hacking:~/booksrc $ gcc pointer_types2.c
pointer_types2.c: In function `main':
pointer_types2.c:12: warning: assignment from incompatible pointer type
pointer_types2.c:13: warning: assignment from incompatible pointer type
reader@hacking:~/booksrc $
In an attempt to prevent programming mistakes, the compiler gives warnings
about pointers that point to incompatible data types. But the compiler
and perhaps the programmer are the only ones that care about a pointer’s
type. In the compiled code, a pointer is nothing more than a memory
address, so the compiler will still compile the code if a pointer points to
an incompatible data type—it simply warns the programmer to anticipate
unexpected results.
reader@hacking:~/booksrc $ ./a.out
[integer pointer] points to 0xbffff810, which contains the char 'a'
[integer pointer] points to 0xbffff814, which contains the char 'e'
[integer pointer] points to 0xbffff818, which contains the char '8'
[integer pointer] points to 0xbffff81c, which contains the char '
[integer pointer] points to 0xbffff820, which contains the char '?'
[char pointer] points to 0xbffff7f0, which contains the integer 1
[char pointer] points to 0xbffff7f1, which contains the integer 0
[char pointer] points to 0xbffff7f2, which contains the integer 0
[char pointer] points to 0xbffff7f3, which contains the integer 0
[char pointer] points to 0xbffff7f4, which contains the integer 2
reader@hacking:~/booksrc $
Even though the int_pointer points to character data that only contains
5 bytes of data, it is still typed as an integer. This means that adding 1 to the
pointer will increment the address by 4 each time. Similarly, the char_pointer’s
address is only incremented by 1 each time, stepping through the 20 bytes of
integer data (five 4-byte integers), one byte at a time. Once again, the little-endian
byte order of the integer data is apparent when the 4-byte integer is
examined one byte at a time. The 4-byte value of 0x00000001 is actually stored
in memory as 0x01, 0x00, 0x00, 0x00.
There will be situations like this in which you are using a pointer that
points to data with a conflicting type. Since the pointer type determines the
size of the data it points to, it’s important that the type is correct. As you can
see in pointer_types3.c below, typecasting is just a way to change the type of a
variable on the fly.
pointer_types3.c
#include <stdio.h>
int main() {
int i;
char char_array[5] = {'a', 'b', 'c', 'd', 'e'};
int int_array[5] = {1, 2, 3, 4, 5};
char *char_pointer;
int *int_pointer;
char_pointer = (char *) int_array; // Typecast into the
int_pointer = (int *) char_array; // pointer's data type.
for(i=0; i < 5; i++) { // Iterate through the int array with the int_pointer.
printf("[integer pointer] points to %p, which contains the char '%c'\n",
int_pointer, *int_pointer);
int_pointer = (int *) ((char *) int_pointer + 1);
}
for(i=0; i < 5; i++) { // Iterate through the char array with the char_pointer.
printf("[char pointer] points to %p, which contains the integer %d\n",
char_pointer, *char_pointer);
char_pointer = (char *) ((int *) char_pointer + 1);
}
}
In this code, when the pointers are initially set, the data is typecast into
the pointer’s data type. This will prevent the C compiler from complaining
about the conflicting data types; however, any pointer arithmetic will still be
incorrect. To fix that, when 1 is added to the pointers, they must first be typecast
into the correct data type so the address is incremented by the correct
amount. Then this pointer needs to be typecast back into the pointer’s data
type once again. It doesn’t look too pretty, but it works.
reader@hacking:~/booksrc $ gcc pointer_types3.c
reader@hacking:~/booksrc $ ./a.out
[integer pointer] points to 0xbffff810, which contains the char 'a'
[integer pointer] points to 0xbffff811, which contains the char 'b'
[integer pointer] points to 0xbffff812, which contains the char 'c'
[integer pointer] points to 0xbffff813, which contains the char 'd'
[integer pointer] points to 0xbffff814, which contains the char 'e'
[char pointer] points to 0xbffff7f0, which contains the integer 1
[char pointer] points to 0xbffff7f4, which contains the integer 2
[char pointer] points to 0xbffff7f8, which contains the integer 3
[char pointer] points to 0xbffff7fc, which contains the integer 4
[char pointer] points to 0xbffff800, which contains the integer 5
reader@hacking:~/booksrc $
Naturally, it is far easier just to use the correct data type for pointers
in the first place; however, sometimes a generic, typeless pointer is desired.
In C, a void pointer is a typeless pointer, defined by the void keyword.
Experimenting with void pointers quickly reveals a few things about typeless
pointers. First, pointers cannot be dereferenced unless they have a type.
In order to retrieve the value stored in the pointer’s memory address, the
compiler must first know what type of data it is. Secondly, void pointers must
also be typecast before doing pointer arithmetic. These are fairly intuitive
limitations, which means that a void pointer’s main purpose is to simply hold
a memory address.
The pointer_types3.c program can be modified to use a single void
pointer by typecasting it to the proper type each time it’s used. The compiler
knows that a void pointer is typeless, so any type of pointer can be stored in a
void pointer without typecasting. This also means a void pointer must always
be typecast when dereferencing it, however. These differences can be seen in
pointer_types4.c, which uses a void pointer.
pointer_types4.c
#include <stdio.h>
int main() {
int i;
char char_array[5] = {'a', 'b', 'c', 'd', 'e'};
int int_array[5] = {1, 2, 3, 4, 5};
void *void_pointer;
void_pointer = (void *) char_array;
for(i=0; i < 5; i++) { // Iterate through the char array with the void_pointer.
printf("[char pointer] points to %p, which contains the char '%c'\n",
void_pointer, *((char *) void_pointer));
void_pointer = (void *) ((char *) void_pointer + 1);
}
void_pointer = (void *) int_array;
for(i=0; i < 5; i++) { // Iterate through the int array with the void_pointer.
printf("[integer pointer] points to %p, which contains the integer %d\n",
void_pointer, *((int *) void_pointer));
void_pointer = (void *) ((int *) void_pointer + 1);
}
}
The results of compiling and executing pointer_types4.c are as
follows.
reader@hacking:~/booksrc $ gcc pointer_types4.c
reader@hacking:~/booksrc $ ./a.out
[char pointer] points to 0xbffff810, which contains the char 'a'
[char pointer] points to 0xbffff811, which contains the char 'b'
[char pointer] points to 0xbffff812, which contains the char 'c'
[char pointer] points to 0xbffff813, which contains the char 'd'
[char pointer] points to 0xbffff814, which contains the char 'e'
[integer pointer] points to 0xbffff7f0, which contains the integer 1
[integer pointer] points to 0xbffff7f4, which contains the integer 2
[integer pointer] points to 0xbffff7f8, which contains the integer 3
[integer pointer] points to 0xbffff7fc, which contains the integer 4
[integer pointer] points to 0xbffff800, which contains the integer 5
reader@hacking:~/booksrc $
The compilation and output of this pointer_types4.c is basically the same
as that for pointer_types3.c. The void pointer is really just holding the memory
addresses, while the hard-coded typecasting is telling the compiler to use the
proper types whenever the pointer is used.
Since the type is taken care of by the typecasts, the void pointer is truly
nothing more than a memory address. With the data types defined by typecasting,
anything that is big enough to hold a four-byte value can work the
same way as a void pointer. In pointer_types5.c, an unsigned integer is used
to store this address.
pointer_types5.c
#include <stdio.h>
int main() {
int i;
char char_array[5] = {'a', 'b', 'c', 'd', 'e'};
int int_array[5] = {1, 2, 3, 4, 5};
unsigned int hacky_nonpointer;
hacky_nonpointer = (unsigned int) char_array;
for(i=0; i < 5; i++) { // Iterate through the char array with the hacky_nonpointer.
printf("[hacky_nonpointer] points to %p, which contains the char '%c'\n",
hacky_nonpointer, *((char *) hacky_nonpointer));
hacky_nonpointer = hacky_nonpointer + sizeof(char);
}
hacky_nonpointer = (unsigned int) int_array;
for(i=0; i < 5; i++) { // Iterate through the int array with the hacky_nonpointer.
printf("[hacky_nonpointer] points to %p, which contains the integer %d\n",
hacky_nonpointer, *((int *) hacky_nonpointer));
hacky_nonpointer = hacky_nonpointer + sizeof(int);
}
}
This is rather hacky, but since this integer value is typecast into the
proper pointer types when it is assigned and dereferenced, the end result is
the same. Notice that instead of typecasting multiple times to do pointer
arithmetic on an unsigned integer (which isn’t even a pointer), the sizeof
operator is used to achieve the same result using normal arithmetic.
reader@hacking:~/booksrc $ gcc pointer_types5.c
reader@hacking:~/booksrc $ ./a.out
[hacky_nonpointer] points to 0xbffff810, which contains the char 'a'
[hacky_nonpointer] points to 0xbffff811, which contains the char 'b'
[hacky_nonpointer] points to 0xbffff812, which contains the char 'c'
[hacky_nonpointer] points to 0xbffff813, which contains the char 'd'
[hacky_nonpointer] points to 0xbffff814, which contains the char 'e'
[hacky_nonpointer] points to 0xbffff7f0, which contains the integer 1
[hacky_nonpointer] points to 0xbffff7f4, which contains the integer 2
[hacky_nonpointer] points to 0xbffff7f8, which contains the integer 3
[hacky_nonpointer] points to 0xbffff7fc, which contains the integer 4
[hacky_nonpointer] points to 0xbffff800, which contains the integer 5
reader@hacking:~/booksrc $
The important thing to remember about variables in C is that the compiler
is the only thing that cares about a variable’s type. In the end, after the
program has been compiled, the variables are nothing more than memory
addresses. This means that variables of one type can easily be coerced into
behaving like another type by telling the compiler to typecast them into the
desired type.
0x266 Command-Line Arguments
Many nongraphical programs receive input in the form of command-line
arguments. Unlike inputting with scanf(), command-line arguments don’t
require user interaction after the program has begun execution. This tends
to be more efficient and is a useful input method.
In C, command-line arguments can be accessed in the main() function by
including two additional arguments to the function: an integer and a pointer
to an array of strings. The integer will contain the number of arguments, and
the array of strings will contain each of those arguments. The commandline.c
program and its execution should explain things.
commandline.c
#include <stdio.h>
int main(int arg_count, char *arg_list[]) {
int i;
printf("There were %d arguments provided:\n", arg_count);
for(i=0; i < arg_count; i++)
printf("argument #%d\t-\t%s\n", i, arg_list[i]);
}
reader@hacking:~/booksrc $ gcc -o commandline commandline.c
reader@hacking:~/booksrc $ ./commandline
There were 1 arguments provided:
argument #0 - ./commandline
reader@hacking:~/booksrc $ ./commandline this is a test
There were 5 arguments provided:
argument #0 - ./commandline
argument #1 - this
argument #2 - is
argument #3 - a
argument #4 - test
reader@hacking:~/booksrc $
The zeroth argument is always the name of the executing binary, and
the rest of the argument array (often called an argument vector) contains the
remaining arguments as strings.
Sometimes a program will want to use a command-line argument as an
integer as opposed to a string. Regardless of this, the argument is passed in
as a string; however, there are standard conversion functions. Unlike simple
typecasting, these functions can actually convert character arrays containing
numbers into actual integers. The most common of these functions is atoi(),
which is short for ASCII to integer. This function accepts a pointer to a string
as its argument and returns the integer value it represents. Observe its usage
in convert.c.
convert.c
#include <stdio.h>
#include <stdlib.h>
void usage(char *program_name) {
printf("Usage: %s <message> <# of times to repeat>\n", program_name);
exit(1);
}
int main(int argc, char *argv[]) {
int i, count;
if(argc < 3) // If fewer than 3 arguments are used,
usage(argv[0]); // display usage message and exit.
count = atoi(argv[2]); // Convert the 2nd arg into an integer.
printf("Repeating %d times..\n", count);
for(i=0; i < count; i++)
printf("%3d - %s\n", i, argv[1]); // Print the 1st arg.
}
The results of compiling and executing convert.c are as follows.
reader@hacking:~/booksrc $ gcc convert.c
reader@hacking:~/booksrc $ ./a.out
Usage: ./a.out <message> <# of times to repeat>
reader@hacking:~/booksrc $ ./a.out 'Hello, world!' 3
Repeating 3 times..
  0 - Hello, world!
  1 - Hello, world!
  2 - Hello, world!
reader@hacking:~/booksrc $
In the preceding code, an if statement makes sure that three arguments
are used before these strings are accessed. If the program tries to access memory
that doesn’t exist or that the program doesn’t have permission to read,
the program will crash. In C it’s important to check for these types of conditions
and handle them in program logic. If the error-checking if statement is
commented out, this memory violation can be explored. The convert2.c
program should make this more clear.
convert2.c
#include <stdio.h>
void usage(char *program_name) {
printf("Usage: %s <message> <# of times to repeat>\n", program_name);
exit(1);
}
int main(int argc, char *argv[]) {
int i, count;
// if(argc < 3) // If fewer than 3 arguments are used,
// usage(argv[0]); // display usage message and exit.
count = atoi(argv[2]); // Convert the 2nd arg into an integer.
printf("Repeating %d times..\n", count);
for(i=0; i < count; i++)
printf("%3d - %s\n", i, argv[1]); // Print the 1st arg.
}
The results of compiling and executing convert2.c are as follows.
reader@hacking:~/booksrc $ gcc convert2.c
reader@hacking:~/booksrc $ ./a.out test
Segmentation fault (core dumped)
reader@hacking:~/booksrc $
When the program isn’t given enough command-line arguments, it still
tries to access elements of the argument array, even though they don’t exist.
This results in the program crashing due to a segmentation fault.
Memory is split into segments (which will be discussed later), and some
memory addresses aren’t within the boundaries of the memory segments the
program is given access to. When the program attempts to access an address
that is out of bounds, it will crash and die in what’s called a segmentation fault.
This effect can be explored further with GDB.
reader@hacking:~/booksrc $ gcc -g convert2.c
reader@hacking:~/booksrc $ gdb -q ./a.out
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) run test
Starting program: /home/reader/booksrc/a.out test
Program received signal SIGSEGV, Segmentation fault.
0xb7ec819b in ?? () from /lib/tls/i686/cmov/libc.so.6
(gdb) where
#0 0xb7ec819b in ?? () from /lib/tls/i686/cmov/libc.so.6
#1 0xb800183c in ?? ()
#2 0x00000000 in ?? ()
(gdb) break main
Breakpoint 1 at 0x8048419: file convert2.c, line 14.
(gdb) run test
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/reader/booksrc/a.out test
Breakpoint 1, main (argc=2, argv=0xbffff894) at convert2.c:14
14 count = atoi(argv[2]); // convert the 2nd arg into an integer
(gdb) cont
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0xb7ec819b in ?? () from /lib/tls/i686/cmov/libc.so.6
(gdb) x/3xw 0xbffff894
0xbffff894: 0xbffff9b3 0xbffff9ce 0x00000000
(gdb) x/s 0xbffff9b3
0xbffff9b3: "/home/reader/booksrc/a.out"
(gdb) x/s 0xbffff9ce
0xbffff9ce: "test"
(gdb) x/s 0x00000000
0x0: <Address 0x0 out of bounds>
(gdb) quit
The program is running. Exit anyway? (y or n) y
reader@hacking:~/booksrc $
The program is executed with a single command-line argument of test
within GDB, which causes the program to crash. The where command will
sometimes show a useful backtrace of the stack; however, in this case, the
stack was too badly mangled in the crash. A breakpoint is set on main and
the program is re-executed to get the value of the argument vector (shown in
bold). Since the argument vector is a pointer to a list of strings, it is actually a
pointer to a list of pointers. Using the command x/3xw to examine the first
three memory addresses stored at the argument vector’s address shows that
they are themselves pointers to strings. The first one is the zeroth argument,
the second is the test argument, and the third is zero, which is out of bounds.
When the program tries to access this memory address, it crashes with a
segmentation fault.
62 0x200
The printf() function can be used to print more than just fixed strings. This
function can also use format strings to print variables in many different formats.
A format string is just a character string with special escape sequences
that tell the function to insert variables printed in a specific format in place
of the escape sequence. The way the printf() function has been used in the
previous programs, the "Hello, world!\n" string technically is the format string;
however, it is devoid of special escape sequences. These escape sequences are
also called format parameters, and for each one found in the format string, the
function is expected to take an additional argument. Each format parameter
begins with a percent sign (%) and uses a single-character shorthand very
similar to formatting characters used by GDB’s examine command.
All of the preceding format parameters receive their data as values,
not pointers to values. There are also some format parameters that expect
pointers, such as the following.
The %s format parameter expects to be given a memory address; it prints
the data at that memory address until a null byte is encountered. The %n
format parameter is unique in that it actually writes data. It also expects to be
given a memory address, and it writes the number of bytes that have been
written so far into that memory address.
For now, our focus will just be the format parameters used for displaying
data. The fmt_strings.c program shows some examples of different format
parameters.
fmt_strings.c
#include <stdio.h>
int main() {
char string[10];
int A = -73;
unsigned int B = 31337;
strcpy(string, "sample");
Parameter Output Type
%d Decimal
%u Unsigned decimal
%x Hexadecimal
Parameter Output Type
%s String
%n Number of bytes written so far
P rogramming 49
// Example of printing with different format string
printf("[A] Dec: %d, Hex: %x, Unsigned: %u\n", A, A, A);
printf("[B] Dec: %d, Hex: %x, Unsigned: %u\n", B, B, B);
printf("[field width on B] 3: '%3u', 10: '%10u', '%08u'\n", B, B, B);
printf("[string] %s Address %08x\n", string, string);
// Example of unary address operator (dereferencing) and a %x format string
printf("variable A is at address: %08x\n", &A);
}
In the preceding code, additional variable arguments are passed to each
printf() call for every format parameter in the format string. The final printf()
call uses the argument &A, which will provide the address of the variable A.
The program’s compilation and execution are as follows.
reader@hacking:~/booksrc $ gcc -o fmt_strings fmt_strings.c
reader@hacking:~/booksrc $ ./fmt_strings
[A] Dec: -73, Hex: ffffffb7, Unsigned: 4294967223
[B] Dec: 31337, Hex: 7a69, Unsigned: 31337
[field width on B] 3: '31337', 10: ' 31337', '00031337'
[string] sample Address bffff870
variable A is at address: bffff86c
reader@hacking:~/booksrc $
The first two calls to printf() demonstrate the printing of variables A and B,
using different format parameters. Since there are three format parameters
in each line, the variables A and B need to be supplied three times each. The
%d format parameter allows for negative values, while %u does not, since it is
expecting unsigned values.
When the variable A is printed using the %u format parameter, it appears
as a very high value. This is because A is a negative number stored in two’s
complement, and the format parameter is trying to print it as if it were an
unsigned value. Since two’s complement flips all the bits and adds one, the
very high bits that used to be zero are now one.
The third line in the example, labeled [field width on B], shows the use
of the field-width option in a format parameter. This is just an integer that
designates the minimum field width for that format parameter. However,
this is not a maximum field width: if the value to be output is wider
than the field width, the field width will be exceeded. This happens when 3 is
used, since the output data needs 5 bytes. When 10 is used as the field width,
5 bytes of blank space are output before the output data. Additionally, if a
field width value begins with a 0, this means the field should be padded with
zeros. When 08 is used, for example, the output is 00031337.
The fourth line, labeled [string], simply shows the use of the %s format
parameter. Remember that the variable string is actually a pointer containing
the address of the string, which works out wonderfully, since the %s format
parameter expects its data to be passed by reference.
The final line just shows the address of the variable A, using the unary
address operator to obtain the variable’s address. This value is displayed as
eight hexadecimal digits, padded by zeros.
As these examples show, you should use %d for decimal, %u for unsigned,
and %x for hexadecimal values. Minimum field widths can be set by putting a
number right after the percent sign, and if the field width begins with 0, it
will be padded with zeros. The %s parameter can be used to print strings and
should be passed the address of the string. So far, so good.
Format strings are used by an entire family of standard I/O functions,
including scanf(), which basically works like printf() but is used for input
instead of output. One key difference is that the scanf() function expects all
of its arguments to be pointers, so the arguments must actually be variable
addresses—not the variables themselves. This can be done using pointer
variables or by using the unary address operator to retrieve the address of the
normal variables. The input.c program and execution should help explain.
input.c
#include <stdio.h>
#include <string.h>

int main() {
    char message[20]; // Large enough for "Hello, world!" plus the null byte.
    int count, i;

    strcpy(message, "Hello, world!");

    printf("Repeat how many times? ");
    scanf("%d", &count);

    for(i=0; i < count; i++)
        printf("%3d - %s\n", i, message);
}
In input.c, the scanf() function is used to set the count variable. The output
below demonstrates its use.
reader@hacking:~/booksrc $ gcc -o input input.c
reader@hacking:~/booksrc $ ./input
Repeat how many times? 3
  0 - Hello, world!
  1 - Hello, world!
  2 - Hello, world!
reader@hacking:~/booksrc $ ./input
Repeat how many times? 12
  0 - Hello, world!
  1 - Hello, world!
  2 - Hello, world!
  3 - Hello, world!
  4 - Hello, world!
  5 - Hello, world!
  6 - Hello, world!
  7 - Hello, world!
  8 - Hello, world!
  9 - Hello, world!
 10 - Hello, world!
 11 - Hello, world!
reader@hacking:~/booksrc $
Format strings are used quite often, so familiarity with them is valuable.
In addition, the ability to output the values of variables allows for debugging in
the program, without the use of a debugger. Having some form of immediate
feedback is fairly vital to the hacker’s learning process, and something as
simple as printing the value of a variable can allow for lots of exploitation.
0x265 Typecasting
Typecasting is simply a way to temporarily change a variable’s data type, despite
how it was originally defined. When a variable is typecast into a different
type, the compiler is basically told to treat that variable as if it were the
new data type, but only for that operation. The syntax for typecasting is
as follows:
(typecast_data_type) variable
This can be used when dealing with integers and floating-point variables,
as typecasting.c demonstrates.
typecasting.c
#include <stdio.h>

int main() {
    int a, b;
    float c, d;

    a = 13;
    b = 5;

    c = a / b;                  // Divide using integers.
    d = (float) a / (float) b;  // Divide integers typecast as floats.

    printf("[integers]\t a = %d\t b = %d\n", a, b);
    printf("[floats]\t c = %f\t d = %f\n", c, d);
}
The results of compiling and executing typecasting.c are as follows.
reader@hacking:~/booksrc $ gcc typecasting.c
reader@hacking:~/booksrc $ ./a.out
[integers] a = 13 b = 5
[floats] c = 2.000000 d = 2.600000
reader@hacking:~/booksrc $
As discussed earlier, dividing the integer 13 by the integer 5 truncates the
result to the incorrect answer of 2, even if this value is being stored into a floating-point
variable. However, if these integer variables are typecast into floats, they will
be treated as such. This allows for the correct calculation of 2.6.
This example is illustrative, but where typecasting really shines is when it
is used with pointer variables. Even though a pointer is just a memory address,
the C compiler still demands a data type for every pointer. One reason for
this is to try to limit programming errors. An integer pointer should only
point to integer data, while a character pointer should only point to character
data. Another reason is for pointer arithmetic. An integer is four bytes
in size, while a character only takes up a single byte. The pointer_types.c program
will demonstrate and explain these concepts further. This code uses the
format parameter %p to output memory addresses. This is shorthand meant
for displaying pointers and, on this 32-bit system, is basically equivalent to 0x%08x.
pointer_types.c
#include <stdio.h>

int main() {
    int i;

    char char_array[5] = {'a', 'b', 'c', 'd', 'e'};
    int int_array[5] = {1, 2, 3, 4, 5};

    char *char_pointer;
    int *int_pointer;

    char_pointer = char_array;
    int_pointer = int_array;

    for(i=0; i < 5; i++) { // Iterate through the int array with the int_pointer.
        printf("[integer pointer] points to %p, which contains the integer %d\n",
            int_pointer, *int_pointer);
        int_pointer = int_pointer + 1;
    }

    for(i=0; i < 5; i++) { // Iterate through the char array with the char_pointer.
        printf("[char pointer] points to %p, which contains the char '%c'\n",
            char_pointer, *char_pointer);
        char_pointer = char_pointer + 1;
    }
}
In this code two arrays are defined in memory—one containing integer
data and the other containing character data. Two pointers are also defined,
one with the integer data type and one with the character data type, and they
are set to point at the start of the corresponding data arrays. Two separate for
loops iterate through the arrays using pointer arithmetic to adjust the pointer
to point at the next value. In the loops, when the integer and character values
are actually printed with the %d and %c format parameters, notice that the
corresponding printf() arguments must dereference the pointer variables.
This is done using the unary * operator, as in *int_pointer and *char_pointer
in the listing above.
reader@hacking:~/booksrc $ gcc pointer_types.c
reader@hacking:~/booksrc $ ./a.out
[integer pointer] points to 0xbffff7f0, which contains the integer 1
[integer pointer] points to 0xbffff7f4, which contains the integer 2
[integer pointer] points to 0xbffff7f8, which contains the integer 3
[integer pointer] points to 0xbffff7fc, which contains the integer 4
[integer pointer] points to 0xbffff800, which contains the integer 5
[char pointer] points to 0xbffff810, which contains the char 'a'
[char pointer] points to 0xbffff811, which contains the char 'b'
[char pointer] points to 0xbffff812, which contains the char 'c'
[char pointer] points to 0xbffff813, which contains the char 'd'
[char pointer] points to 0xbffff814, which contains the char 'e'
reader@hacking:~/booksrc $
Even though the same value of 1 is added to int_pointer and char_pointer
in their respective loops, the compiler increments the pointer’s addresses by
different amounts. Since a char is only 1 byte, the pointer to the next char
would naturally also be 1 byte over. But since an integer is 4 bytes, a pointer
to the next integer has to be 4 bytes over.
In pointer_types2.c, the pointers are juxtaposed such that the int_pointer
points to the character data and vice versa. The major changes to the code
are the two pointer assignments and the swapped %c and %d format parameters.
pointer_types2.c
#include <stdio.h>

int main() {
    int i;

    char char_array[5] = {'a', 'b', 'c', 'd', 'e'};
    int int_array[5] = {1, 2, 3, 4, 5};

    char *char_pointer;
    int *int_pointer;

    char_pointer = int_array; // The char_pointer and int_pointer now
    int_pointer = char_array; // point to incompatible data types.

    for(i=0; i < 5; i++) { // Step through the char array with the int_pointer.
        printf("[integer pointer] points to %p, which contains the char '%c'\n",
            int_pointer, *int_pointer);
        int_pointer = int_pointer + 1;
    }

    for(i=0; i < 5; i++) { // Step through the int array with the char_pointer.
        printf("[char pointer] points to %p, which contains the integer %d\n",
            char_pointer, *char_pointer);
        char_pointer = char_pointer + 1;
    }
}
The output below shows the warnings spewed forth from the compiler.
reader@hacking:~/booksrc $ gcc pointer_types2.c
pointer_types2.c: In function `main':
pointer_types2.c:12: warning: assignment from incompatible pointer type
pointer_types2.c:13: warning: assignment from incompatible pointer type
reader@hacking:~/booksrc $
In an attempt to prevent programming mistakes, the compiler gives warnings
about pointers that point to incompatible data types. But the compiler
and perhaps the programmer are the only ones that care about a pointer’s
type. In the compiled code, a pointer is nothing more than a memory
address, so the compiler will still compile the code if a pointer points to
an incompatible data type—it simply warns the programmer to anticipate
unexpected results.
reader@hacking:~/booksrc $ ./a.out
[integer pointer] points to 0xbffff810, which contains the char 'a'
[integer pointer] points to 0xbffff814, which contains the char 'e'
[integer pointer] points to 0xbffff818, which contains the char '8'
[integer pointer] points to 0xbffff81c, which contains the char '
[integer pointer] points to 0xbffff820, which contains the char '?'
[char pointer] points to 0xbffff7f0, which contains the integer 1
[char pointer] points to 0xbffff7f1, which contains the integer 0
[char pointer] points to 0xbffff7f2, which contains the integer 0
[char pointer] points to 0xbffff7f3, which contains the integer 0
[char pointer] points to 0xbffff7f4, which contains the integer 2
reader@hacking:~/booksrc $
Even though the int_pointer points to character data that only contains
5 bytes of data, it is still typed as an integer. This means that adding 1 to the
pointer will increment the address by 4 each time. Similarly, the char_pointer’s
address is only incremented by 1 each time, stepping through the 20 bytes of
integer data (five 4-byte integers), one byte at a time. Once again, the little-endian
byte order of the integer data is apparent when the 4-byte integer is
examined one byte at a time. The 4-byte value of 0x00000001 is actually stored
in memory as 0x01, 0x00, 0x00, 0x00.
There will be situations like this in which you are using a pointer that
points to data with a conflicting type. Since the pointer type determines the
size of the data it points to, it’s important that the type is correct. As you can
see in pointer_types3.c below, typecasting is just a way to change the type of a
variable on the fly.
pointer_types3.c
#include <stdio.h>

int main() {
    int i;

    char char_array[5] = {'a', 'b', 'c', 'd', 'e'};
    int int_array[5] = {1, 2, 3, 4, 5};

    char *char_pointer;
    int *int_pointer;

    char_pointer = (char *) int_array; // Typecast into the
    int_pointer = (int *) char_array;  // pointer's data type.

    for(i=0; i < 5; i++) { // Step through the char array with the int_pointer.
        printf("[integer pointer] points to %p, which contains the char '%c'\n",
            int_pointer, *int_pointer);
        int_pointer = (int *) ((char *) int_pointer + 1);
    }

    for(i=0; i < 5; i++) { // Step through the int array with the char_pointer.
        printf("[char pointer] points to %p, which contains the integer %d\n",
            char_pointer, *char_pointer);
        char_pointer = (char *) ((int *) char_pointer + 1);
    }
}
In this code, when the pointers are initially set, the data is typecast into
the pointer’s data type. This will prevent the C compiler from complaining
about the conflicting data types; however, any pointer arithmetic will still be
incorrect. To fix that, when 1 is added to the pointers, they must first be typecast
into the correct data type so the address is incremented by the correct
amount. Then this pointer needs to be typecast back into the pointer’s data
type once again. It doesn’t look too pretty, but it works.
reader@hacking:~/booksrc $ gcc pointer_types3.c
reader@hacking:~/booksrc $ ./a.out
[integer pointer] points to 0xbffff810, which contains the char 'a'
[integer pointer] points to 0xbffff811, which contains the char 'b'
[integer pointer] points to 0xbffff812, which contains the char 'c'
[integer pointer] points to 0xbffff813, which contains the char 'd'
[integer pointer] points to 0xbffff814, which contains the char 'e'
[char pointer] points to 0xbffff7f0, which contains the integer 1
[char pointer] points to 0xbffff7f4, which contains the integer 2
[char pointer] points to 0xbffff7f8, which contains the integer 3
[char pointer] points to 0xbffff7fc, which contains the integer 4
[char pointer] points to 0xbffff800, which contains the integer 5
reader@hacking:~/booksrc $
Naturally, it is far easier just to use the correct data type for pointers
in the first place; however, sometimes a generic, typeless pointer is desired.
In C, a void pointer is a typeless pointer, defined by the void keyword.
Experimenting with void pointers quickly reveals a few things about typeless
pointers. First, pointers cannot be dereferenced unless they have a type.
In order to retrieve the value stored in the pointer’s memory address, the
compiler must first know what type of data it is. Secondly, void pointers must
also be typecast before doing pointer arithmetic. These are fairly intuitive
limitations, which means that a void pointer’s main purpose is to simply hold
a memory address.
The pointer_types3.c program can be modified to use a single void
pointer by typecasting it to the proper type each time it’s used. The compiler
knows that a void pointer is typeless, so any type of pointer can be stored in a
void pointer without typecasting. This also means a void pointer must always
be typecast when dereferencing it, however. These differences can be seen in
pointer_types4.c, which uses a void pointer.
pointer_types4.c
#include <stdio.h>

int main() {
    int i;

    char char_array[5] = {'a', 'b', 'c', 'd', 'e'};
    int int_array[5] = {1, 2, 3, 4, 5};

    void *void_pointer;

    void_pointer = (void *) char_array;

    for(i=0; i < 5; i++) { // Iterate through the char array with the void_pointer.
        printf("[char pointer] points to %p, which contains the char '%c'\n",
            void_pointer, *((char *) void_pointer));
        void_pointer = (void *) ((char *) void_pointer + 1);
    }

    void_pointer = (void *) int_array;

    for(i=0; i < 5; i++) { // Iterate through the int array with the void_pointer.
        printf("[integer pointer] points to %p, which contains the integer %d\n",
            void_pointer, *((int *) void_pointer));
        void_pointer = (void *) ((int *) void_pointer + 1);
    }
}
The results of compiling and executing pointer_types4.c are as
follows.
reader@hacking:~/booksrc $ gcc pointer_types4.c
reader@hacking:~/booksrc $ ./a.out
[char pointer] points to 0xbffff810, which contains the char 'a'
[char pointer] points to 0xbffff811, which contains the char 'b'
[char pointer] points to 0xbffff812, which contains the char 'c'
[char pointer] points to 0xbffff813, which contains the char 'd'
[char pointer] points to 0xbffff814, which contains the char 'e'
[integer pointer] points to 0xbffff7f0, which contains the integer 1
[integer pointer] points to 0xbffff7f4, which contains the integer 2
[integer pointer] points to 0xbffff7f8, which contains the integer 3
[integer pointer] points to 0xbffff7fc, which contains the integer 4
[integer pointer] points to 0xbffff800, which contains the integer 5
reader@hacking:~/booksrc $
The compilation and output of this pointer_types4.c is basically the same
as that for pointer_types3.c. The void pointer is really just holding the memory
addresses, while the hard-coded typecasting is telling the compiler to use the
proper types whenever the pointer is used.
Since the type is taken care of by the typecasts, the void pointer is truly
nothing more than a memory address. With the data types defined by typecasting,
anything that is big enough to hold a four-byte value can work the
same way as a void pointer. In pointer_types5.c, an unsigned integer is used
to store this address.
pointer_types5.c
#include <stdio.h>

int main() {
    int i;

    char char_array[5] = {'a', 'b', 'c', 'd', 'e'};
    int int_array[5] = {1, 2, 3, 4, 5};

    unsigned int hacky_nonpointer;

    hacky_nonpointer = (unsigned int) char_array;

    for(i=0; i < 5; i++) { // Iterate through the char array with the hacky_nonpointer.
        printf("[hacky_nonpointer] points to %p, which contains the char '%c'\n",
            hacky_nonpointer, *((char *) hacky_nonpointer));
        hacky_nonpointer = hacky_nonpointer + sizeof(char);
    }

    hacky_nonpointer = (unsigned int) int_array;

    for(i=0; i < 5; i++) { // Iterate through the int array with the hacky_nonpointer.
        printf("[hacky_nonpointer] points to %p, which contains the integer %d\n",
            hacky_nonpointer, *((int *) hacky_nonpointer));
        hacky_nonpointer = hacky_nonpointer + sizeof(int);
    }
}
This is rather hacky, but since this integer value is typecast into the
proper pointer types when it is assigned and dereferenced, the end result is
the same. Notice that instead of typecasting multiple times to do pointer
arithmetic on an unsigned integer (which isn’t even a pointer), the sizeof
operator is used to achieve the same result using normal arithmetic.
reader@hacking:~/booksrc $ gcc pointer_types5.c
reader@hacking:~/booksrc $ ./a.out
[hacky_nonpointer] points to 0xbffff810, which contains the char 'a'
[hacky_nonpointer] points to 0xbffff811, which contains the char 'b'
[hacky_nonpointer] points to 0xbffff812, which contains the char 'c'
[hacky_nonpointer] points to 0xbffff813, which contains the char 'd'
[hacky_nonpointer] points to 0xbffff814, which contains the char 'e'
[hacky_nonpointer] points to 0xbffff7f0, which contains the integer 1
[hacky_nonpointer] points to 0xbffff7f4, which contains the integer 2
[hacky_nonpointer] points to 0xbffff7f8, which contains the integer 3
[hacky_nonpointer] points to 0xbffff7fc, which contains the integer 4
[hacky_nonpointer] points to 0xbffff800, which contains the integer 5
reader@hacking:~/booksrc $
The important thing to remember about variables in C is that the compiler
is the only thing that cares about a variable’s type. In the end, after the
program has been compiled, the variables are nothing more than memory
addresses. This means that variables of one type can easily be coerced into
behaving like another type by telling the compiler to typecast them into the
desired type.
0x266 Command-Line Arguments
Many nongraphical programs receive input in the form of command-line
arguments. Unlike inputting with scanf(), command-line arguments don’t
require user interaction after the program has begun execution. This tends
to be more efficient and is a useful input method.
In C, command-line arguments can be accessed in the main() function by
including two additional arguments to the function: an integer and a pointer
to an array of strings. The integer will contain the number of arguments, and
the array of strings will contain each of those arguments. The commandline.c
program and its execution should explain things.
commandline.c
#include <stdio.h>

int main(int arg_count, char *arg_list[]) {
    int i;
    printf("There were %d arguments provided:\n", arg_count);
    for(i=0; i < arg_count; i++)
        printf("argument #%d\t-\t%s\n", i, arg_list[i]);
}
reader@hacking:~/booksrc $ gcc -o commandline commandline.c
reader@hacking:~/booksrc $ ./commandline
There were 1 arguments provided:
argument #0 - ./commandline
reader@hacking:~/booksrc $ ./commandline this is a test
There were 5 arguments provided:
argument #0 - ./commandline
argument #1 - this
argument #2 - is
argument #3 - a
argument #4 - test
reader@hacking:~/booksrc $
The zeroth argument is always the name of the executing binary, and
the rest of the argument array (often called an argument vector) contains the
remaining arguments as strings.
Sometimes a program will want to use a command-line argument as an
integer as opposed to a string. Regardless of this, the argument is passed in
as a string; however, there are standard conversion functions. Unlike simple
typecasting, these functions can actually convert character arrays containing
numbers into actual integers. The most common of these functions is atoi(),
which is short for ASCII to integer. This function accepts a pointer to a string
as its argument and returns the integer value it represents. Observe its usage
in convert.c.
convert.c
#include <stdio.h>
#include <stdlib.h>

void usage(char *program_name) {
    printf("Usage: %s <message> <# of times to repeat>\n", program_name);
    exit(1);
}

int main(int argc, char *argv[]) {
    int i, count;

    if(argc < 3)          // If fewer than 3 arguments are used,
        usage(argv[0]);   // display usage message and exit.

    count = atoi(argv[2]); // Convert the 2nd arg into an integer.
    printf("Repeating %d times..\n", count);

    for(i=0; i < count; i++)
        printf("%3d - %s\n", i, argv[1]); // Print the 1st arg.
}
The results of compiling and executing convert.c are as follows.
reader@hacking:~/booksrc $ gcc convert.c
reader@hacking:~/booksrc $ ./a.out
Usage: ./a.out <message> <# of times to repeat>
reader@hacking:~/booksrc $ ./a.out 'Hello, world!' 3
Repeating 3 times..
  0 - Hello, world!
  1 - Hello, world!
  2 - Hello, world!
reader@hacking:~/booksrc $
In the preceding code, an if statement makes sure that at least three arguments
(counting the program name) are present before these strings are accessed. If the program tries to access memory
that doesn’t exist or that the program doesn’t have permission to read,
the program will crash. In C it’s important to check for these types of conditions
and handle them in program logic. If the error-checking if statement is
commented out, this memory violation can be explored. The convert2.c
program should make this more clear.
convert2.c
#include <stdio.h>
#include <stdlib.h>

void usage(char *program_name) {
    printf("Usage: %s <message> <# of times to repeat>\n", program_name);
    exit(1);
}

int main(int argc, char *argv[]) {
    int i, count;
//  if(argc < 3)          // If fewer than 3 arguments are used,
//      usage(argv[0]);   // display usage message and exit.

    count = atoi(argv[2]); // Convert the 2nd arg into an integer.
    printf("Repeating %d times..\n", count);

    for(i=0; i < count; i++)
        printf("%3d - %s\n", i, argv[1]); // Print the 1st arg.
}
The results of compiling and executing convert2.c are as follows.
reader@hacking:~/booksrc $ gcc convert2.c
reader@hacking:~/booksrc $ ./a.out test
Segmentation fault (core dumped)
reader@hacking:~/booksrc $
When the program isn’t given enough command-line arguments, it still
tries to access elements of the argument array, even though they don’t exist.
This results in the program crashing due to a segmentation fault.
Memory is split into segments (which will be discussed later), and some
memory addresses aren’t within the boundaries of the memory segments the
program is given access to. When the program attempts to access an address
that is out of bounds, it will crash and die in what’s called a segmentation fault.
This effect can be explored further with GDB.
reader@hacking:~/booksrc $ gcc -g convert2.c
reader@hacking:~/booksrc $ gdb -q ./a.out
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) run test
Starting program: /home/reader/booksrc/a.out test
Program received signal SIGSEGV, Segmentation fault.
0xb7ec819b in ?? () from /lib/tls/i686/cmov/libc.so.6
(gdb) where
#0 0xb7ec819b in ?? () from /lib/tls/i686/cmov/libc.so.6
#1 0xb800183c in ?? ()
#2 0x00000000 in ?? ()
(gdb) break main
Breakpoint 1 at 0x8048419: file convert2.c, line 14.
(gdb) run test
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/reader/booksrc/a.out test
Breakpoint 1, main (argc=2, argv=0xbffff894) at convert2.c:14
14 count = atoi(argv[2]); // convert the 2nd arg into an integer
(gdb) cont
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0xb7ec819b in ?? () from /lib/tls/i686/cmov/libc.so.6
(gdb) x/3xw 0xbffff894
0xbffff894: 0xbffff9b3 0xbffff9ce 0x00000000
(gdb) x/s 0xbffff9b3
0xbffff9b3: "/home/reader/booksrc/a.out"
(gdb) x/s 0xbffff9ce
0xbffff9ce: "test"
(gdb) x/s 0x00000000
0x0: <Address 0x0 out of bounds>
(gdb) quit
The program is running. Exit anyway? (y or n) y
reader@hacking:~/booksrc $
The program is executed with a single command-line argument of test
within GDB, which causes the program to crash. The where command will
sometimes show a useful backtrace of the stack; however, in this case, the
stack was too badly mangled in the crash. A breakpoint is set on main and
the program is re-executed to get the value of the argument vector
(argv=0xbffff894 in the breakpoint output). Since the argument vector is a pointer to a list of strings, it is actually a
pointer to a list of pointers. Using the command x/3xw to examine the first
three memory addresses stored at the argument vector’s address shows that
they are themselves pointers to strings. The first one is the zeroth argument,
the second is the test argument, and the third is zero, which is out of bounds.
When the program tries to access this memory address, it crashes with a
segmentation fault.