Hacking Tutorials: Hacking Basics C

Hacking Basics C
Now that the idea of programming is less abstract, there are a few other
important concepts to know about C. Assembly language and computer
processors existed before higher-level programming languages, and many
modern programming concepts have evolved through time. In the same way
that knowing a little about Latin can greatly improve one’s understanding of
38 0x200
the English language, knowledge of low-level programming concepts can
assist the comprehension of higher-level ones. When continuing to the next
section, remember that C code must be compiled into machine instructions
before it can do anything.
0x261 Strings
The value "Hello, world!\n" passed to the printf() function in the previous
program is a string—technically, a character array. In C, an array is simply a
list of n elements of a specific data type. A 20-character array is simply 20
adjacent characters located in memory. Arrays are also referred to as buffers.
The char_array.c program is an example of a character array.
char_array.c
#include <stdio.h>
int main()
{
char str_a[20];
str_a[0] = 'H';
str_a[1] = 'e';
str_a[2] = 'l';
str_a[3] = 'l';
str_a[4] = 'o';
str_a[5] = ',';
str_a[6] = ' ';
str_a[7] = 'w';
str_a[8] = 'o';
str_a[9] = 'r';
str_a[10] = 'l';
str_a[11] = 'd';
str_a[12] = '!';
str_a[13] = '\n';
str_a[14] = 0;
printf(str_a);
}
The GCC compiler can also be given the -o switch to define the output
file to compile to. This switch is used below to compile the program into an
executable binary called char_array.
reader@hacking:~/booksrc $ gcc -o char_array char_array.c
reader@hacking:~/booksrc $ ./char_array
Hello, world!
reader@hacking:~/booksrc $
In the preceding program, a 20-element character array is defined as
str_a, and each element of the array is written to, one by one. Notice that the
number begins at 0, as opposed to 1. Also notice that the last character is a 0.
(This is also called a null byte.) The character array was defined, so 20 bytes
are allocated for it, but only 12 of these bytes are actually used. The null byte
P rogramming 39
at the end is used as a delimiter character to tell any function that is dealing
with the string to stop operations right there. The remaining extra bytes are
just garbage and will be ignored. If a null byte is inserted in the fifth element
of the character array, only the characters Hello would be printed by the
printf() function.
Since setting each character in a character array is painstaking and
strings are used fairly often, a set of standard functions was created for string
manipulation. For example, the strcpy() function will copy a string from a
source to a destination, iterating through the source string and copying each
byte to the destination (and stopping after it copies the null termination byte).
The order of the function’s arguments is similar to Intel assembly syntax:
destination first and then source. The char_array.c program can be rewritten
using strcpy() to accomplish the same thing using the string library. The
next version of the char_array program shown below includes string.h since
it uses a string function.
char_array2.c
#include <stdio.h>
#include <string.h>
int main() {
char str_a[20];
strcpy(str_a, "Hello, world!\n");
printf(str_a);
}
Let’s take a look at this program with GDB. In the output below, the
compiled program is opened with GDB and breakpoints are set before, in, and
after the strcpy() call shown in bold. The debugger will pause the program at
each breakpoint, giving us a chance to examine registers and memory. The
strcpy() function’s code comes from a shared library, so the breakpoint in this
function can’t actually be set until the program is executed.
reader@hacking:~/booksrc $ gcc -g -o char_array2 char_array2.c
reader@hacking:~/booksrc $ gdb -q ./char_array2
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) list
1 #include <stdio.h>
2 #include <string.h>
3
4 int main() {
5 char str_a[20];
6
7 strcpy(str_a, "Hello, world!\n");
8 printf(str_a);
9 }
(gdb) break 6
Breakpoint 1 at 0x80483c4: file char_array2.c, line 6.
(gdb) break strcpy
40 0x200
Function "strcpy" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 2 (strcpy) pending.
(gdb) break 8
Breakpoint 3 at 0x80483d7: file char_array2.c, line 8.
(gdb)
When the program is run, the strcpy() breakpoint is resolved. At each
breakpoint, we’re going to look at EIP and the instructions it points to. Notice
that the memory location for EIP at the middle breakpoint is different.
(gdb) run
Starting program: /home/reader/booksrc/char_array2
Breakpoint 4 at 0xb7f076f4
Pending breakpoint "strcpy" resolved
Breakpoint 1, main () at char_array2.c:7
7 strcpy(str_a, "Hello, world!\n");
(gdb) i r eip
eip 0x80483c4 0x80483c4 <main+16>
(gdb) x/5i $eip
0x80483c4 <main+16>: mov DWORD PTR [esp+4],0x80484c4
0x80483cc <main+24>: lea eax,[ebp-40]
0x80483cf <main+27>: mov DWORD PTR [esp],eax
0x80483d2 <main+30>: call 0x80482c4 <strcpy@plt>
0x80483d7 <main+35>: lea eax,[ebp-40]
(gdb) continue
Continuing.
Breakpoint 4, 0xb7f076f4 in strcpy () from /lib/tls/i686/cmov/libc.so.6
(gdb) i r eip
eip 0xb7f076f4 0xb7f076f4 <strcpy+4>
(gdb) x/5i $eip
0xb7f076f4 <strcpy+4>: mov esi,DWORD PTR [ebp+8]
0xb7f076f7 <strcpy+7>: mov eax,DWORD PTR [ebp+12]
0xb7f076fa <strcpy+10>: mov ecx,esi
0xb7f076fc <strcpy+12>: sub ecx,eax
0xb7f076fe <strcpy+14>: mov edx,eax
(gdb) continue
Continuing.
Breakpoint 3, main () at char_array2.c:8
8 printf(str_a);
(gdb) i r eip
eip 0x80483d7 0x80483d7 <main+35>
(gdb) x/5i $eip
0x80483d7 <main+35>: lea eax,[ebp-40]
0x80483da <main+38>: mov DWORD PTR [esp],eax
0x80483dd <main+41>: call 0x80482d4 <printf@plt>
0x80483e2 <main+46>: leave
0x80483e3 <main+47>: ret
(gdb)
P rogramming 41
The address in EIP at the middle breakpoint is different because the
code for the strcpy() function comes from a loaded library. In fact, the
debugger shows EIP for the middle breakpoint in the strcpy() function,
while EIP at the other two breakpoints is in the main() function. I’d like to
point out that EIP is able to travel from the main code to the strcpy() code
and back again. Each time a function is called, a record is kept on a data
structure simply called the stack. The stack lets EIP return through long
chains of function calls. In GDB, the bt command can be used to backtrace the
stack. In the output below, the stack backtrace is shown at each breakpoint.
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/reader/booksrc/char_array2
Error in re-setting breakpoint 4:
Function "strcpy" not defined.
Breakpoint 1, main () at char_array2.c:7
7 strcpy(str_a, "Hello, world!\n");
(gdb) bt
#0 main () at char_array2.c:7
(gdb) cont
Continuing.
Breakpoint 4, 0xb7f076f4 in strcpy () from /lib/tls/i686/cmov/libc.so.6
(gdb) bt
#0 0xb7f076f4 in strcpy () from /lib/tls/i686/cmov/libc.so.6
#1 0x080483d7 in main () at char_array2.c:7
(gdb) cont
Continuing.
Breakpoint 3, main () at char_array2.c:8
8 printf(str_a);
(gdb) bt
#0 main () at char_array2.c:8
(gdb)
At the middle breakpoint, the backtrace of the stack shows its record of
the strcpy() call. Also, you may notice that the strcpy() function is at a slightly
different address during the second run. This is due to an exploit protection
method that is turned on by default in the Linux kernel since 2.6.11. We will
talk about this protection in more detail later.
0x262 Signed, Unsigned, Long, and Short
By default, numerical values in C are signed, which means they can be both
negative and positive. In contrast, unsigned values don’t allow negative numbers.
Since it’s all just memory in the end, all numerical values must be stored
in binary, and unsigned values make the most sense in binary. A 32-bit
unsigned integer can contain values from 0 (all binary 0s) to 4,294,967,295
(all binary 1s). A 32-bit signed integer is still just 32 bits, which means it can
42 0x200
only be in one of 232 possible bit combinations. This allows 32-bit signed
integers to range from −2,147,483,648 to 2,147,483,647. Essentially, one of
the bits is a flag marking the value positive or negative. Positively signed values
look the same as unsigned values, but negative numbers are stored differently
using a method called two’s complement. Two’s complement represents negative
numbers in a form suited for binary adders—when a negative value in
two’s complement is added to a positive number of the same magnitude, the
result will be 0. This is done by first writing the positive number in binary, then
inverting all the bits, and finally adding 1. It sounds strange, but it works and
allows negative numbers to be added in combination with positive numbers
using simple binary adders.
This can be explored quickly on a smaller scale using pcalc, a simple
programmer’s calculator that displays results in decimal, hexadecimal, and
binary formats. For simplicity’s sake, 8-bit numbers are used in this example.
reader@hacking:~/booksrc $ pcalc 0y01001001
73 0x49 0y1001001
reader@hacking:~/booksrc $ pcalc 0y10110110 + 1
183 0xb7 0y10110111
reader@hacking:~/booksrc $ pcalc 0y01001001 + 0y10110111
256 0x100 0y100000000
reader@hacking:~/booksrc $
First, the binary value 01001001 is shown to be positive 73. Then all the
bits are flipped, and 1 is added to result in the two’s complement representation
for negative 73, 10110111. When these two values are added together,
the result of the original 8 bits is 0. The program pcalc shows the value 256
because it’s not aware that we’re only dealing with 8-bit values. In a binary
adder, that carry bit would just be thrown away because the end of the variable’s
memory would have been reached. This example might shed some
light on how two’s complement works its magic.
In C, variables can be declared as unsigned by simply prepending the
keyword unsigned to the declaration. An unsigned integer would be declared
with unsigned int. In addition, the size of numerical variables can be extended
or shortened by adding the keywords long or short. The actual sizes will vary
depending on the architecture the code is compiled for. The language of C
provides a macro called sizeof() that can determine the size of certain data
types. This works like a function that takes a data type as its input and returns
the size of a variable declared with that data type for the target architecture.
The datatype_sizes.c program explores the sizes of various data types, using
the sizeof() function.
datatype_sizes.c
#include <stdio.h>
int main() {
printf("The 'int' data type is\t\t %d bytes\n", sizeof(int));
P rogramming 43
printf("The 'unsigned int' data type is\t %d bytes\n", sizeof(unsigned int));
printf("The 'short int' data type is\t %d bytes\n", sizeof(short int));
printf("The 'long int' data type is\t %d bytes\n", sizeof(long int));
printf("The 'long long int' data type is %d bytes\n", sizeof(long long int));
printf("The 'float' data type is\t %d bytes\n", sizeof(float));
printf("The 'char' data type is\t\t %d bytes\n", sizeof(char));
}
This piece of code uses the printf() function in a slightly different way.
It uses something called a format specifier to display the value returned from
the sizeof() function calls. Format specifiers will be explained in depth later,
so for now, let’s just focus on the program’s output.
reader@hacking:~/booksrc $ gcc datatype_sizes.c
reader@hacking:~/booksrc $ ./a.out
The 'int' data type is 4 bytes
The 'unsigned int' data type is 4 bytes
The 'short int' data type is 2 bytes
The 'long int' data type is 4 bytes
The 'long long int' data type is 8 bytes
The 'float' data type is 4 bytes
The 'char' data type is 1 bytes
reader@hacking:~/booksrc $
As previously stated, both signed and unsigned integers are four bytes in
size on the x86 architecture. A float is also four bytes, while a char only needs
a single byte. The long and short keywords can also be used with floating-point
variables to extend and shorten their sizes.
0x263 Pointers
The EIP register is a pointer that “points” to the current instruction during a
program’s execution by containing its memory address. The idea of pointers
is used in C, also. Since the physical memory cannot actually be moved, the
information in it must be copied. It can be very computationally expensive to
copy large chunks of memory to be used by different functions or in different
places. This is also expensive from a memory standpoint, since space for
the new destination copy must be saved or allocated before the source can be
copied. Pointers are a solution to this problem. Instead of copying a large
block of memory, it is much simpler to pass around the address of the beginning
of that block of memory.
Pointers in C can be defined and used like any other variable type.
Since memory on the x86 architecture uses 32-bit addressing, pointers are
also 32 bits in size (4 bytes). Pointers are defined by prepending an asterisk (*)
to the variable name. Instead of defining a variable of that type, a pointer is
defined as something that points to data of that type. The pointer.c program
is an example of a pointer being used with the char data type, which is only
1 byte in size.
44 0x200
pointer.c
#include <stdio.h>
#include <string.h>
int main() {
char str_a[20]; // A 20-element character array
char *pointer; // A pointer, meant for a character array
char *pointer2; // And yet another one
strcpy(str_a, "Hello, world!\n");
pointer = str_a; // Set the first pointer to the start of the array.
printf(pointer);
pointer2 = pointer + 2; // Set the second one 2 bytes further in.
printf(pointer2); // Print it.
strcpy(pointer2, "y you guys!\n"); // Copy into that spot.
printf(pointer); // Print again.
}
As the comments in the code indicate, the first pointer is set at the beginning
of the character array. When the character array is referenced like this,
it is actually a pointer itself. This is how this buffer was passed as a pointer to
the printf() and strcpy() functions earlier. The second pointer is set to the
first pointer’s address plus two, and then some things are printed (shown in
the output below).
reader@hacking:~/booksrc $ gcc -o pointer pointer.c
reader@hacking:~/booksrc $ ./pointer
Hello, world!
llo, world!
Hey you guys!
reader@hacking:~/booksrc $
Let’s take a look at this with GDB. The program is recompiled, and a
breakpoint is set on the tenth line of the source code. This will stop the
program after the "Hello, world!\n" string has been copied into the str_a
buffer and the pointer variable is set to the beginning of it.
reader@hacking:~/booksrc $ gcc -g -o pointer pointer.c
reader@hacking:~/booksrc $ gdb -q ./pointer
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) list
1 #include <stdio.h>
2 #include <string.h>
3
4 int main() {
5 char str_a[20]; // A 20-element character array
6 char *pointer; // A pointer, meant for a character array
P rogramming 45
7 char *pointer2; // And yet another one
8
9 strcpy(str_a, "Hello, world!\n");
10 pointer = str_a; // Set the first pointer to the start of the array.
(gdb)
11 printf(pointer);
12
13 pointer2 = pointer + 2; // Set the second one 2 bytes further in.
14 printf(pointer2); // Print it.
15 strcpy(pointer2, "y you guys!\n"); // Copy into that spot.
16 printf(pointer); // Print again.
17 }
(gdb) break 11
Breakpoint 1 at 0x80483dd: file pointer.c, line 11.
(gdb) run
Starting program: /home/reader/booksrc/pointer
Breakpoint 1, main () at pointer.c:11
11 printf(pointer);
(gdb) x/xw pointer
0xbffff7e0: 0x6c6c6548
(gdb) x/s pointer
0xbffff7e0: "Hello, world!\n"
(gdb)
When the pointer is examined as a string, it’s apparent that the given
string is there and is located at memory address 0xbffff7e0. Remember that
the string itself isn’t stored in the pointer variable—only the memory address
0xbffff7e0 is stored there.
In order to see the actual data stored in the pointer variable, you must
use the address-of operator. The address-of operator is a unary operator,
which simply means it operates on a single argument. This operator is just
an ampersand (&) prepended to a variable name. When it’s used, the address
of that variable is returned, instead of the variable itself. This operator exists
both in GDB and in the C programming language.
(gdb) x/xw &pointer
0xbffff7dc: 0xbffff7e0
(gdb) print &pointer
$1 = (char **) 0xbffff7dc
(gdb) print pointer
$2 = 0xbffff7e0 "Hello, world!\n"
(gdb)
When the address-of operator is used, the pointer variable is shown to
be located at the address 0xbffff7dc in memory, and it contains the address
0xbffff7e0.
The address-of operator is often used in conjunction with pointers, since
pointers contain memory addresses. The addressof.c program demonstrates
the address-of operator being used to put the address of an integer variable
into a pointer. This line is shown in bold below.
46 0x200
addressof.c
#include <stdio.h>
int main() {
int int_var = 5;
int *int_ptr;
int_ptr = &int_var; // put the address of int_var into int_ptr
}
The program itself doesn’t actually output anything, but you can probably
guess what happens, even before debugging with GDB.
reader@hacking:~/booksrc $ gcc -g addressof.c
reader@hacking:~/booksrc $ gdb -q ./a.out
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) list
1 #include <stdio.h>
2
3 int main() {
4 int int_var = 5;
5 int *int_ptr;
6
7 int_ptr = &int_var; // Put the address of int_var into int_ptr.
8 }
(gdb) break 8
Breakpoint 1 at 0x8048361: file addressof.c, line 8.
(gdb) run
Starting program: /home/reader/booksrc/a.out
Breakpoint 1, main () at addressof.c:8
8 }
(gdb) print int_var
$1 = 5
(gdb) print &int_var
$2 = (int *) 0xbffff804
(gdb) print int_ptr
$3 = (int *) 0xbffff804
(gdb) print &int_ptr
$4 = (int **) 0xbffff800
(gdb)
As usual, a breakpoint is set and the program is executed in the
debugger. At this point the majority of the program has executed. The first
print command shows the value of int_var, and the second shows its address
using the address-of operator. The next two print commands show that
int_ptr contains the address of int_var, and they also show the address of
the int_ptr for good measure.
P rogramming 47
An additional unary operator called the dereference operator exists for use
with pointers. This operator will return the data found in the address the
pointer is pointing to, instead of the address itself. It takes the form of an
asterisk in front of the variable name, similar to the declaration of a pointer.
Once again, the dereference operator exists both in GDB and in C. Used in
GDB, it can retrieve the integer value int_ptr points to.
(gdb) print *int_ptr
$5 = 5
A few additions to the addressof.c code (shown in addressof2.c) will
demonstrate all of these concepts. The added printf() functions use format
parameters, which I’ll explain in the next section. For now, just focus on the
program’s output.
addressof2.c
#include <stdio.h>
int main() {
int int_var = 5;
int *int_ptr;
int_ptr = &int_var; // Put the address of int_var into int_ptr.
printf("int_ptr = 0x%08x\n", int_ptr);
printf("&int_ptr = 0x%08x\n", &int_ptr);
printf("*int_ptr = 0x%08x\n\n", *int_ptr);
printf("int_var is located at 0x%08x and contains %d\n", &int_var, int_var);
printf("int_ptr is located at 0x%08x, contains 0x%08x, and points to %d\n\n",
&int_ptr, int_ptr, *int_ptr);
}
The results of compiling and executing addressof2.c are as follows.
reader@hacking:~/booksrc $ gcc addressof2.c
reader@hacking:~/booksrc $ ./a.out
int_ptr = 0xbffff834
&int_ptr = 0xbffff830
*int_ptr = 0x00000005
int_var is located at 0xbffff834 and contains 5
int_ptr is located at 0xbffff830, contains 0xbffff834, and points to 5
reader@hacking:~/booksrc $
When the unary operators are used with pointers, the address-of operator
can be thought of as moving backward, while the dereference operator
moves forward in the direction the pointer is pointing.
48 0x200

Hacking Tutorials

Pages

About Me

Wednesday, September 3, 2014

Hacking Basics C

0 comments:

Post a Comment

Followers

Blog Archive