A man is only as good as his tools - this, however, does not give him an excuse to be willfully ignorant of what his tools do! In that spirit, let us sprint onto the road of discovery and explore the internals of disassembling.
* On a side note, I do ask you to excuse any rust that has seemed to creep up on me with age! *
Most of us have experience with disassembling - hell, OllyDbg's main view is nothing but disassembly! And yes, before the cries emerge, there are technical differences between assembly and disassembly, though minute, that make both worthy of their own category (the biggest of these being lack of comments, original labels, and original function calls in disassembly). But do not fret, as writing a dis-assembler is actually far easier than one thinks (especially for those who have experience with DFA machines!)
* Before we continue, I would like to address an issue that seems to plague the internet:
- Assembly
- Assembler
- assembling
These words do not mean the same thing (and eo ipso disassembly, dis-assembler, and disassembling do not either), but it is an easy mistake to make. Assembly (I vaguely recall writing about this before) is the language, in the same vein as C, Java, Ruby, etc. - an assembler is the program that converts assembly into an object file (much like a compiler for high-level languages). You wouldn't call C++ "Compiler," so make the weakened Assembly gods happy by not calling Assembly "Assembler." As for the last term, that is the verb for what an assembler does; when running an assembler, you are assembling the code. *
Despite its name, it is easiest to think of disassembling as a process of reconstruction, whereby we build a program back up from just bytes. And, like most reconstruction processes, our job will be infinitely easier if we understand how the sad program got to its current state.
* Please note that like pointers, this is a topic people have written entire theses on. The fact that my explanation will be little more than a paragraph should let you know that it is not comprehensive at all. *
Modern compilers (thankfully) take much of the work out of building a program - you include all your files in a project, hit "Build," and magically an executable appears. Behind the scenes, the build process itself consists of four main steps:
1. Preprocessing
2. Compiling
3. Assembling
4. Linking
* For this example, we will assume a traditionally structured high-level language (or HLL) such as C or C++. For comparison, Java undergoes a similar process, the compiling step producing byte-code instead of an object file (which is then executed by the JVM). I must admit I am less familiar with scripting languages, but they often skip the last three steps and instead execute their code dynamically, usually through an interpreter. *
The preprocessor goes through your code and rips out everything the computer doesn't care about - namely comments, white-space, and various other non-code elements. It will also go through and replace bits of your code, e.g., replacing #defined elements with their true data. Strictly speaking, substituting constants in where it makes sense and optimising code are handled by the compiler's own early passes rather than the preprocessor, but I will lump them together here. How far this goes depends on the compiler, as some will compile your code as-is, and others will replace statements like:
Code:
int x = 5;
if( i == x )
{ ... }
With:
Code:
if( i == 5 )
{ ... }
Though I speak of this as one step, the process is normally broken up into several different tasks, executing different tools at each part, and iterating through many of the steps multiple times (especially when it comes to optimising code - often different tools will be responsible for optimising loops, conditionals, etc.).
The code then goes to the second step, compiling, which will translate the code to assembly. As an example:
I touched briefly on keywords in a previous tutorial, and this is why they are important, as the compiler can only translate things it has well-defined explanations of. Upon hitting our code, the compiler will match to an if conditional, and substitute in the following assembly:
Code:
mov eax, dword ptr ds:[0x101EB01A] ; example location of i
cmp eax, 5
jne _false
;body of the conditional
_false:
Remember when I talked about things the preprocessor rips out? Variables are another one of those things we stupid humans need - the computer just substitutes in their memory addresses. So really, the code coming into the compiler would look like:
Code:
if( *0x101EB01A == 5 )
{
}
Following this, the code will again be run through a series of optimisers before arriving at the assembler, which will assemble the code generated from compiling and produce an object file. An object file is, for all intents and purposes, an executable file, except for three (albeit huge) differences:
1. No external calls to libraries point to correct locations.
2. No calls to other object files are valid. (Remember, an object file is created for each .c (or likewise) file!)
3. The file is not packed in a valid executable format.
Enter the linker, which links the calls in the object file to the correct libraries and packs the program to run on the target system (for Windows, the PE format).
Phew, that was quite the process - and what do we even get for so much work? Just a bunch of bytes! Outrageous!
This is actually an important thing to keep in mind - though executables have magic properties, they are nothing more than a series of bytes with a well-documented format. I promise you, no scary monsters lurking in there!
* For the rest of this tutorial we will be using the following code to test on:
Code:
.486
.model flat,stdcall
option casemap:none
include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib
include \masm32\include\user32.inc
includelib \masm32\lib\user32.lib
.code
_start:
xor eax,eax
mov eax,5
mov ebx,8
push eax
pop eax
xor eax,eax
xor ebx,ebx
test ebx,ebx
test eax,eax
push ebx
pop ebx
cmp ebx,0
cmp eax,0
je @_moo
push 1
push 0
push 0
push 0
call MessageBoxA
push 4Dh
call GetAsyncKeyState
push 0
push 0
push 0
push 0
call ReadFile
@_moo:
push 0
call ExitProcess
end _start
It does nothing, but it provides us a good base of instructions to test our dis-assembler on. Assembly is the language of choice - both to give us increased visibility of our progress and to ensure that mean Mr. Compiler doesn't destroy our code. You can either assemble this yourself using masm32, or download the program "test.exe" attached to this post. *
Before we try to disassemble this, let us calm our nerves and start out by just dumping the bytes of the file, to reaffirm our previous point that executables are nothing but some lame bytes.
Create a console application, set it up as an empty project, add a "main.cpp," and then add the following code:
Code:
#include <stdio.h>
#include <Windows.h>
int main( int argc, char** argv )
{
FILE *f = NULL;
BYTE buffer[ 4096 ] = { 0 };
f = fopen( "test.exe", "rb" ); // "rb" = binary mode, so Windows does not translate line endings or stop at a 0x1A byte
if( f != NULL )
{
fread( buffer, 4096, 1, f );
fclose( f );
for( int i = 0; i < 4096; i++ )
{
if( buffer[ i ] != 0x00 )
printf( "%X", buffer[ i ] );
else
printf( " " );
}
}
else
{
printf( "File Not Found" );
}
getchar( );
return 0;
}
* Remember kids, hard-coding things is bad, but we are just testing ideas here, so we can get away with it. *
* Make sure to adjust fopen to open to wherever you placed test.exe! *
This program is rather basic - it simply reads in the first 4096 bytes of a file, and then displays them in hexadecimal form, with the caveat that it will skip null bytes (to help us distinguish different sections). Running the program will produce the following output:

Why highlight that random section in red? Well if we open up our program in Olly...

... we will see that our original thesis of programs not being magical is correct, as the opcodes for each instruction are stored in plain sight!
Now, of course, the question arises of what an opcode is - I suppose we can stop being the cool kids and label it by its true term, "operation code." For each assembly instruction (push, mov, cmp, etc.) there exists a set of bytes that "represent" that instruction to the processor - to elaborate, when the processor is running our code and encounters 0x33C0, it knows that it needs to xor eax with itself. When it encounters 0x50, it knows it needs to push eax on the stack. When it encounters 0xE8, it knows it needs to call a function, shifting EIP to point to a new line of code. And so on, and so forth. Keep in mind that these opcodes differ between processor architectures - most modern Windows machines use the x86/x64 architecture as their foundation. While we could cover the internals of each processor and how they work, such a thing will wait for a different day.
* "But, but, attila! I thought programs were all in binary!" Well yes, but they are also in hexadecimal - I've done too much homework with this to ever want to approach the issue again, but it is important to remember that the difference between our view and binary is just in the way it is represented. The same opcodes are present in the binary form, just represented with 0's and 1's.
As an analogy, imagine the following - "three," "3," "5-2:" we see three ways of representing three. One is in English, one in the decimal system, and one represented in equation form - three different mediums. Despite this, the data is still the same - likewise with our previous argument with hexadecimal and binary. *
So we know our code is in there, but what remains of all the crap cluttered at the top? Enter the PE header (it would appear the linker has competition).
To admit my faults, my familiarity with the PE header is rather limited, but here is a crash course on it: in the most basic of senses, the PE header contains all the information that Windows needs to load and execute the code correctly, including sections, references to external libraries, data, and random other bits of information. Its basic layout is as follows:

* For a more detailed look at the PE structure and format, check out Iczelion's series on it at http://win32assembly.online.fr/pe-tut1.html (to view other sections of it, change the 1 at the end). *
We couldn't care less about the DOS header, other than its e_lfanew member, which points to the start of the NT header. Of the NT header, we are primarily concerned with section-related stuff.
Before we get carried away with sections, however, let us slightly modify our original file dumper and reanalyse our executable to get a feel of the format:
Code:
#include <stdio.h>
#include <Windows.h>
int main( int argc, char** argv )
{
FILE *f = NULL;
BYTE buffer[ 4096 ] = { 0 };
f = fopen( "test.exe", "rb" ); // "rb" = binary mode, so Windows does not translate line endings or stop at a 0x1A byte
if( f != NULL )
{
fread( buffer, 4096, 1, f );
fclose( f );
for( int i = 0; i < 4096; i++ )
{
//if( buffer[ i ] != 0x00 )
printf( "%X", buffer[ i ] );
//else
//printf( " " );
}
}
else
{
printf( "File Not Found" );
}
getchar( );
return 0;
}
* Yes, all we are doing here is commenting out the null-byte check and printing everything. *
This will produce the following result:

We see that one part of that big hunk of crap is the PE header (highlighted in red), complete with the DOS magic number and PE magic number, which are nothing more than some bytes that label valid PE files. The purple highlights our first section header, but let us delay again and instead write some code to read the PE header!
* Most of this is just Iczelion's code converted to C. *
Code:
#include <Windows.h>
#include <stdio.h>
int main( int argc, char** argv )
{
HANDLE hFile = NULL, hFileMappingObject = NULL;
LPVOID base = NULL;
PIMAGE_DOS_HEADER image_dos_header;
PIMAGE_NT_HEADERS image_nt_header;
PIMAGE_SECTION_HEADER image_section_header;
hFile = CreateFile( "test.exe", GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL );
if( hFile != INVALID_HANDLE_VALUE )
{
hFileMappingObject = CreateFileMapping( hFile, NULL, PAGE_READONLY, 0, 0, NULL );
base = MapViewOfFile( hFileMappingObject, FILE_MAP_READ, 0, 0, 0 );
image_dos_header = (PIMAGE_DOS_HEADER) base;
if( image_dos_header->e_magic != IMAGE_DOS_SIGNATURE )
{
printf( "DOS magic number invalid" );
UnmapViewOfFile( base );
CloseHandle( hFileMappingObject );
CloseHandle( hFile );
getchar();
return 0;
}
image_nt_header = (PIMAGE_NT_HEADERS) ((DWORD)base + image_dos_header->e_lfanew );
if( image_nt_header->Signature != IMAGE_NT_SIGNATURE )
{
printf( "Not a valid PE file" );
UnmapViewOfFile( base );
CloseHandle( hFileMappingObject );
CloseHandle( hFile );
getchar( );
return 0;
}
//read in our sections
UnmapViewOfFile( base );
CloseHandle( hFileMappingObject );
CloseHandle( hFile );
}
getchar();
return 0;
}
* If Visual Studio complains about "test.exe", right-click on the project, select Properties, and select Use Multi-Byte Character Set. *
I typically like to avoid throwing out large sections of code, but if we take this line-by-line, it will be less of a shock!
Code:
HANDLE hFile = NULL, hFileMappingObject = NULL;
LPVOID base = NULL;
PIMAGE_DOS_HEADER image_dos_header;
PIMAGE_NT_HEADERS image_nt_header;
PIMAGE_SECTION_HEADER image_section_header;
The two handles will be placeholders for CreateFile and CreateFileMapping. Base will contain the base address of our file when mapped into memory, and will be the basic block we reference everything with.
As for the three latter elements, they represent data structures that model the DOS, PE, and Section headers. They are declared as pointers (since we will have them point at specific memory from the mapping).
Code:
hFile = CreateFile( "test.exe", GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL );
if( hFile != INVALID_HANDLE_VALUE )
{
CreateFile will return a handle to our opened file, similar to fopen. So then why CreateFile? Because we need a handle for CreateFileMapping, and because CreateFile will allow us to easily modify our file pointer (current location in the file) so that we can easily jump around when reading the data in.
Code:
hFileMappingObject = CreateFileMapping( hFile, NULL, PAGE_READONLY, 0, 0, NULL );
base = MapViewOfFile( hFileMappingObject, FILE_MAP_READ, 0, 0, 0 );
image_dos_header = (PIMAGE_DOS_HEADER) base;
CreateFileMapping will simply create a file mapping object for a file, which we will then pass into MapViewOfFile, which will load the file into memory and return the starting address of our mapped data. Since the DOS header is the first element of any PE file, it would make sense to point our DOS Header structure to the base.
Code:
if( image_dos_header->e_magic != IMAGE_DOS_SIGNATURE )
{
printf( "DOS magic number invalid" );
UnmapViewOfFile( base );
CloseHandle( hFileMappingObject );
CloseHandle( hFile );
getchar();
return 0;
}
To be safe, we check that the DOS magic number is present, i.e. that the file at least starts like a DOS executable. If not, we clean up everything and bail.
Code:
image_nt_header = (PIMAGE_NT_HEADERS) ((DWORD)base + image_dos_header->e_lfanew );
if( image_nt_header->Signature != IMAGE_NT_SIGNATURE )
{
printf( "Not a valid PE file" );
UnmapViewOfFile( base );
CloseHandle( hFileMappingObject );
CloseHandle( hFile );
getchar( );
return 0;
}
Next we do a similar process with the NT header, taking into account that the e_lfanew in the DOS header points to the NT header, and adding that to base. We then check to make sure this is actually a PE file.
If everything works, you should be able to open a variety of files (change test.exe to point to things like pdfs), and verify that our program picks out files that aren't executables!
I have now succeeded in substantially delaying talking about sections - for no good reason either! The section table is just an array of sections, where sections are just groupings of data by their common attributes (think .text, .rsrc, etc.). As referenced in the diagram earlier, a section is made up of a section header, which contains various tidbits (like name, size, and address) and the raw data that makes up the section.
* I cannot write with Korean music in the background. *
Since programs are made up of multiple sections, we now want to leverage our data to walk the section table:
* Ever want to sound like you know what you're talking about? Throw around the term "walking the section table" liberally. *
Code:
int numOfSections = 0;
DWORD preferredBase = 0, sectionHeaderBase = 0;
...
//read in our sections
numOfSections = image_nt_header->FileHeader.NumberOfSections;
preferredBase = image_nt_header->OptionalHeader.AddressOfEntryPoint + image_nt_header->OptionalHeader.ImageBase;
sectionHeaderBase = (DWORD)base + image_dos_header->e_lfanew + sizeof( IMAGE_NT_HEADERS );
There are three steps to this code:
1. First, we read the total number of sections from the NT header, so that we know how many times to iterate, then
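2. We grab what this tutorial calls the preferred base of the application, by adding the base of the image and the application's entry point. Strictly speaking, ImageBase on its own is the preferred load address and the sum is the virtual address of the entry point, but in our test programs the entry point is the first instruction of .text, so it works as the address of the first code byte. This isn't directly related to the section table, but we will need it later.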
3. Finally, we grab the base of the first section header, which directly follows the NT header.
With all that in hand, we can now iterate through all the sections, and start displaying information on them!
Code:
for( int i = 0; i < numOfSections; i++ )
{
image_section_header = (PIMAGE_SECTION_HEADER) sectionHeaderBase;
printf( "%.8s %x %x %x %x %x\n", image_section_header->Name, image_section_header->Misc.VirtualSize, // Name is 8 bytes and not always null-terminated
image_section_header->VirtualAddress, image_section_header->SizeOfRawData, image_section_header->PointerToRawData,
image_section_header->Characteristics );
//find our section data
//parse our data
sectionHeaderBase += sizeof( IMAGE_SECTION_HEADER );
}
We do not need all these elements, but it’s a good idea to print them to make sure we are pulling down everything correctly. If we run the code, we are presented with the following:

It looks like our test application has three sections (we primarily care about .text for now, since that houses the code). If we take one last trip back to our file dumper - I promise this is the last - we see the following:

Guess what 2e74657874 is in ASCII!
* It's .text. I knew you weren't going to look it up. *
We are finally pulling down our data correctly, so let us now finish this dis-assembler up by both finding the relevant section data and parsing it (lots of code incoming):
Code:
BYTE *buffer = NULL;
DWORD dwBytesRead = 0;
...
for( int i = 0; i < numOfSections; i++ )
{
image_section_header = (PIMAGE_SECTION_HEADER) sectionHeaderBase;
printf( "%.8s %x %x %x %x %x\n", image_section_header->Name, image_section_header->Misc.VirtualSize, // Name is 8 bytes and not always null-terminated
image_section_header->VirtualAddress, image_section_header->SizeOfRawData, image_section_header->PointerToRawData,
image_section_header->Characteristics );
//find our section data
buffer = new BYTE[ image_section_header->SizeOfRawData ];
SetFilePointer( hFile, image_section_header->PointerToRawData, NULL, FILE_BEGIN );
ReadFile( hFile, buffer, image_section_header->SizeOfRawData, &dwBytesRead, NULL );
//parse our data
for( int j = 0; j < image_section_header->SizeOfRawData; j++ )
{
printf( "%x:\t", preferredBase + j );
if( buffer[ j ] == 0x33 )
{
printf( "xor " );
if( buffer[ j + 1] == 0xC0 )
printf("eax,eax" );
else if( buffer[ j + 1] == 0xDB )
printf("ebx,ebx" );
printf( "\n" );
j++;
}
else if( buffer[ j ] == 0x83 )
{
printf( "cmp" );
if( buffer[ j + 1 ] == 0xFB )
printf( " ebx, %x", buffer[ j + 2 ] );
else if( buffer[ j + 1 ] == 0xF8 )
printf( " eax, %x", buffer[ j + 2 ] );
printf( "\n" );
j += 2;
}
else if( buffer[ j ] == 0x85 )
{
printf( "test " );
if( buffer[ j + 1 ] == 0xDB )
printf( "ebx,ebx" );
else if( buffer[ j + 1] == 0xC0 )
printf("eax,eax" );
printf( "\n" );
j++;
}
else if( buffer[ j ] == 0x74 )
{
printf( "je short %x\n", preferredBase + j + 2 + buffer[ j + 1 ] );
j++;
}
else if( buffer[ j ] == 0x53 )
{
printf( "push ebx\n" );
}
else if( buffer[ j ] == 0xB8 )
{
printf("mov eax,%x\n", buffer[ j + 1 ] );
j+=4;
}
else if( buffer[ j ] == 0xBB )
{
printf("mov ebx,%x\n", buffer[ j + 1 ] );
j+=4;
}
else if( buffer[ j ] == 0x50 )
{
printf("push eax\n" );
}
else if( buffer[ j ] == 0x58 )
{
printf("pop eax\n" );
}
else if( buffer[ j ] == 0x6A )
{
printf( "push %d\n", buffer[ j + 1 ] );
j++;
}
else if( buffer[ j ] == 0x5B )
{
printf( "pop ebx\n" );
}
else if( buffer[ j ] == 0xE8 )
{
printf( "call %x%x%x%x\n", buffer[ j + 1 ], buffer[ j + 2 ], buffer[ j + 3 ], buffer[ j + 4 ] );
j+=4;
}
else if( buffer[ j ] == 0xFF && buffer[ j + 1 ] == 0x25 )
{
printf( "jmp dword ptr ds:[%x%x%x%x]\n", buffer[ j + 2 ], buffer[ j + 3], buffer[ j + 4 ], buffer[ j + 5 ] );
j+=5;
}
else
printf( "db %x\n", buffer[ j ] );
}
printf("----------------------------------------------------------------\n" );
delete[] buffer;
sectionHeaderBase += sizeof( IMAGE_SECTION_HEADER );
}
Again, let us take this step-by-step:
Code:
buffer = new BYTE[ image_section_header->SizeOfRawData ];
SetFilePointer( hFile, image_section_header->PointerToRawData, NULL, FILE_BEGIN );
ReadFile( hFile, buffer, image_section_header->SizeOfRawData, &dwBytesRead, NULL );
This section is rather straightforward - we initialise a buffer the size of the section's data, point our file pointer to where our raw data resides (in our case, the code), and then read the code into the buffer.
Now for the parsing:
Code:
for( int j = 0; j < image_section_header->SizeOfRawData; j++ )
{
printf( "%x:\t", preferredBase + j );
if( buffer[ j ] == 0x33 )
{
printf( "xor " );
if( buffer[ j + 1] == 0xC0 )
printf("eax,eax" );
else if( buffer[ j + 1] == 0xDB )
printf("ebx,ebx" );
printf( "\n" );
j++;
}
else if( buffer[ j ] == 0x83 )
{
printf( "cmp" );
if( buffer[ j + 1 ] == 0xFB )
printf( " ebx, %x", buffer[ j + 2 ] );
else if( buffer[ j + 1 ] == 0xF8 )
printf( " eax, %x", buffer[ j + 2 ] );
printf( "\n" );
j += 2;
}
else if( buffer[ j ] == 0x85 )
{
printf( "test " );
if( buffer[ j + 1 ] == 0xDB )
printf( "ebx,ebx" );
else if( buffer[ j + 1] == 0xC0 )
printf("eax,eax" );
printf( "\n" );
j++;
}
else if( buffer[ j ] == 0x74 )
{
printf( "je short %x\n", preferredBase + j + 2 + buffer[ j + 1 ] );
j++;
}
else if( buffer[ j ] == 0x53 )
{
printf( "push ebx\n" );
}
else if( buffer[ j ] == 0xB8 )
{
printf("mov eax,%x\n", buffer[ j + 1 ] );
j+=4;
}
else if( buffer[ j ] == 0xBB )
{
printf("mov ebx,%x\n", buffer[ j + 1 ] );
j+=4;
}
else if( buffer[ j ] == 0x50 )
{
printf("push eax\n" );
}
else if( buffer[ j ] == 0x58 )
{
printf("pop eax\n" );
}
else if( buffer[ j ] == 0x6A )
{
printf( "push %d\n", buffer[ j + 1 ] );
j++;
}
else if( buffer[ j ] == 0x5B )
{
printf( "pop ebx\n" );
}
else if( buffer[ j ] == 0xE8 )
{
printf( "call %x%x%x%x\n", buffer[ j + 1 ], buffer[ j + 2 ], buffer[ j + 3 ], buffer[ j + 4 ] );
j+=4;
}
else if( buffer[ j ] == 0xFF && buffer[ j + 1 ] == 0x25 )
{
printf( "jmp dword ptr ds:[%x%x%x%x]\n", buffer[ j + 2 ], buffer[ j + 3], buffer[ j + 4 ], buffer[ j + 5 ] );
j+=5;
}
else
printf( "db %x\n", buffer[ j ] );
* I'm sorry for using lots of if's, don't kill me! *
* This is literally the worst way of doing this. Switches, pointers, or leveraging arrays would have been much better. I am lazy. *
* And no, this is not the complete instruction set, but you are always free to expand it! *
The story of this code is as follows: since we now possess all our data in a buffer, we loop through the buffer's size, examining each byte for known opcodes, and on a known one print out the resulting assembly code. We also use our preferred base we got earlier to give an address to each byte of data. I pulled the opcodes from Olly, but a better place would be Intel's Software Developer's Manual.
You will notice if you try to run this, the output will flow off the screen - to get around this we will make use of piping to place our output in a file:

If everything went well, opening the file should be a sight for sore eyes:

Now let us test how our dis-assembler does at reading data - change the CreateFile line so we can give it input:
Code:
hFile = CreateFile( argv[ 1 ], GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL );
test2.exe is as follows, and can also be found attached:
Code:
.486
.model flat,stdcall
option casemap:none
include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib
.data
candy dd 1
moo dd 2
.code
_start:
xor eax,eax
push 0
pop eax
push ebx
pop eax
xor eax,eax
push 0
call ExitProcess
end _start
We then execute our new command:

And scroll down to the .data section to see our variables:

The final code for comparison:
Code:
#include <Windows.h>
#include <stdio.h>
int main( int argc, char** argv )
{
HANDLE hFile = NULL, hFileMappingObject = NULL;
LPVOID base = NULL;
int numOfSections = 0;
DWORD preferredBase = 0, sectionHeaderBase = 0;
BYTE *buffer = NULL;
DWORD dwBytesRead = 0;
PIMAGE_DOS_HEADER image_dos_header;
PIMAGE_NT_HEADERS image_nt_header;
PIMAGE_SECTION_HEADER image_section_header;
hFile = CreateFile( argv[ 1 ], GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL );
if( hFile != INVALID_HANDLE_VALUE )
{
hFileMappingObject = CreateFileMapping( hFile, NULL, PAGE_READONLY, 0, 0, NULL );
base = MapViewOfFile( hFileMappingObject, FILE_MAP_READ, 0, 0, 0 );
image_dos_header = (PIMAGE_DOS_HEADER) base;
if( image_dos_header->e_magic != IMAGE_DOS_SIGNATURE )
{
printf( "DOS magic number invalid" );
UnmapViewOfFile( base );
CloseHandle( hFileMappingObject );
CloseHandle( hFile );
getchar();
return 0;
}
image_nt_header = (PIMAGE_NT_HEADERS) ((DWORD)base + image_dos_header->e_lfanew );
if( image_nt_header->Signature != IMAGE_NT_SIGNATURE )
{
printf( "Not a valid PE file" );
UnmapViewOfFile( base );
CloseHandle( hFileMappingObject );
CloseHandle( hFile );
getchar( );
return 0;
}
//read in our sections
numOfSections = image_nt_header->FileHeader.NumberOfSections;
preferredBase = image_nt_header->OptionalHeader.AddressOfEntryPoint + image_nt_header->OptionalHeader.ImageBase;
sectionHeaderBase = (DWORD)base + image_dos_header->e_lfanew + sizeof( IMAGE_NT_HEADERS );
for( int i = 0; i < numOfSections; i++ )
{
image_section_header = (PIMAGE_SECTION_HEADER) sectionHeaderBase;
printf( "%.8s %x %x %x %x %x\n", image_section_header->Name, image_section_header->Misc.VirtualSize, // Name is 8 bytes and not always null-terminated
image_section_header->VirtualAddress, image_section_header->SizeOfRawData, image_section_header->PointerToRawData,
image_section_header->Characteristics );
//find our section data
buffer = new BYTE[ image_section_header->SizeOfRawData ];
SetFilePointer( hFile, image_section_header->PointerToRawData, NULL, FILE_BEGIN );
ReadFile( hFile, buffer, image_section_header->SizeOfRawData, &dwBytesRead, NULL );
//parse our data
for( int j = 0; j < image_section_header->SizeOfRawData; j++ )
{
printf( "%x:\t", preferredBase + j );
if( buffer[ j ] == 0x33 )
{
printf( "xor " );
if( buffer[ j + 1] == 0xC0 )
printf("eax,eax" );
else if( buffer[ j + 1] == 0xDB )
printf("ebx,ebx" );
printf( "\n" );
j++;
}
else if( buffer[ j ] == 0x83 )
{
printf( "cmp" );
if( buffer[ j + 1 ] == 0xFB )
printf( " ebx, %x", buffer[ j + 2 ] );
else if( buffer[ j + 1 ] == 0xF8 )
printf( " eax, %x", buffer[ j + 2 ] );
printf( "\n" );
j += 2;
}
else if( buffer[ j ] == 0x85 )
{
printf( "test " );
if( buffer[ j + 1 ] == 0xDB )
printf( "ebx,ebx" );
else if( buffer[ j + 1] == 0xC0 )
printf("eax,eax" );
printf( "\n" );
j++;
}
else if( buffer[ j ] == 0x74 )
{
printf( "je short %x\n", preferredBase + j + 2 + buffer[ j + 1 ] );
j++;
}
else if( buffer[ j ] == 0x53 )
{
printf( "push ebx\n" );
}
else if( buffer[ j ] == 0xB8 )
{
printf("mov eax,%x\n", buffer[ j + 1 ] );
j+=4;
}
else if( buffer[ j ] == 0xBB )
{
printf("mov ebx,%x\n", buffer[ j + 1 ] );
j+=4;
}
else if( buffer[ j ] == 0x50 )
{
printf("push eax\n" );
}
else if( buffer[ j ] == 0x58 )
{
printf("pop eax\n" );
}
else if( buffer[ j ] == 0x6A )
{
printf( "push %d\n", buffer[ j + 1 ] );
j++;
}
else if( buffer[ j ] == 0x5B )
{
printf( "pop ebx\n" );
}
else if( buffer[ j ] == 0xE8 )
{
printf( "call %x%x%x%x\n", buffer[ j + 1 ], buffer[ j + 2 ], buffer[ j + 3 ], buffer[ j + 4 ] );
j+=4;
}
else if( buffer[ j ] == 0xFF && buffer[ j + 1 ] == 0x25 )
{
printf( "jmp dword ptr ds:[%x%x%x%x]\n", buffer[ j + 2 ], buffer[ j + 3], buffer[ j + 4 ], buffer[ j + 5 ] );
j+=5;
}
else
printf( "db %x\n", buffer[ j ] );
}
printf("----------------------------------------------------------------\n" );
delete[] buffer;
sectionHeaderBase += sizeof( IMAGE_SECTION_HEADER );
}
UnmapViewOfFile( base );
CloseHandle( hFileMappingObject );
CloseHandle( hFile );
}
getchar();
return 0;
}
Hope you had a fun time - it does make you appreciate the dis-assemblers currently out so much more, no?
Until next time,
<3 attilathedud
References:
http://win32assembly.online.fr/pe-tut1.html
http://win32assembly.online.fr/pe-tut2.html
http://win32assembly.online.fr/pe-tut3.html
http://win32assembly.online.fr/pe-tut4.html
http://win32assembly.online.fr/pe-tut5.html
Steps of Compilation