Understanding C and C++ program binaries is an important skill. As beginners, we will restrict ourselves to inspecting C binaries; the same ideas can be extended to C++ binaries as well.
The program below defines a string called str, initialised to “Hello readers”, and then invokes the printf function to print that string.
#include <stdio.h>

int main()
{
    char str[] = "Hello readers";
    printf("%s", str);
}
To run this program, first enter it in a text editor and save it as hello.c. Then compile it:
$> gcc hello.c
This creates an executable named a.out.
This executable is stored on disk in a format known as ELF, the Executable and Linkable Format.
To execute the program, run the following command from your shell:
$> ./a.out
The operating system then loads the executable file from disk and creates a process from it, which resides in RAM.
The process is then made to execute, and the string “Hello readers” is printed on your terminal.
Executable and Linkable Format
The ELF format describes the structure in which object files and executables are stored.
There are two views of an ELF file: the linker view and the execution view. The linker view applies to object files, while executables use the execution view.
An object file (hello.o) for hello.c can be created with the following command. The resulting hello.o is an ELF object file.
$> gcc hello.c -c
The layout of an ELF object file looks something like this:
At the start you have an ELF header which describes the entire file organisation.
Then you have the various sections, which contain the code, the data, the symbol table, relocation information and so on. You also have a section header table.
The section header table is a structure that helps you locate the various sections present in the ELF object file. There is also a structure known as the program header table, but it is typically not present in an object file.
ELF Header
The ELF header defines a structure with various fields.
Identifier
A magic number that can be used to determine whether the file is an ELF file. All ELF objects, ELF executables, libraries and so on start with this identifier.
Type
Describes the type of the file: an object file, an executable, a shared object or a core file.
Machine details
Identifies the processor the file was compiled for.
Entry
The virtual address where the program begins execution. This is more applicable to executables than to object files or libraries.
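The identifier and type fields described above can be checked with a few lines of C. The following is a minimal sketch, assuming a Linux system where the <elf.h> header is available; the function names is_elf and elf_type_name are made up for this illustration and are not part of any library.

```c
#include <elf.h>     /* Linux header: ELFMAG, SELFMAG and the ET_* constants */
#include <stddef.h>
#include <string.h>

/* Returns 1 if the buffer starts with the 4-byte ELF magic
 * (0x7f followed by 'E', 'L', 'F'), and 0 otherwise. */
int is_elf(const unsigned char *buf, size_t len)
{
    return len >= SELFMAG && memcmp(buf, ELFMAG, SELFMAG) == 0;
}

/* Maps the type field of the ELF header to a readable name. */
const char *elf_type_name(unsigned type)
{
    switch (type) {
    case ET_REL:  return "relocatable object";
    case ET_EXEC: return "executable";
    case ET_DYN:  return "shared object";
    case ET_CORE: return "core file";
    default:      return "unknown";
    }
}
```

To try it, read the first bytes of hello.o or a.out into a buffer with fread and pass them to is_elf. For an object file such as hello.o, the type field is expected to be ET_REL.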
When the C program above is compiled to an object file, its ELF header can be viewed with the following command.
$> readelf -h hello.o
Note the magic number, which is the ELF identification. Then you can see the machine details, which tell us the object file was compiled for AMD x86-64; that is, this object file can only be used on AMD and Intel machines running in 64-bit mode.
Then you have the entry point address and the start of the section headers, which is at an offset of 368 bytes into the file. The number of section headers present in this case is 13.
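The two numbers above are enough to locate every entry of the section header table. The sketch below assumes a 64-bit ELF file and the Linux <elf.h> header; the helper names shdr_offset and shdr_table_end are hypothetical. The entry size is not quoted in the text above, so the usage note assumes sizeof(Elf64_Shdr), which is 64 bytes.

```c
#include <elf.h>     /* Linux header: Elf64_Ehdr, Elf64_Shdr */
#include <stdint.h>

/* File offset of section header number `index`:
 * start of the table plus index times the size of one entry. */
uint64_t shdr_offset(const Elf64_Ehdr *eh, unsigned index)
{
    return eh->e_shoff + (uint64_t)index * eh->e_shentsize;
}

/* File offset one past the last byte of the section header table. */
uint64_t shdr_table_end(const Elf64_Ehdr *eh)
{
    return shdr_offset(eh, eh->e_shnum);
}
```

With a start offset of 368, 13 entries and 64 bytes per entry, the last header (index 12) would begin at byte 1136 and the table would end at byte 1200.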
Section Headers
The section header table for this particular program can be obtained by running readelf with the -S option
$> readelf -S hello.o
For a clearer view, refer to the image below.
The offset column specifies the offset within the ELF object file at which the section can be found.
For example, the .text section is at an offset of 34 and has a size of 3C. Both values are in hexadecimal notation: 0x34 is 52 and 0x3C is 60 in decimal.
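To make the role of the offset and size columns concrete, here is a small sketch that finds a section's bytes inside an in-memory copy of the file. The function name section_bytes is made up for this example, and it assumes the whole file has already been read into a buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns a pointer to the first byte of a section inside an in-memory
 * copy of the file, or NULL if the range [offset, offset + size) does
 * not fit within the file. */
const unsigned char *section_bytes(const unsigned char *file, size_t file_len,
                                   uint64_t offset, uint64_t size)
{
    if (offset > file_len || size > file_len - offset)
        return NULL;
    return file + offset;
}
```

For the values quoted above, section_bytes(file, file_len, 0x34, 0x3C) would point at byte 52 of the file, and the section would cover the 60 bytes up to, but not including, offset 0x70.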
There are other columns, such as Flags.
For example, A stands for Allocated while X stands for Executable, which means that the section contains executable code and can be executed. Similarly, the .data section has the flags WA, where W stands for Write. Note that the .data section is therefore writable but cannot be executed.
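The flag letters correspond to bits in each section header's flags field. The sketch below decodes only the three flags discussed here, using the SHF_WRITE, SHF_ALLOC and SHF_EXECINSTR constants from the Linux <elf.h> header; the function name section_flag_letters is hypothetical, and real readelf output knows more flag letters than these three.

```c
#include <elf.h>     /* Linux header: SHF_WRITE, SHF_ALLOC, SHF_EXECINSTR */
#include <stdint.h>

/* Writes the letters for the W, A and X flags that are set in `flags`
 * into `out`, followed by a terminating '\0'.
 * `out` must have room for at least 4 characters. */
void section_flag_letters(uint64_t flags, char out[4])
{
    int n = 0;
    if (flags & SHF_WRITE)     out[n++] = 'W';
    if (flags & SHF_ALLOC)     out[n++] = 'A';
    if (flags & SHF_EXECINSTR) out[n++] = 'X';
    out[n] = '\0';
}
```

Passing SHF_ALLOC | SHF_EXECINSTR produces "AX", and SHF_WRITE | SHF_ALLOC produces "WA", which matches the flags described for the .data section.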
Virtual Address
Since this is an object file, each of these sections is relocatable. Therefore, the addresses shown here are all zero.