Link Editing (ld)
The link-editor, ld(1), concatenates one or more input files (relocatable objects, shared objects, or archive
libraries) to produce one output file (a relocatable object, an executable, or a shared object). It is most commonly
invoked as part of compilation (by cc or gcc).
ld takes the output of cc, as, or a previous ld run and produces one output file in one of the following formats:
a relocatable object, a static executable, a dynamic executable, or a shared object. Its input files are in the
Executable and Linking Format (ELF), or are archive libraries of ELF relocatable objects, so it is crucial that we
understand the ELF file format in order to understand link editing. First we shall examine the types of ELF files
one can have and their purpose.
- Relocatable Objects - the concatenation of relocatable object input files into one output that can be
used again in link-editing. These files contain data telling the linker how to link them with other
relocatable objects, shared objects, and executables.
- Static executable - all symbol references are bound within the executable, so it represents a ready-to-run
process. Both forms of executable file (static and dynamic) contain the data the operating system needs to
produce an executable image.
- Dynamic executable - a concatenation of relocatable objects that requires intervention by the runtime linker
to produce the runnable process. Symbols in its symbol table (symtab) might need to be bound at runtime, and
the dynamic executable may also depend on shared objects (.so files). Dynamic executables are the default
output of a compilation.
- Shared Objects - a concatenation of relocatable objects that provides services to dynamic executables,
bound at runtime by the runtime linker (ld.so.1 on Solaris; /lib/ld-linux.so.2 on 32-bit x86 Linux). Shared
objects might also depend on other shared objects. Think of a shared object as a dynamic executable that has
not yet been assigned a virtual address.
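To make the four kinds concrete, here is a minimal C sketch that guesses which kind an ELF file is from two facts a reader can pull out of it: the header's e_type field and whether the program header table has a PT_INTERP entry (the request for a runtime linker). It assumes the <elf.h> header shipped with glibc on Linux; the enum and the function name classify_elf are made up for illustration, and it is a heuristic rather than a complete test.

```c
#include <elf.h>
#include <stdint.h>

/* Hypothetical labels for the four kinds of ld output described above. */
enum ld_output_kind {
    KIND_UNKNOWN,
    KIND_RELOCATABLE,
    KIND_STATIC_EXEC,
    KIND_DYNAMIC_EXEC,
    KIND_SHARED_OBJECT
};

/*
 * Heuristic: e_type comes from the ELF header; has_pt_interp says whether
 * the program header table contains a PT_INTERP entry naming a runtime
 * linker. Real tools look at more than this.
 */
enum ld_output_kind classify_elf(uint16_t e_type, int has_pt_interp)
{
    switch (e_type) {
    case ET_REL:
        return KIND_RELOCATABLE;
    case ET_EXEC:
        /* Fixed-address executable; PT_INTERP means ld.so must run first. */
        return has_pt_interp ? KIND_DYNAMIC_EXEC : KIND_STATIC_EXEC;
    case ET_DYN:
        /* Position-independent executables are ET_DYN too, but request an
         * interpreter; plain shared objects normally do not. */
        return has_pt_interp ? KIND_DYNAMIC_EXEC : KIND_SHARED_OBJECT;
    default:
        return KIND_UNKNOWN;   /* e.g. ET_CORE core dumps */
    }
}
```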
The graphic below shows how each of the file formats discussed above is created.
Executable and Linking Format (ELF)
The ELF file format was created by Unix System Laboratories as a better alternative to the a.out and COFF
binary formats. Its capabilities include dynamic linking, dynamic loading, imposing runtime control on a
program, and an improved method for creating shared libraries. An ELF file is built from five kinds of parts,
not all of which are present in every file (only the ELF header is always there). The five parts are:
- The ELF header.
- The Program header table.
- The Section header table.
- ELF sections. (linker view)
- ELF segments. (executable view)
Each of the ELF file types described above can be looked at in two ways, called views: the linker view and
the executable view. The views are summarized in the figure below:
The linker view of an ELF file is partitioned into sections, while the executable view is partitioned into segments.
A section is the smallest indivisible unit that can be processed in an ELF file. A segment is a collection of
sections and is the smallest unit that can be mapped into memory (with mmap) by exec(2) or by the runtime linker
ld.so.1. These two views let us look at information that is specific to linking, such as the symbol table and
relocation information, separately from information specific to creating the process image, such as the text and
data segments. The bulk of the data is therefore stored in sections and segments, with the rest of the file (the
headers) devoted to organizing and accessing those sections and segments. The following is a brief description of
each of the five file parts.
ELF Header
This is the only fixed portion of an ELF file, and it always occurs at the start. It records the ELF version, the
target architecture, the file offsets of the program header table and the section header table, the entry size and
number of entries of each table, the index of the string table that holds the section names, and lastly the virtual
address of the first instruction to be executed (the entry point).
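/* ELF header as documented in elf(5); ElfN stands for Elf32 or Elf64 */
#define EI_NIDENT 16
typedef struct {
    unsigned char e_ident[EI_NIDENT]; /* magic "\177ELF", class, byte order, version */
    uint16_t  e_type;      /* ET_REL, ET_EXEC, ET_DYN, ... */
    uint16_t  e_machine;   /* target architecture */
    uint32_t  e_version;   /* ELF version */
    ElfN_Addr e_entry;     /* virtual address of the first instruction */
    ElfN_Off  e_phoff;     /* file offset of the program header table */
    ElfN_Off  e_shoff;     /* file offset of the section header table */
    uint32_t  e_flags;     /* processor-specific flags */
    uint16_t  e_ehsize;    /* size of this header in bytes */
    uint16_t  e_phentsize; /* size of one program header entry */
    uint16_t  e_phnum;     /* number of program header entries */
    uint16_t  e_shentsize; /* size of one section header entry */
    uint16_t  e_shnum;     /* number of section header entries */
    uint16_t  e_shstrndx;  /* section index of the section-name string table */
} ElfN_Ehdr;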
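As a sketch of how a tool such as readelf starts its work, the C functions below check the magic number and class in e_ident, copy the header out of a buffer holding the start of a file, and compute the size of the program header table from e_phentsize and e_phnum. They assume glibc's <elf.h>, handle only 32-bit (ELFCLASS32) files, and assume the file's byte order matches the host's; read_elf32_header and elf32_phdr_table_bytes are hypothetical names.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/*
 * Validate the start of a buffer as a 32-bit ELF header and copy it out.
 * Assumes the file's byte order (e_ident[EI_DATA]) matches the host.
 * Returns 0 on success, -1 if the bytes are not a 32-bit ELF header.
 */
int read_elf32_header(const unsigned char *buf, size_t len, Elf32_Ehdr *out)
{
    if (len < sizeof(Elf32_Ehdr))
        return -1;                        /* too short to hold a header */
    if (memcmp(buf, ELFMAG, SELFMAG) != 0)
        return -1;                        /* missing the "\177ELF" magic */
    if (buf[EI_CLASS] != ELFCLASS32)
        return -1;                        /* 64-bit files need Elf64_Ehdr */
    memcpy(out, buf, sizeof *out);        /* the header is always at offset 0 */
    return 0;
}

/* Total size in bytes of the program header table the header describes. */
size_t elf32_phdr_table_bytes(const Elf32_Ehdr *eh)
{
    return (size_t)eh->e_phentsize * eh->e_phnum;
}
```

A caller would fread() the first sizeof(Elf32_Ehdr) bytes of a file into a buffer and pass it in; e_phoff and e_shoff then tell it where to seek next.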
Program Header Table
The program header table is only meaningful for executables and shared objects. It provides organizational
information on the array of segments in the file. Each entry gives a segment's type, file offset, virtual address,
physical address, file size, memory image size, flags, and alignment. A segment is loaded into memory only if its
p_type is PT_LOAD. The physical address (p_paddr) matters only on systems where physical addressing is relevant;
the System V ABI ignores it for application programs, so its contents are unspecified for executables and shared
objects.
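typedef struct {
    uint32_t   p_type;   /* segment type: PT_LOAD, PT_DYNAMIC, PT_INTERP, ... */
    Elf32_Off  p_offset; /* file offset of the segment */
    Elf32_Addr p_vaddr;  /* virtual address in memory */
    Elf32_Addr p_paddr;  /* physical address, where relevant */
    uint32_t   p_filesz; /* bytes in the file image */
    uint32_t   p_memsz;  /* bytes in the memory image (>= p_filesz) */
    uint32_t   p_flags;  /* permissions: PF_R, PF_W, PF_X */
    uint32_t   p_align;  /* required alignment */
} Elf32_Phdr;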
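The following C sketch walks a program header table that has already been read into memory. It counts the PT_LOAD segments, notes whether a PT_INTERP entry is present, and totals how many bytes the loader must zero-fill because p_memsz exceeds p_filesz (this is how zero-initialized data such as .bss gets memory without taking up file space). It assumes glibc's <elf.h> and 32-bit headers; the struct and function names are hypothetical.

```c
#include <elf.h>
#include <stddef.h>

/* Hypothetical summary of a program header table. */
struct phdr_summary {
    size_t     loadable;   /* number of PT_LOAD segments */
    int        has_interp; /* 1 if a PT_INTERP entry names a runtime linker */
    Elf32_Word zero_fill;  /* bytes to zero in memory: sum of (p_memsz - p_filesz) */
};

/* ph points at n entries, e.g. e_phnum headers read from offset e_phoff. */
struct phdr_summary summarize_phdrs(const Elf32_Phdr *ph, size_t n)
{
    struct phdr_summary s = {0, 0, 0};
    for (size_t i = 0; i < n; i++) {
        if (ph[i].p_type == PT_LOAD) {
            s.loadable++;
            /* Memory image larger than file image: the rest is zeroed. */
            if (ph[i].p_memsz > ph[i].p_filesz)
                s.zero_fill += ph[i].p_memsz - ph[i].p_filesz;
        } else if (ph[i].p_type == PT_INTERP) {
            s.has_interp = 1;
        }
    }
    return s;
}
```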
Section Header Table
The section header table provides organizational information on the array of sections in the ELF file. Each entry
gives a section's name, type, memory image starting address (if the section is loadable), file offset, size in
bytes, alignment, and how the information in the section should be interpreted.
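typedef struct {
    uint32_t   sh_name;      /* offset of the name in the section-name string table */
    uint32_t   sh_type;      /* SHT_PROGBITS, SHT_SYMTAB, SHT_NOBITS, ... */
    uint32_t   sh_flags;     /* SHF_WRITE, SHF_ALLOC, SHF_EXECINSTR, ... */
    Elf32_Addr sh_addr;      /* address in the memory image, if loadable */
    Elf32_Off  sh_offset;    /* file offset of the section contents */
    uint32_t   sh_size;      /* size in bytes */
    uint32_t   sh_link;      /* index of an associated section (type dependent) */
    uint32_t   sh_info;      /* extra information (type dependent) */
    uint32_t   sh_addralign; /* required alignment */
    uint32_t   sh_entsize;   /* entry size, for sections of fixed-size entries */
} Elf32_Shdr;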
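Section names are not stored in the section headers themselves: sh_name is an offset into the section-name string table, which is the section whose index is given by e_shstrndx in the ELF header. A minimal C sketch of a lookup by name, assuming glibc's <elf.h>, 32-bit headers, and a string table that is NUL-terminated as the format requires (find_section is a hypothetical name):

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/*
 * sh:       array of shnum section headers (read from e_shoff)
 * shstrtab: contents of section e_shstrndx, strtab_size bytes long
 * Returns the header whose name matches, or NULL if none does.
 */
const Elf32_Shdr *find_section(const Elf32_Shdr *sh, size_t shnum,
                               const char *shstrtab, size_t strtab_size,
                               const char *name)
{
    for (size_t i = 0; i < shnum; i++) {
        if (sh[i].sh_name >= strtab_size)
            continue;                          /* malformed name offset */
        if (strcmp(shstrtab + sh[i].sh_name, name) == 0)
            return &sh[i];
    }
    return NULL;
}
```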
ELF Sections
Sections can hold executable code, data, dynamic linking information, debugging data, symbol tables, relocation
information, comments, string tables, and notes. Some sections provide information for linking, some are loaded
into the process image, and others provide information used in building an executable.
ELF Segments
Segments are groupings of like sections (for example, the text segment and the data segment). A process image is
created by loading the segments described by the program header table into virtual memory.
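One way to see the grouping is to ask whether a given section falls inside a given segment. The C sketch below uses a simplified rule: an allocated (SHF_ALLOC) section belongs to a segment if its address range lies within the segment's memory range. Real tools apply more detailed checks than this; the function name is hypothetical and glibc's <elf.h> is assumed.

```c
#include <elf.h>

/*
 * Simplified membership test: an allocated section belongs to a segment
 * when its address range [sh_addr, sh_addr + sh_size) lies inside the
 * segment's memory range [p_vaddr, p_vaddr + p_memsz).
 */
int section_in_segment(const Elf32_Shdr *sh, const Elf32_Phdr *ph)
{
    if (!(sh->sh_flags & SHF_ALLOC))
        return 0;                   /* e.g. .symtab, .comment: never loaded */
    Elf32_Addr start = sh->sh_addr;
    Elf32_Addr end = start + sh->sh_size;
    return start >= ph->p_vaddr && end <= ph->p_vaddr + ph->p_memsz;
}
```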
Tools: readelf
readelf is a tool for viewing ELF files (elfdump is the Solaris counterpart). An example elfdump was provided with
these notes; make sure to view the sections in the example file and return to it when needed. I found that having
an example ELF file handy gave me a better understanding of the material.
Sections of Interest to Us
The basic idea from here is that the link editor concatenates the .text, .data, and .bss sections of its input
files into the new output file. The rest of the relocation and symbol information is modified, or newly generated,
for the output file.
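The concatenation step is mostly bookkeeping: each input section is placed after the previous one, padded so that it starts on a multiple of its sh_addralign. Here is a minimal C sketch of that layout for one output section; it uses a hypothetical input_section struct rather than real ELF types and ignores everything else a link editor tracks (such as .bss having no file contents).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one input section (e.g. the .text of one .o file). */
struct input_section {
    uint32_t size;
    uint32_t align;       /* sh_addralign: 0 or 1 means no constraint */
    uint32_t out_offset;  /* filled in: where it lands in the output section */
};

/* Lay out like-named input sections back to back in one output section,
 * padding each to its required alignment. Returns the output section size. */
uint32_t concat_sections(struct input_section *in, size_t n)
{
    uint32_t off = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t a = in[i].align > 1 ? in[i].align : 1;
        off = (off + a - 1) / a * a;   /* round up to a multiple of a */
        in[i].out_offset = off;
        off += in[i].size;
    }
    return off;
}
```

For example, three .text sections of sizes 10, 6, and 3 with alignments 4, 8, and 1 land at offsets 0, 16, and 22, giving a 25-byte output section.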
ld Execution
Here is the program flow for the linker:
- Verify the options passed to it.
- Concatenate like sections (same type, attributes, and name) from the input relocatable objects to
form sections within the output file.
- Read the symbol tables from relocatable objects and shared objects, and apply that information to the
output file by updating other input sections (relocation). In addition, an output relocation section
might be generated.
- Generate program headers that describe all the segments created.
- Generate a dynamic linking information section that gives the runtime linker the shared object
dependencies and symbol bindings.
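You can change how these sections get mapped into segments by creating a mapfile and passing it to ld with the
-M option (this is the Solaris link editor's option; GNU ld uses a linker script given with -T instead).
More on this later.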
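Updating the other input sections mostly means applying relocations: once the link editor knows where each section lands and what address each symbol resolves to, it patches the referencing instructions and data. Below is a C sketch for the two most common relocation types on 32-bit x86, R_386_32 (S + A) and R_386_PC32 (S + A - P), where S is the symbol value, A the addend, and P the address of the place being patched. i386 uses REL entries, so the addend is read from the place itself. The sketch assumes glibc's <elf.h> and a little-endian host; apply_rel_386 is a hypothetical name and every other relocation type is rejected.

```c
#include <elf.h>
#include <stdint.h>
#include <string.h>

/*
 * sec      - contents of the section being patched
 * sec_addr - address the section will have in the output (its sh_addr)
 * rel      - the relocation entry (r_offset is relative to the section)
 * sym_val  - resolved value S of the symbol the relocation refers to
 * Returns 0 on success, -1 for relocation types this sketch ignores.
 */
int apply_rel_386(unsigned char *sec, Elf32_Addr sec_addr,
                  const Elf32_Rel *rel, Elf32_Addr sym_val)
{
    uint32_t addend;                               /* A: stored in the place (REL) */
    memcpy(&addend, sec + rel->r_offset, sizeof addend);
    Elf32_Addr place = sec_addr + rel->r_offset;   /* P */
    uint32_t value;

    switch (ELF32_R_TYPE(rel->r_info)) {
    case R_386_32:   value = sym_val + addend;         break; /* S + A */
    case R_386_PC32: value = sym_val + addend - place; break; /* S + A - P */
    default:         return -1;
    }
    memcpy(sec + rel->r_offset, &value, sizeof value);
    return 0;
}
```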
Your Compiler
In practice you rarely invoke ld yourself, and it is generally good practice not to, because ld on its own does not
attach the startup, initialization, and termination code (files such as crt1.o, crti.o, and crtn.o) that the
compiler driver normally adds to your program. But we will run some tests on an example program to better
understand this (test.c, the simplest C program):
int main(void)
{
return 0;
}
Then we can ask nicely for gcc to compile our test program but not to link it (-c). Once that is done, we can
try to link the object file manually:
gcc -c test.c
ld test.o
ld: warning: cannot find entry symbol _start; defaulting to 0000000008048094
Run "readelf -a a.out" to view the resulting file (ld names its output a.out by default).
The normal way is to have the compiler driver invoke the linker, as follows:
gcc test.o
Running readelf on this resulting file shows that the difference is substantial: a lot of extra crap gets
included in my simple little program. There is actually more stuff added than there is stuff in my program.
At this point it could be said that gcc is the author of my program and not me. So what is all this extra
crap that is being added? Let's find out.
One of the only times that it is acceptable to invoke the linker on your own is when you are creating another
relocatable object. This is done with the -r option for ld.
ld -r test.o
The moral of the story is that during compilation a bunch of extra stuff gets included in your program.
Upon realizing this, a good question is: what is it? On a Solaris box we can use the -# option to have the
compiler display these mysterious files as they are included. With gcc on Linux you can get the same kind of
output with gcc --verbose.
cc -# -o prog test.c
Here are the results on Solaris:
/opt/SUNWspro/bin/../WS6U1/bin/acomp -i test.c -y-fbe -y/opt/SUNWspro/bin/../WS6U1/bin/fbe -y-xarch=generic
-y-o -ytest.o -y-s -y-verbose -y-xmemalign=4s -Qy -D__SunOS_5_8 -D__SUNPRO_C=0x520 -D__SVR4 -D__unix -D__sun
-D__sparc -D__BUILTIN_VA_ARG_INCR -D__SUN_PREFETCH -Xa -D__PRAGMA_REDEFINE_EXTNAME -Dunix -Dsun -Dsparc
-D__RESTRICT -I/opt/SUNWspro/WS6U1/include/cc "-g/opt/SUNWspro/bin/../WS6U1/bin/cc -c "
### Note: LD_LIBRARY_PATH = <null>
### Note: LD_RUN_PATH = <null>
/usr/ccs/bin/ld /opt/SUNWspro/WS6U1/lib/crti.o /opt/SUNWspro/WS6U1/lib/crt1.o /opt/SUNWspro/WS6U1/lib/values-xa.o
-o prog test.o -Y "P,/opt/SUNWspro/WS6U1/lib:/usr/ccs/lib:/usr/lib" -Qy -lc /opt/SUNWspro/WS6U1/lib/crtn.o
gcc --verbose test.c
Here are the results under Debian Linux:
Reading specs from /usr/lib/gcc-lib/i486-linux/3.3.4/specs
Configured with: ../src/configure -v --enable-languages=c,c++,java,f77,pascal,objc,ada,treelang --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-gxx-include-dir=/usr/include/c++/3.3 --enable-shared --with-system-zlib --enable-nls --without-included-gettext --enable-__cxa_atexit --enable-clocale=gnu --enable-debug --enable-java-gc=boehm --enable-java-awt=xlib --enable-objc-gc i486-linux
Thread model: posix
gcc version 3.3.4 (Debian)
/usr/lib/gcc-lib/i486-linux/3.3.4/cc1 -quiet -v -D__GNUC__=3 -D__GNUC_MINOR__=3 -D__GNUC_PATCHLEVEL__=4 test.c -quiet -dumpbase test.c -auxbase test -version -o /tmp/ccSbXIgh.s
GNU C version 3.3.4 (Debian) (i486-linux)
compiled by GNU C version 3.3.4 (Debian)
GGC heuristics: --param ggc-min-expand=98 --param ggc-min-heapsize=129048
ignoring nonexistent directory "/usr/i486-linux/include"
#include "..." search starts here:
#include <...> search starts here:
/usr/local/include
/usr/lib/gcc-lib/i486-linux/3.3.4/include
/usr/include
End of search list.
as -V -Qy -o /tmp/ccWmNHhp.o /tmp/ccSbXIgh.s
GNU assembler version 2.15 (i386-linux) using BFD version 2.15
/usr/lib/gcc-lib/i486-linux/3.3.4/collect2 --eh-frame-hdr -m elf_i386 -dynamic-linker /lib/ld-linux.so.2 /usr/lib/gcc-lib/i486-linux/3.3.4/../../../crt1.o /usr/lib/gcc-lib/i486-linux/3.3.4/../../../crti.o /usr/lib/gcc-lib/i486-linux/3.3.4/crtbegin.o -L/usr/lib/gcc-lib/i486-linux/3.3.4 -L/usr/lib/gcc-lib/i486-linux/3.3.4/../../.. /tmp/ccWmNHhp.o -lgcc -lgcc_eh -lc -lgcc -lgcc_eh /usr/lib/gcc-lib/i486-linux/3.3.4/crtend.o /usr/lib/gcc-lib/i486-linux/3.3.4/../../../crtn.o
Initialization and Termination Sections
Dynamic objects can provide code for runtime initialization and termination. This code may take the form of
an array of function pointers or of a single block of code. Each of the array sections below is built by
concatenating the like sections of the input relocatable objects:
- .preinit_array
- .init_array
- .fini_array
When creating dynamic objects, the link editor identifies these arrays with the .dynamic tags DT_PREINIT_ARRAY
and DT_PREINIT_ARRAYSZ, DT_INIT_ARRAY and DT_INIT_ARRAYSZ, and DT_FINI_ARRAY and DT_FINI_ARRAYSZ.
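At runtime these tags are found by scanning the .dynamic section, an array of Elf32_Dyn entries that ends with DT_NULL. Here is a minimal C sketch, assuming glibc's <elf.h> and a 32-bit object (dyn_lookup and init_array_count are hypothetical names), that finds a tag and uses DT_INIT_ARRAYSZ to work out how many function pointers .init_array holds.

```c
#include <elf.h>
#include <stddef.h>

/* Look up a tag in a .dynamic array terminated by DT_NULL.
 * Returns 1 and stores d_un.d_val in *val if found, else 0. */
int dyn_lookup(const Elf32_Dyn *dyn, Elf32_Sword tag, Elf32_Word *val)
{
    for (; dyn->d_tag != DT_NULL; dyn++) {
        if (dyn->d_tag == tag) {
            *val = dyn->d_un.d_val;
            return 1;
        }
    }
    return 0;
}

/* Number of function pointers in .init_array, from its size in bytes. */
size_t init_array_count(const Elf32_Dyn *dyn)
{
    Elf32_Word sz;
    if (!dyn_lookup(dyn, DT_INIT_ARRAYSZ, &sz))
        return 0;
    return sz / sizeof(Elf32_Addr);  /* 32-bit objects store 4-byte pointers */
}
```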
The sections .init and .fini provide the runtime initialization and termination code for your dynamic
executable. Compiler drivers usually supply these sections as files that are tacked onto the beginning and
end of the input file list (crti.o and crtn.o in the listings above). These sections provide the required code
in the form of two reserved functions named _init and _fini. When creating a dynamic object, the link editor
identifies these symbols with the .dynamic tags DT_INIT and DT_FINI. One thing that is very cool is that you
can add your own functions to .init_array and .fini_array.
Refer back to our example ELF file to locate these sections and symbols.
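With GCC (and Clang) the easiest way to get one of your own functions into these arrays is the constructor and destructor function attributes, which are compiler extensions rather than standard C. On current GCC/glibc toolchains the compiler typically records a pointer to each such function in .init_array or .fini_array; older toolchains used .ctors and .dtors instead, so check the readelf -S output of your own build. A minimal sketch, with hypothetical function and variable names:

```c
#include <stdio.h>

/* Set by the constructor below; lets main() confirm it ran first. */
static int init_ran = 0;

/* GCC/Clang extension: runs before main(). Typically recorded in
 * .init_array on current GCC/glibc toolchains. */
__attribute__((constructor))
static void my_init(void)
{
    init_ran = 1;
}

/* Runs after main() returns or exit() is called; typically recorded
 * in .fini_array. */
__attribute__((destructor))
static void my_fini(void)
{
    fputs("my_fini: running termination code\n", stderr);
}
```

Build it together with a main() that prints init_ran: the constructor has already set it to 1 by the time main() runs, and the destructor's message appears after main() returns.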
Symbol Processing and Resolution
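During input file processing, the link editor passes any local symbols straight through to the output
file, while global symbols are accumulated in an internal symbol table. This table is searched for each
new global symbol to determine whether a symbol of the same name has already been seen; if so, some form
of resolution must occur.
Basic symbol types in resolution:
- Undefined - a global symbol that is referenced in a file but not defined there.
- Tentative - a symbol created in a file but not yet sized or allocated, such as an uninitialized C global
(a COMMON symbol); the link editor allocates its storage later, so it occupies memory at runtime but no
space in the file.
- Defined - a symbol that is defined within a file and has storage allocated to it (for example, a function
or an initialized variable).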
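The three kinds interact in a predictable way when the same global name turns up in two input files. The C sketch below is a simplified model of that resolution, not the link editor's actual code: a reference (undefined) never changes what is already known, a definition replaces a tentative symbol, two tentative symbols keep the larger size, and two definitions are a multiple-definition error. Weak symbols and shared-object symbols are left out, and the enum, struct, and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of the three symbol kinds listed above. */
enum sym_kind { SYM_UNDEF, SYM_TENTATIVE, SYM_DEFINED };

struct sym {
    enum sym_kind kind;
    uint32_t size;      /* meaningful for tentative and defined symbols */
};

/* Resolve a new global symbol against one already in the internal table
 * with the same name. Returns 0 and updates *have, or -1 on a multiple
 * definition. */
int resolve_symbol(struct sym *have, const struct sym *incoming)
{
    if (incoming->kind == SYM_UNDEF)
        return 0;                           /* a reference changes nothing */
    if (have->kind == SYM_UNDEF) {
        *have = *incoming;                  /* first tentative or real definition */
        return 0;
    }
    if (have->kind == SYM_DEFINED && incoming->kind == SYM_DEFINED)
        return -1;                          /* two real definitions */
    if (incoming->kind == SYM_DEFINED) {
        *have = *incoming;                  /* definition overrides tentative */
        return 0;
    }
    if (have->kind == SYM_TENTATIVE && incoming->size > have->size)
        have->size = incoming->size;        /* tentative vs tentative: keep larger */
    return 0;                               /* defined vs tentative: keep defined */
}
```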