ld/ldint.texi

   1 \input texinfo
   2 @setfilename ldint.info
   3 @c Copyright (C) 1992-2019 Free Software Foundation, Inc.
   4
   5 @ifnottex
   6 @dircategory Software development
   7 @direntry
   8 * Ld-Internals: (ldint).        The GNU linker internals.
   9 @end direntry
  10 @end ifnottex
  11
  12 @copying
  13 This file documents the internals of the GNU linker ld.
  14
  15 Copyright @copyright{} 1992-2019 Free Software Foundation, Inc.
  16 Contributed by Cygnus Support.
  17
  18 Permission is granted to copy, distribute and/or modify this document
  19 under the terms of the GNU Free Documentation License, Version 1.3 or
  20 any later version published by the Free Software Foundation; with the
  21 Invariant Sections being ``GNU General Public License'' and ``Funding
  22 Free Software'', the Front-Cover texts being (a) (see below), and with
  23 the Back-Cover Texts being (b) (see below).  A copy of the license is
  24 included in the section entitled ``GNU Free Documentation License''.
  25
  26 (a) The FSF's Front-Cover Text is:
  27
  28      A GNU Manual
  29
  30 (b) The FSF's Back-Cover Text is:
  31
  32      You have freedom to copy and modify this GNU Manual, like GNU
  33      software.  Copies published by the Free Software Foundation raise
  34      funds for GNU development.
  35 @end copying
  36
  37 @iftex
  38 @finalout
  39 @setchapternewpage off
  40 @settitle GNU Linker Internals
  41 @titlepage
  42 @title{A guide to the internals of the GNU linker}
  43 @author Per Bothner, Steve Chamberlain, Ian Lance Taylor, DJ Delorie
  44 @author Cygnus Support
  45 @page
  46
  47 @tex
  48 \def\$#1${{#1}}  % Kluge: collect RCS revision info without $...$
  49 \xdef\manvers{2.10.91}  % For use in headers, footers too
  50 {\parskip=0pt
  51 \hfill Cygnus Support\par
  52 \hfill \manvers\par
  53 \hfill \TeX{}info \texinfoversion\par
  54 }
  55 @end tex
  56
  57 @vskip 0pt plus 1filll
  58 Copyright @copyright{} 1992-2019 Free Software Foundation, Inc.
  59
  60       Permission is granted to copy, distribute and/or modify this document
  61       under the terms of the GNU Free Documentation License, Version 1.3
  62       or any later version published by the Free Software Foundation;
  63       with no Invariant Sections, with no Front-Cover Texts, and with no
  64       Back-Cover Texts.  A copy of the license is included in the
  65       section entitled "GNU Free Documentation License".
  66
  67 @end titlepage
  68 @end iftex
  69
  70 @node Top
  71 @top
  72
  73 This file documents the internals of the GNU linker @code{ld}.  It is a
  74 collection of miscellaneous information with little form at this point.
  75 Mostly, it is a repository into which you can put information about
  76 GNU @code{ld} as you discover it (or as you design changes to @code{ld}).
  77
  78 This document is distributed under the terms of the GNU Free
  79 Documentation License.  A copy of the license is included in the
  80 section entitled "GNU Free Documentation License".
  81
  82 @menu
  83 * README::                      The README File
  84 * Emulations::                  How linker emulations are generated
  85 * Emulation Walkthrough::       A Walkthrough of a Typical Emulation
  86 * Architecture Specific::       Some Architecture Specific Notes
  87 * GNU Free Documentation License::  GNU Free Documentation License
  88 @end menu
  89
  90 @node README
  91 @chapter The @file{README} File
  92
  93 Check the @file{README} file; it often has useful information that does not
  94 appear anywhere else in the directory.
  95
  96 @node Emulations
  97 @chapter How linker emulations are generated
  98
  99 Each linker target has an @dfn{emulation}.  The emulation includes the
 100 default linker script, and certain emulations also modify certain types
 101 of linker behaviour.
 102
 103 Emulations are created during the build process by the shell script
 104 @file{genscripts.sh}.
 105
 106 The @file{genscripts.sh} script starts by reading a file in the
 107 @file{emulparams} directory.  This is a shell script which sets various
 108 shell variables used by @file{genscripts.sh} and the other shell scripts
 109 it invokes.
 110
 111 The @file{genscripts.sh} script will invoke a shell script in the
 112 @file{scripttempl} directory in order to create default linker scripts
 113 written in the linker command language.  The @file{scripttempl} script
  114 will be invoked 5 to 9 times, with different
 115 assignments to shell variables, to create different default scripts.
 116 The choice of script is made based on the command-line options.
 117
 118 After creating the scripts, @file{genscripts.sh} will invoke yet another
 119 shell script, this time in the @file{emultempl} directory.  That shell
 120 script will create the emulation source file, which contains C code.
 121 This C code permits the linker emulation to override various linker
 122 behaviours.  Most targets use the generic emulation code, which is in
 123 @file{emultempl/generic.em}.
 124
 125 To summarize, @file{genscripts.sh} reads three shell scripts: an
 126 emulation parameters script in the @file{emulparams} directory, a linker
 127 script generation script in the @file{scripttempl} directory, and an
 128 emulation source file generation script in the @file{emultempl}
 129 directory.
 130
 131 For example, the Sun 4 linker sets up variables in
 132 @file{emulparams/sun4.sh}, creates linker scripts using
 133 @file{scripttempl/aout.sc}, and creates the emulation code using
 134 @file{emultempl/sunos.em}.
 135
 136 Note that the linker can support several emulations simultaneously,
 137 depending upon how it is configured.  An emulation can be selected with
 138 the @code{-m} option.  The @code{-V} option will list all supported
 139 emulations.
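
As a brief illustration of the two options just mentioned (the emulation
name @var{emul} and the file names are placeholders, not taken from any
particular configuration):
@smallexample
  ld -V
  ld -m @var{emul} -o prog prog.o
@end smallexample
The first command prints the linker version along with the supported
emulations; the second links using the emulation @var{emul} in place of
the default one.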
 140
 141 @menu
 142 * emulation parameters::        @file{emulparams} scripts
 143 * linker scripts::              @file{scripttempl} scripts
 144 * linker emulations::           @file{emultempl} scripts
 145 @end menu
 146
 147 @node emulation parameters
 148 @section @file{emulparams} scripts
 149
 150 Each target selects a particular file in the @file{emulparams} directory
 151 by setting the shell variable @code{targ_emul} in @file{configure.tgt}.
 152 This shell variable is used by the @file{configure} script to control
 153 building an emulation source file.
 154
 155 Certain conventions are enforced.  Suppose the @code{targ_emul} variable
 156 is set to @var{emul} in @file{configure.tgt}.  The name of the emulation
 157 shell script will be @file{emulparams/@var{emul}.sh}.  The
 158 @file{Makefile} must have a target named @file{e@var{emul}.c}; this
 159 target must depend upon @file{emulparams/@var{emul}.sh}, as well as the
 160 appropriate scripts in the @file{scripttempl} and @file{emultempl}
 161 directories.  The @file{Makefile} target must invoke @code{GENSCRIPTS}
 162 with two arguments: @var{emul}, and the value of the make variable
 163 @code{tdir_@var{emul}}.  The value of the latter variable will be set by
 164 the @file{configure} script, and is used to set the default target
 165 directory to search.
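
As a sketch of these conventions, a rule for a hypothetical emulation
named @samp{myemul}, assumed here to use @file{scripttempl/aout.sc} and
@file{emultempl/generic.em}, might look roughly like this.  The use of
@code{$(srcdir)} is an assumption, and the real @file{Makefile} rules
may list additional dependencies.
@smallexample
emyemul.c: $(srcdir)/emulparams/myemul.sh \
  $(srcdir)/emultempl/generic.em $(srcdir)/scripttempl/aout.sc
	$(GENSCRIPTS) myemul "$(tdir_myemul)"
@end smallexample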
 166
 167 By convention, the @file{emulparams/@var{emul}.sh} shell script should
 168 only set shell variables.  It may set shell variables which are to be
 169 interpreted by the @file{scripttempl} and the @file{emultempl} scripts.
 170 Certain shell variables are interpreted directly by the
 171 @file{genscripts.sh} script.
 172
 173 Here is a list of shell variables interpreted by @file{genscripts.sh},
 174 as well as some conventional shell variables interpreted by the
 175 @file{scripttempl} and @file{emultempl} scripts.
 176
 177 @table @code
 178 @item SCRIPT_NAME
 179 This is the name of the @file{scripttempl} script to use.  If
 180 @code{SCRIPT_NAME} is set to @var{script}, @file{genscripts.sh} will use
 181 the script @file{scripttempl/@var{script}.sc}.
 182
 183 @item TEMPLATE_NAME
 184 This is the name of the @file{emultempl} script to use.  If
 185 @code{TEMPLATE_NAME} is set to @var{template}, @file{genscripts.sh} will
 186 use the script @file{emultempl/@var{template}.em}.  If this variable is
 187 not set, the default value is @samp{generic}.
 188
 189 @item GENERATE_SHLIB_SCRIPT
 190 If this is set to a nonempty string, @file{genscripts.sh} will invoke
 191 the @file{scripttempl} script an extra time to create a shared library
 192 script.  @ref{linker scripts}.
 193
 194 @item OUTPUT_FORMAT
  195 This is normally set to indicate the BFD output format to use (e.g.,
  196 @samp{"a.out-sunos-big"}).  The @file{scripttempl} script will normally
 197 use it in an @code{OUTPUT_FORMAT} expression in the linker script.
 198
 199 @item ARCH
 200 This is normally set to indicate the architecture to use (e.g.,
 201 @samp{sparc}).  The @file{scripttempl} script will normally use it in an
 202 @code{OUTPUT_ARCH} expression in the linker script.
 203
 204 @item ENTRY
 205 Some @file{scripttempl} scripts use this to set the entry address, in an
 206 @code{ENTRY} expression in the linker script.
 207
 208 @item TEXT_START_ADDR
 209 Some @file{scripttempl} scripts use this to set the start address of the
 210 @samp{.text} section.
 211
 212 @item SEGMENT_SIZE
 213 The @file{genscripts.sh} script uses this to set the default value of
 214 @code{DATA_ALIGNMENT} when running the @file{scripttempl} script.
 215
 216 @item TARGET_PAGE_SIZE
 217 If @code{SEGMENT_SIZE} is not defined, the @file{genscripts.sh} script
 218 uses this to define it.
 219
 220 @item ALIGNMENT
 221 Some @file{scripttempl} scripts set this to a number to pass to
 222 @code{ALIGN} to set the required alignment for the @code{end} symbol.
 223 @end table
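
For illustration, a hypothetical @file{emulparams/myemul.sh} for an
a.out style target could consist of nothing but assignments such as the
following.  The emulation name and all values are made up for this
example; they are not copied from an existing file.
@smallexample
SCRIPT_NAME=aout
TEMPLATE_NAME=generic
OUTPUT_FORMAT="a.out-sunos-big"
ARCH=sparc
TARGET_PAGE_SIZE=0x2000
TEXT_START_ADDR=0x2020
@end smallexample
With these settings, @file{genscripts.sh} would use
@file{scripttempl/aout.sc} and @file{emultempl/generic.em} and, since
@code{SEGMENT_SIZE} is not set, would define it from
@code{TARGET_PAGE_SIZE}.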
 224
 225 @node linker scripts
 226 @section @file{scripttempl} scripts
 227
 228 Each linker target uses a @file{scripttempl} script to generate the
 229 default linker scripts.  The name of the @file{scripttempl} script is
 230 set by the @code{SCRIPT_NAME} variable in the @file{emulparams} script.
 231 If @code{SCRIPT_NAME} is set to @var{script}, @code{genscripts.sh} will
 232 invoke @file{scripttempl/@var{script}.sc}.
 233
 234 The @file{genscripts.sh} script will invoke the @file{scripttempl}
 235 script 5 to 9 times.  Each time it will set the shell variable
 236 @code{LD_FLAG} to a different value.  When the linker is run, the
 237 options used will direct it to select a particular script.  (Script
 238 selection is controlled by the @code{get_script} emulation entry point;
 239 this describes the conventional behaviour).
 240
 241 The @file{scripttempl} script should just write a linker script, written
 242 in the linker command language, to standard output.  If the emulation
  243 name--the name of the @file{emulparams} file without the @file{.sh}
 244 extension--is @var{emul}, then the output will be directed to
 245 @file{ldscripts/@var{emul}.@var{extension}} in the build directory,
 246 where @var{extension} changes each time the @file{scripttempl} script is
 247 invoked.
 248
 249 Here is the list of values assigned to @code{LD_FLAG}.
 250
 251 @table @code
 252 @item (empty)
 253 The script generated is used by default (when none of the following
 254 cases apply).  The output has an extension of @file{.x}.
 255 @item n
 256 The script generated is used when the linker is invoked with the
 257 @code{-n} option.  The output has an extension of @file{.xn}.
 258 @item N
 259 The script generated is used when the linker is invoked with the
 260 @code{-N} option.  The output has an extension of @file{.xbn}.
 261 @item r
 262 The script generated is used when the linker is invoked with the
 263 @code{-r} option.  The output has an extension of @file{.xr}.
 264 @item u
 265 The script generated is used when the linker is invoked with the
 266 @code{-Ur} option.  The output has an extension of @file{.xu}.
 267 @item shared
 268 The @file{scripttempl} script is only invoked with @code{LD_FLAG} set to
 269 this value if @code{GENERATE_SHLIB_SCRIPT} is defined in the
 270 @file{emulparams} file.  The @file{emultempl} script must arrange to use
 271 this script at the appropriate time, normally when the linker is invoked
 272 with the @code{-shared} option.  The output has an extension of
 273 @file{.xs}.
 274 @item c
 275 The @file{scripttempl} script is only invoked with @code{LD_FLAG} set to
 276 this value if @code{GENERATE_COMBRELOC_SCRIPT} is defined in the
 277 @file{emulparams} file or if @code{SCRIPT_NAME} is @code{elf}. The
 278 @file{emultempl} script must arrange to use this script at the appropriate
 279 time, normally when the linker is invoked with the @code{-z combreloc}
 280 option.  The output has an extension of
 281 @file{.xc}.
 282 @item cshared
 283 The @file{scripttempl} script is only invoked with @code{LD_FLAG} set to
 284 this value if @code{GENERATE_COMBRELOC_SCRIPT} is defined in the
 285 @file{emulparams} file or if @code{SCRIPT_NAME} is @code{elf} and
 286 @code{GENERATE_SHLIB_SCRIPT} is defined in the @file{emulparams} file.
 287 The @file{emultempl} script must arrange to use this script at the
 288 appropriate time, normally when the linker is invoked with the @code{-shared
 289 -z combreloc} option.  The output has an extension of @file{.xsc}.
 290 @item auto_import
 291 The @file{scripttempl} script is only invoked with @code{LD_FLAG} set to
 292 this value if @code{GENERATE_AUTO_IMPORT_SCRIPT} is defined in the
 293 @file{emulparams} file.  The @file{emultempl} script must arrange to
 294 use this script at the appropriate time, normally when the linker is
 295 invoked with the @code{--enable-auto-import} option.  The output has
 296 an extension of @file{.xa}.
 297 @end table
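
For example, for a hypothetical emulation named @samp{myemul} whose
@file{emulparams} file defines none of the @code{GENERATE_*} variables
(and whose @code{SCRIPT_NAME} is not @code{elf}), the build directory
would be expected to contain these five scripts:
@smallexample
ldscripts/myemul.x
ldscripts/myemul.xn
ldscripts/myemul.xbn
ldscripts/myemul.xr
ldscripts/myemul.xu
@end smallexample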
 298
 299 Besides the shell variables set by the @file{emulparams} script, and the
 300 @code{LD_FLAG} variable, the @file{genscripts.sh} script will set
 301 certain variables for each run of the @file{scripttempl} script.
 302
 303 @table @code
 304 @item RELOCATING
 305 This will be set to a non-empty string when the linker is doing a final
  306 relocation (i.e., all scripts other than @code{-r} and @code{-Ur}).
 307
 308 @item CONSTRUCTING
 309 This will be set to a non-empty string when the linker is building
  310 global constructor and destructor tables (i.e., all scripts other than
 311 @code{-r}).
 312
 313 @item DATA_ALIGNMENT
 314 This will be set to an @code{ALIGN} expression when the output should be
 315 page aligned, or to @samp{.} when generating the @code{-N} script.
 316
 317 @item CREATE_SHLIB
 318 This will be set to a non-empty string when generating a @code{-shared}
 319 script.
 320
 321 @item COMBRELOC
  322 When generating @code{-z combreloc} scripts, this will be set to a non-empty
  323 string: a temporary file name which can be used during script generation.
 324 @end table
 325
 326 The conventional way to write a @file{scripttempl} script is to first
 327 set a few shell variables, and then write out a linker script using
 328 @code{cat} with a here document.  The linker script will use variable
 329 substitutions, based on the above variables and those set in the
 330 @file{emulparams} script, to control its behaviour.
 331
 332 When there are parts of the @file{scripttempl} script which should only
 333 be run when doing a final relocation, they should be enclosed within a
 334 variable substitution based on @code{RELOCATING}.  For example, on many
 335 targets special symbols such as @code{_end} should be defined when doing
 336 a final link.  Naturally, those symbols should not be defined when doing
 337 a relocatable link using @code{-r}.  The @file{scripttempl} script
 338 could use a construct like this to define those symbols:
 339 @smallexample
 340   $@{RELOCATING+ _end = .;@}
 341 @end smallexample
 342 This will do the symbol assignment only if the @code{RELOCATING}
 343 variable is defined.
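
Putting these conventions together, a minimal @file{scripttempl} script
might be sketched as follows.  This is only an illustration under
simplifying assumptions: the section layout is deliberately bare, the
symbols @code{_etext} and @code{_end} are chosen for the example, and a
real script would handle many more sections and variables.
@smallexample
cat <<EOF
OUTPUT_FORMAT("$@{OUTPUT_FORMAT@}")
OUTPUT_ARCH($@{ARCH@})
SECTIONS
@{
  .text $@{RELOCATING+$@{TEXT_START_ADDR@}@} :
  @{
    *(.text)
    $@{RELOCATING+ _etext = .;@}
  @}
  .data :
  @{
    *(.data)
  @}
  .bss :
  @{
    *(.bss)
    *(COMMON)
    $@{RELOCATING+ _end = .;@}
  @}
@}
EOF
@end smallexample
When @code{RELOCATING} is unset, as for the @code{-r} script, the start
address and both symbol assignments simply expand to nothing.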
 344
 345 The basic job of the linker script is to put the sections in the correct
 346 order, and at the correct memory addresses.  For some targets, the
 347 linker script may have to do some other operations.
 348
 349 For example, on most MIPS platforms, the linker is responsible for
 350 defining the special symbol @code{_gp}, used to initialize the
 351 @code{$gp} register.  It must be set to the start of the small data
 352 section plus @code{0x8000}.  Naturally, it should only be defined when
 353 doing a final relocation.  This will typically be done like this:
 354 @smallexample
 355   $@{RELOCATING+ _gp = ALIGN(16) + 0x8000;@}
 356 @end smallexample
 357 This line would appear just before the sections which compose the small
 358 data section (@samp{.sdata}, @samp{.sbss}).  All those sections would be
 359 contiguous in memory.
 360
 361 Many COFF systems build constructor tables in the linker script.  The
 362 compiler will arrange to output the address of each global constructor
  363 in a @samp{.ctors} section, and the address of each global destructor in
  364 a @samp{.dtors} section (this is done by defining
 365 @code{ASM_OUTPUT_CONSTRUCTOR} and @code{ASM_OUTPUT_DESTRUCTOR} in the
 366 @code{gcc} configuration files).  The @code{gcc} runtime support
 367 routines expect the constructor table to be named @code{__CTOR_LIST__}.
 368 They expect it to be a list of words, with the first word being the
 369 count of the number of entries.  There should be a trailing zero word.
 370 (Actually, the count may be -1 if the trailing word is present, and the
 371 trailing word may be omitted if the count is correct, but, as the
 372 @code{gcc} behaviour has changed slightly over the years, it is safest
 373 to provide both).  Here is a typical way that might be handled in a
 374 @file{scripttempl} file.
 375 @smallexample
 376     $@{CONSTRUCTING+ __CTOR_LIST__ = .;@}
 377     $@{CONSTRUCTING+ LONG((__CTOR_END__ - __CTOR_LIST__) / 4 - 2)@}
 378     $@{CONSTRUCTING+ *(.ctors)@}
 379     $@{CONSTRUCTING+ LONG(0)@}
 380     $@{CONSTRUCTING+ __CTOR_END__ = .;@}
 381     $@{CONSTRUCTING+ __DTOR_LIST__ = .;@}
 382     $@{CONSTRUCTING+ LONG((__DTOR_END__ - __DTOR_LIST__) / 4 - 2)@}
 383     $@{CONSTRUCTING+ *(.dtors)@}
 384     $@{CONSTRUCTING+ LONG(0)@}
 385     $@{CONSTRUCTING+ __DTOR_END__ = .;@}
 386 @end smallexample
 387 The use of @code{CONSTRUCTING} ensures that these linker script commands
 388 will only appear when the linker is supposed to be building the
 389 constructor and destructor tables.  This example is written for a target
 390 which uses 4 byte pointers.
 391
 392 Embedded systems often need to set a stack address.  This is normally
 393 best done by using the @code{PROVIDE} construct with a default stack
 394 address.  This permits the user to easily override the stack address
 395 using the @code{--defsym} option.  Here is an example:
 396 @smallexample
 397   $@{RELOCATING+ PROVIDE (__stack = 0x80000000);@}
 398 @end smallexample
 399 The value of the symbol @code{__stack} would then be used in the startup
 400 code to initialize the stack pointer.
 401
 402 @node linker emulations
 403 @section @file{emultempl} scripts
 404
 405 Each linker target uses an @file{emultempl} script to generate the
 406 emulation code.  The name of the @file{emultempl} script is set by the
 407 @code{TEMPLATE_NAME} variable in the @file{emulparams} script.  If the
 408 @code{TEMPLATE_NAME} variable is not set, the default is
 409 @samp{generic}.  If the value of @code{TEMPLATE_NAME} is @var{template},
 410 @file{genscripts.sh} will use @file{emultempl/@var{template}.em}.
 411
 412 Most targets use the generic @file{emultempl} script,
 413 @file{emultempl/generic.em}.  A different @file{emultempl} script is
 414 only needed if the linker must support unusual actions, such as linking
 415 against shared libraries.
 416
 417 The @file{emultempl} script is normally written as a simple invocation
 418 of @code{cat} with a here document.  The document will use a few
  419 variable substitutions.  Typically each function name uses a
 420 substitution involving @code{EMULATION_NAME}, for ease of debugging when
 421 the linker supports multiple emulations.
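
A fragment of such a script might look like the sketch below.  The
@samp{gld} prefix, the choice of hook, and the redirection to
@file{e$@{EMULATION_NAME@}.c} are assumptions made for this
illustration; consult @file{emultempl/generic.em} for the actual
conventions.
@smallexample
cat >e$@{EMULATION_NAME@}.c <<EOF
static void
gld$@{EMULATION_NAME@}_before_parse (void)
@{
  /* Emulation specific setup would go here.  */
@}
EOF
@end smallexample
Because the emulation name is substituted into each function name, a
symbol seen in a debugger identifies the emulation it belongs to.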
 422
 423 Every function and variable in the emitted file should be static.  The
 424 only globally visible object must be named
 425 @code{ld_@var{EMULATION_NAME}_emulation}, where @var{EMULATION_NAME} is
 426 the name of the emulation set in @file{configure.tgt} (this is also the
 427 name of the @file{emulparams} file without the @file{.sh} extension).
 428 The @file{genscripts.sh} script will set the shell variable
 429 @code{EMULATION_NAME} before invoking the @file{emultempl} script.
 430
 431 The @code{ld_@var{EMULATION_NAME}_emulation} variable must be a
 432 @code{struct ld_emulation_xfer_struct}, as defined in @file{ldemul.h}.
 433 It defines a set of function pointers which are invoked by the linker,
 434 as well as strings for the emulation name (normally set from the shell
  435 variable @code{EMULATION_NAME}) and the default BFD target name (normally
  436 set from the shell variable @code{OUTPUT_FORMAT}, which is normally set
 437 by the @file{emulparams} file).
 438
 439 The @file{genscripts.sh} script will set the shell variable
 440 @code{COMPILE_IN} when it invokes the @file{emultempl} script for the
 441 default emulation.  In this case, the @file{emultempl} script should
 442 include the linker scripts directly, and return them from the
  443 @code{get_script} entry point.  When the emulation is not the default,
  444 the @code{get_script} entry point should just return a file name.  See
 445 @file{emultempl/generic.em} for an example of how this is done.
 446
 447 At some point, the linker emulation entry points should be documented.
 448
 449 @node Emulation Walkthrough
 450 @chapter A Walkthrough of a Typical Emulation
 451
 452 This chapter is to help people who are new to the way emulations
 453 interact with the linker, or who are suddenly thrust into the position
 454 of having to work with existing emulations.  It will discuss the files
 455 you need to be aware of.  It will tell you when the given "hooks" in
 456 the emulation will be called.  It will, hopefully, give you enough
 457 information about when and how things happen that you'll be able to
 458 get by.  As always, the source is the definitive reference to this.
 459
 460 The starting point for the linker is in @file{ldmain.c} where
 461 @code{main} is defined.  The bulk of the code that's emulation
 462 specific will initially be in @code{emultempl/@var{emulation}.em} but
 463 will end up in @code{e@var{emulation}.c} when the build is done.
 464 Most of the work to select and interface with emulations is in
 465 @code{ldemul.h} and @code{ldemul.c}.  Specifically, @code{ldemul.h}
 466 defines the @code{ld_emulation_xfer_struct} structure your emulation
 467 exports.
 468
 469 Your emulation file exports a symbol
 470 @code{ld_@var{EMULATION_NAME}_emulation}.  If your emulation is
 471 selected (it usually is, since usually there's only one),
  472 @code{ldemul.c} sets the variable @code{ld_emulation} to point to it.
  473 @code{ldemul.c} also defines a number of API functions that interface
  474 to your emulation, like @code{ldemul_after_parse} which simply calls
  475 your @code{ld_@var{EMULATION_NAME}_emulation.after_parse} function.  For
 476 the rest of this section, the functions will be mentioned, but you
 477 should assume the indirect reference to your emulation also.
 478
 479 We will also skip or gloss over parts of the link process that don't
 480 relate to emulations, like setting up internationalization.
 481
 482 After initialization, @code{main} selects an emulation by pre-scanning
 483 the command-line arguments.  It calls @code{ldemul_choose_target} to
 484 choose a target.  If you set @code{choose_target} to
 485 @code{ldemul_default_target}, it picks your @code{target_name} by
 486 default.
 487
 488 @code{main} calls @code{ldemul_before_parse}, then @code{parse_args}.
 489 @code{parse_args} calls @code{ldemul_parse_args} for each arg, which
 490 must update the @code{getopt} globals if it recognizes the argument.
  491 If the emulation doesn't recognize it, then @code{parse_args} checks to see
 492 if it recognizes it.
 493
 494 Now that the emulation has had access to all its command-line options,
 495 @code{main} calls @code{ldemul_set_symbols}.  This can be used for any
 496 initialization that may be affected by options.  It is also supposed
 497 to set up any variables needed by the emulation script.
 498
 499 @code{main} now calls @code{ldemul_get_script} to get the emulation
 500 script to use (based on arguments, no doubt, @pxref{Emulations}) and
 501 runs it.  While parsing, @code{ldgram.y} may call @code{ldemul_hll} or
 502 @code{ldemul_syslib} to handle the @code{HLL} or @code{SYSLIB}
 503 commands.  It may call @code{ldemul_unrecognized_file} if you asked
 504 the linker to link a file it doesn't recognize.  It will call
 505 @code{ldemul_recognized_file} for each file it does recognize, in case
 506 the emulation wants to handle some files specially.  All the while,
 507 it's loading the files (possibly calling
 508 @code{ldemul_open_dynamic_archive}) and symbols and stuff.  After it's
 509 done reading the script, @code{main} calls @code{ldemul_after_parse}.
 510 Use the after-parse hook to set up anything that depends on stuff the
 511 script might have set up, like the entry point.
 512
 513 @code{main} next calls @code{lang_process} in @code{ldlang.c}.  This
 514 appears to be the main core of the linking itself, as far as emulation
 515 hooks are concerned(*).  It first opens the output file's BFD, calling
 516 @code{ldemul_set_output_arch}, and calls
 517 @code{ldemul_create_output_section_statements} in case you need to use
 518 other means to find or create object files (i.e. shared libraries
 519 found on a path, or fake stub objects).  Despite the name, nobody
 520 creates output sections here.
 521
 522 (*) In most cases, the BFD library does the bulk of the actual
 523 linking, handling symbol tables, symbol resolution, relocations, and
 524 building the final output file.  See the BFD reference for all the
 525 details.  Your emulation is usually concerned more with managing
 526 things at the file and section level, like "put this here, add this
 527 section", etc.
 528
 529 Next, the objects to be linked are opened and BFDs created for them,
 530 and @code{ldemul_after_open} is called.  At this point, you have all
 531 the objects and symbols loaded, but none of the data has been placed
 532 yet.
 533
 534 Next comes the Big Linking Thingy (except for the parts BFD does).
 535 All input sections are mapped to output sections according to the
 536 script.  If a section doesn't get mapped by default,
 537 @code{ldemul_place_orphan} will get called to figure out where it goes.
 538 Next it figures out the offsets for each section, calling
 539 @code{ldemul_before_allocation} before and
 540 @code{ldemul_after_allocation} after deciding where each input section
 541 ends up in the output sections.
 542
 543 The last part of @code{lang_process} is to figure out all the symbols'
 544 values.  After assigning final values to the symbols,
 545 @code{ldemul_finish} is called, and after that, any undefined symbols
 546 are turned into fatal errors.
 547
 548 OK, back to @code{main}, which calls @code{ldwrite} in
 549 @file{ldwrite.c}.  @code{ldwrite} calls BFD's final_link, which does
 550 all the relocation fixups and writes the output bfd to disk, and we're
 551 done.
 552
 553 In summary,
 554
 555 @itemize @bullet
 556
 557 @item @code{main()} in @file{ldmain.c}
 558 @item @file{emultempl/@var{EMULATION}.em} has your code
 559 @item @code{ldemul_choose_target} (defaults to your @code{target_name})
 560 @item @code{ldemul_before_parse}
 561 @item Parse argv, calls @code{ldemul_parse_args} for each
 562 @item @code{ldemul_set_symbols}
 563 @item @code{ldemul_get_script}
 564 @item parse script
 565
 566 @itemize @bullet
 567 @item may call @code{ldemul_hll} or @code{ldemul_syslib}
 568 @item may call @code{ldemul_open_dynamic_archive}
 569 @end itemize
 570
 571 @item @code{ldemul_after_parse}
 572 @item @code{lang_process()} in @file{ldlang.c}
 573
 574 @itemize @bullet
 575 @item create @code{output_bfd}
 576 @item @code{ldemul_set_output_arch}
 577 @item @code{ldemul_create_output_section_statements}
 578 @item read objects, create input bfds - all symbols exist, but have no values
 579 @item may call @code{ldemul_unrecognized_file}
 580 @item will call @code{ldemul_recognized_file}
 581 @item @code{ldemul_after_open}
 582 @item map input sections to output sections
 583 @item may call @code{ldemul_place_orphan} for remaining sections
 584 @item @code{ldemul_before_allocation}
 585 @item gives input sections offsets into output sections, places output sections
 586 @item @code{ldemul_after_allocation} - section addresses valid
 587 @item assigns values to symbols
 588 @item @code{ldemul_finish} - symbol values valid
 589 @end itemize
 590
 591 @item output bfd is written to disk
 592
 593 @end itemize
 594
 595 @node Architecture Specific
 596 @chapter Some Architecture Specific Notes
 597
 598 This is the place for notes on the behavior of @code{ld} on
 599 specific platforms.  Currently, only Intel x86 is documented (and
 600 of that, only the auto-import behavior for DLLs).
 601
 602 @menu
 603 * ix86::                        Intel x86
 604 @end menu
 605
 606 @node ix86
 607 @section Intel x86
 608
 609 @table @emph
 610 @code{ld} can create DLLs that operate with various runtimes available
 611 on a common x86 operating system.  These runtimes include native (using
 612 the mingw "platform"), cygwin, and pw.
 613
 614 @item auto-import from DLLs
 615 @enumerate
 616 @item
 617 With this feature on, DLL clients can import variables from DLL
 618 without any concern from their side (for example, without any source
 619 code modifications).  Auto-import can be enabled using the
 620 @code{--enable-auto-import} flag, or disabled via the
 621 @code{--disable-auto-import} flag.  Auto-import is disabled by default.
 622
 623 @item
  624 This is done almost entirely within the bounds of the PE specification
  625 (to be fair, there's a minor violation of the spec at one point, but in
  626 practice auto-import works on all known variants of that common x86
  627 operating system).  So, the resulting DLL can be used with any other PE
  628 compiler/linker.
 629
 630 @item
  631 Auto-import is fully compatible with the standard import method, in which
  632 variables are decorated using attribute modifiers.  Libraries of either
  633 type may be mixed together.
 634
 635 @item
 636 Overhead (space): 8 bytes per imported symbol, plus 20 for each
 637 reference to it; Overhead (load time): negligible; Overhead
  638 (virtual/physical memory): should be less than the effect of DLL
 639 relocation.
 640 @end enumerate
 641
 642 Motivation
 643
  644 The obvious, and only, way to get rid of the dllimport insanity is
  645 to make the client access the variable directly in the DLL, bypassing
  646 the extra dereference imposed by ordinary DLL runtime linking.
  647 I.e., whenever the client contains something like
  648
  649 @code{mov dll_var,%eax},
  650
  651 the address of dll_var in the instruction should be relocated to point
  652 into the loaded DLL.  The aim is to make the OS loader do so, and then
  653 to make ld help with that.  The import section of a PE file is laid out
  654 as follows: there's a vector of structures, each describing the imports
  655 from a particular DLL.  Each such structure points to two other
  656 parallel vectors: one holding the imported names, and one which
  657 will hold the address of each corresponding imported name.  So, the
  658 solution is to de-vectorize these structures, making the import
  659 locations sparse and pointing directly into the code.
 660
 661 Implementation
 662
  663 For each reference to a data symbol to be imported from a DLL (a
  664 symbol named <sym> belongs to this set if __imp_<sym> is
  665 found in the implib), an import fixup entry is generated.  That
  666 entry is of type IMAGE_IMPORT_DESCRIPTOR and is stored in the .idata$3
  667 subsection.  Each fixup entry contains a pointer to the symbol's address
  668 within the .text section (marked with a __fuN_<sym> symbol, where N is
  669 an integer), a pointer to the DLL name (so the DLL name is referenced by
  670 multiple entries), and a pointer to the symbol name thunk.  The symbol
  671 name thunk is a singleton vector (__nm_th_<symbol>) pointing to an
  672 IMAGE_IMPORT_BY_NAME structure (__nm_<symbol>) directly containing the
  673 imported name.  Here comes the "on the edge" problem mentioned above:
  674 the PE specification says that the name vector (OriginalFirstThunk)
  675 should run in parallel with the address vector (FirstThunk), i.e. that
  676 they should have the same number of elements and be terminated with zero.
  677 We violate this, since FirstThunk points directly into machine code.  But
  678 in practice, the OS loader is implemented the sane way: it goes through
  679 OriginalFirstThunk and puts addresses into FirstThunk, and does nothing
  680 else.  It should be noted once again that the DLL and symbol name
  681 structures are reused across fixup entries and should be there
  682 anyway to support the standard import mechanism, so the sustained
  683 overhead is 20 bytes per reference.  Another question is whether having
  684 several IMAGE_IMPORT_DESCRIPTORs for the same DLL is possible.  The answer
  685 is yes; it is done even by the native compiler/linker (libth32's functions
  686 are in fact resident in the windows9x kernel32.dll, so if you use it, you
  687 have two IMAGE_IMPORT_DESCRIPTORs for kernel32.dll).  Yet another question
  688 is whether referencing the same PE structures several times is valid.
  689 The answer is: why not?  Prohibiting that (detecting the violation) would
  690 require more work on behalf of the loader than not doing it.
 691
 692 @end table
 693
 694 @node GNU Free Documentation License
 695 @chapter GNU Free Documentation License
 696
 697 @include fdl.texi
 698
 699 @contents
 700 @bye