lld 是 LLVM 里的一个子项目,是一个链接器,目前官方还在开发中。在 MacOS 上,其链接时间和优化效果都远超苹果的 ld64。
本文大纲如下: 1、Linker works 2、Traditional linker 3、lld:Atom based 3.1、types 3.2、Fast 3.3、file model 3.3.1、Object File 3.3.2、Static Library (Archive) 3.3.3、Dynamic Library (Shared Object) 3.4、linking step 3.4.1、Resolving 3.4.2、Passes 4、feature 5、Darwin Linker 5.1、Future Directions 5.1.1、Sections 5.1.2、Mach-o Object File Format 5.1.3、New Object File Model 5.1.4、Debug Info 6、Note
refer https://www.jianshu.com/p/ef3415255808
“Binds more abstract names to more concrete names, which permits programmers to write code using more abstract names”.
开发编程时使用的抽象的名字,链接器需要将这些抽象的名字绑定为更加具体的名字。
In general concrete representations are higher performance but less flexible.
一般更具体的表示会有更高的性能表现,但会导致更低的灵活性。
In practice a tool that glues together the outputs of separate compilation into a single output.
实际操作中,链接器是一个把每个编译输出结果给缝合到单个输出的工具。
A linker is in many ways a slightly more sophisticated version of cat. The linker must copy the necessary data from all the input objects, rearrange it and resolve fixups. It is worth bearing this in mind when complaining that the linker is slow, just doing a cat of all the object files can also take a significant amount of time, particularly when debug information is involved.
链接器有点像命令行的 cat 操作,它需要从输入的目标文件中复制所需的数据,重组并修正。这些工作是值得的,光是对所有的输入的目标文件进行 cat 操作就很耗时,尤其是还有调试信息在其中的时候。
链接器像一个流水线,每一个步骤使其表示更加明确且更接近机器码。最后,每个符号都需要被明确定义其地址以及重定位会被修正。
Loading an object from a static library may introduce new undefined references as well as definitions, this may require more objects to be loaded to satisfy these references.
加载静态库里的 .o 文件可能会引入新的未定义的引用和定义,这就需要加载更多的 .o 文件来满足这些引用。
Shared libraries may also be “loaded” to provide symbol definitions and references, although loading just means that the shared library is added as a dependency of the output. The contents of the shared library beyond the symbol table aren’t needed.
动态库也可能被加载进来以提供符号的定义与引用,尽管加载只是意味着这些动态库是被加到输出的依赖。对于链接器可言,动态库里只有符号信息是必要的。
Finding all the content for a link is an iterative process that finishes when no more objects are loaded to resolve an undefined reference.
找到要链接的所有内容是一个不断遍历的过程,直到没有需要解决的未定义引用为止。
通常链接参数会有固定的顺序要求,比如 file1.o lib1.a file2.o ,如果 file2.o 需要用到 lib1.a 里的符号,则会报错,因为通常已经加载过的目标文件不会再进行加载。解决方法是调整它们的为止,或者在 file2.o 后面再加一个 lib1.a。
链接器根据重定位信息用有向图把 section 连接起来,这个阶段还没有分配地址,所以可以做一个优化,如垃圾回收。
一旦每个 section 都分配了地址,每个重定位信息和函数定义都会有对应的地址了,就可以对重定位信息进行修正了。
“section” based like traditional linkers which mostly just interlace sections from multiple object files into the output file.
传统的链接器是基于 section 来完成的,把大量的目标文件的 sections 组合到输出文件当中。
Traditional section based linking work well for simple linking, but their model makes advanced linking features difficult to implement. Features like dead code stripping, reordering functions for locality, and C++ coalescing require the linker to work at a finer grain.
传统基于 section 的链接对于一般的链接是没问题的,但它们的模型对于一些链接特性是很难去支持和实现的。例如像去除无用代码、重排局部函数、c++ 的合并操作等。
An atom is an indivisible chunk of code or data. An atom has a set of attributes, such as: name, scope, content-type, alignment, etc. An atom also has a list of References. A Reference contains: a kind, an optional offset, an optional addend, and an optional target atom.
一个 atom 是不可再分割的代码或数据。一个 atom 有很多的属性,如名字、范围、类型、对齐等。同时一个 atom 也有一系列的引用,一个引用包含有类型、可选的偏移量、可选的添加项和可选的目标 atom。
The Atom model allows the linker to use standard graph theory models for linking data structures. Each atom is a node, and each Reference is an edge. The feature of dead code stripping is implemented by following edges to mark all live atoms, and then delete the non-live atoms.
atom 模型可以让链接器使用标准的图理论模型来链接数据结构。每个 atom 就是一个节点,而每个引用就是一条边。去除无用代码的特性就是根据边去标记存活的 atom、删除不存活的 atom 来实现的。
Typically each user written function or global variable is an atom. In addition, the compiler may emit other atoms, such as for literal c-strings or floating point constants, or for runtime data structures like dwarf unwind info or pointers to initializers.
一般来说,我们编写的函数或者全局变量就是一个 atom。此外,编译器也可能生成其他 atom,如 C 的文本字符串、浮点数常量、运行时的数据结构(dwarf 栈信息等)、指针的初始化。
There are only four different types of atoms:
atom 只有 4 种类型
95% of all atoms. This is a chunk of code or data 代码块或数据块
This is a place holder in object files for a reference to some atom outside the translation unit.During core linking it is usually replaced by (coalesced into) another Atom.
这是某个目标文件里对于引用的占位符,它指向的是当前编译单元之外的一些 atom。经过主要的链接后,通常它就会被其他 atom 所替换了。
If a required symbol name turns out to be defined in a dynamic shared library (and not some object file). A SharedLibraryAtom is the placeholder Atom used to represent that fact.It is similar to an UndefinedAtom, but it also tracks information about the associated shared library.
如果所需要的符号名是在动态库当中定义的。那么 SharedLibraryAtom 就是一个占位符的 atom。有点像上面的 UndefinedAtom,但它会跟踪相关的动态库的信息。
This is for embedded support where some stuff is implemented in ROM at some fixed address. This atom has no content. It is just an address that the Writer needs to fix up any references to point to.
这个是对嵌入式的支持,一些东西可能是在 ROM 中调整过的地址中实现的。这个 atom 没有内容信息,它只是一个地址,所以所有指向它的引用都需要被修正。
LLD only reads the information it needs from the input files. Copying section data to the output very late. Alternatively it could copy the full input contents into memory at read time. This would permit arbitrary modification of the section data, but would harm performance.
LLD 只从输入文件中读取它所需要的文件。很晚的时机才会去复制 section 的数据到输出。它可以选择直接复制所有的输入到内存当中,让它可以随意修改,但会对性能影响较大。
LLD tries to keep the amount of Target specific code to a minimum, this has the advantage of making it easy to port a new Target if it happens to look similar to other targets. It can make it more difficult if the target does something radically different though.
LLD 会尽量使特定平台的代码更少量,这会使它更容易去生成新的平台的代码。
The LLVM project has a rich ADT library that contains optimized containers and libraries that have accelerated LLD’s development and give it a performance advantage over default standard library types.
LLVM 有丰富的 ADT(抽象数据类型) 库,包含有很多优化过的容器,所以也可以用来优化 LLD。
As has been seen previously a linker will need to create hundreds of thousands of sections and symbols and in some cases tens of millions of relocations. This will end up with a lot of allocations of the same type, that can’t be released until the output is written (the last thing a linker does). A region based allocator where most allocations are just incrementing a pointer within a larger block can improve performance considerably. LLD has a separate memory region per type so iterating through a type doesn’t jump all over the address space.
一个链接器需要创建成百上千的 section 和符号,还有上万的重定位信息,这些会导致很多重复的对于相同类型的重定位信息,而且这些信息直到输出被写完才会被释放。内存分配于此会十分影响性能。LLD 为每个类型分配了独立的内存空间,所以遍历时不需要遍历所有的地址空间。
且 lld 的设计充分利用了并行的能力
An object file is just a container of atoms. When linking an object file, a reader is instantiated which parses the object file and instantiates a set of atoms representing all content in the .o file. The linker adds all those atoms to a master graph.
目标文件就是 atom 的容器。当链接目标文件时,读取控制器会被实例化,然后实例化一系列的 atom 来代表所有在 .o 文件里的内容。链接器会把其所有的 atom 加入到主要的图中。
This is the traditional unix static archive which is just a collection of object files with a “table of contents”. When linking with a static library, by default nothing is added to the master graph of atoms. Instead, if after merging all atoms from object files into a master graph, if any “undefined” atoms are left remaining in the master graph, the linker reads the table of contents for each static library to see if any have the needed definitions. If so, the set of atoms from the specified object file in the static library is added to the master graph of atoms.
静态库就是一系列目标文件的集合。当链接静态库时,默认情况下不会把任何 atom 加到主要的图中。如果从目标文件中合并所有的 atom 到主要的图时,出现了未定义的 atom,然后链接器才会去读取静态库看是否有需要的定义。如果有,那么从静态库特定的目标文件中把其 atom 加入到主要的图中。
Dynamic libraries are different than object files and static libraries in that they don’t directly add any content. Their purpose is to check at build time that the remaining undefined references can be resolved at runtime, and provide a list of dynamic libraries (SO_NEEDED) that will be needed at runtime. The way this is modeled in the linker is that a dynamic library contributes no atoms to the initial graph of atoms. Instead, (like static libraries) if there are “undefined” atoms in the master graph of all atoms, then each dynamic library is checked to see if exports the required symbol. If so, a “shared library” atom is instantiated by the by the reader which the linker uses to replace the “undefined” atom.
动态库不太一样,因为它不直接添加任何内容。它的目的是在编译时检查那些未定义的引用是否能在运行时找到,然后提供运行时所需要的一系列的动态库。所以动态库不会生成任何的 atom 到初始的图中。而是如果主要的图中出现了未定义的 atom (与上面静态库情况一致),链接器才会去检查动态库有没有导出对应的符号。如果有,那么动态库 atom 会被实例化,然后用来替换未定义的 atom。
The overall steps in linking are:
The Resolving and Passes steps are done purely on the master graph of atoms, so they have no notion of file formats such as mach-o or ELF.
The resolving step takes all the atoms’ graphs from each object file and combines them into one master object graph. Unfortunately, it is not as simple as appending the atom list from each file into one big list. There are many cases where atoms need to be coalesced. That is, two or more atoms need to be coalesced into one atom. This is necessary to support: C language “tentative definitions”, C++ weak symbols for templates and inlines defined in headers, replacing undefined atoms with actual definition atoms, and for merging copies of constants like c-strings and floating point constants.
解析过程会把目标文件里的所有 atom 的图合并到主要的图中。但并非简单地把 atom 列表加到同一个大列表当中。会有很多的情况下 atom 需要被合并(把两个或以上的 atom 合并到一个 atom 当中)。这很重要,像 C 语言的未决议的定义,C++ 头文件中的弱符号模版和内联函数,将这些未定义的 atom 替换为确切的定义好的 atom,然后合并 C 字符串常量和浮点数常量。
The linker support coalescing by-name and by-content. By-name is used for tentative definitions and weak symbols. By-content is used for constant data that can be merged.
链接器支持以名字或内容来合并。以名字用在未决议的定义和弱符号。以内容用在常量数据。
When one atom has a reference (FixUp) to another atom, there is also a binding type: by-name, direct, or indirect. A Fixup contains a tagged union that if the binding type is by-name, the union field is a pointer to a c-string. If the binding type is direct, the union is a pointer to an Atom. If the binding type is indirect, the union is a index into a table of pointers to Atoms. Below is a graphical representation of the binding types:
当一个 atom 引用了另一个 atom,会绑定类型:名字、直接调用、间接调用。如果是名字绑定的话,引用会包含有标签的集合,这个集合是 C 字符串的指针。如果是直接调用,这个集合就是 atom 的指针。如果是间接调用,这个集合就是指向 atom 的指针表上的某个 index。
Input file Atoms contain only direct and by-name references. Direct references are used for atoms defined in the same object file for which the target atom is either unnamed or cannot change. For instance, calling a static function in a translation unit will result in a direct reference to the static functions's atom. Also the FDE (dwarf unwind info) for a function has a direct reference to its function. On the other hand references to global symbols (e.g. call to printf) use by-name binding in object files.
输入文件的 atom 只包含有直接调用和名字引用。直接引用用在同一目标文件中那些目标 atom 是未命名或不能改变的 atom。对于实例而言,在一个编译单元里调用一个静态函数会导致对静态函数的 atom 的直接引用。调试信息 PDE 也会对它对应的函数有直接引用。另外对全局符号(printf)的引用是使用名字绑定的。
The resolving process maintains some global linking “state”, including a “symbol table” which is a map from llvm::StringRef to lld::Atom*. With these data structures, the linker iterates all atoms in all input files. For each atom, it checks if the atom is named and has a global or hidden scope. If so, the atom is added to the symbol table map. If there already is a matching atom in that table, that means the current atom needs to be coalesced with the found atom, or it is a multiple definition error.
解析过程会保存有全局链接状态,包括符号表。链接器遍历所有的输入文件下的 atom,对于每个 atom,它会检查这个 atom 是否被命名了和是否是全局或隐藏的作用域。如果是,则这个 atom 会被加入到符号表中,如果符号表中已经有相关的 atom 了,那么意味着当前的 atom 需要被合并,或者这就是个重复定义的编译错误。
Dead code stripping (if requested) is done at the end of resolving. The linker does a simple mark-and-sweep. It starts with “root” atoms (like “main” in a main executable) and follows each references and marks each Atom that it visits as “live”. When done, all atoms not marked “live” are removed.
无用代码去除会在解析完成之后进行。链接器会进行简单的标记与删除操作。它从根出发(main),然后跟踪所有的引用而标记存活。结束后,没有标记的 atom 会被删除。
The Passes step is an open ended set of routines that each get a change to modify or enhance the current lld::File object. Some example Passes are: 这个步骤是开放的,让我们有机会去修改或优化输出文件
Identical Code Folding
The ICF algorithm looks at sections that are identical, yet have different symbols. It forms equivalence classes, where each equivalence class contains identical functions. At the end of the ICF algorithm a single candidate is chosen and the rest are eliminated. ICF needs to take account of the connections between sections when deciding whether they are equivalent, this can be quite an expensive process when there are many sections. LLD has been able to parallelise parts of the ICF algorithm.
ICF 算法会查找相同的 section,但它们具有不同的符号。它形成等价的类,而每个等价的类中含有相同的函数。最后 ICF 会选择其中一个而删除其他的。ICF 需要通过 section 之间的连边来判断它们是否相等,由于 section 量很多,所以这个操作非常耗时。但 LLD 已经可以并行部分的 ICF 算法了。
The Darwin linker is based on "Atoms".
There are two atoms: main and an anonymous atom containing the c-string literal "hello world". The Atom "main" has two fixups. One is the call site for the call to printf, and the other is a fixup for the instruction that loads the address of the c-string literal.
The current use of sections in mach-o .o files over-constrains the linker. By default, the linker should preserve the section an atom is in. But since all sections must be contiguous in the output, that limits the ability of the linker to order atoms for locality. It would be helpful to enrich the object file with with reason something is in the section it is. For instance, is the section found at runtime? Or was the use of a section just a quick way to group some content together?
现有的 MachO 格式限制 atom 链接器的发挥,因为它需要以 section 的形式存储,且里面的内容要连贯。
The messiest part of the linker is the mach-o parser. This is because mach-o is a traditional section and symbols based file format. The parser must infer atom boundaries using two approaches. The first is that some section types have well defined content which the linker can parse into atoms (e.g. cstring, eh_frame). The other approach is a naming convention (which the compiler follows) by which the linker breaks sections into atoms at any non-local (not starting with 'L') symbol. The processing the linker has to do parse mach-o .o files is a significant part of the link time.
链接器最凌乱的部分就是 MachO 的 parser 了。因为它需要用两种方式去推断 atom 的范围,第一种一些 section 的类型定义得很好,可以直接找到;另一种是命名惯例,链接器通过非局部符号来判断。
Given that the assembler writes object files once, whereas the linker reads them many times (during development), it would make sense to optimize the object file format to be something the linker can read/parse efficiently.
LLVM has a nice model for its IR. There are three representations: the binary bit code file, the in-memory object model, and a textual representation. LLVM contains utility possible code for converting between these representations. The same model makes sense for atoms too. There should be three representations for atoms: binary file, in-memory, and textual. The Darwin linker already has an in-memory C++ object model for Atoms. All we need is a textual representation and binary file format.
atom 应该也需要有三种格式:二进制 bitcode、内存目标模型、文本表示。
Around 2005 when Apple switched from using STABS to using DWARF for debug information, we made a design decision to have the linker ignore DWARF in .o files. This improves linking performance because the linker is not copying tons of debug info. Instead, the linker adds "debug notes" into output binary that contain the paths of the original .o files. During development the Darwin debugger will notice the debug notes and the load the dwarf debug information from the original object files. For release builds, a tool named dsymutil is run on the program. It finds the debug notes and then the original object files, then reads, merges and optimizes all the dwarf debug information into one .dSYM file which can be loaded by the debugger if needed.
The current way DWARF is generated is that all debug information for all functions in a translation unit are merged and optimized into sections based on debug info kind. For instance the mapping of instructions to source line numbers for all functions is compressed and put in one section. This does not play well in an Atom based file format. One idea is to have the compiler emit some intermediate representation debug information (one which is partitioned per atom) into the Atom based file format. The linker could then have code to convert that intermediate debug into to final dwarf. This is still an open question.
DWARF 的紧凑格式让以 atom 单位的链接器非常难以处理。
Linker Design: https://lld.llvm.org/design.html
what makes lld so fast?:https://archive.fosdem.org/2019/schedule/event/llvm_lld/attachments/slides/3423/export/events/attachments/llvm_lld/slides/3423/WhatMakesLLDSoFastPresenterNotes.
Inside the Linker:https://opensource.apple.com/source/ld64/ld64-136/doc/design/linker.html
lld: A Fast, Simple and Portable Linker:https://llvm.org/devmtg/2017-10/slides/Ueyama-lld.pdf
LLD from a user’s perspective:https://archive.fosdem.org/2017/schedule/event/lld/attachments/slides/1446/export/events/attachments/lld/slides/1446/FosdemLLD2017.pdf