Nerd Ralph: GCC LTO call graph generation

Thursday, July 3, 2014

GCC LTO call graph generation

Since my first post about gcc link-time optimization, I've been doing more investigation into how it works. Since I've written some Arduino libraries as a way to help out newbies to embedded development, I follow the Arduino developers list. At the request of a few developers on the list, LTO was included in the arduino nightly beta. Due to LTO builds of the USB Host Shield library being extremely large, Christian announced on June 30th that LTO would be disabled until the issue was figured out. I decided to investigate to see what was happening.

Arduino doesn't use makefiles for building, and instead has a build process which allows for configuration of some of the build flags in a platform.txt file which is read on startup of the IDE. Testing build flags involves editing the file, starting the IDE, and doing a build. To speed up the process and to have complete control over the compile, archive, and link process I worked with avr-gcc on the command line, using the temporary files compiled by Arduino.

My first discovery was that when a library's .o files are included on the command line (instead of an archive file), LTO didn't work. All the code from each .o was included in the binary, even code that was never used. That's when I started researching how gcc's callgraph generation works. The gcc internals documentation is good, though it still leaves unanswered questions.

The big question is how does it pick the root of the call graph? I thought the starting point for the call graph analysis would be the C Runtime Initialization object file, which references main. From there the linker could look at all the .o's to find main, and build the graph. After lots of experiments, I discovered LTO may generate multiple call graphs for the same executable - in other words there can be many root nodes. Every .o file specified on the command line will be used for root node candidates, and if there are only .a files specified, every .o in the first .a will be used for root node candidates.

When all the .o files for an Arduino library are included on the build line, gcc's LTO includes many functions that never get called from main. Creating an archive with the .o files solves that problem. That still doesn't result in optimal performance of LTO. Since the Arduino IDE compiles all of it's core .o files into core.a, which is the first argument in the build line, all the .o files in core.a get included in the build. Specifying main.cpp.o solves this problem.

$ avr-gcc -mmcu=atmega328p -Wl,-relax -flto -Os main.cpp.o hjrp.a core.a USB_Host_Shield_2.0-master/UHS.a -o USBHIDJ.elf

$ avr-size USBHIDJ.elf

text data bss dec hex filename

12530 194 460 13184 3380 USBHIDJ.elf

Building the same files without main.cpp.o specified (listing core.a first) was almost 3KB larger:

$ avr-size USBHIDJ.elf

text data bss dec hex filename

18152 204 493 18849 49a1 USBHIDJ.elf

For some further details on LTO check out Honza Hubička's Blog:

http://hubicka.blogspot.com/2014/04/linktime-optimization-in-gcc-1-brief.html

4 comments:

Honza HubičkaJuly 10, 2014 at 4:08 AM
This is quite strange behavior. GCC basically uses linked plugin feedback to figure out what symbols are needed. Either in your setup you have no linker plugin enabled (try compiling with -fuse-linker-plugin) or something is wrong with the resolution data given by the plugin. You can see the data after copmiling with --save-temps. They end up in file called *.res (often soemting like -lm.res that is anoying) and you can see what symbols are roots of GCC's unreachable code removal.

Honza
ReplyDelete
Replies

Add comment