Doug Binks - 31 Oct 2017
It's rare for me to read a blog post and immediatly put the information to use, but this post from Aras Pranckevičius, Best unknown MSVC flag: d2cgsummary is one. Within a short time I had cut elapsed compile and link times for Runtime Compiled C++ live coding with Visual Studio by 1.5x, and eventually 3x with total compile time (total time for all compile processes in a multithreaded system) down 10x. This post explains how.
Adding a new menu item using RCC++ with the changes made to improve compile times. The above video compile time was a little longer than I finally achieved due to FRAPS taking some CPU time.Runtime Compiled C++ is a technique I created to edit C++ code while it's running. RCC++ is liberally licensed and I've open sourced it on Github. I use RCC++ for Avoyd and I've been puzzled by the long compilation time for a particular source file for some time. This file handles a large part of the editor menu for Avoyd using dear imgui for the interface on top of glfw for OS abstraction window and input handling. Both are fast to compile, so my earlier efforts at investigating compile times had concentrated on the fact the menu system interacts with a lot of header files. This was born out by other approaches to investigating compile times, but so far I'd not been able to improve things by reducing the headers included. Reading Aras' post led to adding /d2cgsummary
to my additional compile options for RCC++.
g_pSys->pRuntimeObjectSystem->SetAdditionalCompileOptions( "/d2cgsummary /permissive- /D PROFILE /D RCCPPON" );
RCC++ is intended for fast iteration on source code, making changes rapidly on the fly. The tracking system enables RCC++ to deduce dependencies and compile a small DLL (.so on POSIX) with only the required code, and without needing the developer to set up the project to use DLLs. Many files can have changes compiled & linked in ~1s, so compile times of multiple seconds are a problem which needs fixing.
Using /d2cgsummary
the debug output from RCC++ before I made any code changes to improve compile times was:
Code Generation Summary Total Function Count: 2388 Elapsed Time: 8.074 sec Total Compilation Time: 21.806 sec Efficiency: 270.1%regularily Anomalistic Compile Times: 111 ?Render@MenuEditor@@UEAA_NXZ: 0.234 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0BAB@@?A0x17235d25@@UEBAPEBD_K@Z: 0.088 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0NP@@?A0x17235d25@@UEBAPEBD_K@Z: 0.087 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0BAA@@?A0x17235d25@@UEBAPEBD_K@Z: 0.083 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0PP@@?A0x17235d25@@UEBAPEBD_K@Z: 0.081 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0PO@@?A0x17235d25@@UEBAPEBD_K@Z: 0.081 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0LK@@?A0x17235d25@@UEBAPEBD_K@Z: 0.080 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0PM@@?A0x17235d25@@UEBAPEBD_K@Z: 0.080 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0PN@@?A0x17235d25@@UEBAPEBD_K@Z: 0.080 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0PK@@?A0x17235d25@@UEBAPEBD_K@Z: 0.079 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0PL@@?A0x17235d25@@UEBAPEBD_K@Z: 0.077 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0PJ@@?A0x17235d25@@UEBAPEBD_K@Z: 0.077 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0MD@@?A0x17235d25@@UEBAPEBD_K@Z: 0.077 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0PH@@?A0x17235d25@@UEBAPEBD_K@Z: 0.077 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0PI@@?A0x17235d25@@UEBAPEBD_K@Z: 0.075 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0PD@@?A0x17235d25@@UEBAPEBD_K@Z: 0.075 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0OF@@?A0x17235d25@@UEBAPEBD_K@Z: 0.075 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0PB@@?A0x17235d25@@UEBAPEBD_K@Z: 0.074 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0LF@@?A0x17235d25@@UEBAPEBD_K@Z: 0.074 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0OL@@?A0x17235d25@@UEBAPEBD_K@Z: 0.073 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0OI@@?A0x17235d25@@UEBAPEBD_K@Z: 0.073 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0OK@@?A0x17235d25@@UEBAPEBD_K@Z: 0.073 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0OJ@@?A0x17235d25@@UEBAPEBD_K@Z: 0.072 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0LA@@?A0x17235d25@@UEBAPEBD_K@Z: 0.072 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0OC@@?A0x17235d25@@UEBAPEBD_K@Z: 0.070 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0OH@@?A0x17235d25@@UEBAPEBD_K@Z: 0.069 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0OG@@?A0x17235d25@@UEBAPEBD_K@Z: 0.069 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0OE@@?A0x17235d25@@UEBAPEBD_K@Z: 0.069 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0MM@@?A0x17235d25@@UEBAPEBD_K@Z: 0.069 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0OD@@?A0x17235d25@@UEBAPEBD_K@Z: 0.068 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0OA@@?A0x17235d25@@UEBAPEBD_K@Z: 0.066 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0OB@@?A0x17235d25@@UEBAPEBD_K@Z: 0.066 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0LL@@?A0x17235d25@@UEBAPEBD_K@Z: 0.066 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0NM@@?A0x17235d25@@UEBAPEBD_K@Z: 0.066 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0LE@@?A0x17235d25@@UEBAPEBD_K@Z: 0.065 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0NO@@?A0x17235d25@@UEBAPEBD_K@Z: 0.064 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0NN@@?A0x17235d25@@UEBAPEBD_K@Z: 0.064 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0NJ@@?A0x17235d25@@UEBAPEBD_K@Z: 0.063 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0ML@@?A0x17235d25@@UEBAPEBD_K@Z: 0.063 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0LB@@?A0x17235d25@@UEBAPEBD_K@Z: 0.062 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0NK@@?A0x17235d25@@UEBAPEBD_K@Z: 0.062 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0LG@@?A0x17235d25@@UEBAPEBD_K@Z: 0.062 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0NI@@?A0x17235d25@@UEBAPEBD_K@Z: 0.062 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0NL@@?A0x17235d25@@UEBAPEBD_K@Z: 0.061 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0NH@@?A0x17235d25@@UEBAPEBD_K@Z: 0.061 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0LI@@?A0x17235d25@@UEBAPEBD_K@Z: 0.061 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0KB@@?A0x17235d25@@UEBAPEBD_K@Z: 0.060 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0LJ@@?A0x17235d25@@UEBAPEBD_K@Z: 0.060 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0LH@@?A0x17235d25@@UEBAPEBD_K@Z: 0.059 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0MK@@?A0x17235d25@@UEBAPEBD_K@Z: 0.059 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0MI@@?A0x17235d25@@UEBAPEBD_K@Z: 0.058 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0KO@@?A0x17235d25@@UEBAPEBD_K@Z: 0.058 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0KD@@?A0x17235d25@@UEBAPEBD_K@Z: 0.058 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0MJ@@?A0x17235d25@@UEBAPEBD_K@Z: 0.057 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0MF@@?A0x17235d25@@UEBAPEBD_K@Z: 0.056 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0MG@@?A0x17235d25@@UEBAPEBD_K@Z: 0.056 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0MH@@?A0x17235d25@@UEBAPEBD_K@Z: 0.055 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0ME@@?A0x17235d25@@UEBAPEBD_K@Z: 0.055 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0LC@@?A0x17235d25@@UEBAPEBD_K@Z: 0.055 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0LD@@?A0x17235d25@@UEBAPEBD_K@Z: 0.054 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0MB@@?A0x17235d25@@UEBAPEBD_K@Z: 0.054 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0MC@@?A0x17235d25@@UEBAPEBD_K@Z: 0.054 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0LN@@?A0x17235d25@@UEBAPEBD_K@Z: 0.053 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0LO@@?A0x17235d25@@UEBAPEBD_K@Z: 0.052 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0JI@@?A0x17235d25@@UEBAPEBD_K@Z: 0.052 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0MA@@?A0x17235d25@@UEBAPEBD_K@Z: 0.052 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0LM@@?A0x17235d25@@UEBAPEBD_K@Z: 0.051 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0LP@@?A0x17235d25@@UEBAPEBD_K@Z: 0.051 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0KA@@?A0x17235d25@@UEBAPEBD_K@Z: 0.051 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0KE@@?A0x17235d25@@UEBAPEBD_K@Z: 0.050 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0KM@@?A0x17235d25@@UEBAPEBD_K@Z: 0.049 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0KL@@?A0x17235d25@@UEBAPEBD_K@Z: 0.049 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0KG@@?A0x17235d25@@UEBAPEBD_K@Z: 0.048 sec, 0 instrs ??0?$RuntimeIncludeFiles@$0BAC@@?A0x17235d25@@QEAA@_K@Z: 0.047 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0JN@@?A0x17235d25@@UEBAPEBD_K@Z: 0.047 sec, 0 instrs ??0?$RuntimeIncludeFiles@$0PH@@?A0x17235d25@@QEAA@_K@Z: 0.046 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0KH@@?A0x17235d25@@UEBAPEBD_K@Z: 0.046 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0JH@@?A0x17235d25@@UEBAPEBD_K@Z: 0.046 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0KC@@?A0x17235d25@@UEBAPEBD_K@Z: 0.046 sec, 0 instrs ?GetLinkLibrary@?$RuntimeLinkLibrary@$0HO@@?A0x17235d25@@UEBAPEBD_K@Z: 0.046 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0IO@@?A0x17235d25@@UEBAPEBD_K@Z: 0.045 sec, 0 instrs ??0?$RuntimeSourceDependency@$0PB@@?A0x17235d25@@QEAA@_K@Z: 0.045 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0KP@@?A0x17235d25@@UEBAPEBD_K@Z: 0.045 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0JM@@?A0x17235d25@@UEBAPEBD_K@Z: 0.045 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0JF@@?A0x17235d25@@UEBAPEBD_K@Z: 0.044 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0KN@@?A0x17235d25@@UEBAPEBD_K@Z: 0.044 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0KK@@?A0x17235d25@@UEBAPEBD_K@Z: 0.044 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0KF@@?A0x17235d25@@UEBAPEBD_K@Z: 0.044 sec, 0 instrs ??0?$RuntimeLinkLibrary@$0BAB@@?A0x17235d25@@QEAA@_K@Z: 0.043 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0JD@@?A0x17235d25@@UEBAPEBD_K@Z: 0.043 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0KJ@@?A0x17235d25@@UEBAPEBD_K@Z: 0.042 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0JA@@?A0x17235d25@@UEBAPEBD_K@Z: 0.042 sec, 0 instrs ??0?$RuntimeIncludeFiles@$0PN@@?A0x17235d25@@QEAA@_K@Z: 0.042 sec, 0 instrs ??0?$RuntimeLinkLibrary@$0PB@@?A0x17235d25@@QEAA@_K@Z: 0.042 sec, 0 instrs ?GetLinkLibrary@?$RuntimeLinkLibrary@$0ID@@?A0x17235d25@@UEBAPEBD_K@Z: 0.042 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0JJ@@?A0x17235d25@@UEBAPEBD_K@Z: 0.042 sec, 0 instrs ??0?$RuntimeLinkLibrary@$0PJ@@?A0x17235d25@@QEAA@_K@Z: 0.041 sec, 0 instrs ??0?$RuntimeSourceDependency@$0PJ@@?A0x17235d25@@QEAA@_K@Z: 0.041 sec, 0 instrs ??0?$RuntimeIncludeFiles@$0OL@@?A0x17235d25@@QEAA@_K@Z: 0.041 sec, 0 instrs ??0?$RuntimeLinkLibrary@$0OE@@?A0x17235d25@@QEAA@_K@Z: 0.041 sec, 0 instrs ??0?$RuntimeIncludeFiles@$0OP@@?A0x17235d25@@QEAA@_K@Z: 0.041 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0KI@@?A0x17235d25@@UEBAPEBD_K@Z: 0.041 sec, 0 instrs ??0?$RuntimeLinkLibrary@$0BAE@@?A0x17235d25@@QEAA@_K@Z: 0.041 sec, 0 instrs ??0?$RuntimeLinkLibrary@$0OP@@?A0x17235d25@@QEAA@_K@Z: 0.041 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0JE@@?A0x17235d25@@UEBAPEBD_K@Z: 0.041 sec, 0 instrs ??0?$RuntimeLinkLibrary@$0OK@@?A0x17235d25@@QEAA@_K@Z: 0.041 sec, 0 instrs ??0?$RuntimeIncludeFiles@$0PL@@?A0x17235d25@@QEAA@_K@Z: 0.041 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0IC@@?A0x17235d25@@UEBAPEBD_K@Z: 0.040 sec, 0 instrs ??0?$RuntimeSourceDependency@$0OJ@@?A0x17235d25@@QEAA@_K@Z: 0.040 sec, 0 instrs ??0?$RuntimeIncludeFiles@$0PJ@@?A0x17235d25@@QEAA@_K@Z: 0.039 sec, 0 instrs ?GetIncludeDir@?$RuntimeIncludeFiles@$0JG@@?A0x17235d25@@UEBAPEBD_K@Z: 0.039 sec, 0 instrs Serialized Initializer Count: 22 Serialized Initializer Time: 0.095 sec RdrReadProc Caching Stats Functions Cached: 658 Retrieved Count: 51170 Abandoned Retrieval Count: 0 Abandoned Caching Count: 0 Wasted Caching Attempts: 0 Functions Retrieved at Least Once: 653 Functions Cached and Never Retrieved: 5 Most Hits: ?GetNULL@SourceDependencyInfo@@SA?AU1@XZ: 1400 ?hash_cstring@base@@YA_KPEBD_K@Z: 440 ??0StringHash@base@@QEAA@PEBDI@Z: 436 ??__K_shashstr@@YA?BUStringHash@base@@PEBD_K@Z: 435 ??0MemberInfo@@QEAA@W4typeID_t@0@UStringHash@base@@1I_NPEAU0@I@Z: 179 ??0IRuntimeLinkLibraryList@@QEAA@_K@Z: 154 ??0?$RuntimeLinkLibrary@$00@?A0x17235d25@@QEAA@_K@Z: 154 ??0?$RuntimeSourceDependency@$05@?A0x17235d25@@QEAA@_K@Z: 154 ??0?$RuntimeSourceDependency@$02@?A0x17235d25@@QEAA@_K@Z: 154 ??0?$RuntimeSourceDependency@$0A@@?A0x17235d25@@QEAA@_K@Z: 154 ??0IRuntimeSourceDependencyList@@QEAA@_K@Z: 154 Least Hits: ??0?$RuntimeLinkLibrary@$0KB@@?A0x17235d25@@QEAA@_K@Z: 0 ?GetIncludeDir@?$RuntimeIncludeFiles@$0IK@@?A0x17235d25@@UEBAPEBD_K@Z: 0 ?GetLinkLibrary@?$RuntimeLinkLibrary@$0CC@@?A0x17235d25@@UEBAPEBD_K@Z: 0 ??0?$RuntimeSourceDependency@$0KA@@?A0x17235d25@@QEAA@_K@Z: 0 ??0?$RuntimeIncludeFiles@$0JP@@?A0x17235d25@@QEAA@_K@Z: 0 ??0?$RuntimeSourceDependency@$0JP@@?A0x17235d25@@QEAA@_K@Z: 1 ?GetLinkLibrary@?$RuntimeLinkLibrary@$0CB@@?A0x17235d25@@UEBAPEBD_K@Z: 1 ?GetIncludeDir@?$RuntimeIncludeFiles@$0IJ@@?A0x17235d25@@UEBAPEBD_K@Z: 1 ??0?$RuntimeLinkLibrary@$0KA@@?A0x17235d25@@QEAA@_K@Z: 1 ??0?$RuntimeIncludeFiles@$0JO@@?A0x17235d25@@QEAA@_K@Z: 1 ?GetLinkLibrary@?$RuntimeLinkLibrary@$0CA@@?A0x17235d25@@UEBAPEBD_K@Z: 2 [RuntimeCompiler] Complete Compilation Succeeded Serializing out from 50 old constructors... Swapping in and creating objects for 1 new constructors... Serialising in... Auto Constructing Singletons... Initialising... Object swap completed Runtime Compile Module Swap time: 0.0584065s Compile Time: 12.05480
Elapsed Compile & Link Time: 12.0s
Switching to debug in RCC++ which you can do for a single file, leaving the rest of the code in release:
Code Generation Summary Total Function Count: 2388 Elapsed Time: 0.077 sec Total Compilation Time: 0.297 sec Efficiency: 386.1% Average time per function: 0.000 sec Anomalistic Compile Times: 8 ?Render@MenuEditor@@UEAA_NXZ: 0.014 sec, 0 instrs ??0?$RuntimeLinkLibrary@$0BAF@@?A0x17235d25@@QEAA@XZ: 0.006 sec, 0 instrs ??0?$RuntimeIncludeFiles@$0BAD@@?A0x17235d25@@QEAA@XZ: 0.004 sec, 0 instrs ??0?$RuntimeSourceDependency@$0BAE@@?A0x17235d25@@QEAA@XZ: 0.004 sec, 0 instrs ??0?$TObjectConstructorConcrete@V?$TActual@VMenuEditor@@@@@@QEAA@PEBDPEAUIRuntimeIncludeFileList@@PEAUIRuntimeSourceDependencyList@@PEAUIRuntimeLinkLibraryList@@_N4@Z: 0.002 sec, 0 instrs ??0?$MemberInfoArray@USFullscreen@Config@@@@QEAA@XZ: 0.001 sec, 0 instrs ?i_mulps@@YAXAEAT__m128@@T1@1@Z: 0.001 sec, 0 instrs ?i_addss@@YAXAEAT__m128@@T1@1@Z: 0.001 sec, 0 instrs RdrReadProc Caching Stats Functions Cached: 0 Retrieved Count: 0 Abandoned Retrieval Count: 0 Abandoned Caching Count: 0 Wasted Caching Attempts: 0 Functions Retrieved at Least Once: 0 Functions Cached and Never Retrieved: 0 Most Hits: Least Hits: Creating library C:\Users\Doug\AppData\Local\Temp\DA5C.lib and object C:\Users\Doug\AppData\Local\Temp\DA5C.exp [RuntimeCompiler] Complete The thread 0x1250 has exited with code 0 (0x0). 'Avoyd.exe' (Win32): Loaded 'C:\Users\Doug\AppData\Local\Temp\DA5C.tmp'. Symbols loaded. Compilation Succeeded Serializing out from 50 old constructors... Swapping in and creating objects for 1 new constructors... Serialising in... Auto Constructing Singletons... Initialising... Object swap completed Runtime Compile Module Swap time: 0.0414918s Compile Time: 3.919083
Elapsed Compile & Link Time: 3.9s
The debug compilation was a lot faster, and the functions taking time when optimization was on were not the ones I'd expect.
There were a lot of ?GetIncludeDir@?$RuntimeIncludeFiles
functions listed which added up to a large amount of time. This was a virtual function GetIncludeDir
in template RuntimeIncludeFiles
for tracking include files for RCC++, and it turned out the function wasn't even being called! Ripping this function out reduced compile times from 12s to 8s.
Fool me once, shame on you, fool me __COUNTER__
, shame on me.
So what's going on? To understand that we need to look at the source:
#pragma once
#include < stddef.h >
#ifndef RCCPPOFF
// NOTE: the file macro will only emit the full path if /FC option is used in Visual Studio or /ZI (Which forces /FC)
// Following creates a list of files which are runtime modifiable, to be used in headers requires use of __COUNTER__ predefined macro, which is in gcc 4.3+, clang/llvm and MSVC
struct IRuntimeIncludeFileList
{
IRuntimeIncludeFileList( size_t max ) : MaxNum( max )
{
}
// GetIncludeFile may return 0, so you should iterate through to GetMaxNum() ignoring 0 returns
virtual const char* GetIncludeFile( size_t Num_ ) const
{
return 0;
}
size_t MaxNum; // initialized in constructor below
};
namespace
{
template< size_t COUNT > struct RuntimeIncludeFiles : public RuntimeIncludeFiles<COUNT-1>
{
RuntimeIncludeFiles( size_t max ) : RuntimeIncludeFiles<COUNT-1>( max )
{
}
RuntimeIncludeFiles() : RuntimeIncludeFiles<COUNT-1>( COUNT )
{
}
virtual const char* GetIncludeDir( size_t Num_ ) const
{
if( Num_ < COUNT )
{
return this->RuntimeIncludeFiles< COUNT-1 >::GetIncludeDir( Num_ );
}
else return 0;
}
};
template<> struct RuntimeIncludeFiles<0> : public IRuntimeIncludeFileList
{
RuntimeIncludeFiles( size_t max ) : IRuntimeIncludeFileList( max )
{
}
RuntimeIncludeFiles() : IRuntimeIncludeFileList( 0 )
{
}
virtual const char* GetIncludeDir( size_t Num_ ) const
{
return 0;
}
};
#define RUNTIME_MODIFIABLE_INCLUDE_BASE( N ) \
template<> struct RuntimeIncludeFiles< N + 1 > : public RuntimeIncludeFiles< N >\
{ \
RuntimeIncludeFiles( size_t max ) : RuntimeIncludeFiles<N>( max ) {} \
RuntimeIncludeFiles< N + 1 >() : RuntimeIncludeFiles<N>( N + 1 ) {} \
virtual const char* GetIncludeFile( size_t Num_ ) const \
{ \
if( Num_ <= N ) \
{ \
if( Num_ == N ) \
{ \
return __FILE__; \
} \
else return this->RuntimeIncludeFiles< N >::GetIncludeFile( Num_ ); \
} \
else return 0; \
} \
}; \
#define RUNTIME_MODIFIABLE_INCLUDE namespace { RUNTIME_MODIFIABLE_INCLUDE_BASE( __COUNTER__ ) }
}
#else
#define RUNTIME_MODIFIABLE_INCLUDE
#endif //RCCPPOFF
This involves an evil set of macros and templates, and the predefined macro __COUNTER__
.
First I define an interface class and a recursive template based on an integer parameter, along with a terminator template for COUNT = 0
.
// simplified pseudo code
// interface
struct IRuntimeIncludeFileList {};
// recursive template
template< size_t COUNT >
struct RuntimeIncludeFiles : public RuntimeIncludeFiles<COUNT-1> {};
// terminator template specialization
template<>
struct RuntimeIncludeFiles<0> : public IRuntimeIncludeFileList
I define a set of macros which instantiate this:
// simplified pseudo code
// macro for template specialization which uses __FILE__ to locate header
#define RUNTIME_MODIFIABLE_INCLUDE_BASE( N ) \
template<> struct RuntimeIncludeFiles< N + 1 > : public RuntimeIncludeFiles< N >\
{ \
RuntimeIncludeFiles( size_t max ) : RuntimeIncludeFiles<N>( max ) {} \
RuntimeIncludeFiles< N + 1 >() : RuntimeIncludeFiles<N>( N + 1 ) {} \
virtual const char* GetIncludeFile( size_t Num_ ) const \
{ \
if( Num_ <= N ) \
{ \
if( Num_ == N ) \
{ \
return __FILE__; \
} \
else return this->RuntimeIncludeFiles< N >::GetIncludeFile( Num_ ); \
} \
else return 0; \
} \
}; \
// simple end user macro using __COUNTER__
#define RUNTIME_MODIFIABLE_INCLUDE namespace { RUNTIME_MODIFIABLE_INCLUDE_BASE( __COUNTER__ ) }
This is nasty stuff, but easy to use for people using RCC++ who simply have to add the following to a header file if they want a change in that file to trigger compilation of any dependent code:
#include "RuntimeInclude.h"
RUNTIME_MODIFIABLE_INCLUDE
The result is a handy approach which gives RCC++ header file tracking in a cross platform way without needing to understand project make files.
When RUNTIME_MODIFIABLE_INCLUDE
is encountered in a header file it declares the RuntimeIncludeFiles<N+1>
template specialization with N=__COUNTER__+1
. This is recursively derived from RuntimeIncludeFiles<N>
until it hits the specialization RuntimeIncludeFiles<0>
which derives from IRuntimeIncludeFileList
. A macro used in the source file then defines an instance of class RuntimeIncludeFiles<__COUNTER__>
(note the counter has now incremented by 1). The resulting struct hierarchy (remember a struct is a class with public default in C++) is something like:
// example hierarchy pseudo code
template<>
struct RuntimeIncludeFiles<10> :
RuntimeIncludeFiles<9> :
RuntimeIncludeFiles<8> :
RuntimeIncludeFiles<7> :
RuntimeIncludeFiles<6> :
RuntimeIncludeFiles<5> :
RuntimeIncludeFiles<4> :
RuntimeIncludeFiles<3> :
RuntimeIncludeFiles<2> :
RuntimeIncludeFiles<1> :
RuntimeIncludeFiles<0> :
IRuntimeIncludeFileList
{};
Only some of these base structs will be specialized by a macro in a header expanded to define a virtual function which returns the value of __FILE__
(the include file), the rest will return NULL
due to the non specialized template.
This hierarchy is used through the IRuntimeIncludeFileList
interface to get a list of includes:
//add include file mappings
for (size_t includeNum = 0;
includeNum <= constructors_[i]->GetMaxNumIncludeFiles();
++includeNum)
{
const char* pIncludeFile = constructors_[i]->GetIncludeFile(includeNum);
if( pIncludeFile )
{
// add file path code here.
}
}
I suspect that the MSVC optimizer is trying to optimize the chain of virtual functions to devirtualize them. With a large chain of templates this causes problems.
A quick fix is to use __pragma( optimize( "", off ) )
for Windows. This pragma can be included in a macro to set optimization off, followed by __pragma( optimize( "", on ) )
which resets the optimization. However I wasn't satisfied with this alone, because the investigations had turned up an issue with large projects using RCC++ - namely that expanding from __COUNTER__
to 0 was a bad idea when the counter could be used elsewhere. Additionally RCC++ uses this tracking approach in several macros for tracking include files, libraries to be linked to (a.cpp includes b.h which has definitions in b.lib), and to track dependent source files (a.cpp includes d.h which has definitions in d.cpp).
The final solution was to combine all the tracking types into one template, and to reduce the counter by the first value observed in addition to removing optimization for these templates.
Counter reduction used an offset declared as a constant in an anonymous namespace:
// anonymous namespace so the var is local to the compilation unit
namespace
{
// create a counter offset, usage: ( __COUNTER__ - COUNTER_OFFSET )
const size_t COUNTER_OFFSET = __COUNTER__;
}
This counter offset did not do much in my case, but for other projects which use a counter in a header before a header with an RCC++ tracking macro it might.
I combined templates by introducing the template struct RuntimeTracking to replace RuntimeIncludeFileList, RuntimeLinkLibraryList and RuntimeSourceDependencyList, using duck type RuntimeTrackignInfo, see this Pull Request for RCC++. By combining these the largest template hierarchy was approximately two times smaller.
With the counter reduction and template combination our compile now looks like:
Code Generation Summary Total Function Count: 1108 Elapsed Time: 3.101 sec Total Compilation Time: 4.029 sec Efficiency: 129.9% Average time per function: 0.004 sec Anomalistic Compile Times: 48 ?Render@MenuEditor@@UEAA_NXZ: 0.206 sec, 0 instrs ??0?$RuntimeTracking@$0BAB@@?A0x17235d25@@QEAA@_K@Z: 0.029 sec, 0 instrs ??0?$RuntimeTracking@$0BAC@@?A0x17235d25@@QEAA@_K@Z: 0.029 sec, 0 instrs ??0?$RuntimeTracking@$0BAD@@?A0x17235d25@@QEAA@_K@Z: 0.029 sec, 0 instrs ??0?$RuntimeTracking@$0PK@@?A0x17235d25@@QEAA@_K@Z: 0.029 sec, 0 instrs ??0?$RuntimeTracking@$0PO@@?A0x17235d25@@QEAA@_K@Z: 0.029 sec, 0 instrs ??0?$RuntimeTracking@$0PL@@?A0x17235d25@@QEAA@_K@Z: 0.028 sec, 0 instrs ??0?$RuntimeTracking@$0PP@@?A0x17235d25@@QEAA@_K@Z: 0.028 sec, 0 instrs ??0?$RuntimeTracking@$0BAA@@?A0x17235d25@@QEAA@_K@Z: 0.028 sec, 0 instrs ??0?$RuntimeTracking@$0PN@@?A0x17235d25@@QEAA@_K@Z: 0.028 sec, 0 instrs ??0?$RuntimeTracking@$0PM@@?A0x17235d25@@QEAA@_K@Z: 0.028 sec, 0 instrs ??0?$RuntimeTracking@$0PJ@@?A0x17235d25@@QEAA@_K@Z: 0.028 sec, 0 instrs ??0?$RuntimeTracking@$0BAE@@?A0x17235d25@@QEAA@XZ: 0.028 sec, 0 instrs ??0?$RuntimeTracking@$0PI@@?A0x17235d25@@QEAA@_K@Z: 0.027 sec, 0 instrs ??0?$RuntimeTracking@$0PD@@?A0x17235d25@@QEAA@_K@Z: 0.027 sec, 0 instrs ??0?$RuntimeTracking@$0PH@@?A0x17235d25@@QEAA@_K@Z: 0.027 sec, 0 instrs ??0?$RuntimeTracking@$0PE@@?A0x17235d25@@QEAA@_K@Z: 0.027 sec, 0 instrs ??0?$RuntimeTracking@$0PF@@?A0x17235d25@@QEAA@_K@Z: 0.027 sec, 0 instrs ??0?$RuntimeTracking@$0PG@@?A0x17235d25@@QEAA@_K@Z: 0.027 sec, 0 instrs ??0?$RuntimeTracking@$0PC@@?A0x17235d25@@QEAA@_K@Z: 0.027 sec, 0 instrs ??0?$RuntimeTracking@$0PB@@?A0x17235d25@@QEAA@_K@Z: 0.026 sec, 0 instrs ??0?$RuntimeTracking@$0OP@@?A0x17235d25@@QEAA@_K@Z: 0.026 sec, 0 instrs ??0?$RuntimeTracking@$0PA@@?A0x17235d25@@QEAA@_K@Z: 0.026 sec, 0 instrs ??0?$RuntimeTracking@$0OO@@?A0x17235d25@@QEAA@_K@Z: 0.025 sec, 0 instrs ??0?$RuntimeTracking@$0NM@@?A0x17235d25@@QEAA@_K@Z: 0.025 sec, 0 instrs ??0?$RuntimeTracking@$0ON@@?A0x17235d25@@QEAA@_K@Z: 0.025 sec, 0 instrs ??0?$RuntimeTracking@$0OM@@?A0x17235d25@@QEAA@_K@Z: 0.025 sec, 0 instrs ??0?$RuntimeTracking@$0OK@@?A0x17235d25@@QEAA@_K@Z: 0.025 sec, 0 instrs ??0?$RuntimeTracking@$0OL@@?A0x17235d25@@QEAA@_K@Z: 0.025 sec, 0 instrs ??0?$RuntimeTracking@$0OJ@@?A0x17235d25@@QEAA@_K@Z: 0.024 sec, 0 instrs ??0?$RuntimeTracking@$0OI@@?A0x17235d25@@QEAA@_K@Z: 0.024 sec, 0 instrs ??0?$RuntimeTracking@$0OD@@?A0x17235d25@@QEAA@_K@Z: 0.024 sec, 0 instrs ??0?$RuntimeTracking@$0OH@@?A0x17235d25@@QEAA@_K@Z: 0.023 sec, 0 instrs ??0?$RuntimeTracking@$0OG@@?A0x17235d25@@QEAA@_K@Z: 0.023 sec, 0 instrs ??0?$RuntimeTracking@$0NP@@?A0x17235d25@@QEAA@_K@Z: 0.023 sec, 0 instrs ??0?$RuntimeTracking@$0OF@@?A0x17235d25@@QEAA@_K@Z: 0.023 sec, 0 instrs ??0?$RuntimeTracking@$0OC@@?A0x17235d25@@QEAA@_K@Z: 0.023 sec, 0 instrs ??0?$RuntimeTracking@$0OE@@?A0x17235d25@@QEAA@_K@Z: 0.023 sec, 0 instrs ??0?$RuntimeTracking@$0NK@@?A0x17235d25@@QEAA@_K@Z: 0.023 sec, 0 instrs ??0?$RuntimeTracking@$0NN@@?A0x17235d25@@QEAA@_K@Z: 0.023 sec, 0 instrs ??0?$RuntimeTracking@$0NJ@@?A0x17235d25@@QEAA@_K@Z: 0.023 sec, 0 instrs ??0?$RuntimeTracking@$0NO@@?A0x17235d25@@QEAA@_K@Z: 0.023 sec, 0 instrs ??0?$RuntimeTracking@$0OB@@?A0x17235d25@@QEAA@_K@Z: 0.022 sec, 0 instrs ??0?$RuntimeTracking@$0OA@@?A0x17235d25@@QEAA@_K@Z: 0.022 sec, 0 instrs ??__Eg_runtimeTrackingList_MenuEditor@@YAXXZ: 0.022 sec, 0 instrs ??0?$RuntimeTracking@$0NL@@?A0x17235d25@@QEAA@_K@Z: 0.022 sec, 0 instrs ??0?$RuntimeTracking@$0NI@@?A0x17235d25@@QEAA@_K@Z: 0.022 sec, 0 instrs ??0?$RuntimeTracking@$0NH@@?A0x17235d25@@QEAA@_K@Z: 0.021 sec, 0 instrs Serialized Initializer Count: 20 Serialized Initializer Time: 0.049 sec RdrReadProc Caching Stats Functions Cached: 167 Retrieved Count: 14620 Abandoned Retrieval Count: 0 Abandoned Caching Count: 0 Wasted Caching Attempts: 0 Functions Retrieved at Least Once: 166 Functions Cached and Never Retrieved: 1 Most Hits: ?hash_cstring@base@@YA_KPEBD_K@Z: 440 ??0StringHash@base@@QEAA@PEBDI@Z: 436 ??__K_shashstr@@YA?BUStringHash@base@@PEBD_K@Z: 435 ??0MemberInfo@@QEAA@W4typeID_t@0@UStringHash@base@@1I_NPEAU0@I@Z: 179 ??0?$RuntimeTracking@$05@?A0x17235d25@@QEAA@_K@Z: 154 ??0?$RuntimeTracking@$02@?A0x17235d25@@QEAA@_K@Z: 154 ??0?$RuntimeTracking@$01@?A0x17235d25@@QEAA@_K@Z: 154 ??0?$RuntimeTracking@$03@?A0x17235d25@@QEAA@_K@Z: 154 ??0?$RuntimeTracking@$04@?A0x17235d25@@QEAA@_K@Z: 154 ??0?$RuntimeTracking@$00@?A0x17235d25@@QEAA@_K@Z: 154 ??0?$RuntimeTracking@$0A@@?A0x17235d25@@QEAA@_K@Z: 154 Least Hits: ??0?$RuntimeTracking@$0KA@@?A0x17235d25@@QEAA@_K@Z: 0 ??0?$RuntimeTracking@$0JP@@?A0x17235d25@@QEAA@_K@Z: 1 ??0?$RuntimeTracking@$0JO@@?A0x17235d25@@QEAA@_K@Z: 2 ??0?$RuntimeTracking@$0JN@@?A0x17235d25@@QEAA@_K@Z: 3 ??0?$RuntimeTracking@$0JM@@?A0x17235d25@@QEAA@_K@Z: 4 ??0?$RuntimeTracking@$0JL@@?A0x17235d25@@QEAA@_K@Z: 5 ??0?$RuntimeTracking@$0JK@@?A0x17235d25@@QEAA@_K@Z: 6 ??0?$RuntimeTracking@$0JJ@@?A0x17235d25@@QEAA@_K@Z: 7 ??0?$RuntimeTracking@$0JI@@?A0x17235d25@@QEAA@_K@Z: 8 ??0?$RuntimeTracking@$0JH@@?A0x17235d25@@QEAA@_K@Z: 9 ??0?$RuntimeTracking@$0JG@@?A0x17235d25@@QEAA@_K@Z: 10 Creating library C:\Users\Doug\AppData\Local\Temp\B6CE.lib and object C:\Users\Doug\AppData\Local\Temp\B6CE.exp [RuntimeCompiler] Complete The thread 0x34b0 has exited with code 0 (0x0). 'Avoyd.exe' (Win32): Loaded 'C:\Users\Doug\AppData\Local\Temp\B6CE.tmp'. Symbols loaded. Compilation Succeeded Serializing out from 50 old constructors... Swapping in and creating objects for 1 new constructors... Serialising in... Auto Constructing Singletons... Initialising... Object swap completed Runtime Compile Module Swap time: 0.0235966s Compile Time: 6.091957
Elapsed Compile & Link Time: 6.1s
Finally adding the optimization off macro we have:
Code Generation Summary Total Function Count: 1108 Elapsed Time: 1.404 sec Total Compilation Time: 2.219 sec Efficiency: 158.1% Average time per function: 0.002 sec Anomalistic Compile Times: 1 ?Render@MenuEditor@@UEAA_NXZ: 0.200 sec, 0 instrs Serialized Initializer Count: 20 Serialized Initializer Time: 0.037 sec RdrReadProc Caching Stats Functions Cached: 143 Retrieved Count: 10077 Abandoned Retrieval Count: 0 Abandoned Caching Count: 0 Wasted Caching Attempts: 0 Functions Retrieved at Least Once: 142 Functions Cached and Never Retrieved: 1 Most Hits: ?hash_cstring@base@@YA_KPEBD_K@Z: 440 ??0StringHash@base@@QEAA@PEBDI@Z: 435 ??__K_shashstr@@YA?BUStringHash@base@@PEBD_K@Z: 435 ??0MemberInfo@@QEAA@W4typeID_t@0@UStringHash@base@@1I_NPEAU0@I@Z: 179 ??0?$RuntimeTracking@$03@?A0x17235d25@@QEAA@_K@Z: 124 ??0?$RuntimeTracking@$02@?A0x17235d25@@QEAA@_K@Z: 123 ??0?$RuntimeTracking@$04@?A0x17235d25@@QEAA@_K@Z: 123 ??0?$RuntimeTracking@$01@?A0x17235d25@@QEAA@_K@Z: 122 ??0?$RuntimeTracking@$05@?A0x17235d25@@QEAA@_K@Z: 122 ??0?$RuntimeTracking@$06@?A0x17235d25@@QEAA@_K@Z: 121 ??0?$RuntimeTracking@$00@?A0x17235d25@@QEAA@_K@Z: 121 Least Hits: ??0?$RuntimeTracking@$0IJ@@?A0x17235d25@@QEAA@_K@Z: 0 ??0?$RuntimeTracking@$0II@@?A0x17235d25@@QEAA@_K@Z: 1 ??0?$RuntimeTracking@$0IH@@?A0x17235d25@@QEAA@_K@Z: 2 ??0?$RuntimeTracking@$0IG@@?A0x17235d25@@QEAA@_K@Z: 3 ??0?$RuntimeTracking@$0IF@@?A0x17235d25@@QEAA@_K@Z: 4 ??0?$RuntimeTracking@$0IE@@?A0x17235d25@@QEAA@_K@Z: 5 ??0?$RuntimeTracking@$0ID@@?A0x17235d25@@QEAA@_K@Z: 6 ??0?$RuntimeTracking@$0IC@@?A0x17235d25@@QEAA@_K@Z: 7 ??0?$RuntimeTracking@$0IB@@?A0x17235d25@@QEAA@_K@Z: 8 ??0?$RuntimeTracking@$0IA@@?A0x17235d25@@QEAA@_K@Z: 9 ??0?$RuntimeTracking@$0HP@@?A0x17235d25@@QEAA@_K@Z: 10 Creating library C:\Users\Doug\AppData\Local\Temp\A074.lib and object C:\Users\Doug\AppData\Local\Temp\A074.exp [RuntimeCompiler] Complete The thread 0x3b08 has exited with code 0 (0x0). 'Avoyd.exe' (Win32): Loaded 'C:\Users\Doug\AppData\Local\Temp\A074.tmp'. Symbols loaded. Compilation Succeeded Serializing out from 50 old constructors... Swapping in and creating objects for 1 new constructors... Serialising in... Auto Constructing Singletons... Initialising... Object swap completed Runtime Compile Module Swap time: 0.0259246s Compile Time: 4.335302
Elapsed Compile & Link Time: 4.3s
Going from 12s to 4s is no mean feat, and has a huge benefit in iteration time because we alter this file regularly. Whilst the optimization off macro would likely have done much of the work without changing the templates, this approach also helped reduce compile times for non optimized code from 3.9s to 2.9s, a 1s reduction and a 1.3x gain which isn't to be sniffed at. This should also help RCC++ compile times on other plaforms.
It is an even better improvement when you look at Total Compilation Time, a measurement of all the time taken by all processes for compilation on a multithreaded system, which went from 21.8s to 2.2s for a 10x gain.
Things to look out for in your own code:
__COUNTER__
might be used by other code, so add an offset.__pragma( optimize( "", off ) )
might be helpful for non performance sensitive code.I'll be investigating other options to further improve compile times. Private counter macros might be useful to reduce the template hierachy, though the only example I've seen, Boost Counter BOOST_PP_COUNTER
, looks like a large enough macro to have its own compilation issues.
Ideally C++ would have not only code structure introspection and modules but also standard methods to compile and link code, and standard approaches to inspect the include files a given source file depends on.
Finally, many thanks to Aras and the Microsoft C++ compiler folks who helped him with this flag, and for Andrew Pardoe at Visual Studio C++ for reaching out to me about compile time issues with RCC++.