Profiling in .NET Core
installation of prerequisites
This guide targets Pop!_OS 20.10 (based on Ubuntu 20.10 "groovy").
perfcollect
$ curl -OL https://aka.ms/perfcollect
$ chmod +x perfcollect
$ # sudo ./perfcollect install
Copy the perfcollect bash script to ~/bin.
Under Pop!_OS the script resolves the distribution to Debian (and not Ubuntu), because the following DISTRIB_ID is unknown to it:
$ cat /etc/lsb-release
DISTRIB_ID=Pop
DISTRIB_RELEASE=20.10
DISTRIB_CODENAME=groovy
DISTRIB_DESCRIPTION="Pop!_OS 20.10"
Therefore we do not use sudo ./perfcollect install, because of this and other issues. Instead we install the dependencies manually, as described below.
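To see what a distribution check has to work with, here is a minimal sketch that reads DISTRIB_ID from an lsb-release style file and tests it against a list of known IDs. The function names and the list of known IDs are hypothetical; this is not the code perfcollect uses.

```shell
#!/bin/sh
# Print the DISTRIB_ID value from an lsb-release style file.
# $1: path to the file (defaults to /etc/lsb-release).
distrib_id() {
    sed -n 's/^DISTRIB_ID=//p' "${1:-/etc/lsb-release}" | tr -d '"'
}

# Return 0 if the given ID is in a (hypothetical) list of known distributions.
is_known_distrib() {
    case "$1" in
        Ubuntu|Debian) return 0 ;;
        *) return 1 ;;
    esac
}
```

On Pop!_OS, distrib_id prints Pop, so is_known_distrib "$(distrib_id)" fails and a script built like this would have to fall back to some default.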
linux-headers
The headers must be installed for the running kernel version (uname -r) before installing perf.
$ sudo apt-get install linux-headers-$(uname -r)
perf
$ sudo apt-get install linux-tools-common
$ sudo apt-get install linux-tools-generic
$ sudo apt-get install linux-tools-$(uname -r)
$ sudo apt-get install linux-cloud-tools-generic
$ sudo apt-get install linux-cloud-tools-$(uname -r)
If your kernel version and the linux-tools-generic version are the same, you may be done, and perf --version should report a version (with no warning or error).
$ perf --version
perf version 5.11.7
You could also get a warning like this:
$ perf --version
WARNING: perf not found for kernel 5.8.0-7642
You may need to install the following packages for this specific kernel:
linux-tools-5.8.0-7642-generic
linux-cloud-tools-5.8.0-7642-generic
You may also want to install one of the following packages to keep up to date:
linux-tools-generic
linux-cloud-tools-generic
NOTES
- linux-tools|linux-tools-common|linux-tools-generic always points at the tools for the most up-to-date kernel version. When running an older kernel, if you want perf without rebooting into the newer kernel, you have to explicitly install the tools paired with that kernel (hence the uname -r shell expansion).
- To see the installed version of linux-tools (perf):
$ ls /usr/lib/linux-tools
5.11.0-7612-generic/
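The kernel-matching rule above can be sketched as two small helpers: one derives the kernel-specific package names, the other checks whether a tools directory already exists for a kernel release. The function names are hypothetical, and the directory layout is assumed to match the ls output shown above.

```shell
#!/bin/sh
# Print the kernel-specific package names for a given kernel release.
# $1: kernel release string (defaults to the running kernel, uname -r).
perf_packages_for_kernel() {
    release="${1:-$(uname -r)}"
    printf '%s\n' "linux-tools-$release" "linux-cloud-tools-$release"
}

# Return 0 if a linux-tools directory exists for the given kernel release.
# $2 overrides the base directory (default: /usr/lib/linux-tools).
has_perf_for_kernel() {
    release="${1:-$(uname -r)}"
    [ -d "${2:-/usr/lib/linux-tools}/$release" ]
}
```

A possible use is has_perf_for_kernel || sudo apt-get install $(perf_packages_for_kernel), which installs the matching packages only when they are missing.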
lttng
# groovy requires the 2.12.x release branch
$ sudo add-apt-repository ppa:lttng/ppa
$ sudo apt-get update
The PPA makes the 2.12.x stable branch of lttng-modules available:
$ apt policy lttng-modules-dkms
lttng-modules-dkms:
Installed: 2.12.x+stable+git1398+202103251918~ubuntu20.10.1
Candidate: 2.12.x+stable+git1398+202103251918~ubuntu20.10.1
Version table:
*** 2.12.x+stable+git1398+202103251918~ubuntu20.10.1 500
500 http://ppa.launchpad.net/lttng/ppa/ubuntu groovy/main amd64 Packages
500 http://ppa.launchpad.net/lttng/ppa/ubuntu groovy/main i386 Packages
100 /var/lib/dpkg/status
2.12.2-1ubuntu1 500
500 http://us.archive.ubuntu.com/ubuntu groovy/universe amd64 Packages
500 http://us.archive.ubuntu.com/ubuntu groovy/universe i386 Packages
$ sudo apt-get install lttng-tools
$ sudo apt-get install lttng-modules-dkms
$ sudo apt-get install liblttng-ust0
$ sudo apt-get install liblttng-ust-dev
NOTE: liblttng-ust0 provides the tracer shared library (the runtime dependency of an instrumented app/library), and liblttng-ust-dev provides what is needed to compile applications/libraries with lttng-ust tracepoints.
lttng should now report a version.
$ lttng --version
lttng (LTTng Trace Control) 2.12.3 - (Ta) Meilleure
perfcollect
perfcollect will use perf to sample CPU stacks at 1000 Hertz:
$ perf record -k 1 -g --pid=228368 -F 1000 -e cpu-clock
# or if no PID
$ perf record -k 1 -g -a -F 1000 -e cpu-clock
We have now installed the following prerequisites:
- perf: the Linux Performance Events subsystem and companion user-mode collection/viewer application. perf is part of the Linux kernel source, but is not usually installed by default.
- LTTng: used to capture event data emitted at runtime by CoreCLR. This data is then used to analyze the behavior of various runtime components such as the GC, JIT, and thread pool.
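A quick way to confirm that both prerequisites are on the PATH is a small check like the one below. This is only a sketch; the function name missing_tools is hypothetical.

```shell
#!/bin/sh
# Print which of the given commands are missing from PATH.
# Returns 0 when all are present, 1 otherwise.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            status=1
        fi
    done
    return "$status"
}
```

Calling missing_tools perf lttng prints nothing and returns 0 when both commands can be found.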
We need to set two environment variables before starting the application:
export COMPlus_PerfMapEnabled=1
export COMPlus_EnableEventLog=1
COMPlus_PerfMapEnabled=1 tells the runtime to emit information that enables perf_event to resolve JIT-compiled code symbols. COMPlus_EnableEventLog=1 tells the runtime to emit its events through LTTng.
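If you would rather not export the variables into your whole shell session, a small wrapper can set them for a single process only. This is a sketch; the wrapper name with_perf_env is hypothetical.

```shell
#!/bin/sh
# Run a command with the two CoreCLR tracing variables set for that
# process only, leaving the calling shell's environment untouched.
with_perf_env() {
    env COMPlus_PerfMapEnabled=1 COMPlus_EnableEventLog=1 "$@"
}
```

For example: with_perf_env dotnet bin/release/net5.0/TraceProfilingDemo.dll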
Sampling (aka profiling) with perf (perfcollect)
terminal 1
$ dotnet bin/release/net5.0/TraceProfilingDemo.dll
terminal 2
$ ps -aux | grep 'TraceProfilingDemo'
maxfire **186749** 0.3 0.0 3137176 31852 pts/2 Sl+ 15:37 0:00 **dotnet bin/release/net5.0/TraceProfilingDemo.dll**
$ sudo perf record -g --pid 186749
or if not attaching (and only using one terminal)
$ sudo perf record -g -- dotnet bin/release/net5.0/TraceProfilingDemo.dll
Here we ask perf to collect call stacks (-g).
Then view the report:
$ sudo perf report
NOTE: perf cannot resolve symbols for JIT-compiled code on its own. perf_events has JIT support to solve this, which requires the .NET Runtime to maintain a /tmp/perf-{PID}.map file for symbol translation. You enable or disable writing /tmp/perf-{PID}.map on Linux systems using COMPlus_PerfMapEnabled.
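Before recording, it can be worth checking that the runtime is actually writing the map file for your process. A minimal sketch, assuming the /tmp/perf-{PID}.map naming described above (the function names are hypothetical):

```shell
#!/bin/sh
# Print the path of the perf map file for a process ID.
perf_map_path() {
    echo "/tmp/perf-$1.map"
}

# Return 0 if the map file for the given PID exists and is non-empty.
has_perf_map() {
    [ -s "$(perf_map_path "$1")" ]
}
```

If has_perf_map 186749 fails for a running .NET process, the process was most likely started without COMPlus_PerfMapEnabled=1.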
NOTE: The perf record command saves in perf.data unique identifiers for all ELF images relevant to the measurement. In per-thread mode, this includes all the ELF images of the monitored processes. In cpu-wide mode, it includes all processes running on the system. Those unique identifiers are generated by the linker if the -Wl,--build-id option is used. Thus, they are called build-ids.
Build-ids are a helpful tool when correlating instruction addresses to ELF images.
To list all build-id entries used in a perf.data file, use perf buildid-list. To read the build-id of an individual ELF image, issue:
$ readelf -n /usr/lib/linux-tools-5.11.0-7612/perf | grep -e '[B|b]uild [I|i][d|D]:'
Build ID: fffd2b3dec83e5d91c81d73f38e0b127b0ef2a09
$ readelf -n /usr/share/dotnet/shared/Microsoft.NETCore.App/5.0.5/libclrjit.so | grep -e '[B|b]uild [I|i][d|D]:'
Build ID: d1a471bd128c619ccac914129ec3c6a19e977c04
An ELF build id is a unique identifier assigned by the linker to an executable.
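When comparing build-ids across several libraries, it helps to reduce the readelf output to the bare hex string. A small sketch (the function name extract_build_id is hypothetical):

```shell
#!/bin/sh
# Read readelf -n output on stdin and print only the build-id hex string.
extract_build_id() {
    sed -n 's/^[[:space:]]*Build ID:[[:space:]]*//p'
}
```

For example, readelf -n /usr/share/dotnet/shared/Microsoft.NETCore.App/5.0.5/libclrjit.so | extract_build_id prints just the identifier, which can then be compared with the entries recorded in perf.data.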
sysctl
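perf needs to read kernel addresses to resolve kernel symbols, and kernel.kptr_restrict controls whether they are exposed. Read the current value: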
$ sysctl kernel.kptr_restrict
kernel.kptr_restrict = 1
Modify the value (and restore it when you are done):
$ sudo sysctl -w kernel.kptr_restrict=0
$ sudo sysctl -w kernel.kptr_restrict=1
To make your modifications persist across reboots, edit /etc/sysctl.conf or create a file such as /etc/sysctl.d/50-mytest.conf (edit the file as root), containing the value you want to keep, for example:
kernel.kptr_restrict=1
Then execute this command to reload your configuration:
$ sudo sysctl -p /etc/sysctl.conf
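The drop-in file variant can be sketched as a tiny helper that writes the one-line configuration. The function name is hypothetical, and the file name 50-mytest.conf is taken from the example above.

```shell
#!/bin/sh
# Write a one-line sysctl drop-in file for kernel.kptr_restrict.
# $1: target directory (e.g. /etc/sysctl.d), $2: value to persist.
write_kptr_dropin() {
    printf 'kernel.kptr_restrict=%s\n' "$2" > "$1/50-mytest.conf"
}
```

Writing into /etc/sysctl.d requires root, so the helper would have to run in a root shell. Afterwards, sudo sysctl --system reloads all configuration files, including those under /etc/sysctl.d.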
Symbols: user code libraries
We can tell the .NET Runtime (JIT) to emit additional debugging info, which perf will then use to convert function addresses to their names. The only thing we need to do for this to happen is to set the COMPlus_PerfMapEnabled environment variable to 1.
terminal 1
# COMPlus_PerfMapEnabled tells the runtime to emit information
# that enables perf_event to resolve JIT-compiled code symbols.
$ COMPlus_PerfMapEnabled=1 dotnet bin/release/net5.0/TraceProfilingDemo.dll
Symbols: framework libraries
The Framework libraries (System.Private.CoreLib.dll etc) should also have debuginfo (symbols). The symbols are different than app-level symbols because the framework is pre-compiled while apps are just-in-time-compiled. For code like the framework that was precompiled to native code, you need a special tool called crossgen that knows how to generate the mapping from the native code to the name of the methods.
If you place the crossgen tool in the same directory as the .NET Runtime DLLs (e.g. libcoreclr.so), then perfcollect can find it and add the framework symbols to the trace file for you.
We can use a well-known trick: the crossgen tool is part of any self-contained published app, so we can copy it from any self-contained package.
$ dotnet publish --self-contained -r linux-x64 -c release TraceProfilingDemo.csproj
As a side effect of creating the self-contained application, the dotnet 'muxer' will download a NuGet package called microsoft.netcore.app.runtime.linux-x64 and place it in the directory ~/.nuget/packages/microsoft.netcore.app.runtime.linux-x64/VERSION, where VERSION is the version number of your .NET Core runtime (e.g. 5.0.5).
Under that folder is a tools directory, and inside it is the crossgen tool you need.
$ find ~/.nuget/packages -iname 'crossgen'
/home/maxfire/.nuget/packages/microsoft.netcore.app.runtime.linux-x64/3.1.1/tools/crossgen
/home/maxfire/.nuget/packages/microsoft.netcore.app.runtime.linux-x64/5.0.5/tools/crossgen
/home/maxfire/.nuget/packages/microsoft.netcore.app.runtime.linux-x64/5.0.3/tools/crossgen
/home/maxfire/.nuget/packages/microsoft.netcore.app.runtime.linux-x64/5.0.0/tools/crossgen
/home/maxfire/.nuget/packages/microsoft.netcore.app.runtime.linux-musl-x64/5.0.0/tools/crossgen
/home/maxfire/.nuget/packages/microsoft.netcore.app.runtime.osx-x64/5.0.0/tools/crossgen
# copy
$ sudo cp ~/.nuget/packages/microsoft.netcore.app.runtime.linux-x64/5.0.5/tools/crossgen \
/usr/share/dotnet/shared/Microsoft.NETCore.App/5.0.5
# check
$ find /usr/share/dotnet/ -iname 'crossgen' -type f
/usr/share/dotnet/shared/Microsoft.NETCore.App/5.0.5/crossgen
Once you have done this, perfcollect will use crossgen to include framework symbols. The warning that perfcollect used to issue should go away. This only has to be done once per machine (until you update your runtime).
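Because the copy has to be repeated after each runtime update, it can help to derive both paths from the version number. A minimal sketch, assuming the NuGet cache layout and the /usr/share/dotnet install root shown above (the function names are hypothetical):

```shell
#!/bin/sh
# Print where the crossgen tool is expected inside the NuGet package cache.
# $1: runtime version (e.g. 5.0.5), $2: runtime identifier (default linux-x64).
crossgen_source() {
    echo "$HOME/.nuget/packages/microsoft.netcore.app.runtime.${2:-linux-x64}/$1/tools/crossgen"
}

# Print the shared runtime directory that crossgen should be copied into.
# $1: runtime version, $2: dotnet root (default /usr/share/dotnet).
runtime_dir() {
    echo "${2:-/usr/share/dotnet}/shared/Microsoft.NETCore.App/$1"
}
```

The copy step then becomes sudo cp "$(crossgen_source 5.0.5)" "$(runtime_dir 5.0.5)".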
NOTE: Only perf record and lttng create session as run via perfcollect collect will use crossgen. If you are using perf record directly, use COMPlus_ZapDisable=1 instead.
Alternative: Turn off use of precompiled code
If you don't have the ability to update the .NET Runtime (to add crossgen), there is another approach to getting framework symbols. You can tell the runtime to simply not use the precompiled framework code. The code will then be just-in-time compiled, and the special crossgen tool is not needed.
This works, but will increase startup time for your code by something like a second or two. If you can tolerate that (you probably can), then this is an alternative.
$ COMPlus_PerfMapEnabled=1 COMPlus_ZapDisable=1 dotnet bin/release/net5.0/TraceProfilingDemo.dll
NOTE: This can also be used when debugging (sourcelinked) framework code in IDEs to force local variables in stack frames to have symbols.
Symbols: native runtime code
We use the dotnet-symbol CLI tool. First download all the symbol files for the shared runtime to a temporary folder
$ mkdir /tmp/symbols
$ dotnet symbol --symbols --output /tmp/symbols /usr/share/dotnet/shared/Microsoft.NETCore.App/5.0.5/lib*.so
Downloading from http://msdl.microsoft.com/download/symbols/
Writing files to /tmp/symbols
Writing: /tmp/symbols/libSystem.IO.Compression.Native.so.dbg
Writing: /tmp/symbols/libSystem.Native.so.dbg
Writing: /tmp/symbols/libSystem.Net.Security.Native.so.dbg
Writing: /tmp/symbols/libSystem.Security.Cryptography.Native.OpenSsl.so.dbg
Writing: /tmp/symbols/libclrjit.so.dbg
Writing: /tmp/symbols/libcoreclr.so.dbg
Writing: /tmp/symbols/libcoreclrtraceptprovider.so.dbg
Writing: /tmp/symbols/libdbgshim.so.dbg
Writing: /tmp/symbols/libhostpolicy.so.dbg
Writing: /tmp/symbols/libmscordaccore.so.dbg
Writing: /tmp/symbols/libmscordbi.so.dbg
Then copy the symbols into the shared framework (aka runtime) so that native debuggers like lldb or native tracers like perf_event and lttng can find them:
$ find /usr/share/dotnet/ -iname 'libcoreclr.so' -type f
/usr/share/dotnet/shared/Microsoft.NETCore.App/3.1.13/libcoreclr.so
/usr/share/dotnet/shared/Microsoft.NETCore.App/5.0.5/libcoreclr.so
$ sudo cp /tmp/symbols/* /usr/share/dotnet/shared/Microsoft.NETCore.App/5.0.5
# Checking copy
$ find /usr/share/dotnet/ -iname 'libcoreclr.*' -type f
/usr/share/dotnet/shared/Microsoft.NETCore.App/3.1.13/libcoreclr.so
/usr/share/dotnet/shared/Microsoft.NETCore.App/5.0.5/libcoreclr.so.dbg
/usr/share/dotnet/shared/Microsoft.NETCore.App/5.0.5/libcoreclr.so
TODO: Write a script to automate this for every new runtime version, and put it in ~/bin/dotnet-dbg-symbols. It should probably also download and copy crossgen.
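A possible starting point for such a script is a dry-run helper that only prints the commands for one runtime version, so they can be reviewed before anything is copied into the runtime directory. This is a sketch: the function name and the per-version /tmp/symbols-VERSION folder are assumptions, and the crossgen step is left out.

```shell
#!/bin/sh
# Print (not run) the commands that fetch native symbols for one runtime
# version and copy them next to the runtime libraries.
# $1: runtime version (e.g. 5.0.5), $2: dotnet root (default /usr/share/dotnet).
plan_symbol_install() {
    dir="${2:-/usr/share/dotnet}/shared/Microsoft.NETCore.App/$1"
    out="/tmp/symbols-$1"
    echo "mkdir -p $out"
    echo "dotnet symbol --symbols --output $out $dir/lib*.so"
    echo "sudo cp $out/* $dir"
}
```

Once the printed plan looks right, it can be executed with plan_symbol_install 5.0.5 | sh.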
Sampling and Tracing with perfcollect
perfcollect uses perf_event and lttng simultaneously.
terminal 1
# COMPlus_PerfMapEnabled tells the runtime to emit information
# that enables perf_event to resolve JIT-compiled code symbols.
$ COMPlus_PerfMapEnabled=1 dotnet bin/release/net5.0/TraceProfilingDemo.dll
terminal 2
$ sudo ~/bin/perfcollect collect mysession -nolttng -pid 207180
To view it
$ perfcollect view mysession.trace.zip
# or
$ perfcollect view mysession.trace.zip -viewer lttng
Check the log
$ unzip mysession.trace.zip
$ bat mysession.trace/perfcollect.log
Also check that perf.data.txt is non-empty. This is the file that PerfView will open and parse after you transfer the *.trace.zip file to Windows.
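The non-empty check can be scripted so that it runs right after unzipping. A minimal sketch (the function name trace_has_samples is hypothetical):

```shell
#!/bin/sh
# Return 0 if the unzipped trace directory contains a non-empty perf.data.txt.
# $1: path to the unzipped *.trace directory.
trace_has_samples() {
    [ -s "$1/perf.data.txt" ]
}
```

For example: trace_has_samples mysession.trace || echo "perf.data.txt is missing or empty"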
This is the content of the unzipped *.trace.zip file created by perfcollect:
$ ll
Permissions Size User Date Modified Name
drwxr-xr-x - maxfire 10 Apr 01:05 debuginfo
.rw------- 1.2M maxfire 10 Apr 01:05 jit-259356.dump
.rw-r--r-- 1.9k maxfire 10 Apr 01:05 Microsoft.Win32.Primitives.ni.{a8cc8fd0-e844-4e1b-9a7b-1ad5ecab8ba2}.map
.rw-r--r-- 447k maxfire 10 Apr 01:05 perf-259356.map
.rw------- 4.2M maxfire 10 Apr 01:05 perf-jit.data
.rw------- 3.3M maxfire 10 Apr 01:05 perf.data
.rw-r--r-- 45M maxfire 10 Apr 01:05 perf.data.txt
.rw-r--r-- 15k maxfire 10 Apr 01:05 perfcollect.log
.rw-r--r-- 1.4k maxfire 10 Apr 01:05 perfinfo-259356.map
.rw-r--r-- 127k maxfire 10 Apr 01:05 System.Collections.ni.{598cee39-c64b-4542-b9ed-359eca21048c}.map
.rw-r--r-- 41k maxfire 10 Apr 01:05 System.Console.ni.{81a94ac1-409c-47a6-bcbc-765f68c9ec2b}.map
.rw-r--r-- 4.4M maxfire 10 Apr 01:05 System.Private.CoreLib.ni.{510c7690-83fe-4de5-8a2e-fdf79f2ec929}.map
.rw-r--r-- 13k maxfire 10 Apr 01:05 System.Runtime.InteropServices.ni.{caea2a6a-873b-493f-85ff-ad31bb0ee993}.map
.rw-r--r-- 0 maxfire 10 Apr 01:05 System.Runtime.ni.{1b676a8e-57d8-40e6-bce4-80d063a96828}.map
.rw-r--r-- 0 maxfire 10 Apr 01:05 System.Text.Encoding.Extensions.ni.{07860c2a-14e0-43bd-90a1-8fefaa54f9c2}.map
.rw-r--r-- 16k maxfire 10 Apr 01:05 System.Threading.ni.{a5d41e01-ef15-40da-8106-60ee513d891a}.map
Tracing with lttng
COMPlus_EnableEventLog