Originally posted on: http://geekswithblogs.net/akraus1/archive/2013/12/14/154884.aspx
When you try to make sense of Windows Event Tracing files (.etl) it is often helpful to add some special events into the event stream so you can closer examine near the “Here it was slow” event which helps a lot to look at the right spots. If you are lucky you can do this by routing your tracing data to ETW or you need to set some marker events by hand. The traditional was of doing this was to call xperf –m “Here it was slow” during your repro session. I have observed several times since then that this command did no longer work on some machines.
C:\Program Files (x86)\Windows Kits\8.1\bin\x64> xperf -m "hi"
xperf: error: Set Mark failed
These things did never happen on my Windows 7 machine. I thought it was some oversight in Windows 8. Interestingly if I started the tracing session with xperf all worked well again. But when I did start a tracing session with the new WPR/UI tool from the Windows 8.1 SDK this error was consistently there. After digging a little deeper with xperf –Loggers I found that WPR/UI do not create the usual "NT Kernel Logger" session but instead a WPR_initiated_WprApp_WPR System Collector session for which I cannot see the trace flags anymore.
C:\Windows\system32>xperf -Loggers
Logger Name : WPR_initiated_WprApp_WPR System Collector
Logger Id : 11
Logger Thread Id : 0000000000000000
Buffer Size : 1024
Maximum Buffers : 1632
Minimum Buffers : 1632
Number of Buffers : 1632
Free Buffers : 1624
Buffers Written : 0
Events Lost : 0
Log Buffers Lost : 0
Real Time Buffers Lost: 0
Flush Timer : 0
Age Limit : 0
Log File Mode : Secure Buffered PersistOnHybridShutdown SystemLogger
Maximum File Size : 0
Log Filename :
Trace Flags :
PoolTagFilter : *
That does explain why I cannot add mark events anymore on Windows 8 machines. There is no more "NT Kernel Logger" session to set the mark event.This is bad news for xperf –m but good news for all people who like to keep Process Explorer and other tools open while they are taking kernel trace events. It seems that xperf was not updated to display the tracing flags for the new type of kernel tracing sessions because another tool from the Windows 8.1 SDK can display them:tracelog
C:\Program Files (x86)\Windows Kits\8.1\bin\x64>tracelog.exe -l
Logger Name: WPR_initiated_WprApp_WPR System Collector
Logger Id: 0x11
Logger Thread Id: 0000000000000000
Guid: c7f66f95-6045-11e3-8265-d43d7ef06eb2
Session Security: D:(A;;0x800;;;WD)(A;;0x120fff;;;SY)(A;;0x120fff;;;LS)(A;;0x120fff;;;NS)(A;;0x120fff;;;BA)(A;;0xee5;;;LU)(A;;LC;;;MU)(A;;
Buffer Size: 1024 Kb
Maximum Buffers: 1632
Minimum Buffers: 1632
Number of Buffers: 1632
Free Buffers: 1624
Buffers Written: 0
Events Lost: 0
Log Buffers Lost: 0
Real Time Buffers Lost: 0
AgeLimit: 0
Real Time Consumers: 0
ClockType: PerfCounter
Log Mode: Secure Buffering-only
Hybrid Shutdown: Persist
Maximum File Size: not set
Buffer Flush Timer: not set
Enabled tracing: Process Thread ProcCounters ImageLoad DiskIo HardFaults CxtSwap Dpc Isr Profile Power MemInfo MemInfoWs Priority Dispatcher…
Stack Tracing: DiskRead DiskWrite DiskFlush ThreadDCEnd CSwitch ReadyThread KernelQueueEnqueue KernelQueueDequeue Profile
Log Filename:
When you look at the documentation of tracelog you will find many Windows 8 and Windows 8.1 additions which make it a cool tool to record trace sessions. Although I must admit that the sheer number of tools to start traces is getting out of hand. We have
ETW Control Tool | Part Of |
WPR/UI | Windows Performance Toolkit |
xperf | Windows Performance Toolkit |
logman | Windows |
tracelog | Since Windows 8.1 SDK formerly only part of WDK. |
PerfView | Extra Download from MS |
Each tool has another syntax and philosophy behind which makes it a bit cumbersome to use the right tool. But as far as I can tell you should check out tracelog first since it is the only tool which is able to make use of the new cool Windows 8/8.1 enhancements. Below is a list of the new ETW capabilities of Windows 8/8.1
Since | Tracelog Feature | Description |
8.1 | -btn | Specifies the number (n) of buffers to fill before starting to flush them. |
8.1 | -EventIdFilter {-in|-out}nid1 id2 ... | Specifies an event id filter with n event ids (maximum 64 event ids allowed). |
8.1 | -ExeFilter Executable_file [;Executable_file ...] | Specifies the names of executable files to filter. You can specify a list of files. Separate the names of the files using semi-colons. Files not listed are excluded. |
8 | -hybridshutdown {stop|persist} | Controls hybrid shutdown logger behavior. |
8.1 | -independent | Enables independent mode on the trace session. |
8.1 | -PidFilter npid1 pid2 ... | Specifies a Pid filter with n Pids (maximum 8 allowed). |
8.1 | -PkgIdFilter Package Full Name [ ;Package Full Name...] | Specifies a package ID filter. You can specify a list of package files. Separate the names of the files using semi-colons. |
8.1 | -PkgAppIdFilter PRAID [ ;PRAID...] | Specifies a package-relative app identifier (PRAID) filter. The PRAID is the unique identifier of the application within the package. You can specify more than one PRAID. Separate the ids using semi-colons. |
8 | -Pmc Ctrs:Events | Configures the performance monitor counter (PMC) sampling on events. |
8 | -ProfileSource src | Configure profiling source to use. For list of sources, use the command tracelog -ProfileSource Help. |
8 | -SetProfInt nsrc | Configure the profiling interval (n) for specified source. Where n represents units of 100ns. The default is 10000 (which is equivalent to 1ms. |
8.1 | -StackWalkFilter {-in|-out}nid1 id2 ... | Specifies an event id filter with n event ids (maximum 64 event ids allowed). |
8 | -systemlogger | Logger can receive SystemTraceProvider events. See Configuring and Starting a SystemTraceProvider Session. |
By default tracelog will start loggers with kernel mode providers with the systemlogger option so you can start up to 8 additional Kernel Loggers which should be enough for everyone. To log e.g. all Context Switch events with call stacks you can use
tracelog -start MySession -f c:\Kernel1.etl -img -eflag PROC_THREAD+LOADER+CSWITCH+DISPATCHER -stackwalk CSWITCH+ReadyThread
This will create a new kernel trace session (Windows 8 and later only!) without blocking the default Kernel Logger Session which is the most useful thing of the new capabilities of the tracelogger tool. When I stop it with
tracelog -stop mySession
then no image identification rundown is performed which means that I have call stacks but WPR cannot identify in which images the stack addresses do reside. It seems also not be able to merge etl files like xperf can do. Not to mention that it does not create the manged pdbs when stopping a trace session like wpr and xperf are able to. Still tracelog is the only tool which exposes the full power of ETW as a command line tool. It has the most interesting options to start new trace sessions but it seems to (correct me I really would like to know) be the wrong tool to stop a trace session by initiating all the necessary image identification and managed pdb creation. I really would like to see additional parameters in xperf to create the managed pdbs later when I have gathered and etl file.
To limit the potentially huge output of ETW can now be filtered at least at process level for user mode providers (-PidFilter or -ExeFilter) and to enable/disable stack walking for specific events and not the complete provider (-StackWalkFilter …) which is a welcome addition for Windows 8.1 users of ETW. Based on my limited experience with the new tracelog tool I would use it to start trace sessions for sophisticated analysis tasks on Windows 8 machines but stop the trace session with xperf to get the usual image identification and ngen pdb creation support.
The command
xperf -stop MySession -d c:\temp\xperfStop.etl
did the job to stop the system logger and I got readable call stacks with a full .NET support. If you do not want to use a myriad of tools you can also use PerfView which Vance Morrison uses a lot to troubleshoot VS issues. The command line syntax is vastly different than the previous tools but it can be quite handy if you only need to deploy a 10MB exe which can collect and view the ETW data.
perfview start c:\temp\perfview.etl -nogui
will start profiling with a sample interval of 1ms and also capture all disc related stuff which is a good default to start. If you need to do some wait chain analysis you can use
perfview start c:\temp\perfview.etl -nogui -ThreadTime
which will enable an additional bunch of additional providers:
Enabled tracing: Process Thread ProcCounters ImageLoad DiskIo DiskIoInit HardFaults CxtSwap TcpIp Profile Dispatcher
Stack Tracing: DiskRead DiskWrite DiskFlush 0x0220 ProcessCreate ImageLoad ThreadCreate CSwitch ReadyThread Profile
To stop tracing you need to do
perfview -zip:false stop
You can view the generated file with
perfview c:\temp\perfview.etl
or with WPA or your favorite viewing tool. Perfview is for some managed analysis very handy. With handy I mean that it is the only tool to analyze managed processes for things like
- GC wait times
- JIT times
- Managed Allocation Tracing (type and stack)
- thrown exceptions without using a debugger, profiler or your own tracing
I did refuse to try perfview for a long time because it complained always that nearly all my stacks were broken. The issues is that it handles loading pdbs for call stack resolution and filtering quite different.
![image image]()
The first hurdle is to notice that there are always stack filters enabled (See GroupPats) which will fold your stack to nearly useless many times. After deleting the GroupPats text box I get real stacks but no symbols:
![image image]()
Now I have the stacks but still no symols. To fix that I must use the context menu to load for this call stack the corresponding symbols. The MS symbol servers are configured by default. If you have your own symbol server you need to reconfigure it in the File - Set Symbol Path menu first.
![image image]()
After that I get call stacks with method names as usual:
![image image]()
Once I did realize that I found PerfView quite useful. It is not only for performance work interesting but also as a system wide debugger which captures all thrown managed exceptions in all processes at once. The tool itself has many more options which you can explore with
Perfview /?
It supports some really advanced scenarios like collecting data in a ring buffer until a specific performance counter reaches a threshold then capture x seconds further and then stop. The main drawback of ETW with full profiling and context switch capturing is that the amount of data on a busy system exceeds 1 GB of data within one minute easily. If you do want to capture data for a really long time and stop when the interesting condition has happened you need more sophisticated stop triggers. PerfView has captured this operational know how in its command line parameters.
It can stop on
- Performance Counter Thresholds
- Application Event Log Messages
- ETW Events
- Gen 2 Collections
- Long Gen 2 Collections
- Managed exceptions of a specific type
- AppFabric request time (Azure)
This makes it a cool tool to diagnose sporadic errors but you cannot afford to let the application to run under a debugger. The biggest pain point of PerfView is that it is great to collect data but it cannot live with the generated etl file and needs to convert it to its own format (etlx) for further processing. This can take some additional minutes to load bigger etl files but I hope it will become faster in future versions.
Below is the full command line of PerfView which has gotten not enough attention (just like tracelog) so far. Tracelog can be excused by living its hidden live in the Windows Driver Development Kit until it was moved to the Windows SDK with 8.1. It seems that device driver developers do not share their tricks online like "normal" .NET developers do.
The PerfView application has a number of commands associated with it, each with its own set of parameters and
qualifiers. They are listed below. Options that are common to all commands are listed at the end.
-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView [DataFile]
View profile data.
Parameters:
[DataFile] ETL or ETLX file containing profile data.
-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView run CommandAndArgs ...
Starts data collection, runs a command and stops.
Parameters:
CommandAndArgs ... Command to run and arguments.
-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView collect [DataFile]
Starts data collection, wait for user input, then stops.
Parameters:
[DataFile] ETL file containing profile data.
-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView start [DataFile]
Starts machine wide profile data collection.
Parameters:
[DataFile] ETL file containing profile data.
-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView stop
Stop collecting profile data (machine wide).
-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView mark [Message]
Add a PerfView 'Mark' event to the event stream with a optional string message
Parameters:
[Message] The string message to attach to the PerfView Mark event.
-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView abort
Insures that any active PerfView sessions are stopped.
-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView merge [DataFile]
Combine separate ETL files into a single ETL file (that can be decoded on another machine).
Parameters:
[DataFile] ETL file containing profile data.
-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView listSessions
Lists active ETW sessions.
-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView ListCpuCounters
Lists the ListCpuCounters CPU counters available on the system (win8+ only).
-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView EnableKernelStacks
On X64 machines if you have problems with broken stacks when the code is executing in the kernel, setting this option
and rebooting may improve things
-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView DisableKernelStacks
Resets the registry keys set by EnableKernelStack.
-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView HeapSnapshot Process [DataFile]
Take a snapshot of the CLR GC heap of a process.
Parameters:
Process The process ID or Process Name (Exe without extension) of the process take a
heap snapshot.
[DataFile] The name of the file to place the heap snapshot.
Qualifiers:
[-SaveETL] Save an ETL file along with the GCDump file when dumping the JS Heap.
[-MaxDumpCountK:250] The maximum number of objects (in K or thousands) to place int the .gcDump
file. Sample sufficiently to hit this metric.
[-Freeze] Freeze the dump while data is taken.
-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView ForceGC Process
Forces a GC on the specified process
Parameters:
Process The process ID or Process Name (Exe without extension) of the process to force
a GC.
-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView HeapSnapshotFromProcessDump ProcessDumpFile [DataFile]
Extract the CLR GC heap from a process dump file specified.
Parameters:
ProcessDumpFile The name of the input process dump file.
[DataFile] The name of the file to place the heap snapshot.
-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView GuiRun
Opens the 'Run' dialog box.
-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView GuiCollect
Opens the 'Collect' dialog box.
-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView GuiHeapSnapshot
Opens the 'TakeHeapSnapshot' dialog box.
-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView UserCommand CommandAndArgs ...
Runs a user defined command. Type 'PerfView UserCommandHelp' to see the help for all the user commands. See PerfView
Extensions in the users guide for more on creating user commands.
Parameters:
CommandAndArgs ... User command to run and any arguments.
-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView UserCommandHelp
Displays help for user commands. Also see Help->User Command Help in the GUI.
-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView CreateExtensionProject [ExtensionName]
Creates a VS project for creates a perfView extension.
Parameters:
[ExtensionName.Global] The name of the extension (no .DLL)
-----------------------------------------------------------------------------------------------------------------------
Usage: PerfView FetchSymbolsForProcess [DataFile]
Fetch all the PDBs files needed for viewing locally.
Parameters:
[DataFile] ETL file containing profile data.
-----------------------------------------------------------------------------------------------------------------------
Qualifiers global to all commands:
[-LogFile:STRING] Send messages to this file instead launching the GUI. Intended for batch
scripts and other automation.
[-BufferSize:64] The size the buffers (in MB) the OS should use to store events waiting to be
written to disk.
[-Circular:0] Do Circular logging with a file size in MB. Zero means non-circular.
[-BufferSizeMB:64] The size the buffers (in MB) the OS should use to store events waiting to be
written to disk.
[-CircularMB:0] Do Circular logging with a file size in MB. Zero means non-circular.
[-MaxCollectSec:0] Turn off collection (and kill the program if perfView started it) after this
many seconds. Zero means no timeout.
[-StopOnPerfCounter:STRING,...] This is of the form CATEGORY:COUNTERNAME:INSTANCE OP NUM where
CATEGORY:COUNTERNAME:INSTANCE, identify a performance counter (same as
PerfMon), OP is either < or >, and NUM is a number. When that condition is
true then collection will stop. You can specify this qualifier more than once.
See 'Stop Trigger' in the users guide for more.
[-StopOnGen2GC:STRING] This will stop on any Gen2 GC from the given process (can be a process ID or a
process Name (exe file name without path or extension) or * (any process)
[-StopOnEtwEvent:STRING,...] This is of the form PROVIDER:TASK:OPCODE here only PROVIDER and TASK are
required. (Opcode=Info, Keywords=ALL, Level=INFORMATIONAL are the defaults.
Provider can be a Guid or Name (for EventSources you can use *Name to represent
the Guid. Task is the task Name or 'Task(NNN)' where NNN is the task number in
decimal. You can specify this argument more than once
[-StopOnException:STRING] Where the text is a regular expression that will be used to match the full name
of the .NET Exception thrown.The empty string represents any exception.
[-StopOnEventLogMessage:STRING] Stop when an event log message that matches the given (ignore case) regular
expression is written to the Windows 'Application' event log. You can specify
a particular event log with the syntax eventLogName@RegExp.
[-StopOnRequestOverMsec:0] Trigger a stop of a collect command if there is any IIS request that is longer
than the given number of MSec.
[-StopOnGCOverMsec:0] Trigger a stop of a collect command if there is a .NET Garbage Collection (GC)
is longer than the given number of MSec.
[-StopOnAppFabricOverMsec:0] Trigger a stop of a collect command if there is a AppFabric request is longer
than the given number of MSec.
[-DecayToZeroHours:0] The trigger value used in StopOnPerfCounter or StopOn*OverMSec will decay to
zero in this interval of time.
[-DelayAfterTriggerSec:10] Wait this number of seconds after a trigger before actually stopping the trace.
[-CollectMultiple:0] Collect Multiple instance (used in conjunction with StopTrigger).
[-MonitorPerfCounter:STRING,...] This is of the form CATEGORY:COUNTERNAME:INSTANCE@NUM where
CATEGORY:COUNTERNAME:INSTANCE, identify a performance counter (same as
PerfMon), and NUM is a number representing seconds. The @NUM part is optional
and defaults to 2. The value of the performance counter is logged to the ETL
file as an event ever NUM seconds
[-CpuSampleMSec:1] The interval (MSec) between CPU samples (.125Msec min).
[-Merge:BOOLEAN] Do a merge after stopping collection.
[-Zip:BOOLEAN] Zip the ETL file (implies /Merge).
[-NoRundown] Don't request CLR Rundown events.
[-NoNGenRundown] Don't do rundown of symbolic information in NGEN images (only needed pre V4.5).
[-NoClrRundown] Don't do rundown .NET (CLR) rundown information )(for symbolic name lookup).
[-RundownTimeout:120] Maximum number of seconds to wait for CLR rundown to complete.
[-MinRundownTime:0] Minimum number of seconds to wait for CLR rundown to complete.
[-KeepAllEvents] A debug option to keep all events, even symbolic rundown events.
[-MaxEventCount:0] Limits the total number of events. Useful for trimming large ETL files. 1M
typically yields 300-400Meg. of data considered.
[-SkipMSec:0] Skips the first N MSec of the trace. Useful for trimming large ETL files in
conjunction with the /MaxEventCount qualifier.
[-CpuCoutners:STRING,...] A comma separated list of hardware CPU counters specifications NAME:COUNT to
turn on. See Users guide for details. See ListCpuCounters for available
sources (Win8 only)
[-Providers:STRING,...] Additional providers. This is comma separated list of
ProviderGuid:Keywords:Level:Stack specs. This qualifier has the same syntax as
the Additional Providers TextBox in the collection window. See help on that
for more.
[-OnlyProviders:STRING,...] Like the Providers qualifier, but also turns off the default Kernel and CLR
providers.
[-ThreadTime] Shortcut for turning on context switch and readyThread events
[-GCOnly] Turns on JUST GC collections an allocation sampling.
[-GCCollectOnly] Turns on GC collections (no allocation sampling).
[-DumpHeap] Capture a heap snapshot on profile stop
[-ClrEventLevel:Verbose] The verbosity for CLR events Legal values: Always, Critical, Error, Warning,
Informational, Verbose.
[-ClrEvents:Default] A comma separated list of .NET CLR events to turn on. See Users guide for
details. Legal values: None, GC, GCHandle, Binder, Loader, Jit, NGen,
StartEnumeration, StopEnumeration, Security, AppDomainResourceManagement,
JitTracing, Interop, Contention, Exception, Threading,
JittedMethodILToNativeMap, OverrideAndSuppressNGenEvents, SupressNGen, Type,
GCHeapDump, GCSampledObjectAllocationHigh, GCHeapSurvivalAndMovement,
GCHeapCollect, GCHeapAndTypeNames, GCHeapSnapshot,
GCSampledObjectAllocationLow, PerfTrack, Stack, Default, All.
[-KernelEvents:Default] A comma separated list of windows OS kernel events to turn on. See Users guide
for details. Legal values: None, Process, Thread, ImageLoad, ProcessCounters,
ContextSwitch, DeferedProcedureCalls, Interrupt, SystemCall, DiskIO,
DiskFileIO, DiskIOInit, Dispatcher, MemoryPageFaults, MemoryHardFaults,
VirtualAlloc, VAMap, NetworkTCPIP, Registry, AdvancedLocalProcedureCalls,
SplitIO, Driver, OS, Profile, Default, TheadTime, FileIO, FileIOInit, Verbose,
All, ReferenceSet, PMCProfile.
[-DotNetAlloc] Turns on per-allocation .NET profiling.
[-DotNetAllocSampled] Turns on per-allocation .NET profiling, sampling types in a smart way to keep
overhead low.
[-OSHeapProcess:0] Turn on per-allocation profiling of allocation from the OS heap for the process
with the given process ID.
[-OSHeapExe:STRING] Turn on per-allocation profiling of allocation from the OS heap for the process
with the given EXE (only filename WITH extension).
[-ForceNgenRundown] By default on a V4.0 runtime NGEN rundown is suppressed, because NGEN PDB are a
less expensive way of getting symbolic information for NGEN images. This
option forces NGEN rundown, so NGEN PDBs are not needed. This can be useful in
some scenarios where NGEN PDB are not working properly.
[-AcceptEULA] Accepts the EULA associated with PerfView.
[-DataFile:STRING] FileName of the profile data to generate.
[-NoView] Normally after collecting data the data is viewed. This suppresses that.
[-UnsafePDBMatch] Allow the use of PDBs even when the trace does not contain PDB signatures.
[-ShowUnknownAddresses] Displays the hexadecimal address rather than ? when the address is unknown.
[-VS:BOOLEAN] Turn on Visual Studio Providers. It also insures that theses providers are
registered and can be decoded properly.
[-NoGui] Use the Command line version of the command (like on ARM). Brings up a console
window. For batch scripts use /LogFile instead.
[-Process:STRING] A process name (exe file name without directory or extension).
[-SafeMode] Turn off parallelism and other risky features.
[-RestartingToElevelate:STRING] Internal: indicates that perfView is restarting to get Admin privileges.
[-SessionName:PerfViewSession] Define the name for the user mode session, if kernel events are off.
[-MaxNodeCountK:0] The maximum number of objects (in K or thousands) that will even be examined
when dumping the heap. Avoids memory use at collection time. This is useful
if heap dumping causes out of memory exceptions.