Quantcast
Channel: Geeks with Blogs
Viewing all articles
Browse latest Browse all 682

Windows 10 SDK With New Windows Performance Toolkit Available!

$
0
0

Originally posted on: http://geekswithblogs.net/akraus1/archive/2015/07/31/166002.aspx

It has been a long wait to finally get the new version of the new Windows Performance Toolkit. I had not much time to test the betas since until now managed call stack resolution was not working.

WPT for Windows 10 makes some nice progress which you can download here. Click on the Download Standalone SDK and run it. Then uncheck everything except Windows Performance Toolkit and

it will install on your machine within minutes. If you download the complete Windows 10 SDK you are not downloading 170MB but several GB.

 

Management Summary

  • Graphs are more readable.
  • Graphs with short spikes are much better rendered without the need to zoom into to see them.
  • Colors for graphs are changeable (finally).
  • Filtering can be undone now in the context menu.
  • Quick Search in all tables.
  • Symbol load dialog was improved.
  • You can configure your own stack tag files in Trace Properties.
  • No new Graph Types (at least with the default providers I enable).
  • 5 Added columns to CPU Usage Sampled
    • Two could be extremely useful: Source File Name and Source Line Number
  • 12 Added columns to CPU Usage Precise.
    • Two are to user mode developers useful: ProcessOutOfMemory and CPU Usage (in view).
  • Occasional crashes are also featured now.

CPU Usage Sampled

image

The biggest improvement is certainly the possibility to see source file and line numbers. You can find out this way where most CPU is spent on which line if the CPU consumption happens in your code.

A quick recap for the newbies: The CPU Usage Sampled graph is generated by taking the stack trace of all running threads 1000 times per second (this is the default). These call stacks are then added together.

A count in your Main method of 1000 means that the method has used one second of CPU time.

If the call stack ends not in your e.g. Main method when the stack trace is taken then your source file will not be displayed there. As usual you have to take a sharp look at the numbers. Your method

may be causing a high CPU consumption but it might never show up with a source and line number because the stack always ends in an external library call (printf in my case) for which I did not have the pdb loaded.

It would be nice to have file and line numbers for managed code as well but this feature is of limited use as it is now.

Ideally I want to see this stuff while I am drilling into a call stack all my way down and not only if the call stack ends in a method which I did compile.

 

For Reference here is the complete list of columns you can choose from. Green are unchanged columns, Bold are new ones. I never have found out how to give a thread a name in WPT.

If anyone knows more about the mystical thread names and how they can be set I would be very interested in it.

WPT 8.1WPT 10
% Weight% Weight
AddressAddress
All CountAll Count
AnnotationAnnotation
Compiler OptimizationCompiler Optimization
CountCount
CPUCPU
Display NameDisplay Name
DPC/ISRDPC/ISR
FunctionFunction
 Image RVA
Inlined FunctionsInlined Functions
Is PGO'edIs PGO'ed
ModuleModule
PGO CountsPGO Counts
PGO Dynamic Instruction CountPGO Dynamic Instruction Count
PriorityPriority
ProcessProcess
Process NameProcess Name
RankRank
 Section Name
 Section RVA
 Source File Name
 Source Line Number
StackStack
TableTable
Thread Activity TagThread Activity Tag
Thread IDThread ID
Thread NameThread Name
Thread Start FunctionThread Start Function
Thread Start ModuleThread Start Module
TimeStampTimeStamp
Trace #Trace #
WeightWeight
Weight (in view)Weight (in view)

 

CPU Usage Precise

This is by far the most complex table. With WPT 10 we have 66 columns to choose from to find out how our threads interact with each other.

I am not sure how WPA detects that a process was out of memory but I think the kernel knows quite well if something did go wrong. The ProcessOutOfMemory column could be handy

to check stress tests when the machine was under heavy load which could explain subsequent application failures.

CPU Usage (in view) seems to be an attempt to display a more true thread running time. I am not sure how this value is calculated but it seems that if all cores are in use and you

get significant Ready times then it differs quite a lot. Otherwise the differences are marginal.

 

There have been quite some renaming of columns which might render your custom filter and enablement conditions useless and WPA will show you an error.

image

If your old preset did enable by default all processes except the Idle process which clutters up the view it was set to:

([Series Name]:="NewProcess" AND NOT ([NewProcess]:="Idle (0)"))

With WPT 10 you need to change it to:

([Series Name]:="New Process" AND NOT ([New Process]:="Idle (0)"))

This happens to all users of WPT 8.1 which have saved a custom default profile which contains this enablement rule.

With an empty enablement rule your view will look like

image

and with the rule

image

you will see no difference when you apply it. Why? Because this rule only enables matching stuff but it does not disable the non matching

processes! I am with you that this is very counter intuitive but it is the way it is. If you save this as new default profile and then open a new file

you will no longer get a view where the Idle process is enabled by default.

For reference I list here all available columns with the old and new names so you have an easier time to update your custom filters, enablement

and expansion rules.

Green lines are unchanged, yellow are renamed columns and bold are new columns in WPT 10.

WPT 8.1WPT 10
% CPU Usage% CPU Usage
 AdjustIncrement
 AdjustReason
AnnotationAnnotation
CountCount
Count:WaitsCount:Waits
CpuCpu
CPU Usage (ms)CPU Usage
 CPU Usage (in view)
 DirectSwitchAttempt
IdealCpuIdeal Cpu
 KernelStackNotResident
LastSwitchOutTime (s)Last Switch-Out Time
NewInPriNew In Pri
NewOutPriNew Out Pri
NewPrevOutPriNew Prev Out Pri
NewPrevStateNew Prev State
NewPrevWaitModeNew Prev Wait Mode
NewPrevWaitReasonNew Prev Wait Reason
NewPriDecrNew Pri Decr
NewProcessNew Process
NewProcess NameNew Process Name
NewQntNew Qnt
NewStateNew State
NewInSwitchTime (us)New Switch-In Time
NewThreadIdNew Thread Id
NewThreadStackNew Thread Stack
NewThreadStartFunctionNew Thread Start Function
NewThreadStartModuleNew Thread Start Module
NewWaitModeNew Wait Mode
NewWaitReasonNew Wait Reason
 NewThreadPrevRemainingQuantum
 NewThreadRank
 NewThreadRemainingQuantum
NextSwitchOutTime (s)Next Switch-Out Time
OldOutPriOld Out Pri
OldProcessOld Process
OldProcess NameOld Process Name
OldQntOld Qnt
OldStateOld State
OldInSwitchTime (us)Old Switch-In Time
OldThreadIdOld Thread Id
OldThreadStartFunctionOld Thread Start Function
OldThreadStartModuleOld Thread Start Module
OldWaitModeOld Wait Mode
OldWaitReasonOld Wait Reason
 OldThreadRank
 OldThreadRemainingQuantum
PrevCStatePrev CState
 ProcessOutOfMemory
Ready (us)Ready
ReadyThreadStackReady Thread Stack
ReadyTime (s)Ready Time
ReadyingProcessReadying Process
ReadyingProcess NameReadying Process Name
ReadyingThreadIdReadying Thread Id
ReadyingThreadStartFunctionReadying Thread Start Function
ReadyingThreadStartModuleReadying Thread Start Module
 ReadyThreadInDPC
SwitchInTime (s)Switch-In Time
TableTable
Thread Activity TagThread Activity Tag
Thread NameThread Name
TimeSinceLast (us)Time Since Last
Trace #Trace #
Waits (us)Waits

File IO View

The File IO view has now an additional column named File Path which acts like in Path Tree in Disc IO which is extremely useful if you do differential IO analysis. It seems that my

complaints about the not complete feature parity between Disc and File view have helped a bit.

image

This view has still a bug. If I zoom into it all graphs are disabled and I have to reenable them after every zoom operation. If I zoom out all is ok.

Generic Events

Besides some minor changes the biggest change is that finally it has got a DateTime (local) column. It shows the clock time and not the time since trace start.

This can be very useful for advanced scenarios when someone did report an issue and he did care to write down the time when it did happen.

image

Process View

I did not know it until it was removed. Process Tree is no longer with us. It was not really useful anyway.

Process Tree was added to WPT. That was the reason I never saw it before. After using it for some time I find it is easier to navigate in the tree if

you have a parent process which creates many childs.

 

Disc IO View

The Disc IO view has got only one additional column with the name Boosted which is for my disc IO always false. Not sure if this is a Windows 10 only feature. If someone knows more about it please comment.

 

Windows Performance Recorder

The recorder has not changed much. It has got some additional profiles like ones for .NET, Edge Browser, … The buffer size for .NET Activity is hard coded

to a 160 MB ring buffer which might not be enough for machines were many .NET apps are running. I still do not buy into this easy to use UI since the xml configs

need hard coding the values into it. It can be a hard value or a percentage of the system memory. Both choices are bad if I want to automate performance regression testing with

very specific workloads and tracing requirements. This GUI and the command line tool (wpr.exe) are just too inflexible.

Yes I know the complete xml has inheritance built into it so I could create derived profiles which override only the specific settings I care about. But this is still not dynamic enough

if I want to create a configuration for each problem and set them depending on the use case at hand.

image

  • xperf is still a good tool to stop a profiling session because it does a full CLR rundown.
  • But PerfView is much faster at creating the managed pdbs by not trying to recreate them over and over again with NGen.
  • Tracelog from the Windows SDK is still the only tool to start a trace session where you can configure the Win 8.1 enhancement to the ETW infrastructure.
    • E.g. filter events from a specific process, or disable events or stackwalking for user mode providers.

Summary

The new WPA tool has got some nice new features but they still need some polishing to become truly awesome. In my daily work I miss the automatic addition

of the etl.NGenPdb folder when present for the generated pdbs of an etl file. This is a useful convention and the old WPA was so nice to automatically include it to my symbol path.

WPA is very sophisticated. To use it right is an art which needs nearly daily practice. That could explain why this tool although it has a great UI has still got not so

much traction. It is like Windbg in disguise which will quickly cause information overload to the untrained brain.

But with some basic tricks which I will show in the near future you can crack otherwise unsolvable performance nuts with a few mouse clicks.


Viewing all articles
Browse latest Browse all 682

Trending Articles