Originally posted on: http://geekswithblogs.net/akraus1/archive/2015/07/31/166002.aspx

It has been a long wait for the new version of the Windows Performance Toolkit. I did not spend much time testing the betas because managed call stack resolution was not working until now.

WPT for Windows 10 makes some nice progress. You can download it here. Click on Download Standalone SDK and run it. Then uncheck everything except Windows Performance Toolkit and it will install on your machine within minutes. If you download the complete Windows 10 SDK instead, you are downloading not 170 MB but several GB.

Management Summary

Graphs are more readable.
Graphs with short spikes are rendered much better, without the need to zoom in to see them.
Colors for graphs are changeable (finally).
Filtering can be undone now in the context menu.
Quick Search in all tables.
Symbol load dialog was improved.
You can configure your own stack tag files in Trace Properties.
No new Graph Types (at least with the default providers I enable).
5 Added columns to CPU Usage Sampled
- Two could be extremely useful: Source File Name and Source Line Number
12 Added columns to CPU Usage Precise.
- Two are useful to user mode developers: ProcessOutOfMemory and CPU Usage (in view).
Occasional crashes are also featured now.

CPU Usage Sampled

The biggest improvement is certainly the possibility to see source file and line numbers. This way you can find out on which line most CPU is spent, provided the CPU consumption happens in your code.

A quick recap for the newbies: The CPU Usage Sampled graph is generated by taking the stack trace of all running threads 1000 times per second (this is the default). These call stacks are then added together.

A count in your Main method of 1000 means that the method has used one second of CPU time.
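
As a small sketch of that arithmetic (the function name is made up for illustration, and the 1000 samples per second is the default rate mentioned above), a sample count can be converted into estimated CPU time like this:

```python
def samples_to_cpu_ms(sample_count: int, samples_per_second: int = 1000) -> float:
    """Estimate CPU time in milliseconds from a number of stack samples.

    Each sample stands for 1/samples_per_second seconds of CPU time.
    """
    if samples_per_second <= 0:
        raise ValueError("samples_per_second must be positive")
    return sample_count * 1000.0 / samples_per_second


# 1000 samples at the default rate correspond to one second of CPU time.
main_ms = samples_to_cpu_ms(1000)
print(f"Main: {main_ms:.0f} ms")
```

If you record with a different sampling rate, pass it as the second argument. The result is a statistical estimate, not an exact measurement.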

If the call stack does not end in your method (e.g. Main) when the stack trace is taken, then your source file will not be displayed there. As usual you have to take a sharp look at the numbers. Your method may be causing high CPU consumption, but it might never show up with a source file and line number because the stack always ends in an external library call (printf in my case) for which I did not have the pdb loaded.

It would be nice to have file and line numbers for managed code as well but this feature is of limited use as it is now.

Ideally I want to see this information while I am drilling all the way down into a call stack, and not only if the call stack ends in a method which I compiled myself.
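
To make the difference concrete, here is a toy model in Python. The sampled stacks are invented data and this is not how WPA stores them internally. It only illustrates why a method can appear in every sample and still almost never be the frame at which the stack ends:

```python
from collections import Counter

# Hypothetical sampled stacks, outermost frame first, leaf frame last.
samples = [
    ["main", "printf"],
    ["main", "printf"],
    ["main", "compute"],
    ["main"],
]


def inclusive_counts(stacks):
    """Count the samples in which a function appears anywhere on the stack."""
    counts = Counter()
    for stack in stacks:
        for frame in set(stack):  # count each function once per sample
            counts[frame] += 1
    return counts


def leaf_counts(stacks):
    """Count the samples in which a function is the last frame of the stack."""
    return Counter(stack[-1] for stack in stacks if stack)


inclusive = inclusive_counts(samples)
leaf = leaf_counts(samples)
print("inclusive:", dict(inclusive))
print("leaf:     ", dict(leaf))
```

In this toy data main is on the stack in all four samples but is the leaf frame only once, so only that one sample would carry a source line for main.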

For reference here is the complete list of columns you can choose from. Columns listed in both versions are unchanged, columns with an empty WPT 8.1 cell are new. I have never found out how to give a thread a name in WPT. If anyone knows more about the mystical thread names and how they can be set, I would be very interested in it.

WPT 8.1	WPT 10
% Weight	% Weight
Address	Address
All Count	All Count
Annotation	Annotation
Compiler Optimization	Compiler Optimization
Count	Count
CPU	CPU
Display Name	Display Name
DPC/ISR	DPC/ISR
Function	Function
	Image RVA
Inlined Functions	Inlined Functions
Is PGO'ed	Is PGO'ed
Module	Module
PGO Counts	PGO Counts
PGO Dynamic Instruction Count	PGO Dynamic Instruction Count
Priority	Priority
Process	Process
Process Name	Process Name
Rank	Rank
	Section Name
	Section RVA
	Source File Name
	Source Line Number
Stack	Stack
Table	Table
Thread Activity Tag	Thread Activity Tag
Thread ID	Thread ID
Thread Name	Thread Name
Thread Start Function	Thread Start Function
Thread Start Module	Thread Start Module
TimeStamp	TimeStamp
Trace #	Trace #
Weight	Weight
Weight (in view)	Weight (in view)

CPU Usage Precise

This is by far the most complex table. With WPT 10 we have 66 columns to choose from to find out how our threads interact with each other.

I am not sure how WPA detects that a process was out of memory, but I think the kernel knows quite well when something went wrong. The ProcessOutOfMemory column could be handy for checking stress tests where the machine was under heavy load, which could explain subsequent application failures.

CPU Usage (in view) seems to be an attempt to display a truer thread running time. I am not sure how this value is calculated, but it seems to differ quite a lot if all cores are in use and you get significant Ready times. Otherwise the differences are marginal.

Quite a few columns have been renamed, which might render your custom filter and enablement conditions useless; WPA will then show you an error.

If your old preset enabled all processes by default except the Idle process, which clutters up the view, it was set to:

([Series Name]:="NewProcess" AND NOT ([NewProcess]:="Idle (0)"))

With WPT 10 you need to change it to:

([Series Name]:="New Process" AND NOT ([New Process]:="Idle (0)"))

This affects all users of WPT 8.1 who have saved a custom default profile containing this enablement rule.
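
If you have many saved rules, a small script can do the renaming for you. The following is only a sketch: migrate_rule is a made-up helper, the rename map contains just a few of the renamed columns of this view, and it assumes that rules reference columns as [Column] and name a series as [Series Name]:="Column", as in the two rules above.

```python
import re

# A few of the renamed CPU Usage Precise columns (WPT 8.1 -> WPT 10).
# Extend this map with the columns you actually use.
RENAMES = {
    "NewProcess": "New Process",
    "NewProcess Name": "New Process Name",
    "OldProcess": "Old Process",
    "NewThreadId": "New Thread Id",
    "ReadyingProcess": "Readying Process",
    "CPU Usage (ms)": "CPU Usage",
    "Ready (us)": "Ready",
    "Waits (us)": "Waits",
}


def migrate_rule(rule: str) -> str:
    """Rewrite WPT 8.1 column names in a rule string to their WPT 10 names."""

    # Rename column references written as [Column].
    def fix_column(match):
        name = match.group(1)
        return "[" + RENAMES.get(name, name) + "]"

    rule = re.sub(r"\[([^\]]+)\]", fix_column, rule)

    # Rename the column given as the quoted value of [Series Name]:="...".
    def fix_series(match):
        name = match.group(2)
        return match.group(1) + '"' + RENAMES.get(name, name) + '"'

    return re.sub(r'(\[Series Name\]:=)"([^"]*)"', fix_series, rule)


old_rule = '([Series Name]:="NewProcess" AND NOT ([NewProcess]:="Idle (0)"))'
print(migrate_rule(old_rule))
```

Other quoted values such as "Idle (0)" are left alone, so running the function on an already migrated rule changes nothing.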

Whether you start with an empty enablement rule or apply the rule above to an open view, the view looks the same: you will see no difference when you apply it. Why? Because this rule only enables matching items, it does not disable the non-matching processes! I agree that this is very counterintuitive, but it is the way it is. If you save this as the new default profile and then open a new file, you will no longer get a view where the Idle process is enabled by default.

For reference I list here all available columns with their old and new names so you have an easier time updating your custom filters, enablement and expansion rules.

Rows with identical names in both columns are unchanged, rows with differing names are renamed columns, and rows with an empty WPT 8.1 cell are new columns in WPT 10.

WPT 8.1	WPT 10
% CPU Usage	% CPU Usage
	AdjustIncrement
	AdjustReason
Annotation	Annotation
Count	Count
Count:Waits	Count:Waits
Cpu	Cpu
CPU Usage (ms)	CPU Usage
	CPU Usage (in view)
	DirectSwitchAttempt
IdealCpu	Ideal Cpu
	KernelStackNotResident
LastSwitchOutTime (s)	Last Switch-Out Time
NewInPri	New In Pri
NewOutPri	New Out Pri
NewPrevOutPri	New Prev Out Pri
NewPrevState	New Prev State
NewPrevWaitMode	New Prev Wait Mode
NewPrevWaitReason	New Prev Wait Reason
NewPriDecr	New Pri Decr
NewProcess	New Process
NewProcess Name	New Process Name
NewQnt	New Qnt
NewState	New State
NewInSwitchTime (us)	New Switch-In Time
NewThreadId	New Thread Id
NewThreadStack	New Thread Stack
NewThreadStartFunction	New Thread Start Function
NewThreadStartModule	New Thread Start Module
NewWaitMode	New Wait Mode
NewWaitReason	New Wait Reason
	NewThreadPrevRemainingQuantum
	NewThreadRank
	NewThreadRemainingQuantum
NextSwitchOutTime (s)	Next Switch-Out Time
OldOutPri	Old Out Pri
OldProcess	Old Process
OldProcess Name	Old Process Name
OldQnt	Old Qnt
OldState	Old State
OldInSwitchTime (us)	Old Switch-In Time
OldThreadId	Old Thread Id
OldThreadStartFunction	Old Thread Start Function
OldThreadStartModule	Old Thread Start Module
OldWaitMode	Old Wait Mode
OldWaitReason	Old Wait Reason
	OldThreadRank
	OldThreadRemainingQuantum
PrevCState	Prev CState
	ProcessOutOfMemory
Ready (us)	Ready
ReadyThreadStack	Ready Thread Stack
ReadyTime (s)	Ready Time
ReadyingProcess	Readying Process
ReadyingProcess Name	Readying Process Name
ReadyingThreadId	Readying Thread Id
ReadyingThreadStartFunction	Readying Thread Start Function
ReadyingThreadStartModule	Readying Thread Start Module
	ReadyThreadInDPC
SwitchInTime (s)	Switch-In Time
Table	Table
Thread Activity Tag	Thread Activity Tag
Thread Name	Thread Name
TimeSinceLast (us)	Time Since Last
Trace #	Trace #
Waits (us)	Waits

File IO View

The File IO view now has an additional column named File Path, which acts like Path Tree in Disc IO and is extremely useful if you do differential IO analysis. It seems that my complaints about the incomplete feature parity between the Disc and File views have helped a bit.

This view still has a bug. When I zoom in, all graphs are disabled and I have to re-enable them after every zoom operation. When I zoom out, all is ok.

Generic Events

Besides some minor changes, the biggest change is that it has finally got a DateTime (local) column. It shows the clock time and not the time since trace start.

This can be very useful for advanced scenarios where someone reported an issue and took care to write down the time when it happened.
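
The relation between the two time scales is simple enough to sketch in a few lines of Python. The helper names and the trace start time are made up for illustration, and the trace start is assumed to be known as a local clock time:

```python
from datetime import datetime, timedelta


def to_clock_time(trace_start: datetime, seconds_since_start: float) -> datetime:
    """Convert a trace-relative timestamp into a wall clock time."""
    return trace_start + timedelta(seconds=seconds_since_start)


def to_trace_offset(trace_start: datetime, reported_at: datetime) -> float:
    """Convert a reported wall clock time into seconds since trace start."""
    return (reported_at - trace_start).total_seconds()


# Hypothetical trace that started at 14:00:00 local time.
start = datetime(2015, 7, 31, 14, 0, 0)
print(to_clock_time(start, 90.5))
print(to_trace_offset(start, datetime(2015, 7, 31, 14, 5, 0)))
```

With the new column you no longer have to do this conversion by hand when a user tells you that something went wrong at five past two.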

Process View

~~I did not know it until it was removed. Process Tree is no longer with us. It was not really useful anyway.~~

Process Tree was added to WPT. That was the reason I never saw it before. After using it for some time I find it is easier to navigate in the tree if you have a parent process which creates many children.

Disc IO View

The Disc IO view has got only one additional column, named Boosted, which is always false for my disc IO. I am not sure if this is a Windows 10 only feature. If someone knows more about it, please comment.

Windows Performance Recorder

The recorder has not changed much. It has got some additional profiles, such as ones for .NET, Edge Browser, … The buffer size for .NET Activity is hard coded to a 160 MB ring buffer, which might not be enough for machines where many .NET apps are running. I still do not buy into this easy to use UI since the xml configs need the values hard coded into them. A value can be a fixed size or a percentage of the system memory. Both choices are bad if I want to automate performance regression testing with very specific workloads and tracing requirements. This GUI and the command line tool (wpr.exe) are just too inflexible.

Yes, I know the complete xml has inheritance built into it, so I could create derived profiles which override only the specific settings I care about. But this is still not dynamic enough if I want to create a configuration for each problem and set them depending on the use case at hand.
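
To get a feeling for why a fixed 160 MB ring buffer can be too small, here is a back-of-the-envelope sketch. The event rates are invented numbers, not measurements, and the function name is made up. A ring buffer keeps only the most recent data, so the history it can hold shrinks as the event rate grows:

```python
def retained_seconds(buffer_mb: float, event_rate_mb_per_s: float) -> float:
    """Estimate how many seconds of events fit into a ring buffer."""
    if event_rate_mb_per_s <= 0:
        raise ValueError("event_rate_mb_per_s must be positive")
    return buffer_mb / event_rate_mb_per_s


# Hypothetical event rates for a quiet and a busy machine.
for rate in (2.0, 16.0):
    seconds = retained_seconds(160, rate)
    print(f"{rate:>5.1f} MB/s -> {seconds:.0f} s of history")
```

With these assumed rates the same buffer holds 80 seconds on the quiet machine but only 10 seconds on the busy one, which is why I would like to set the size per scenario.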

xperf is still a good tool to stop a profiling session because it does a full CLR rundown.
But PerfView is much faster at creating the managed pdbs by not trying to recreate them over and over again with NGen.
Tracelog from the Windows SDK is still the only tool to start a trace session where you can configure the Win 8.1 enhancements to the ETW infrastructure.
- E.g. filter events from a specific process, or disable events or stackwalking for user mode providers.

Summary

The new WPA tool has got some nice new features, but they still need some polishing to become truly awesome. In my daily work I miss the automatic addition of the etl.NGenPdb folder, when present, for the generated pdbs of an etl file. This is a useful convention, and the old WPA was so nice to include it automatically in my symbol path.
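
Until that comes back, the convention is easy to reproduce in a script. This is a sketch under assumptions: the helper name is made up, the folder is assumed to sit next to the trace with the suffix .NGenPdb appended to the etl file name, and the result is meant for the _NT_SYMBOL_PATH environment variable or for pasting into the symbol path dialog.

```python
import os

# Assumed naming convention: "<trace>.etl" -> "<trace>.etl.NGenPdb".
# Windows file names are case-insensitive, so the exact casing does not matter there.
NGEN_SUFFIX = ".NGenPdb"


def symbol_path_with_ngen_pdbs(etl_file: str, existing: str = "") -> str:
    """Return a symbol path with the NGen pdb folder of the trace in front.

    The folder is only added if it exists and is not already part of the path.
    """
    ngen_dir = etl_file + NGEN_SUFFIX
    parts = [p for p in existing.split(";") if p]
    if os.path.isdir(ngen_dir) and ngen_dir not in parts:
        parts.insert(0, ngen_dir)
    return ";".join(parts)


if __name__ == "__main__":
    current = os.environ.get("_NT_SYMBOL_PATH", "")
    print(symbol_path_with_ngen_pdbs(r"C:\traces\test.etl", current))
```

The trace path in the usage part is only an example. If the folder does not exist, the existing symbol path is returned unchanged.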

WPA is very sophisticated. Using it right is an art which needs nearly daily practice. That could explain why this tool, although it has a great UI, still has not gained much traction. It is like Windbg in disguise, which will quickly cause information overload in the untrained brain.

But with some basic tricks, which I will show in the near future, you can crack otherwise unsolvable performance nuts with a few mouse clicks.

