Originally posted on: http://geekswithblogs.net/akraus1/archive/2013/11/14/154624.aspx
I constantly try out new and old tools to check whether they can be useful in my daily work. Did you ever wonder where Windows puts the contents of the 200 MB file you just read, and why reading the same file again is almost always faster? The answer to both questions is the file system cache. When you look at the performance counters you will find quite a few cache-related counters, but their values are typically only a few hundred MB, while Task Manager shows very different numbers.
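You can watch the cache at work by timing two consecutive reads of the same large file. This is only a small sketch; the path is a placeholder and the numbers depend on your disk and on how much of the file was already cached:

```python
import time

# Minimal sketch: time two consecutive reads of the same file.
# The path is just a placeholder - use any large file that was not read recently.
PATH = r"C:\temp\bigfile.bin"

def timed_read(path):
    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(1024 * 1024):  # read in 1 MB chunks until EOF
            pass
    return time.perf_counter() - start

print(f"first read : {timed_read(PATH):.2f} s (from disk)")
print(f"second read: {timed_read(PATH):.2f} s (served mostly from the file system cache)")
```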
In theory, on my Windows 7 machine the value shown by Task Manager should match the “System Cache Resident Bytes” counter of the Memory section. The description sounds promising:
“System Cache Resident Bytes is the size, in bytes, of the pageable operating system code in the file system cache. This value includes only current physical pages and does not include any virtual memory pages not currently resident. It does equal the System Cache value shown in Task Manager. As a result, this value may be smaller than the actual amount of virtual memory in use by the file system cache. This value is a component of Memory\System Code Resident Bytes which represents all pageable operating system code that is currently in physical memory. This counter displays the last observed value only; it is not an average.”
The bold statement about equality is indeed a very bold one. On my Windows 7 x32 machine Task Manager reports 504 MB as Cached, whereas the performance counter gives only 185 MB. There seems to be quite a bit of confusion about what should be counted as cache and what not. The description is a leftover from Windows XP times, when Task Manager still had a “System Cache” value.
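If you want to compare the numbers on your own machine, you can sample that counter from a command prompt with the built-in typeperf tool (one sample):

typeperf "\Memory\System Cache Resident Bytes" -sc 1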
So far I have not found a really good explanation of what exactly Task Manager shows as the Cached value.
But there are better tools around, like Resource Monitor (perfmon /res), which has more to offer. It shows tooltips in unexpected places, such as the value descriptions on the Memory tab:
Now we can explain what Task Manager is actually showing there. Standby and Modified are the names of Windows kernel page lists which are used by the Windows Memory Manager.
The SysInternals tool RamMap gives you all the details about the current state of the Memory Manager and its priority lists for the different types of memory. More details about what these memory types mean and how the Memory Manager organizes them can be found on the web.
| Name | Explanation | Formula |
| --- | --- | --- |
| Total | Installed memory in your computer | Total = Active List + Standby List + Modified List + Transition List + Zeroed List + Free List |
| Cached | Memory used by the file system cache | Cached = Standby List + Modified List |
| Available | Can be allocated as private memory by other applications | Available = Standby List + Free List |
| Free | Unused memory | Free = Zeroed List + Free List |
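To see how the formulas from the table fit together, here is a toy calculation with made-up list sizes (a sketch only; plug in the values RamMap reports for your machine):

```python
# Toy list sizes in MB, purely illustrative -- the real values come from RamMap.
active, standby, modified, transition, zeroed, free_list = 2200, 900, 150, 10, 500, 240

# The formulas from the table above:
total     = active + standby + modified + transition + zeroed + free_list
cached    = standby + modified        # what Task Manager shows as "Cached"
available = standby + free_list       # can be allocated by other applications
free      = zeroed + free_list        # what Task Manager shows as "Free"

print(f"Total {total} MB, Cached {cached} MB, Available {available} MB, Free {free} MB")
```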
Why should I care about all that stuff? Because if you are reading this you are probably interested in how things work. Besides curiosity, you can now answer questions like:
- Why does a performance regression test need two warmup runs before the numbers become stable?
- Which files did my colleague have open, even if they have been deleted since?
Let's start with the most disturbing question. If I have direct access to your computer with admin rights, I can have a look at all files you had open. Open RamMap and check out the File Details tab. It lists all files which are currently stored, completely or partially, in the file system cache in one way or another.
With livekd from SysInternals I can even view the contents of the cached files:
| livekd (cached page) | cmd (file on disk) |
| --- | --- |
| 0: kd> !db 7e3ee000 | C:\Windows\system32>type c:\Windows\win.ini |
Yes, the cache is really working, and I can view its contents, which is both interesting and disturbing.
Now to question one: why do some people claim that they need two warmup runs before the numbers become stable? A cache is a cache. If you have a test which uses nearly all of your physical memory (>75%), the Windows cache manager needs to throw out the least used cached files to make room for the memory-hungry application.
The first run of your test loads the data from disk into the cache, which works well until the memory footprint of your application becomes so big that the Cache Manager needs to evict pages of the cached files. During a second run the same thing happens, until the cache manager has learned which usage pattern you follow and has adjusted its internal policies. Since I do not want to learn every detail of the cache manager, it is much easier to switch to a machine with much more RAM than the application needs, so that during the second run nearly everything is served from the cache.
The Windows Task Manager lures millions of users into the false sense of security that they have plenty of free RAM and can consume a few more GB without negative effects. Especially notebook users with 64-bit Windows, 4 GB of RAM and a slow hard disk experience long pauses during their routine work. A colleague of mine asked me whether I could investigate why his notebook was so slow. After I had seen his memory configuration I could tell him that he simply needed more RAM to make his work much smoother. Slow notebook hard disks show you very clearly that the cache is there for a reason: when you turn it off by using all of the memory for application data, you will experience many hard page faults, which result in a sluggish user experience.
Developers are not immune to such traps either. You want to make your compile times faster by exchanging your hard disk for a solid state disk? Sure you can, but how about getting more memory first? A few GB more RAM can make a huge difference to your compile times if most of your stuff is kept in the file system cache. That also explains why some of our tests showed no improvement in compile times when we copied all of the data onto a RAM disk. If you have enough RAM you do not need a RAM disk (most of the time)! On my work machine with 16 GB of RAM I see the file system cache begin to shrink already when I hit a private bytes count of 12 GB.
What can you learn from this? Be nice to your cache and keep at least 1 GB of free memory around so the cache can actually work as a cache. Today's hard disks are still about 100 000 times slower than memory, which is a good reason to keep your cache big.
If you want to check before a performance regression test whether your cache is filled, you could execute
rammap C:\temp\ram.xml
to get a nice XML file which shows which files are in the cache right now. The cached files are stored as XML nodes of the form
<File Key="3114984784" Path="C:\windows\system32\aelupsvc.dll"/>
which can be parsed easily (see the sketch below). Then you can filter e.g. for your application path and for the Windows directory, from which most DLLs (especially NGen images for managed code) are loaded during startup. Once you have established a cache baseline you can verify during subsequent tests that the cache still looks similar to your baseline run. If not, you can skip the test run or try another warmup run to get consistent values for your performance regression tests. This can give you hints why a performance regression test on a particular machine ran so slowly compared to the others. You do not need to debug; you can immediately show that the test most likely experienced many more hard page faults during its execution because of a different cache state.
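Here is a minimal parsing sketch in Python. It assumes the XML contains File nodes with a Path attribute as shown above; the exact nesting may differ between RamMap versions, and the baseline file name and path prefixes are just examples:

```python
import xml.etree.ElementTree as ET

# Minimal sketch: collect cached file paths from a RamMap XML dump.
# Assumes <File Key="..." Path="..."/> nodes as shown above; the exact
# nesting of those nodes may differ between RamMap versions.
def cached_paths(xml_file, prefixes=(r"c:\windows", r"c:\myapp")):
    prefixes = tuple(p.lower() for p in prefixes)
    paths = set()
    for node in ET.parse(xml_file).getroot().iter("File"):
        path = node.get("Path", "").lower()
        if path.startswith(prefixes):
            paths.add(path)
    return paths

# Compare the current cache state against a previously saved baseline
# (baseline.xml is a hypothetical dump captured with "rammap C:\temp\baseline.xml").
baseline = cached_paths(r"C:\temp\baseline.xml")
current = cached_paths(r"C:\temp\ram.xml")

missing = baseline - current
print(f"{len(missing)} of {len(baseline)} baseline files are no longer in the cache")
for path in sorted(missing)[:20]:
    print("  ", path)
```

If too many of the baseline files are missing from the cache, you can flag the run or schedule another warmup pass before trusting the numbers.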
Recently I talked to a MS program manager about performance tests and I was surprised that their test strategy was quite the opposite. They flush the file system cache before running the test, which always profiles the worst-case (cold startup) scenario and also leads to consistent numbers. He claimed that when you optimize cold startup, warm startup will benefit from it as well. That may be true to a certain extent, but since warm startup is about a factor of two faster, you will see concurrency during startup at places where there was none before, along with much higher CPU utilization. If you are not careful, this can shift the places where you should invest your optimization time quite a lot.