Originally posted on: http://geekswithblogs.net/akraus1/archive/2014/12/14/160652.aspx
When you want fast and efficient file IO in Windows you can pass a myriad of flags to CreateFile. But which of these many flags help in which scenario? MSDN says that read ahead buffering is enabled if FILE_FLAG_SEQUENTIAL_SCAN is specified. First a little look into the docs can help:
- FILE_FLAG_SEQUENTIAL_SCAN 0x08000000
Access is intended to be sequential from beginning to end. The system can use this as a hint to optimize file caching.
This flag should not be used if read-behind (that is, reverse scans) will be used.
This flag has no effect if the file system does not support cached I/O and FILE_FLAG_NO_BUFFERING.
For more information, see the Caching Behavior section of this topic.
Under Caching Behavior is more information:
Specifying the FILE_FLAG_SEQUENTIAL_SCAN flag can increase performance for applications that read large files using sequential access. Performance gains can be even more noticeable for applications that read large files mostly sequentially, but occasionally skip forward over small ranges of bytes. If an application moves the file pointer for random access, optimum caching performance most likely will not occur. However, correct operation is still guaranteed.
- FILE_FLAG_RANDOM_ACCESS 0x10000000
Access is intended to be random. The system can use this as a hint to optimize file caching.
This flag has no effect if the file system does not support cached I/O and FILE_FLAG_NO_BUFFERING.
For more information, see the Caching Behavior section of this topic.
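Before trying this out from C#, it is worth noting that the managed FileOptions values map directly onto these native CreateFile flags, so no P/Invoke is needed. A quick check (my own sketch, not from the original post):

using System;
using System.IO;

class FlagCheck
{
    static void Main()
    {
        // FileOptions reuses the native flag values, so passing e.g.
        // FileOptions.SequentialScan to FileStream sets FILE_FLAG_SEQUENTIAL_SCAN.
        Console.WriteLine("SequentialScan = 0x{0:X8}", (uint)FileOptions.SequentialScan); // 0x08000000
        Console.WriteLine("RandomAccess   = 0x{0:X8}", (uint)FileOptions.RandomAccess);   // 0x10000000
    }
}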
The Old New Thing clears up the confusion by adding that if you specify neither flag the OS is free to use read ahead or not, depending on the observed usage pattern. An interesting note is that if you specify RANDOM_ACCESS the read pages are evicted from the file system cache with the usual LRU policy, which ensures that random reads do not pollute the file system cache. But if you read the same data again shortly after the first read you still get the cached pages back. Let's try it out by reading the first 10 KB from 1000 files with two 5000 byte reads per file:
static void ReadOnly(string dir)
{
    string[] files = Directory.GetFiles(dir, "*.bin");
    var sw = Stopwatch.StartNew();
    int bytesRead = 0;
    foreach (var file in files)
    {
        Parser.FlushFileBufferOfFileByOpeningItUnbuffered(file);
        using (var stream = new FileStream(file, FileMode.Open, FileAccess.Read, FileShare.None, 1, FileOptions.RandomAccess))
        {
            byte[] buffer = new byte[5000];
            bytesRead += stream.Read(buffer, 0, buffer.Length); // bytes 0-4999
            bytesRead += stream.Read(buffer, 0, buffer.Length); // bytes 5000-9999
        }
    }
    sw.Stop();
    Console.WriteLine("Did read {0:N0} bytes from {1} files in {2:F2}s", bytesRead, files.Length, sw.Elapsed.TotalSeconds);
}
To see the actual disc access every time we execute this code we need to flush the file system cache for each file. It turns out that you can do this quite easily by opening the file once for unbuffered IO like this:
const FileOptions FileFlagNoBuffering = (FileOptions)0x20000000;

public static void FlushFileBufferOfFileByOpeningItUnbuffered(string fileName)
{
    // Opening the file with FILE_FLAG_NO_BUFFERING evicts its pages
    // from the file system cache.
    var tmp = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.Read, 1,
                             FileOptions.SequentialScan | FileFlagNoBuffering);
    tmp.Dispose();
}
FileOptions.RandomAccess
Now the file data is always read from the physical disc through the File IO layer. Since the file system cluster size is usually 4 KiB, the first 5000 byte read costs 8192 bytes from disc and the second read another 4096 bytes, so in total 12288 bytes are read from the hard disc to satisfy the 10000 bytes requested. After enough data has arrived from the device you can see the two actual 5000 byte reads at offsets 0 and 5000, as you would expect. From an IO perspective it therefore makes no difference whether you read 10000 bytes or 12 KiB from a disc with a 4 KiB cluster size.
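To make the rounding explicit, here is a minimal sketch of the arithmetic, assuming a 4 KiB cluster size (the helper and its name are made up for illustration):

using System;

class ClusterMath
{
    const int ClusterSize = 4096; // assumed cluster size of the volume

    // Bytes the disc must deliver to satisfy a read of count bytes at
    // offset: the request is widened to whole clusters.
    static long BytesReadFromDisc(long offset, long count)
    {
        long firstCluster = offset / ClusterSize;
        long lastCluster = (offset + count - 1) / ClusterSize;
        return (lastCluster - firstCluster + 1) * ClusterSize;
    }

    static void Main()
    {
        Console.WriteLine(BytesReadFromDisc(0, 5000));    // 8192  (clusters 0-1)
        Console.WriteLine(BytesReadFromDisc(5000, 5000)); // 8192  (clusters 1-2, cluster 1 is already cached)
        Console.WriteLine(BytesReadFromDisc(0, 10000));   // 12288 (three clusters in total)
    }
}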
If you want to find the true read sizes requested by the application you need to filter for the File IO Init stacks (checked with Windows 8.1):
fltmgr.sys!FltpPerformPreCallbacks
fltmgr.sys!FltpPassThrough
whereas the other stacks deal with read ahead caching and with the translated calls to the device, where the read size is adjusted to the sector size the device actually supports.
You might ask what the IRP column is for. An IRP is an I/O request packet, which is similar to an opaque file handle. Each device driver can create its own IRP when it issues IO requests to other drivers. That way you can follow how different drivers split up their work. So far I have not found a useful grouping for it.
It looks like FileOptions.RandomAccess behaves largely like unbuffered IO: you read exactly as much data as you need, rounded up to the cluster size of the actual device.
FileOptions.SequentialScan
Now we can try it a second time using FileOptions.SequentialScan. The first read is 8 KiB and the next one is 184 KiB, which comes from the read ahead prefetching mechanism. The assumption that we will read much more than the first 10 KB is quite optimistic, and in this case it leads to over 18 times more data read from the hard disc than this use case needs.
FileOptions.SequentialScan is certainly a bad choice for small buffered reads. The timings show that as well.
Scenario: 1000 files with 2x5000 byte reads | Time in s
FileOptions.RandomAccess                    | 3.9
FileOptions.SequentialScan                  | 6.5
FileOptions.None
Last but not least, what happens when we do not specify any option (FileOptions.None)? At least in this case it behaves exactly like SequentialScan, where also 196 MB of data are fetched from the disc although we only wanted to read 10 MB of file headers.
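For reference, a small driver along these lines can run all three measurements in one go; it assumes the ReadOnly method from above has been changed to take the FileOptions value as a parameter (my own sketch, not from the original post):

// Hypothetical driver, assuming ReadOnly(dir, options) passes the
// options value through to the FileStream constructor.
static void Main(string[] args)
{
    string dir = args[0];
    foreach (FileOptions options in new[] { FileOptions.RandomAccess,
                                            FileOptions.SequentialScan,
                                            FileOptions.None })
    {
        Console.WriteLine("Measuring with {0}", options);
        ReadOnly(dir, options);
    }
}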
Conclusions
That sounds as if you should never use SequentialScan. But in most scenarios where you read a file, parse a bit, and then read the next chunk of the file this is exactly what you want. In theory you could use async IO to read the file asynchronously while parsing the data on another thread. But if you are lucky you can do it without extra threading on your side because the OS is kind enough to read ahead of your application logic while you are still reading and parsing the file single threaded. You get the benefits of multithreading without using extra application threads, as the sketch below shows.
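Here is a minimal sketch of such a sequential chunked parse loop; the Parse method is a hypothetical stand-in for your own parsing logic:

// Sequential chunked reading: while Parse works on the current chunk,
// the OS read ahead already fetches the next chunks from disc.
static void ParseFile(string file)
{
    using (var stream = new FileStream(file, FileMode.Open, FileAccess.Read,
                                       FileShare.Read, 64 * 1024,
                                       FileOptions.SequentialScan))
    {
        byte[] chunk = new byte[64 * 1024];
        int n;
        while ((n = stream.Read(chunk, 0, chunk.Length)) > 0)
        {
            Parse(chunk, n); // hypothetical parser working on the n valid bytes
        }
    }
}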
The read ahead logic is not optimal in all cases but it helps where otherwise much more complex code would be necessary to read from a hard disc at full speed (60-300 MB/s). What is the moral of this story? Never use FileOptions.SequentialScan or FileOptions.None if you want to read only the file headers! Otherwise you consume much more IO bandwidth than necessary, and you are left scratching your head why switching from unbuffered IO to buffered IO has made reading your files so much slower.
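A header-only read could then look like this minimal sketch (the ReadHeader helper and the header size parameter are made up for illustration):

// Reads just a fixed-size header; RandomAccess keeps the read ahead
// from dragging in file data we will never look at.
static byte[] ReadHeader(string file, int headerSize)
{
    using (var stream = new FileStream(file, FileMode.Open, FileAccess.Read,
                                       FileShare.Read, 1, FileOptions.RandomAccess))
    {
        byte[] header = new byte[headerSize];
        int read = stream.Read(header, 0, header.Length);
        Array.Resize(ref header, read); // the file may be shorter than the header
        return header;
    }
}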