Originally posted on: http://geekswithblogs.net/akraus1/archive/2014/12/21/160740.aspx
We all make dumb mistakes which are not interesting to share. But sometimes people make interesting mistakes. Below is a simple helper class that makes it very easy to combine several delegates with a logical AND.
static class Constraints<T>
{
    /// <summary>
    /// Logical AND of a list of predicates.
    /// </summary>
    public static Predicate<T> And(params Predicate<T>[] ands)
    {
        // Return a new delegate which checks whether all delegates in the array
        // returned true for a given item.
        return item => ands.All(x => x(item));
    }
}
You can then use it like this:
class Program
{
    static void Main(string[] args)
    {
        int[] data = new int[20 * 1000 * 1000]; // 20 million ints
        Predicate<int> CheckForZero = Constraints<int>.And(i => i > -1, i => i < 1);

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < data.Length; i++)
        {
            if (CheckForZero(data[i]) == false)
            {
                Console.WriteLine("Violation found at {0}", i);
            }
        }
        sw.Stop();
        Console.WriteLine("Did take: {0:F2}s", sw.Elapsed.TotalSeconds);
    }
}
Now you have a nice, reusable way to combine separate checks. The only issue is that with many calls to CheckForZero the running time is quite high.
Did take: 0,90s
If we compare this to the "normal" way:
for (int i = 0; i < data.Length; i++)
{
    if (!(data[i] > -1 && data[i] < 1))
    {
        Console.WriteLine("Violation found at {0}", i);
    }
}
Did take: 0,01s
Whoa. The delegate version is 90 times slower! I found this issue by looking at the Mount Everests of CPU consumption. What is causing so much overhead?
Well, you just woke up a sleeping dragon with your LINQ delegate version: the Garbage Collector! Here is the allocation report of our slow sample:
We have an 80 MB int array with 20 million elements, but we allocate 2.5 GB of closure classes! Our Constraints.And method needs one closure to capture the list of delegates.
That closure is only needed once, since we cache the returned delegate, so it will not show up in the profiler at all.
Things get interesting when we invoke the combined delegate returned by And. On each call we allocate one additional closure class that gives the inner lambda access to the current item to check (the enclosing closure class that gives us access to the ands array was already allocated by And). On top of that we need to create a new Func<Predicate<T>, bool> delegate instance as argument to the All LINQ call.
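Roughly, the compiler lowers the And method into something like the sketch below. All class and method names here are made up for illustration; the real types are compiler generated with mangled names such as <>c__DisplayClass4.

using System;
using System.Linq;

// Illustrative sketch of the state the compiler generates for
//     return item => ands.All(x => x(item));

// Allocated once per call to And: holds the captured ands array.
sealed class AndClosure<T>
{
    public Predicate<T>[] ands;

    // The outer lambda: item => ands.All(x => x(item))
    public bool CombinedPredicate(T item)
    {
        // Allocated on EVERY invocation of the combined predicate:
        var itemClosure = new ItemClosure<T> { item = item };     // closure class
        Func<Predicate<T>, bool> check = itemClosure.InnerLambda; // new delegate instance
        // All() additionally asks the array for an IEnumerator<Predicate<T>>,
        // which allocates the SZGenericArrayEnumerator helper.
        return ands.All(check);
    }
}

// Holds the captured current item for the inner lambda x => x(item).
sealed class ItemClosure<T>
{
    public T item;

    public bool InnerLambda(Predicate<T> x)
    {
        return x(item);
    }
}

Every call of the combined predicate therefore allocates the item closure, the delegate and the enumerator, which matches the list below.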
To sum it up, on each iteration we allocate
- 1 <>c__DisplayClass4 closure instance: 32 bytes
- 1 Func<Predicate<T>, bool> delegate instance: 64 bytes
- 1 SZArrayHelper+SZGenericArrayEnumerator: 32 bytes
- Where does the enumerator come from? It is caused by the All LINQ call, which iterates over an IEnumerable. When you foreach over an IEnumerable you need to get an enumerator from the object you enumerate. Since an array has none out of the box, the CLR allocates a helper enumerator, which is where this strange SZArrayHelper class comes from. If you foreach directly over an array this helper is NOT used; the compiler generates more efficient index-based code instead (see the small example after this list).
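To make the difference concrete, here is a minimal example (my own, not from the original post) of iterating the same array directly versus through its IEnumerable<int> interface, which is the route LINQ methods such as All take internally:

using System;
using System.Collections.Generic;

class EnumeratorDemo
{
    static void Main()
    {
        int[] numbers = { 1, 2, 3 };

        // Direct foreach over the array: the compiler emits an index-based loop,
        // no enumerator object is allocated.
        foreach (int n in numbers)
        {
            Console.WriteLine(n);
        }

        // foreach over the same array through IEnumerable<int>: GetEnumerator()
        // goes through SZArrayHelper and returns the SZGenericArrayEnumerator
        // helper, i.e. one small heap allocation per loop.
        IEnumerable<int> asEnumerable = numbers;
        foreach (int n in asEnumerable)
        {
            Console.WriteLine(n);
        }
    }
}

Since Enumerable.All always takes the IEnumerable<T> route, it pays the enumerator allocation on every call of the combined predicate.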
In total we allocate 128 bytes for each integer we want to check (4 bytes), which is quite some overhead: 128 * 20,000,000 = 2,560,000,000, which is exactly what the profiler reports. Since I used PerfView's ETW .NET Alloc method, I get a correct total sum of the allocated bytes back, but the types reported as responsible for the allocations, and especially the counts, can be misleading because the numbers are sampled. If you want exact allocation numbers you need to check the .NET Alloc checkbox instead, which injects a profiling dll into all newly started processes so you can get exact allocation reports. The downside is that it records every single allocation, so the use case no longer runs in 1 s but takes more than 2 minutes, which is not something I really like to do. Unfortunately this mode does not seem to work with the old and the current PerfView (1.7) version, where I see anything but my own allocations.
When in doubt I would always use ETW .NET Alloc to get reliable, although sampled, numbers. This also works for JITed x64 code on Windows 7 machines, because for each allocation event a managed stack walk is performed which gives you the full allocation stack. Regular ETW stack walking does not work for JITed x64 code on Win7, except if you precompile the code with NGen as a workaround. On Win 8 and later this is no longer an issue, except if you try to use API hooking: there the x64 stack walker still stops walking the stack when the method interception is done in a form it cannot recognize.
This sounds like some complex stuff. Can we fix it? How about adding an overload to our class?
public static Predicate<T> And(Predicate<T> p1, Predicate<T> p2)
{
    return item => p1(item) && p2(item);
}
This little overload spares us pretty much all of the overhead. We do not allocate a single byte while iterating over our array. This time we get:
Did take: 0,14s
That is over six times faster, which is quite good for such a simple overload; the compiler prefers it over the params version whenever it is the best match. Now we only pay for three delegate invocations, which are unavoidable if we want to keep the flexibility here.
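If you regularly need to combine three checks you could extend the pattern in the same spirit. The following overload is my own sketch, not part of the original class, and would go into Constraints<T> next to the others:

// Hypothetical third overload in the same style as the two-predicate one above.
public static Predicate<T> And(Predicate<T> p1, Predicate<T> p2, Predicate<T> p3)
{
    return item => p1(item) && p2(item) && p3(item);
}

With two or three lambdas the compiler then binds to these allocation-free overloads; only with more predicates does overload resolution fall back to the params Predicate<T>[] version that uses LINQ.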
Now go and check your LINQ statements for hidden dragons. Allocating 2.5 GB with a single line is quite easy to achieve with delegates and LINQ. These subtle delegate issues are the main reason why LINQ is banned from the Roslyn project. It is kind of ironic that the inventors of LINQ do not want to use their own invention in their C# compiler backend for performance reasons.