caching - How does cache associativity impact performance

Tuesday, February 19, 2019

caching - How does cache associativity impact performance

I am reading "Pro .NET Benchmarking" by Andrey Akinshin and one thing puzzles me (p.536) -- explanation how cache associativity impacts performance. In a test author used 3 square arrays 1023x1023, 1024x1024, 1025x1025 of ints and observed that accessing first column was slower for 1024x1024 case.

Author explained (background info, CPU is Intel with L1 cache with 32KB memory, it is 8-way associative):

When N=1024, this difference is exactly 4096 bytes; it equals the
critical stride value. This means that all elements from the first
column match the same eight cache lines of L1. We don’t really have
performance benefits from the cache because we can’t use it
efficiently: we have only 512 bytes (8 cache lines * 64-byte cache
line size) instead of the original 32 kilobytes. When we iterate the
first column in a loop, the corresponding elements pop each other from
the cache. When N=1023 and N=1025, we don’t have problems with the

critical stride anymore: all elements can be kept in the cache, which
is much more efficient.

So it looks like the penalty comes from somehow shrinking the cache just because the main memory cannot be mapped to full cache.

It strikes me as odd, after reading wiki page I would say the performance penalty comes from resolving address conflicts. Since each row can be potentially mapped into the same cache line, it is conflict after conflict, and CPU has to resolve those -- it takes time.

Thus my question, what is the real nature of performance problem here. Accessible memory size of cache is lower, or entire cache is available but CPU spends more time in resolving conflicts with mapping. Or there is some other reason?

Blog

Tuesday, February 19, 2019