Thursday, February 22, 2018

performance - Why is there a significant difference in this C++ for loop's execution time?




I was going through loops and found a significant difference in accessing loops.
I can't understand what is the thing that causes such difference in both cases?




First Example:



Execution Time; 8 seconds



for (int kk = 0; kk < 1000; kk++)
{
sum = 0;
for (int i = 0; i < 1024; i++)
for (int j = 0; j < 1024; j++)
{

sum += matrix[i][j];
}
}


Second Example:



Execution Time: 23 seconds



for (int kk = 0; kk < 1000; kk++)

{
sum = 0;
for (int i = 0; i < 1024; i++)
for (int j = 0; j < 1024; j++)
{
sum += matrix[j][i];
}
}



What causes so much execution time difference just exchanging



matrix[i][j] 


to



matrix[j][i]



?


Answer



It's an issue of memory cache.



matrix[i][j] has better cache hits than matrix[j][i], since matrix[i][j] has more continuous memory accessing chances.



For example, when we access matrix[i][0], the cache may load a continuous segment of memory containing matrix[i][0], thus, accessing matrix[i][1], matrix[i][2], ..., will benefit from caching speed, since matrix[i][1], matrix[i][2], ... are near to matrix[i][0].



However, when we access matrix[j][0], it is far from matrix[j - 1][0] and may not been cached, and can not benefit from caching speed. Especially, a matrix is normally stored as a continuous big segment of memory, and the cacher may predicate the behavior of memory accessing and always cache the memory.




That's why matrix[i][j] is faster. This is typical in CPU cache based performance optimizing.


No comments:

Post a Comment

plot explanation - Why did Peaches&#39; mom hang on the tree? - Movies &amp; TV

In the middle of the movie Ice Age: Continental Drift Peaches' mom asked Peaches to go to sleep. Then, she hung on the tree. This parti...