Monday, July 23, 2018

c++ - What is the overhead in splitting a for-loop into multiple for-loops, if the total work inside is the same?




What is the overhead in splitting a for-loop like this,



int i;

for (i = 0; i < exchanges; i++)
{

// some code
// some more code
// even more code
}


into multiple for-loops like this?



int i;


for (i = 0; i < exchanges; i++)
{
// some code
}

for (i = 0; i < exchanges; i++)
{
// some more code
}


for (i = 0; i < exchanges; i++)
{
// even more code
}


The code is performance-sensitive, but doing the latter would improve readability significantly. (In case it matters, there are no other loops, variable declarations, or function calls, save for a few accessors, within each loop.)



I'm not exactly a low-level programming guru, so it'd be even better if someone could measure up the performance hit in comparison to basic operations, e.g. "Each additional for-loop would cost the equivalent of two int allocations." But, I understand (and wouldn't be surprised) if it's not that simple.




Many thanks, in advance.


Answer



There are often way too many factors at play... And it's easy to demonstrate both ways:



For example, splitting the following loop results in almost a 2x slow-down (full test code at the bottom):



for (int c = 0; c < size; c++){
data[c] *= 10;
data[c] += 7;
data[c] &= 15;

}


And this is almost stating the obvious since you need to loop through 3 times instead of once and you make 3 passes over the entire array instead of 1.



On the other hand, if you take a look at this question: Why are elementwise additions much faster in separate loops than in a combined loop?



for(int j=0;j    a1[j] += b1[j];
c1[j] += d1[j];

}


The opposite is sometimes true due to memory alignment.






What to take from this?



Pretty much anything can happen. Neither way is always faster and it depends heavily on what's inside the loops.




And as such, determining whether such an optimization will increase performance is usually trial-and-error. With enough experience you can make fairly confident (educated) guesses. But in general, expect anything.




"Each additional for-loop would cost the equivalent of two int allocations."




You are correct that it's not that simple. In fact it's so complicated that the numbers don't mean much. A loop iteration may take X cycles in one context, but Y cycles in another due to a multitude of factors such as Out-of-order Execution and data dependencies.



Not only is the performance context-dependent, but it also vary with different processors.







Here's the test code:



#include 
#include
using namespace std;

int main(){


int size = 10000;
int *data = new int[size];


clock_t start = clock();

for (int i = 0; i < 1000000; i++){
#ifdef TOGETHER
for (int c = 0; c < size; c++){

data[c] *= 10;
data[c] += 7;
data[c] &= 15;
}
#else
for (int c = 0; c < size; c++){
data[c] *= 10;
}
for (int c = 0; c < size; c++){
data[c] += 7;

}
for (int c = 0; c < size; c++){
data[c] &= 15;
}
#endif
}

clock_t end = clock();
cout << (double)(end - start) / CLOCKS_PER_SEC << endl;


system("pause");
}


Output (one loop): 4.08 seconds
Output (3 loops): 7.17 seconds


No comments:

Post a Comment

plot explanation - Why did Peaches&#39; mom hang on the tree? - Movies &amp; TV

In the middle of the movie Ice Age: Continental Drift Peaches' mom asked Peaches to go to sleep. Then, she hung on the tree. This parti...