Consider the following assembly code loop:
#include <cstdint>
#include <iostream>
#define ADD_LOOP(i, n, v) \
asm volatile ( \
"movw %1, %%cx ;" \
"movq %2, %%rax ;" \
"movq $0, %%rbx ;" \
"for: ;" \
"addq %%rax, %%rbx ;" \
"decw %%cx ;" \
"jnz for ;" \
"movq %%rbx, %0 ;" \
: "=x"(v) \
: "n"(i), "x"(n) \
: "%cx", "%rax", "%rbx" \
);
int main() {
uint16_t iter(10000);
uint64_t num(5);
uint64_t val;
ADD_LOOP(iter, num, val)
std::cout << val << std::endl;
return 0;
}
Is it possible to call a C function (or its machine-code output) from within a loop like the one above?
For example:
#include <immintrin.h>
int main() {
__m128i x, y;
for(int i = 0; i < 10; i++) {
x = __builtin_ia32_aesenc128(x, y);
}
return 0;
}
Thanks
Answer
No. Builtin functions aren't real functions that you can call with a call instruction. They always inline when used in C / C++.
For example, if you want what int __builtin_popcount (unsigned int x) gives you in C, that is, either a popcnt instruction for targets with -mpopcnt or a byte-wise lookup table for targets that don't support the popcnt instruction, you are out of luck. You will have to #ifdef it yourself and use popcnt or an alternative sequence of instructions.
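As a rough illustration of that #ifdef approach, here is a minimal sketch. The helper name popcount32 is hypothetical, the __POPCNT__ macro is the one GCC and clang define when popcnt is enabled, and the fallback is a SWAR bit-counting sequence that stands in for the lookup table mentioned above.

```cpp
#include <cstdint>

// Hypothetical helper (not from the original answer): count the set bits in x.
static inline unsigned popcount32(std::uint32_t x) {
#if defined(__GNUC__) && defined(__POPCNT__)
    // Built with -mpopcnt (or an -march= that implies it): use the instruction.
    unsigned result;
    asm("popcnt %1, %0" : "=r"(result) : "r"(x) : "cc");
    return result;
#else
    // Fallback without popcnt: SWAR bit counting (an assumption of this
    // sketch; the answer mentions a byte-wise lookup table instead).
    x = x - ((x >> 1) & 0x55555555u);                  // 2-bit partial sums
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);  // 4-bit partial sums
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                  // 8-bit partial sums
    return (x * 0x01010101u) >> 24;                    // add the four bytes
#endif
}
```

Both branches should return the same value for any input, so callers don't need to know which one was compiled in.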
The function you're talking about, __builtin_ia32_aesenc128, is just a wrapper for the aesenc assembly instruction, which you can just use directly if writing in asm.
If you're writing asm instead of using C++ intrinsics (like #include <immintrin.h>) for performance, you need to have a look at http://agner.org/optimize/ to write more efficient asm. For example, use %ecx as a loop counter, not %cx: you're gaining nothing from using a 16-bit partial register.
You could also write more efficient inline-asm constraints. For example, the movq %%rbx, %0 is a waste of an instruction: you could have used %0 the whole time instead of an explicit %rbx. If your inline asm starts or ends with a mov instruction to copy to/from an output/input operand, usually you're doing it wrong. Let the compiler allocate registers for you. See the inline-assembly tag wiki.
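A minimal sketch of the question's loop with those suggestions applied might look like the following. The function name add_loop and the operand names are hypothetical, the asm path assumes GNU extended asm on x86-64, and a plain C++ loop is used elsewhere. The accumulator and counter are read-write operands that the compiler places in registers of its choice, so no mov is needed at either end.

```cpp
#include <cstdint>

// Hypothetical helper: adds n to an accumulator `iters` times.
// Precondition: iters != 0 (the asm loop runs its body before testing).
static inline std::uint64_t add_loop(std::uint32_t iters, std::uint64_t n) {
    std::uint64_t sum = 0;
#if defined(__GNUC__) && defined(__x86_64__)
    asm("1:\n\t"
        "add %[n], %[sum]\n\t"   // sum += n, in whatever registers were picked
        "dec %[cnt]\n\t"         // 32-bit counter, no partial-register use
        "jnz 1b"                 // numeric local label, safe to inline twice
        : [sum] "+&r"(sum), [cnt] "+&r"(iters)  // early-clobber read-write
        : [n] "r"(n)
        : "cc");
#else
    for (; iters != 0; --iters) sum += n;  // portable fallback
#endif
    return sum;
}
```

With the question's values, add_loop(10000, 5) returns 50000. The "+&r" early-clobber modifier is there because both registers are written before the last read of n, so the compiler must not let n share a register with them.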
Or better, see https://gcc.gnu.org/wiki/DontUseInlineAsm. Code with intrinsics typically compiles well for x86. See Intel's intrinsics guide: #include <immintrin.h> and use __m128i _mm_aesenc_si128 (__m128i a, __m128i RoundKey). (In gcc that's just a wrapper for __builtin_ia32_aesenc128, but it makes your code portable to other x86 compilers.)
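For illustration, the loop from the question could be written with the intrinsic roughly as follows. This is a sketch that assumes GCC or clang on x86; the helper name aes_rounds and its byte-array interface are hypothetical, and reusing one round key for every round mirrors the question's loop, not a real AES key schedule.

```cpp
#include <immintrin.h>

// Hypothetical helper: apply `rounds` AESENC rounds to a 16-byte state,
// reusing the same round key each time (as in the question's loop).
// The target attribute enables AES-NI for this function only, so the
// rest of the file can be built without -maes.
__attribute__((target("aes,sse2")))
void aes_rounds(unsigned char state[16], const unsigned char round_key[16],
                int rounds) {
    __m128i x = _mm_loadu_si128(reinterpret_cast<const __m128i*>(state));
    const __m128i key =
        _mm_loadu_si128(reinterpret_cast<const __m128i*>(round_key));
    for (int i = 0; i < rounds; ++i) {
        x = _mm_aesenc_si128(x, key);  // typically a single aesenc instruction
    }
    _mm_storeu_si128(reinterpret_cast<__m128i*>(state), x);
}
```

Only call this on a CPU with AES-NI, for example after checking __builtin_cpu_supports("aes") in GCC or clang. As a sanity check, one round on an all-zero state with an all-zero key should leave sixteen 0x63 bytes in state.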