gcc - Is it possible to call a built in function from assembly in C++

Monday, April 30, 2018


Consider the following assembly loop:

#include <cstdint>
#include <iostream>

#define ADD_LOOP(i, n, v)       \
asm volatile (                  \
    "movw %1, %%cx      ;"      \
    "movq %2, %%rax     ;"      \
    "movq $0, %%rbx     ;"      \
    "for:               ;"      \
    "addq %%rax, %%rbx  ;"      \
    "decw %%cx          ;"      \
    "jnz for            ;"      \
    "movq %%rbx, %0     ;"      \
    : "=r"(v)                   \
    : "r"(i), "r"(n)            \
    : "%cx", "%rax", "%rbx"     \
);

int main() {
    uint16_t iter(10000);
    uint64_t num(5);

    uint64_t val;

    ADD_LOOP(iter, num, val)

    std::cout << val << std::endl;

    return 0;
}

Is it possible to call a C function (or its machine-code output) from within a loop like the one above?

For example:

#include <immintrin.h>

int main() {
    __m128i x, y;

    for(int i = 0; i < 10; i++) {

        x = __builtin_ia32_aesenc128(x, y);
    }

    return 0;
}

Thanks

Answer

No. Builtin functions aren't real functions that you can reach with a call instruction. They always inline when used in C / C++.

For example, if you want int __builtin_popcount (unsigned int x) to get either a popcnt instruction for targets with -mpopcnt, or a byte-wise lookup table for targets that don't support the popcnt instruction, you are out of luck. You will have to #ifdef yourself and use popcnt or an alternative sequence of instructions.
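A minimal sketch of that #ifdef approach, assuming GCC or Clang, which define __POPCNT__ when popcnt is enabled (e.g. with -mpopcnt). The name popcount32 is made up for illustration, and the fallback here is a standard bit-twiddling sequence rather than the byte-wise lookup table mentioned above.

```cpp
#include <cstdint>

// Hypothetical helper (not part of GCC): counts the set bits in x.
static inline unsigned popcount32(std::uint32_t x) {
#if defined(__POPCNT__) && (defined(__x86_64__) || defined(__i386__))
    // popcnt is enabled for this target: emit the instruction directly.
    unsigned result;
    asm("popcnt %1, %0" : "=r"(result) : "r"(x) : "cc");
    return result;
#else
    // Fallback: add up bits in parallel within 2-, 4-, then 8-bit fields.
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    return (x * 0x01010101u) >> 24;  // sum the four per-byte counts
#endif
}
```

Either branch gives the same result, so callers don't need to know which one was compiled in.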

The function you're talking about, __builtin_ia32_aesenc128, is just a wrapper for the aesenc instruction, which you can use directly if you're writing asm.

If you're writing asm instead of using C++ intrinsics (like #include <immintrin.h>) for performance, you need to have a look at http://agner.org/optimize/ to write more efficient asm (e.g. use %ecx as a loop counter, not %cx; you gain nothing from using a 16-bit partial register).

You could also write more efficient inline-asm constraints: e.g. the movq %%rbx, %0 is a wasted instruction. You could have used %0 the whole time instead of an explicit %rbx. If your inline asm starts or ends with a mov instruction that copies to or from an output or input operand, you're usually doing it wrong. Let the compiler allocate registers for you. See the inline-assembly tag wiki.
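As a rough sketch of what that advice could look like for the loop in the question, assuming GCC or Clang on x86-64: the function name add_loop and the operand names are invented here, and the counter is widened to 32 bits as suggested above. The accumulator and counter are read-write operands, so no mov is needed at either end and the compiler chooses every register.

```cpp
#include <cstdint>

// Hypothetical rewrite of the question's ADD_LOOP macro as a function.
// Precondition: iterations must be non-zero, because the loop body runs
// once before the counter is tested.
static inline std::uint64_t add_loop(std::uint32_t iterations,
                                     std::uint64_t addend) {
    std::uint64_t sum = 0;
    asm("1:                     \n\t"  // numeric local label, safe to duplicate
        "addq %[add], %[sum]    \n\t"  // accumulate straight into the output
        "decl %[cnt]            \n\t"  // 32-bit counter, no partial register
        "jnz  1b                \n\t"
        // "+&r": read-write, and written before the last read of [add],
        // so these must not share a register with the input.
        : [sum] "+&r"(sum), [cnt] "+&r"(iterations)
        : [add] "r"(addend)
        : "cc");
    return sum;
}
```

Calling add_loop(10000, 5) corresponds to the ADD_LOOP(iter, num, val) call in the question, with no fixed registers clobbered.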

Or better, see https://gcc.gnu.org/wiki/DontUseInlineAsm. Code with intrinsics typically compiles well for x86. See Intel's intrinsics guide: #include <immintrin.h> and use __m128i _mm_aesenc_si128 (__m128i a, __m128i RoundKey). (In gcc that's just a wrapper for __builtin_ia32_aesenc128, but it makes your code portable to other x86 compilers.)
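For illustration, the loop from the question might be written with the portable intrinsic roughly like this. It is a sketch under assumptions: aesenc_rounds and its byte-array interface are invented for the example, the target attribute is a GCC/Clang extension used so the file builds without -maes, and the CPU must still support AES-NI at run time.

```cpp
#include <immintrin.h>
#include <cstdint>

// Hypothetical helper: applies `rounds` aesenc steps to a 16-byte state
// in place, reusing the same round key each time as the question's loop does.
__attribute__((target("aes,sse2")))
void aesenc_rounds(std::uint8_t state[16], const std::uint8_t round_key[16],
                   int rounds) {
    // Unaligned loads and stores, so the byte arrays need no special alignment.
    __m128i x = _mm_loadu_si128(reinterpret_cast<const __m128i*>(state));
    __m128i y = _mm_loadu_si128(reinterpret_cast<const __m128i*>(round_key));
    for (int i = 0; i < rounds; i++) {
        x = _mm_aesenc_si128(x, y);  // compiles to a single aesenc instruction
    }
    _mm_storeu_si128(reinterpret_cast<__m128i*>(state), x);
}
```

A caller that may run on older hardware can check __builtin_cpu_supports("aes") (GCC/Clang) before using it.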
