[llvm-dev] [ARM] Peephole optimization ( instructions tst + add )

Kosov Pavel via llvm-dev llvm-dev at lists.llvm.org
Thu Nov 21 02:00:22 PST 2019


I noticed that in some cases clang generates a sequence of AND + TST instructions on the same operands.

For example:

         AND        x3, x2, x1
         TST        x2, x1

I think these two instructions should be merged into one:

         ANDS       x3, x2, x1

(because TST <Xn>, <Xm> is an alias for ANDS XZR, <Xn>, <Xm>; see https://static.docs.arm.com/ddi0596/a/DDI_0596_ARM_a64_instruction_set_architecture.pdf)
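To spell out why the two forms should be interchangeable, here is a minimal C sketch of the flag behaviour as I understand it. The type nzcv_t and all helper names are made up for illustration (they are not LLVM or Arm APIs), and the model covers only the 64-bit register-register form with nothing between the two instructions that reads or writes the flags or the registers involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the NZCV condition flags. */
typedef struct { bool n, z, c, v; } nzcv_t;

/* Flags set by a 64-bit flag-setting logical operation:
   N = bit 63 of the result, Z = (result == 0), C = 0, V = 0. */
static nzcv_t logical_flags(uint64_t result) {
    nzcv_t f = { (result >> 63) != 0, result == 0, false, false };
    return f;
}

/* Models: AND x3, x2, x1 ; TST x2, x1 */
static uint64_t and_then_tst(uint64_t x2, uint64_t x1, nzcv_t *flags) {
    uint64_t x3 = x2 & x1;            /* AND writes x3, flags untouched   */
    *flags = logical_flags(x2 & x1);  /* TST sets flags, result discarded */
    return x3;
}

/* Models: ANDS x3, x2, x1 */
static uint64_t ands(uint64_t x2, uint64_t x1, nzcv_t *flags) {
    uint64_t x3 = x2 & x1;            /* one instruction writes x3 ...    */
    *flags = logical_flags(x3);       /* ... and sets the flags from it   */
    return x3;
}

/* Asserts that both sequences agree for one pair of inputs. */
static void check_equivalent(uint64_t x2, uint64_t x1) {
    nzcv_t fa, fb;
    uint64_t ra = and_then_tst(x2, x1, &fa);
    uint64_t rb = ands(x2, x1, &fb);
    assert(ra == rb);
    assert(fa.n == fb.n && fa.z == fb.z && fa.c == fb.c && fa.v == fb.v);
}
```

Calling check_equivalent() on a few operand pairs passes in this model. It says nothing about the cases where other instructions sit between the AND and the TST.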

Is this a missing optimization, or could such a merge have some negative effect?

PS: Code sample (it could probably be reduced significantly further):

(clang -target aarch64 sample.c -S -O2 -o sample.S )


#define NULL ((void*)0)

/* LOOP_ITERS_COUNT is not defined in the sample as posted; the value here
   is an assumed placeholder so that the file compiles. */
#ifndef LOOP_ITERS_COUNT
#define LOOP_ITERS_COUNT 100
#endif

typedef struct {
    unsigned long *res_in;
    unsigned long *proc;
} fd_set_bits;

fd_set_bits *gv_fds;
int g_max_i;

unsigned DEF_MASK;

__attribute__((noinline)) int do_test(const int max_iters_count,
                                      const unsigned long in,
                                      const unsigned long out,
                                      const unsigned long ex,
                                      const unsigned long bit_init_val,
                                      const unsigned long mask) {
    int retval = 0;
    for (int k = 0; k < max_iters_count; k++) {
        fd_set_bits *fds = gv_fds;
        for (int j = 0; j < LOOP_ITERS_COUNT; ++j) {
            if (in) {
                fds->proc = NULL;
            }
            if (mask & DEF_MASK) {
                fds->proc = NULL;
            }
        }
    }
    return retval;
}
