Today we are going to look at why a bubble sort compiled with gcc -O3 can run slower than the same code compiled with -O2.

Why is bubble sort slower with -O3 than -O2 with gcc?

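It looks like GCC's naivety about store-forwarding stalls is hurting its auto-vectorization strategy here. The -O3 code does 64-bit loads on pairs of adjacent ints (and then branches to decide whether to store the pair back). This means that if the previous iteration swapped, the current load gets half its data from that store and half from memory that wasn't just written. Mainstream x86 CPUs can't forward a store to a load that only partially overlaps it, so we get a store-forwarding stall after every swap. For background on store forwarding, see Agner Fog's optimization guides (https://agner.org/optimize/).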
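To make the access pattern concrete, here is a scalar C model of it. This is a sketch, not GCC's actual -O3 output: the function name bubble_sort_pairwise is hypothetical, and using memcpy to move two adjacent ints as one chunk (8 bytes, assuming 32-bit int) is only meant to mimic the 64-bit load and conditional store described above. A compiler is free to emit something different.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the pair-load pattern: each inner iteration reads
 * buf[x] and buf[x + 1] as one chunk and writes the pair back only if it
 * had to swap. Not guaranteed to compile to single 64-bit loads/stores. */
void bubble_sort_pairwise(int *buf, size_t n)
{
    for (size_t end = n; end > 1; end--) {
        for (size_t x = 0; x + 1 < end; x++) {
            int pair[2];
            memcpy(pair, &buf[x], sizeof pair);   /* one wide load of two ints */
            if (pair[0] > pair[1]) {
                int tmp = pair[0];
                pair[0] = pair[1];
                pair[1] = tmp;
                /* Wide store of the swapped pair. The next iteration's wide
                 * load starts at buf[x + 1], so it overlaps only the upper
                 * half of this store: the partial overlap that defeats
                 * store forwarding. */
                memcpy(&buf[x], pair, sizeof pair);
            }
        }
    }
}
```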

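Bubble sort often has long chains of swaps on consecutive iterations, because a large element keeps getting swapped as it bubbles far toward the end of the array. That makes this really bad: the stalls happen back to back.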
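To see how long these chains get, the helper below (hypothetical name longest_swap_run) runs one bubble-sort pass and reports the longest run of consecutive iterations that each performed a swap. It is only a measuring aid, not part of any implementation from the original discussion.

```c
#include <stddef.h>

/* Runs one bubble-sort pass over buf[0..n-1] and returns the longest run
 * of consecutive inner iterations that each swapped. A large element near
 * the front is swapped on every step as it moves toward the end. */
size_t longest_swap_run(int *buf, size_t n)
{
    size_t run = 0, longest = 0;
    for (size_t x = 0; x + 1 < n; x++) {
        if (buf[x] > buf[x + 1]) {
            int tmp = buf[x];
            buf[x] = buf[x + 1];
            buf[x + 1] = tmp;
            run++;
            if (run > longest)
                longest = run;
        } else {
            run = 0;   /* chain broken: no swap on this step */
        }
    }
    return longest;
}
```

For example, with {9, 1, 2, 3, 4, 5, 6, 7, 8} the 9 is swapped on all 8 steps of the first pass, so the function returns 8. Under the pair-load strategy, each iteration that follows one of those swaps would start with a stalled load.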

A better approach would be to keep buf[x+1] in a register and reuse it as buf[x] in the next iteration, avoiding a store and a reload. (Good hand-written asm bubble sort implementations do this; a few examples exist on Stack Overflow.)
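A minimal C sketch of the register-carry idea follows. The names bubble_sort_carry, cur and stale are illustrative assumptions. The value being bubbled lives in cur, so each iteration loads only buf[x + 1] and never reloads something the previous iteration just stored; the stale flag keeps it from storing unless a value has actually moved. Whether a given compiler keeps cur in a register is up to the compiler.

```c
#include <stdbool.h>
#include <stddef.h>

/* Bubble sort that carries the element being bubbled in a local variable
 * instead of storing it and reloading it on the next iteration. Within a
 * pass, iteration x writes only slot x and reads only slot x + 1, so no
 * load depends on the store made by the previous iteration. */
void bubble_sort_carry(int *buf, size_t n)
{
    for (size_t end = n; end > 1; end--) {
        int cur = buf[0];        /* value that logically belongs in slot x */
        bool stale = false;      /* true if memory at slot x does not hold cur */
        for (size_t x = 0; x + 1 < end; x++) {
            int next = buf[x + 1];
            if (cur > next) {
                buf[x] = next;   /* smaller value settles in slot x */
                stale = true;    /* cur moves on to slot x + 1, not stored yet */
            } else {
                if (stale)
                    buf[x] = cur; /* bubbling stopped: write carried value once */
                stale = false;
                cur = next;       /* slot x + 1 already holds next */
            }
        }
        if (stale)
            buf[end - 1] = cur;  /* carried value reached the end of the pass */
    }
}
```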

If it weren't for the store-forwarding stalls (which, as far as I know, GCC's cost model doesn't account for), the 64-bit pair-load strategy might be about break-even. A branchless comparator using SSE4.1 pminsd / pmaxsd might be interesting, but that would mean always storing, and the C source only stores when it swaps.
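For comparison, the always-storing alternative can be written as a branchless min/max step in scalar C. This is a sketch under assumptions: bubble_sort_minmax is a hypothetical name, the ternaries are plain C, and whether a compiler turns them into cmov or into SSE4.1 pminsd / pmaxsd is not guaranteed. It writes buf[x] on every iteration, which is exactly the extra storing the original C source avoids.

```c
#include <stddef.h>

/* Branchless-style bubble sort: each step writes the min of the carried
 * value and the next element, and carries the max forward. It stores on
 * every iteration, even when the pair was already in order. */
void bubble_sort_minmax(int *buf, size_t n)
{
    for (size_t end = n; end > 1; end--) {
        int cur = buf[0];                      /* running max of this pass */
        for (size_t x = 0; x + 1 < end; x++) {
            int next = buf[x + 1];
            buf[x] = next < cur ? next : cur;  /* min: always stored */
            cur    = next < cur ? cur : next;  /* max: carried forward */
        }
        buf[end - 1] = cur;                    /* largest value of the pass */
    }
}
```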

A related point: optimizing bubble sort for code size instead of speed can lead to a memory-destination rotate, which swaps two adjacent ints in one instruction but creates store-forwarding stalls for back-to-back swaps, or a memory-destination xchg, which has an implicit lock prefix and is therefore very slow.
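The rotate trick works because rotating a 64-bit value by 32 bits exchanges its two 32-bit halves, i.e. two adjacent ints. The C sketch below (hypothetical name swap_adjacent_by_rotate, assuming int32_t elements) expresses that; on x86 it corresponds to something like rol qword [mem], 32, though the compiler may or may not emit a memory-destination rotate for it. Calling it on p and then on p + 1 reproduces the back-to-back, half-overlapping accesses that defeat store forwarding.

```c
#include <stdint.h>
#include <string.h>

/* Swaps p[0] and p[1] by rotating the 64-bit value that spans them by 32.
 * Rotating by half the width exchanges the halves regardless of
 * endianness. A following swap at p + 1 loads 8 bytes that only half
 * overlap this store, which is where the stall comes from. */
void swap_adjacent_by_rotate(int32_t *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);        /* p[0] and p[1] as one 64-bit value */
    v = (v << 32) | (v >> 32);      /* rotate by 32: exchange the halves */
    memcpy(p, &v, sizeof v);
}
```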

Summary

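In short, the -O3 build can be slower because GCC's auto-vectorized bubble sort loads each pair of adjacent ints with a single 64-bit load, and after every swap that load only partially overlaps the previous store, causing a store-forwarding stall. Since bubble sort tends to swap many times in a row, those stalls add up. Keeping the bubbling element in a register would avoid both the stall and the extra store and reload.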
