Figure 7-1.

An overview with six tables that show the machine instructions for various atomic operations.

The rows in each table are labeled ARMv8 and x86-64.

The columns are labeled with various memory orderings.

First table: load.

The ARMv8 row shows the LDR instruction is used in both the non-atomic case and relaxed ordering. The LDAR instruction is used for both acquire and sequentially consistent ordering.

The x86-64 row shows the MOV instruction in all cases.

Second table: store.

The ARMv8 row shows the STR instruction is used in both the non-atomic case and relaxed ordering. The STLR instruction is used for both release and sequentially consistent ordering.

The x86-64 row shows the MOV instruction in all cases except for sequentially consistent ordering, which uses XCHG.
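As a rough illustration of the first two tables, the sketch below shows Rust functions whose loads and stores would be expected to compile to the listed instructions. The function names are hypothetical, and the instructions named in the comments are taken from the tables above; actual compiler output can vary with the compiler version and optimization settings.

```rust
use std::sync::atomic::{
    AtomicI32,
    Ordering::{Acquire, Relaxed, Release, SeqCst},
};

// Load table: LDR on ARMv8, MOV on x86-64.
pub fn load_relaxed(x: &AtomicI32) -> i32 {
    x.load(Relaxed)
}

// Load table: LDAR on ARMv8, MOV on x86-64.
pub fn load_acquire(x: &AtomicI32) -> i32 {
    x.load(Acquire)
}

// Store table: STR on ARMv8, MOV on x86-64.
pub fn store_relaxed(x: &AtomicI32, value: i32) {
    x.store(value, Relaxed);
}

// Store table: STLR on ARMv8, MOV on x86-64.
pub fn store_release(x: &AtomicI32, value: i32) {
    x.store(value, Release);
}

// Store table: STLR on ARMv8, XCHG on x86-64.
pub fn store_seq_cst(x: &AtomicI32, value: i32) {
    x.store(value, SeqCst);
}
```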

Third table: swap.

The ARMv8 row shows two instructions for the non-atomic case: LDR, STR. The rest of the row shows a nearly identical three-instruction loop for each memory ordering: LDXR, STXR, CBNZ, with an arrow indicating a branch from the last instruction to the first. In the acquire, acquire-release, and sequentially consistent columns, the LDXR instruction is replaced by LDAXR. In the release, acquire-release, and sequentially consistent columns, the STXR instruction is replaced by STLXR.

An additional row labeled ARMv8.1 does not contain these loops, but instead has a single instruction for each memory ordering. SWP for relaxed, SWPA for acquire, SWPL for release, and SWPAL for both acquire-release and sequentially consistent.

The x86-64 row shows MOV, MOV for the non-atomic case, and XCHG for all atomic memory orderings.

Fourth table: fetch op.

Op is a placeholder, such that this table represents fetch-add, fetch-sub, and so on.

The ARMv8 row is very similar to the one in the previous table, but with an additional instruction in the middle of each listing labeled op. Non-atomic: LDR, op, STR. Relaxed: LDXR, op, STXR, CBNZ, with an arrow indicating a branch from the last instruction to the first. The rest of the row shows a nearly identical four-instruction loop for the other memory orderings. In the acquire, acquire-release, and sequentially consistent columns, the LDXR instruction is replaced by LDAXR. In the release, acquire-release, and sequentially consistent columns, the STXR instruction is replaced by STLXR.

An additional row labeled ARMv8.1 does not contain these loops, but instead has a single instruction for each memory ordering. LDOP for relaxed, LDOPA for acquire, LDOPL for release, and LDOPAL for both acquire-release and sequentially consistent. The OP part of these instructions is a placeholder. It can stand for ADD, SUB, and so on.

The x86-64 row is split in two. The first half is labeled “op”, and the second is labeled “fetch and op”.

The first part has just a single “op” instruction for the non-atomic case, and “lock op” for all atomic memory orderings.

The second part, labeled “fetch and op”, shows two instructions for the non-atomic case: MOV, op. For the atomic case, regardless of memory ordering, it shows multiple options: either “lock” followed by XADD, BTS, BTR, or BTC, or a four-instruction loop: MOV, op, lock CMPXCHG, JNE, with an arrow indicating a branch from the last instruction to the first.
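The x86-64 variants of the fetch op table can be sketched in Rust as follows. The function names are hypothetical, and which instruction a compiler actually picks for each function is an assumption based on the table, not a guarantee. The last function writes the compare-and-exchange loop out by hand to mirror the MOV, op, lock CMPXCHG, JNE listing.

```rust
use std::sync::atomic::{AtomicI32, Ordering::Relaxed};

// Result unused: corresponds to the "op" half of the row (lock ADD).
pub fn add_ten(x: &AtomicI32) {
    x.fetch_add(10, Relaxed);
}

// Result used: corresponds to the "fetch and op" half (lock XADD).
pub fn fetch_add_ten(x: &AtomicI32) -> i32 {
    x.fetch_add(10, Relaxed)
}

// Result used, but x86-64 has no fetching OR instruction, so this would be
// expected to become the compare-and-exchange loop from the table.
pub fn fetch_or_ten(x: &AtomicI32) -> i32 {
    x.fetch_or(10, Relaxed)
}

// The same loop written by hand: load, apply the operation, try to
// compare-and-exchange, and retry if another thread changed the value.
pub fn fetch_or_manual(x: &AtomicI32, bits: i32) -> i32 {
    let mut current = x.load(Relaxed);
    loop {
        match x.compare_exchange_weak(current, current | bits, Relaxed, Relaxed) {
            Ok(previous) => return previous,
            Err(actual) => current = actual,
        }
    }
}
```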

Fifth table: compare exchange.

This table has two columns: weak and strong.

The weak ARMv8 instructions start with either LDXR or LDAXR, followed by CMP, BNE, and then either STXR or STLXR. An arrow from BNE branches off to a separate path with CLREX.

The strong ARMv8 instructions are identical, except for an additional CBNZ at the end, with an arrow indicating a branch back to the first instruction.

An additional row labeled ARMv8.1 shows no difference between the weak and strong cases. It is just a single CAS, CASA, CASL, or CASAL.

The x86-64 row shows the same instruction for both the weak and strong case: lock CMPXCHG.
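The difference between the weak and strong columns can be illustrated with a short Rust sketch. The function names are hypothetical. The third function shows, under the assumption that a failure returning the expected value is spurious, how a retry loop around the weak operation gives the behavior of the strong one, much like the extra CBNZ branch in the strong ARMv8 listing.

```rust
use std::sync::atomic::{
    AtomicI32,
    Ordering::{AcqRel, Acquire},
};

// Strong: fails only if the current value differs from `expected`.
pub fn replace_strong(x: &AtomicI32, expected: i32, new: i32) -> Result<i32, i32> {
    x.compare_exchange(expected, new, AcqRel, Acquire)
}

// Weak: may also fail spuriously, even if the value matched `expected`.
pub fn replace_weak(x: &AtomicI32, expected: i32, new: i32) -> Result<i32, i32> {
    x.compare_exchange_weak(expected, new, AcqRel, Acquire)
}

// Weak plus a retry on spurious failure behaves like the strong version.
pub fn replace_weak_retrying(x: &AtomicI32, expected: i32, new: i32) -> Result<i32, i32> {
    loop {
        match x.compare_exchange_weak(expected, new, AcqRel, Acquire) {
            // The value matched but the store did not happen: try again.
            Err(actual) if actual == expected => continue,
            result => return result,
        }
    }
}
```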

Sixth and last table: fence.

The ARMv8 row shows DMB ISHLD for an acquire fence, and DMB ISH for a release, acquire-release and sequentially consistent fence.

The x86-64 row shows MFENCE for a sequentially consistent fence, and no instructions for the other fences.
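As a minimal sketch of where the fences in the last table would appear in Rust code, the functions below pair a release fence with an acquire fence around relaxed operations. The names `publish` and `try_read` are hypothetical, and the instructions in the comments are the ones the table lists, not verified compiler output.

```rust
use std::sync::atomic::{
    fence, AtomicBool, AtomicI32,
    Ordering::{Acquire, Relaxed, Release},
};

// Writer: store the data, then a release fence, then set the flag.
// Fence table: DMB ISH on ARMv8, no instruction on x86-64.
pub fn publish(data: &AtomicI32, ready: &AtomicBool, value: i32) {
    data.store(value, Relaxed);
    fence(Release);
    ready.store(true, Relaxed);
}

// Reader: check the flag, then an acquire fence, then read the data.
// Fence table: DMB ISHLD on ARMv8, no instruction on x86-64.
pub fn try_read(data: &AtomicI32, ready: &AtomicBool) -> Option<i32> {
    if ready.load(Relaxed) {
        fence(Acquire);
        Some(data.load(Relaxed))
    } else {
        None
    }
}
```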