Question

This is a lock that can be held by only one thread of execution at a time. An attempt to acquire the lock by another thread of execution makes the latter loop until the lock is released.

How does it handle the case when two threads try to acquire the lock exactly the same time?

I think this question also applies to various of other mutex implementation.

Answer 1

As the previous poster indicates, every modern machine type has a special class of instruction known as atomics that do operate as the previous poster indicates... they serialize execution against at least the specified memory location.

On x86, there is a LOCK assembler prefix that indicates to the machine that the next instruction should be handled atomically. When the instruction is encountered, several things effectively happen on x86.

Pending read prefetches are canceled (this means that the CPU won t present data to the program that may be made stale across the atomic).
Pending writes to memory are flushed.
The operation is performed, guaranteed atomically and serialized against other CPUs. In this context, serialized means they happen one-at-a-time . Atomically means "all the parts of this instruction happen without anything else intervening".

For x86, there are two commonly used instructions that are used to implement locks.

CMPXCHG. Conditional exchange. Pseudocode:

uint32 cmpxchg(uint32 *memory_location, uint32 old_value, uint32 new_value) {
    atomically {
        if (*memory_location == old_value) 
            *memory_location = new_value;
        return old_value;
    }
}

XCHG. Pseudocode:

uint32 xchg(uint32 *memory_location, uint32 new_value) {
    atomically {
        uint32 old_value = *memory_location;
        *memory_location = new_value;
        return *old_value;
    }
}

So, you can implement a lock like this:

uint32 mylock = 0;
while (cmpxchg(&mylock, 0, 1) != 0)
    ;

We spin, waiting for the lock, hence, spinlock.

Now, unlocked instructions don t exhibit these nice behaviors. Depending on what machine you re on, with unlocked instructions, all sorts of violations of consistency can be observed. For example, even on x86, which has a very friendly memory consistency model, the following could be observed:

    Thread 1      Thread 2
    mov [w], 0    mov [x], 0
    mov [w], 1    mov [x], 2
    mov eax, w    mov eax, x
    mov [y], eax  mov [z], eax

At the end of this program, y and z can both have the value 0!.

Anyway, one last note: LOCK on x86 can be applied to ADD, OR, and AND, in order to get consistent and atomic read-modify-write semantics for the instruction. This is important for, say, setting flag variables and making sure they don t get lost. Without that, you have this problem:

   Thread 1       Thread 2
   AND [x], 0x1   AND [x], 0x2

At the end of this program, possible values for x are 1, 2, and 0x1|0x2 (3). In order to get a correct program, you need:

   Thread 1           Thread 2
   LOCK AND [x], 0x1  LOCK AND [x], 0x2

Hope this helps.

Answer 2

Depends on the processor and the threading implementation. Most processors have instructions that can be executed atomically, on top of which you can build things like spin locks. For example IA-32 has an xchg instruction that does an atomic swap. You can then implement a naive spinlock like:

  eax = 1;
  while( xchg(eax, lock_address) != 0 );
  // now I have the lock
  ... code ...
  *lock_address = 0; // release the lock

友情链接