Diff from upstream:
Removed clang-specific branch for x86 DCAS-based loads.
The storage to load from is const-qualified and DCAS via compiler intrinsics
require an unqualified pointer. Use asm implementation instead, which should be
as efficient as intrinsics, if not better, in this case.
6e14ca24da