DYNIX/ptx used an explicit memory barrier for publication, but had nothing
resembling <tt>rcu_dereference()</tt> for subscription, nor did it
have anything resembling the <tt>smp_read_barrier_depends()</tt>
-that was later subsumed into <tt>rcu_dereference()</tt>.
+that was later subsumed into <tt>rcu_dereference()</tt> and later
+still into <tt>READ_ONCE()</tt>.
The need for these operations made itself known quite suddenly at a
late-1990s meeting with the DEC Alpha architects, back in the days when
DEC was still a free-standing company.
Note that if checks for being within an RCU read-side
critical section are not required and the pointer is never
dereferenced, rcu_access_pointer() should be used in place
- of rcu_dereference(). The rcu_access_pointer() primitive
- does not require an enclosing read-side critical section,
- and also omits the smp_read_barrier_depends() included in
- rcu_dereference(), which in turn should provide a small
- performance gain in some CPUs (e.g., the DEC Alpha).
+ of rcu_dereference().
o The comparison is against a pointer that references memory
that was initialized "a long time ago." The reason
#define rcu_dereference(p) \
({ \
- typeof(p) _________p1 = p; \
- smp_read_barrier_depends(); \
+ typeof(p) _________p1 = READ_ONCE(p); \
(_________p1); \
})
Note the use of READ_ONCE() and smp_load_acquire() to read the
opposition index. This prevents the compiler from discarding and
-reloading its cached value - which some compilers will do across
-smp_read_barrier_depends(). This isn't strictly needed if you can
+reloading its cached value. This isn't strictly needed if you can
be sure that the opposition index will _only_ be used the once.
The smp_load_acquire() additionally forces the CPU to order against
subsequent memory references. Similarly, smp_store_release() is used
GENERAL mb() smp_mb()
WRITE wmb() smp_wmb()
READ rmb() smp_rmb()
- DATA DEPENDENCY read_barrier_depends() smp_read_barrier_depends()
+ DATA DEPENDENCY READ_ONCE()
All memory barriers except the data dependency barriers imply a compiler
Other CPUs may also have split caches, but must coordinate between the various
cachelets for normal memory accesses. The semantics of the Alpha removes the
-need for coordination in the absence of memory barriers.
+need for hardware coordination in the absence of memory barriers, which
+permitted Alpha to sport higher CPU clock rates back in the day. However,
+please note that smp_read_barrier_depends() should not be used except in
+Alpha arch-specific code and within the READ_ONCE() macro.
CACHE COHERENCY VS DMA