Use acq/rel semantics to pass flags/pointers in getrf_parallel.
author Ali Saidi <alisaidi@amazon.com>
Mon, 24 Feb 2020 05:45:30 +0000 (05:45 +0000)
committer Ali Saidi <alisaidi@amazon.com>
Fri, 6 Mar 2020 06:22:31 +0000 (06:22 +0000)
commit 208c7e7ca50a8bfdfabbec750bdc538023c94aed
tree ed48f5c6912392be705e0f43f91446805f1f7d38
parent 014fc13995e12fef81e94d06ca6ac8dd3f6c58c5

The current implementation uses locks, but each lock protects a
critical section that touches only one variable, so atomic loads
and stores with acquire/release ordering can achieve the same
behavior.
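
As an illustration of the acquire/release hand-off described above, here
is a minimal C11 sketch. It is not the code from getrf_parallel.c: the
names handoff_t, handoff_producer, handoff_consume and run_handoff are
hypothetical, and the use of <stdatomic.h> is an assumption made for the
example. The writer fills in plain data and then sets a flag with a
release store; the reader spins on an acquire load of that flag, which
makes the data written before the store visible to it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical types and names, for illustration only. */
typedef struct {
    int input;         /* value the producer will publish */
    int payload;       /* plain (non-atomic) data guarded by the flag */
    atomic_int ready;  /* the single variable that used to need a lock */
} handoff_t;

static void *handoff_producer(void *arg) {
    handoff_t *h = arg;
    h->payload = h->input;  /* ordinary write, done before the flag is set */
    /* Release store: the payload write above cannot be reordered after it. */
    atomic_store_explicit(&h->ready, 1, memory_order_release);
    return NULL;
}

static int handoff_consume(handoff_t *h) {
    /* Acquire load: once it reads 1, the payload write is visible here. */
    while (atomic_load_explicit(&h->ready, memory_order_acquire) == 0)
        ;  /* spin without taking any lock */
    return h->payload;
}

/* Publishes `value` from a second thread and returns what the caller read. */
int run_handoff(int value) {
    handoff_t h;
    h.input = value;
    h.payload = 0;
    atomic_init(&h.ready, 0);

    pthread_t t;
    int rc = pthread_create(&t, NULL, handoff_producer, &h);
    assert(rc == 0);
    (void)rc;

    int seen = handoff_consume(&h);
    pthread_join(t, NULL);
    return seen;
}
```

Calling run_handoff(42) returns 42. The spinning reader never holds
anything the writer needs, so the writer cannot be blocked by it.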

As noted in the previous patch, pthread_mutex_lock isn't fair: in a
tight loop, the thread that last held the lock can keep reacquiring
it and starve another thread, even if that thread is about to write
the data that would stop the first thread from spinning.
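
The starvation pattern described above can be sketched roughly as
follows. This is an assumed shape of a lock-protected polling loop, not
the code that was removed from getrf_parallel.c; locked_flag_t,
flag_writer and poll_with_lock are hypothetical names. The poller locks,
reads one variable, and unlocks in a tight loop, while the writer must
win the same mutex once to set the flag. Nothing in pthread_mutex_lock
guarantees the writer gets its turn promptly.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical types and names, for illustration only. */
typedef struct {
    pthread_mutex_t lock;
    int flag;  /* the only variable the lock protects */
} locked_flag_t;

static void *flag_writer(void *arg) {
    locked_flag_t *f = arg;
    /* May wait here while the poller keeps reacquiring the mutex. */
    pthread_mutex_lock(&f->lock);
    f->flag = 1;
    pthread_mutex_unlock(&f->lock);
    return NULL;
}

/* Polls the flag under the lock; returns how many polls were needed. */
long poll_with_lock(void) {
    locked_flag_t f;
    f.flag = 0;
    pthread_mutex_init(&f.lock, NULL);

    pthread_t t;
    int rc = pthread_create(&t, NULL, flag_writer, &f);
    assert(rc == 0);
    (void)rc;

    long polls = 0;
    int seen;
    do {
        /* Each iteration competes with the writer for the same mutex. */
        pthread_mutex_lock(&f.lock);
        seen = f.flag;
        pthread_mutex_unlock(&f.lock);
        polls++;
    } while (!seen);

    pthread_join(t, NULL);
    pthread_mutex_destroy(&f.lock);
    return polls;
}
```

The loop always terminates, but the number of polls it takes depends on
when the writer manages to acquire the mutex, which is what the unfair
lock leaves unspecified.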

On a 64-core Arm system this improves performance by 20x on sgesv.goto.
lapack/getrf/getrf_parallel.c