[OpenMP][DeviceRTL] Fixed an issue that causes hang in SU3
The synchronization at the end of parallel region cannot make sure all threads
exit the scope. As a result, the assertions right after it might be hit, and
further the `state::assumeInitialState(IsSPMD)` in `__kmpc_target_deinit` may
not hold as well. We either add a synchronization right after the parallel region,
or remove the assertions and assuptions. Here we choose the first one as those
assertions and assumptions can help optimizations.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D112861