address CUDA-related errors and enable CUDA in elementwise ops