added v_reduce_sum4() universal intrinsic; corrected number of threads in cv::getNumT...
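As we understand OpenCV's universal intrinsics, `v_reduce_sum4(a, b, c, d)` takes four 4-lane float vectors. It returns one vector whose lane i holds the horizontal sum of the i-th input. The scalar sketch below illustrates only those assumed semantics. `Vec4f`, `hsum` and `reduce_sum4_ref` are hypothetical names used for illustration. This is not the SIMD implementation from the commit.

```cpp
#include <array>

// Hypothetical 4-lane float vector standing in for v_float32x4.
using Vec4f = std::array<float, 4>;

// Horizontal sum of all four lanes of one vector.
static float hsum(const Vec4f& v) {
    float s = 0.f;
    for (float x : v) s += x;
    return s;
}

// Hypothetical scalar reference: lane 0 = sum(a), lane 1 = sum(b),
// lane 2 = sum(c), lane 3 = sum(d).
Vec4f reduce_sum4_ref(const Vec4f& a, const Vec4f& b,
                      const Vec4f& c, const Vec4f& d) {
    return Vec4f{hsum(a), hsum(b), hsum(c), hsum(d)};
}
```

Packing four reductions into one result vector avoids four separate scalar extractions. This is the usual reason for a batched horizontal-sum intrinsic, for example when computing four dot products at once.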