morph ocl kernel for erode and dilate filter
This kernel is for CV_8UC1 format and 3x3 kernel size,
It is about 33% ~ 55% faster than current ocl kernel with below perf test
python ./modules/ts/misc/run.py -t imgproc --gtest_filter=OCL_ErodeFixture*
python ./modules/ts/misc/run.py -t imgproc --gtest_filter=OCL_DilateFixture*
Also add accuracy test cases for this kernel, the test command is
./bin/opencv_test_imgproc --gtest_filter=OCL_Filter/MorphFilter3x3*
Signed-off-by: Li Peng <peng.li@intel.com>