Auto TensorCore CodeGen (#4234)
* Add Auto TensorCore Unit Test
* Rebase to tvm master branch & Add auto tensor core
* Code Refine
* Add tensor core switch by pragma
* Add pragma in tensor core example code
* Get real tile size to replace hard coded 16
* support more than 2 dimensions (e.g. batch matmul) for buffer bind scope
* support batch matmul
* Move cuda env check to tensor_core.cc
* Code refine for tensor_core.cc
* Refine comments
* Some refinements of code and comment
* Update TensorCore UT to pass the CPU test
* remove redundant code
* matmul's storage align for different layouts
* Add support for different positions of type cast
* Add formal tutorial for auto tensorcore codegen
* move tensorcore check up to tutorial code
* code and doc refine
* comment out tune_and_evaluate in tutorial
* fix cpplint error