[CODEGEN] Support cuda tensorcore subbyte int data type in auto tensorcore (#4546)
* support cuda tensorcore subbyte int data type in auto tensorcore
* add license
* pass cpplint
* fix code review comments
* merge the int4/int1 codegen tutorial into the existing auto tensorcore tutorial
* use master's new API
* disable tuning when cuda is not enabled
* address cr comment
* do not run the tuning
* fix test failure
* fix cpplint error
* fix bool type reduction bug
* 1. fix an index bug 2. fix the returned bytes value of int1/int4/uint4
* fix typo