Improve NHWC depthwise convolution for AArch64 (#6095)
* Improve NHWC depthwise convolution for AArch64
We created a default schedule, depthwise_conv2d_nhwc (no auto-tuning or
tensorization), that gives reasonable performance for depthwise
convolution with the NHWC layout on AArch64.
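For reference, the computation that such a schedule optimizes can be written as a plain nested-loop depthwise convolution in NHWC. This is a minimal, unoptimized sketch and not the TVM implementation: the function name depthwise_conv2d_nhwc_ref is hypothetical, and it assumes a [KH][KW][C][M] kernel layout (M = channel multiplier), square stride, and symmetric zero padding.

```python
def depthwise_conv2d_nhwc_ref(data, kernel, stride=1, pad=0):
    """Hypothetical reference depthwise conv (not TVM code).

    data:   nested lists shaped [N][H][W][C]
    kernel: nested lists shaped [KH][KW][C][M], M = channel multiplier
    Returns nested lists shaped [N][OH][OW][C * M].
    """
    n_batch = len(data)
    in_h, in_w, in_c = len(data[0]), len(data[0][0]), len(data[0][0][0])
    k_h, k_w = len(kernel), len(kernel[0])
    mult = len(kernel[0][0][0])
    out_h = (in_h + 2 * pad - k_h) // stride + 1
    out_w = (in_w + 2 * pad - k_w) // stride + 1
    out = [[[[0.0] * (in_c * mult) for _ in range(out_w)]
            for _ in range(out_h)] for _ in range(n_batch)]
    for b in range(n_batch):
        for i in range(out_h):
            for j in range(out_w):
                # Channels are innermost in NHWC, i.e. the contiguous axis.
                for c_out in range(in_c * mult):
                    # Each output channel reads exactly one input channel.
                    c_in, m = divmod(c_out, mult)
                    acc = 0.0
                    for di in range(k_h):
                        for dj in range(k_w):
                            y = i * stride + di - pad
                            x = j * stride + dj - pad
                            # Out-of-bounds taps read the implicit zero padding.
                            if 0 <= y < in_h and 0 <= x < in_w:
                                acc += data[b][y][x][c_in] * kernel[di][dj][c_in][m]
                    out[b][i][j][c_out] = acc
    return out
```

Because NHWC keeps channels contiguous in memory, the channel loop is a natural candidate for SIMD vectorization in an optimized schedule; this sketch only fixes the semantics.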
Change-Id: I01e32903f6c1950623f33eae18484e70244fe0af
* Add tuning knobs in depthwise schedule
Change-Id: I15080e7f12b16e6c6aba99a04e42023845eeabf1
* Introduce padding policy
Change-Id: If12a6d05dce9153861550ddef1ee5216809dd1e1
* Vectorize padding
Change-Id: I7e2062a40358bf111c0366a449945eb077fb2e30
* Legalize depthwise convolution (2x improvement) and fix tuning issue
Change-Id: I4b82c58b167e40b0b7747d28293bbb488c505dd9
* Add assert on padding
Change-Id: Idf8eeaaface5eb7799109cd00f437e404778b9cd
* Fix Python linting
Change-Id: Iac16a8daea1268f0eb331fe4ec18a62408106cf9
* Remove commented-out code
Change-Id: I1412f22ad9864273d77a7bf38a6768694339b7f0
* Revert test file to make CI pass
Change-Id: Ica3eff8f9f0fd4c6f32f7ae80adc922f8b16cec9
* Enable only arm_cpu tests
Change-Id: Icbaafcb39e892a5d1a4685133c1699e4d1a8e07e
* Rebase
Change-Id: Ibb23f1d4e0d0107e4e3b3571437161cdc2ee2909