Fix alignment issues for Fake BFP16 fp32 -> bfp16 rounding routines (#18321)
author Jianyu Huang <jianyuhuang@fb.com>
Fri, 22 Mar 2019 19:28:04 +0000 (12:28 -0700)
committer Facebook Github Bot <facebook-github-bot@users.noreply.github.com>
Fri, 22 Mar 2019 19:41:58 +0000 (12:41 -0700)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18321

The scalar tail loop of `fp32_to_bfp16_round` writes a 256-bit vector into an on-stack temporary with `_mm256_store_si256`, which requires a 32-byte-aligned destination. The temporary was only declared `alignas(8)`, so the aligned store could fault or invoke undefined behavior; bump the buffer to `alignas(64)`.

Reviewed By: jspark1105

Differential Revision: D14575512

fbshipit-source-id: 0e33cdab54b1aef8b67f0b4c366692c5dbdf631d

caffe2/quantization/server/fully_connected_fake_lowp_op_avx2.cc

index c8ca69e..414bfe2 100644
@@ -93,7 +93,7 @@ void fp32_to_bfp16_round(const float* source, size_t size, float* dest) {
         reinterpret_cast<__m256i*>(&dest[i]), _mm256_and_si256(wmask, v32int));
   }
   for (auto i = (size / 8) * 8; i < size; i++) {
-    alignas(8) float tmp[8];
+    alignas(64) float tmp[8];
     __m256i v32int = _mm256_add_epi32(
         _mm256_set1_epi32(*reinterpret_cast<const int*>(&source[i])), woffset);
     _mm256_store_si256(
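
For context, below is a minimal standalone sketch of the tail-loop pattern this patch touches; it is not the Caffe2 routine itself. The bias and mask constants (`kRoundBias`, `kHighMask`) and the helper name `round_to_bfp16` are assumptions for illustration, since the real `woffset`/`wmask` are defined earlier in fully_connected_fake_lowp_op_avx2.cc. The point it demonstrates is that `_mm256_store_si256` performs an aligned 256-bit store, so the on-stack temporary must be at least 32-byte aligned; `alignas(64)` (one cache line) satisfies that, while the previous `alignas(8)` did not.

    #include <immintrin.h>
    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    // Assumed rounding bias and mantissa mask, for illustration only; the real
    // routine's constants are defined elsewhere in the file.
    constexpr int32_t kRoundBias = 0x7FFF;
    constexpr int32_t kHighMask = static_cast<int32_t>(0xFFFF0000u);

    // Scalar-tail pattern: broadcast one fp32's bit pattern into a 256-bit
    // register, add the rounding bias, mask off the low 16 mantissa bits, and
    // store the vector to a stack temporary. _mm256_store_si256 demands a
    // 32-byte-aligned destination, so the temporary must be alignas(32) or
    // stricter; the patch uses alignas(64).
    static float round_to_bfp16(float x) {
      const __m256i woffset = _mm256_set1_epi32(kRoundBias);
      const __m256i wmask = _mm256_set1_epi32(kHighMask);

      int32_t bits;
      std::memcpy(&bits, &x, sizeof(bits));

      alignas(64) float tmp[8]; // alignas(8) here would make the aligned store UB
      __m256i v32int = _mm256_add_epi32(_mm256_set1_epi32(bits), woffset);
      _mm256_store_si256(
          reinterpret_cast<__m256i*>(tmp), _mm256_and_si256(wmask, v32int));
      return tmp[0];
    }

    int main() {
      // 1 + 2^-9 is not representable in bfloat16; adding the bias and masking
      // the low 16 bits rounds it to 1.0.
      float x = 1.001953125f;
      std::printf("%.9f -> %.9f\n", x, round_to_bfp16(x));
      return 0;
    }

Built with, e.g., `g++ -O2 -mavx2`, this prints the input alongside its bfloat16-rounded value; the same store would be undefined behavior if `tmp` were only 8-byte aligned.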