[ hgemm ] Implement 4x8 hgemm kernel
author skykongkong8 <ss.kong@samsung.com>
Wed, 3 Apr 2024 04:16:06 +0000 (13:16 +0900)
committer Jijoong Moon <jijoong.moon@samsung.com>
Wed, 3 Apr 2024 11:48:34 +0000 (20:48 +0900)
commit 08d02ec03f2d27d5339452216037ee67c0460379
tree 1279935d513b14b92ad2a14356b925e7f0400e5a
parent 5d38c09ace47e17b391a9eb84c681bec4814ffc0
[ hgemm ] Implement 4x8 hgemm kernel

- This commit introduces two types of 4x8 hgemm kernels:
        1. full-fp16
        2. fp16-fp32 partial accumulation
- Additionally, the 4x8 kernel has a macro kernel that can regulate the accuracy-latency tradeoff. By default it accumulates partial sums over blocks of up to 256 values. Other kernels will be refactored in this way ASAP.

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
nntrainer/tensor/hgemm/hgemm_kernel_4x8.h [new file with mode: 0644]