* [VTA] Performance optimization: remove unnecessary contiguous memory use.
Issue:
The uop queue maintains a cache vector used to copy uop data into contiguous
DRAM memory for FPGA/simulator use, but this vector is not cleared after the
FPGA/simulator core runs. In the ResNet18 case, printing the cache size in
UopQueue::ReadBarrier shows that it keeps increasing, which causes
useless data copies and unnecessary contiguous DRAM allocations.
Analysis:
The issue is caused by the cache_ vector not being cleared when
uop_queue_.Reset() is called.
Solution:
Override the BaseQueue Reset function in UopQueue and add logic to clear
cache_.
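A minimal, self-contained sketch of the pattern described above, not the actual VTA runtime code. The `Uop` struct, `Push`, `PushToBuffer`, `MarkFlushed`, `CacheSize`, and `CacheSizeAfterRuns` are hypothetical stand-ins added for illustration; only `BaseQueue`, `UopQueue`, `Reset`, `cache_`, `cache_idx_`, and `dram_buffer_` mirror names from the diff.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for VTAUop; the real type lives in the VTA runtime.
struct Uop {
  int value;
};

template <class T>
class BaseQueue {
 public:
  virtual ~BaseQueue() = default;
  // Virtual so that derived queues can extend the reset logic.
  virtual void Reset() { dram_buffer_.clear(); }
  // Hypothetical helper: append one item to the base buffer.
  void PushToBuffer(const T& item) { dram_buffer_.push_back(item); }

 protected:
  std::vector<T> dram_buffer_;
};

class UopQueue : public BaseQueue<Uop> {
 public:
  // Hypothetical helper: record a uop in both the cache and the base buffer.
  void Push(const Uop& uop) {
    cache_.push_back(uop);
    PushToBuffer(uop);
  }
  // Hypothetical helper: mark every cached entry as already copied out.
  void MarkFlushed() { cache_idx_ = cache_.size(); }
  // Clear the derived-class cache, then reset the base queue buffer.
  void Reset() override {
    cache_.clear();
    cache_idx_ = 0;
    BaseQueue<Uop>::Reset();
  }
  std::size_t CacheSize() const { return cache_.size(); }

 private:
  std::vector<Uop> cache_;
  std::size_t cache_idx_ = 0;
};

// Simulates several core runs and returns the cache size after the last reset.
std::size_t CacheSizeAfterRuns(int runs, int uops_per_run) {
  UopQueue queue;
  BaseQueue<Uop>& base = queue;  // reset through the base-class interface
  for (int r = 0; r < runs; ++r) {
    for (int i = 0; i < uops_per_run; ++i) {
      queue.Push(Uop{i});
    }
    queue.MarkFlushed();
    base.Reset();  // dispatches to UopQueue::Reset because Reset is virtual
  }
  return queue.CacheSize();
}
```

In this sketch, resetting through a `BaseQueue<Uop>&` still empties `cache_`, so the cache no longer grows from one run to the next. If `Reset` were not virtual, the same call would run only the base version and leave the cache untouched.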
* Address review comments, remove spacing.
* \brief Reset the pointer of the buffer.
* Set SRAM pointer to be the current end.
*/
- void Reset() {
+ virtual void Reset() {
dram_buffer_.clear();
sram_begin_ = sram_end_;
}
sram_begin_ = sram_end_;
}
}
+ /*! \brief clear cache and reset base queue buffer.*/
+ void Reset() {
+ cache_.clear();
+ cache_idx_ = 0;
+ BaseQueue<VTAUop>::Reset();
+ }
void AutoReadBarrier() {
ReadBarrier();
}