This patch prefetches RxD descriptors, which helps hide the latency of
a cache miss in vxge_hw_ring_rxd_next_completed().  In netperf profiling
on a P4 Xeon this lowers the percentage of CPU time spent in
vxge_hw_ring_rxd_next_completed(), where the descriptor is accessed,
from 1.5% to 1.0%.

Signed-off-by: Benjamin LaHaise <ben.lahaise@neterion.com>
Signed-off-by: Sreenivasa Honnur <sreenivasa.honnur@neterion.com>
Signed-off-by: Ramkrishna Vepa <ram.vepa@neterion.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
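
Note (not part of the commit message): a minimal userspace sketch of the
first prefetch in the hunks below, which touches the cache line 64 bytes
into the descriptor at the top of the completion loop so the field reads
overlap with the memory access.  __builtin_prefetch() stands in for the
kernel's prefetch(); struct fake_rxd, CACHE_LINE_BYTES and process_ring()
are illustrative assumptions, not the vxge driver's real layout or API.

#include <stddef.h>
#include <stdio.h>

#define CACHE_LINE_BYTES 64		/* assumed L1 line size */

struct fake_rxd {			/* invented stand-in, not the real RxD private area */
	unsigned long control;
	void *skb;
	size_t data_size;
	char pad[104];			/* make the descriptor span more than one cache line */
};

static void process_ring(struct fake_rxd *ring, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct fake_rxd *rxd = &ring[i];

		/* Hint the CPU to start loading the bytes one cache line
		 * past the start of this descriptor now, so the field
		 * reads below overlap with the miss instead of stalling. */
		__builtin_prefetch((char *)rxd + CACHE_LINE_BYTES);

		printf("rxd %zu: data_size=%zu\n", i, rxd->data_size);
	}
}

int main(void)
{
	struct fake_rxd ring[8] = { { .data_size = 1514 } };

	process_ring(ring, 8);
	return 0;
}
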
 	vxge_hw_ring_replenish(ringh, 0);
 	do {
+		prefetch((char *)dtr + L1_CACHE_BYTES);
 		rx_priv = vxge_hw_ring_rxd_private_get(dtr);
 		skb = rx_priv->skb;
 		data_size = rx_priv->data_size;
 	vxge_assert(channel->compl_index < channel->length);
 	*dtrh = channel->work_arr[channel->compl_index];
+	prefetch(*dtrh);
 }
 /*
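
The second added line applies the same idea on the completion path: the
descriptor handle is prefetched as soon as it is read out of the work
array, so the cache line is being filled while the caller finishes its
remaining work and before it dereferences the handle.  A hedged sketch
of that shape, again with __builtin_prefetch() in place of the kernel's
prefetch(); fake_channel, work_arr and next_completed() are illustrative
names only, not the real vxge channel structures.

#include <stddef.h>

struct fake_channel {			/* illustrative, not the real __vxge_hw_channel */
	void **work_arr;		/* descriptors awaiting completion */
	unsigned int compl_index;
	unsigned int length;
};

static int next_completed(struct fake_channel *ch, void **dtrh)
{
	if (ch->compl_index >= ch->length)
		return -1;

	*dtrh = ch->work_arr[ch->compl_index];
	if (*dtrh) {
		/* Begin pulling the descriptor into cache immediately;
		 * the caller dereferences it shortly after we return,
		 * by which time the line is more likely to be resident. */
		__builtin_prefetch(*dtrh);
		return 0;
	}
	return -1;
}

int main(void)
{
	void *slots[2] = { &slots[0], &slots[1] };	/* dummy completed descriptors */
	struct fake_channel ch = { .work_arr = slots, .compl_index = 0, .length = 2 };
	void *dtrh;

	return next_completed(&ch, &dtrh);		/* 0 on success */
}
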