Improve BitVecOps<>::Iter::NextElem (#11696)
* Improve BitVecOps<>::Iter::NextElem
Tweak the implementation, to reduce the number of instructions
executed in the hot path.
Also, don't pass "env" to NextElem; it can be stored by Init()
if required. For non-inlined calls, this saves setting up one
argument.
Use a `m_bsEnd` end condition. This eliminates the need to handle
short/long differently, and reduces conditions when updating
the current bits to iterate over in the long case.
Overall, pin shows this reduces instruction count of superpmi over
a minopts test run by 2.6% (NextElem is very hot).
Also, fix BitSetAsUInt64 NextElem() iterator: It should store and updated
its own bit count, and not depend on the value passed in to be the
correct latest bit count.
17 files changed: