libstdc++: std::includes performance tweak
A small tweak to the implementation of __includes, which in my
application saves 20% of the running time. I noticed it because using
range-v3 was giving unexpected performance gains.
Some of the gain comes from pulling the 2 calls ++__first1 out of the
condition so there is just one call. And most of the gain comes from
replacing the resulting
if (__comp(__first1, __first2))
;
else
++__first2;
with
if (!__comp(__first1, __first2))
++__first2;
I was very surprised that the code ended up being so different for such
a change, and I still don't really understand where the extra time is
going...
Anyway, while I blame the compiler for not generating very good code
with the current implementation, I believe the change can be seen as a
simplification.
libstdc++-v3/ChangeLog:
* include/bits/stl_algo.h (__includes): Simplify the code.