
[llvm-exegesis] Introduce a 'naive' clustering algorithm (PR40880)

Summary:
This is an alternative to D59539.

Let's suppose we have measured 4 different opcodes, and got: `0.5`, `1.0`, `1.5`, `2.0`.
Let's suppose we are using `-analysis-clustering-epsilon=0.5`.
By default, we will now start by processing the `0.5` point, find that `1.0` is its neighbor, and add them to a new cluster.
Then we will notice that `1.5` is a neighbor of `1.0` and add it to that same cluster.
Then we will notice that `2.0` is a neighbor of `1.5` and add it to that same cluster.
So all these points ended up in the same cluster.
This may or may not be a correct implementation of the dbscan clustering algorithm.
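
The chaining described above can be sketched as follows. This is a minimal illustration, not the llvm-exegesis code: `isNeighbor` and `clusterWithExpansion` are hypothetical names, the data is 1D, and dbscan's minimum-points and noise handling are left out on purpose.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: two 1D measurements are neighbors if they differ by
// at most Epsilon.
bool isNeighbor(double A, double B, double Epsilon) {
  return std::fabs(A - B) <= Epsilon;
}

// Returns a cluster id for each point. A cluster grows transitively: the
// neighbors of every point added to the cluster are added as well.
std::vector<int> clusterWithExpansion(const std::vector<double> &Points,
                                      double Epsilon) {
  std::vector<int> ClusterIds(Points.size(), -1); // -1 means "unclustered".
  int NextId = 0;
  for (std::size_t Seed = 0; Seed < Points.size(); ++Seed) {
    if (ClusterIds[Seed] != -1)
      continue;
    ClusterIds[Seed] = NextId;
    std::vector<std::size_t> ToProcess = {Seed};
    while (!ToProcess.empty()) {
      const std::size_t Cur = ToProcess.back();
      ToProcess.pop_back();
      for (std::size_t Q = 0; Q < Points.size(); ++Q) {
        if (ClusterIds[Q] == -1 &&
            isNeighbor(Points[Cur], Points[Q], Epsilon)) {
          ClusterIds[Q] = NextId;
          ToProcess.push_back(Q); // This is the "neighbors of a neighbor" step.
        }
      }
    }
    ++NextId;
  }
  return ClusterIds;
}
```

Calling `clusterWithExpansion({0.5, 1.0, 1.5, 2.0}, 0.5)` assigns the same cluster id to all four points, even though `0.5` and `2.0` are three epsilons apart.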

But this is rather horribly broken when it comes to comparing the clusters with the LLVM sched data.
Let's suppose all those opcodes are currently in the same sched cluster.
If I specify `-analysis-inconsistency-epsilon=0.5`, then no matter
the LLVM values this cluster will **never** match the LLVM values,
and thus this cluster will **always** be displayed as inconsistent.
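
One way to see why such a cluster can never match: assume (this is my reading, not something spelled out here) that a match requires a single LLVM value to be within the epsilon of every measured point. In 1D such a value exists only if the cluster spans at most two epsilons. `someValueMatchesAllPoints` is a hypothetical name used only for this sketch.

```cpp
#include <algorithm>
#include <vector>

// Assumption: a cluster "matches" when one value V satisfies
// |V - P| <= Epsilon for every measured point P. In 1D that is possible
// exactly when max - min <= 2 * Epsilon. Points must be non-empty.
bool someValueMatchesAllPoints(const std::vector<double> &Points,
                               double Epsilon) {
  const auto MinMax = std::minmax_element(Points.begin(), Points.end());
  return *MinMax.second - *MinMax.first <= 2 * Epsilon;
}
```

For the chained cluster `{0.5, 1.0, 1.5, 2.0}` and an epsilon of `0.5`, the span is `1.5`, which is more than `1.0`, so under that assumption no value can match.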

The solution is obviously to split off some of these opcodes into a different sched cluster.
But how do I do that? Out of 4 opcodes displayed in the inconsistency report,
which ones are the "bad ones"? Which ones are the most different from the checked-in data?
I'd need to go into the `.yaml` and look it up manually.

The trivial solution is to not use the full dbscan algorithm when creating clusters,
but instead "pick some unclustered point, pick all unclustered points that are its neighbors,
put them all into a new cluster, repeat". And as it so happens, we can arrive
at that algorithm by not performing the "add neighbors of a neighbor to the cluster" step.

But that won't work well once we teach analyze mode to operate in non-1D mode
(i.e. on more than a single measurement type at a time), because the clustering would
depend on the order of the measurements.
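
The "pick a point, take its unclustered neighbors" idea can be sketched like this. It is a hypothetical 1D illustration (`isNeighbor` and `clusterWithoutExpansion` are made-up names), and it only demonstrates sensitivity to the order in which points are visited, not the multi-measurement case itself.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: two 1D measurements are neighbors if they differ by
// at most Epsilon.
bool isNeighbor(double A, double B, double Epsilon) {
  return std::fabs(A - B) <= Epsilon;
}

// Returns a cluster id for each point. Only direct neighbors of the seed
// point join its cluster; neighbors of neighbors are NOT pulled in.
std::vector<int> clusterWithoutExpansion(const std::vector<double> &Points,
                                         double Epsilon) {
  std::vector<int> ClusterIds(Points.size(), -1); // -1 means "unclustered".
  int NextId = 0;
  for (std::size_t Seed = 0; Seed < Points.size(); ++Seed) {
    if (ClusterIds[Seed] != -1)
      continue;
    ClusterIds[Seed] = NextId;
    for (std::size_t Q = 0; Q < Points.size(); ++Q)
      if (ClusterIds[Q] == -1 && isNeighbor(Points[Seed], Points[Q], Epsilon))
        ClusterIds[Q] = NextId;
    ++NextId;
  }
  return ClusterIds;
}
```

With an epsilon of `0.5`, the input `{0.5, 1.0, 1.5, 2.0}` splits into `{0.5, 1.0}` and `{1.5, 2.0}`, while the same values visited as `{1.0, 0.5, 1.5, 2.0}` split into `{1.0, 0.5, 1.5}` and `{2.0}`. The result depends on which point happens to be picked first.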

Instead, let's just create a single cluster per opcode, and put all the points of that opcode into said cluster.
And simultaneously check that every point in that cluster is a neighbor of every other point in the cluster,
and if they are not, the cluster (==opcode) is unstable.
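
A minimal sketch of that idea, assuming 1D measurements keyed by opcode name. `Measurement`, `OpcodeCluster` and `clusterNaively` are hypothetical names for illustration and do not mirror the types in `Clustering.cpp`.

```cpp
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical input record: one measured value for one opcode.
struct Measurement {
  std::string Opcode;
  double Value;
};

// Hypothetical result: all values of one opcode, plus a stability flag.
struct OpcodeCluster {
  std::vector<double> Values;
  bool Stable = true;
};

// One cluster per opcode. A cluster is marked unstable as soon as any two of
// its points are not neighbors, i.e. differ by more than Epsilon.
std::map<std::string, OpcodeCluster>
clusterNaively(const std::vector<Measurement> &Measurements, double Epsilon) {
  std::map<std::string, OpcodeCluster> Clusters;
  for (const Measurement &M : Measurements) {
    OpcodeCluster &Cluster = Clusters[M.Opcode];
    // Compare the new point against every point already in the cluster, so
    // that every pair ends up being checked exactly once.
    for (double Existing : Cluster.Values)
      if (std::fabs(Existing - M.Value) > Epsilon)
        Cluster.Stable = false;
    Cluster.Values.push_back(M.Value);
  }
  return Clusters;
}
```

Unlike the seed-based approach, the grouping here does not depend on the order of the input: the opcode alone decides the cluster, and the epsilon only decides whether that cluster is reported as stable.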

This is //yet another// step to bring me closer to being able to continue the cleanup of the bdver2 sched model.

Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40880 | PR40880 ]].

Reviewers: courbet, gchatelet

Reviewed By: courbet

Subscribers: tschuett, jdoerfert, RKSimon, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59820

llvm-svn: 357152

author	Roman Lebedev <lebedev.ri@gmail.com>
	Thu, 28 Mar 2019 08:55:01 +0000 (08:55 +0000)
committer	Roman Lebedev <lebedev.ri@gmail.com>
	Thu, 28 Mar 2019 08:55:01 +0000 (08:55 +0000)
commit	c2423fe6899aad89fe0ac2aa4b873cb79ec15bd0
tree	0e4f2a257de05442b34177fc649e9e5b2f5f887d
parent	aff4efffb3ac04a06b0acdb3b948222532bcf15b

llvm/docs/CommandGuide/llvm-exegesis.rst
llvm/test/tools/llvm-exegesis/X86/analysis-clustering-algorithms.test	[new file with mode: 0644]
llvm/test/tools/llvm-exegesis/X86/analysis-naive-cluster-stabilization.test	[new file with mode: 0644]
llvm/test/tools/llvm-exegesis/X86/analysis-naive-clusterization.test	[new file with mode: 0644]
llvm/test/tools/llvm-exegesis/X86/analysis-same-cluster-for-ops-in-different-sched-clusters.test	[new file with mode: 0644]
llvm/tools/llvm-exegesis/lib/Analysis.cpp
llvm/tools/llvm-exegesis/lib/Analysis.h
llvm/tools/llvm-exegesis/lib/Clustering.cpp
llvm/tools/llvm-exegesis/lib/Clustering.h
llvm/tools/llvm-exegesis/llvm-exegesis.cpp
llvm/unittests/tools/llvm-exegesis/ClusteringTest.cpp