KVM: x86: never specify a sample period for virtualized in_tx_cp counters
author    Robert O'Callahan <robert@ocallahan.org>
          Wed, 1 Feb 2017 04:06:11 +0000 (17:06 +1300)
committer Radim Krčmář <rkrcmar@redhat.com>
          Wed, 1 Mar 2017 13:19:46 +0000 (14:19 +0100)

pmc_reprogram_counter() always sets a sample period based on the value of
pmc->counter. However, for in_tx_cp (checkpointed) events, hsw_hw_config()
rejects nonzero sample periods less than 2^31 - 1. So, for example, if a
KVM guest does

    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_RAW;
    attr.size = sizeof(attr);
    attr.config = 0x2005101c4; // conditional branches retired IN_TXCP
    attr.sample_period = 0;
    int fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

the guest kernel counts some conditional branch events, then updates the
virtual PMU register with a nonzero count. The host reaches
pmc_reprogram_counter() with nonzero pmc->counter, triggers EOPNOTSUPP
in hsw_hw_config(), prints "kvm_pmu: event creation failed" in
pmc_reprogram_counter(), and silently (from the guest's point of view) stops
counting events.
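
The arithmetic behind this failure can be sketched in a few lines of
standalone C. This is only a model, not kernel code: the 48-bit counter
width and both helper names are assumptions made for illustration, and the
zero-period exemption is inferred from the fix this commit makes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: a 48-bit counter; the real width comes from pmc_bitmask(pmc). */
#define MODEL_COUNTER_MASK ((UINT64_C(1) << 48) - 1)

/* Mirrors attr.sample_period = (-pmc->counter) & pmc_bitmask(pmc). */
static uint64_t model_sample_period(uint64_t counter)
{
    return (0 - counter) & MODEL_COUNTER_MASK;
}

/*
 * Hypothetical stand-in for the period check described above: a nonzero
 * period below 2^31 - 1 is refused for an in_tx_cp event.
 */
static bool model_period_rejected(uint64_t period)
{
    return period != 0 && period < UINT64_C(0x7fffffff);
}
```

In this model a counter of zero yields a period of zero, which passes the
check, while a counter sitting 1000 events short of overflow yields a
period of 1000, which is rejected.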

We fix event counting by forcing attr.sample_period to always be zero for
in_tx_cp counters. Sampling doesn't work, but it already didn't work and
can't be fixed without major changes to the approach in hsw_hw_config().
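
As a rough illustration of the new behaviour, the period selection can be
modelled outside the kernel as below. The helper name and its parameters
are hypothetical; `bitmask` stands in for pmc_bitmask(pmc), and the real
change is the one in the diff for arch/x86/kvm/pmu.c.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): chooses the sample period
 * the way the patched pmc_reprogram_counter() does.
 */
static uint64_t model_fixed_period(uint64_t counter, uint64_t bitmask,
                                   bool in_tx_cp)
{
    /* Default: number of events remaining until the counter overflows. */
    uint64_t period = (0 - counter) & bitmask;

    /* Checkpointed events never request sampling, so creation can succeed. */
    if (in_tx_cp)
        period = 0;

    return period;
}
```

The default period is computed first and then overridden, which matches the
ordering the patch introduces: the assignment moves above the in_tx_cp
branch so the branch can clear it.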

Signed-off-by: Robert O'Callahan <robert@ocallahan.org>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 06ce377dcbc9ffb40a655a89c9f0a43855f2a732..026db42a86c3236d9be95e84319c9bb3ed85db75 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -113,12 +113,19 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
                .config = config,
        };
 
+       attr.sample_period = (-pmc->counter) & pmc_bitmask(pmc);
+
        if (in_tx)
                attr.config |= HSW_IN_TX;
-       if (in_tx_cp)
+       if (in_tx_cp) {
+               /*
+                * HSW_IN_TX_CHECKPOINTED is not supported with nonzero
+                * period. Just clear the sample period so at least
+                * allocating the counter doesn't fail.
+                */
+               attr.sample_period = 0;
                attr.config |= HSW_IN_TX_CHECKPOINTED;
-
-       attr.sample_period = (-pmc->counter) & pmc_bitmask(pmc);
+       }
 
        event = perf_event_create_kernel_counter(&attr, -1, current,
                                                 intr ? kvm_perf_overflow_intr :