From: Yonghong Song
Date: Tue, 18 Jul 2017 21:12:10 +0000 (-0700)
Subject: generate proper usdt code to prevent llvm meddling with ctx->#fields
X-Git-Tag: submit/tizen_4.0/20171018.110122~60^2
X-Git-Url: http://review.tizen.org/git/?a=commitdiff_plain;h=8206f547b8e3a669db49a3326affbef7012fec49;p=platform%2Fupstream%2Fbcc.git

generate proper usdt code to prevent llvm meddling with ctx->#fields

Qin reported a test case where llvm still messes with ctx->#field
accesses. For code like the following:

  switch(ctx->ip) {
      case 0x7fdf2ede9820ULL: *((int64_t *)dest) = *(volatile int64_t *)&ctx->r12; return 0;
      case 0x7fdf2edecd9cULL: *((int64_t *)dest) = *(volatile int64_t *)&ctx->bx; return 0;
  }

the compiler still generates:

  # r1 is the pointer to the ctx
  r1 += 24
  goto LBB0_4
LBB0_3:
  r1 += 40
LBB0_4:
  r3 = *(u64 *)(r1 + 0)

The verifier will reject the above code since the last load is not in
the "ctx + field_offset" form.

The responsible llvm optimization pass is CFGSimplifyPass, whose main
implementation is in llvm/lib/Transforms/Utils/SimplifyCFG.cpp. The
routine that performs the optimization is SinkThenElseCodeToEnd, and
the routine canSinkInstructions determines whether an insn is a
candidate for sinking. Unfortunately, a volatile load/store does not
disqualify an insn from sinking, but inline assembly does.

In this patch, instead of using volatile to annotate the ctx->#field
access, we do a normal ctx->#field access but place a compiler inline
assembly memory barrier

  __asm__ __volatile__("": : :"memory");

after the field access.

Tested with the usdt unit test case, the usdt_samples example, and a
couple of usdt unit tests developed in the past.

Signed-off-by: Yonghong Song
---

diff --git a/src/cc/usdt.h b/src/cc/usdt.h
index ab14f2b1..313804f0 100644
--- a/src/cc/usdt.h
+++ b/src/cc/usdt.h
@@ -35,6 +35,9 @@ class ArgumentParser;
 static const std::string USDT_PROGRAM_HEADER =
     "#include <uapi/linux/ptrace.h>\n";
 
+static const std::string COMPILER_BARRIER =
+    "__asm__ __volatile__(\"\": : :\"memory\");";
+
 class Argument {
 private:
   optional<int> arg_size_;
diff --git a/src/cc/usdt_args.cc b/src/cc/usdt_args.cc
index ee1a3ee5..42fb360f 100644
--- a/src/cc/usdt_args.cc
+++ b/src/cc/usdt_args.cc
@@ -67,20 +67,27 @@ bool Argument::assign_to_local(std::ostream &stream,
   }
 
   if (!deref_offset_) {
-    tfm::format(stream, "%s = *(volatile %s *)&ctx->%s;", local_name, ctype(),
-                *base_register_name_);
+    tfm::format(stream, "%s = ctx->%s;", local_name, *base_register_name_);
+    // Put a compiler barrier to prevent optimization
+    // like llvm SimplifyCFG SinkThenElseCodeToEnd.
+    // Volatile marking is not sufficient to prevent such optimization.
+    tfm::format(stream, " %s", COMPILER_BARRIER);
     return true;
   }
 
   if (deref_offset_ && !deref_ident_) {
-    tfm::format(stream, "{ u64 __addr = (*(volatile u64 *)&ctx->%s) + (%d)",
+    tfm::format(stream, "{ u64 __addr = ctx->%s + %d",
                 *base_register_name_, *deref_offset_);
     if (index_register_name_) {
       int scale = scale_.value_or(1);
-      tfm::format(stream, " + ((*(volatile u64 *)&ctx->%s) * %d);", *index_register_name_, scale);
+      tfm::format(stream, " + (ctx->%s * %d);", *index_register_name_, scale);
     } else {
       tfm::format(stream, ";");
     }
+    // Theoretically, llvm SimplifyCFG SinkThenElseCodeToEnd may still
+    // sink the bpf_probe_read call, so put a barrier here to prevent
+    // sinking of ctx->#fields.
+    tfm::format(stream, " %s ", COMPILER_BARRIER);
     tfm::format(stream,
                 "%s __res = 0x0; "
                 "bpf_probe_read(&__res, sizeof(__res), (void *)__addr); "
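
For illustration, with this patch the generated code for Qin's test
case takes roughly the following shape (a sketch only, reusing the
example's dest pointer and probe addresses; the exact text is emitted
by Argument::assign_to_local above):

  switch(ctx->ip) {
      case 0x7fdf2ede9820ULL:
          /* plain field access: the verifier sees "ctx + field_offset" */
          *((int64_t *)dest) = ctx->r12;
          /* empty asm body: emits no instructions, but its "memory"
             clobber stops SinkThenElseCodeToEnd from merging the two
             field loads into one register-adjusted load */
          __asm__ __volatile__("": : :"memory");
          return 0;
      case 0x7fdf2edecd9cULL:
          *((int64_t *)dest) = ctx->bx;
          __asm__ __volatile__("": : :"memory");
          return 0;
  }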
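
For the dereference case (deref_offset_ set, deref_ident_ unset), the
barrier sits between the address computation and the bpf_probe_read
call. A rough sketch with made-up values (base register bx, offset 16,
local name __lo; the hunk above is truncated, so the closing assignment
shown here is an assumption about how the generated block ends):

  { u64 __addr = ctx->bx + 16;
    /* keep the ctx->bx read anchored before the helper call */
    __asm__ __volatile__("": : :"memory");
    u64 __res = 0x0;
    bpf_probe_read(&__res, sizeof(__res), (void *)__addr);
    __lo = __res; }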