mojo/docs/mojolpm.md

# Getting started with MojoLPM

*** note
**Note:** Using MojoLPM to fuzz your Mojo interfaces is intended to be simple,
but there are edge-cases that may require a very detailed understanding of the
Mojo implementation to fix. If you run into problems that you can't understand
readily, send an email to [markbrand@google.com] and cc `fuzzing@chromium.org`
and we'll try to help.

**Prerequisites:** Knowledge of [libfuzzer] and a basic understanding
of [Protocol Buffers] and [libprotobuf-mutator]. A basic understanding of
[testing in Chromium].
***

This document will walk you through:
* An overview of MojoLPM and what it's used for.
* Adding a fuzzer to an existing Mojo interface using MojoLPM.

[TOC]

## Overview of MojoLPM

MojoLPM is a toolchain for automatically generating structure-aware fuzzers for
Mojo interfaces using libprotobuf-mutator as the fuzzing engine.

This tool works by using the existing "grammar" for the interface provided by
the .mojom files, and translating that into a Protocol Buffer format that can be
fuzzed by libprotobuf-mutator. These protocol buffers are then interpreted by
a generated runtime as a sequence of Mojo method calls on the targeted
interface.

The intention is that using these should be as simple as plugging the generated
code into the existing unittests for those interfaces, so if you've already
implemented the necessary mocks to unittest your code, the majority of the work
needed to get quite effective fuzzing of your interfaces is already complete!

## Choose the Mojo interface(s) to fuzz

If you're a developer looking to add fuzzing support for an interface that
you're developing, then this should be very easy for you!

If not, then a good starting point is to search for [interfaces] in codesearch.
The most interesting interfaces from a security perspective are those which are
implemented in the browser process and exposed to the renderer process, but
there isn't a very simple way to enumerate these, so you may need to look
through some of the source code to find an interesting one.

For the rest of this guide, we'll write a new fuzzer for
`blink.mojom.CodeCacheHost`, which is defined in
`third_party/blink/public/mojom/loader/code_cache.mojom`.

We then need to find the relevant GN build target for this mojo interface so
that we know how to refer to it later - in this case that is
`//third_party/blink/public/mojom:mojom_platform`.

## Find the implementations of the interfaces

If you are developing these interfaces, then you already know where to find the
implementations.

Otherwise a good starting point is to search for references to
"public blink::mojom::CodeCacheHost". Usually there is only a single
implementation of a given Mojo interface (there are a few exceptions where the
interface abstracts platform specific details, but this is less common). This
leads us to `content/browser/renderer_host/code_cache_host_impl.h` and
`CodeCacheHostImpl`.

## Find the unittest for the implementation

Unfortunately, it doesn't look like `CodeCacheHostImpl` has a unittest, so we'll
have to go through the process of understanding how to create a valid instance
ourselves in order to fuzz this interface.

Since this interface runs in the Browser process, and is part of `/content`,
we're going to create our new fuzzer in `/content/test/fuzzer`.

## Add our testcase proto

First we'll add a proto source file, `code_cache_host_mojolpm_fuzzer.proto`,
which is going to define the structure of our testcases. This is basically
boilerplate, but it allows creating fuzzers which interact with multiple Mojo
interfaces to uncover more complex issues. For our case, this will be a simple
file:

```
syntax = "proto2";

package content.fuzzing.code_cache_host.proto;

import "third_party/blink/public/mojom/loader/code_cache.mojom.mojolpm.proto";

message NewCodeCacheHost {
  required uint32 id = 1;
}

message RunUntilIdle {
  enum ThreadId {
    IO = 0;
    UI = 1;
  }

  required ThreadId id = 1;
}

message Action {
  oneof action {
    NewCodeCacheHost new_code_cache_host = 1;
    RunUntilIdle run_until_idle = 2;
    mojolpm.blink.mojom.CodeCacheHost.RemoteMethodCall code_cache_host_call = 3;
  }
}

message Sequence {
  repeated uint32 action_indexes = 1 [packed=true];
}

message Testcase {
  repeated Action actions = 1;
  repeated Sequence sequences = 2;
  repeated uint32 sequence_indexes = 3 [packed=true];
}
```

This specifies all of the actions that the fuzzer will be able to take - it
will be able to create a new `CodeCacheHost` instance, perform sequences of
interface calls on those instances, and wait for various threads to be idle.

In order to build this proto file, we'll need to copy it into the out/ directory
so that it can reference the proto files generated by MojoLPM - this will be
handled for us by the `mojolpm_fuzzer_test` build rule.

## Add our fuzzer source

Now we're ready to create the fuzzer C++ source file,
`code_cache_host_mojolpm_fuzzer.cc`, and the fuzzer build target. This
target is going to depend on both our proto file and the C++ source file.
Most of the necessary dependencies will be handled for us, but we do still need
to add some directly.

Note especially the dependency on `mojom_platform_mojolpm` in blink. This is an
autogenerated target: the target containing the generated fuzzer protocol
buffer descriptions has the name of the mojom target with `_mojolpm`
appended.

```
mojolpm_fuzzer_test("code_cache_host_mojolpm_fuzzer") {
  sources = [
    "code_cache_host_mojolpm_fuzzer.cc",
  ]

  proto_source = "code_cache_host_mojolpm_fuzzer.proto"

  deps = [
    "//base/test:test_support",
    "//content/browser:for_content_tests",
    "//content/public/browser:browser_sources",
    "//content/test:test_support",
    "//services/network:test_support",
    "//storage/browser:test_support",
  ]

  proto_deps = [
    "//third_party/blink/public/mojom:mojom_platform_mojolpm",
  ]
}
```

Now, the minimal source code to load our testcases:

```c++
// Copyright 2020 The Chromium Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.

#include <stdint.h>
#include <utility>

#include "code_cache_host_mojolpm_fuzzer.pb.h"
#include "mojo/core/embedder/embedder.h"
#include "third_party/blink/public/mojom/loader/code_cache.mojom-mojolpm.h"
#include "third_party/libprotobuf-mutator/src/src/libfuzzer/libfuzzer_macro.h"

// The body is intentionally empty for now: this only checks that the testcase
// proto is parsed and handed to the fuzzer entry point.
DEFINE_BINARY_PROTO_FUZZER(
    const content::fuzzing::code_cache_host::proto::Testcase& testcase) {
}
```

You should now be able to build and run this fuzzer (it, of course, won't do
very much) to check that everything is lined up right so far.

## Handle global process setup

Now we need to add some basic setup code so that our process has something that
mostly resembles a normal Browser process. If you look in the file, this is
`CodeCacheHostFuzzerEnvironment`, a global environment instance that handles
setting up this basic environment. It is reused for all of our testcases, since
starting threads is expensive and slow.

## Handle per-testcase setup

We next need to handle the necessary setup to instantiate `CodeCacheHostImpl`,
so that we can actually run the testcases. At this point, we realise that it's
likely that we want to be able to have multiple `CodeCacheHostImpl` instances
with different render_process_ids and different backing origins, so we need to
modify our proto file to reflect this:

```
message NewCodeCacheHost {
  enum OriginId {
    ORIGIN_A = 0;
    ORIGIN_B = 1;
    ORIGIN_OPAQUE = 2;
    ORIGIN_EMPTY = 3;
  }

  required uint32 id = 1;
  required uint32 render_process_id = 2;
  required OriginId origin_id = 3;
}
```

Note that we're using an enum to represent the origin, rather than a string;
it's unlikely that the true value of the origin is going to be important, so
we've instead chosen a few select values based on the cases mentioned in the
source.

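As a rough illustration of this choice, each enum value can be translated into
one of a handful of fixed origins. The sketch below is not the code from the
fuzzer itself: `OriginId` is a plain C++ stand-in for the generated proto enum,
`OriginForId` is a hypothetical helper, and the origin strings are made-up
examples (the real fuzzer would presumably build origin objects rather than
strings).

```c++
#include <cassert>
#include <string>

// Hypothetical stand-in for the generated NewCodeCacheHost::OriginId enum.
enum class OriginId {
  kOriginA = 0,
  kOriginB = 1,
  kOriginOpaque = 2,
  kOriginEmpty = 3,
};

// Maps each enum value to a fixed, illustrative origin string. The fuzzer
// only ever picks between these few cases, so it cannot waste effort
// mutating the characters of an origin string.
std::string OriginForId(OriginId id) {
  switch (id) {
    case OriginId::kOriginA:
      return "http://aaa.com";
    case OriginId::kOriginB:
      return "http://bbb.com";
    case OriginId::kOriginOpaque:
      return "null";  // An opaque origin serializes as "null".
    case OriginId::kOriginEmpty:
      return "";
  }
  return "";  // Unknown values fall back to the empty case.
}
```
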
The first thing that we need to do is set up the basic Browser process
environment; this is what `ContentFuzzerEnvironment` is doing - this has a basic
setup suitable for fuzzing interfaces in `/content`. A few things to be careful
of are that we need to make sure that `mojo::core::Init()` is called (only once)
and we probably want as much freedom as possible in terms of scheduling, so we
want to use slightly different threading options than the average unittest. This
is a singleton type that will live for the entire duration of the fuzzer
process, so we don't want to be holding any testcase-specific data here.

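One common way to get a process-wide, initialise-once object in C++ is a
function-local static. The following is a minimal sketch of that pattern, not
the actual environment class: `FuzzerEnvironment`, `GetEnvironment()` and
`init_count` are hypothetical names, and the constructor only counts how often
it runs where the real type would do the one-time setup.

```c++
#include <cassert>

// Hypothetical stand-in for the process-wide fuzzer environment.
class FuzzerEnvironment {
 public:
  FuzzerEnvironment() {
    // Real code would do its one-time setup here (for example initialising
    // Mojo and starting threads). This sketch only counts constructions.
    ++init_count;
  }

  static int init_count;
};

int FuzzerEnvironment::init_count = 0;

// Returns the single environment instance. A function-local static is
// constructed on first use and exactly once, even with concurrent callers.
FuzzerEnvironment& GetEnvironment() {
  static FuzzerEnvironment environment;
  return environment;
}
```

Every testcase would call `GetEnvironment()`, but only the first call pays the
setup cost; this is also why no per-testcase state belongs in such a type.
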
The next thing that we need to do is to figure out the basic setup needed to
instantiate the interface we're interested in. Looking at the constructor for
`CodeCacheHostImpl` we need three things: a valid `render_process_id`, an
instance of `CacheStorageContextImpl` and an instance of
`GeneratedCodeCacheContext`. `CodeCacheHostFuzzerContext` is our container for
these per-testcase instances, and will handle creating and binding the instances
of the Mojo interfaces that we're going to fuzz. The most important thing to be
careful of here is that everything happens on the correct thread/sequence. Many
Browser-process objects have specific expectations, and will end up with very
different behaviour if they are created or used from the wrong context.

## Integrate with the generated MojoLPM fuzzer code

Finally, we need to do a little bit more plumbing, to rig up this infrastructure
that we've built together with the autogenerated code that MojoLPM gives us to
interpret and run our testcases. This is the `CodeCacheHostTestcase`, and the
part where the magic happens is here:

```c++
void CodeCacheHostTestcase::NextAction() {
  if (next_idx_ < testcase_.sequence_indexes_size()) {
    // Each call consumes one entry of sequence_indexes. The modulo maps any
    // fuzzer-chosen index into range; it requires a non-empty sequences list.
    auto sequence_idx = testcase_.sequence_indexes(next_idx_++);
    const auto& sequence =
      testcase_.sequences(sequence_idx % testcase_.sequences_size());
    for (auto action_idx : sequence.action_indexes()) {
      // Stop if there are no actions, or once the action budget is used up.
      if (!testcase_.actions_size() || ++action_count_ > MAX_ACTION_COUNT) {
        return;
      }
      const auto& action =
        testcase_.actions(action_idx % testcase_.actions_size());
      switch (action.action_case()) {
        case content::fuzzing::code_cache_host::proto::Action::kNewCodeCacheHost: {
          cch_context_.AddCodeCacheHost(
            action.new_code_cache_host().id(),
            action.new_code_cache_host().render_process_id(),
            action.new_code_cache_host().origin_id());
        } break;

        case content::fuzzing::code_cache_host::proto::Action::kRunUntilIdle: {
          // ThreadId is IO = 0, UI = 1, so a non-zero id selects the UI thread.
          if (action.run_until_idle().id()) {
            content::RunUIThreadUntilIdle();
          } else {
            content::RunIOThreadUntilIdle();
          }
        } break;

        case content::fuzzing::code_cache_host::proto::Action::kCodeCacheHostCall: {
          mojolpm::HandleRemoteMethodCall(action.code_cache_host_call());
        } break;

        case content::fuzzing::code_cache_host::proto::Action::ACTION_NOT_SET:
          break;
      }
    }
  }
}
```

The key line for integrating with MojoLPM is the last case,
`kCodeCacheHostCall`, where we're asking MojoLPM to treat this incoming proto
entry as a call to a method on the `CodeCacheHost` interface.

There's just a little bit more boilerplate at the bottom of the file to tidy up
concurrency loose ends, making sure that the fuzzer components are all running
on the correct threads; those are more-or-less common to any fuzzer using
MojoLPM.

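To make the index handling in `NextAction()` easier to follow in isolation,
here is a small self-contained sketch of the same idea using plain structs
instead of the generated proto classes. Everything in it is a stand-in: the
`Sequence` and `Testcase` structs, the `ResolveActions` helper and the
`kMaxActionCount` value are hypothetical, and each action is reduced to an
`int`. It also walks all sequences in one go, where the real code handles one
sequence per call.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Plain-struct stand-ins for the generated proto messages (hypothetical).
struct Sequence {
  std::vector<uint32_t> action_indexes;
};

struct Testcase {
  std::vector<int> actions;  // Each int stands in for one Action message.
  std::vector<Sequence> sequences;
  std::vector<uint32_t> sequence_indexes;
};

// Illustrative budget; the real limit is whatever MAX_ACTION_COUNT is.
constexpr size_t kMaxActionCount = 4;

// Flattens a testcase into the list of actions that would be executed.
// Indexes are taken modulo the container size, so any value the mutator
// produces refers to a valid entry.
std::vector<int> ResolveActions(const Testcase& testcase) {
  std::vector<int> resolved;
  if (testcase.actions.empty() || testcase.sequences.empty()) {
    return resolved;  // Nothing to index into; avoids a modulo by zero.
  }
  for (uint32_t sequence_idx : testcase.sequence_indexes) {
    const Sequence& sequence =
        testcase.sequences[sequence_idx % testcase.sequences.size()];
    for (uint32_t action_idx : sequence.action_indexes) {
      if (resolved.size() >= kMaxActionCount) {
        return resolved;  // Budget exhausted.
      }
      resolved.push_back(
          testcase.actions[action_idx % testcase.actions.size()]);
    }
  }
  return resolved;
}
```

Because out-of-range indexes wrap around instead of being rejected, almost
every mutated testcase still describes a runnable sequence of calls.
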
## Test it!

Make a corpus directory and fire up your shiny new fuzzer!

```
~/chromium/src% out/Default/code_cache_host_mojolpm_fuzzer /dev/shm/corpus
INFO: Seed: 3273881842
INFO: Loaded 1 modules   (1121912 inline 8-bit counters): 1121912 [0x559151a1aea8, 0x559151b2cd20),
INFO: Loaded 1 PC tables (1121912 PCs): 1121912 [0x559151b2cd20,0x559152c4b4a0),
INFO:      146 files found in /dev/shm/corpus
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
INFO: seed corpus: files: 146 min: 2b max: 268b total: 8548b rss: 88Mb
#147  INITED cov: 4633 ft: 10500 corp: 138/8041b exec/s: 0 rss: 91Mb
#152  NEW    cov: 4633 ft: 10501 corp: 139/8139b lim: 4096 exec/s: 0 rss: 91Mb L: 98/268 MS: 8 Custom-ChangeByte-Custom-EraseBytes-Custom-ShuffleBytes-Custom-Custom-
#154  NEW    cov: 4634 ft: 10510 corp: 140/8262b lim: 4096 exec/s: 0 rss: 91Mb L: 123/268 MS: 3 CustomCrossOver-ChangeBit-Custom-
#157  NEW    cov: 4634 ft: 10512 corp: 141/8384b lim: 4096 exec/s: 0 rss: 91Mb L: 122/268 MS: 3 CustomCrossOver-Custom-CustomCrossOver-
#158  NEW    cov: 4634 ft: 10514 corp: 142/8498b lim: 4096 exec/s: 0 rss: 91Mb L: 114/268 MS: 1 CustomCrossOver-
#159  NEW    cov: 4634 ft: 10517 corp: 143/8601b lim: 4096 exec/s: 0 rss: 91Mb L: 103/268 MS: 1 Custom-
#160  NEW    cov: 4634 ft: 10526 corp: 144/8633b lim: 4096 exec/s: 0 rss: 91Mb L: 32/268 MS: 1 Custom-
#164  NEW    cov: 4634 ft: 10528 corp: 145/8851b lim: 4096 exec/s: 0 rss: 91Mb L: 218/268 MS: 4 CustomCrossOver-Custom-CustomCrossOver-Custom-
```

## Wait for it...

Let the fuzzer run for a while, and keep checking in periodically in case it's
fallen over. It's likely you'll have made a few mistakes somewhere along the way
but hopefully soon you'll have the fuzzer running 'clean' for a few hours.

If your coverage isn't going up at all, then you've probably made a mistake and
it likely isn't managing to actually interact with the interface you're trying
to fuzz - try using the code coverage output from the next step to debug what's
going wrong.

## (Optional) Run coverage

In many cases it's useful to check the code coverage to see if we can benefit
from adding some manual testcases to get deeper coverage. For this example I
used the following command:

```
python tools/code_coverage/coverage.py code_cache_host_mojolpm_fuzzer -b out/Coverage -o ManualReport -c "out/Coverage/code_cache_host_mojolpm_fuzzer -ignore_timeouts=1 -timeout=4 -runs=0 /dev/shm/corpus" -f content
```

With the CodeCacheHost, looking at the coverage after a few hours we could see
that there's definitely some room for improvement:

```c++
/* 55       */ base::Optional<GURL> GetSecondaryKeyForCodeCache(const GURL& resource_url,
/* 56 53.6k */ int render_process_id) {
/* 57 53.6k */    if (!resource_url.is_valid() || !resource_url.SchemeIsHTTPOrHTTPS())
/* 58 53.6k */      return base::nullopt;
/* 59 0     */
/* 60 0     */    GURL origin_lock =
/* 61 0     */        ChildProcessSecurityPolicyImpl::GetInstance()->GetOriginLock(
/* 62 0     */            render_process_id);
```

## (Optional) Improve corpus manually

It's fairly easy to improve the corpus manually, since our corpus files are just
protobuf files that describe the sequence of interface calls to make.

There are a couple of approaches that we can take here - we'll try building a
small manual seed corpus that we'll use to kick-start our fuzzer. Since it's
easier to edit text protos, MojoLPM can automatically convert our seed corpus
from text protos to binary protos during the build, making this slightly less
painful for us, and letting us store our corpus in-tree in a readable format.

So, we'll create a new folder to hold this seed corpus, and craft our first
file:

```
actions {
  new_code_cache_host {
    id: 1
    render_process_id: 0
    origin_id: ORIGIN_A
  }
}
actions {
  code_cache_host_call {
    remote {
      id: 1
    }
    m_did_generate_cacheable_metadata {
      m_cache_type: CodeCacheType_kJavascript
      m_url {
        new {
          id: 1
          m_url: "http://aaa.com/test"
        }
      }
      m_data {
        new {
          id: 1
          m_bytes {
          }
        }
      }
      m_expected_response_time {
      }
    }
  }
}
sequences {
  action_indexes: 0
  action_indexes: 1
}
sequence_indexes: 0
```

We can then add some new entries to our build target to have the corpus
converted to binary proto directly during the build.

```
  testcase_proto_kind = "content.fuzzing.code_cache_host.proto.Testcase"

  seed_corpus_sources = [
    "code_cache_host_mojolpm_fuzzer_corpus/did_generate_cacheable_metadata.textproto",
  ]
```

We can now run a new coverage report using this single-file seed corpus (note
that the binary corpus files will be written to your output directory, in this
case as `code_cache_host_mojolpm_fuzzer_seed_corpus.zip`):

```
autoninja -C out/Coverage chrome
rm -rf /tmp/corpus; mkdir /tmp/corpus; unzip out/Coverage/code_cache_host_mojolpm_fuzzer_seed_corpus.zip -d /tmp/corpus
python tools/code_coverage/coverage.py code_cache_host_mojolpm_fuzzer -b out/Coverage -o ManualReport -c "out/Coverage/code_cache_host_mojolpm_fuzzer -ignore_timeouts=1 -timeout=4 -runs=0 /tmp/corpus" -f content
```

We can see that we're now getting some more coverage:

```c++
/* 118   */ void CodeCacheHostImpl::DidGenerateCacheableMetadata(
/* 119   */     blink::mojom::CodeCacheType cache_type,
/* 120   */     const GURL& url,
/* 121   */     base::Time expected_response_time,
/* 122 2 */       mojo_base::BigBuffer data) {
/* 123 2 */     if (!url.SchemeIsHTTPOrHTTPS()) {
/* 124 0 */       mojo::ReportBadMessage("Invalid URL scheme for code cache.");
/* 125 0 */       return;
/* 126 0 */     }
/* 127 2 */
/* 128 2 */     DCHECK_CURRENTLY_ON(BrowserThread::UI);
/* 129 2 */
/* 130 2 */     GeneratedCodeCache* code_cache = GetCodeCache(cache_type);
/* 131 2 */     if (!code_cache)
/* 132 0 */       return;
/* 133 2 */
/* 134 2 */     base::Optional<GURL> origin_lock =
/* 135 2 */         GetSecondaryKeyForCodeCache(url, render_process_id_);
/* 136 2 */     if (!origin_lock)
/* 137 0 */       return;
/* 138 2 */
/* 139 2 */     code_cache->WriteEntry(url, *origin_lock, expected_response_time,
/* 140 2 */                            std::move(data));
/* 141 2 */ }
```

Much better!

[markbrand@google.com]: mailto:markbrand@google.com?subject=[MojoLPM%20Help]:%20&cc=fuzzing@chromium.org
[libfuzzer]: https://source.chromium.org/chromium/chromium/src/+/master:testing/libfuzzer/getting_started.md
[Protocol Buffers]: https://developers.google.com/protocol-buffers/docs/cpptutorial
[libprotobuf-mutator]: https://source.chromium.org/chromium/chromium/src/+/master:testing/libfuzzer/libprotobuf-mutator.md
[testing in Chromium]: https://source.chromium.org/chromium/chromium/src/+/master:docs/testing/testing_in_chromium.md
[interfaces]: https://source.chromium.org/search?q=interface%5Cs%2B%5Cw%2B%5Cs%2B%7B%20f:%5C.mojom$%20-f:test