docs/ci/index.rst

   1 Continuous Integration
   2 ======================
   3
   4 GitLab CI
   5 ---------
   6
   7 GitLab provides a convenient framework for running commands in response to Git pushes.
   8 We use it to test merge requests (MRs) before merging them (pre-merge testing),
   9 as well as for post-merge testing of everything that hits ``main``
  10 (this is necessary because we still allow commits to be pushed outside of MRs,
  11 and even then the MR CI runs in the forked repository, which might have been
  12 modified and thus is unreliable).
  13
  14 The CI runs a number of tests, from trivial build-testing to complex GPU rendering:
  15
  16 - Build testing for a number of configurations and platforms
  17 - Sanity checks (``meson test``)
  18 - Most drivers are also tested using several test suites, such as the
  19   `Vulkan/GL/GLES conformance test suite <https://github.com/KhronosGroup/VK-GL-CTS>`__,
  20   `Piglit <https://gitlab.freedesktop.org/mesa/piglit>`__, and others.
  21 - Replay of application traces
  22
  23 A typical run takes between 20 and 30 minutes, although this can increase quickly
  24 if the GitLab runners are overwhelmed, which happens sometimes. When it does happen,
  25 not much can be done besides waiting it out or cancelling the run.
  26 You can do your part by only running the jobs you care about by using `our
  27 tool <#running-specific-ci-jobs>`__.
  28
  29 Due to limited resources, we currently do not run the CI automatically
  30 on every push; instead, we only run it automatically once the MR has
  31 been assigned to ``Marge``, our merge bot.
  32
  33 If you're interested in the details, the main configuration file is ``.gitlab-ci.yml``,
  34 and it references a number of other files in ``.gitlab-ci/``.
  35
  36 If the GitLab CI doesn't seem to be running on your fork (or MRs, as they run
  37 in the context of your fork), you should check the "Settings" of your fork.
  38 Under "CI / CD" → "General pipelines", make sure "Custom CI config path" is
  39 empty (or set to the default ``.gitlab-ci.yml``), and that the
  40 "Public pipelines" box is checked.
  41
  42 If you're having issues with the GitLab CI, your best bet is to ask
  43 about it on ``#freedesktop`` on OFTC and tag `Daniel Stone
  44 <https://gitlab.freedesktop.org/daniels>`__ (``daniels`` on IRC) or
  45 `Emma Anholt <https://gitlab.freedesktop.org/anholt>`__ (``anholt`` on
  46 IRC).
  47
  48 The three GitLab CI systems currently integrated are:
  49
  50
  51 .. toctree::
  52    :maxdepth: 1
  53
  54    bare-metal
  55    LAVA
  56    docker
  57
  58 Farm management
  59 ---------------
  60
  61 .. note::
  62    Never mix disabling/re-enabling a farm with any change that can affect a job
  63    that runs in another farm!
  64
  65 When a farm starts failing for any reason (power, network, out of disk space), it needs to be disabled by pushing a separate MR containing:
  66
  67 .. code-block:: console
  68
  69    git mv .ci-farms{,-disabled}/$farm_name
  70
  71 Once the farm is restored, it can be re-enabled by pushing a new merge request containing:
  72
  73 .. code-block:: console
  74
  75    git mv .ci-farms{-disabled,}/$farm_name
  76
  77 .. warning::
  78    Pushing (``git push``) directly to ``main`` is forbidden; this change must
  79    be sent as a :ref:`Merge Request <merging>`.
  80
  81 Application traces replay
  82 -------------------------
  83
  84 The CI replays application traces with various drivers in two different jobs. The first
  85 job replays traces listed in ``src/<driver>/ci/traces-<driver>.yml`` files and if any
  86 of those traces fail the pipeline fails as well. The second job replays traces listed in
  87 ``src/<driver>/ci/restricted-traces-<driver>.yml`` and it is allowed to fail. This second
  88 job is only created when the pipeline is triggered by ``marge-bot`` or any other user that
  89 has been granted access to these traces.
  90
  91 A traces YAML file also includes a ``download-url`` pointing to the MinIO
  92 instance from which the traces are downloaded. While the first job should always work with
  93 publicly accessible traces, the second job could point to a URL with restricted access.
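
     As a rough illustration of where ``download-url`` might sit in such a file,
     consider the fragment below. The enclosing ``traces-db`` key and the URL are
     assumptions made for this sketch, not copied from an actual traces file.

     .. code-block:: yaml

        # Assumed layout; only the download-url key is named in the text above.
        traces-db:
          # Placeholder URL standing in for the MinIO instance serving the traces.
          download-url: "https://minio.example.org/public-traces/"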
  94
  95 Restricted traces are those that have been made available to Mesa developers without a
  96 license to redistribute at will, and thus should not be exposed to the public. Failing to
  97 access that URL does not prevent the pipeline from passing, so forks made by
  98 contributors without permission to download non-redistributable traces can be merged
  99 without friction.
 100
 101 As an aside, only maintainers of such non-redistributable traces are responsible for
 102 ensuring that replays are successful, since other contributors would not be able to
 103 download and test them by themselves.
 104
 105 Mesa contributors who believe they should have permission to access such
 106 non-redistributable traces can request it from Daniel Stone <daniels@collabora.com>.
 107
 108 gitlab.freedesktop.org accounts that are to be granted access to these traces will be
 109 added to the OPA policy for the MinIO repository as per
 110 https://gitlab.freedesktop.org/freedesktop/helm-gitlab-infra/-/commit/a3cd632743019f68ac8a829267deb262d9670958 .
 111
 112 For the jobs to be created in personal repositories, the name of the user's account needs
 113 to be added to the ``rules`` attribute of the GitLab CI job that accesses the restricted
 114 traces.
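
     As a rough sketch of what such a rule could look like, the fragment below uses
     GitLab's predefined ``GITLAB_USER_LOGIN`` variable to create the job only for
     selected accounts. The job name and the ``some-developer`` account are
     hypothetical, and the actual rules in Mesa's CI configuration may be structured
     differently.

     .. code-block:: yaml

        # Hypothetical job name, for illustration only.
        .restricted-traces-example:
          # The restricted-traces job is allowed to fail, as described above.
          allow_failure: true
          rules:
            # Create the job when the pipeline is triggered by the merge bot...
            - if: '$GITLAB_USER_LOGIN == "marge-bot"'
              when: on_success
            # ...or by an account that has been granted access (placeholder name).
            - if: '$GITLAB_USER_LOGIN == "some-developer"'
              when: on_success
            # For everyone else the job is not created at all.
            - when: never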
 115
 116 .. toctree::
 117    :maxdepth: 1
 118
 119    local-traces
 120
 121 Intel CI
 122 --------
 123
 124 The Intel CI is not yet integrated into the GitLab CI.
 125 For now, special access must be manually given (file an issue in
 126 `the Intel CI configuration repo <https://gitlab.freedesktop.org/Mesa_CI/mesa_jenkins>`__
 127 if you think you or Mesa would benefit from you having access to the Intel CI).
 128 Results can be seen on `mesa-ci.01.org <https://mesa-ci.01.org>`__
 129 if you are *not* an Intel employee; if you are, you
 130 can access a better interface on
 131 `mesa-ci-results.jf.intel.com <http://mesa-ci-results.jf.intel.com>`__.
 132
 133 The Intel CI runs a much larger array of tests, on a number of generations
 134 of Intel hardware and on multiple platforms (X11, Wayland, DRM & Android),
 135 with the purpose of detecting regressions.
 136 Tests include
 137 `Crucible <https://gitlab.freedesktop.org/mesa/crucible>`__,
 138 `VK-GL-CTS <https://github.com/KhronosGroup/VK-GL-CTS>`__,
 139 `dEQP <https://android.googlesource.com/platform/external/deqp>`__,
 140 `Piglit <https://gitlab.freedesktop.org/mesa/piglit>`__,
 141 `Skia <https://skia.googlesource.com/skia>`__,
 142 `VkRunner <https://github.com/Igalia/vkrunner>`__,
 143 `WebGL <https://github.com/KhronosGroup/WebGL>`__,
 144 and a few other tools.
 145 A typical run takes between 30 minutes and an hour.
 146
 147 If you're having issues with the Intel CI, your best bet is to ask about
 148 it on ``#dri-devel`` on OFTC and tag `Nico Cortes
 149 <https://gitlab.freedesktop.org/ngcortes>`__ (``ngcortes`` on IRC).
 150
 151 .. _CI-job-user-expectations:
 152
 153 CI job user expectations:
 154 -------------------------
 155
 156 To make sure that testing of one vendor's drivers doesn't block
 157 unrelated work by other vendors, we require that a given driver's test
 158 farm produces a spurious failure no more than once a week.  If every
 159 driver had CI and failed once a week, we would be seeing someone's
 160 code getting blocked on a spurious failure daily, which is an
 161 unacceptable cost to the project.
 162
 163 To ensure that, driver maintainers with CI enabled should watch the Flakes panel
 164 of the `CI flakes dashboard
 165 <https://ci-stats-grafana.freedesktop.org/d/Ae_TLIwVk/mesa-ci-quality-false-positives?orgId=1>`__,
 166 particularly the "Flake jobs" pane, to inspect jobs in their driver where the
 167 automatic retry of a failing job succeeded on the second attempt.
 168 Additionally, most CI reports test-level flakes to an IRC channel, and flakes
 169 reported as NEW are not expected and could cause spurious failures in jobs.
 170 Please track the NEW reports in jobs and add them as appropriate to the
 171 ``-flakes.txt`` file for your driver.
 172
 173 Additionally, the test farm needs to be able to provide a short enough
 174 turnaround time that we can get our MRs through marge-bot without the pipeline
 175 backing up.  As a result, we require that the test farm be able to handle a
 176 whole pipeline's worth of jobs in less than 15 minutes (for comparison, the build
 177 stage is about 10 minutes).  Given boot times and intermittent network delays,
 178 this generally means that the test runtime as reported by deqp-runner should be
 179 kept to 10 minutes.
 180
 181 If a test farm lacks the hardware to provide these guarantees, consider dropping
 182 tests to reduce runtime.  dEQP job logs print the slowest tests at the end of
 183 the run, and Piglit logs the runtime of tests in the ``results.json.bz2`` in the
 184 artifacts.  Alternatively, you can add the following to your job to only run some fraction
 185 (in this case, 1/10th) of the dEQP tests.
 186
 187 .. code-block:: yaml
 188
 189    variables:
 190       DEQP_FRACTION: 10
 191
 192 Here ``DEQP_FRACTION`` is the denominator, so a value of 10 runs one test out of every 10.
 193
 194 For Collabora's LAVA farm, the `device types
 195 <https://lava.collabora.dev/scheduler/device_types>`__ page can tell you how
 196 many boards of a specific tag are currently available by summing the "Idle" and
 197 "Busy" columns.  For bare-metal, a GitLab admin can look at the `runners
 198 <https://gitlab.freedesktop.org/admin/runners>`__ page.  A pipeline should
 199 probably not create more jobs for a board type than there are boards, unless you
 200 clearly have some short-runtime jobs.
 201
 202 If a HW CI farm goes offline (network dies and all CI pipelines end up
 203 stalled) or its runners are consistently spuriously failing (disk
 204 full?), and the maintainer is not immediately available to fix the
 205 issue, please push through an MR disabling that farm's jobs according
 206 to the `Farm Management <#farm-management>`__ instructions.
 207
 208 Personal runners
 209 ----------------
 210
 211 Mesa's CI is currently run primarily on packet.net's m1xlarge nodes
 212 (2.2 GHz Sandy Bridge), with each job getting 8 cores allocated.  You
 213 can speed up your personal CI builds (and marge-bot merges) by using a
 214 faster personal machine as a runner.  You can find the gitlab-runner
 215 package in Debian, or use GitLab's own builds.
 216
 217 To do so, follow `GitLab's instructions
 218 <https://docs.gitlab.com/ee/ci/runners/runners_scope.html#create-a-project-runner-with-a-runner-authentication-token>`__
 219 to register your personal GitLab runner in your Mesa fork.  Then, tell
 220 Mesa how many jobs it should serve (``concurrent=``) and how many
 221 cores those jobs should use (``FDO_CI_CONCURRENT=``) by editing these
 222 lines in ``/etc/gitlab-runner/config.toml``, for example:
 223
 224 .. code-block:: toml
 225
 226    concurrent = 2
 227
 228    [[runners]]
 229      environment = ["FDO_CI_CONCURRENT=16"]
 230
 231
 232 Docker caching
 233 --------------
 234
 235 The CI system uses Docker images extensively to cache
 236 infrequently-updated build content like the CTS.  The `freedesktop.org
 237 CI templates
 238 <https://gitlab.freedesktop.org/freedesktop/ci-templates/>`__ help us
 239 manage the building of the images to reduce how frequently rebuilds
 240 happen, and trim down the images (stripping out manpages, cleaning the
 241 apt cache, and avoiding other common pitfalls of building Docker images).
 242
 243 When running a container job, the templates will look for an existing
 244 build of that image in the container registry under
 245 ``MESA_IMAGE_TAG``.  If it's found it will be reused, and if
 246 not, the associated ``.gitlab-ci/containers/<jobname>.sh`` will be run
 247 to build it.  So, when developing any change to container build
 248 scripts, you need to update the associated ``MESA_IMAGE_TAG`` to
 249 a new unique string.  We recommend using the current date plus some
 250 string related to your branch (so that if you rebase on someone else's
 251 container update from the same day, you will get a Git conflict
 252 instead of silently reusing their container).
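
     For example, a tag following this recommendation might look like the fragment
     below. The date, the branch suffix, and the placement under a top-level
     ``variables:`` key are assumptions for illustration; edit whichever location
     already defines the tag for the container you are changing.

     .. code-block:: yaml

        variables:
          # Current date plus a string related to the branch (hypothetical value).
          MESA_IMAGE_TAG: "2024-01-15-fix-llvm-build"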
 253
 254 When developing a given change to your Docker image, you would have to
 255 bump the tag on each ``git commit --amend`` to your development
 256 branch, which can get tedious.  Instead, you can navigate to the
 257 `container registry
 258 <https://gitlab.freedesktop.org/mesa/mesa/container_registry>`__ for
 259 your repository and delete the tag to force a rebuild.  When your code
 260 is eventually merged to main, a full image rebuild will occur again
 261 (forks inherit images from the main repo, but MRs don't propagate
 262 images from the fork into the main repo's registry).
 263
 264 Building locally using CI docker images
 265 ---------------------------------------
 266
 267 It can be frustrating to debug build failures on an environment you
 268 don't personally have.  If you're experiencing this with the CI
 269 builds, you can use Docker to use their build environment locally.  Go
 270 to your job log, and at the top you'll see a line like::
 271
 272    Pulling docker image registry.freedesktop.org/anholt/mesa/debian/android_build:2020-09-11
 273
 274 We'll use a volume mount to make our current Mesa tree be what the
 275 Docker container uses, so they'll share everything (their build will
 276 go in ``_build``, according to ``meson-build.sh``).  We're going to be
 277 using the image non-interactively so we use ``run --rm $IMAGE
 278 command`` instead of ``run -it $IMAGE bash`` (which you may also find
 279 useful for debugging).  Extract your build setup variables from
 280 ``.gitlab-ci.yml`` and run the CI Meson build script:
 281
 282 .. code-block:: console
 283
 284    IMAGE=registry.freedesktop.org/anholt/mesa/debian/android_build:2020-09-11
 285    sudo docker pull $IMAGE
 286    sudo docker run --rm -v `pwd`:/mesa -w /mesa $IMAGE env PKG_CONFIG_PATH=/usr/local/lib/aarch64-linux-android/pkgconfig/:/android-ndk-r21d/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/lib/aarch64-linux-android/pkgconfig/ GALLIUM_DRIVERS=freedreno UNWIND=disabled EXTRA_OPTION="-D android-stub=true -D llvm=disabled" DRI_LOADERS="-D glx=disabled -D gbm=disabled -D egl=enabled -D platforms=android" CROSS=aarch64-linux-android ./.gitlab-ci/meson-build.sh
 287
 288 All you have left over from the build is its output and a ``_build``
 289 directory.  You can hack on Mesa and iterate testing the build with:
 290
 291 .. code-block:: console
 292
 293    sudo docker run --rm -v `pwd`:/mesa $IMAGE meson compile -C /mesa/_build
 294
 295 Running specific CI jobs
 296 ------------------------
 297
 298 You can use ``bin/ci/ci_run_n_monitor.py`` to run specific CI jobs. It
 299 will automatically take care of running all the jobs that yours depend on,
 300 and cancel the rest to avoid wasting resources.
 301
 302 See ``bin/ci/ci_run_n_monitor.py --help`` for all the options.
 303
 304 The ``--target`` argument takes a regex that you can use to select the
 305 job names you want to run, e.g. ``--target 'zink.*'`` will run all the
 306 zink jobs, leaving the other drivers' jobs free for others to use.
 307
 308 Conformance Tests
 309 -----------------
 310
 311 Some conformance tests require special treatment to be maintained on GitLab CI.
 312 This section lists their documentation pages.
 313
 314 .. toctree::
 315   :maxdepth: 1
 316
 317   skqp
 318
 319
 320 Updating GitLab CI Linux Kernel
 321 -------------------------------
 322
 323 GitLab CI usually runs a bleeding-edge kernel. The following documentation has
 324 instructions on how to uprev the Linux kernel in the GitLab CI ecosystem.
 325
 326 .. toctree::
 327   :maxdepth: 1
 328
 329   kernel
 330
 331
 332 Reusing CI scripts for other projects
 333 --------------------------------------
 334
 335 The CI scripts in ``.gitlab-ci/`` can be reused for other projects. To
 336 facilitate reuse of the infrastructure, our scripts can be used as tools
 337 to create containers and run tests on the available farms.
 338
 339 .. envvar:: EXTRA_LOCAL_PACKAGES
 340
 341    Define extra Debian packages to be installed in the container.
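
        As a minimal sketch, a project reusing the scripts might set this variable in
        the ``variables:`` of a container job. The job name, the package names, and the
        space-separated list format are assumptions made for illustration.

        .. code-block:: yaml

           # Hypothetical container job in the reusing project.
           example-container-build:
             variables:
               # Placeholder package names.
               EXTRA_LOCAL_PACKAGES: "libexample-dev example-tools"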