drm/i915: Apply WAC6entrylatency to kbl/cfl
authorVille Syrjälä <ville.syrjala@linux.intel.com>
Thu, 16 Jul 2020 19:04:26 +0000 (22:04 +0300)
committerVille Syrjälä <ville.syrjala@linux.intel.com>
Fri, 16 Oct 2020 16:44:45 +0000 (19:44 +0300)
commit4d6bde58a026c5cac263081c572aebe89bf32167
treee88edb98b66895ff1d33517abdb1f4fc32d61113
parent693260cf23f95c579558a98af70eb66a8c46b394
drm/i915: Apply WAC6entrylatency to kbl/cfl

WAC6entrylatency is trying to fix excessive rc6 entry latency caused
by the extra delay from FBC_LLC_READ_CTRL, which is there for some
extra sync with uncore for frame buffer caching in LLC.

Reading through the hsd the recommendation was to set the FBC_LLC_FULLY_OPEN
bit to disable this extra delay entirely. This can be done whenever fb LLC
caching is not used. The alternative suggestion was to reduce the delay to
eg. 0x5 via updated BIOS programming instructions. But all the kbl/cfl
machines I've seen still have the default 0xff programmed. As we never use
fb LLC caching let's just apply the w/a to all skl derivatives to get
consistent rc6 latencies.

I was able to measure the effect of FBC_LLC_READ_CTRL to rc6 latency
via forcewake. Here's a graph of some of the results:

             sleep;fw_req=1;wait fw_ack==1;sleep;fw_req=0;wait fw_ack==0
 fw_ack==1 duration
    160us +----------------------------------------------------------------+
          |          +          +        $$+         +          +          |
          |  $$           $    $   ******$$ **   $ $**$*  #########$$######|
    140us |-$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$*$$$$$$$$$$$$$$$$ $$$$$$|
          | $                     *                       #                |
          | $                     *                       #                |
    120us |$+                     *                       #              +-|
          |$                      *                       #                |
          |$                      *                  #   #                 |
    100us |$+         ************########################               +-|
          |$          *          *#                                        |
          |$      *****   #########                                        |
     80us |$+     *    # ####   ##                                       +-|
          |$   **** ### # #                                                |
          |  ** ####                     FBC_LLC_READ_CTRL: 0x8000 ******* |
     60us |-######                       FBC_LLC_READ_CTRL: 0xffff #######-|
          |##        +          +    FBC_LLC_READ_CTRL: 0x400000ff $$$$$$$ |
          +----------------------------------------------------------------+
         0ms       10ms       20ms       30ms      40ms       50ms       60ms
                                   sleep duration

The default FBC_LLC_READ_CTRL value of 0xff is documented to give us
a 170usec delay. That tracks well with the knees at 0xffff->~44msec and
0x8000->~22msec we see in the graph.

We can see that if we sleep longer than the FBC_LLC_READ_CTRL delay
we always observe the full (~145usec) rc6 wakeup latency. But if we sleep
for less than the FBC_LLC_READ_CTRL delay we see a quicker fw wakeup,
presumably due the hardware not having yet entered rc6 fully.
The other plateaus in the graph I suspect correspond to some shallower
internal rc states.

v2: s/usec/msec/ typo in commit msg

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200716190426.17047-2-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
drivers/gpu/drm/i915/intel_pm.c