1 \documentclass{article}
\author{Kristian Høgsberg\\
\texttt{krh@bitplanet.net}}
8 \title{The Wayland Display Server}
14 \section{Wayland Overview}
\begin{itemize}
\item Wayland is a protocol for a new display server.
\item Wayland is also an implementation of that protocol.
\end{itemize}
21 \subsection{Replacing X11}
Over the last 10 years, a lot of functionality has slowly moved out
of the X server and into libraries or kernel drivers. It started with
freetype and fontconfig providing an alternative to the core X fonts,
and with direct rendering, which put the OpenGL graphics driver in a
client-side library. Then cairo came along and provided a modern 2D
rendering library independent of X, and compositing managers took over
control of the rendering of the desktop. More recently, with GEM and
KMS in the Linux kernel, we can do modesetting outside X and schedule
several direct rendering clients. The end result is a highly modular
graphics stack.
33 Wayland is a new display server building on top of all those
34 components. We’re trying to distill out the functionality in the X
35 server that is still used by the modern Linux desktop. This turns out
36 to be not a whole lot. Applications can allocate their own off-screen
37 buffers and render their window contents by themselves. In the end,
38 what’s needed is a way to present the resulting window surface to a
39 compositor and a way to receive input. This is what Wayland provides,
by piecing together the components already in the ecosystem in a
41 slightly different way.
X will always be relevant, in the same way Fortran compilers and VRML
browsers are, but it's time that we think about moving it out of the
critical path and providing it as an optional component for legacy
applications.
49 \section{Wayland protocol}
51 \subsection{Basic Principles}
The Wayland protocol is an asynchronous, object-oriented protocol. All
requests are method invocations on some object. Each request includes
an object id that uniquely identifies an object on the server. Each
object implements an interface, and the request includes an opcode that
identifies which method in the interface to invoke.
59 The wire protocol is determined from the C prototypes of the requests
and events. There is a straightforward mapping from the C types to
61 packing the bytes in the request written to the socket. It is
62 possible to map the events and requests to function calls in other
63 languages, but that hasn't been done at this point.
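As a rough illustration of the mapping from a request to bytes on the
socket, here is a minimal sketch in Python. The header layout (a 32-bit
object id, a 16-bit opcode and a 16-bit total size, little-endian) and
the restriction to unsigned 32-bit arguments are assumptions made for
this example; they are not specified in this document, and the function
names are hypothetical.

```python
import struct

# Assumed header layout, for illustration only:
# 32-bit object id, 16-bit opcode, 16-bit total message size in bytes.
HEADER = struct.Struct("<IHH")


def marshal_request(object_id, opcode, *args):
    """Pack a request whose arguments are all unsigned 32-bit integers."""
    body = struct.pack("<%dI" % len(args), *args)
    return HEADER.pack(object_id, opcode, HEADER.size + len(body)) + body


def unmarshal_request(data):
    """Return (object_id, opcode, args) for one request at the start of data."""
    object_id, opcode, size = HEADER.unpack_from(data)
    count = (size - HEADER.size) // 4
    args = struct.unpack_from("<%dI" % count, data, HEADER.size)
    return object_id, opcode, list(args)
```

Carrying the size in the header lets a receiver skip a message for an
opcode it does not understand, which is one reason a layout like this
is convenient.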
The server sends events back to the client, and each event is emitted
from an object. Events can be error conditions. An event includes the
object id and the event opcode, from which the client can determine
the type of event. Events are generated either in response to a request
(in which case the request and the event constitute a round trip) or
spontaneously when the server state changes.
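The client side of this can be sketched as a table lookup: the object
id selects a local proxy object and the opcode selects the event within
that object's interface. The class and method names below
(\texttt{Proxy}, \texttt{Connection}, \texttt{dispatch}) are
hypothetical and only illustrate the idea; they are not the client
library API.

```python
class Proxy:
    """Client-side stand-in for a server object (hypothetical name)."""

    def __init__(self, object_id, event_names):
        self.object_id = object_id
        self.event_names = event_names  # the opcode is the index into this list
        self.handlers = {}

    def on(self, event_name, handler):
        self.handlers[event_name] = handler


class Connection:
    def __init__(self):
        self.proxies = {}

    def register(self, proxy):
        self.proxies[proxy.object_id] = proxy

    def dispatch(self, object_id, opcode, args):
        """Route one decoded event to its handler and return the event name."""
        proxy = self.proxies[object_id]
        name = proxy.event_names[opcode]
        handler = proxy.handlers.get(name)
        if handler is not None:
            handler(*args)
        return name
```

Since events can arrive spontaneously, the dispatcher does not assume
that any request is outstanding; it simply routes whatever it decodes.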
\begin{itemize}
\item State is broadcast on connect, and events are sent out when state
  changes. The client must listen for these changes and cache the
  state. There is no need (or mechanism) to query server state.

\item The server will broadcast the presence of a number of global
  objects, which in turn will broadcast their current state.
\end{itemize}
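The caching rule above could look roughly like this on the client. This
is a sketch with hypothetical names (\texttt{StateCache} and its
methods); the point is only that reads are served from the local copy
and that nothing ever asks the server for its state.

```python
class StateCache:
    """Client-side mirror of server state, updated only by events."""

    def __init__(self):
        self.globals = {}  # object id -> interface name
        self.state = {}    # object id -> dict of last known properties

    def handle_global(self, object_id, interface):
        # Presence event: a global object was advertised.
        self.globals[object_id] = interface
        self.state.setdefault(object_id, {})

    def handle_state(self, object_id, **properties):
        # State event: overwrite cached values with the newest ones.
        self.state.setdefault(object_id, {}).update(properties)

    def lookup(self, object_id, key):
        # Local read; no request is sent to the server.
        return self.state[object_id][key]
```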
81 \subsection{Connect Time}
\begin{itemize}
\item There is no fixed-format connect block; the server emits a bunch
  of events at connect time.
\item Presence events for global objects: output, compositor, input.
\end{itemize}
89 \subsection{Security and Authentication}
\begin{itemize}
\item Mostly about access to the underlying buffers. Needs a new drm
  auth mechanism (the grant-to ioctl idea). Do we need to check the cmd
  stream?

\item Getting the server socket depends on the compositor type: it could
  be a system-wide name, it could come through fd passing on the
  session dbus, or the client is forked by the compositor and the fd is
  already open.
\end{itemize}
100 \subsection{Creating Objects}
\begin{itemize}
\item The client allocates object IDs, using a range protocol.
\item The server tracks how many IDs are left in the current range and
  sends a new range when the client is about to run out.
\end{itemize}
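A minimal sketch of the range protocol, assuming the server grants
fixed-size ranges and tops the client up when a low-water mark is
reached. The range size, the low-water mark and all names here are
assumptions made for illustration.

```python
class IdRange:
    def __init__(self, start, count):
        self.next_id = start
        self.end = start + count  # exclusive

    def remaining(self):
        return self.end - self.next_id


class ClientIds:
    """Client side: allocate IDs locally from server-granted ranges."""

    def __init__(self):
        self.ranges = []

    def add_range(self, start, count):
        self.ranges.append(IdRange(start, count))

    def allocate(self):
        # Drop exhausted ranges, then take the next ID from the oldest one.
        while self.ranges and self.ranges[0].remaining() == 0:
            self.ranges.pop(0)
        if not self.ranges:
            raise RuntimeError("no IDs left; waiting for a new range")
        current = self.ranges[0]
        new_id = current.next_id
        current.next_id += 1
        return new_id


class ServerIds:
    """Server side: count the client's unused IDs and grant ranges early."""

    def __init__(self, client, range_size=4, low_water=1):
        self.client = client
        self.range_size = range_size
        self.low_water = low_water
        self.next_start = 1
        self.left = 0
        self.grant()

    def grant(self):
        # Stands in for sending a "new range" event to the client.
        self.client.add_range(self.next_start, self.range_size)
        self.next_start += self.range_size
        self.left += self.range_size

    def note_object_created(self):
        # Called when a request carrying a client-allocated ID arrives.
        self.left -= 1
        if self.left <= self.low_water:
            self.grant()
```

Granting the next range before the current one is empty means the
client never has to wait for a round trip just to create an object.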
108 \subsection{Compositor}
110 The compositor is a global object, advertised at connect time.
\begin{tabular}{l}
  \hline
  Interface \texttt{compositor} \\ \hline
  Requests \\ \hline
  \texttt{create\_surface(id)} \\
  \texttt{commit()} \\ \hline
  Events \\ \hline
  \texttt{device(device)} \\
  \texttt{acknowledge(key, frame)} \\
  \texttt{frame(frame, time)} \\ \hline
\end{tabular}
\begin{itemize}
\item a global object
\item broadcasts drm file name, or at least a string like drm:/dev/card0
\item commit/ack/frame protocol
\end{itemize}
133 Created by the client.
\begin{tabular}{l}
  \hline
  Interface \texttt{surface} \\ \hline
  Requests \\ \hline
  \texttt{destroy()} \\
  \texttt{damage()} \\ \hline
\end{tabular}
147 Needs a way to set input region, opaque region.
Represents a group of input devices, including mice and keyboards. Has
a keyboard and pointer focus. Global object. Pointer events are
delivered in both screen coordinates and surface-local coordinates.
\begin{tabular}{l}
  \hline
  Interface \texttt{input\_device} \\ \hline
  Requests \\ \hline
  no requests \\ \hline
  Events \\ \hline
  \texttt{motion(x, y, sx, sy)} \\
  \texttt{button(button, state, x, y, sx, sy)} \\
  \texttt{key(key, state)} \\
  \texttt{pointer\_focus(surface)} \\
  \texttt{keyboard\_focus(surface, keys)} \\ \hline
\end{tabular}
171 \item keyboard map, change events
173 \item multi pointer wayland
A surface can change the pointer image when the surface is the pointer
focus of the input device. Wayland doesn't automatically change the
pointer image when a pointer enters a surface, but expects the
application to set the cursor it wants in response to the motion
event. The rationale is that a client has to manage changing pointer
images for UI elements within the surface in response to motion events
anyway, so we'll make that the only mechanism for changing the
pointer image. If the server receives a request to set the pointer
image after the surface loses pointer focus, the request is ignored.
To the client this will look like it successfully set the pointer
image.
The compositor will revert the pointer image back to a default image
when no surface has the pointer focus for that device. Clients can
revert the pointer image back to the default image by setting a NULL
image.
193 What if the pointer moves from one window which has set a special
194 pointer image to a surface that doesn't set an image in response to
195 the motion event? The new surface will be stuck with the special
196 pointer image. We can't just revert the pointer image on leaving a
197 surface, since if we immediately enter a surface that sets a different
198 image, the image will flicker. Broken app, I suppose.
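The pointer image rules above can be sketched as a small piece of
server-side state. This is only an illustration: the class name, the
method names and the use of strings for images and surfaces are
assumptions, not part of the protocol.

```python
DEFAULT_IMAGE = "default"  # placeholder for the compositor's default image


class InputDevice:
    def __init__(self):
        self.pointer_focus = None
        self.pointer_image = DEFAULT_IMAGE

    def set_pointer_focus(self, surface):
        self.pointer_focus = surface
        if surface is None:
            # No surface has the focus: revert to the default image.
            self.pointer_image = DEFAULT_IMAGE
        # Otherwise the image is left alone; the newly focused surface is
        # expected to set its own image in response to the motion event.

    def set_pointer_image(self, surface, image):
        """Handle a client request; a stale request is silently ignored."""
        if self.pointer_focus is None or surface != self.pointer_focus:
            return
        # A NULL (None) image reverts to the default image.
        self.pointer_image = DEFAULT_IMAGE if image is None else image
```

Note that moving focus from one surface to another keeps the old image,
which is exactly the ``stuck pointer image'' case described above.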
An output is a global object. Outputs are advertised at connect time
or as they come and go.
\begin{tabular}{l}
  \hline
  Interface \texttt{output} \\ \hline
  Requests \\ \hline
  no requests \\ \hline
  Events \\ \hline
  \texttt{geometry(width, height)} \\ \hline
\end{tabular}
215 \item laid out in a big (compositor) coordinate system
216 \item basically xrandr over wayland
\item geometry needs position in compositor coordinate system
218 \item events to advertise available modes, requests to move and change
222 \subsection{Shared object cache}
224 Cache for sharing glyphs, icons, cursors across clients. Lets clients
225 share identical objects. The cache is a global object, advertised at
\begin{tabular}{l}
  \hline
  Interface \texttt{cache} \\ \hline
  Requests \\ \hline
  \texttt{upload(key, visual, bo, stride, width, height)} \\ \hline
  Events \\ \hline
  \texttt{item(key, bo, x, y, stride)} \\
  \texttt{retire(bo)} \\ \hline
\end{tabular}
240 \item Upload by passing a visual, bo, stride, width, height to the
243 \item Upload returns a bo name, stride, and x, y location of object in
244 the buffer. Clients take a reference on the atlas bo.
246 \item Shared objects are refcounted, freed by client (when purging
247 glyphs from the local cache) or when a client exits.
249 \item Server can't delete individual items from an atlas, but it can
250 throw out an entire atlas bo if it becomes too sparse. The server
sends out a \texttt{retire} event when this happens, and clients
252 must throw away any objects from that bo and reupload. Between the
253 server dropping the atlas and the client receiving the retire event,
254 clients can still legally use the old atlas since they have a ref on
257 \item cairo needs to hook into the glyph cache, and maybe also a way
to create a read-only surface based on an object from the cache
261 \texttt{cairo\_wayland\_create\_cached\_surface(surface-data)}.
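To make the upload/item/retire flow a little more concrete, here is a
sketch of the server side of the cache. It is heavily simplified and
rests on several assumptions: items are packed in a single row, the
visual and stride arguments are left out, an atlas counts as ``too
sparse'' only once nothing in it is referenced, and the
\texttt{release} method and the atlas names are hypothetical.

```python
class Atlas:
    def __init__(self, name, width):
        self.name = name   # hypothetical buffer object (bo) name
        self.width = width
        self.next_x = 0
        self.items = {}    # key -> [x, y, refcount]


class SharedCache:
    """Server side of the cache; packing is a single row for brevity."""

    def __init__(self, atlas_width=256):
        self.atlas_width = atlas_width
        self.atlases = []
        self.retired = []  # bo names for which a retire event was sent

    def upload(self, key, width):
        """Return the (key, bo, x, y) payload of the resulting item event."""
        for atlas in self.atlases:
            if key in atlas.items:
                entry = atlas.items[key]
                entry[2] += 1  # another client shares the identical object
                return key, atlas.name, entry[0], entry[1]
        atlas = self._atlas_with_room(width)
        x = atlas.next_x
        atlas.next_x += width
        atlas.items[key] = [x, 0, 1]
        return key, atlas.name, x, 0

    def release(self, key):
        """Drop one reference; retire the atlas once nothing in it is used."""
        for atlas in self.atlases:
            entry = atlas.items.get(key)
            if entry is None:
                continue
            entry[2] -= 1
            # Individual items are never deleted, only whole atlases.
            if all(e[2] == 0 for e in atlas.items.values()):
                self.atlases.remove(atlas)
                self.retired.append(atlas.name)  # stands in for retire(bo)
            return

    def _atlas_with_room(self, width):
        for atlas in self.atlases:
            if atlas.next_x + width <= atlas.width:
                return atlas
        index = len(self.atlases) + len(self.retired)
        atlas = Atlas("atlas-%d" % index, self.atlas_width)
        self.atlases.append(atlas)
        return atlas
```

A real policy for when an atlas is sparse enough to throw out would
presumably be less strict than ``completely unused''; that choice is
left open here.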
266 \subsection{Drag and Drop}
268 Multi-device aware. Orthogonal to rest of wayland, as it is its own
269 toplevel object. Since the compositor determines the drag target, it
270 works with transformed surfaces (dragging to a scaled down window in
271 expose mode, for example).
276 \item we can set the cursor image to the current cursor + dragged
object, which will last as long as the drag, but maybe a request to
278 attach an image to the cursor will be more convenient?
280 \item Should drag.send() destroy the object? There's nothing to do
281 after the data has been transferred.
\item How do we marshal several mime-types? We could make the drag
setup a multi-step operation: dnd.create, drag.offer(mime-type1),
drag.offer(mime-type2), drag.activate(). The drag object could send
286 multiple offer events on each motion event. Or we could just
287 implement an array type, but that's a pain to work with.
289 \item Middle-click drag to pop up menu? Ctrl/Shift/Alt drag?
291 \item Send a file descriptor over the protocol to let initiator and
292 source exchange data out of band?
294 \item Action? Specify action when creating the drag object? Ask
298 New objects, requests and events:
301 \item New toplevel dnd global. One method, creates a drag object:
302 \texttt{dnd.start(new object id, surface, input device, mime
303 types)}. Starts drag for the device, if it's grabbed by the
surface. The drag ends when the button is released. The caller is responsible
305 for destroying the drag object.
307 \item Drag object methods:
309 \texttt{drag.destroy(id)}, destroy drag object.
311 \texttt{drag.send(id, data)}, send drag data.
313 \texttt{drag.accept(id, mime type)}, accept drag offer, called by
316 \item Drag object events:
318 \texttt{drag.offer(id, mime-types)}, sent to potential destination
319 surfaces to offer drag data. If the device leaves the window or the
320 originator cancels the drag, this event is sent with mime-types =
323 \texttt{drag.target(id, mime-type)}, sent to drag originator when a
target surface has accepted the offer. If a previous target goes
325 away, this event is sent with mime-type = NULL.
\texttt{drag.data(id, data)}, sent to target, contains dragged data.
Ends the transaction on the target side.
334 \item The initiator surface receives a click (which grabs the input
335 device to that surface) and then enough motion to decide that a drag
336 is starting. Wayland has no subwindows, so it's entirely up to the
337 application to decide whether or not a draggable object within the
340 \item The initiator creates a drag object by calling the
341 \texttt{create\_drag} method on the dnd global object. As for any
342 client created object, the client allocates the id. The
343 \texttt{create\_drag} method also takes the originating surface, the
344 device that's dragging and the mime-types supported. If the surface
345 has indeed grabbed the device passed in, the server will create an
346 active drag object for the device. If the grab was released in the
meantime, the drag object will be inactive, that is, the same state
348 as when the grab is released. In that case, the client will receive
349 a button up event, which will let it know that the drag finished.
350 To the client it will look like the drag was immediately cancelled
353 The special mime-type application/x-root-target indicates that the
354 initiator is looking for drag events to the root window as well.
356 \item To indicate the object being dragged, the initiator can replace
the pointer image with a larger image representing the data being
358 dragged with the cursor image overlaid. The pointer image will
359 remain in place as long as the grab is in effect, since the
360 initiating surface keeps pointer focus, and no other surface
361 receives enter events.
363 \item As long as the grab is active (or until the initiator cancels
364 the drag by destroying the drag object), the drag object will send
365 \texttt{offer} events to surfaces it moves across. As for motion
366 events, these events contain the surface local coordinates of the
367 device as well as the list of mime-types offered. When a device
368 leaves a surface, it will send an \texttt{offer} event with an empty
369 list of mime-types to indicate that the device left the surface.
371 \item If a surface receives an offer event and decides that it's in an
372 area that can accept a drag event, it should call the
373 \texttt{accept} method on the drag object in the event. The surface
374 passes a mime-type in the request, picked from the list in the offer
375 event, to indicate which of the types it wants. At this point, the
376 surface can update the appearance of the drop target to give
377 feedback to the user that the drag has a valid target. If the
378 \texttt{offer} event moves to a different drop target (the surface
decides the offer coordinates are outside the drop target) or leaves
380 the surface (the offer event has an empty list of mime-types) it
381 should revert the appearance of the drop target to the inactive
382 state. A surface can also decide to retract its drop target (if the
383 drop target disappears or moves, for example), by calling the accept
384 method with a NULL mime-type.
386 \item When a target surface sends an \texttt{accept} request, the drag
387 object will send a \texttt{target} event to the initiator surface.
388 This tells the initiator that the drag currently has a potential
389 target and which of the offered mime-types the target wants. The
390 initiator can change the pointer image or drag source appearance to
391 reflect this new state. If the target surface retracts its drop
target or if the surface disappears, a \texttt{target} event with a
393 NULL mime-type will be sent.
395 If the initiator listed application/x-root-target as a valid
396 mime-type, dragging into the root window will make the drag object
397 send a \texttt{target} event with the application/x-root-target
400 \item When the grab is released (indicated by the button release
401 event), if the drag has an active target, the initiator calls the
402 \texttt{send} method on the drag object to send the data to be
403 transferred by the drag operation, in the format requested by the
404 target. The initiator can then destroy the drag object by calling
405 the \texttt{destroy} method.
407 \item The drop target receives a \texttt{data} event from the drag
408 object with the requested data.
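The sequence above can be sketched as a small state machine for the
drag object on the server. Events are recorded as tuples instead of
being written to a socket, surfaces are plain strings, and the method
names (\texttt{motion}, \texttt{accept}, \texttt{send}) only loosely
follow the requests and events listed earlier; treat all of it as an
assumption-laden illustration rather than the protocol itself.

```python
class Drag:
    """Server-side drag object; events are (receiver, name, payload)."""

    def __init__(self, initiator, mime_types):
        self.initiator = initiator
        self.mime_types = list(mime_types)
        self.over = None    # surface currently under the dragging device
        self.target = None  # surface that has accepted the offer
        self.events = []

    def motion(self, surface):
        """The dragging device moved; surface is what it is over now."""
        if self.over is not None and surface != self.over:
            # Leaving a surface: offer with an empty list of mime-types.
            self.events.append((self.over, "offer", []))
            if self.target == self.over:
                self._set_target(None, None)
        self.over = surface
        if surface is not None:
            self.events.append((surface, "offer", self.mime_types))

    def accept(self, surface, mime_type):
        """Request from a potential target; mime_type None retracts it."""
        if surface != self.over:
            return
        if mime_type is not None and mime_type not in self.mime_types:
            return
        self._set_target(surface if mime_type is not None else None, mime_type)

    def send(self, data):
        """Request from the initiator after the grab is released."""
        if self.target is not None:
            self.events.append((self.target, "data", data))

    def _set_target(self, surface, mime_type):
        # Tell the initiator which type the current target wants (or None).
        self.target = surface
        self.events.append((self.initiator, "target", mime_type))
```

For example, a drag from \texttt{src} that moves over \texttt{dst}, is
accepted as \texttt{text/plain} and then dropped produces an
\texttt{offer} to \texttt{dst}, a \texttt{target} to \texttt{src} and
finally a \texttt{data} event to \texttt{dst}.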
MIME is defined in RFCs 2045--2049. A registry of MIME types is
412 maintained by the Internet Assigned Numbers Authority (IANA).
414 ftp://ftp.isi.edu/in-notes/iana/assignments/media-types/
417 \section{Types of compositors}
419 \subsection{System Compositor}
422 \item ties in with graphical boot
423 \item hosts different types of session compositors
424 \item lets us switch between multiple sessions (fast user switching,
425 secure/personal desktop switching)
427 \item linux implementation using libudev, egl, kms, evdev, cairo
428 \item for fullscreen clients, the system compositor can reprogram the
video scanout address to source from the client provided buffer.
432 \subsection{Session Compositor}
\item nested under the system compositor. Nesting is feasible because
the protocol is async; a round trip would break nesting
440 \item kde compositor?
441 \item text mode using vte
443 \item fullscreen X session under wayland
444 \item can run without system compositor, on the hw where it makes
446 \item root window less X server, bridging X windows into a wayland
\subsection{Embedding Compositor}
X11 lets clients embed windows from other clients, or lets clients copy
pixmap contents rendered by another client into their window. This is
454 often used for applets in a panel, browser plugins and similar.
455 Wayland doesn't directly allow this, but clients can communicate GEM
456 buffer names out-of-band, for example, using d-bus or as command line
457 arguments when the panel launches the applet. Another option is to
458 use a nested wayland instance. For this, the wayland server will have
459 to be a library that the host application links to. The host
460 application will then pass the wayland server socket name to the
461 embedded application, and will need to implement the wayland
462 compositor interface. The host application composites the client
surfaces as part of its window, that is, in the web page or in the
464 panel. The benefit of nesting the wayland server is that it provides
465 the requests the embedded client needs to inform the host about buffer
466 updates and a mechanism for forwarding input events from the host
470 \item firefox embedding flash by being a special purpose compositor to
474 \section{Implementation}
476 what's currently implemented
478 \subsection{Wayland Server Library}
480 \texttt{libwayland-server.so}
483 \item implements protocol side of a compositor
484 \item minimal, doesn't include any rendering or input device handling
485 \item helpers for running on egl and evdev, and for nested wayland
488 \subsection{Wayland Client Library}
490 \texttt{libwayland.so}
493 \item minimal, designed to support integration with real toolkits such as
496 \item doesn't cache state, but lets the toolkits cache server state in
497 native objects (GObject or QObject or whatever).
500 \subsection{Wayland System Compositor}
503 \item implementation of the system compositor
505 \item uses libudev, eagle (egl), evdev and drm
507 \item integrates with ConsoleKit, can create new sessions
509 \item allows multi seat setups
511 \item configurable through udev rules and maybe /etc/wayland.d type thing
514 \subsection{X Server Session}
517 \item xserver module and driver support
519 \item uses wayland client library
521 \item same X.org server as we normally run, the front buffer is a wayland
522 surface but all accel code, 3d and extensions are there
524 \item when full screen the session compositor will scan out from the X
525 server wayland surface, at which point X is running pretty much as it