
\documentclass{article}
\usepackage{palatino}

\author{Kristian Høgsberg\\
\texttt{krh@bitplanet.net}
}

\title{The Wayland Display Server}

\begin{document}

\maketitle

\section{Wayland Overview}

\begin{itemize}
\item Wayland is a protocol for a new display server.
\item Wayland is also an implementation of that protocol.
\end{itemize}

\subsection{Replacing X11}

Over the last 10 years, a lot of functionality has slowly moved out
of the X server and into libraries or kernel drivers. It started with
freetype and fontconfig providing an alternative to the core X fonts
and direct rendering OpenGL as a graphics driver in a client side
library. Then cairo came along and provided a modern 2D rendering
library independent of X and compositing managers took over control of
the rendering of the desktop. Recently with GEM and KMS in the Linux
kernel, we can do modesetting outside X and schedule several direct
rendering clients. The end result is a highly modular graphics stack.

Wayland is a new display server building on top of all those
components. We’re trying to distill out the functionality in the X
server that is still used by the modern Linux desktop. This turns out
to be not a whole lot. Applications can allocate their own off-screen
buffers and render their window contents by themselves. In the end,
what’s needed is a way to present the resulting window surface to a
compositor and a way to receive input. This is what Wayland provides,
by piecing together the components already in the ecosystem in a
slightly different way.

X will always be relevant, in the same way Fortran compilers and VRML
browsers are, but it’s time that we think about moving it out of the
critical path and provide it as an optional component for legacy
applications.


\section{Wayland protocol}

\subsection{Basic Principles}

The Wayland protocol is an asynchronous, object-oriented protocol.  All
requests are method invocations on some object.  Each request includes
an object id that uniquely identifies an object on the server.  Each
object implements an interface, and each request includes an opcode that
identifies which method in the interface to invoke.

The wire protocol is determined from the C prototypes of the requests
and events.  There is a straightforward mapping from the C types to
the byte layout of the request written to the socket.  It is
possible to map the events and requests to function calls in other
languages, but that hasn't been done at this point.
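
To make the idea of packing a request concrete, here is a minimal
Python sketch.  The header layout (a 32-bit object id, then a 16-bit
opcode and a 16-bit total size, all little-endian) and the names
\texttt{pack\_request} and \texttt{unpack\_header} are assumptions made
for this example only; this document does not fix them.  The sketch
also assumes every argument is a 32-bit unsigned integer.

\begin{verbatim}
import struct

# Assumed header: object id (u32), opcode (u16), total size (u16).
HEADER = struct.Struct("<IHH")


def pack_request(object_id, opcode, *args):
    """Pack a request whose arguments are all 32-bit unsigned ints."""
    body = struct.pack("<%dI" % len(args), *args)
    size = HEADER.size + len(body)
    return HEADER.pack(object_id, opcode, size) + body


def unpack_header(data):
    """Return (object_id, opcode, size) from the first 8 bytes."""
    return HEADER.unpack_from(data)
\end{verbatim}

With this layout, \texttt{pack\_request(3, 1, 640, 480)} produces a
16-byte message: an 8-byte header followed by two 4-byte arguments.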

The server sends events back to the client; each event is emitted from
an object.  Events can be error conditions.  The event includes the
object id and the event opcode, from which the client can determine
the type of event.  Events are generated either in response to a request
(in which case the request and the event constitute a round trip) or
spontaneously when the server state changes.

\begin{itemize}
\item state is broadcast on connect, and events are sent out when state
  changes.  The client must listen for these changes and cache the state.
  There is no need (or mechanism) to query server state.

\item server will broadcast presence of a number of global objects,
  which in turn will broadcast their current state
\end{itemize}
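
A client following this model keeps a local copy of whatever state the
server announces.  The Python sketch below is one hypothetical way to
do that; the class \texttt{StateCache}, the event names
(\texttt{global}, \texttt{geometry}) and the field names are invented
for the example and are not part of the protocol.

\begin{verbatim}
class StateCache:
    """Client-side cache of server state, updated only by events."""

    def __init__(self):
        self.globals = {}   # object id -> interface name
        self.state = {}     # object id -> dict of cached properties

    def handle_event(self, object_id, event, **fields):
        if event == "global":
            # Presence announcement for a global object.
            self.globals[fields["new_id"]] = fields["interface"]
            self.state.setdefault(fields["new_id"], {})
        else:
            # Any other event updates the cached state of its object.
            self.state.setdefault(object_id, {}).update(fields)

    def get(self, object_id, key):
        # Answered locally: there is no request to query the server.
        return self.state[object_id][key]
\end{verbatim}

Lookups never touch the socket, so a later event for the same object
simply overwrites the cached values.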

\subsection{Connect Time}

\begin{itemize}
\item no fixed format connect block, the server emits a bunch of
  events at connect time
\item presence events for global objects: output, compositor, input
  devices
\end{itemize}
\subsection{Security and Authentication}

\begin{itemize}
\item mostly about access to underlying buffers, need new drm auth
  mechanism (the grant-to ioctl idea), need to check the cmd stream?

\item getting the server socket depends on the compositor type: it could
  be a system-wide name, it could be obtained through fd passing on the
  session dbus, or the client is forked by the compositor and the fd is
  already open.
\end{itemize}

\subsection{Creating Objects}

\begin{itemize}
\item client allocates object ID, uses range protocol
\item server tracks how many IDs are left in current range, sends new
  range when client is about to run out.
\end{itemize}
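
The range scheme can be sketched in Python as follows.  The range
size, the low-water threshold and both class names are assumptions for
illustration; the notes above only say that the client allocates ids
from a range and that the server sends a new range before the client
runs out.

\begin{verbatim}
RANGE_SIZE = 256   # assumed number of ids per range
LOW_WATER = 16     # assumed threshold for sending a new range


class ServerRangeTracker:
    """Server side: hands out ranges and counts ids the client used."""

    def __init__(self):
        self.next_base = 1
        self.remaining = 0

    def grant_range(self):
        base = self.next_base
        self.next_base += RANGE_SIZE
        self.remaining += RANGE_SIZE
        return base

    def note_id_used(self):
        """Call once per new object; returns a new base or None."""
        self.remaining -= 1
        if self.remaining <= LOW_WATER:
            return self.grant_range()
        return None


class ClientAllocator:
    """Client side: allocates ids from the ranges it has been sent."""

    def __init__(self):
        self.ranges = []   # list of [next_id, end) pairs

    def add_range(self, base):
        self.ranges.append([base, base + RANGE_SIZE])

    def allocate(self):
        current = self.ranges[0]
        object_id = current[0]
        current[0] += 1
        if current[0] == current[1]:
            self.ranges.pop(0)   # range exhausted, move to the next
        return object_id
\end{verbatim}

Because the new range arrives before the old one is used up, the
client never has to wait on a round trip to create an object.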

\subsection{Compositor}

\begin{itemize}
\item a global object
\item broadcasts drm file name, or at least a string like drm:/dev/card0
\item commit/ack/frame protocol
\end{itemize}

\subsection{Surface}

created by the client
\begin{itemize}
\item attach
\item copy
\item damage
\item destroy
\item input region, opaque region
\item set cursor
\end{itemize}

\subsection{Input Group}

global object

\begin{itemize}
\item input group, keyboard, mouse
\item keyboard map, change events
\item pointer motion
\item enter, leave, focus
\item xkb on wayland
\item multi pointer wayland
\end{itemize}


\subsection{Output}

\begin{itemize}
\item global objects
\item a connected screen
\item laid out in a big coordinate system
\item basically xrandr over wayland
\end{itemize}
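
As a small illustration of outputs laid out in one coordinate system,
the Python sketch below models each output as a rectangle and finds
the output under a given point.  The \texttt{Output} tuple, its field
names and both helper functions are hypothetical and only serve the
example.

\begin{verbatim}
from collections import namedtuple

# Position and size of an output in the shared coordinate system.
Output = namedtuple("Output", "name x y width height")


def output_at(outputs, px, py):
    """Return the output whose rectangle contains (px, py), or None."""
    for out in outputs:
        if (out.x <= px < out.x + out.width
                and out.y <= py < out.y + out.height):
            return out
    return None


def bounding_size(outputs):
    """Return (width, height) of the area from the origin that
    covers all outputs."""
    right = max(out.x + out.width for out in outputs)
    bottom = max(out.y + out.height for out in outputs)
    return right, bottom
\end{verbatim}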

\subsection{Drag and Drop}

Multi-device aware. Orthogonal to rest of wayland, as it is its own
toplevel object.  Since the compositor determines the drag target, it
works with transformed surfaces (dragging to a scaled down window in
expose mode, for example).

Issues: 

\begin{itemize}
\item we can set the cursor image to the current cursor + dragged
  object, which will last as long as the drag, but maybe a request to
  attach an image to the cursor will be more convenient?

\item Should drag.send() destroy the object?  There's nothing to do
  after the data has been transferred.

\item How do we marshal several mime-types?  We could make the drag
  setup a multi-step operation: dnd.create, drag.offer(mime-type1),
  drag.offer(mime-type2), drag.activate().  The drag object could send
  multiple offer events on each motion event.  Or we could just
  implement an array type, but that's a pain to work with.

\item Middle-click drag to pop up menu?  Ctrl/Shift/Alt drag?

\item Send a file descriptor over the protocol to let initiator and
  source exchange data out of band?

\item Action?  Specify action when creating the drag object? Ask
  action?
\end{itemize}

New objects, requests and events:

\begin{itemize}
\item New toplevel dnd global.  One method, creates a drag object:

  \texttt{dnd.start(new object id, surface, input device, mime types)}

  Starts drag for the device, if it's grabbed by the surface.  Drag
  ends when button is released.  Caller is responsible for
  destroying the drag object.

\item Drag object methods:

  \begin{itemize}
  \item \texttt{drag.destroy(id)}: destroy drag object.

  \item \texttt{drag.send(id, data)}: send drag data.

  \item \texttt{drag.accept(id, mime type)}: accept drag offer, called
    by target surface.
  \end{itemize}

\item Drag object events:

  \begin{itemize}
  \item \texttt{drag.offer(id, mime-types)}: sent to potential
    destination surfaces to offer drag data.  If the device leaves the
    window or the originator cancels the drag, this event is sent with
    mime-types = NULL.

  \item \texttt{drag.target(id, mime-type)}: sent to drag originator
    when a target surface has accepted the offer.  If a previous target
    goes away, this event is sent with mime-type = NULL.

  \item \texttt{drag.data(id, data)}: sent to target, contains dragged
    data.  Ends transaction on the target side.
  \end{itemize}
\end{itemize}

Sequence of events:

\begin{itemize}
\item The initiator surface receives a click (which grabs the input
  device to that surface) and then enough motion to decide that a drag
  is starting.  Wayland has no subwindows, so it's entirely up to the
  application to decide whether or not a draggable object within the
  surface was clicked.

\item The initiator creates a drag object by calling the create\_drag
  method on the dnd global object.  As for any client created object,
  the client allocates the id.  The create\_drag method also takes the
  originating surface, the device that's dragging and the mime-types
  supported.  If the surface has indeed grabbed the device passed in,
  the server will create an active drag object for the device.  If the
  grab was released in the meantime, the drag object will be
  inactive, that is, the same state as when the grab is released.  In
  that case, the client will receive a button up event, which will let
  it know that the drag finished.  To the client it will look like the
  drag was immediately cancelled by the grab ending.

  The special mime-type application/x-root-target indicates that the
  initiator is looking for drag events to the root window as well.

\item To indicate the object being dragged, the initiator can replace
  the pointer image with a larger image representing the data being
  dragged with the cursor image overlaid.  The pointer image will
  remain in place as long as the grab is in effect, since no other
  surfaces receive enter/leave events.

\item As long as the grab is active (or until the initiator cancels
  the drag by destroying the drag object), the drag object will send
  ``offer'' events to the surfaces the device moves across.  As with
  motion events, these events contain the surface local coordinates of
  the device as well as the list of mime-types offered.  When a device
  leaves a surface, the drag object will send an offer event with an
  empty list of mime-types to indicate that the device left the surface.

\item If a surface receives an offer event and decides that it's in an
  area that can accept a drag event, it should call the accept method
  on the drag object in the event.  The surface passes a mime-type in
  the request, picked from the list in the offer event, to indicate
  which of the types it wants.  At this point, the surface can update
  the appearance of the drop target to give feedback to the user that
  the drag has a valid target.  If the offer event moves to a
  different drop target (the surface decides the offer coordinates are
  outside the drop target) or leaves the surface (the offer event has
  an empty list of mime-types) it should revert the appearance of the
  drop target to the inactive state.  A surface can also decide to
  retract its drop target (if the drop target disappears or moves, for
  example), by calling the accept method with a NULL mime-type.

\item When a target surface sends an accept request, the drag object
  will send a target event to the initiator surface.  This tells the
  initiator that the drag currently has a potential target and which
  of the offered mime-types the target wants.  The initiator can
  change the pointer image or drag source appearance to reflect this
  new state.  If the target surface retracts its drop target or if the
  surface disappears, a target event with a NULL mime-type will be
  sent.

  If the initiator listed application/x-root-target as a valid
  mime-type, dragging into the root window will make the drag object
  send a target event with the application/x-root-target mime-type.

\item When the grab is released (indicated by the button release
  event), if the drag has an active target, the initiator calls the
  send method on the drag object to send the data to be transferred by
  the drag operation, in the format requested by the target.  The
  initiator can then destroy the drag object by calling the destroy
  method.

\item The drop target receives a data event from the drag object with
  the requested data.
\end{itemize}
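
The sequence above can be modelled with a toy drag object in Python.
This is only a sketch under stated assumptions: the class
\texttt{Drag}, the methods \texttt{motion} and \texttt{leave}, and the
use of plain strings for surfaces are hypothetical, and events are
recorded in lists rather than sent over a socket.  It also assumes that
leaving the surface that accepted the offer clears the current target.

\begin{verbatim}
class Drag:
    """Toy model of the drag object; events are recorded in lists."""

    def __init__(self, mime_types):
        self.mime_types = list(mime_types)
        self.target = None            # surface that accepted, if any
        self.target_type = None
        self.initiator_events = []    # events sent to the initiator
        self.target_events = []       # (surface, event, payload)

    def motion(self, surface):
        # The device moved over a surface: offer the types to it.
        self.target_events.append((surface, "offer", self.mime_types))

    def leave(self, surface):
        # Leaving a surface is an offer with an empty type list.
        self.target_events.append((surface, "offer", []))
        if surface == self.target:
            # Assumption: leaving the accepted surface clears the target.
            self.accept(surface, None)

    def accept(self, surface, mime_type):
        # Request from a target surface; None retracts the drop target.
        if mime_type is not None and mime_type not in self.mime_types:
            raise ValueError("type was not offered")
        self.target = surface if mime_type is not None else None
        self.target_type = mime_type
        self.initiator_events.append(("target", mime_type))

    def send(self, data):
        # Request from the initiator after the button release.
        if self.target is None:
            return False
        self.target_events.append((self.target, "data", data))
        return True
\end{verbatim}

A drag that is accepted and then dropped ends with a \texttt{data}
event on the target; a drag whose target went away makes
\texttt{send} return \texttt{False} and delivers nothing.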

MIME is defined in RFCs 2045-2049. A registry of MIME types is
maintained by the Internet Assigned Numbers Authority (IANA).

ftp://ftp.isi.edu/in-notes/iana/assignments/media-types/


\section{Types of compositors}

\subsection{System Compositor}

\begin{itemize}
\item ties in with graphical boot
\item hosts different types of session compositors
\item lets us switch between multiple sessions (fast user switching,
   secure/personal desktop switching)
\item multiseat
\item linux implementation using libudev, egl, kms, evdev, cairo
\item for fullscreen clients, the system compositor can reprogram the
   video scanout address to source from the client provided buffer.
\end{itemize}

\subsection{Session Compositor}

\begin{itemize}
\item nested under the system compositor.  nesting is feasible because
   protocol is async, roundtrip would break nesting
\item gnome-shell
\item moblin
\item compiz?
\item kde compositor?
\item text mode using vte
\item rdp session
\item fullscreen X session under wayland
\item can run without system compositor, on the hw where it makes
   sense
\item root window less X server, bridging X windows into a wayland
   session compositor
\end{itemize}

\subsection{Embedding Compositor}

X11 lets clients embed windows from other clients, or lets clients copy
pixmap contents rendered by another client into their window.  This is
often used for applets in a panel, browser plugins and similar.
Wayland doesn't directly allow this, but clients can communicate GEM
buffer names out-of-band, for example, using d-bus or as command line
arguments when the panel launches the applet.  Another option is to
use a nested wayland instance.  For this, the wayland server will have
to be a library that the host application links to.  The host
application will then pass the wayland server socket name to the
embedded application, and will need to implement the wayland
compositor interface.  The host application composites the client
surfaces as part of its window, that is, in the web page or in the
panel.  The benefit of nesting the wayland server is that it provides
the requests the embedded client needs to inform the host about buffer
updates and a mechanism for forwarding input events from the host
application.

\begin{itemize}
\item firefox embedding flash by being a special purpose compositor to
   the plugin
\end{itemize}

\section{Implementation}

what's currently implemented

\subsection{Wayland Server Library}

\texttt{libwayland-server.so}

\begin{itemize}
\item implements protocol side of a compositor
\item minimal, doesn't include any rendering or input device handling
\item helpers for running on egl and evdev, and for nested wayland
\end{itemize}

\subsection{Wayland Client Library}

\texttt{libwayland.so}

\begin{itemize}
\item minimal, designed to support integration with real toolkits such as
   Qt, GTK+ or Clutter.

\item doesn't cache state, but lets the toolkits cache server state in
   native objects (GObject or QObject or whatever).
\end{itemize}

\subsection{Wayland System Compositor}

\begin{itemize}
\item implementation of the system compositor

\item uses libudev, eagle (egl), evdev and drm

\item integrates with ConsoleKit, can create new sessions

\item allows multi seat setups

\item configurable through udev rules and maybe /etc/wayland.d type thing
\end{itemize}

\subsection{X Server Session}

\begin{itemize}
\item xserver module and driver support

\item uses wayland client library

\item same X.org server as we normally run, the front buffer is a wayland
   surface but all accel code, 3d and extensions are there

\item when full screen the session compositor will scan out from the X
   server wayland surface, at which point X is running pretty much as it
   does natively.
\end{itemize}

\end{document}