[RFC,0/2] Add the stateless AV1 uAPI and the VIVPU virtual driver to showcase it.

Message ID 20210810220552.298140-1-daniel.almeida@collabora.com (mailing list archive)
Headers
Series Add the stateless AV1 uAPI and the VIVPU virtual driver to showcase it. |

Message

Daniel Almeida Aug. 10, 2021, 10:05 p.m. UTC
  From: Daniel Almeida <daniel.almeida@collabora.com>

Dear all,

This patchset adds the stateless AV1 uAPI and the VIVPU virtual driver to
showcase it.

Note that this patch depends on dynamically allocated control arrays, i.e. [0]
and [1], which are part of the following series[2].

This cover letter will discuss the AV1 OBUs and their relationship with the
V4L2 controls proposed therein. The VIVPU test driver will also be discussed.

Note that I have also written a GStreamer decoder element [3] to interface with
the VIVPU virtual driver through the proposed control interface to ensure that
these three pieces actually work. The MR in gst-plugins-bad is marked as "Draft"
only because the uAPI hasn't been merged yet and there's no real hardware to
test it.

Padding and holes have not been taken into account yet.



Relevant AV1 Open Bitstream Units (OBUs):
-----------------------------------------

AV1 is packetized into a syntax element known as OBU, which stands for Open
Bitstream Units. There are seven different types of OBUs defined in the AV1
specification, of which five are of interest for the purposes of this API, they
are:

Sequence Header OBU: Contains information that applies to the entire sequence.
Most importantly, it contains a set of flags that signal which AV1 features are
enabled for the entire video coded sequence. The sequence header OBU also
encodes the sequence profile.

Frame Header OBU: Contains information that applies to an entire frame. Notably,
this OBU will dictate the frame's dimensions, its frame type, quantization,
segmentation and filter parameters as well as the set of reference frames needed
to effect a decoding operation. A set of flags will signal whether some AV1
features are enabled for a particular frame.

Tile Group OBU: Contains tiling information. Tile groups contain the tile data
associated with a frame. Tiles are subdivisions of a picture that can be
independently decoded, optionally in parallel. The entire frame is assembled
from all the tiles after potential loop filtering.

Frame OBU: Shorthand for a frame header OBU plus a tile group OBU but with less
overhead. Frame OBUs are a convenience for the common case in which a frame
header is combined with tiling information.

Tile List OBU: Similar to a tile group OBU, but used in "Large Scale
Tile Decoding Mode". The tiling information contained in this OBU has an
additional header that allows the decoder to process a subset of tiles and
display the corresponding part of the image without having to fully decode all
the tiles for a frame.



AV1 uAPI V4L2 CIDs:
-------------------

V4L2_CID_STATELESS_AV1_SEQUENCE: represents a Sequence Header OBU. This control
should only be set once per Sequence Header OBU. The "flags" member contains a
bitfield with the set of flags for the current video coded sequence as parsed
from the bitstream.

V4L2_CID_STATELESS_AV1_FRAME_HEADER: represents a Frame Header OBU. This control
should be set once per frame.

V4L2_CID_STATELESS_AV1_{TILE_GROUP|TILE_GROUP_ENTRY}: represents a Tile Group
OBU or the tiling information within a Frame OBU. These controls contain an
array of metadata to decode the tiles associated with a frame. Both controls
depend on V4L2_CTRL_FLAG_DYNAMIC_ARRAY and drivers will be able to index into
the array using ctrl->p_cur.p_av1_tile_group and
ctrl->p_cur.p_av1_tile_group_entry as base pointers respectively. Frame OBUs
should be split into their Frame Header OBU and Tile Group OBU constituents
before the array entries can be set and there should be a maximum of 512 tile
group entries as per the AV1 specification. In the event that more than one tile
group is provided, drivers can disambiguate their corresponding entries in the
ctrl->p_cur.p_av1_tile_group_entry array by taking note of the tg_start and
tg_end fields.

V4L2_CID_STATELESS_AV1_{TILE_LIST|TILE_LIST_ENTRY}: represents a Tile List OBU.
These controls contain an array of metadata to decode a list of tiles associated
with a frame when the decoder is operating under "Large Scale Tile Decoding
Mode". Both controls depend on V4L2_CTRL_FLAG_DYNAMIC_ARRAY, and drivers will be
able index into the array using ctrl->p_cur.p_av1_tile_list and
ctrl->p_cur.p_av1_tile_list_entry as base pointers respectively. In the event
that more than one list is provided, drivers can disambiguate their
corresponding entries in the ctrl->p_cur.p_av1_tile_list_entry array by taking
note of the tile_count_minus_1 field.

V4L2_CID_STATELESS_AV1_PROFILE: this control lets the driver convey the
supported profiles to userspace.

V4L2_CID_STATELESS_AV1_LEVEL: this control lets the driver convey the supported
AV1 levels to userspace.

V4L2_CTRL_AV1_OPERATING_MODE: this control lets the driver convey the supported
operating modes to userspace. Conversely, userspace apps can change the value of
this control to switch between "general decoding" and "large scale tile
decoding". As per the AV1 specification, under *general decoding mode* the
driver should expect the input to be a sequence of OBUs and the output to be a
decoded frame, whereas under *large scale tile decoding mode* the driver should
expect the input to be a tile list OBU plus additional side information and the
output to be a decoded frame.



VIVPU:
------

This virtual driver was written as a way to showcase and test the control
interface for AV1 as well as the GStreamer decoder[3]. This is so we can detect
bugs at an early stage before real hardware is available. VIVPU does not attempt
to decode video at all.

Once VIVPU is loaded, one can run the following GStreamer pipeline successfully:

gst-launch-1.0 filesrc location=<path to some sample av1 file> ! parsebin !  v4l2slav1dec ! fakevideosink

This is provided that the patches in [3] have been applied and the v4l2codecs
gstreamer plugin is compiled.

It is also possible to print the controls' contents to the console by setting
vivpu_debug to 1. This is handy when debugging, even more so when one is
comparing two different userspace implementations because it makes it easier to
diff the controls that were passed to the kernel.

[0] https://patchwork.linuxtv.org/project/linux-media/patch/20210610113615.785359-2-hverkuil-cisco@xs4all.nl/

[1] https://patchwork.linuxtv.org/project/linux-media/patch/20210610113615.785359-3-hverkuil-cisco@xs4all.nl/

[2] https://patchwork.linuxtv.org/project/linux-media/list/?series=5647

[3] https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/2305

Daniel Almeida (2):
  media: Add AV1 uAPI
  media: vivpu: add virtual VPU driver

 .../userspace-api/media/v4l/biblio.rst        |   10 +
 .../media/v4l/ext-ctrls-codec-stateless.rst   | 1268 +++++++++++++++++
 .../media/v4l/pixfmt-compressed.rst           |   21 +
 .../media/v4l/vidioc-g-ext-ctrls.rst          |   36 +
 .../media/v4l/vidioc-queryctrl.rst            |   54 +
 .../media/videodev2.h.rst.exceptions          |    9 +
 drivers/media/test-drivers/Kconfig            |    1 +
 drivers/media/test-drivers/Makefile           |    1 +
 drivers/media/test-drivers/vivpu/Kconfig      |   16 +
 drivers/media/test-drivers/vivpu/Makefile     |    4 +
 drivers/media/test-drivers/vivpu/vivpu-core.c |  418 ++++++
 drivers/media/test-drivers/vivpu/vivpu-dec.c  |  491 +++++++
 drivers/media/test-drivers/vivpu/vivpu-dec.h  |   61 +
 .../media/test-drivers/vivpu/vivpu-video.c    |  599 ++++++++
 .../media/test-drivers/vivpu/vivpu-video.h    |   46 +
 drivers/media/test-drivers/vivpu/vivpu.h      |  119 ++
 drivers/media/v4l2-core/v4l2-ctrls-core.c     |  286 +++-
 drivers/media/v4l2-core/v4l2-ctrls-defs.c     |   79 +
 drivers/media/v4l2-core/v4l2-ioctl.c          |    1 +
 include/media/v4l2-ctrls.h                    |   12 +
 include/uapi/linux/v4l2-controls.h            |  796 +++++++++++
 include/uapi/linux/videodev2.h                |   15 +
 22 files changed, 4342 insertions(+), 1 deletion(-)
 create mode 100644 drivers/media/test-drivers/vivpu/Kconfig
 create mode 100644 drivers/media/test-drivers/vivpu/Makefile
 create mode 100644 drivers/media/test-drivers/vivpu/vivpu-core.c
 create mode 100644 drivers/media/test-drivers/vivpu/vivpu-dec.c
 create mode 100644 drivers/media/test-drivers/vivpu/vivpu-dec.h
 create mode 100644 drivers/media/test-drivers/vivpu/vivpu-video.c
 create mode 100644 drivers/media/test-drivers/vivpu/vivpu-video.h
 create mode 100644 drivers/media/test-drivers/vivpu/vivpu.h
  

Comments

Hans Verkuil Sept. 2, 2021, 3:43 p.m. UTC | #1
On 11/08/2021 00:05, daniel.almeida@collabora.com wrote:
> From: Daniel Almeida <daniel.almeida@collabora.com>
> 
> Dear all,
> 
> This patchset adds the stateless AV1 uAPI and the VIVPU virtual driver to
> showcase it.
> 
> Note that this patch depends on dynamically allocated control arrays, i.e. [0]
> and [1], which are part of the following series[2].
> 
> This cover letter will discuss the AV1 OBUs and their relationship with the
> V4L2 controls proposed therein. The VIVPU test driver will also be discussed.
> 
> Note that I have also written a GStreamer decoder element [3] to interface with
> the VIVPU virtual driver through the proposed control interface to ensure that
> these three pieces actually work. The MR in gst-plugins-bad is marked as "Draft"
> only because the uAPI hasn't been merged yet and there's no real hardware to
> test it.
> 
> Padding and holes have not been taken into account yet.
> 
> 
> 
> Relevant AV1 Open Bitstream Units (OBUs):
> -----------------------------------------
> 
> AV1 is packetized into a syntax element known as OBU, which stands for Open
> Bitstream Units. There are seven different types of OBUs defined in the AV1
> specification, of which five are of interest for the purposes of this API, they
> are:
> 
> Sequence Header OBU: Contains information that applies to the entire sequence.
> Most importantly, it contains a set of flags that signal which AV1 features are
> enabled for the entire video coded sequence. The sequence header OBU also
> encodes the sequence profile.
> 
> Frame Header OBU: Contains information that applies to an entire frame. Notably,
> this OBU will dictate the frame's dimensions, its frame type, quantization,
> segmentation and filter parameters as well as the set of reference frames needed
> to effect a decoding operation. A set of flags will signal whether some AV1
> features are enabled for a particular frame.
> 
> Tile Group OBU: Contains tiling information. Tile groups contain the tile data
> associated with a frame. Tiles are subdivisions of a picture that can be
> independently decoded, optionally in parallel. The entire frame is assembled
> from all the tiles after potential loop filtering.
> 
> Frame OBU: Shorthand for a frame header OBU plus a tile group OBU but with less
> overhead. Frame OBUs are a convenience for the common case in which a frame
> header is combined with tiling information.
> 
> Tile List OBU: Similar to a tile group OBU, but used in "Large Scale
> Tile Decoding Mode". The tiling information contained in this OBU has an
> additional header that allows the decoder to process a subset of tiles and
> display the corresponding part of the image without having to fully decode all
> the tiles for a frame.
> 
> 
> 
> AV1 uAPI V4L2 CIDs:
> -------------------
> 
> V4L2_CID_STATELESS_AV1_SEQUENCE: represents a Sequence Header OBU. This control
> should only be set once per Sequence Header OBU. The "flags" member contains a
> bitfield with the set of flags for the current video coded sequence as parsed
> from the bitstream.
> 
> V4L2_CID_STATELESS_AV1_FRAME_HEADER: represents a Frame Header OBU. This control
> should be set once per frame.
> 
> V4L2_CID_STATELESS_AV1_{TILE_GROUP|TILE_GROUP_ENTRY}: represents a Tile Group
> OBU or the tiling information within a Frame OBU. These controls contain an
> array of metadata to decode the tiles associated with a frame. Both controls
> depend on V4L2_CTRL_FLAG_DYNAMIC_ARRAY and drivers will be able to index into
> the array using ctrl->p_cur.p_av1_tile_group and
> ctrl->p_cur.p_av1_tile_group_entry as base pointers respectively. Frame OBUs
> should be split into their Frame Header OBU and Tile Group OBU constituents
> before the array entries can be set and there should be a maximum of 512 tile
> group entries as per the AV1 specification. In the event that more than one tile
> group is provided, drivers can disambiguate their corresponding entries in the
> ctrl->p_cur.p_av1_tile_group_entry array by taking note of the tg_start and
> tg_end fields.
> 
> V4L2_CID_STATELESS_AV1_{TILE_LIST|TILE_LIST_ENTRY}: represents a Tile List OBU.
> These controls contain an array of metadata to decode a list of tiles associated
> with a frame when the decoder is operating under "Large Scale Tile Decoding
> Mode". Both controls depend on V4L2_CTRL_FLAG_DYNAMIC_ARRAY, and drivers will be
> able index into the array using ctrl->p_cur.p_av1_tile_list and
> ctrl->p_cur.p_av1_tile_list_entry as base pointers respectively. In the event
> that more than one list is provided, drivers can disambiguate their
> corresponding entries in the ctrl->p_cur.p_av1_tile_list_entry array by taking
> note of the tile_count_minus_1 field.

It's a bit hard to tell for non-AV1 experts how these tile controls relate to
one another.

This is my understanding:

Without tiling only a V4L2_CID_STATELESS_AV1_FRAME_HEADER is needed.

With tiling you need a V4L2_CID_STATELESS_AV1_FRAME_HEADER and
V4L2_CID_STATELESS_AV1_{TILE_GROUP|TILE_GROUP_ENTRY} arrays.

With 'Large Scale Tile Decoding' you need a V4L2_CID_STATELESS_AV1_FRAME_HEADER
and V4L2_CID_STATELESS_AV1_{TILE_LIST|TILE_LIST_ENTRY} arrays (I think). It's
not clear to me if these TILE_LISTs replace TILE_GROUPs or add to them. If it
is the latter, then the V4L2_CID_STATELESS_AV1_{TILE_GROUP|TILE_GROUP_ENTRY}
arrays also need to be set for each frame.

In any case, this should probably be clarified in the documentation as well.

Regards,

	Hans

> 
> V4L2_CID_STATELESS_AV1_PROFILE: this control lets the driver convey the
> supported profiles to userspace.
> 
> V4L2_CID_STATELESS_AV1_LEVEL: this control lets the driver convey the supported
> AV1 levels to userspace.
> 
> V4L2_CTRL_AV1_OPERATING_MODE: this control lets the driver convey the supported
> operating modes to userspace. Conversely, userspace apps can change the value of
> this control to switch between "general decoding" and "large scale tile
> decoding". As per the AV1 specification, under *general decoding mode* the
> driver should expect the input to be a sequence of OBUs and the output to be a
> decoded frame, whereas under *large scale tile decoding mode* the driver should
> expect the input to be a tile list OBU plus additional side information and the
> output to be a decoded frame.
> 
> 
> 
> VIVPU:
> ------
> 
> This virtual driver was written as a way to showcase and test the control
> interface for AV1 as well as the GStreamer decoder[3]. This is so we can detect
> bugs at an early stage before real hardware is available. VIVPU does not attempt
> to decode video at all.
> 
> Once VIVPU is loaded, one can run the following GStreamer pipeline successfully:
> 
> gst-launch-1.0 filesrc location=<path to some sample av1 file> ! parsebin !  v4l2slav1dec ! fakevideosink
> 
> This is provided that the patches in [3] have been applied and the v4l2codecs
> gstreamer plugin is compiled.
> 
> It is also possible to print the controls' contents to the console by setting
> vivpu_debug to 1. This is handy when debugging, even more so when one is
> comparing two different userspace implementations because it makes it easier to
> diff the controls that were passed to the kernel.
> 
> [0] https://patchwork.linuxtv.org/project/linux-media/patch/20210610113615.785359-2-hverkuil-cisco@xs4all.nl/
> 
> [1] https://patchwork.linuxtv.org/project/linux-media/patch/20210610113615.785359-3-hverkuil-cisco@xs4all.nl/
> 
> [2] https://patchwork.linuxtv.org/project/linux-media/list/?series=5647
> 
> [3] https://gitlab.freedesktop.org/gstreamer/gst-plugins-bad/-/merge_requests/2305
> 
> Daniel Almeida (2):
>   media: Add AV1 uAPI
>   media: vivpu: add virtual VPU driver
> 
>  .../userspace-api/media/v4l/biblio.rst        |   10 +
>  .../media/v4l/ext-ctrls-codec-stateless.rst   | 1268 +++++++++++++++++
>  .../media/v4l/pixfmt-compressed.rst           |   21 +
>  .../media/v4l/vidioc-g-ext-ctrls.rst          |   36 +
>  .../media/v4l/vidioc-queryctrl.rst            |   54 +
>  .../media/videodev2.h.rst.exceptions          |    9 +
>  drivers/media/test-drivers/Kconfig            |    1 +
>  drivers/media/test-drivers/Makefile           |    1 +
>  drivers/media/test-drivers/vivpu/Kconfig      |   16 +
>  drivers/media/test-drivers/vivpu/Makefile     |    4 +
>  drivers/media/test-drivers/vivpu/vivpu-core.c |  418 ++++++
>  drivers/media/test-drivers/vivpu/vivpu-dec.c  |  491 +++++++
>  drivers/media/test-drivers/vivpu/vivpu-dec.h  |   61 +
>  .../media/test-drivers/vivpu/vivpu-video.c    |  599 ++++++++
>  .../media/test-drivers/vivpu/vivpu-video.h    |   46 +
>  drivers/media/test-drivers/vivpu/vivpu.h      |  119 ++
>  drivers/media/v4l2-core/v4l2-ctrls-core.c     |  286 +++-
>  drivers/media/v4l2-core/v4l2-ctrls-defs.c     |   79 +
>  drivers/media/v4l2-core/v4l2-ioctl.c          |    1 +
>  include/media/v4l2-ctrls.h                    |   12 +
>  include/uapi/linux/v4l2-controls.h            |  796 +++++++++++
>  include/uapi/linux/videodev2.h                |   15 +
>  22 files changed, 4342 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/media/test-drivers/vivpu/Kconfig
>  create mode 100644 drivers/media/test-drivers/vivpu/Makefile
>  create mode 100644 drivers/media/test-drivers/vivpu/vivpu-core.c
>  create mode 100644 drivers/media/test-drivers/vivpu/vivpu-dec.c
>  create mode 100644 drivers/media/test-drivers/vivpu/vivpu-dec.h
>  create mode 100644 drivers/media/test-drivers/vivpu/vivpu-video.c
>  create mode 100644 drivers/media/test-drivers/vivpu/vivpu-video.h
>  create mode 100644 drivers/media/test-drivers/vivpu/vivpu.h
>