[RFC,v6,0/7] Generic page pool & deferred freeing for system dmabuf heap

Message ID 20210205080621.3102035-1-john.stultz@linaro.org (mailing list archive)
Headers
Series Generic page pool & deferred freeing for system dmabuf heap |

Message

John Stultz Feb. 5, 2021, 8:06 a.m. UTC
  This series is starting to get long, so I figured I'd add a
short cover letter for context.

The point of this series is trying to add both deferred-freeing
logic as well as a page pool to the DMA-BUF system heap.

This is desired, as the combination of deferred freeing along
with the page pool allows us to offload page-zeroing out of
the allocation hot path. This was done originally with ION
and this patch series allows the DMA-BUF system heap to match
ION's system heap allocation performance in a simple
microbenchmark [1] (ION re-added to the kernel for comparision,
running on an x86 vm image):

./dmabuf-heap-bench -i 0 1 system                     
Testing dmabuf system vs ion heaptype 0 (flags: 0x1)
---------------------------------------------
dmabuf heap: alloc 4096 bytes 5000 times in 86572223 ns          17314 ns/call
ion heap:    alloc 4096 bytes 5000 times in 97442526 ns          19488 ns/call
dmabuf heap: alloc 1048576 bytes 5000 times in 196635057 ns      39327 ns/call
ion heap:    alloc 1048576 bytes 5000 times in 357323629 ns      71464 ns/call
dmabuf heap: alloc 8388608 bytes 5000 times in 3165445534 ns     633089 ns/call
ion heap:    alloc 8388608 bytes 5000 times in 3699591271 ns     739918 ns/call
dmabuf heap: alloc 33554432 bytes 5000 times in 13327402517 ns   2665480 ns/call
ion heap:    alloc 33554432 bytes 5000 times in 15292352796 ns   3058470 ns/call

Daniel didn't like earlier attempts to re-use the network
page-pool code to achieve this, and suggested the ttm_pool be
used instead. This required pulling the fairly tightly knit
ttm_pool logic apart, but after many failed attmempts I think
I found a workable abstraction to split out shared logic.

So this series contains a new generic drm_page_pool helper
library, converts the ttm_pool to use it, and then adds the
dmabuf deferred freeing and adds support to the dmabuf system
heap to use both deferred freeing and the new drm_page_pool.

Input would be greatly appreciated. Testing as well, as I don't
have any development hardware that utilizes the ttm pool.

thanks
-john

[1] https://android.googlesource.com/platform/system/memory/libdmabufheap/+/refs/heads/master/tests/dmabuf_heap_bench.c

Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Liam Mark <lmark@codeaurora.org>
Cc: Chris Goldsworthy <cgoldswo@codeaurora.org>
Cc: Laura Abbott <labbott@kernel.org>
Cc: Brian Starkey <Brian.Starkey@arm.com>
Cc: Hridya Valsaraju <hridya@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sandeep Patil <sspatil@google.com>
Cc: Daniel Mentz <danielmentz@google.com>
Cc: Ørjan Eide <orjan.eide@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Ezequiel Garcia <ezequiel@collabora.com>
Cc: Simon Ser <contact@emersion.fr>
Cc: James Jones <jajones@nvidia.com>
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org

John Stultz (7):
  drm: Add a sharable drm page-pool implementation
  drm: ttm_pool: Rename the ttm_pool_dma structure to ttm_pool_page_dat
  drm: ttm_pool: Rework ttm_pool_free_page to allow us to use it as a
    function pointer
  drm: ttm_pool: Rework ttm_pool to use drm_page_pool
  dma-buf: heaps: Add deferred-free-helper library code
  dma-buf: system_heap: Add drm pagepool support to system heap
  dma-buf: system_heap: Add deferred freeing to the system heap

 drivers/dma-buf/heaps/Kconfig                |   5 +
 drivers/dma-buf/heaps/Makefile               |   1 +
 drivers/dma-buf/heaps/deferred-free-helper.c | 145 ++++++++++
 drivers/dma-buf/heaps/deferred-free-helper.h |  55 ++++
 drivers/dma-buf/heaps/system_heap.c          |  77 ++++-
 drivers/gpu/drm/Kconfig                      |   5 +
 drivers/gpu/drm/Makefile                     |   1 +
 drivers/gpu/drm/page_pool.c                  | 220 +++++++++++++++
 drivers/gpu/drm/ttm/ttm_pool.c               | 278 ++++++-------------
 include/drm/page_pool.h                      |  54 ++++
 include/drm/ttm/ttm_pool.h                   |  23 +-
 11 files changed, 639 insertions(+), 225 deletions(-)
 create mode 100644 drivers/dma-buf/heaps/deferred-free-helper.c
 create mode 100644 drivers/dma-buf/heaps/deferred-free-helper.h
 create mode 100644 drivers/gpu/drm/page_pool.c
 create mode 100644 include/drm/page_pool.h
  

Comments

Christian König Feb. 5, 2021, 10:36 a.m. UTC | #1
Am 05.02.21 um 09:06 schrieb John Stultz:
> This series is starting to get long, so I figured I'd add a
> short cover letter for context.
>
> The point of this series is trying to add both deferred-freeing
> logic as well as a page pool to the DMA-BUF system heap.
>
> This is desired, as the combination of deferred freeing along
> with the page pool allows us to offload page-zeroing out of
> the allocation hot path. This was done originally with ION
> and this patch series allows the DMA-BUF system heap to match
> ION's system heap allocation performance in a simple
> microbenchmark [1] (ION re-added to the kernel for comparision,
> running on an x86 vm image):
>
> ./dmabuf-heap-bench -i 0 1 system
> Testing dmabuf system vs ion heaptype 0 (flags: 0x1)
> ---------------------------------------------
> dmabuf heap: alloc 4096 bytes 5000 times in 86572223 ns          17314 ns/call
> ion heap:    alloc 4096 bytes 5000 times in 97442526 ns          19488 ns/call
> dmabuf heap: alloc 1048576 bytes 5000 times in 196635057 ns      39327 ns/call
> ion heap:    alloc 1048576 bytes 5000 times in 357323629 ns      71464 ns/call
> dmabuf heap: alloc 8388608 bytes 5000 times in 3165445534 ns     633089 ns/call
> ion heap:    alloc 8388608 bytes 5000 times in 3699591271 ns     739918 ns/call
> dmabuf heap: alloc 33554432 bytes 5000 times in 13327402517 ns   2665480 ns/call
> ion heap:    alloc 33554432 bytes 5000 times in 15292352796 ns   3058470 ns/call
>
> Daniel didn't like earlier attempts to re-use the network
> page-pool code to achieve this, and suggested the ttm_pool be
> used instead. This required pulling the fairly tightly knit
> ttm_pool logic apart, but after many failed attmempts I think
> I found a workable abstraction to split out shared logic.
>
> So this series contains a new generic drm_page_pool helper
> library, converts the ttm_pool to use it, and then adds the
> dmabuf deferred freeing and adds support to the dmabuf system
> heap to use both deferred freeing and the new drm_page_pool.
>
> Input would be greatly appreciated. Testing as well, as I don't
> have any development hardware that utilizes the ttm pool.

We can easily do the testing and the general idea sounds solid to me.

I see three major things we need to clean up here.
1. The licensing, you are moving from BSD/MIT to GPL2.
2. Don't add any new overhead to the TTM pool, especially allocating a 
private object per page is a no-go.
3. What are you doing with the reclaim stuff and why?
4. Keeping the documentation would be nice to have.

Regards,
Christian.

>
> thanks
> -john
>
> [1] https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fandroid.googlesource.com%2Fplatform%2Fsystem%2Fmemory%2Flibdmabufheap%2F%2B%2Frefs%2Fheads%2Fmaster%2Ftests%2Fdmabuf_heap_bench.c&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7C2dc4d6cb3ee246558b9e08d8c9acef9a%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637481091933715561%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=oMgVsrdlwS%2BqZuuatjTiWDzMU9SiUW5eRar5xwT%2BHYQ%3D&amp;reserved=0
>
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Cc: Christian Koenig <christian.koenig@amd.com>
> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> Cc: Liam Mark <lmark@codeaurora.org>
> Cc: Chris Goldsworthy <cgoldswo@codeaurora.org>
> Cc: Laura Abbott <labbott@kernel.org>
> Cc: Brian Starkey <Brian.Starkey@arm.com>
> Cc: Hridya Valsaraju <hridya@google.com>
> Cc: Suren Baghdasaryan <surenb@google.com>
> Cc: Sandeep Patil <sspatil@google.com>
> Cc: Daniel Mentz <danielmentz@google.com>
> Cc: Ørjan Eide <orjan.eide@arm.com>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: Ezequiel Garcia <ezequiel@collabora.com>
> Cc: Simon Ser <contact@emersion.fr>
> Cc: James Jones <jajones@nvidia.com>
> Cc: linux-media@vger.kernel.org
> Cc: dri-devel@lists.freedesktop.org
>
> John Stultz (7):
>    drm: Add a sharable drm page-pool implementation
>    drm: ttm_pool: Rename the ttm_pool_dma structure to ttm_pool_page_dat
>    drm: ttm_pool: Rework ttm_pool_free_page to allow us to use it as a
>      function pointer
>    drm: ttm_pool: Rework ttm_pool to use drm_page_pool
>    dma-buf: heaps: Add deferred-free-helper library code
>    dma-buf: system_heap: Add drm pagepool support to system heap
>    dma-buf: system_heap: Add deferred freeing to the system heap
>
>   drivers/dma-buf/heaps/Kconfig                |   5 +
>   drivers/dma-buf/heaps/Makefile               |   1 +
>   drivers/dma-buf/heaps/deferred-free-helper.c | 145 ++++++++++
>   drivers/dma-buf/heaps/deferred-free-helper.h |  55 ++++
>   drivers/dma-buf/heaps/system_heap.c          |  77 ++++-
>   drivers/gpu/drm/Kconfig                      |   5 +
>   drivers/gpu/drm/Makefile                     |   1 +
>   drivers/gpu/drm/page_pool.c                  | 220 +++++++++++++++
>   drivers/gpu/drm/ttm/ttm_pool.c               | 278 ++++++-------------
>   include/drm/page_pool.h                      |  54 ++++
>   include/drm/ttm/ttm_pool.h                   |  23 +-
>   11 files changed, 639 insertions(+), 225 deletions(-)
>   create mode 100644 drivers/dma-buf/heaps/deferred-free-helper.c
>   create mode 100644 drivers/dma-buf/heaps/deferred-free-helper.h
>   create mode 100644 drivers/gpu/drm/page_pool.c
>   create mode 100644 include/drm/page_pool.h
>
  
John Stultz Feb. 5, 2021, 8:57 p.m. UTC | #2
On Fri, Feb 5, 2021 at 2:36 AM Christian König <christian.koenig@amd.com> wrote:
> Am 05.02.21 um 09:06 schrieb John Stultz:
> > Input would be greatly appreciated. Testing as well, as I don't
> > have any development hardware that utilizes the ttm pool.
>
> We can easily do the testing and the general idea sounds solid to me.
>

Thanks so much again for the feedback!

> I see three major things we need to clean up here.
> 1. The licensing, you are moving from BSD/MIT to GPL2.

Yea, this may be sticky, as it's not just code re-used from one dual
licensed file, but combination of GPL2 work, so advice here would be
appreciated.

> 2. Don't add any new overhead to the TTM pool, especially allocating a
> private object per page is a no-go.

This will need some more series rework to solve. I've got some ideas,
but we'll see if they work.

> 3. What are you doing with the reclaim stuff and why?

As I mentioned, it's a holdover from earlier code, and I'm happy to
drop it and defer to other accounting/stats discussions that are
ongoing.

> 4. Keeping the documentation would be nice to have.

True. I didn't spend much time with documentation, as I worried folks
may have disagreed with the whole approach. I'll work to improve it
for the next go around.

Thanks so much again for the review and feedback! I really appreciate
your time here.
-john