[v5,0/7] dma-buf: Performance improvements for system heap & a system-uncached implementation
Message ID: 20201110034934.70898-1-john.stultz@linaro.org (mailing list archive)
Series: dma-buf: Performance improvements for system heap & a system-uncached implementation
Message
John Stultz
Nov. 10, 2020, 3:49 a.m. UTC
Hey All,

So I just wanted to send the latest revision of my patch series of performance optimizations for the dma-buf system heap.

This series reworks the system heap to use sgtables, and then consolidates the pagelist method from the heap-helpers into the CMA heap. After that, the heap-helpers logic is removed (as it is unused). I'd still like to find a better way to avoid some of the logic duplication in implementing the full set of dma_buf_ops handlers per heap, but unfortunately that code is tied somewhat to how each buffer's memory is tracked. As more heaps show up, I think we'll have a better idea of how best to share code, so for now I think this is ok.

After this, the series introduces an optimization that Ørjan Eide implemented for ION, which avoids calling sync on attachments that don't have a mapping.

Next, an optimization to use larger-order pages for the system heap. This change brings us closer to the current performance of the ION allocation code (though there is still a gap, since ION uses a mix of deferred freeing and page pools; I'll be looking at integrating those eventually).

Finally, a reworked version of the uncached system heap implementation I submitted a few weeks back. Since it duplicated a lot of the now-reworked system heap code, I realized it would be much simpler to add the functionality to the system_heap implementation itself. While it does not improve core allocation performance, uncached heap allocations do result in *much* improved performance on HiKey960, as they avoid a lot of flushing and invalidating of buffers that the CPU doesn't touch often.

Feedback on these would be great!

thanks
-john

New in v5:
* Added a comment explaining why the order sizes are chosen as they are

Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Liam Mark <lmark@codeaurora.org>
Cc: Laura Abbott <labbott@kernel.org>
Cc: Brian Starkey <Brian.Starkey@arm.com>
Cc: Hridya Valsaraju <hridya@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sandeep Patil <sspatil@google.com>
Cc: Daniel Mentz <danielmentz@google.com>
Cc: Chris Goldsworthy <cgoldswo@codeaurora.org>
Cc: Ørjan Eide <orjan.eide@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Ezequiel Garcia <ezequiel@collabora.com>
Cc: Simon Ser <contact@emersion.fr>
Cc: James Jones <jajones@nvidia.com>
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org

John Stultz (7):
  dma-buf: system_heap: Rework system heap to use sgtables instead of pagelists
  dma-buf: heaps: Move heap-helper logic into the cma_heap implementation
  dma-buf: heaps: Remove heap-helpers code
  dma-buf: heaps: Skip sync if not mapped
  dma-buf: system_heap: Allocate higher order pages if available
  dma-buf: dma-heap: Keep track of the heap device struct
  dma-buf: system_heap: Add a system-uncached heap re-using the system heap

 drivers/dma-buf/dma-heap.c           |  33 +-
 drivers/dma-buf/heaps/Makefile       |   1 -
 drivers/dma-buf/heaps/cma_heap.c     | 324 +++++++++++++++---
 drivers/dma-buf/heaps/heap-helpers.c | 270 ---------------
 drivers/dma-buf/heaps/heap-helpers.h |  53 ---
 drivers/dma-buf/heaps/system_heap.c  | 494 ++++++++++++++++++++++++---
 include/linux/dma-heap.h             |   9 +
 7 files changed, 753 insertions(+), 431 deletions(-)
 delete mode 100644 drivers/dma-buf/heaps/heap-helpers.c
 delete mode 100644 drivers/dma-buf/heaps/heap-helpers.h
Comments
Hi John,

On Tue, 10 Nov 2020 at 09:19, John Stultz <john.stultz@linaro.org> wrote:
>
> Hey All,
> So just wanted to send my last revision of my patch series
> of performance optimizations to the dma-buf system heap.

Thanks very much for your patches - I think the first 5 patches look good to me.

I know there was a bit of discussion over adding a new system-uncached heap vs. using a flag to identify that; I think I prefer the separate heap idea, but let's ask one last time if anyone else has any real objections to it.

Daniel, Christian: any comments from your side on this?

I am planning to merge this series to drm-misc this week if I hear no objections.

Thanks much,

Best,
Sumit.
On Thu, Nov 12, 2020 at 11:09:04AM +0530, Sumit Semwal wrote:
> Thanks very much for your patches - I think the first 5 patches look good to me.
>
> I know there was a bit of discussion over adding a new system-uncached
> heap v/s using a flag to identify that; I think I prefer the separate
> heap idea, but lets ask one last time if any one else has any real
> objections to it.
>
> Daniel, Christian: any comments from your side on this?

I do wonder a bit where the userspace stack for this all is, since tuning allocators without a full stack is fairly pointless. dma-buf heaps is a bit in a limbo situation here, it feels like.

Plus I'm wary of anything related to leaking this kind of stuff beyond the dma-api, because dma-api maintainers don't like us doing that. But personally no concern on that front really - gpus need this. It's just that we do need solid justification, I think, if we land this. Hence back to the first point.

Ideally the first point comes in the form of benchmarking on Android together with a mesa driver (or mesa + some v4l driver, or whatever it takes to actually show the benefits, I have no idea).
-Daniel
On Thu, Nov 12, 2020 at 1:32 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> I do wonder a bit where the userspace stack for this all is, since tuning
> allocators without a full stack is fairly pointless. dma-buf heaps is a
> bit in a limbo situation here it feels like.

As mentioned in the system-uncached patch, pending open-source users of this code include:
* AOSP HiKey960 gralloc:
  - https://android-review.googlesource.com/c/device/linaro/hikey/+/1399519
  - Visibly improves performance over the system heap
* AOSP Codec2 (possibly, needs more review):
  - https://android-review.googlesource.com/c/platform/frameworks/av/+/1360640/17/media/codec2/vndk/C2DmaBufAllocator.cpp#325

Additionally, the HiKey and HiKey960 grallocs and Codec2 are already able to use the current dmabuf heaps instead of ION.

So I'm not sure what you mean by limbo, other than it being in a transition state where the interface is upstream and we're working on moving vendors to it from ION (which is staged to be dropped in 5.11). Part of that work is making sure we don't regress the performance expectations.

> Ideally first point comes in the form of benchmarking on android together
> with a mesa driver (or mesa + some v4l driver or whatever it takes to
> actually show the benefits, I have no idea).

Tying it to mesa is a little tough, as the grallocs for mesa devices usually use gbm (gralloc.gbm or gralloc.minigbm). Swapping in dmabuf heaps for the allocation path there gets a little complex: the last time I tried it (when getting HiKey working with Lima graphics, since gbm wouldn't allocate the contiguous buffers required by the display), I ran into issues with drm_hwcomposer and mesa expecting the gbm private handle metadata in the buffer when it was passed in.

But I might take a look at it again. I got a bit lost digging through the mesa gbm allocation paths last time.

I'll also try to see if I can find a benchmark for the codec2 code (using dmabuf heaps with and without the uncached heap) on db845c (w/ mesa), as that is already working, and I suspect it might be close to what you're looking for.

thanks
-john
On Thu, Nov 12, 2020 at 08:11:02PM -0800, John Stultz wrote:
> So I'm not sure what you mean by limbo, other than it being in a
> transition state where the interface is upstream and we're working on
> moving vendors to it from ION (which is staged to be dropped in 5.11).
> Part of that work is making sure we don't regress the performance
> expectations.

The mesa thing below - since if we test this with some downstream kernel drivers, or at least non-mesa userspace, I'm somewhat worried we're just creating a nice split world between the Android gfx world and the mesa/linux desktop gfx world.

But then that's kinda how Android rolls, so *shrug*

> I'll also try to see if I can find a benchmark for the codec2 code
> (using dmabuf heaps with and without the uncached heap) on db845c
> (w/ mesa), as that is already working and I suspect that might be
> close to what you're looking for.

tbh I think trying to push for this long term is the best we can hope for.

Media is also a lot more *meh*, since it's deeply fragmented and a lot less of it is upstream than on the gles/display side.

I think confirming that this at least doesn't horribly blow up on a gralloc/gbm+mesa stack would be useful.
-Daniel
On Fri, Nov 13, 2020 at 12:39 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> The mesa thing below, since if we test this with some downstream kernel
> drivers or at least non-mesa userspace I'm somewhat worried we're just
> creating a nice split world between the android gfx world and the
> mesa/linux desktop gfx world.
>
> I think confirming that this at least doesn't horrible blow up on a
> gralloc/gbm+mesa stack would be useful I think.

Sorry, I'm still a little foggy on precisely what you're suggesting here.

The patch stack I have has already been used with db845c (mesa + gbm_gralloc), with codec2 (sw decoders) using dmabuf heaps. So no blowing up there. And I'm working with Hridya to find a benchmark for codec2 so we can try to show the performance delta.

However, if you're wanting a dma-buf heaps gralloc implementation with mesa, that may be a little tougher to do, but I guess I can give it a go.

Hopefully this will address concerns about the system-uncached heap patch (the last two patches in this series)?

In the meantime, I hope we can queue the first five patches, as it would be nice to get the code rearranging in, since others are trying to stage their own heaps and I'd like to avoid dragging that churn out for too long (in addition to improving the allocation performance). Those changes have no ABI implications.

thanks
-john
On Wed, Nov 18, 2020 at 3:40 AM John Stultz <john.stultz@linaro.org> wrote: > On Fri, Nov 13, 2020 at 12:39 PM Daniel Vetter <daniel@ffwll.ch> wrote: > > On Thu, Nov 12, 2020 at 08:11:02PM -0800, John Stultz wrote: > > > On Thu, Nov 12, 2020 at 1:32 AM Daniel Vetter <daniel@ffwll.ch> wrote: > > > > On Thu, Nov 12, 2020 at 11:09:04AM +0530, Sumit Semwal wrote: > > > > > On Tue, 10 Nov 2020 at 09:19, John Stultz <john.stultz@linaro.org> wrote: > > > > > > > > > > > > Hey All, > > > > > > So just wanted to send my last revision of my patch series > > > > > > of performance optimizations to the dma-buf system heap. > > > > > > > > > > Thanks very much for your patches - I think the first 5 patches look good to me. > > > > > > > > > > I know there was a bit of discussion over adding a new system-uncached > > > > > heap v/s using a flag to identify that; I think I prefer the separate > > > > > heap idea, but lets ask one last time if any one else has any real > > > > > objections to it. > > > > > > > > > > Daniel, Christian: any comments from your side on this? > > > > > > > > I do wonder a bit where the userspace stack for this all is, since tuning > > > > allocators without a full stack is fairly pointless. dma-buf heaps is a > > > > bit in a limbo situation here it feels like. > > > > > > As mentioned in the system-uncached patch: > > > Pending opensource users of this code include: > > > * AOSP HiKey960 gralloc: > > > - https://android-review.googlesource.com/c/device/linaro/hikey/+/1399519 > > > - Visibly improves performance over the system heap > > > * AOSP Codec2 (possibly, needs more review): > > > - https://android-review.googlesource.com/c/platform/frameworks/av/+/1360640/17/media/codec2/vndk/C2DmaBufAllocator.cpp#325 > > > > > > Additionally both the HiKey, HiKey960 grallocs and Codec2 are already > > > able to use the current dmabuf heaps instead of ION. 
> > > > > > So I'm not sure what you mean by limbo, other than it being in a > > > transition state where the interface is upstream and we're working on > > > moving vendors to it from ION (which is staged to be dropped in 5.11). > > > Part of that work is making sure we don't regress the performance > > > expectations. > > > > The mesa thing below, since if we test this with some downstream kernel > > drivers or at least non-mesa userspace I'm somewhat worried we're just > > creating a nice split world between the android gfx world and the > > mesa/linux desktop gfx world. > > > > But then that's kinda how android rolls, so *shrug* > > > > > > Plus I'm vary of anything related to leaking this kind of stuff beyond the > > > > dma-api because dma api maintainers don't like us doing that. But > > > > personally no concern on that front really, gpus need this. It's just that > > > > we do need solid justification I think if we land this. Hence back to > > > > first point. > > > > > > > > Ideally first point comes in the form of benchmarking on android together > > > > with a mesa driver (or mesa + some v4l driver or whatever it takes to > > > > actually show the benefits, I have no idea). > > > > > > Tying it with mesa is a little tough as the grallocs for mesa devices > > > usually use gbm (gralloc.gbm or gralloc.minigbm). Swapping the > > > allocation path for dmabuf heaps there gets a little complex as last I > > > tried that (when trying to get HiKey working with Lima graphics, as > > > gbm wouldn't allocate the contiguous buffers required by the display), > > > I ran into issues with the drm_hwcomposer and mesa expecting the gbm > > > private handle metadata in the buffer when it was passed in. > > > > > > But I might take a look at it again. I got a bit lost digging through > > > the mesa gbm allocation paths last time. 
> > > > > > I'll also try to see if I can find a benchmark for the codec2 code > > > (using dmabuf heaps with and without the uncached heap) on on db845c > > > (w/ mesa), as that is already working and I suspect that might be > > > close to what you're looking for. > > > > tbh I think trying to push for this long term is the best we can hope for. > > > > Media is also a lot more *meh* since it's deeply fragmented and a lot less > > of it upstream than on the gles/display side. > > > > I think confirming that this at least doesn't horrible blow up on a > > gralloc/gbm+mesa stack would be useful I think. > > Sorry, I'm still a little foggy on precisely what you're suggesting here. > > The patch stack I have has already been used with db845c (mesa + > gbm_grallloc), with the codec2 (sw decoders) using dmabuf heaps. > So no blowing up there. And I'm working with Hridya to find a > benchmark for codec2 so we can try to show the performance delta. > > However, if you're wanting a dma-buf gralloc implementation with mesa, > that may be a little tougher to do, but I guess I can give it a go. > > Hopefully this will address concerns about the system-uncached heap > patch (the last two patches in this series)? > > In the meantime I hope we can queue the first five patches, as it > would be nice to get the code rearranging in as there are others > trying to stage their own heaps, and I'd like to avoid dragging that > churn out for too long (in addition to improving the allocation > performance). Those changes have no ABI implications. Maybe I'm also misunderstanding what dma-buf heaps is used for in Android, at least usually. I thought it's used to allocate all the winsys/shared buffers through gralloc (at least in the blobby stacks), to handle the allocation constraints problem. 
In the open stacks we don't seem to have a platform with both mesa and v4l (or some other codec) with "interesting" allocation constraints, so no one is using that gralloc+dma-buf heaps combo for what it was meant for. Hence why I'm a bit wary that we're creating something here which just misses the point a bit when we try to actually use it (in that glorious forever-future world where an android platform has enough drivers in upstream to do so). For other "this solves a system problem" work we tend to be quite a bit more picky about the demonstration use case, to make sure we're actually creating something that solves the problem in reality. But it also looks like Android's just not there yet, so *shrug* ... -Daniel
Hi Daniel, On Wed, 18 Nov 2020 at 13:16, Daniel Vetter <daniel@ffwll.ch> wrote: > > On Wed, Nov 18, 2020 at 3:40 AM John Stultz <john.stultz@linaro.org> wrote: > > On Fri, Nov 13, 2020 at 12:39 PM Daniel Vetter <daniel@ffwll.ch> wrote: > > > On Thu, Nov 12, 2020 at 08:11:02PM -0800, John Stultz wrote: > > > > On Thu, Nov 12, 2020 at 1:32 AM Daniel Vetter <daniel@ffwll.ch> wrote: > > > > > On Thu, Nov 12, 2020 at 11:09:04AM +0530, Sumit Semwal wrote: > > > > > > On Tue, 10 Nov 2020 at 09:19, John Stultz <john.stultz@linaro.org> wrote: > > > > > > > > > > > > > > Hey All, > > > > > > > So just wanted to send my last revision of my patch series > > > > > > > of performance optimizations to the dma-buf system heap. > > > > > > > > > > > > Thanks very much for your patches - I think the first 5 patches look good to me. > > > > > > > > > > > > I know there was a bit of discussion over adding a new system-uncached > > > > > > heap v/s using a flag to identify that; I think I prefer the separate > > > > > > heap idea, but lets ask one last time if any one else has any real > > > > > > objections to it. > > > > > > > > > > > > Daniel, Christian: any comments from your side on this? > > > > > > > > > > I do wonder a bit where the userspace stack for this all is, since tuning > > > > > allocators without a full stack is fairly pointless. dma-buf heaps is a > > > > > bit in a limbo situation here it feels like. 
> > > > > > > > As mentioned in the system-uncached patch: > > > > Pending opensource users of this code include: > > > > * AOSP HiKey960 gralloc: > > > > - https://android-review.googlesource.com/c/device/linaro/hikey/+/1399519 > > > > - Visibly improves performance over the system heap > > > > * AOSP Codec2 (possibly, needs more review): > > > > - https://android-review.googlesource.com/c/platform/frameworks/av/+/1360640/17/media/codec2/vndk/C2DmaBufAllocator.cpp#325 > > > > > > > > Additionally both the HiKey, HiKey960 grallocs and Codec2 are already > > > > able to use the current dmabuf heaps instead of ION. > > > > > > > > So I'm not sure what you mean by limbo, other than it being in a > > > > transition state where the interface is upstream and we're working on > > > > moving vendors to it from ION (which is staged to be dropped in 5.11). > > > > Part of that work is making sure we don't regress the performance > > > > expectations. > > > > > > The mesa thing below, since if we test this with some downstream kernel > > > drivers or at least non-mesa userspace I'm somewhat worried we're just > > > creating a nice split world between the android gfx world and the > > > mesa/linux desktop gfx world. > > > > > > But then that's kinda how android rolls, so *shrug* > > > > > > > > Plus I'm vary of anything related to leaking this kind of stuff beyond the > > > > > dma-api because dma api maintainers don't like us doing that. But > > > > > personally no concern on that front really, gpus need this. It's just that > > > > > we do need solid justification I think if we land this. Hence back to > > > > > first point. > > > > > > > > > > Ideally first point comes in the form of benchmarking on android together > > > > > with a mesa driver (or mesa + some v4l driver or whatever it takes to > > > > > actually show the benefits, I have no idea). 
> > > > > > > > Tying it with mesa is a little tough as the grallocs for mesa devices > > > > usually use gbm (gralloc.gbm or gralloc.minigbm). Swapping the > > > > allocation path for dmabuf heaps there gets a little complex as last I > > > > tried that (when trying to get HiKey working with Lima graphics, as > > > > gbm wouldn't allocate the contiguous buffers required by the display), > > > > I ran into issues with the drm_hwcomposer and mesa expecting the gbm > > > > private handle metadata in the buffer when it was passed in. > > > > > > > > But I might take a look at it again. I got a bit lost digging through > > > > the mesa gbm allocation paths last time. > > > > > > > > I'll also try to see if I can find a benchmark for the codec2 code > > > > (using dmabuf heaps with and without the uncached heap) on on db845c > > > > (w/ mesa), as that is already working and I suspect that might be > > > > close to what you're looking for. > > > > > > tbh I think trying to push for this long term is the best we can hope for. > > > > > > Media is also a lot more *meh* since it's deeply fragmented and a lot less > > > of it upstream than on the gles/display side. > > > > > > I think confirming that this at least doesn't horrible blow up on a > > > gralloc/gbm+mesa stack would be useful I think. > > > > Sorry, I'm still a little foggy on precisely what you're suggesting here. > > > > The patch stack I have has already been used with db845c (mesa + > > gbm_grallloc), with the codec2 (sw decoders) using dmabuf heaps. > > So no blowing up there. And I'm working with Hridya to find a > > benchmark for codec2 so we can try to show the performance delta. > > > > However, if you're wanting a dma-buf gralloc implementation with mesa, > > that may be a little tougher to do, but I guess I can give it a go. > > > > Hopefully this will address concerns about the system-uncached heap > > patch (the last two patches in this series)? 
> > > > In the meantime I hope we can queue the first five patches, as it > > would be nice to get the code rearranging in as there are others > > trying to stage their own heaps, and I'd like to avoid dragging that > > churn out for too long (in addition to improving the allocation > > performance). Those changes have no ABI implications. > > Maybe I'm also misunderstanding what dma-buf heaps is used for in > Android, at least usually. I thought it's used to allocate all the > winsys/shared buffers through gralloc (at least in the blobby stacks), > to handle the allocation constraints problem. In the open stacks we > don't seem to have a platform with both mesa and v4l (or some other > codec) with "interesting" allocations constraints, so no one using > that gralloc+dma-buf heaps combo for what it was meant for. Hence why > I'm a bit vary that we're creating something here which just misses > the point a bit when we try to actually use it (in that glorious > forever-future world where an android platform has enough drivers in > upstream to do so). > > For other "this solves a system problem" we tend to be quite a bit > more picky with the demonstration use case, to make sure we're > actually creating something that solves the problem in reality. > > But it also looks like Android's just not there yet, so *shrug* ... For me, looking at the first 5 patches (listed below for quick reference), they only do code reorganisation and minor updates to already existing heaps, with no ABI change, so I am not able to clearly see your objection here. To me, these seem to be required updates that the existing system heap users can benefit from.
dma-buf: system_heap: Rework system heap to use sgtables instead of pagelists
dma-buf: heaps: Move heap-helper logic into the cma_heap implementation
dma-buf: heaps: Remove heap-helpers code
dma-buf: heaps: Skip sync if not mapped
dma-buf: system_heap: Allocate higher order pages if available

If we talk about the last two patches - the ones that add the system-uncached heap - I somewhat agree that we should be able to show the performance gains of this approach (which has been in use on ION and in devices) using dma-buf gralloc or similar. We can discuss the system-uncached heap once a dma-buf gralloc (or similar) demonstration of the performance benefits is done, but I am inclined to push the 5 patches listed above through. Best, Sumit. > -Daniel > -- > Daniel Vetter > Software Engineer, Intel Corporation > http://blog.ffwll.ch
On Fri, Nov 20, 2020 at 7:32 AM Sumit Semwal <sumit.semwal@linaro.org> wrote: > > Hi Daniel, > > > On Wed, 18 Nov 2020 at 13:16, Daniel Vetter <daniel@ffwll.ch> wrote: > > > > On Wed, Nov 18, 2020 at 3:40 AM John Stultz <john.stultz@linaro.org> wrote: > > > On Fri, Nov 13, 2020 at 12:39 PM Daniel Vetter <daniel@ffwll.ch> wrote: > > > > On Thu, Nov 12, 2020 at 08:11:02PM -0800, John Stultz wrote: > > > > > On Thu, Nov 12, 2020 at 1:32 AM Daniel Vetter <daniel@ffwll.ch> wrote: > > > > > > On Thu, Nov 12, 2020 at 11:09:04AM +0530, Sumit Semwal wrote: > > > > > > > On Tue, 10 Nov 2020 at 09:19, John Stultz <john.stultz@linaro.org> wrote: > > > > > > > > > > > > > > > > Hey All, > > > > > > > > So just wanted to send my last revision of my patch series > > > > > > > > of performance optimizations to the dma-buf system heap. > > > > > > > > > > > > > > Thanks very much for your patches - I think the first 5 patches look good to me. > > > > > > > > > > > > > > I know there was a bit of discussion over adding a new system-uncached > > > > > > > heap v/s using a flag to identify that; I think I prefer the separate > > > > > > > heap idea, but lets ask one last time if any one else has any real > > > > > > > objections to it. > > > > > > > > > > > > > > Daniel, Christian: any comments from your side on this? > > > > > > > > > > > > I do wonder a bit where the userspace stack for this all is, since tuning > > > > > > allocators without a full stack is fairly pointless. dma-buf heaps is a > > > > > > bit in a limbo situation here it feels like. 
> > > > > > > > > > As mentioned in the system-uncached patch: > > > > > Pending opensource users of this code include: > > > > > * AOSP HiKey960 gralloc: > > > > > - https://android-review.googlesource.com/c/device/linaro/hikey/+/1399519 > > > > > - Visibly improves performance over the system heap > > > > > * AOSP Codec2 (possibly, needs more review): > > > > > - https://android-review.googlesource.com/c/platform/frameworks/av/+/1360640/17/media/codec2/vndk/C2DmaBufAllocator.cpp#325 > > > > > > > > > > Additionally both the HiKey, HiKey960 grallocs and Codec2 are already > > > > > able to use the current dmabuf heaps instead of ION. > > > > > > > > > > So I'm not sure what you mean by limbo, other than it being in a > > > > > transition state where the interface is upstream and we're working on > > > > > moving vendors to it from ION (which is staged to be dropped in 5.11). > > > > > Part of that work is making sure we don't regress the performance > > > > > expectations. > > > > > > > > The mesa thing below, since if we test this with some downstream kernel > > > > drivers or at least non-mesa userspace I'm somewhat worried we're just > > > > creating a nice split world between the android gfx world and the > > > > mesa/linux desktop gfx world. > > > > > > > > But then that's kinda how android rolls, so *shrug* > > > > > > > > > > Plus I'm vary of anything related to leaking this kind of stuff beyond the > > > > > > dma-api because dma api maintainers don't like us doing that. But > > > > > > personally no concern on that front really, gpus need this. It's just that > > > > > > we do need solid justification I think if we land this. Hence back to > > > > > > first point. > > > > > > > > > > > > Ideally first point comes in the form of benchmarking on android together > > > > > > with a mesa driver (or mesa + some v4l driver or whatever it takes to > > > > > > actually show the benefits, I have no idea). 
> > > > > > > > > > Tying it with mesa is a little tough as the grallocs for mesa devices > > > > > usually use gbm (gralloc.gbm or gralloc.minigbm). Swapping the > > > > > allocation path for dmabuf heaps there gets a little complex as last I > > > > > tried that (when trying to get HiKey working with Lima graphics, as > > > > > gbm wouldn't allocate the contiguous buffers required by the display), > > > > > I ran into issues with the drm_hwcomposer and mesa expecting the gbm > > > > > private handle metadata in the buffer when it was passed in. > > > > > > > > > > But I might take a look at it again. I got a bit lost digging through > > > > > the mesa gbm allocation paths last time. > > > > > > > > > > I'll also try to see if I can find a benchmark for the codec2 code > > > > > (using dmabuf heaps with and without the uncached heap) on on db845c > > > > > (w/ mesa), as that is already working and I suspect that might be > > > > > close to what you're looking for. > > > > > > > > tbh I think trying to push for this long term is the best we can hope for. > > > > > > > > Media is also a lot more *meh* since it's deeply fragmented and a lot less > > > > of it upstream than on the gles/display side. > > > > > > > > I think confirming that this at least doesn't horrible blow up on a > > > > gralloc/gbm+mesa stack would be useful I think. > > > > > > Sorry, I'm still a little foggy on precisely what you're suggesting here. > > > > > > The patch stack I have has already been used with db845c (mesa + > > > gbm_grallloc), with the codec2 (sw decoders) using dmabuf heaps. > > > So no blowing up there. And I'm working with Hridya to find a > > > benchmark for codec2 so we can try to show the performance delta. > > > > > > However, if you're wanting a dma-buf gralloc implementation with mesa, > > > that may be a little tougher to do, but I guess I can give it a go. 
> > > > > > Hopefully this will address concerns about the system-uncached heap > > > patch (the last two patches in this series)? > > > > > > In the meantime I hope we can queue the first five patches, as it > > > would be nice to get the code rearranging in as there are others > > > trying to stage their own heaps, and I'd like to avoid dragging that > > > churn out for too long (in addition to improving the allocation > > > performance). Those changes have no ABI implications. > > > > Maybe I'm also misunderstanding what dma-buf heaps is used for in > > Android, at least usually. I thought it's used to allocate all the > > winsys/shared buffers through gralloc (at least in the blobby stacks), > > to handle the allocation constraints problem. In the open stacks we > > don't seem to have a platform with both mesa and v4l (or some other > > codec) with "interesting" allocations constraints, so no one using > > that gralloc+dma-buf heaps combo for what it was meant for. Hence why > > I'm a bit vary that we're creating something here which just misses > > the point a bit when we try to actually use it (in that glorious > > forever-future world where an android platform has enough drivers in > > upstream to do so). > > > > For other "this solves a system problem" we tend to be quite a bit > > more picky with the demonstration use case, to make sure we're > > actually creating something that solves the problem in reality. > > > > But it also looks like Android's just not there yet, so *shrug* ... > > For me, looking at the first 5 patches (listed below, for quick > reference), they are only doing code reorganisation and minor updates > for already existing heaps, and no ABI change, I am not able to > clearly see your objection here. To me, these seem to be required > updates that the existing system heap users can benefit from. 
> > dma-buf: system_heap: Rework system heap to use sgtables instead of > pagelists > dma-buf: heaps: Move heap-helper logic into the cma_heap > implementation > dma-buf: heaps: Remove heap-helpers code > dma-buf: heaps: Skip sync if not mapped > dma-buf: system_heap: Allocate higher order pages if available > > If we talk about the last two patches - the ones that add system > uncached heap, I somewhat agree that we should be able to show the > performance gains with this approach (which has been in use on ION and > in devices) using dma-buf gralloc or similar. > > We can discuss the system-uncached heap when the dma-buf gralloc or > similar demonstration for performance benefits is done, but I am > inclined to push these 5 patches listed above through. Yeah makes total sense - I was arguing about the new stuff, not the refactoring. -Daniel > > Best, > Sumit. > > > -Daniel > > -- > > Daniel Vetter > > Software Engineer, Intel Corporation > > http://blog.ffwll.ch
Le vendredi 13 novembre 2020 à 21:39 +0100, Daniel Vetter a écrit : > On Thu, Nov 12, 2020 at 08:11:02PM -0800, John Stultz wrote: > > On Thu, Nov 12, 2020 at 1:32 AM Daniel Vetter <daniel@ffwll.ch> wrote: > > > On Thu, Nov 12, 2020 at 11:09:04AM +0530, Sumit Semwal wrote: > > > > On Tue, 10 Nov 2020 at 09:19, John Stultz <john.stultz@linaro.org> > > > > wrote: > > > > > > > > > > Hey All, > > > > > So just wanted to send my last revision of my patch series > > > > > of performance optimizations to the dma-buf system heap. > > > > > > > > Thanks very much for your patches - I think the first 5 patches look > > > > good to me. > > > > > > > > I know there was a bit of discussion over adding a new system-uncached > > > > heap v/s using a flag to identify that; I think I prefer the separate > > > > heap idea, but lets ask one last time if any one else has any real > > > > objections to it. > > > > > > > > Daniel, Christian: any comments from your side on this? > > > > > > I do wonder a bit where the userspace stack for this all is, since tuning > > > allocators without a full stack is fairly pointless. dma-buf heaps is a > > > bit in a limbo situation here it feels like. > > > > As mentioned in the system-uncached patch: > > Pending opensource users of this code include: > > * AOSP HiKey960 gralloc: > > - https://android-review.googlesource.com/c/device/linaro/hikey/+/1399519 > > - Visibly improves performance over the system heap > > * AOSP Codec2 (possibly, needs more review): > > - > > https://android-review.googlesource.com/c/platform/frameworks/av/+/1360640/17/media/codec2/vndk/C2DmaBufAllocator.cpp#325 > > > > Additionally both the HiKey, HiKey960 grallocs and Codec2 are already > > able to use the current dmabuf heaps instead of ION. > > > > So I'm not sure what you mean by limbo, other than it being in a > > transition state where the interface is upstream and we're working on > > moving vendors to it from ION (which is staged to be dropped in 5.11). 
> > Part of that work is making sure we don't regress the performance > > expectations. > > The mesa thing below, since if we test this with some downstream kernel > drivers or at least non-mesa userspace I'm somewhat worried we're just > creating a nice split world between the android gfx world and the > mesa/linux desktop gfx world. > > But then that's kinda how android rolls, so *shrug* > > > > Plus I'm vary of anything related to leaking this kind of stuff beyond the > > > dma-api because dma api maintainers don't like us doing that. But > > > personally no concern on that front really, gpus need this. It's just that > > > we do need solid justification I think if we land this. Hence back to > > > first point. > > > > > > Ideally first point comes in the form of benchmarking on android together > > > with a mesa driver (or mesa + some v4l driver or whatever it takes to > > > actually show the benefits, I have no idea). > > > > Tying it with mesa is a little tough as the grallocs for mesa devices > > usually use gbm (gralloc.gbm or gralloc.minigbm). Swapping the > > allocation path for dmabuf heaps there gets a little complex as last I > > tried that (when trying to get HiKey working with Lima graphics, as > > gbm wouldn't allocate the contiguous buffers required by the display), > > I ran into issues with the drm_hwcomposer and mesa expecting the gbm > > private handle metadata in the buffer when it was passed in. > > > > But I might take a look at it again. I got a bit lost digging through > > the mesa gbm allocation paths last time. > > > > I'll also try to see if I can find a benchmark for the codec2 code > > (using dmabuf heaps with and without the uncached heap) on on db845c > > (w/ mesa), as that is already working and I suspect that might be > > close to what you're looking for. > > tbh I think trying to push for this long term is the best we can hope for. 
> > Media is also a lot more *meh* since it's deeply fragmented and a lot less > of it upstream than on the gles/display side. Sorry to jump in, but I'd like to reset a bit. The Media APIs are a lot more generic; most of the kernel API is usable without specific knowledge of the HW. Pretty much all APIs are exercised through v4l2-ctl and v4l2-compliance on the V4L2 side (including performance testing). It would be pretty straightforward to demonstrate the use of DMABuf heaps (just do live resolution switching, you'll beat the internal V4L2 allocator without even looking at DMA cache optimization). > > I think confirming that this at least doesn't horrible blow up on a > gralloc/gbm+mesa stack would be useful I think. > -Daniel
On Fri, May 21, 2021 at 2:40 AM Lee Jones <lee.jones@linaro.org> wrote: > On Tue, 10 Nov 2020 at 03:49, John Stultz <john.stultz@linaro.org> wrote: >> This series reworks the system heap to use sgtables, and then >> consolidates the pagelist method from the heap-helpers into the >> CMA heap. After which the heap-helpers logic is removed (as it >> is unused). I'd still like to find a better way to avoid some of >> the logic duplication in implementing the entire dma_buf_ops >> handlers per heap. But unfortunately that code is tied somewhat >> to how the buffer's memory is tracked. As more heaps show up I >> think we'll have a better idea how to best share code, so for >> now I think this is ok. >> >> After this, the series introduces an optimization that >> Ørjan Eide implemented for ION that avoids calling sync on >> attachments that don't have a mapping. >> >> Next, an optimization to use larger order pages for the system >> heap. This change brings us closer to the current performance >> of the ION allocation code (though there still is a gap due >> to ION using a mix of deferred-freeing and page pools, I'll be >> looking at integrating those eventually). >> >> Finally, a reworked version of my uncached system heap >> implementation I was submitting a few weeks back. Since it >> duplicated a lot of the now reworked system heap code, I >> realized it would be much simpler to add the functionality to >> the system_heap implementation itself. >> >> While not improving the core allocation performance, the >> uncached heap allocations do result in *much* improved >> performance on HiKey960 as it avoids a lot of flushing and >> invalidating buffers that the cpu doesn't touch often. >> > > > John, did this ever make it past v5? I don't see a follow-up. So most of these have landed upstream already. The one exception is the system-uncached heap implementation, as DanielV wanted a usecase where it was beneficial to a device with an open driver. 
Unfortunately this hasn't been trivial to show with the open gpu devices I have, but following Nicolas Dufresne's note, we're looking to enable v4l2 integration in AOSP on db845c, so we can hopefully show some benefit there. The HAL integration work has been taking some time to get working, though, so it's a bit blocked on that for now. thanks -john