From patchwork Fri Feb 11 16:18:24 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: "T.J. Mercier" X-Patchwork-Id: 80697 Received: from vger.kernel.org ([23.128.96.18]) by www.linuxtv.org with esmtp (Exim 4.92) (envelope-from ) id 1nIYdI-00EWwc-0X; Fri, 11 Feb 2022 16:18:52 +0000 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1351251AbiBKQSv (ORCPT + 1 other); Fri, 11 Feb 2022 11:18:51 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:46640 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233594AbiBKQSu (ORCPT ); Fri, 11 Feb 2022 11:18:50 -0500 Received: from mail-yb1-xb49.google.com (mail-yb1-xb49.google.com [IPv6:2607:f8b0:4864:20::b49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6E7902E5 for ; Fri, 11 Feb 2022 08:18:48 -0800 (PST) Received: by mail-yb1-xb49.google.com with SMTP id b187-20020a251bc4000000b0061e15c5024fso19551991ybb.4 for ; Fri, 11 Feb 2022 08:18:48 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc:content-transfer-encoding; bh=kKcjEXjOeEH0mGAkUcoWrOf4L77tAlhRMN7oO+B50iM=; b=T9LUWLemzPZzWYno8+slJF7qkK9H6xZtLOTMdahBN5x+c0VFhJMUx8wFWEMkCbnPAF c49tzOGzV24vKh6ZcteeJfm3K/h9dxxJPqP4MdiFOtfB6ha8vES8N3btoQrzDj99NC6N NThuVzfB1W1O0Kh++ZNdvtrId5jUcr1F28amVcqkRC74pjDyEHpBeQS8e/cQfBjFt7hj X67UvBvdRjeJR8maJKVWPqHkmcP8r4DnFCaYNX+KPCQut2f8zjoSvF0Y11POa8xywdg4 pNi7vTlLa3ElD0YVicbwppcwiTNNz6d3MAEHWfh4tsR1cEwGnXSja5hctDRznDh1fBfM Jfvg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc:content-transfer-encoding; bh=kKcjEXjOeEH0mGAkUcoWrOf4L77tAlhRMN7oO+B50iM=; b=1f6KE3f90fKFTu8UWOuaOOP/L7TMmV99EjSbHUTg5ONJr757TOh2SnOIVIhZYaqsLK gaWZGAOXdTOEhN6bms3gwxL9GblJOu7CKfsqxwSm+6YiKLHrmv0pmm24aR/nVi5SLyk8 A6kqZFB6/Pi8QnOWUX180p6ZORZcpicFUK7IyiKQSR4aNDoGjjZTuGrPLIElmjqdSZj2 2Ri63UqdX9Quv+zDUl6ywEmac+K4Prkqt8QNLFn/0RIQmkSrSR8xaFu+0bxEqoBB6LRm If+kAPvxRjaUhb0fvqXVFPBP2GN6W9ZCMIN5ubwl9DY7jFUy1L8DigowrWHm4WgWfxbj RPPA== X-Gm-Message-State: AOAM531H05lnEOXztg2IcvkSY/4ZpUHvdRAaDDVs8yYjbBzdHiZ9fSib cHJmlr7gpRcFmR/nDseL/u3rAxORob/kkpg= X-Google-Smtp-Source: ABdhPJycT0YOG160oFos0c32qQdd1BG3KkZireh624cMDeZCi951f8jFV6xjhpQwy90JhdPUdi2Jed9M+wpOVmI= X-Received: from tj2.c.googlers.com ([fda3:e722:ac3:cc00:20:ed76:c0a8:187]) (user=tjmercier job=sendgmr) by 2002:a25:8a8a:: with SMTP id h10mr1988762ybl.49.1644596327676; Fri, 11 Feb 2022 08:18:47 -0800 (PST) Date: Fri, 11 Feb 2022 16:18:24 +0000 In-Reply-To: <20220211161831.3493782-1-tjmercier@google.com> Message-Id: <20220211161831.3493782-2-tjmercier@google.com> Mime-Version: 1.0 References: <20220211161831.3493782-1-tjmercier@google.com> X-Mailer: git-send-email 2.35.1.265.g69c8d7142f-goog Subject: [RFC v2 1/6] gpu: rfc: Proposal for a GPU cgroup controller From: "T.J. Mercier" To: Maarten Lankhorst , Maxime Ripard , Thomas Zimmermann , David Airlie , Daniel Vetter , Jonathan Corbet , Greg Kroah-Hartman , " =?utf-8?q?Arve_Hj=C3=B8n?= =?utf-8?q?nev=C3=A5g?= " , Todd Kjos , Martijn Coenen , Joel Fernandes , Christian Brauner , Hridya Valsaraju , Suren Baghdasaryan , Sumit Semwal , " =?utf-8?q?Christian_K=C3=B6nig?= " , Benjamin Gaignard , Liam Mark , Laura Abbott , Brian Starkey , John Stultz , Tejun Heo , Zefan Li , Johannes Weiner Cc: kaleshsingh@google.com, Kenny.Ho@amd.com, "T.J. Mercier" , dri-devel@lists.freedesktop.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org, cgroups@vger.kernel.org X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-media@vger.kernel.org X-LSpam-Score: -12.3 (------------) X-LSpam-Report: No, score=-12.3 required=5.0 tests=BAYES_00=-1.9,DKIMWL_WL_MED=0.001,DKIM_SIGNED=0.1,DKIM_VALID=-0.1,DKIM_VALID_AU=-0.1,HEADER_FROM_DIFFERENT_DOMAINS=0.5,MAILING_LIST_MULTI=-1,RCVD_IN_DNSWL_MED=-2.3,USER_IN_DEF_DKIM_WL=-7.5 autolearn=ham autolearn_force=no This patch adds a proposal for a new GPU cgroup controller for accounting/limiting GPU and GPU-related memory allocations. The proposed controller is based on the DRM cgroup controller[1] and follows the design of the RDMA cgroup controller. The new cgroup controller would: * Allow setting per-cgroup limits on the total size of buffers charged to it. * Allow setting per-device limits on the total size of buffers allocated by device within a cgroup. * Expose a per-device/allocator breakdown of the buffers charged to a cgroup. The prototype in the following patches is only for memory accounting using the GPU cgroup controller and does not implement limit setting. [1]: https://lore.kernel.org/amd-gfx/20210126214626.16260-1-brian.welty@intel.com/ From: Hridya Valsaraju Signed-off-by: Hridya Valsaraju Co-developed-by: T.J. Mercier Signed-off-by: T.J. Mercier --- Documentation/gpu/rfc/gpu-cgroup.rst | 195 +++++++++++++++++++++++++++ Documentation/gpu/rfc/index.rst | 4 + 2 files changed, 199 insertions(+) create mode 100644 Documentation/gpu/rfc/gpu-cgroup.rst diff --git a/Documentation/gpu/rfc/gpu-cgroup.rst b/Documentation/gpu/rfc/gpu-cgroup.rst new file mode 100644 index 000000000000..0bb761223b97 --- /dev/null +++ b/Documentation/gpu/rfc/gpu-cgroup.rst @@ -0,0 +1,195 @@ +=================================== +GPU cgroup controller +=================================== + +Goals +===== +This document intends to outline a plan to create a cgroup v2 controller subsystem +for the per-cgroup accounting of device and system memory allocated by the GPU +and related subsystems. + +The new cgroup controller would: + +* Allow setting per-cgroup limits on the total size of buffers charged to it. + +* Allow setting per-device limits on the total size of buffers allocated by a + device/allocator within a cgroup. + +* Expose a per-device/allocator breakdown of the buffers charged to a cgroup. + +Alternatives Considered +======================= + +The following alternatives were considered: + +The memory cgroup controller +____________________________ + +1. As was noted in [1], memory accounting provided by the GPU cgroup +controller is not a good fit for integration into memcg due to the +differences in how accounting is performed. It implements a mechanism +for the allocator attribution of GPU and GPU-related memory by +charging each buffer to the cgroup of the process on behalf of which +the memory was allocated. The buffer stays charged to the cgroup until +it is freed regardless of whether the process retains any references +to it. On the other hand, the memory cgroup controller offers a more +fine-grained charging and uncharging behavior depending on the kind of +page being accounted. + +2. Memcg performs accounting in units of pages. In the DMA-BUF buffer sharing model, +a process takes a reference to the entire buffer(hence keeping it alive) even if +it is only accessing parts of it. Therefore, per-page memory tracking for DMA-BUF +memory accounting would only introduce additional overhead without any benefits. + +[1]: https://patchwork.kernel.org/project/dri-devel/cover/20190501140438.9506-1-brian.welty@intel.com/#22624705 + +Userspace service to keep track of buffer allocations and releases +__________________________________________________________________ + +1. There is no way for a userspace service to intercept all allocations and releases. +2. In case the process gets killed or restarted, we lose all accounting so far. + +UAPI +==== +When enabled, the new cgroup controller would create the following files in every cgroup. + +:: + + gpu.memory.current (R) + gpu.memory.max (R/W) + +gpu.memory.current is a read-only file and would contain per-device memory allocations +in a key-value format where key is a string representing the device name +and the value is the size of memory charged to the device in the cgroup in bytes. + +For example: + +:: + + cat /sys/kernel/fs/cgroup1/gpu.memory.current + dev1 4194304 + dev2 4194304 + +The string key for each device is set by the device driver when the device registers +with the GPU cgroup controller to participate in resource accounting(see section +'Design and Implementation' for more details). + +gpu.memory.max is a read/write file. It would show the current total +size limits on memory usage for the cgroup and the limits on total memory usage +for each allocator/device. + +Setting a total limit for a cgroup can be done as follows: + +:: + + echo “total 41943040” > /sys/kernel/fs/cgroup1/gpu.memory.max + +Setting a total limit for a particular device/allocator can be done as follows: + +:: + + echo “dev1 4194304” > /sys/kernel/fs/cgroup1/gpu.memory.max + +In this example, 'dev1' is the string key set by the device driver during +registration. + +Design and Implementation +========================= + +The cgroup controller would closely follow the design of the RDMA cgroup controller +subsystem where each cgroup maintains a list of resource pools. +Each resource pool contains a struct device and the counter to track current total, +and the maximum limit set for the device. + +The below code block is a preliminary estimation on how the core kernel data structures +and APIs would look like. + +.. code-block:: c + + /** + * The GPU cgroup controller data structure. + */ + struct gpucg { + struct cgroup_subsys_state css; + + /* list of all resource pools that belong to this cgroup */ + struct list_head rpools; + }; + + struct gpucg_device { + /* + * list of various resource pools in various cgroups that the device is + * part of. + */ + struct list_head rpools; + + /* list of all devices registered for GPU cgroup accounting */ + struct list_head dev_node; + + /* name to be used as identifier for accounting and limit setting */ + const char *name; + }; + + struct gpucg_resource_pool { + /* The device whose resource usage is tracked by this resource pool */ + struct gpucg_device *device; + + /* list of all resource pools for the cgroup */ + struct list_head cg_node; + + /* + * list maintained by the gpucg_device to keep track of its + * resource pools + */ + struct list_head dev_node; + + /* tracks memory usage of the resource pool */ + struct page_counter total; + }; + + /** + * gpucg_register_device - Registers a device for memory accounting using the + * GPU cgroup controller. + * + * @device: The device to register for memory accounting. Must remain valid + * after registration. + * @name: Pointer to a string literal to denote the name of the device. + */ + void gpucg_register_device(struct gpucg_device *gpucg_dev, const char *name); + + /** + * gpucg_try_charge - charge memory to the specified gpucg and gpucg_device. + * + * @gpucg: The gpu cgroup to charge the memory to. + * @device: The device to charge the memory to. + * @usage: size of memory to charge in bytes. + * + * Return: returns 0 if the charging is successful and otherwise returns an + * error code. + */ + int gpucg_try_charge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage); + + /** + * gpucg_uncharge - uncharge memory from the specified gpucg and gpucg_device. + * + * @gpucg: The gpu cgroup to uncharge the memory from. + * @device: The device to charge the memory from. + * @usage: size of memory to uncharge in bytes. + */ + void gpucg_uncharge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage); + +Future Work +=========== +Additional GPU resources can be supported by adding new controller files. + +Upstreaming Plan +================ +* Decide on a UAPI that accommodates all use-cases for the upstream GPU ecosystem + as well as for Android. + +* Prototype the GPU cgroup controller and integrate its usage into the DMA-BUF + system heap. + +* Demonstrate its usage from userspace in the Android Open Space Project. + +* Send out RFCs to LKML for the GPU cgroup controller and iterate. diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst index 91e93a705230..0a9bcd94e95d 100644 --- a/Documentation/gpu/rfc/index.rst +++ b/Documentation/gpu/rfc/index.rst @@ -23,3 +23,7 @@ host such documentation: .. toctree:: i915_scheduler.rst + +.. toctree:: + + gpu-cgroup.rst From patchwork Fri Feb 11 16:18:25 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "T.J. Mercier" X-Patchwork-Id: 80698 Received: from vger.kernel.org ([23.128.96.18]) by www.linuxtv.org with esmtp (Exim 4.92) (envelope-from ) id 1nIYdQ-00EWxF-F0; Fri, 11 Feb 2022 16:19:01 +0000 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1351278AbiBKQS7 (ORCPT + 1 other); Fri, 11 Feb 2022 11:18:59 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:46834 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1351394AbiBKQS6 (ORCPT ); Fri, 11 Feb 2022 11:18:58 -0500 Received: from mail-yb1-xb49.google.com (mail-yb1-xb49.google.com [IPv6:2607:f8b0:4864:20::b49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EB85B2F4 for ; Fri, 11 Feb 2022 08:18:54 -0800 (PST) Received: by mail-yb1-xb49.google.com with SMTP id y4-20020a5b0f44000000b00611862e546dso19809308ybr.7 for ; Fri, 11 Feb 2022 08:18:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=Bc1BT8FVz1a5rx9060s9LPPape8rDeYKznWc0mQdTXk=; b=H8A0zu+FuKCo6H4HI7eac6LOZXdEM2EFBzjk/aN6ovipOGCEqFD851Qopn66zFcu9J 8cYm9+vvAfuvZPEztbDMdMrzA6tYRvP6k6HOL3PqTuFaGKOvl81kjDZvVxH/xRYvA9IM n5uRZA4n8QkC9ulGCv2zTSoTuo+dSOj96sJrIANOQFvGaK/A9j7i0N5VKOG9HiMJ9y27 367tMe4FRXDxffQtlVzUCsKW28mb0nn40ye0cAHegwhyzDyyS9N4p1Yn48vKvriERKKC SJGLny8KfhQTI+ebJ46M65eeDIuCLXAqz+JMikwq7dZiENFSbp8GTpvv1silt91IC16T Hz7g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=Bc1BT8FVz1a5rx9060s9LPPape8rDeYKznWc0mQdTXk=; b=tzq2Ruj/YK/wnIBYC2bNd7Gk3JN6EqYIn7kAhk3wbwk2+a1YbEaczMpeaW1Ocldxr/ 775PsufnnG/m4xz3HS4BwoXqwN2iIXTQJYQ9j7ZZ+LZaaZWuSO58n2tgK9L6VibrXReG QUkIagLg2fyzY4b8TICC0Bt6F4FFOV0M5u3ZVNZOD27tcnfa1qO+UJAAyZLxA0j6dkrG uT1FSbzV4QuNHog7ig7xTtlClbR6O/xIpCBMMN6EBMRGctQCOChzZbcF1vK733wxfW0t fp1OCiE2RWXr34OeaRUdrKs1aI0p0dFrm46vVuDyiEIR9LFLpU74cdPvBjCiwbZGE4KD xVhA== X-Gm-Message-State: AOAM533laQbKQ48jnU+RwAFyA/CMqGRwWeXUTahJwtDfC8OiJKAK34eK Nbgk36kwlOWyVFTrWlG8XIWJq1Htt0OH/f4= X-Google-Smtp-Source: ABdhPJz8FVOrKOqKlsZ8ZK1aeLaet/rzq36w6hlMxoTN9ZtZ5W4OHsMR855nOGYigUpqCN7vkYc6o9TtTcKm5nU= X-Received: from tj2.c.googlers.com ([fda3:e722:ac3:cc00:20:ed76:c0a8:187]) (user=tjmercier job=sendgmr) by 2002:a81:7e43:: with SMTP id p3mr2491823ywn.135.1644596334140; Fri, 11 Feb 2022 08:18:54 -0800 (PST) Date: Fri, 11 Feb 2022 16:18:25 +0000 In-Reply-To: <20220211161831.3493782-1-tjmercier@google.com> Message-Id: <20220211161831.3493782-3-tjmercier@google.com> Mime-Version: 1.0 References: <20220211161831.3493782-1-tjmercier@google.com> X-Mailer: git-send-email 2.35.1.265.g69c8d7142f-goog Subject: [RFC v2 2/6] cgroup: gpu: Add a cgroup controller for allocator attribution of GPU memory From: "T.J. Mercier" To: Maarten Lankhorst , Maxime Ripard , Thomas Zimmermann , David Airlie , Daniel Vetter , Jonathan Corbet , Greg Kroah-Hartman , " =?utf-8?q?Arve_Hj=C3=B8n?= =?utf-8?q?nev=C3=A5g?= " , Todd Kjos , Martijn Coenen , Joel Fernandes , Christian Brauner , Hridya Valsaraju , Suren Baghdasaryan , Sumit Semwal , " =?utf-8?q?Christian_K=C3=B6nig?= " , Benjamin Gaignard , Liam Mark , Laura Abbott , Brian Starkey , John Stultz , Tejun Heo , Zefan Li , Johannes Weiner Cc: kaleshsingh@google.com, Kenny.Ho@amd.com, "T.J. Mercier" , dri-devel@lists.freedesktop.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org, cgroups@vger.kernel.org X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-media@vger.kernel.org X-LSpam-Score: -12.3 (------------) X-LSpam-Report: No, score=-12.3 required=5.0 tests=BAYES_00=-1.9,DKIMWL_WL_MED=0.001,DKIM_SIGNED=0.1,DKIM_VALID=-0.1,DKIM_VALID_AU=-0.1,HEADER_FROM_DIFFERENT_DOMAINS=0.5,MAILING_LIST_MULTI=-1,RCVD_IN_DNSWL_MED=-2.3,USER_IN_DEF_DKIM_WL=-7.5 autolearn=ham autolearn_force=no The cgroup controller provides accounting for GPU and GPU-related memory allocations. The memory being accounted can be device memory or memory allocated from pools dedicated to serve GPU-related tasks. This patch adds APIs to: -allow a device to register for memory accounting using the GPU cgroup controller. -charge and uncharge allocated memory to a cgroup. When the cgroup controller is enabled, it would expose information about the memory allocated by each device(registered for GPU cgroup memory accounting) for each cgroup. The API/UAPI can be extended to set per-device/total allocation limits in the future. The cgroup controller has been named following the discussion in [1]. [1]: https://lore.kernel.org/amd-gfx/YCJp%2F%2FkMC7YjVMXv@phenom.ffwll.local/ From: Hridya Valsaraju Signed-off-by: Hridya Valsaraju Co-developed-by: T.J. Mercier Signed-off-by: T.J. Mercier --- changes in v2 - Fix incorrect Kconfig help section indentation per Randy Dunlap. include/linux/cgroup_gpu.h | 127 ++++++++++++++ include/linux/cgroup_subsys.h | 4 + init/Kconfig | 7 + kernel/cgroup/Makefile | 1 + kernel/cgroup/gpu.c | 304 ++++++++++++++++++++++++++++++++++ 5 files changed, 443 insertions(+) create mode 100644 include/linux/cgroup_gpu.h create mode 100644 kernel/cgroup/gpu.c diff --git a/include/linux/cgroup_gpu.h b/include/linux/cgroup_gpu.h new file mode 100644 index 000000000000..c5bc2b882783 --- /dev/null +++ b/include/linux/cgroup_gpu.h @@ -0,0 +1,127 @@ +/* SPDX-License-Identifier: MIT + * Copyright 2019 Advanced Micro Devices, Inc. + * Copyright (C) 2022 Google LLC. + */ +#ifndef _CGROUP_GPU_H +#define _CGROUP_GPU_H + +#include +#include + +#ifdef CONFIG_CGROUP_GPU + /* The GPU cgroup controller data structure */ +struct gpucg { + struct cgroup_subsys_state css; + + /* list of all resource pools that belong to this cgroup */ + struct list_head rpools; +}; + +struct gpucg_device { + /* + * list of various resource pools in various cgroups that the device is + * part of. + */ + struct list_head rpools; + + /* list of all devices registered for GPU cgroup accounting */ + struct list_head dev_node; + + /* + * pointer to string literal to be used as identifier for accounting and + * limit setting + */ + const char *name; +}; + +/** + * css_to_gpucg - get the corresponding gpucg ref from a cgroup_subsys_state + * @css: the target cgroup_subsys_state + * + * Returns: gpu cgroup that contains the @css + */ +static inline struct gpucg *css_to_gpucg(struct cgroup_subsys_state *css) +{ + return css ? container_of(css, struct gpucg, css) : NULL; +} + +/** + * gpucg_get - get the gpucg reference that a task belongs to + * @task: the target task + * + * This increases the reference count of the css that the @task belongs to. + * + * Returns: reference to the gpu cgroup the task belongs to. + */ +static inline struct gpucg *gpucg_get(struct task_struct *task) +{ + if (!cgroup_subsys_enabled(gpu_cgrp_subsys)) + return NULL; + return css_to_gpucg(task_get_css(task, gpu_cgrp_id)); +} + +/** + * gpucg_put - put a gpucg reference + * @gpucg: the target gpucg + * + * Put a reference obtained via gpucg_get + */ +static inline void gpucg_put(struct gpucg *gpucg) +{ + if (gpucg) + css_put(&gpucg->css); +} + +/** + * gpucg_parent - find the parent of a gpu cgroup + * @cg: the target gpucg + * + * This does not increase the reference count of the parent cgroup + * + * Returns: parent gpu cgroup of @cg + */ +static inline struct gpucg *gpucg_parent(struct gpucg *cg) +{ + return css_to_gpucg(cg->css.parent); +} + +int gpucg_try_charge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage); +void gpucg_uncharge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage); +void gpucg_register_device(struct gpucg_device *gpucg_dev, const char *name); +#else /* CONFIG_CGROUP_GPU */ + +struct gpucg; +struct gpucg_device; + +static inline struct gpucg *css_to_gpucg(struct cgroup_subsys_state *css) +{ + return NULL; +} + +static inline struct gpucg *gpucg_get(struct task_struct *task) +{ + return NULL; +} + +static inline void gpucg_put(struct gpucg *gpucg) {} + +static inline struct gpucg *gpucg_parent(struct gpucg *cg) +{ + return NULL; +} + +static inline int gpucg_try_charge(struct gpucg *gpucg, + struct gpucg_device *device, + u64 usage) +{ + return 0; +} + +static inline void gpucg_uncharge(struct gpucg *gpucg, + struct gpucg_device *device, + u64 usage) {} + +static inline void gpucg_register_device(struct gpucg_device *gpucg_dev, + const char *name) {} +#endif /* CONFIG_CGROUP_GPU */ +#endif /* _CGROUP_GPU_H */ diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h index 445235487230..46a2a7b93c41 100644 --- a/include/linux/cgroup_subsys.h +++ b/include/linux/cgroup_subsys.h @@ -65,6 +65,10 @@ SUBSYS(rdma) SUBSYS(misc) #endif +#if IS_ENABLED(CONFIG_CGROUP_GPU) +SUBSYS(gpu) +#endif + /* * The following subsystems are not supported on the default hierarchy. */ diff --git a/init/Kconfig b/init/Kconfig index e9119bf54b1f..43568472930a 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -980,6 +980,13 @@ config BLK_CGROUP See Documentation/admin-guide/cgroup-v1/blkio-controller.rst for more information. +config CGROUP_GPU + bool "gpu cgroup controller (EXPERIMENTAL)" + select PAGE_COUNTER + help + Provides accounting and limit setting for memory allocations by the GPU and + GPU-related subsystems. + config CGROUP_WRITEBACK bool depends on MEMCG && BLK_CGROUP diff --git a/kernel/cgroup/Makefile b/kernel/cgroup/Makefile index 12f8457ad1f9..be95a5a532fc 100644 --- a/kernel/cgroup/Makefile +++ b/kernel/cgroup/Makefile @@ -7,3 +7,4 @@ obj-$(CONFIG_CGROUP_RDMA) += rdma.o obj-$(CONFIG_CPUSETS) += cpuset.o obj-$(CONFIG_CGROUP_MISC) += misc.o obj-$(CONFIG_CGROUP_DEBUG) += debug.o +obj-$(CONFIG_CGROUP_GPU) += gpu.o diff --git a/kernel/cgroup/gpu.c b/kernel/cgroup/gpu.c new file mode 100644 index 000000000000..3e9bfb45c6af --- /dev/null +++ b/kernel/cgroup/gpu.c @@ -0,0 +1,304 @@ +// SPDX-License-Identifier: MIT +// Copyright 2019 Advanced Micro Devices, Inc. +// Copyright (C) 2022 Google LLC. + +#include +#include +#include +#include +#include +#include + +static struct gpucg *root_gpucg __read_mostly; + +/* + * Protects list of resource pools maintained on per cgroup basis + * and list of devices registered for memory accounting using the GPU cgroup + * controller. + */ +static DEFINE_MUTEX(gpucg_mutex); +static LIST_HEAD(gpucg_devices); + +struct gpucg_resource_pool { + /* The device whose resource usage is tracked by this resource pool */ + struct gpucg_device *device; + + /* list of all resource pools for the cgroup */ + struct list_head cg_node; + + /* + * list maintained by the gpucg_device to keep track of its + * resource pools + */ + struct list_head dev_node; + + /* tracks memory usage of the resource pool */ + struct page_counter total; +}; + +static void free_cg_rpool_locked(struct gpucg_resource_pool *rpool) +{ + lockdep_assert_held(&gpucg_mutex); + + list_del(&rpool->cg_node); + list_del(&rpool->dev_node); + kfree(rpool); +} + +static void gpucg_css_free(struct cgroup_subsys_state *css) +{ + struct gpucg_resource_pool *rpool, *tmp; + struct gpucg *gpucg = css_to_gpucg(css); + + // delete all resource pools + mutex_lock(&gpucg_mutex); + list_for_each_entry_safe(rpool, tmp, &gpucg->rpools, cg_node) + free_cg_rpool_locked(rpool); + mutex_unlock(&gpucg_mutex); + + kfree(gpucg); +} + +static struct cgroup_subsys_state * +gpucg_css_alloc(struct cgroup_subsys_state *parent_css) +{ + struct gpucg *gpucg, *parent; + + gpucg = kzalloc(sizeof(struct gpucg), GFP_KERNEL); + if (!gpucg) + return ERR_PTR(-ENOMEM); + + parent = css_to_gpucg(parent_css); + if (!parent) + root_gpucg = gpucg; + + INIT_LIST_HEAD(&gpucg->rpools); + + return &gpucg->css; +} + +static struct gpucg_resource_pool *find_cg_rpool_locked( + struct gpucg *cg, + struct gpucg_device *device) +{ + struct gpucg_resource_pool *pool; + + lockdep_assert_held(&gpucg_mutex); + + list_for_each_entry(pool, &cg->rpools, cg_node) + if (pool->device == device) + return pool; + + return NULL; +} + +static struct gpucg_resource_pool *init_cg_rpool(struct gpucg *cg, + struct gpucg_device *device) +{ + struct gpucg_resource_pool *rpool = kzalloc(sizeof(*rpool), + GFP_KERNEL); + if (!rpool) + return ERR_PTR(-ENOMEM); + + rpool->device = device; + + page_counter_init(&rpool->total, NULL); + INIT_LIST_HEAD(&rpool->cg_node); + INIT_LIST_HEAD(&rpool->dev_node); + list_add_tail(&rpool->cg_node, &cg->rpools); + list_add_tail(&rpool->dev_node, &device->rpools); + + return rpool; +} + +/** + * get_cg_rpool_locked - find the resource pool for the specified device and + * specified cgroup. If the resource pool does not exist for the cg, it is + * created in a hierarchical manner in the cgroup and its ancestor cgroups who + * do not already have a resource pool entry for the device. + * + * @cg: The cgroup to find the resource pool for. + * @device: The device associated with the returned resource pool. + * + * Return: return resource pool entry corresponding to the specified device in + * the specified cgroup (hierarchically creating them if not existing already). + * + */ +static struct gpucg_resource_pool * +get_cg_rpool_locked(struct gpucg *cg, struct gpucg_device *device) +{ + struct gpucg *parent_cg, *p, *stop_cg; + struct gpucg_resource_pool *rpool, *tmp_rpool; + struct gpucg_resource_pool *parent_rpool = NULL, *leaf_rpool = NULL; + + rpool = find_cg_rpool_locked(cg, device); + if (rpool) + return rpool; + + stop_cg = cg; + do { + rpool = init_cg_rpool(stop_cg, device); + if (IS_ERR(rpool)) + goto err; + + if (!leaf_rpool) + leaf_rpool = rpool; + + stop_cg = gpucg_parent(stop_cg); + if (!stop_cg) + break; + + rpool = find_cg_rpool_locked(stop_cg, device); + } while (!rpool); + + /* + * Re-initialize page counters of all rpools created in this invocation + * to enable hierarchical charging. + * stop_cg is the first ancestor cg who already had a resource pool for + * the device. It can also be NULL if no ancestors had a pre-existing + * resource pool for the device before this invocation. + */ + rpool = leaf_rpool; + for (p = cg; p != stop_cg; p = parent_cg) { + parent_cg = gpucg_parent(p); + if (!parent_cg) + break; + parent_rpool = find_cg_rpool_locked(parent_cg, device); + page_counter_init(&rpool->total, &parent_rpool->total); + + rpool = parent_rpool; + } + + return leaf_rpool; +err: + for (p = cg; p != stop_cg; p = gpucg_parent(p)) { + tmp_rpool = find_cg_rpool_locked(p, device); + free_cg_rpool_locked(tmp_rpool); + } + return rpool; +} + +/** + * gpucg_try_charge - charge memory to the specified gpucg and gpucg_device. + * Caller must hold a reference to @gpucg obtained through gpucg_get(). The size + * of the memory is rounded up to be a multiple of the page size. + * + * @gpucg: The gpu cgroup to charge the memory to. + * @device: The device to charge the memory to. + * @usage: size of memory to charge in bytes. + * + * Return: returns 0 if the charging is successful and otherwise returns an + * error code. + */ +int gpucg_try_charge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage) +{ + struct page_counter *counter; + u64 nr_pages; + struct gpucg_resource_pool *rp; + int ret = 0; + + mutex_lock(&gpucg_mutex); + rp = get_cg_rpool_locked(gpucg, device); + /* + * gpucg_mutex can be unlocked here, rp will stay valid until gpucg is + * freed and the caller is holding a reference to the gpucg. + */ + mutex_unlock(&gpucg_mutex); + + if (IS_ERR(rp)) + return PTR_ERR(rp); + + nr_pages = PAGE_ALIGN(usage) >> PAGE_SHIFT; + if (page_counter_try_charge(&rp->total, nr_pages, &counter)) + css_get_many(&gpucg->css, nr_pages); + else + ret = -ENOMEM; + + return ret; +} + +/** + * gpucg_uncharge - uncharge memory from the specified gpucg and gpucg_device. + * The caller must hold a reference to @gpucg obtained through gpucg_get(). + * + * @gpucg: The gpu cgroup to uncharge the memory from. + * @device: The device to uncharge the memory from. + * @usage: size of memory to uncharge in bytes. + */ +void gpucg_uncharge(struct gpucg *gpucg, struct gpucg_device *device, u64 usage) +{ + u64 nr_pages; + struct gpucg_resource_pool *rp; + + mutex_lock(&gpucg_mutex); + rp = find_cg_rpool_locked(gpucg, device); + /* + * gpucg_mutex can be unlocked here, rp will stay valid until gpucg is + * freed and there are active refs on gpucg. + */ + mutex_unlock(&gpucg_mutex); + + if (unlikely(!rp)) { + pr_err("Resource pool not found, incorrect charge/uncharge ordering?\n"); + return; + } + + nr_pages = PAGE_ALIGN(usage) >> PAGE_SHIFT; + page_counter_uncharge(&rp->total, nr_pages); + css_put_many(&gpucg->css, nr_pages); +} + +/** + * gpucg_register_device - Registers a device for memory accounting using the + * GPU cgroup controller. + * + * @device: The device to register for memory accounting. + * @name: Pointer to a string literal to denote the name of the device. + * + * Both @device andd @name must remain valid. + */ +void gpucg_register_device(struct gpucg_device *device, const char *name) +{ + if (!device) + return; + + INIT_LIST_HEAD(&device->dev_node); + INIT_LIST_HEAD(&device->rpools); + + mutex_lock(&gpucg_mutex); + list_add_tail(&device->dev_node, &gpucg_devices); + mutex_unlock(&gpucg_mutex); + + device->name = name; +} + +static int gpucg_resource_show(struct seq_file *sf, void *v) +{ + struct gpucg_resource_pool *rpool; + struct gpucg *cg = css_to_gpucg(seq_css(sf)); + + mutex_lock(&gpucg_mutex); + list_for_each_entry(rpool, &cg->rpools, cg_node) { + seq_printf(sf, "%s %lu\n", rpool->device->name, + page_counter_read(&rpool->total) * PAGE_SIZE); + } + mutex_unlock(&gpucg_mutex); + + return 0; +} + +struct cftype files[] = { + { + .name = "memory.current", + .seq_show = gpucg_resource_show, + }, + { } /* terminate */ +}; + +struct cgroup_subsys gpu_cgrp_subsys = { + .css_alloc = gpucg_css_alloc, + .css_free = gpucg_css_free, + .early_init = false, + .legacy_cftypes = files, + .dfl_cftypes = files, +}; From patchwork Fri Feb 11 16:18:26 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: "T.J. Mercier" X-Patchwork-Id: 80699 Received: from vger.kernel.org ([23.128.96.18]) by www.linuxtv.org with esmtp (Exim 4.92) (envelope-from ) id 1nIYdW-00EWxF-2F; Fri, 11 Feb 2022 16:19:06 +0000 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1351377AbiBKQTF (ORCPT + 1 other); Fri, 11 Feb 2022 11:19:05 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:46992 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1351409AbiBKQTD (ORCPT ); Fri, 11 Feb 2022 11:19:03 -0500 Received: from mail-yb1-xb49.google.com (mail-yb1-xb49.google.com [IPv6:2607:f8b0:4864:20::b49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3E5B52EB for ; Fri, 11 Feb 2022 08:18:59 -0800 (PST) Received: by mail-yb1-xb49.google.com with SMTP id 3-20020a250103000000b0061d99b7d0b8so19763585ybb.13 for ; Fri, 11 Feb 2022 08:18:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc:content-transfer-encoding; bh=sjycs3J1raw78h/CcVrcAvxXNJMm38YGwWJkg+hcSJk=; b=HCicQU9wOvrep9Zko1Sp6f1kcNJH7vA3zXU3TjVhgtj7ecoJwQbiD5zNwlfXKnd3Uk fKTPKQahF2g4Y6Yl7OC7kdWe7Xj8Eip01fQ6bgZwKtUVfM3OdrQeD+P6IfjP4J6lHMxA 3Wd8KCv9vSHVzLrGft5zgckgtzVcnmXlHGqDJHcNNED707tgW9lJ0MnWqLKY0kp2dvQ/ UUurtSGmuV/6T387NDPTkYw+EQTiIjeuuZAohK9DBXj7o64/5JeY3gPt76JuMQ8y3+Fu NHs0H9pPGSnyHfRViau6furuWINKT4TIcOBlhZR4UuboJcrJaNNaihVZN+Vu6nhzK9qs mgTg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc:content-transfer-encoding; bh=sjycs3J1raw78h/CcVrcAvxXNJMm38YGwWJkg+hcSJk=; b=hEfFjnyoL2dGc3m7sbKa1Kxo0O14lBuMw7gqss+koSOwiPxtpVwoSC/+fWuGxAKflD h47nBsieCjghswf6rcHnLUL50HvIkvEoq8h5Fz4OjJksi3YIObit4YM9bG/dYdF46urp nZDet6HlTzjIq91khvnt/HOYWlPkKdYOhXigq7B0NITbbh62ZBZU5GEkoPi5hQgMDtDL kMHO4f3wsLyzf2vzk1yNRc1rp3nGLO6VyLMqDMctqSsZ3aWKMfcbcDZwJfd17eughyET LTi0ZLNtuUb4qKsZLliVQHCgDvgKLR9cZTGEsovktp5fja57uNreCYThK/2ktrwZZaTU oEoQ== X-Gm-Message-State: AOAM532xF2XJ3lpXCwb7Pp2ZlcZWONTedUwm7cCq3EgfGX6VI9TquQDX Ew6c3FPDkG+q/38AVs03knsRhn/hupXbATk= X-Google-Smtp-Source: ABdhPJyO8mB9+1hjpEI4c2iys3oGoAzHpKFN+ZZyeFAfnRemdzA9D7jOCwPVue2W9QmLyraSfSF4AB2bqc0vndI= X-Received: from tj2.c.googlers.com ([fda3:e722:ac3:cc00:20:ed76:c0a8:187]) (user=tjmercier job=sendgmr) by 2002:a81:ae0c:: with SMTP id m12mr2454548ywh.19.1644596338464; Fri, 11 Feb 2022 08:18:58 -0800 (PST) Date: Fri, 11 Feb 2022 16:18:26 +0000 In-Reply-To: <20220211161831.3493782-1-tjmercier@google.com> Message-Id: <20220211161831.3493782-4-tjmercier@google.com> Mime-Version: 1.0 References: <20220211161831.3493782-1-tjmercier@google.com> X-Mailer: git-send-email 2.35.1.265.g69c8d7142f-goog Subject: [RFC v2 3/6] dmabuf: Use the GPU cgroup charge/uncharge APIs From: "T.J. Mercier" To: Maarten Lankhorst , Maxime Ripard , Thomas Zimmermann , David Airlie , Daniel Vetter , Jonathan Corbet , Greg Kroah-Hartman , " =?utf-8?q?Arve_Hj=C3=B8n?= =?utf-8?q?nev=C3=A5g?= " , Todd Kjos , Martijn Coenen , Joel Fernandes , Christian Brauner , Hridya Valsaraju , Suren Baghdasaryan , Sumit Semwal , " =?utf-8?q?Christian_K=C3=B6nig?= " , Benjamin Gaignard , Liam Mark , Laura Abbott , Brian Starkey , John Stultz , Tejun Heo , Zefan Li , Johannes Weiner Cc: kaleshsingh@google.com, Kenny.Ho@amd.com, "T.J. Mercier" , dri-devel@lists.freedesktop.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org, cgroups@vger.kernel.org X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-media@vger.kernel.org X-LSpam-Score: -12.3 (------------) X-LSpam-Report: No, score=-12.3 required=5.0 tests=BAYES_00=-1.9,DKIMWL_WL_MED=0.001,DKIM_SIGNED=0.1,DKIM_VALID=-0.1,DKIM_VALID_AU=-0.1,HEADER_FROM_DIFFERENT_DOMAINS=0.5,MAILING_LIST_MULTI=-1,RCVD_IN_DNSWL_MED=-2.3,USER_IN_DEF_DKIM_WL=-7.5 autolearn=ham autolearn_force=no This patch uses the GPU cgroup charge/uncharge APIs to charge buffers allocated by any DMA-BUF exporter that exports a buffer with a GPU cgroup device association. By doing so, it becomes possible to track who allocated/exported a DMA-BUF even after the allocating process drops all references to a buffer. From: Hridya Valsaraju Signed-off-by: Hridya Valsaraju Co-developed-by: T.J. Mercier Signed-off-by: T.J. Mercier --- changes in v2 - Move dma-buf cgroup charging/uncharging from a dma_buf_op defined by every heap to a single dma-buf function for all heaps per Daniel Vetter and Christian König. drivers/dma-buf/dma-buf.c | 52 +++++++++++++++++++++++++++++++++++++++ include/linux/dma-buf.h | 20 +++++++++++++-- 2 files changed, 70 insertions(+), 2 deletions(-) diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index 602b12d7470d..83d0d1b91547 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -56,6 +56,50 @@ static char *dmabuffs_dname(struct dentry *dentry, char *buffer, int buflen) dentry->d_name.name, ret > 0 ? name : ""); } +#ifdef CONFIG_CGROUP_GPU +static inline struct gpucg_device * +exp_info_gpucg_dev(const struct dma_buf_export_info *exp_info) +{ + return exp_info->gpucg_dev; +} + +static bool dmabuf_try_charge(struct dma_buf *dmabuf, + struct gpucg_device *gpucg_dev) +{ + dmabuf->gpucg = gpucg_get(current); + dmabuf->gpucg_dev = gpucg_dev; + if (gpucg_try_charge(dmabuf->gpucg, dmabuf->gpucg_dev, dmabuf->size)) { + gpucg_put(dmabuf->gpucg); + dmabuf->gpucg = NULL; + dmabuf->gpucg_dev = NULL; + return false; + } + return true; +} + +static void dmabuf_uncharge(struct dma_buf *dmabuf) +{ + if (dmabuf->gpucg && dmabuf->gpucg_dev) { + gpucg_uncharge(dmabuf->gpucg, dmabuf->gpucg_dev, dmabuf->size); + gpucg_put(dmabuf->gpucg); + } +} +#else /* CONFIG_CGROUP_GPU */ +static inline struct gpucg_device *exp_info_gpucg_dev( +const struct dma_buf_export_info *exp_info) +{ + return NULL; +} + +static inline bool dmabuf_try_charge(struct dma_buf *dmabuf, + struct gpucg_device *gpucg_dev)) +{ + return false; +} + +static inline void dmabuf_uncharge(struct dma_buf *dmabuf) {} +#endif /* CONFIG_CGROUP_GPU */ + static void dma_buf_release(struct dentry *dentry) { struct dma_buf *dmabuf; @@ -79,6 +123,8 @@ static void dma_buf_release(struct dentry *dentry) if (dmabuf->resv == (struct dma_resv *)&dmabuf[1]) dma_resv_fini(dmabuf->resv); + dmabuf_uncharge(dmabuf); + WARN_ON(!list_empty(&dmabuf->attachments)); module_put(dmabuf->owner); kfree(dmabuf->name); @@ -484,6 +530,7 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info) { struct dma_buf *dmabuf; struct dma_resv *resv = exp_info->resv; + struct gpucg_device *gpucg_dev = exp_info_gpucg_dev(exp_info); struct file *file; size_t alloc_size = sizeof(struct dma_buf); int ret; @@ -534,6 +581,9 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info) } dmabuf->resv = resv; + if (gpucg_dev && !dmabuf_try_charge(dmabuf, gpucg_dev)) + goto err_charge; + file = dma_buf_getfile(dmabuf, exp_info->flags); if (IS_ERR(file)) { ret = PTR_ERR(file); @@ -565,6 +615,8 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info) file->f_path.dentry->d_fsdata = NULL; fput(file); err_dmabuf: + dmabuf_uncharge(dmabuf); +err_charge: kfree(dmabuf); err_module: module_put(exp_info->owner); diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h index 7ab50076e7a6..742f29c3daaf 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -13,6 +13,7 @@ #ifndef __DMA_BUF_H__ #define __DMA_BUF_H__ +#include #include #include #include @@ -303,7 +304,7 @@ struct dma_buf { /** * @size: * - * Size of the buffer; invariant over the lifetime of the buffer. + * Size of the buffer in bytes; invariant over the lifetime of the buffer. */ size_t size; @@ -453,6 +454,17 @@ struct dma_buf { struct dma_buf *dmabuf; } *sysfs_entry; #endif + +#ifdef CONFIG_CGROUP_GPU + /** @gpucg: Pointer to the cgroup this buffer currently belongs to. */ + struct gpucg *gpucg; + + /** @gpucg_dev: + * + * Pointer to the cgroup GPU device whence this buffer originates. + */ + struct gpucg_device *gpucg_dev; +#endif }; /** @@ -529,9 +541,10 @@ struct dma_buf_attachment { * @exp_name: name of the exporter - useful for debugging. * @owner: pointer to exporter module - used for refcounting kernel module * @ops: Attach allocator-defined dma buf ops to the new buffer - * @size: Size of the buffer - invariant over the lifetime of the buffer + * @size: Size of the buffer in bytes - invariant over the lifetime of the buffer * @flags: mode flags for the file * @resv: reservation-object, NULL to allocate default one + * @gpucg_dev: pointer to the gpu cgroup device this buffer belongs to * @priv: Attach private data of allocator to this buffer * * This structure holds the information required to export the buffer. Used @@ -544,6 +557,9 @@ struct dma_buf_export_info { size_t size; int flags; struct dma_resv *resv; +#ifdef CONFIG_CGROUP_GPU + struct gpucg_device *gpucg_dev; +#endif void *priv; }; From patchwork Fri Feb 11 16:18:27 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: "T.J. Mercier" X-Patchwork-Id: 80700 Received: from vger.kernel.org ([23.128.96.18]) by www.linuxtv.org with esmtp (Exim 4.92) (envelope-from ) id 1nIYdY-00EWxr-2N; Fri, 11 Feb 2022 16:19:09 +0000 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1351421AbiBKQTG (ORCPT + 1 other); Fri, 11 Feb 2022 11:19:06 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:47010 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1351429AbiBKQTE (ORCPT ); Fri, 11 Feb 2022 11:19:04 -0500 Received: from mail-pj1-x1049.google.com (mail-pj1-x1049.google.com [IPv6:2607:f8b0:4864:20::1049]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4C1CC302 for ; Fri, 11 Feb 2022 08:19:03 -0800 (PST) Received: by mail-pj1-x1049.google.com with SMTP id z12-20020a17090a1fcc00b001b63e477f9fso5989031pjz.3 for ; Fri, 11 Feb 2022 08:19:03 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc:content-transfer-encoding; bh=H9ElGyQs3gn6zkZHkaswT2uUcjDtIMKU0yBKyPd+zRw=; b=pqlxHsof189xym//3NOb/X4sUEqabODThOYWzG6aE6YJ7NVdPjULt2IJ1cr9fGd286 TqN8Vwy43pbjAeV0VM9D9lBgLG4vSp0hd5VeYAKLDd4c+fBW8AVWzET7AUF9gs3d9DnW gj6vH7NfEcDQIT5DTgijFUOnIfHCxeO7XgA/CHN9DVWmhsFC84dmykrVDNiTyVAb0Uga lAiN/rH84g/hNRQgrPyaLunHyngW5zJcDyWr3zfhJNVeBxDY4SR5wMNbeFGNJd7Ub9kP G4ZlAh2HeMqycTHUESlobX2v63RZoLJURMn7i3CeIYfJRL5Jz9s0DxOG0Oh6xijpBQgi i4IQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc:content-transfer-encoding; bh=H9ElGyQs3gn6zkZHkaswT2uUcjDtIMKU0yBKyPd+zRw=; b=SdyS99SWukpNdUgsGqxzuv9J+jSNCX6d8quK888hAkGIc3Wzt/KlY1t/QSuxOHaeIf UEAlMpwv6eTPGaDIvwwrYYjdPg2bD68pjxKBjcVDgR974ik94TKZBPPFtGC78yERXzIm KQAPDwh3YOihz2LmkUM1Abi10v1eD2cAAJOqr9BByVQ6waNlJmX0I2sAW+fdrD1YxQym md/WTYmIW+k9tmust1fgznYghOI6Lh3xdoBQystuBIZAQF5FOxCgsg4IScUSI9y7NDdV fe6q1gPHy6ZCXMUavItHQjqZuaZflZpBkGL5Ugn3DibwlT6G9pZvvxpjYd9GP2reOoZZ lUIw== X-Gm-Message-State: AOAM5300XyuYPvqqEgZYjo8zeyhue+9gOvJs66yxohHLiwqqXsUVKd8i s5JeW8Hir4VVXL2XRORBf0nbPU/aLzqnA+A= X-Google-Smtp-Source: ABdhPJw+X052tlo7rO4R6ZJggau2VhHkvqL4TWCXAHP2/z0pcE7Hcx1yJPi6R86BW0yLQibs4VQNry7tP5PL4gI= X-Received: from tj2.c.googlers.com ([fda3:e722:ac3:cc00:20:ed76:c0a8:187]) (user=tjmercier job=sendgmr) by 2002:aa7:8883:: with SMTP id z3mr2365344pfe.65.1644596342665; Fri, 11 Feb 2022 08:19:02 -0800 (PST) Date: Fri, 11 Feb 2022 16:18:27 +0000 In-Reply-To: <20220211161831.3493782-1-tjmercier@google.com> Message-Id: <20220211161831.3493782-5-tjmercier@google.com> Mime-Version: 1.0 References: <20220211161831.3493782-1-tjmercier@google.com> X-Mailer: git-send-email 2.35.1.265.g69c8d7142f-goog Subject: [RFC v2 4/6] dmabuf: heaps: export system_heap buffers with GPU cgroup charging From: "T.J. Mercier" To: Maarten Lankhorst , Maxime Ripard , Thomas Zimmermann , David Airlie , Daniel Vetter , Jonathan Corbet , Greg Kroah-Hartman , " =?utf-8?q?Arve_Hj=C3=B8n?= =?utf-8?q?nev=C3=A5g?= " , Todd Kjos , Martijn Coenen , Joel Fernandes , Christian Brauner , Hridya Valsaraju , Suren Baghdasaryan , Sumit Semwal , " =?utf-8?q?Christian_K=C3=B6nig?= " , Benjamin Gaignard , Liam Mark , Laura Abbott , Brian Starkey , John Stultz , Tejun Heo , Zefan Li , Johannes Weiner Cc: kaleshsingh@google.com, Kenny.Ho@amd.com, "T.J. Mercier" , dri-devel@lists.freedesktop.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org, cgroups@vger.kernel.org X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-media@vger.kernel.org X-LSpam-Score: -12.3 (------------) X-LSpam-Report: No, score=-12.3 required=5.0 tests=BAYES_00=-1.9,DKIMWL_WL_MED=0.001,DKIM_SIGNED=0.1,DKIM_VALID=-0.1,DKIM_VALID_AU=-0.1,HEADER_FROM_DIFFERENT_DOMAINS=0.5,MAILING_LIST_MULTI=-1,RCVD_IN_DNSWL_MED=-2.3,USER_IN_DEF_DKIM_WL=-7.5 autolearn=ham autolearn_force=no All DMA heaps now register a new GPU cgroup device upon creation, and the system_heap now exports buffers associated with its GPU cgroup device for tracking purposes. From: Hridya Valsaraju Signed-off-by: Hridya Valsaraju Co-developed-by: T.J. Mercier Signed-off-by: T.J. Mercier --- changes in v2 - Move dma-buf cgroup charge transfer from a dma_buf_op defined by every heap to a single dma-buf function for all heaps per Daniel Vetter and Christian König. drivers/dma-buf/dma-heap.c | 27 +++++++++++++++++++++++++++ drivers/dma-buf/heaps/system_heap.c | 3 +++ include/linux/dma-heap.h | 11 +++++++++++ 3 files changed, 41 insertions(+) diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c index 8f5848aa144f..885072427775 100644 --- a/drivers/dma-buf/dma-heap.c +++ b/drivers/dma-buf/dma-heap.c @@ -7,6 +7,7 @@ */ #include +#include #include #include #include @@ -31,6 +32,7 @@ * @heap_devt heap device node * @list list head connecting to list of heaps * @heap_cdev heap char device + * @gpucg_dev gpu cgroup device for memory accounting * * Represents a heap of memory from which buffers can be made. */ @@ -41,6 +43,9 @@ struct dma_heap { dev_t heap_devt; struct list_head list; struct cdev heap_cdev; +#ifdef CONFIG_CGROUP_GPU + struct gpucg_device gpucg_dev; +#endif }; static LIST_HEAD(heap_list); @@ -216,6 +221,26 @@ const char *dma_heap_get_name(struct dma_heap *heap) return heap->name; } +#ifdef CONFIG_CGROUP_GPU +/** + * dma_heap_get_gpucg_dev() - get struct gpucg_device for the heap. + * @heap: DMA-Heap to get the gpucg_device struct for. + * + * Returns: + * The gpucg_device struct for the heap. NULL if the GPU cgroup controller is + * not enabled. + */ +struct gpucg_device *dma_heap_get_gpucg_dev(struct dma_heap *heap) +{ + return &heap->gpucg_dev; +} +#else /* CONFIG_CGROUP_GPU */ +struct gpucg_device *dma_heap_get_gpucg_dev(struct dma_heap *heap) +{ + return NULL; +} +#endif /* CONFIG_CGROUP_GPU */ + struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info) { struct dma_heap *heap, *h, *err_ret; @@ -288,6 +313,8 @@ struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info) list_add(&heap->list, &heap_list); mutex_unlock(&heap_list_lock); + gpucg_register_device(dma_heap_get_gpucg_dev(heap), exp_info->name); + return heap; err2: diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c index ab7fd896d2c4..752a05c3cfe2 100644 --- a/drivers/dma-buf/heaps/system_heap.c +++ b/drivers/dma-buf/heaps/system_heap.c @@ -395,6 +395,9 @@ static struct dma_buf *system_heap_allocate(struct dma_heap *heap, exp_info.ops = &system_heap_buf_ops; exp_info.size = buffer->len; exp_info.flags = fd_flags; +#ifdef CONFIG_CGROUP_GPU + exp_info.gpucg_dev = dma_heap_get_gpucg_dev(heap); +#endif exp_info.priv = buffer; dmabuf = dma_buf_export(&exp_info); if (IS_ERR(dmabuf)) { diff --git a/include/linux/dma-heap.h b/include/linux/dma-heap.h index 0c05561cad6e..e447a61d054e 100644 --- a/include/linux/dma-heap.h +++ b/include/linux/dma-heap.h @@ -10,6 +10,7 @@ #define _DMA_HEAPS_H #include +#include #include struct dma_heap; @@ -59,6 +60,16 @@ void *dma_heap_get_drvdata(struct dma_heap *heap); */ const char *dma_heap_get_name(struct dma_heap *heap); +/** + * dma_heap_get_gpucg_dev() - get a pointer to the struct gpucg_device for the + * heap. + * @heap: DMA-Heap to retrieve gpucg_device for. + * + * Returns: + * The gpucg_device struct for the heap. + */ +struct gpucg_device *dma_heap_get_gpucg_dev(struct dma_heap *heap); + /** * dma_heap_add - adds a heap to dmabuf heaps * @exp_info: information needed to register this heap From patchwork Fri Feb 11 16:18:28 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: "T.J. Mercier" X-Patchwork-Id: 80701 Received: from vger.kernel.org ([23.128.96.18]) by www.linuxtv.org with esmtp (Exim 4.92) (envelope-from ) id 1nIYdk-00EWyF-Dd; Fri, 11 Feb 2022 16:19:20 +0000 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1351426AbiBKQTS (ORCPT + 1 other); Fri, 11 Feb 2022 11:19:18 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:47156 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1351442AbiBKQTP (ORCPT ); Fri, 11 Feb 2022 11:19:15 -0500 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D83A12F7 for ; Fri, 11 Feb 2022 08:19:07 -0800 (PST) Received: by mail-yb1-xb4a.google.com with SMTP id s133-20020a252c8b000000b0062112290d0bso4762318ybs.23 for ; Fri, 11 Feb 2022 08:19:07 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc:content-transfer-encoding; bh=yvZODKlks7uSqzZPx/QuM2FpvgBeZ2Sk/1s0RF7z1Xc=; b=ec1Yy1HYUYsk8sjBv+NO8SvcoXRcbLd5uiv9OPxDsNNur23ie7DmHnG9pRO3ilNRJc 1WvllX32ogWUY0ES0dpiEE0PkkZAqEtGHDkIOKsW2C7o8MPYg3yfWoNyM+WdEiwbD35T Wm/nnsqgsS4c257blD2h2q3WLvab8g25Dj83ORJVcsTjGYJyVICqcM/zkGknDnaHwtni S4ZDOVi8GlgnNuE1vqJ9LyP4yV9NB3a+ujEPEu7FgVA5ADIEY9Yho6UbQZlXhZiHM4fN WFpGmfjmGs9WnZVI4xN8jGbbfCl8+QIJaWkRzPaFr5hx6+gvkC/2/iIAsFPFyG5TXVMH vNVQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc:content-transfer-encoding; bh=yvZODKlks7uSqzZPx/QuM2FpvgBeZ2Sk/1s0RF7z1Xc=; b=of1jwTNWXkS0pemOt0rN0Mfq5jtRCnHObVa0SLbvZs1B3oggEwFOp5nZv8vuSuDelC 1xBG9Zu3UOCq9VFgef1tEAhgnNNFaZA9Jt3LOGy4GMUHW2ECzzRLC7mQ5Jj16n/4QYId x4DpMseQ+KXTJv8zIFIo3FHRjXq7FsaBKAo5AZLyJvTCBKLP27Sufi3O/YNRd7S/L4xY X2/saMOniV3F2K2tqwQJRB5/Uh4Nvf1Er2Cx9F4TlpdTRNL4SQE7Rpa1L8dRbgzxQxxC ph4Hn7PKUNNuUi2SRn8YyrfxB6/8YaSq/ZEwIorwFmQH3n6Aj2ntyaaeXZGnOkmOxAH3 QLTA== X-Gm-Message-State: AOAM530uCm8ll31loX4afzQ9lX6z6HGz9VgPHvlbJU46dW35nw6Y31FG uVnhayy1m+L+71cwkkp89ejgMlIKBmb1S2o= X-Google-Smtp-Source: ABdhPJyzRZpguSr08v/JtkpnGzL6Z+Soi+F7TZB5Jn0SLwqv4tx2BBp/V2mC7+AUgYjetHM8LbPfONmE/Uc6VpY= X-Received: from tj2.c.googlers.com ([fda3:e722:ac3:cc00:20:ed76:c0a8:187]) (user=tjmercier job=sendgmr) by 2002:a81:2d03:: with SMTP id t3mr2496118ywt.215.1644596347024; Fri, 11 Feb 2022 08:19:07 -0800 (PST) Date: Fri, 11 Feb 2022 16:18:28 +0000 In-Reply-To: <20220211161831.3493782-1-tjmercier@google.com> Message-Id: <20220211161831.3493782-6-tjmercier@google.com> Mime-Version: 1.0 References: <20220211161831.3493782-1-tjmercier@google.com> X-Mailer: git-send-email 2.35.1.265.g69c8d7142f-goog Subject: [RFC v2 5/6] dmabuf: Add gpu cgroup charge transfer function From: "T.J. Mercier" To: Maarten Lankhorst , Maxime Ripard , Thomas Zimmermann , David Airlie , Daniel Vetter , Jonathan Corbet , Greg Kroah-Hartman , " =?utf-8?q?Arve_Hj=C3=B8n?= =?utf-8?q?nev=C3=A5g?= " , Todd Kjos , Martijn Coenen , Joel Fernandes , Christian Brauner , Hridya Valsaraju , Suren Baghdasaryan , Sumit Semwal , " =?utf-8?q?Christian_K=C3=B6nig?= " , Benjamin Gaignard , Liam Mark , Laura Abbott , Brian Starkey , John Stultz , Tejun Heo , Zefan Li , Johannes Weiner Cc: kaleshsingh@google.com, Kenny.Ho@amd.com, "T.J. Mercier" , dri-devel@lists.freedesktop.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org, cgroups@vger.kernel.org X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-media@vger.kernel.org X-LSpam-Score: -12.3 (------------) X-LSpam-Report: No, score=-12.3 required=5.0 tests=BAYES_00=-1.9,DKIMWL_WL_MED=0.001,DKIM_SIGNED=0.1,DKIM_VALID=-0.1,DKIM_VALID_AU=-0.1,HEADER_FROM_DIFFERENT_DOMAINS=0.5,MAILING_LIST_MULTI=-1,RCVD_IN_DNSWL_MED=-2.3,USER_IN_DEF_DKIM_WL=-7.5 autolearn=ham autolearn_force=no The dma_buf_charge_transfer function provides a way for processes to transfer charge of a buffer to a different process. This is essential for the cases where a central allocator process does allocations for various subsystems, hands over the fd to the client who requested the memory and drops all references to the allocated memory. From: Hridya Valsaraju Signed-off-by: Hridya Valsaraju Co-developed-by: T.J. Mercier Signed-off-by: T.J. Mercier --- changes in v2 - Move dma-buf cgroup charge transfer from a dma_buf_op defined by every heap to a single dma-buf function for all heaps per Daniel Vetter and Christian König. drivers/dma-buf/dma-buf.c | 48 +++++++++++++++++++++++++++++++++++++++ include/linux/dma-buf.h | 2 ++ 2 files changed, 50 insertions(+) diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index 83d0d1b91547..55e1b982f840 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -1374,6 +1374,54 @@ void dma_buf_vunmap(struct dma_buf *dmabuf, struct dma_buf_map *map) } EXPORT_SYMBOL_NS_GPL(dma_buf_vunmap, DMA_BUF); +/** + * dma_buf_charge_transfer - Change the GPU cgroup to which the provided dma_buf + * is charged. + * @dmabuf: [in] buffer whose charge will be migrated to a different GPU + * cgroup + * @gpucg: [in] the destination GPU cgroup for dmabuf's charge + * + * Only tasks that belong to the same cgroup the buffer is currently charged to + * may call this function, otherwise it will return -EPERM. + * + * Returns 0 on success, or a negative errno code otherwise. + */ +int dma_buf_charge_transfer(struct dma_buf *dmabuf, struct gpucg *gpucg) +{ +#ifdef CONFIG_CGROUP_GPU + struct gpucg *current_gpucg; + int ret = 0; + + /* + * Verify that the cgroup of the process requesting the transfer is the + * same as the one the buffer is currently charged to. + */ + current_gpucg = gpucg_get(current); + mutex_lock(&dmabuf->lock); + if (current_gpucg != dmabuf->gpucg) { + ret = -EPERM; + goto err; + } + + ret = gpucg_try_charge(gpucg, dmabuf->gpucg_dev, dmabuf->size); + if (ret) + goto err; + + dmabuf->gpucg = gpucg; + + /* uncharge the buffer from the cgroup it's currently charged to. */ + gpucg_uncharge(current_gpucg, dmabuf->gpucg_dev, dmabuf->size); + +err: + mutex_unlock(&dmabuf->lock); + gpucg_put(current_gpucg); + return ret; +#else + return 0; +#endif /* CONFIG_CGROUP_GPU */ +} +EXPORT_SYMBOL_NS_GPL(dma_buf_charge_transfer, DMA_BUF); + #ifdef CONFIG_DEBUG_FS static int dma_buf_debug_show(struct seq_file *s, void *unused) { diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h index 742f29c3daaf..85c940c08867 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -646,4 +646,6 @@ int dma_buf_mmap(struct dma_buf *, struct vm_area_struct *, unsigned long); int dma_buf_vmap(struct dma_buf *dmabuf, struct dma_buf_map *map); void dma_buf_vunmap(struct dma_buf *dmabuf, struct dma_buf_map *map); + +int dma_buf_charge_transfer(struct dma_buf *dmabuf, struct gpucg *gpucg); #endif /* __DMA_BUF_H__ */ From patchwork Fri Feb 11 16:18:29 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: "T.J. Mercier" X-Patchwork-Id: 80702 Received: from vger.kernel.org ([23.128.96.18]) by www.linuxtv.org with esmtp (Exim 4.92) (envelope-from ) id 1nIYdt-00EWyY-8D; Fri, 11 Feb 2022 16:19:29 +0000 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1351439AbiBKQT1 (ORCPT + 1 other); Fri, 11 Feb 2022 11:19:27 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:47412 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1351461AbiBKQTP (ORCPT ); Fri, 11 Feb 2022 11:19:15 -0500 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E04FDD8E for ; Fri, 11 Feb 2022 08:19:12 -0800 (PST) Received: by mail-yb1-xb4a.google.com with SMTP id c5-20020a25f305000000b0061dd6123f18so19723473ybs.17 for ; Fri, 11 Feb 2022 08:19:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc:content-transfer-encoding; bh=09UL10d0I9UYF+bVDoR9sZ50jMDUcLUx49HZh4jVL78=; b=JE1EYWsWmJMVKKKziQTWEWuNUsyaPxqCZ/opyUrIE1QJ2zMzT8OF9HdxhY+GhtuCaK HyDDiusYJME2veTyqvuwR1555jPMqz95nnShokOB0VNQWEe93IdgXongToVPd90O3dAV IimzoYc1rIm15su0G+eGvhoRyruDLS+nZbjAQSDDWgXzx6DUp60nKevkdTlnOcGS/yxC 7VDKvQJbgIkNVOdKUHyv+vhK1BCP41v5/xbIT27E+bix812dIwb5ma26aGylR3U5/hbr /YZu4e+ldXsQQ0aLJTqhpqIkZ/qs9fg8ieI0b6EipnVFoSEM/clguN4lkL+c5q3or2+f JMAw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc:content-transfer-encoding; bh=09UL10d0I9UYF+bVDoR9sZ50jMDUcLUx49HZh4jVL78=; b=DUy+llxvqy/hiVh1O2i7J0wngF727l9y8WW37HSgSOQb+oXCBqxSu34tu/zAqlx25D adc5S52SXAwgz7+tjw7CEdQ0bmOIrqA5mWH12ZLUEJ9w2ptEhjlE91S/K8qn+gvkHnOI Q70l5Ksp7GzAAblo/lIp7nPU8WFx3kQ/FeeUlbH4iKfPL+4lp1qaFyqISDA94h8J0D8i M6tRZJcaI14W5wpTdF7WVg8zdFN2O0CiRFv13iAu4ps4ajHR8QCQS1Dx1b0ZHjvHC2E3 pIW3ml9VvjIY0XwLvuUJjEKI26e627k/nRovWrfo+UJhcTcMJTFRpb6Us78DNiBdu3NJ W5QA== X-Gm-Message-State: AOAM531OFzbrtRyIoM8gy4rNvZyxk6K2X3w5PQ2MtG1o637xYQC1uAH5 NBdaCezzOyXsP/KfGGbuXkxWdvZUQmAhBNo= X-Google-Smtp-Source: ABdhPJyt28xsiqi+oRkwIgJ74WzjQyOoBVx2xkYnBQvOje0p3ZB58ZSdJeKE27JlSq7ErJdcjRoIy8BigQrNax8= X-Received: from tj2.c.googlers.com ([fda3:e722:ac3:cc00:20:ed76:c0a8:187]) (user=tjmercier job=sendgmr) by 2002:a81:d007:: with SMTP id v7mr2415829ywi.88.1644596352146; Fri, 11 Feb 2022 08:19:12 -0800 (PST) Date: Fri, 11 Feb 2022 16:18:29 +0000 In-Reply-To: <20220211161831.3493782-1-tjmercier@google.com> Message-Id: <20220211161831.3493782-7-tjmercier@google.com> Mime-Version: 1.0 References: <20220211161831.3493782-1-tjmercier@google.com> X-Mailer: git-send-email 2.35.1.265.g69c8d7142f-goog Subject: [RFC v2 6/6] android: binder: Add a buffer flag to relinquish ownership of fds From: "T.J. Mercier" To: Maarten Lankhorst , Maxime Ripard , Thomas Zimmermann , David Airlie , Daniel Vetter , Jonathan Corbet , Greg Kroah-Hartman , " =?utf-8?q?Arve_Hj=C3=B8n?= =?utf-8?q?nev=C3=A5g?= " , Todd Kjos , Martijn Coenen , Joel Fernandes , Christian Brauner , Hridya Valsaraju , Suren Baghdasaryan , Sumit Semwal , " =?utf-8?q?Christian_K=C3=B6nig?= " , Benjamin Gaignard , Liam Mark , Laura Abbott , Brian Starkey , John Stultz , Tejun Heo , Zefan Li , Johannes Weiner Cc: kaleshsingh@google.com, Kenny.Ho@amd.com, "T.J. Mercier" , dri-devel@lists.freedesktop.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org, cgroups@vger.kernel.org X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-media@vger.kernel.org X-LSpam-Score: -12.3 (------------) X-LSpam-Report: No, score=-12.3 required=5.0 tests=BAYES_00=-1.9,DKIMWL_WL_MED=0.001,DKIM_SIGNED=0.1,DKIM_VALID=-0.1,DKIM_VALID_AU=-0.1,HEADER_FROM_DIFFERENT_DOMAINS=0.5,MAILING_LIST_MULTI=-1,RCVD_IN_DNSWL_MED=-2.3,USER_IN_DEF_DKIM_WL=-7.5 autolearn=ham autolearn_force=no This patch introduces a buffer flag BINDER_BUFFER_FLAG_SENDER_NO_NEED that a process sending an fd array to another process over binder IPC can set to relinquish ownership of the fds being sent for memory accounting purposes. If the flag is found to be set during the fd array translation and the fd is for a DMA-BUF, the buffer is uncharged from the sender's cgroup and charged to the receiving process's cgroup instead. It is up to the sending process to ensure that it closes the fds regardless of whether the transfer failed or succeeded. Most graphics shared memory allocations in Android are done by the graphics allocator HAL process. On requests from clients, the HAL process allocates memory and sends the fds to the clients over binder IPC. The graphics allocator HAL will not retain any references to the buffers. When the HAL sets the BINDER_BUFFER_FLAG_SENDER_NO_NEED for fd arrays holding DMA-BUF fds, the gpu cgroup controller will be able to correctly charge the buffers to the client processes instead of the graphics allocator HAL. From: Hridya Valsaraju Signed-off-by: Hridya Valsaraju Co-developed-by: T.J. Mercier Signed-off-by: T.J. Mercier --- changes in v2 - Move dma-buf cgroup charge transfer from a dma_buf_op defined by every heap to a single dma-buf function for all heaps per Daniel Vetter and Christian König. drivers/android/binder.c | 26 ++++++++++++++++++++++++++ include/uapi/linux/android/binder.h | 1 + 2 files changed, 27 insertions(+) diff --git a/drivers/android/binder.c b/drivers/android/binder.c index 8351c5638880..f50d88ded188 100644 --- a/drivers/android/binder.c +++ b/drivers/android/binder.c @@ -42,6 +42,7 @@ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt +#include #include #include #include @@ -2482,8 +2483,10 @@ static int binder_translate_fd_array(struct list_head *pf_head, { binder_size_t fdi, fd_buf_size; binder_size_t fda_offset; + bool transfer_gpu_charge = false; const void __user *sender_ufda_base; struct binder_proc *proc = thread->proc; + struct binder_proc *target_proc = t->to_proc; int ret; fd_buf_size = sizeof(u32) * fda->num_fds; @@ -2521,8 +2524,15 @@ static int binder_translate_fd_array(struct list_head *pf_head, if (ret) return ret; + if (IS_ENABLED(CONFIG_CGROUP_GPU) && + parent->flags & BINDER_BUFFER_FLAG_SENDER_NO_NEED) + transfer_gpu_charge = true; + for (fdi = 0; fdi < fda->num_fds; fdi++) { u32 fd; + struct dma_buf *dmabuf; + struct gpucg *gpucg; + binder_size_t offset = fda_offset + fdi * sizeof(fd); binder_size_t sender_uoffset = fdi * sizeof(fd); @@ -2532,6 +2542,22 @@ static int binder_translate_fd_array(struct list_head *pf_head, in_reply_to); if (ret) return ret > 0 ? -EINVAL : ret; + + if (!transfer_gpu_charge) + continue; + + dmabuf = dma_buf_get(fd); + if (IS_ERR(dmabuf)) + continue; + + gpucg = gpucg_get(target_proc->tsk); + ret = dma_buf_charge_transfer(dmabuf, gpucg); + if (ret) { + pr_warn("%d:%d Unable to transfer DMA-BUF fd charge to %d", + proc->pid, thread->pid, target_proc->pid); + gpucg_put(gpucg); + } + dma_buf_put(dmabuf); } return 0; } diff --git a/include/uapi/linux/android/binder.h b/include/uapi/linux/android/binder.h index 3246f2c74696..169fd5069a1a 100644 --- a/include/uapi/linux/android/binder.h +++ b/include/uapi/linux/android/binder.h @@ -137,6 +137,7 @@ struct binder_buffer_object { enum { BINDER_BUFFER_FLAG_HAS_PARENT = 0x01, + BINDER_BUFFER_FLAG_SENDER_NO_NEED = 0x02, }; /* struct binder_fd_array_object - object describing an array of fds in a buffer