Message ID | 20201127164131.2244124-13-daniel.vetter@ffwll.ch (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers |
From: Daniel Vetter <daniel.vetter@ffwll.ch>
To: DRI Development <dri-devel@lists.freedesktop.org>, LKML <linux-kernel@vger.kernel.org>
Cc: kvm@vger.kernel.org, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linux-samsung-soc@vger.kernel.org, linux-media@vger.kernel.org, Daniel Vetter <daniel.vetter@ffwll.ch>, Bjorn Helgaas <bhelgaas@google.com>, Dan Williams <dan.j.williams@intel.com>, Daniel Vetter <daniel.vetter@intel.com>, Jason Gunthorpe <jgg@ziepe.ca>, Kees Cook <keescook@chromium.org>, Andrew Morton <akpm@linux-foundation.org>, John Hubbard <jhubbard@nvidia.com>, Jérôme Glisse <jglisse@redhat.com>, Jan Kara <jack@suse.cz>, Greg Kroah-Hartman <gregkh@linuxfoundation.org>, linux-pci@vger.kernel.org
Subject: [PATCH v7 12/17] PCI: Revoke mappings like devmem
Date: Fri, 27 Nov 2020 17:41:26 +0100
Message-Id: <20201127164131.2244124-13-daniel.vetter@ffwll.ch>
In-Reply-To: <20201127164131.2244124-1-daniel.vetter@ffwll.ch>
References: <20201127164131.2244124-1-daniel.vetter@ffwll.ch>
List-ID: <linux-media.vger.kernel.org> |
Series |
follow_pfn and other iomap races
|
|
Commit Message
Daniel Vetter
Nov. 27, 2020, 4:41 p.m. UTC
Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims the region") /dev/mem zaps ptes when the kernel requests exclusive access to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is the default for all driver uses.

Except there are two more ways to access PCI BARs: sysfs and proc mmap support. Let's plug that hole.

For revoke_devmem() to work we need to link our vma into the same address_space, with consistent vma->vm_pgoff. ->vm_pgoff is already adjusted, because that's how (io_)remap_pfn_range works, but for the mapping we need to adjust vma->vm_file->f_mapping. The cleanest way is to adjust this at ->open time:

- for sysfs this is easy, now that binary attributes support this. We just set bin_attr->mapping when mmap is supported
- for procfs it's a bit more tricky, since procfs pci access has only one file per device, and access to a specific resource first needs to be set up with some ioctl calls. But mmap is only supported for the same resources as sysfs exposes with mmap support, and otherwise rejected, so we can set the mapping unconditionally at open time without harm.

A special consideration is for arch_can_pci_mmap_io() - we need to make sure that the ->f_mapping doesn't alias between ioport and iomem space. There are only 2 ways in-tree to support mmap of ioports: generic pci mmap (ARCH_GENERIC_PCI_MMAP_RESOURCE), and sparc as the single architecture hand-rolling it. Both approaches support ioport mmap through a special pfn range and not through magic pte attributes. Aliasing is therefore not a problem.

The only difference in access checks left is that sysfs PCI mmap does not check for CAP_SYS_RAWIO. I'm not really sure whether that should be added or not.

Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
--
v2:
- Totally new approach: Adjust filp->f_mapping at open time. Note that
  this now works on all architectures, not just those that support
  ARCH_GENERIC_PCI_MMAP_RESOURCE
---
 drivers/pci/pci-sysfs.c | 4 ++++
 drivers/pci/proc.c      | 1 +
 2 files changed, 5 insertions(+)
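
To make the revoke mechanism described in the commit message more concrete, here is a minimal userspace sketch, not kernel code. It assumes a toy model in which an address_space is just a list of mappings keyed by page offset; every `toy_` name is hypothetical and exists only for illustration. The point it tries to show is why a mapping has to be linked into the shared address_space with a consistent offset before a revoke can find it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; these are not the kernel's types. */
struct toy_vma {
    unsigned long pgoff;   /* first page frame number the mapping covers */
    unsigned long npages;  /* length of the mapping in pages */
    bool mapped;           /* false once the ptes have been "zapped" */
};

#define TOY_MAX_VMAS 8

struct toy_address_space {
    struct toy_vma *vmas[TOY_MAX_VMAS];
    size_t nr;
};

/* Link a mapping into an address space (the role f_mapping plays in the patch). */
bool toy_link(struct toy_address_space *as, struct toy_vma *vma)
{
    if (as->nr == TOY_MAX_VMAS)
        return false;
    as->vmas[as->nr++] = vma;
    vma->mapped = true;
    return true;
}

/*
 * Zap every linked mapping that overlaps [start, start + npages).
 * Returns the number of mappings zapped. Mappings linked into a
 * different address space are never visited, so they survive.
 */
size_t toy_revoke(struct toy_address_space *as,
                  unsigned long start, unsigned long npages)
{
    size_t zapped = 0;

    for (size_t i = 0; i < as->nr; i++) {
        struct toy_vma *v = as->vmas[i];
        bool overlap = v->pgoff < start + npages &&
                       start < v->pgoff + v->npages;

        if (v->mapped && overlap) {
            v->mapped = false;
            zapped++;
        }
    }
    return zapped;
}
```

In this model, a mapping created through a file whose address space is private (the pre-patch sysfs and procfs situation, as the commit message describes it) is invisible to `toy_revoke()` on the shared space, which is the hole the patch sets out to plug.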
Comments
On Fri, Nov 27, 2020 at 5:42 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> [...]
> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> index d15c881e2e7e..3f1c31bc0b7c 100644
> --- a/drivers/pci/pci-sysfs.c
> +++ b/drivers/pci/pci-sysfs.c
> @@ -929,6 +929,7 @@ void pci_create_legacy_files(struct pci_bus *b)
>         b->legacy_io->read = pci_read_legacy_io;
>         b->legacy_io->write = pci_write_legacy_io;
>         b->legacy_io->mmap = pci_mmap_legacy_io;
> +       b->legacy_io->mapping = iomem_get_mapping();
>         pci_adjust_legacy_attr(b, pci_mmap_io);
>         error = device_create_bin_file(&b->dev, b->legacy_io);
>         if (error)
> @@ -941,6 +942,7 @@ void pci_create_legacy_files(struct pci_bus *b)
>         b->legacy_mem->size = 1024*1024;
>         b->legacy_mem->attr.mode = 0600;
>         b->legacy_mem->mmap = pci_mmap_legacy_mem;
> +       b->legacy_io->mapping = iomem_get_mapping();

Unlike the normal pci stuff below, the legacy files here go boom
because they're set up much earlier in the boot sequence. This only
affects HAVE_PCI_LEGACY architectures, which aren't that many. So what
should we do here now:
- drop the devmem revoke for these
- rework the init sequence somehow to set up these files a lot later
- redo the sysfs patch so that it doesn't take an address_space
  pointer, but instead a callback to get at that (since at open time
  everything is set up). Imo rather ugly
- ditch this part of the series (since there's not really any takers
  for the latter parts it might just not make sense to push for this)
- something else?

Bjorn, Greg, thoughts?

Issue got reported by Stephen on a powerpc when trying to build
linux-next with this patch included.

Thanks, Daniel

>         pci_adjust_legacy_attr(b, pci_mmap_mem);
>         error = device_create_bin_file(&b->dev, b->legacy_mem);
>         if (error)
> @@ -1156,6 +1158,8 @@ static int pci_create_attr(struct pci_dev *pdev, int num, int write_combine)
>                         res_attr->mmap = pci_mmap_resource_uc;
>                 }
>         }
> +       if (res_attr->mmap)
> +               res_attr->mapping = iomem_get_mapping();
>         res_attr->attr.name = res_attr_name;
>         res_attr->attr.mode = 0600;
>         res_attr->size = pci_resource_len(pdev, num);
> diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
> index 3a2f90beb4cb..9bab07302bbf 100644
> --- a/drivers/pci/proc.c
> +++ b/drivers/pci/proc.c
> @@ -298,6 +298,7 @@ static int proc_bus_pci_open(struct inode *inode, struct file *file)
>         fpriv->write_combine = 0;
>
>         file->private_data = fpriv;
> +       file->f_mapping = iomem_get_mapping();
>
>         return 0;
>  }
> --
> 2.29.2
>
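
The trade-off between storing an address_space pointer at file-creation time and resolving it through a callback at open time can be sketched in plain C. This is a toy model under stated assumptions, not the sysfs API: all `toy_` names are hypothetical, and the lookup here simply returns NULL before initialisation, whereas the thread only says that the real early call "goes boom".

```c
#include <stddef.h>

/* Toy stand-in; not the kernel's struct address_space. */
struct toy_address_space {
    int id;
};

static struct toy_address_space toy_iomem_storage = { .id = 1 };

/* Stays NULL until the (hypothetical) late init step below has run. */
static struct toy_address_space *toy_iomem_mapping;

struct toy_address_space *toy_iomem_get_mapping(void)
{
    return toy_iomem_mapping;
}

/* Models the point in boot where the shared mapping becomes available. */
void toy_late_init(void)
{
    toy_iomem_mapping = &toy_iomem_storage;
}

struct toy_bin_attr {
    /* Option A: pointer captured when the attribute is created. */
    struct toy_address_space *mapping;
    /* Option B: callback resolved when the file is opened. */
    struct toy_address_space *(*get_mapping)(void);
};

/* Open-time resolution: prefer the callback when one is provided. */
struct toy_address_space *toy_open(const struct toy_bin_attr *attr)
{
    if (attr->get_mapping)
        return attr->get_mapping();
    return attr->mapping;
}
```

An attribute created before `toy_late_init()` with option A keeps whatever stale value it captured, while option B sees the initialised mapping because the lookup is deferred until open. The cost is an extra indirection in every attribute, which is presumably part of why the callback variant is called "rather ugly" above.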
On Tue, Jan 19, 2021 at 09:17:55AM +0100, Daniel Vetter wrote:
> On Fri, Nov 27, 2020 at 5:42 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > [...]
> > @@ -941,6 +942,7 @@ void pci_create_legacy_files(struct pci_bus *b)
> >         b->legacy_mem->size = 1024*1024;
> >         b->legacy_mem->attr.mode = 0600;
> >         b->legacy_mem->mmap = pci_mmap_legacy_mem;
> > +       b->legacy_io->mapping = iomem_get_mapping();
>
> Unlike the normal pci stuff below, the legacy files here go boom
> because they're set up much earlier in the boot sequence. This only
> affects HAVE_PCI_LEGACY architectures, which aren't that many. So what
> should we do here now:
> - drop the devmem revoke for these
> - rework the init sequence somehow to set up these files a lot later
> - redo the sysfs patch so that it doesn't take an address_space
>   pointer, but instead a callback to get at that (since at open time
>   everything is set up). Imo rather ugly
> - ditch this part of the series (since there's not really any takers
>   for the latter parts it might just not make sense to push for this)
> - something else?
>
> Bjorn, Greg, thoughts?

What sysfs patch are you referring to here?

thanks,

greg k-h
On Tue, Jan 19, 2021 at 3:32 PM Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
> On Tue, Jan 19, 2021 at 09:17:55AM +0100, Daniel Vetter wrote:
> > [...]
> > - redo the sysfs patch so that it doesn't take an address_space
> >   pointer, but instead a callback to get at that (since at open time
> >   everything is set up). Imo rather ugly
> > [...]
> >
> > Bjorn, Greg, thoughts?
>
> What sysfs patch are you referring to here?

Currently in linux-next:

commit 74b30195395c406c787280a77ae55aed82dbbfc7 (HEAD -> topic/iomem-mmap-vs-gup, drm/topic/iomem-mmap-vs-gup)
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Fri Nov 27 17:41:25 2020 +0100

    sysfs: Support zapping of binary attr mmaps

Or the patch right before this one in this submission here:

https://lore.kernel.org/dri-devel/20201127164131.2244124-12-daniel.vetter@ffwll.ch/

Cheers, Daniel
On Tue, Jan 19, 2021 at 03:34:47PM +0100, Daniel Vetter wrote:
> On Tue, Jan 19, 2021 at 3:32 PM Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> > [...]
> > What sysfs patch are you referring to here?
>
> Currently in linux-next:
>
> commit 74b30195395c406c787280a77ae55aed82dbbfc7 (HEAD ->
> topic/iomem-mmap-vs-gup, drm/topic/iomem-mmap-vs-gup)
> Author: Daniel Vetter <daniel.vetter@ffwll.ch>
> Date:   Fri Nov 27 17:41:25 2020 +0100
>
>     sysfs: Support zapping of binary attr mmaps
>
> Or the patch right before this one in this submission here:
>
> https://lore.kernel.org/dri-devel/20201127164131.2244124-12-daniel.vetter@ffwll.ch/

Ah. Hm, a callback in the sysfs file logic seems really hairy, so I
would prefer that not happen. If no one really needs this stuff, why
not just drop it like you mention?

thanks,

greg k-h
On Tue, Jan 19, 2021 at 4:20 PM Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
> [...]
> Ah. Hm, a callback in the sysfs file logic seems really hairy, so I
> would prefer that not happen. If no one really needs this stuff, why
> not just drop it like you mention?

Well it is needed, but just on architectures I don't care about much.
Most relevant is perhaps powerpc (that's where Stephen hit the issue).
I do wonder whether we could move the legacy pci files setup to where
the modern stuff is set up from pci_create_resource_files() or maybe
pci_create_sysfs_dev_files() even for HAVE_PCI_LEGACY. I think that
might work, but since it's legacy flow on some funny architectures
(alpha, itanium, that kind of stuff) I have no idea what kind of
monsters I'm going to anger :-)
-Daniel
On Tue, Jan 19, 2021 at 5:03 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>
> On Tue, Jan 19, 2021 at 4:20 PM Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> >
> > [...]
> >
> > Ah. Hm, a callback in the sysfs file logic seems really hairy, so I
> > would prefer that not happen. If no one really needs this stuff, why
> > not just drop it like you mention?
>
> Well it is needed, but just on architectures I don't care about much.
> Most relevant is perhaps powerpc (that's where Stephen hit the issue).
> I do wonder whether we could move the legacy pci files setup to where
> the modern stuff is set up from pci_create_resource_files() or maybe
> pci_create_sysfs_dev_files() even for HAVE_PCI_LEGACY. I think that
> might work, but since it's legacy flow on some funny architectures
> (alpha, itanium, that kind of stuff) I have no idea what kind of
> monsters I'm going to anger :-)

Back from a week of vacation, I looked at this again and I think it
shouldn't be hard to fix this with the same trick
pci_create_sysfs_dev_files() uses: As long as sysfs_initialized isn't
set we skip, and then later on when the vfs is up and running we can
initialize everything.

To be able to apply the same thing to pci_create_legacy_files() I
think all I need is to iterate over all struct pci_bus in
pci_sysfs_init() and we're good. Unfortunately I didn't find any
for_each_pci_bus(), so how do I do that?

Thanks, Daniel
On Wed, Feb 3, 2021 at 5:14 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>
> On Tue, Jan 19, 2021 at 5:03 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> >
> > [...]
> >
> > Well it is needed, but just on architectures I don't care about much.
> > Most relevant is perhaps powerpc (that's where Stephen hit the issue).
> > I do wonder whether we could move the legacy pci files setup to where
> > the modern stuff is set up from pci_create_resource_files() or maybe
> > pci_create_sysfs_dev_files() even for HAVE_PCI_LEGACY. I think that
> > might work, but since it's legacy flow on some funny architectures
> > (alpha, itanium, that kind of stuff) I have no idea what kind of
> > monsters I'm going to anger :-)
>
> Back from a week of vacation, I looked at this again and I think it
> shouldn't be hard to fix this with the same trick
> pci_create_sysfs_dev_files() uses: As long as sysfs_initialized isn't
> set we skip, and then later on when the vfs is up and running we can
> initialize everything.
>
> To be able to apply the same thing to pci_create_legacy_files() I
> think all I need is to iterate over all struct pci_bus in
> pci_sysfs_init() and we're good. Unfortunately I didn't find any
> for_each_pci_bus(), so how do I do that?

pci_find_next_bus() seems to be the answer I want. I'll see whether
that works and then send out new patches.
-Daniel
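For readers following along, the plan discussed above (skip the legacy file setup while sysfs_initialized is unset, then catch up on every bus from the late init step) can be modelled roughly as below. This is a userspace sketch only, not kernel code and not the patch that was eventually sent: every model_* name is hypothetical, the linked list stands in for the kernel's bus list, and the only property borrowed from the real pci_find_next_bus() is that passing NULL starts the walk.

```c
/*
 * Userspace model only, NOT kernel code. Every model_* identifier is a
 * hypothetical stand-in invented for this sketch.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_bus {
	int number;
	bool legacy_files_created;
	struct model_bus *next;
};

static struct model_bus *model_root_buses;	/* stand-in for the kernel's bus list */
static bool model_sysfs_initialized;		/* stand-in for sysfs_initialized */

/* Same calling convention as pci_find_next_bus(): NULL starts the walk. */
static struct model_bus *model_find_next_bus(const struct model_bus *from)
{
	return from ? from->next : model_root_buses;
}

/* Callers that arrive before the late init step are silently skipped. */
static void model_create_legacy_files(struct model_bus *b)
{
	assert(b != NULL);
	if (!model_sysfs_initialized)
		return;
	b->legacy_files_created = true;
}

/* Bus discovery: register the bus, then try (possibly too early) to set up files. */
static void model_add_bus(struct model_bus *b)
{
	b->legacy_files_created = false;
	b->next = model_root_buses;
	model_root_buses = b;
	model_create_legacy_files(b);
}

/* Late init: flip the flag, then revisit every bus seen so far. Returns the bus count. */
static int model_sysfs_init(void)
{
	struct model_bus *b = NULL;
	int visited = 0;

	model_sysfs_initialized = true;
	while ((b = model_find_next_bus(b)) != NULL) {
		model_create_legacy_files(b);
		visited++;
	}
	return visited;
}
```

In this model, buses added before model_sysfs_init() get their files during the catch-up loop, and buses added afterwards get them immediately, which is the behaviour the thread is aiming for on HAVE_PCI_LEGACY architectures.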
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index d15c881e2e7e..3f1c31bc0b7c 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -929,6 +929,7 @@ void pci_create_legacy_files(struct pci_bus *b)
 	b->legacy_io->read = pci_read_legacy_io;
 	b->legacy_io->write = pci_write_legacy_io;
 	b->legacy_io->mmap = pci_mmap_legacy_io;
+	b->legacy_io->mapping = iomem_get_mapping();
 	pci_adjust_legacy_attr(b, pci_mmap_io);
 	error = device_create_bin_file(&b->dev, b->legacy_io);
 	if (error)
@@ -941,6 +942,7 @@ void pci_create_legacy_files(struct pci_bus *b)
 	b->legacy_mem->size = 1024*1024;
 	b->legacy_mem->attr.mode = 0600;
 	b->legacy_mem->mmap = pci_mmap_legacy_mem;
+	b->legacy_mem->mapping = iomem_get_mapping();
 	pci_adjust_legacy_attr(b, pci_mmap_mem);
 	error = device_create_bin_file(&b->dev, b->legacy_mem);
 	if (error)
@@ -1156,6 +1158,8 @@ static int pci_create_attr(struct pci_dev *pdev, int num, int write_combine)
 			res_attr->mmap = pci_mmap_resource_uc;
 		}
 	}
+	if (res_attr->mmap)
+		res_attr->mapping = iomem_get_mapping();
 	res_attr->attr.name = res_attr_name;
 	res_attr->attr.mode = 0600;
 	res_attr->size = pci_resource_len(pdev, num);
diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
index 3a2f90beb4cb..9bab07302bbf 100644
--- a/drivers/pci/proc.c
+++ b/drivers/pci/proc.c
@@ -298,6 +298,7 @@ static int proc_bus_pci_open(struct inode *inode, struct file *file)
 	fpriv->write_combine = 0;
 
 	file->private_data = fpriv;
+	file->f_mapping = iomem_get_mapping();
 
 	return 0;
 }
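Why pointing every mmap-capable file at the one iomem mapping matters can be illustrated with a small userspace model. This is a sketch under stated assumptions, not how the kernel implements it: struct model_mapping is a toy stand-in for struct address_space, model_revoke() loosely plays the role of revoke_devmem(), and all model_* names are hypothetical. The point it shows is that a revoke walks only the vmas linked into the mapping it is given, so a vma linked into some other mapping (the pre-patch sysfs/procfs situation) is never zapped.

```c
/*
 * Userspace model only, NOT kernel code. Every model_* identifier is a
 * hypothetical stand-in invented for this sketch.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_VMAS 8

struct model_vma {
	unsigned long pgoff;	/* first page frame covered by the mapping */
	unsigned long npages;
	bool populated;		/* true while ptes are considered installed */
};

/* Toy stand-in for struct address_space: just a list of linked vmas. */
struct model_mapping {
	struct model_vma *vmas[MODEL_MAX_VMAS];
	size_t nr_vmas;
};

static struct model_mapping model_iomem_mapping;

/* Plays the role of iomem_get_mapping(): one shared mapping for all iomem. */
static struct model_mapping *model_iomem_get_mapping(void)
{
	return &model_iomem_mapping;
}

/* mmap: link the vma into the given mapping. Returns 0, or -1 if the table is full. */
static int model_mmap(struct model_mapping *m, struct model_vma *vma,
		      unsigned long pgoff, unsigned long npages)
{
	if (m->nr_vmas == MODEL_MAX_VMAS)
		return -1;
	vma->pgoff = pgoff;
	vma->npages = npages;
	vma->populated = true;
	m->vmas[m->nr_vmas++] = vma;
	return 0;
}

/* Zap every populated vma in m that overlaps [start, start + npages). Returns the count. */
static int model_revoke(struct model_mapping *m, unsigned long start,
			unsigned long npages)
{
	int zapped = 0;

	for (size_t i = 0; i < m->nr_vmas; i++) {
		struct model_vma *vma = m->vmas[i];
		bool overlaps = vma->pgoff < start + npages &&
				start < vma->pgoff + vma->npages;

		if (overlaps && vma->populated) {
			vma->populated = false;
			zapped++;
		}
	}
	return zapped;
}
```

The model also shows why a consistent pgoff is required: the overlap test compares page offsets directly, so a vma linked with an offset that does not correspond to the physical range would be missed even in the right mapping.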