LinuxTV Patchwork Revert 95f408bb Ryzen DMA related RiSC engine stall fixes

login
register
mail settings
Submitter Mauro Carvalho Chehab
Date Dec. 6, 2018, 7:32 p.m.
Message ID <20181206173204.21b9366e@coco.lan>
Download mbox | patch
Permalink /patch/53351/
State New
Delegated to: Mauro Carvalho Chehab
Headers show

Comments

Mauro Carvalho Chehab - Dec. 6, 2018, 7:32 p.m.
Em Thu, 6 Dec 2018 17:07:52 -0200
Mauro Carvalho Chehab <mchehab@kernel.org> escreveu:

> Em Thu, 6 Dec 2018 13:36:24 -0500
> Alex Deucher <alexdeucher@gmail.com> escreveu:
> 
> > On Thu, Dec 6, 2018 at 1:05 PM Mauro Carvalho Chehab <mchehab@kernel.org> wrote:  
> > >
> > > Em Thu, 06 Dec 2018 18:18:23 +0100
> > > Markus Dobel <markus.dobel@gmx.de> escreveu:
> > >    
> > > > Hi everyone,
> > > >
> > > > I will try if the hack mentioned fixes the issue for me on the weekend (but I assume, as if effectively removes the function).    
> > >
> > > It should, but it keeps a few changes. Just want to be sure that what
> > > would be left won't cause issues. If this works, the logic that would
> > > solve Ryzen DMA fixes will be contained into a single point, making
> > > easier to maintain it.
> > >    
> > > >
> > > > Just in case this is of interest, I neither have Ryzen nor Intel, but an HP Microserver G7 with an AMD Turion II Neo  N54L, so the machine is more on the slow side.    
> > >
> > > Good to know. It would probably worth to check if this Ryzen
> > > bug occors with all versions of it or with just a subset.
> > > I mean: maybe it is only at the first gen or Ryzen and doesn't
> > > affect Ryzen 2 (or vice versa).    
> > 
> > The original commit also mentions some Xeons are affected too.  Seems
> > like this is potentially an issue on the device side rather than the
> > platform.  
> 
> Maybe.
> 
> > >
> > > The PCI quirks logic will likely need to detect the PCI ID of
> > > the memory controllers found at the buggy CPUs, in order to enable
> > > the quirk only for the affected ones.
> > >
> > > It could be worth talking with AMD people in order to be sure about
> > > the differences at the DMA engine side.
> > >    
> > 
> > It's not clear to me what the pci or platform quirk would do.  The
> > workaround seems to be in the driver, not on the platform.  
> 
> Yeah, the fix should be at the driver, but pci/quirk.c would be able
> to detect memory controllers that would require a hack inside the
> driver, in a similar way to what the media PCI drivers already do
> for DMA controllers that don't support pci2pci transfers.
> 
> There, basically the PCI core (drivers/pci/pci.c and 
> drivers/pci/quirks.c) sets a flag (pci_pci_problems) indicating
> potential issues.
> 
> Then, the driver compares such flag in order to enable the specific quirk.
> 
> Ok, there would be a different way to handle it. The driver could use a 
> logic similar to the one I wrote for drivers/edac/i7core_edac.c. There,
> the logic seeks for some specific PCI device IDs using pci_get_device(),
> adjusting the code accordingly, depending on the detected PCI devices.
> 
> In other words, running something like this (untested), at probe time should
> produce similar results:
> 
> 	/*
> 	 * FIXME: It probably makes sense to also be able to identify specific
> 	 * versions of the same PCI ID, just in case a latter stepping got a
> 	 * fix for the issue.
> 	 */
> 	const static struct {
> 		int vendor, dev;
> 	} broken_dev_id[] = {
> 		PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_foo,
> 		PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_bar,
> 	},
> 
> 	bool cx23885_does_dma_require_reset(void) 
> 	{
> 		int i;
> 		struct pci_dev *pdev = NULL;
> 
> 		for (i = 0; i < sizeof(broken_dev_id); i++) {
> 			pdev = pci_get_device(broken_dev_id[i].vendor, broken_dev_id[i].dev, NULL);
> 			if (pdev) {
> 				pci_put_device(pdev);
> 				return true;
> 			}
> 		}
> 		return false;
> 	}
> 
> Should work. In any case, we need to know what memory controllers 
> have problems, and what are their PCI IDs, and add them (if not there yet)
> at include/linux/pci_ids.h
> 
> Thanks,
> Mauro

To be clearer, I'm thinking on something like the (untested)
code below (untested).

PS.: the PCI ID used there may be wrong. I just added one in
order to have a proof of concept.

Thanks,
Mauro

[PATCH] media: cx23885: only reset DMA on problematic CPUs

It is reported that changeset 95f408bbc4e4 ("media: cx23885:
Ryzen DMA related RiSC engine stall fixes") caused regresssions
with other CPUs.

Ensure that the quirk will be applied only for the CPUs that
are known to cause problems.

Fixes: 95f408bbc4e4 ("media: cx23885: Ryzen DMA related RiSC engine stall fixes")
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>

Patch

diff --git a/drivers/media/pci/cx23885/cx23885-core.c b/drivers/media/pci/cx23885/cx23885-core.c
index 39804d830305..48da7d194cc1 100644
--- a/drivers/media/pci/cx23885/cx23885-core.c
+++ b/drivers/media/pci/cx23885/cx23885-core.c
@@ -23,6 +23,7 @@ 
 #include <linux/moduleparam.h>
 #include <linux/kmod.h>
 #include <linux/kernel.h>
+#include <linux/pci.h>
 #include <linux/slab.h>
 #include <linux/interrupt.h>
 #include <linux/delay.h>
@@ -603,8 +604,13 @@  static void cx23885_risc_disasm(struct cx23885_tsport *port,
 
 static void cx23885_clear_bridge_error(struct cx23885_dev *dev)
 {
-	uint32_t reg1_val = cx_read(TC_REQ); /* read-only */
-	uint32_t reg2_val = cx_read(TC_REQ_SET);
+	uint32_t reg1_val, reg2_val;
+
+	if (!dev->need_dma_reset)
+		return;
+
+	reg1_val = cx_read(TC_REQ); /* read-only */
+	reg2_val = cx_read(TC_REQ_SET);
 
 	if (reg1_val && reg2_val) {
 		cx_write(TC_REQ, reg1_val);
@@ -2058,6 +2064,31 @@  void cx23885_gpio_enable(struct cx23885_dev *dev, u32 mask, int asoutput)
 	/* TODO: 23-19 */
 }
 
+static struct {
+	int vendor, dev;
+} const broken_dev_id[] = {
+	/* According with
+	 * https://openbenchmarking.org/system/1703021-RI-AMDZEN08075/Ryzen%207%201800X/lspci,
+	 * 0x1451 is PCI ID for the IOMMU found on Ryzen 7
+	 */
+	{ PCI_VENDOR_ID_AMD, 0x1451 },
+};
+
+static bool cx23885_does_need_dma_reset(void)
+{
+	int i;
+	struct pci_dev *pdev = NULL;
+
+	for (i = 0; i < sizeof(broken_dev_id); i++) {
+		pdev = pci_get_device(broken_dev_id[i].vendor, broken_dev_id[i].dev, NULL);
+		if (pdev) {
+			pci_dev_put(pdev);
+			return true;
+		}
+	}
+	return false;
+}
+
 static int cx23885_initdev(struct pci_dev *pci_dev,
 			   const struct pci_device_id *pci_id)
 {
@@ -2069,6 +2100,8 @@  static int cx23885_initdev(struct pci_dev *pci_dev,
 	if (NULL == dev)
 		return -ENOMEM;
 
+	dev->need_dma_reset = cx23885_does_need_dma_reset();
+
 	err = v4l2_device_register(&pci_dev->dev, &dev->v4l2_dev);
 	if (err < 0)
 		goto fail_free;
diff --git a/drivers/media/pci/cx23885/cx23885.h b/drivers/media/pci/cx23885/cx23885.h
index d54c7ee1ab21..cf965efabe66 100644
--- a/drivers/media/pci/cx23885/cx23885.h
+++ b/drivers/media/pci/cx23885/cx23885.h
@@ -451,6 +451,8 @@  struct cx23885_dev {
 	/* Analog raw audio */
 	struct cx23885_audio_dev   *audio_dev;
 
+	/* Does the system require periodic DMA resets? */
+	unsigned int		need_dma_reset:1;
 };
 
 static inline struct cx23885_dev *to_cx23885(struct v4l2_device *v4l2_dev)

Privacy Policy