Message ID | 20230824013858.303105-1-marex@denx.de (mailing list archive) |
---|---|
State | Accepted |
Delegated to: | Hans Verkuil |
Headers |
Received: from vger.kernel.org ([23.128.96.18]) by www.linuxtv.org with esmtp (Exim 4.92) (envelope-from <linux-media-owner@vger.kernel.org>) id 1qYzKr-006WKh-G3; Thu, 24 Aug 2023 01:40:33 +0000 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239107AbjHXBj6 (ORCPT <rfc822;mkrufky@linuxtv.org> + 1 other); Wed, 23 Aug 2023 21:39:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39374 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239128AbjHXBja (ORCPT <rfc822;linux-media@vger.kernel.org>); Wed, 23 Aug 2023 21:39:30 -0400 Received: from phobos.denx.de (phobos.denx.de [IPv6:2a01:238:438b:c500:173d:9f52:ddab:ee01]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D900610FC for <linux-media@vger.kernel.org>; Wed, 23 Aug 2023 18:39:11 -0700 (PDT) Received: from tr.lan (ip-86-49-120-218.bb.vodafone.cz [86.49.120.218]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (No client certificate requested) (Authenticated sender: marex@denx.de) by phobos.denx.de (Postfix) with ESMTPSA id 78638807C4; Thu, 24 Aug 2023 03:39:09 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=denx.de; s=phobos-20191101; t=1692841149; bh=66jlWYI6u7UpHycVpxDeaFTO7Z2w8Drh+F3eeYqhcqk=; h=From:To:Cc:Subject:Date:From; b=t8/Ey5ppHIIVgGvVVTFGdBlSeXtxzhEjZ6+QvYAzTjE36HCs80yxXcdFrlpd0gTYO 9sa6WCUaIoNsoKjjE/FEK/kk0EJH1SR/qpGd3o9v1lHs0mzf0bxW7E8nJks+zUDsFk 51sklLBAe6pmyetqhJLzj9WwA4p7xC2q5gqnwDXJmtSdN44OmGqNYdHFIEOjFdiYcm 7aS5+m3Wi3gi8ZPoCgC1z8tHC4NNXgPR60xqh2dzoJhmHJ5ppPowwnOIJMxzDKrj7T ZW1Siwb0Os44EOmZswgvoaOqwbxfLP6Yn8XKxicd4ZqReV0C2B822d7cSsK3qDkZLu ogmbky8r3rWjA== From: Marek Vasut <marex@denx.de> To: linux-media@vger.kernel.org Cc: Marek Vasut <marex@denx.de>, Adam Ford <aford173@gmail.com>, Benjamin Gaignard <benjamin.gaignard@collabora.com>, Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>, Mauro Carvalho Chehab <mchehab@kernel.org>, Philipp Zabel <p.zabel@pengutronix.de>, linux-rockchip@lists.infradead.org Subject: [PATCH] media: hantro: Check whether reset op is defined before use Date: Thu, 24 Aug 2023 03:38:58 +0200 Message-Id: <20230824013858.303105-1-marex@denx.de> X-Mailer: git-send-email 2.40.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Virus-Scanned: clamav-milter 0.103.8 at phobos.denx.de X-Virus-Status: Clean X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-media.vger.kernel.org> X-Mailing-List: linux-media@vger.kernel.org X-LSpam-Score: -2.5 (--) X-LSpam-Report: No, score=-2.5 required=5.0 tests=BAYES_00=-1.9,DKIM_SIGNED=0.1,DKIM_VALID=-0.1,DKIM_VALID_AU=-0.1,HEADER_FROM_DIFFERENT_DOMAINS=0.5,MAILING_LIST_MULTI=-1 autolearn=ham autolearn_force=no |
Series |
media: hantro: Check whether reset op is defined before use
|
|
Commit Message
Marek Vasut
Aug. 24, 2023, 1:38 a.m. UTC
The i.MX8MM/N/P does not define the .reset op since reset of the VPU is
done by genpd. Check whether the .reset op is defined before calling it
to avoid NULL pointer dereference.
Note that the Fixes tag is set to the commit which removed the reset op
from i.MX8M Hantro G2 implementation, this is because before this commit
all the implementations did define the .reset op.
Fixes: 6971efb70ac3 ("media: hantro: Allow i.MX8MQ G1 and G2 to run independently")
Signed-off-by: Marek Vasut <marex@denx.de>
---
Cc: Adam Ford <aford173@gmail.com>
Cc: Benjamin Gaignard <benjamin.gaignard@collabora.com>
Cc: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Philipp Zabel <p.zabel@pengutronix.de>
Cc: linux-media@vger.kernel.org
Cc: linux-rockchip@lists.infradead.org
---
drivers/media/platform/verisilicon/hantro_drv.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
Comments
On Thu, Aug 24, 2023 at 9:39 AM Marek Vasut <marex@denx.de> wrote: > > The i.MX8MM/N/P does not define the .reset op since reset of the VPU is > done by genpd. Check whether the .reset op is defined before calling it > to avoid NULL pointer dereference. > > Note that the Fixes tag is set to the commit which removed the reset op > from i.MX8M Hantro G2 implementation, this is because before this commit > all the implementations did define the .reset op. > > Fixes: 6971efb70ac3 ("media: hantro: Allow i.MX8MQ G1 and G2 to run independently") > Signed-off-by: Marek Vasut <marex@denx.de> Had the same change in my local tree, so Reviewed-by: Chen-Yu Tsai <wenst@chromium.org> Tested-by: Chen-Yu Tsai <wenst@chromium.org> > --- > Cc: Adam Ford <aford173@gmail.com> > Cc: Benjamin Gaignard <benjamin.gaignard@collabora.com> > Cc: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar> > Cc: Mauro Carvalho Chehab <mchehab@kernel.org> > Cc: Philipp Zabel <p.zabel@pengutronix.de> > Cc: linux-media@vger.kernel.org > Cc: linux-rockchip@lists.infradead.org > --- > drivers/media/platform/verisilicon/hantro_drv.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/media/platform/verisilicon/hantro_drv.c b/drivers/media/platform/verisilicon/hantro_drv.c > index 423fc85d79ee3..50ec24c753e9e 100644 > --- a/drivers/media/platform/verisilicon/hantro_drv.c > +++ b/drivers/media/platform/verisilicon/hantro_drv.c > @@ -125,7 +125,8 @@ void hantro_watchdog(struct work_struct *work) > ctx = v4l2_m2m_get_curr_priv(vpu->m2m_dev); > if (ctx) { > vpu_err("frame processing timed out!\n"); > - ctx->codec_ops->reset(ctx); > + if (ctx->codec_ops->reset) > + ctx->codec_ops->reset(ctx); > hantro_job_finish(vpu, ctx, VB2_BUF_STATE_ERROR); > } > } > -- > 2.40.1 > > > _______________________________________________ > Linux-rockchip mailing list > Linux-rockchip@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-rockchip
On Wed, Aug 23, 2023 at 8:39 PM Marek Vasut <marex@denx.de> wrote: > > The i.MX8MM/N/P does not define the .reset op since reset of the VPU is > done by genpd. Check whether the .reset op is defined before calling it > to avoid NULL pointer dereference. > > Note that the Fixes tag is set to the commit which removed the reset op > from i.MX8M Hantro G2 implementation, this is because before this commit > all the implementations did define the .reset op. I am surprised I didn't have issues when I was testing the 8MQ and 8MM, but this makes sense. > > Fixes: 6971efb70ac3 ("media: hantro: Allow i.MX8MQ G1 and G2 to run independently") > Signed-off-by: Marek Vasut <marex@denx.de> Reviewed-by: Adam Ford <aford173@gmail.com> > --- > Cc: Adam Ford <aford173@gmail.com> > Cc: Benjamin Gaignard <benjamin.gaignard@collabora.com> > Cc: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar> > Cc: Mauro Carvalho Chehab <mchehab@kernel.org> > Cc: Philipp Zabel <p.zabel@pengutronix.de> > Cc: linux-media@vger.kernel.org > Cc: linux-rockchip@lists.infradead.org > --- > drivers/media/platform/verisilicon/hantro_drv.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/media/platform/verisilicon/hantro_drv.c b/drivers/media/platform/verisilicon/hantro_drv.c > index 423fc85d79ee3..50ec24c753e9e 100644 > --- a/drivers/media/platform/verisilicon/hantro_drv.c > +++ b/drivers/media/platform/verisilicon/hantro_drv.c > @@ -125,7 +125,8 @@ void hantro_watchdog(struct work_struct *work) > ctx = v4l2_m2m_get_curr_priv(vpu->m2m_dev); > if (ctx) { > vpu_err("frame processing timed out!\n"); > - ctx->codec_ops->reset(ctx); > + if (ctx->codec_ops->reset) > + ctx->codec_ops->reset(ctx); > hantro_job_finish(vpu, ctx, VB2_BUF_STATE_ERROR); > } > } > -- > 2.40.1 >
On 8/24/23 12:39, Adam Ford wrote: > On Wed, Aug 23, 2023 at 8:39 PM Marek Vasut <marex@denx.de> wrote: >> >> The i.MX8MM/N/P does not define the .reset op since reset of the VPU is >> done by genpd. Check whether the .reset op is defined before calling it >> to avoid NULL pointer dereference. >> >> Note that the Fixes tag is set to the commit which removed the reset op >> from i.MX8M Hantro G2 implementation, this is because before this commit >> all the implementations did define the .reset op. > > I am surprised I didn't have issues when I was testing the 8MQ and > 8MM, but this makes sense. You need to trigger the VPU watchdog to trigger the crash, that means you have to get the VPU into some weird state where it fails to decode frame. Then it triggers the reset and ... boom. See this patch, that contains a gstreamer invocation to generate such trigger condition input data: [PATCH] media: verisilicon: Do not enable G2 postproc downscale if source is narrower than destination " To generate input test data to trigger this bug, use e.g.: $ gst-launch-1.0 videotestsrc ! video/x-raw,width=272,height=256,format=I420 ! \ vp9enc ! matroskamux ! filesink location=/tmp/test.vp9 To trigger the bug upon decoding (note that the NV12 must be forced, as that assures the output data would pass the G2 postproc): $ gst-launch-1.0 filesrc location=/tmp/test.vp9 ! matroskademux ! vp9parse ! \ v4l2slvp9dec ! video/x-raw,format=NV12 ! videoconvert ! fbdevsink "
On Thu, Aug 24, 2023 at 9:08 PM Marek Vasut <marex@denx.de> wrote: > > On 8/24/23 12:39, Adam Ford wrote: > > On Wed, Aug 23, 2023 at 8:39 PM Marek Vasut <marex@denx.de> wrote: > >> > >> The i.MX8MM/N/P does not define the .reset op since reset of the VPU is > >> done by genpd. Check whether the .reset op is defined before calling it > >> to avoid NULL pointer dereference. > >> > >> Note that the Fixes tag is set to the commit which removed the reset op > >> from i.MX8M Hantro G2 implementation, this is because before this commit > >> all the implementations did define the .reset op. > > > > I am surprised I didn't have issues when I was testing the 8MQ and > > 8MM, but this makes sense. > > You need to trigger the VPU watchdog to trigger the crash, that means > you have to get the VPU into some weird state where it fails to decode > frame. Then it triggers the reset and ... boom. > > See this patch, that contains a gstreamer invocation to generate such > trigger condition input data: > > [PATCH] media: verisilicon: Do not enable G2 postproc downscale if > source is narrower than destination > > " > To generate input test data to trigger this bug, use e.g.: > $ gst-launch-1.0 videotestsrc ! > video/x-raw,width=272,height=256,format=I420 ! \ > vp9enc ! matroskamux ! filesink location=/tmp/test.vp9 > To trigger the bug upon decoding (note that the NV12 must be forced, as > that assures the output data would pass the G2 postproc): > $ gst-launch-1.0 filesrc location=/tmp/test.vp9 ! matroskademux ! > vp9parse ! \ > v4l2slvp9dec ! video/x-raw,format=NV12 ! videoconvert > ! fbdevsink > " Does it completely recover afterwards? In my previous trials the hardware ended up in some bizzare state: while decoding succeeds, the output's md5sum didn't match up. ChenYu
On 8/25/23 09:09, Chen-Yu Tsai wrote: > On Thu, Aug 24, 2023 at 9:08 PM Marek Vasut <marex@denx.de> wrote: >> >> On 8/24/23 12:39, Adam Ford wrote: >>> On Wed, Aug 23, 2023 at 8:39 PM Marek Vasut <marex@denx.de> wrote: >>>> >>>> The i.MX8MM/N/P does not define the .reset op since reset of the VPU is >>>> done by genpd. Check whether the .reset op is defined before calling it >>>> to avoid NULL pointer dereference. >>>> >>>> Note that the Fixes tag is set to the commit which removed the reset op >>>> from i.MX8M Hantro G2 implementation, this is because before this commit >>>> all the implementations did define the .reset op. >>> >>> I am surprised I didn't have issues when I was testing the 8MQ and >>> 8MM, but this makes sense. >> >> You need to trigger the VPU watchdog to trigger the crash, that means >> you have to get the VPU into some weird state where it fails to decode >> frame. Then it triggers the reset and ... boom. >> >> See this patch, that contains a gstreamer invocation to generate such >> trigger condition input data: >> >> [PATCH] media: verisilicon: Do not enable G2 postproc downscale if >> source is narrower than destination >> >> " >> To generate input test data to trigger this bug, use e.g.: >> $ gst-launch-1.0 videotestsrc ! >> video/x-raw,width=272,height=256,format=I420 ! \ >> vp9enc ! matroskamux ! filesink location=/tmp/test.vp9 >> To trigger the bug upon decoding (note that the NV12 must be forced, as >> that assures the output data would pass the G2 postproc): >> $ gst-launch-1.0 filesrc location=/tmp/test.vp9 ! matroskademux ! >> vp9parse ! \ >> v4l2slvp9dec ! video/x-raw,format=NV12 ! videoconvert >> ! fbdevsink >> " > > Does it completely recover afterwards? In my previous trials the hardware > ended up in some bizzare state: while decoding succeeds, the output's md5sum > didn't match up. Have you got a testcase that triggers this, one I can try ? I am not entirely sure whether this is happening here as well or not, but I can imagine that the power domain went down and back up between tests, so the VPU would be power cycled (and therefore reset) that way. So, I think it is worth testing that.
On Fri, Aug 25, 2023 at 4:33 PM Marek Vasut <marex@denx.de> wrote: > > On 8/25/23 09:09, Chen-Yu Tsai wrote: > > On Thu, Aug 24, 2023 at 9:08 PM Marek Vasut <marex@denx.de> wrote: > >> > >> On 8/24/23 12:39, Adam Ford wrote: > >>> On Wed, Aug 23, 2023 at 8:39 PM Marek Vasut <marex@denx.de> wrote: > >>>> > >>>> The i.MX8MM/N/P does not define the .reset op since reset of the VPU is > >>>> done by genpd. Check whether the .reset op is defined before calling it > >>>> to avoid NULL pointer dereference. > >>>> > >>>> Note that the Fixes tag is set to the commit which removed the reset op > >>>> from i.MX8M Hantro G2 implementation, this is because before this commit > >>>> all the implementations did define the .reset op. > >>> > >>> I am surprised I didn't have issues when I was testing the 8MQ and > >>> 8MM, but this makes sense. > >> > >> You need to trigger the VPU watchdog to trigger the crash, that means > >> you have to get the VPU into some weird state where it fails to decode > >> frame. Then it triggers the reset and ... boom. > >> > >> See this patch, that contains a gstreamer invocation to generate such > >> trigger condition input data: > >> > >> [PATCH] media: verisilicon: Do not enable G2 postproc downscale if > >> source is narrower than destination > >> > >> " > >> To generate input test data to trigger this bug, use e.g.: > >> $ gst-launch-1.0 videotestsrc ! > >> video/x-raw,width=272,height=256,format=I420 ! \ > >> vp9enc ! matroskamux ! filesink location=/tmp/test.vp9 > >> To trigger the bug upon decoding (note that the NV12 must be forced, as > >> that assures the output data would pass the G2 postproc): > >> $ gst-launch-1.0 filesrc location=/tmp/test.vp9 ! matroskademux ! > >> vp9parse ! \ > >> v4l2slvp9dec ! video/x-raw,format=NV12 ! videoconvert > >> ! fbdevsink > >> " > > > > Does it completely recover afterwards? In my previous trials the hardware > > ended up in some bizzare state: while decoding succeeds, the output's md5sum > > didn't match up. > > Have you got a testcase that triggers this, one I can try ? > > I am not entirely sure whether this is happening here as well or not, > but I can imagine that the power domain went down and back up between > tests, so the VPU would be power cycled (and therefore reset) that way. > So, I think it is worth testing that. This was last year while I was writing HEVC decoding code for Chromium. IIRC the SAODBLK_A_MainConcept_4 test vector from the official HEVC test suite does cause our stack to crash, but Gstreamer seemed to handle it OK. It could be that the Chromium decoder stack is passing bad values to the decoder. ChenYu
On 8/25/23 10:52, Chen-Yu Tsai wrote: > On Fri, Aug 25, 2023 at 4:33 PM Marek Vasut <marex@denx.de> wrote: >> >> On 8/25/23 09:09, Chen-Yu Tsai wrote: >>> On Thu, Aug 24, 2023 at 9:08 PM Marek Vasut <marex@denx.de> wrote: >>>> >>>> On 8/24/23 12:39, Adam Ford wrote: >>>>> On Wed, Aug 23, 2023 at 8:39 PM Marek Vasut <marex@denx.de> wrote: >>>>>> >>>>>> The i.MX8MM/N/P does not define the .reset op since reset of the VPU is >>>>>> done by genpd. Check whether the .reset op is defined before calling it >>>>>> to avoid NULL pointer dereference. >>>>>> >>>>>> Note that the Fixes tag is set to the commit which removed the reset op >>>>>> from i.MX8M Hantro G2 implementation, this is because before this commit >>>>>> all the implementations did define the .reset op. >>>>> >>>>> I am surprised I didn't have issues when I was testing the 8MQ and >>>>> 8MM, but this makes sense. >>>> >>>> You need to trigger the VPU watchdog to trigger the crash, that means >>>> you have to get the VPU into some weird state where it fails to decode >>>> frame. Then it triggers the reset and ... boom. >>>> >>>> See this patch, that contains a gstreamer invocation to generate such >>>> trigger condition input data: >>>> >>>> [PATCH] media: verisilicon: Do not enable G2 postproc downscale if >>>> source is narrower than destination >>>> >>>> " >>>> To generate input test data to trigger this bug, use e.g.: >>>> $ gst-launch-1.0 videotestsrc ! >>>> video/x-raw,width=272,height=256,format=I420 ! \ >>>> vp9enc ! matroskamux ! filesink location=/tmp/test.vp9 >>>> To trigger the bug upon decoding (note that the NV12 must be forced, as >>>> that assures the output data would pass the G2 postproc): >>>> $ gst-launch-1.0 filesrc location=/tmp/test.vp9 ! matroskademux ! >>>> vp9parse ! \ >>>> v4l2slvp9dec ! video/x-raw,format=NV12 ! videoconvert >>>> ! fbdevsink >>>> " >>> >>> Does it completely recover afterwards? In my previous trials the hardware >>> ended up in some bizzare state: while decoding succeeds, the output's md5sum >>> didn't match up. >> >> Have you got a testcase that triggers this, one I can try ? >> >> I am not entirely sure whether this is happening here as well or not, >> but I can imagine that the power domain went down and back up between >> tests, so the VPU would be power cycled (and therefore reset) that way. >> So, I think it is worth testing that. > > This was last year while I was writing HEVC decoding code for Chromium. > IIRC the SAODBLK_A_MainConcept_4 test vector from the official HEVC test > suite does cause our stack to crash, but Gstreamer seemed to handle it > OK. It could be that the Chromium decoder stack is passing bad values to > the decoder. That can be easily tested with ftrace enabled. I was just tracking down an issue with gstreamer and added the following patch to the hantro driver. Then: echo > /sys/kernel/debug/tracing/trace <run fail test> cat /sys/kernel/debug/tracing/trace > /tmp/fail.trace echo > /sys/kernel/debug/tracing/trace <run pass test> cat /sys/kernel/debug/tracing/trace > /tmp/pass.trace # remove time stamps etc. diff /tmp/{fail,pass}.trace You should see whether some register programming differs between gstreamer and chromium. diff --git a/drivers/media/platform/verisilicon/hantro.h b/drivers/media/platform/verisilicon/hantro.h index dea35a501ba30..529f1ab478ec8 100644 --- a/drivers/media/platform/verisilicon/hantro.h +++ b/drivers/media/platform/verisilicon/hantro.h @@ -353,8 +353,7 @@ extern int hantro_debug; #define vpu_debug(level, fmt, args...) \ do { \ - if (hantro_debug & BIT(level)) \ - pr_info("%s:%d: " fmt, \ + trace_printk("%s:%d: " fmt, \ __func__, __LINE__, ##args); \ } while (0)
On Sun, Aug 27, 2023 at 5:44 AM Marek Vasut <marex@denx.de> wrote: > > On 8/25/23 10:52, Chen-Yu Tsai wrote: > > On Fri, Aug 25, 2023 at 4:33 PM Marek Vasut <marex@denx.de> wrote: > >> > >> On 8/25/23 09:09, Chen-Yu Tsai wrote: > >>> On Thu, Aug 24, 2023 at 9:08 PM Marek Vasut <marex@denx.de> wrote: > >>>> > >>>> On 8/24/23 12:39, Adam Ford wrote: > >>>>> On Wed, Aug 23, 2023 at 8:39 PM Marek Vasut <marex@denx.de> wrote: > >>>>>> > >>>>>> The i.MX8MM/N/P does not define the .reset op since reset of the VPU is > >>>>>> done by genpd. Check whether the .reset op is defined before calling it > >>>>>> to avoid NULL pointer dereference. > >>>>>> > >>>>>> Note that the Fixes tag is set to the commit which removed the reset op > >>>>>> from i.MX8M Hantro G2 implementation, this is because before this commit > >>>>>> all the implementations did define the .reset op. > >>>>> > >>>>> I am surprised I didn't have issues when I was testing the 8MQ and > >>>>> 8MM, but this makes sense. > >>>> > >>>> You need to trigger the VPU watchdog to trigger the crash, that means > >>>> you have to get the VPU into some weird state where it fails to decode > >>>> frame. Then it triggers the reset and ... boom. > >>>> > >>>> See this patch, that contains a gstreamer invocation to generate such > >>>> trigger condition input data: > >>>> > >>>> [PATCH] media: verisilicon: Do not enable G2 postproc downscale if > >>>> source is narrower than destination > >>>> > >>>> " > >>>> To generate input test data to trigger this bug, use e.g.: > >>>> $ gst-launch-1.0 videotestsrc ! > >>>> video/x-raw,width=272,height=256,format=I420 ! \ > >>>> vp9enc ! matroskamux ! filesink location=/tmp/test.vp9 > >>>> To trigger the bug upon decoding (note that the NV12 must be forced, as > >>>> that assures the output data would pass the G2 postproc): > >>>> $ gst-launch-1.0 filesrc location=/tmp/test.vp9 ! matroskademux ! > >>>> vp9parse ! \ > >>>> v4l2slvp9dec ! video/x-raw,format=NV12 ! videoconvert > >>>> ! fbdevsink > >>>> " > >>> > >>> Does it completely recover afterwards? In my previous trials the hardware > >>> ended up in some bizzare state: while decoding succeeds, the output's md5sum > >>> didn't match up. > >> > >> Have you got a testcase that triggers this, one I can try ? > >> > >> I am not entirely sure whether this is happening here as well or not, > >> but I can imagine that the power domain went down and back up between > >> tests, so the VPU would be power cycled (and therefore reset) that way. > >> So, I think it is worth testing that. > > > > This was last year while I was writing HEVC decoding code for Chromium. > > IIRC the SAODBLK_A_MainConcept_4 test vector from the official HEVC test > > suite does cause our stack to crash, but Gstreamer seemed to handle it > > OK. It could be that the Chromium decoder stack is passing bad values to > > the decoder. > > That can be easily tested with ftrace enabled. I was just tracking down > an issue with gstreamer and added the following patch to the hantro > driver. Then: > > echo > /sys/kernel/debug/tracing/trace > <run fail test> > cat /sys/kernel/debug/tracing/trace > /tmp/fail.trace > echo > /sys/kernel/debug/tracing/trace > <run pass test> > cat /sys/kernel/debug/tracing/trace > /tmp/pass.trace > # remove time stamps etc. > diff /tmp/{fail,pass}.trace > > You should see whether some register programming differs between > gstreamer and chromium. I ended up using VISL to compare the controls set. I did find a bug. It still hard hangs after a couple frames, so I guess I'd need to use your method, but do printk instead. BTW, I wonder if we shouldn't add a reset op, if only just to stop the hardware? That is, do the same two register writes as in the Hantro G2 interrupt handler. ChenYu > diff --git a/drivers/media/platform/verisilicon/hantro.h > b/drivers/media/platform/verisilicon/hantro.h > index dea35a501ba30..529f1ab478ec8 100644 > --- a/drivers/media/platform/verisilicon/hantro.h > +++ b/drivers/media/platform/verisilicon/hantro.h > @@ -353,8 +353,7 @@ extern int hantro_debug; > > #define vpu_debug(level, fmt, args...) \ > do { \ > - if (hantro_debug & BIT(level)) \ > - pr_info("%s:%d: " fmt, \ > + trace_printk("%s:%d: " fmt, \ > __func__, __LINE__, ##args); \ > } while (0)
On 8/30/23 05:38, Chen-Yu Tsai wrote: > On Sun, Aug 27, 2023 at 5:44 AM Marek Vasut <marex@denx.de> wrote: >> >> On 8/25/23 10:52, Chen-Yu Tsai wrote: >>> On Fri, Aug 25, 2023 at 4:33 PM Marek Vasut <marex@denx.de> wrote: >>>> >>>> On 8/25/23 09:09, Chen-Yu Tsai wrote: >>>>> On Thu, Aug 24, 2023 at 9:08 PM Marek Vasut <marex@denx.de> wrote: >>>>>> >>>>>> On 8/24/23 12:39, Adam Ford wrote: >>>>>>> On Wed, Aug 23, 2023 at 8:39 PM Marek Vasut <marex@denx.de> wrote: >>>>>>>> >>>>>>>> The i.MX8MM/N/P does not define the .reset op since reset of the VPU is >>>>>>>> done by genpd. Check whether the .reset op is defined before calling it >>>>>>>> to avoid NULL pointer dereference. >>>>>>>> >>>>>>>> Note that the Fixes tag is set to the commit which removed the reset op >>>>>>>> from i.MX8M Hantro G2 implementation, this is because before this commit >>>>>>>> all the implementations did define the .reset op. >>>>>>> >>>>>>> I am surprised I didn't have issues when I was testing the 8MQ and >>>>>>> 8MM, but this makes sense. >>>>>> >>>>>> You need to trigger the VPU watchdog to trigger the crash, that means >>>>>> you have to get the VPU into some weird state where it fails to decode >>>>>> frame. Then it triggers the reset and ... boom. >>>>>> >>>>>> See this patch, that contains a gstreamer invocation to generate such >>>>>> trigger condition input data: >>>>>> >>>>>> [PATCH] media: verisilicon: Do not enable G2 postproc downscale if >>>>>> source is narrower than destination >>>>>> >>>>>> " >>>>>> To generate input test data to trigger this bug, use e.g.: >>>>>> $ gst-launch-1.0 videotestsrc ! >>>>>> video/x-raw,width=272,height=256,format=I420 ! \ >>>>>> vp9enc ! matroskamux ! filesink location=/tmp/test.vp9 >>>>>> To trigger the bug upon decoding (note that the NV12 must be forced, as >>>>>> that assures the output data would pass the G2 postproc): >>>>>> $ gst-launch-1.0 filesrc location=/tmp/test.vp9 ! matroskademux ! >>>>>> vp9parse ! \ >>>>>> v4l2slvp9dec ! video/x-raw,format=NV12 ! videoconvert >>>>>> ! fbdevsink >>>>>> " >>>>> >>>>> Does it completely recover afterwards? In my previous trials the hardware >>>>> ended up in some bizzare state: while decoding succeeds, the output's md5sum >>>>> didn't match up. >>>> >>>> Have you got a testcase that triggers this, one I can try ? >>>> >>>> I am not entirely sure whether this is happening here as well or not, >>>> but I can imagine that the power domain went down and back up between >>>> tests, so the VPU would be power cycled (and therefore reset) that way. >>>> So, I think it is worth testing that. >>> >>> This was last year while I was writing HEVC decoding code for Chromium. >>> IIRC the SAODBLK_A_MainConcept_4 test vector from the official HEVC test >>> suite does cause our stack to crash, but Gstreamer seemed to handle it >>> OK. It could be that the Chromium decoder stack is passing bad values to >>> the decoder. >> >> That can be easily tested with ftrace enabled. I was just tracking down >> an issue with gstreamer and added the following patch to the hantro >> driver. Then: >> >> echo > /sys/kernel/debug/tracing/trace >> <run fail test> >> cat /sys/kernel/debug/tracing/trace > /tmp/fail.trace >> echo > /sys/kernel/debug/tracing/trace >> <run pass test> >> cat /sys/kernel/debug/tracing/trace > /tmp/pass.trace >> # remove time stamps etc. >> diff /tmp/{fail,pass}.trace >> >> You should see whether some register programming differs between >> gstreamer and chromium. > > I ended up using VISL to compare the controls set. I did find a bug. > It still hard hangs after a couple frames, so I guess I'd need to use > your method, but do printk instead. > > BTW, I wonder if we shouldn't add a reset op, if only just to stop the > hardware? That is, do the same two register writes as in the Hantro G2 > interrupt handler. You mean these two ? 38 vdpu_write(vpu, 0, G2_REG_INTERRUPT); 39 vdpu_write(vpu, G2_REG_CONFIG_DEC_CLK_GATE_E, G2_REG_CONFIG); As far as I understand this, that only clears IRQ and gates the clock off, but it doesn't reset the IP state, does it ?
On Thu, Aug 31, 2023 at 3:13 AM Marek Vasut <marex@denx.de> wrote: > > On 8/30/23 05:38, Chen-Yu Tsai wrote: > > On Sun, Aug 27, 2023 at 5:44 AM Marek Vasut <marex@denx.de> wrote: > >> > >> On 8/25/23 10:52, Chen-Yu Tsai wrote: > >>> On Fri, Aug 25, 2023 at 4:33 PM Marek Vasut <marex@denx.de> wrote: > >>>> > >>>> On 8/25/23 09:09, Chen-Yu Tsai wrote: > >>>>> On Thu, Aug 24, 2023 at 9:08 PM Marek Vasut <marex@denx.de> wrote: > >>>>>> > >>>>>> On 8/24/23 12:39, Adam Ford wrote: > >>>>>>> On Wed, Aug 23, 2023 at 8:39 PM Marek Vasut <marex@denx.de> wrote: > >>>>>>>> > >>>>>>>> The i.MX8MM/N/P does not define the .reset op since reset of the VPU is > >>>>>>>> done by genpd. Check whether the .reset op is defined before calling it > >>>>>>>> to avoid NULL pointer dereference. > >>>>>>>> > >>>>>>>> Note that the Fixes tag is set to the commit which removed the reset op > >>>>>>>> from i.MX8M Hantro G2 implementation, this is because before this commit > >>>>>>>> all the implementations did define the .reset op. > >>>>>>> > >>>>>>> I am surprised I didn't have issues when I was testing the 8MQ and > >>>>>>> 8MM, but this makes sense. > >>>>>> > >>>>>> You need to trigger the VPU watchdog to trigger the crash, that means > >>>>>> you have to get the VPU into some weird state where it fails to decode > >>>>>> frame. Then it triggers the reset and ... boom. > >>>>>> > >>>>>> See this patch, that contains a gstreamer invocation to generate such > >>>>>> trigger condition input data: > >>>>>> > >>>>>> [PATCH] media: verisilicon: Do not enable G2 postproc downscale if > >>>>>> source is narrower than destination > >>>>>> > >>>>>> " > >>>>>> To generate input test data to trigger this bug, use e.g.: > >>>>>> $ gst-launch-1.0 videotestsrc ! > >>>>>> video/x-raw,width=272,height=256,format=I420 ! \ > >>>>>> vp9enc ! matroskamux ! filesink location=/tmp/test.vp9 > >>>>>> To trigger the bug upon decoding (note that the NV12 must be forced, as > >>>>>> that assures the output data would pass the G2 postproc): > >>>>>> $ gst-launch-1.0 filesrc location=/tmp/test.vp9 ! matroskademux ! > >>>>>> vp9parse ! \ > >>>>>> v4l2slvp9dec ! video/x-raw,format=NV12 ! videoconvert > >>>>>> ! fbdevsink > >>>>>> " > >>>>> > >>>>> Does it completely recover afterwards? In my previous trials the hardware > >>>>> ended up in some bizzare state: while decoding succeeds, the output's md5sum > >>>>> didn't match up. > >>>> > >>>> Have you got a testcase that triggers this, one I can try ? > >>>> > >>>> I am not entirely sure whether this is happening here as well or not, > >>>> but I can imagine that the power domain went down and back up between > >>>> tests, so the VPU would be power cycled (and therefore reset) that way. > >>>> So, I think it is worth testing that. > >>> > >>> This was last year while I was writing HEVC decoding code for Chromium. > >>> IIRC the SAODBLK_A_MainConcept_4 test vector from the official HEVC test > >>> suite does cause our stack to crash, but Gstreamer seemed to handle it > >>> OK. It could be that the Chromium decoder stack is passing bad values to > >>> the decoder. > >> > >> That can be easily tested with ftrace enabled. I was just tracking down > >> an issue with gstreamer and added the following patch to the hantro > >> driver. Then: > >> > >> echo > /sys/kernel/debug/tracing/trace > >> <run fail test> > >> cat /sys/kernel/debug/tracing/trace > /tmp/fail.trace > >> echo > /sys/kernel/debug/tracing/trace > >> <run pass test> > >> cat /sys/kernel/debug/tracing/trace > /tmp/pass.trace > >> # remove time stamps etc. > >> diff /tmp/{fail,pass}.trace > >> > >> You should see whether some register programming differs between > >> gstreamer and chromium. > > > > I ended up using VISL to compare the controls set. I did find a bug. > > It still hard hangs after a couple frames, so I guess I'd need to use > > your method, but do printk instead. > > > > BTW, I wonder if we shouldn't add a reset op, if only just to stop the > > hardware? That is, do the same two register writes as in the Hantro G2 > > interrupt handler. > > You mean these two ? > > 38 vdpu_write(vpu, 0, G2_REG_INTERRUPT); > 39 vdpu_write(vpu, G2_REG_CONFIG_DEC_CLK_GATE_E, G2_REG_CONFIG); Yes. > As far as I understand this, that only clears IRQ and gates the clock > off, but it doesn't reset the IP state, does it ? That's right, but it would stop the hardware from continuing to do whatever it is doing before it gets shut down through runtime PM. I'm not sure it would make that much of a difference, but it did seem like something that could be done. ChenYu
diff --git a/drivers/media/platform/verisilicon/hantro_drv.c b/drivers/media/platform/verisilicon/hantro_drv.c index 423fc85d79ee3..50ec24c753e9e 100644 --- a/drivers/media/platform/verisilicon/hantro_drv.c +++ b/drivers/media/platform/verisilicon/hantro_drv.c @@ -125,7 +125,8 @@ void hantro_watchdog(struct work_struct *work) ctx = v4l2_m2m_get_curr_priv(vpu->m2m_dev); if (ctx) { vpu_err("frame processing timed out!\n"); - ctx->codec_ops->reset(ctx); + if (ctx->codec_ops->reset) + ctx->codec_ops->reset(ctx); hantro_job_finish(vpu, ctx, VB2_BUF_STATE_ERROR); } }