Message ID | 20230306062633.200427-1-zyytlz.wz@163.com (mailing list archive) |
---|---|
State | Rejected |
Delegated to: | Hans Verkuil |
Headers |
Received: from vger.kernel.org ([23.128.96.18]) by www.linuxtv.org with esmtp (Exim 4.92) (envelope-from <linux-media-owner@vger.kernel.org>) id 1pZ4KW-00C7ea-IA; Mon, 06 Mar 2023 06:28:16 +0000 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229613AbjCFG2N (ORCPT <rfc822;mkrufky@linuxtv.org> + 1 other); Mon, 6 Mar 2023 01:28:13 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43874 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229510AbjCFG2M (ORCPT <rfc822;linux-media@vger.kernel.org>); Mon, 6 Mar 2023 01:28:12 -0500 Received: from m12.mail.163.com (m12.mail.163.com [220.181.12.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 5FD90A5C4; Sun, 5 Mar 2023 22:28:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=163.com; s=s110527; h=From:Subject:Date:Message-Id:MIME-Version; bh=y+5KY JiUtormz/jrT3bvfPhN0fTpLqDXBrS1X7+kgf8=; b=QRs+RFmtFj29262cuJfm3 qnUZvbSvRH5p6riP5bN+xrPl5/Efefk8ywOtw3yP/shz+JFzx8TE0l5IMN0biQWP nZ0/AsSiiMaWawwXCZA5n+Rnl/rsVmZf9/O3Jp1DaBa85YLdVaa7MBHtWcEBLg51 drWq+xyBTDn5UEEEL2ADkA= Received: from leanderwang-LC2.localdomain (unknown [111.206.145.21]) by zwqz-smtp-mta-g0-1 (Coremail) with SMTP id _____wBHf_OahwVk540ZCQ--.40472S2; Mon, 06 Mar 2023 14:26:34 +0800 (CST) From: Zheng Wang <zyytlz.wz@163.com> To: mchehab@kernel.org Cc: bin.liu@mediatek.com, matthias.bgg@gmail.com, angelogioacchino.delregno@collabora.com, linux-media@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-mediatek@lists.infradead.org, hackerzheng666@gmail.com, 1395428693sheep@gmail.com, alex000young@gmail.com, Zheng Wang <zyytlz.wz@163.com> Subject: [RESEND PATCH] media: mtk-jpeg: Fix use after free bug due to uncanceled work Date: Mon, 6 Mar 2023 14:26:33 +0800 Message-Id: <20230306062633.200427-1-zyytlz.wz@163.com> X-Mailer: git-send-email 2.25.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-CM-TRANSID: _____wBHf_OahwVk540ZCQ--.40472S2 X-Coremail-Antispam: 1Uf129KBjvJXoW7uF17uw15Jr4xtw4xCFy7trb_yoW8Xry7pr ZxK3yDCrWUWrs0qr1UJ3WUAF1rCw1rKa1xWr17uw4Iv3y3Jrs7JryFya48tFWIyF92k3Wr Jr10q3s7GrWDZFJanT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDUYxBIdaVFxhVjvjDU0xZFpf9x0zi-6pkUUUUU= X-Originating-IP: [111.206.145.21] X-CM-SenderInfo: h2113zf2oz6qqrwthudrp/xtbBzgIqU2I0Xi19SAAAsl X-Spam-Status: No, score=-0.8 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_VALIDITY_RPBL,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-media.vger.kernel.org> X-Mailing-List: linux-media@vger.kernel.org X-LSpam-Score: -2.5 (--) X-LSpam-Report: No, score=-2.5 required=5.0 tests=BAYES_00=-1.9,DKIM_SIGNED=0.1,DKIM_VALID=-0.1,DKIM_VALID_AU=-0.1,FREEMAIL_FORGED_FROMDOMAIN=0.001,FREEMAIL_FROM=0.001,HEADER_FROM_DIFFERENT_DOMAINS=0.5,MAILING_LIST_MULTI=-1 autolearn=ham autolearn_force=no |
Series |
[RESEND] media: mtk-jpeg: Fix use after free bug due to uncanceled work
|
|
Commit Message
Zheng Wang
March 6, 2023, 6:26 a.m. UTC
In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with
mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run
and mtk_jpeg_enc_device_run may be called to start the
work.
If we remove the module which will call mtk_jpeg_remove
to make cleanup, there may be a unfinished work. The
possible sequence is as follows, which will cause a
typical UAF bug.
Fix it by canceling the work before cleanup in the mtk_jpeg_remove
CPU0 CPU1
|mtk_jpeg_job_timeout_work
mtk_jpeg_remove |
v4l2_m2m_release |
kfree(m2m_dev); |
|
| v4l2_m2m_get_curr_priv
| m2m_dev->curr_ctx //use
Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
---
drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Comments
Hi, Is there anyone who can help with this? I can provide more details like invoking chain if needed. Thanks, Zheng Zheng Wang <zyytlz.wz@163.com> 于2023年3月6日周一 14:28写道: > > In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with > mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run > and mtk_jpeg_enc_device_run may be called to start the > work. > If we remove the module which will call mtk_jpeg_remove > to make cleanup, there may be a unfinished work. The > possible sequence is as follows, which will cause a > typical UAF bug. > > Fix it by canceling the work before cleanup in the mtk_jpeg_remove > > CPU0 CPU1 > > |mtk_jpeg_job_timeout_work > mtk_jpeg_remove | > v4l2_m2m_release | > kfree(m2m_dev); | > | > | v4l2_m2m_get_curr_priv > | m2m_dev->curr_ctx //use > > Signed-off-by: Zheng Wang <zyytlz.wz@163.com> > --- > drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > index 969516a940ba..364513e7897e 100644 > --- a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > +++ b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > @@ -1793,7 +1793,7 @@ static int mtk_jpeg_probe(struct platform_device *pdev) > static int mtk_jpeg_remove(struct platform_device *pdev) > { > struct mtk_jpeg_dev *jpeg = platform_get_drvdata(pdev); > - > + cancel_delayed_work(&jpeg->job_timeout_work); > pm_runtime_disable(&pdev->dev); > video_unregister_device(jpeg->vdev); > v4l2_m2m_release(jpeg->m2m_dev); > -- > 2.25.1 >
Il 07/03/23 10:27, Zheng Hacker ha scritto: > Hi, > > Is there anyone who can help with this? I can provide more details > like invoking chain if needed. > Providing more details is always good. Please do. Meanwhile, adding Irui Wang to the loop: he's doing mtk-jpeg. Regards, Angelo > Thanks, > Zheng > > Zheng Wang <zyytlz.wz@163.com> 于2023年3月6日周一 14:28写道: >> >> In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with >> mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run >> and mtk_jpeg_enc_device_run may be called to start the >> work. >> If we remove the module which will call mtk_jpeg_remove >> to make cleanup, there may be a unfinished work. The >> possible sequence is as follows, which will cause a >> typical UAF bug. >> >> Fix it by canceling the work before cleanup in the mtk_jpeg_remove >> >> CPU0 CPU1 >> >> |mtk_jpeg_job_timeout_work >> mtk_jpeg_remove | >> v4l2_m2m_release | >> kfree(m2m_dev); | >> | >> | v4l2_m2m_get_curr_priv >> | m2m_dev->curr_ctx //use >> >> Signed-off-by: Zheng Wang <zyytlz.wz@163.com> >> --- >> drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c >> index 969516a940ba..364513e7897e 100644 >> --- a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c >> +++ b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c >> @@ -1793,7 +1793,7 @@ static int mtk_jpeg_probe(struct platform_device *pdev) >> static int mtk_jpeg_remove(struct platform_device *pdev) >> { >> struct mtk_jpeg_dev *jpeg = platform_get_drvdata(pdev); >> - >> + cancel_delayed_work(&jpeg->job_timeout_work); >> pm_runtime_disable(&pdev->dev); >> video_unregister_device(jpeg->vdev); >> v4l2_m2m_release(jpeg->m2m_dev); >> -- >> 2.25.1 >>
Dear Angelo and Zheng, Thanks for your patch and comments. Dear Kyrie, Please help to check this, thanks. Best Regards On Tue, 2023-03-07 at 10:49 +0100, AngeloGioacchino Del Regno wrote: > Il 07/03/23 10:27, Zheng Hacker ha scritto: > > Hi, > > > > Is there anyone who can help with this? I can provide more details > > like invoking chain if needed. > > > > Providing more details is always good. Please do. > > Meanwhile, adding Irui Wang to the loop: he's doing mtk-jpeg. > > Regards, > Angelo > > > Thanks, > > Zheng > > > > Zheng Wang <zyytlz.wz@163.com> 于2023年3月6日周一 14:28写道: > > > > > > In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with > > > mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run > > > and mtk_jpeg_enc_device_run may be called to start the > > > work. > > > If we remove the module which will call mtk_jpeg_remove > > > to make cleanup, there may be a unfinished work. The > > > possible sequence is as follows, which will cause a > > > typical UAF bug. > > > > > > Fix it by canceling the work before cleanup in the > > > mtk_jpeg_remove > > > > > > CPU0 CPU1 > > > > > > |mtk_jpeg_job_timeout_work > > > mtk_jpeg_remove | > > > v4l2_m2m_release | > > > kfree(m2m_dev); | > > > | > > > | v4l2_m2m_get_curr_priv > > > | m2m_dev->curr_ctx //use > > > > > > Signed-off-by: Zheng Wang <zyytlz.wz@163.com> > > > --- > > > drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > > > b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > > > index 969516a940ba..364513e7897e 100644 > > > --- a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > > > +++ b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > > > @@ -1793,7 +1793,7 @@ static int mtk_jpeg_probe(struct > > > platform_device *pdev) > > > static int mtk_jpeg_remove(struct platform_device *pdev) > > > { > > > struct mtk_jpeg_dev *jpeg = platform_get_drvdata(pdev); > > > - > > > + cancel_delayed_work(&jpeg->job_timeout_work); > > > pm_runtime_disable(&pdev->dev); > > > video_unregister_device(jpeg->vdev); > > > v4l2_m2m_release(jpeg->m2m_dev); > > > -- > > > 2.25.1 > > > > > >
The timer function was inited in mtk_jpeg_probe with mtk_jpeg_job_timeout_work function. And the worker is started in mtk_jpeg_dec_device_run. There are two functions (mtk_jpeg_enc_irq and mtk_jpeg_dec_irq) which may cancel the worker. They are used as IRQ handler function which is saved as function pointer in a variable. In mtk_jpeg_probe, they are registered by devm_request_irq: ret = devm_request_irq(&pdev->dev, jpeg_irq, jpeg->variant->irq_handler, 0, pdev->name, jpeg); if (ret) { dev_err(&pdev->dev, "Failed to request jpeg_irq %d (%d)\n", jpeg_irq, ret); return ret; } However, if we remove the module without triigering the irq, the worker will never be removed. As for the schedule, mtk_jpeg_dec_device_run and mtk_jpeg_enc_device_run will start the worker. The schedule invoking is quite complicated. As far as I know, the invoking chain is as follows: v4l2_m2m_init->v4l2_m2m_device_run_work->v4l2_m2m_try_run the v4l2_m2m_device_run_work is also a worker which is inited in v4l2_m2m_init and started in v4l2_m2m_schedule_next_job. Before calling remove function, the mtk_jpeg_release was invoked to release the related resource. v4l2_m2m_cancel_job will cancel the job by calling m2m_dev->m2m_ops->job_abort(m2m_ctx->priv). But this will only cancel the current queue by list_del(&m2m_dev->curr_ctx->queue); I think this can not cancel the posted task mentioned before. So I think if mtk_jpeg_job_timeout_work is working on, and use jpeg->m2m_dev after freeing it in mtk_jpeg_remove, it wll cause UAF bug. static int mtk_jpeg_release(struct file *file) { struct mtk_jpeg_dev *jpeg = video_drvdata(file); struct mtk_jpeg_ctx *ctx = mtk_jpeg_fh_to_ctx(file->private_data); mutex_lock(&jpeg->lock); v4l2_ctrl_handler_free(&ctx->ctrl_hdl); [1] v4l2_m2m_ctx_release(ctx->fh.m2m_ctx); v4l2_fh_del(&ctx->fh); v4l2_fh_exit(&ctx->fh); kfree(ctx); mutex_unlock(&jpeg->lock); return 0; } void v4l2_m2m_ctx_release(struct v4l2_m2m_ctx *m2m_ctx) { /* wait until the current context is dequeued from job_queue */ [2] v4l2_m2m_cancel_job(m2m_ctx); vb2_queue_release(&m2m_ctx->cap_q_ctx.q); vb2_queue_release(&m2m_ctx->out_q_ctx.q); kfree(m2m_ctx); } Note that, all of this is static analysis, which may be false positive. Feel free to tell me if there is something I've missed. Regard, Zheng Zheng Hacker <hackerzheng666@gmail.com> 于2023年3月7日周二 17:27写道: > > Hi, > > Is there anyone who can help with this? I can provide more details > like invoking chain if needed. > > Thanks, > Zheng > > Zheng Wang <zyytlz.wz@163.com> 于2023年3月6日周一 14:28写道: > > > > In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with > > mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run > > and mtk_jpeg_enc_device_run may be called to start the > > work. > > If we remove the module which will call mtk_jpeg_remove > > to make cleanup, there may be a unfinished work. The > > possible sequence is as follows, which will cause a > > typical UAF bug. > > > > Fix it by canceling the work before cleanup in the mtk_jpeg_remove > > > > CPU0 CPU1 > > > > |mtk_jpeg_job_timeout_work > > mtk_jpeg_remove | > > v4l2_m2m_release | > > kfree(m2m_dev); | > > | > > | v4l2_m2m_get_curr_priv > > | m2m_dev->curr_ctx //use > > > > Signed-off-by: Zheng Wang <zyytlz.wz@163.com> > > --- > > drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > > index 969516a940ba..364513e7897e 100644 > > --- a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > > +++ b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > > @@ -1793,7 +1793,7 @@ static int mtk_jpeg_probe(struct platform_device *pdev) > > static int mtk_jpeg_remove(struct platform_device *pdev) > > { > > struct mtk_jpeg_dev *jpeg = platform_get_drvdata(pdev); > > - > > + cancel_delayed_work(&jpeg->job_timeout_work); > > pm_runtime_disable(&pdev->dev); > > video_unregister_device(jpeg->vdev); > > v4l2_m2m_release(jpeg->m2m_dev); > > -- > > 2.25.1 > >
The timer function was set in mtk_jpeg_probe with mtk_jpeg_job_timeout_work function. And the worker is started in mtk_jpeg_dec_device_run. There are two functions (mtk_jpeg_enc_irq and mtk_jpeg_dec_irq) which may cancel the worker. They are used as IRQ handler function which is saved as function pointer in a variable. In mtk_jpeg_probe, they are registered by devm_request_irq: ret = devm_request_irq(&pdev->dev, jpeg_irq, jpeg->variant->irq_handler, 0, pdev->name, jpeg); if (ret) { dev_err(&pdev->dev, "Failed to request jpeg_irq %d (%d)\n", jpeg_irq, ret); return ret; } However, if we remove the module without triggering the irq, the worker will never be removed. As for the schedule, mtk_jpeg_dec_device_run and mtk_jpeg_enc_device_run will start the worker. The schedule invoking is quite complicated. As far as I know, the invoking chain is as follows: v4l2_m2m_init->v4l2_m2m_device_run_work->v4l2_m2m_try_run the v4l2_m2m_device_run_work function is also a worker which is set in v4l2_m2m_init and started in v4l2_m2m_schedule_next_job. Before calling remove function, the mtk_jpeg_release was invoked to release the related resource. v4l2_m2m_cancel_job will cancel the job by calling m2m_dev->m2m_ops->job_abort(m2m_ctx->priv). But this will only cancel the current queue by list_del(&m2m_dev->curr_ctx->queue); I think this can not cancel the posted task mentioned before. So I think if mtk_jpeg_job_timeout_work is working on, and using jpeg->m2m_dev after freeing it in mtk_jpeg_remove, it will cause a UAF bug. static int mtk_jpeg_release(struct file *file) { struct mtk_jpeg_dev *jpeg = video_drvdata(file); struct mtk_jpeg_ctx *ctx = mtk_jpeg_fh_to_ctx(file->private_data); mutex_lock(&jpeg->lock); v4l2_ctrl_handler_free(&ctx->ctrl_hdl); [1] v4l2_m2m_ctx_release(ctx->fh.m2m_ctx); v4l2_fh_del(&ctx->fh); v4l2_fh_exit(&ctx->fh); kfree(ctx); mutex_unlock(&jpeg->lock); return 0; } void v4l2_m2m_ctx_release(struct v4l2_m2m_ctx *m2m_ctx) { /* wait until the current context is dequeued from job_queue */ [2] v4l2_m2m_cancel_job(m2m_ctx); vb2_queue_release(&m2m_ctx->cap_q_ctx.q); vb2_queue_release(&m2m_ctx->out_q_ctx.q); kfree(m2m_ctx); } Note that all of this is static analysis, which may be false positive. Feel free to tell me if there is something I've missed. Regard, Zheng Irui Wang (王瑞) <Irui.Wang@mediatek.com> 于2023年3月7日周二 18:23写道: > > Dear Angelo and Zheng, > > Thanks for your patch and comments. > > Dear Kyrie, > > Please help to check this, thanks. > > Best Regards > > On Tue, 2023-03-07 at 10:49 +0100, AngeloGioacchino Del Regno wrote: > > Il 07/03/23 10:27, Zheng Hacker ha scritto: > > > Hi, > > > > > > Is there anyone who can help with this? I can provide more details > > > like invoking chain if needed. > > > > > > > Providing more details is always good. Please do. > > > > Meanwhile, adding Irui Wang to the loop: he's doing mtk-jpeg. > > > > Regards, > > Angelo > > > > > Thanks, > > > Zheng > > > > > > Zheng Wang <zyytlz.wz@163.com> 于2023年3月6日周一 14:28写道: > > > > > > > > In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with > > > > mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run > > > > and mtk_jpeg_enc_device_run may be called to start the > > > > work. > > > > If we remove the module which will call mtk_jpeg_remove > > > > to make cleanup, there may be a unfinished work. The > > > > possible sequence is as follows, which will cause a > > > > typical UAF bug. > > > > > > > > Fix it by canceling the work before cleanup in the > > > > mtk_jpeg_remove > > > > > > > > CPU0 CPU1 > > > > > > > > |mtk_jpeg_job_timeout_work > > > > mtk_jpeg_remove | > > > > v4l2_m2m_release | > > > > kfree(m2m_dev); | > > > > | > > > > | v4l2_m2m_get_curr_priv > > > > | m2m_dev->curr_ctx //use > > > > > > > > Signed-off-by: Zheng Wang <zyytlz.wz@163.com> > > > > --- > > > > drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c | 2 +- > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > diff --git a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > > > > b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > > > > index 969516a940ba..364513e7897e 100644 > > > > --- a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > > > > +++ b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > > > > @@ -1793,7 +1793,7 @@ static int mtk_jpeg_probe(struct > > > > platform_device *pdev) > > > > static int mtk_jpeg_remove(struct platform_device *pdev) > > > > { > > > > struct mtk_jpeg_dev *jpeg = platform_get_drvdata(pdev); > > > > - > > > > + cancel_delayed_work(&jpeg->job_timeout_work); > > > > pm_runtime_disable(&pdev->dev); > > > > video_unregister_device(jpeg->vdev); > > > > v4l2_m2m_release(jpeg->m2m_dev); > > > > -- > > > > 2.25.1 > > > > > > > > > >
On Tue, 2023-03-07 at 23:03 +0800, Zheng Hacker wrote: > The timer function was set in mtk_jpeg_probe with > mtk_jpeg_job_timeout_work function. > And the worker is started in mtk_jpeg_dec_device_run. > There are two functions (mtk_jpeg_enc_irq and mtk_jpeg_dec_irq) which > may cancel the worker. > They are used as IRQ handler function which is saved as function > pointer in a variable. > In mtk_jpeg_probe, they are registered by devm_request_irq: > > ret = devm_request_irq(&pdev->dev, > jpeg_irq, > jpeg->variant->irq_handler, > 0, > pdev->name, jpeg); > if (ret) { > dev_err(&pdev->dev, "Failed to request jpeg_irq %d (%d)\n", > jpeg_irq, ret); > return ret; > } > > However, if we remove the module without triggering the irq, the > worker will never be removed. > > As for the schedule, mtk_jpeg_dec_device_run and > mtk_jpeg_enc_device_run will start the worker. > The schedule invoking is quite complicated. As far as I know, the > invoking chain is as follows: > > v4l2_m2m_init->v4l2_m2m_device_run_work->v4l2_m2m_try_run > > the v4l2_m2m_device_run_work function is also a worker which is set > in > v4l2_m2m_init and started in > v4l2_m2m_schedule_next_job. > > Before calling remove function, the mtk_jpeg_release was invoked to > release the related resource. > > v4l2_m2m_cancel_job will cancel the job by calling > m2m_dev->m2m_ops->job_abort(m2m_ctx->priv). > > But this will only cancel the current queue by > list_del(&m2m_dev->curr_ctx->queue); > > I think this can not cancel the posted task mentioned before. So I > think if mtk_jpeg_job_timeout_work > > is working on, and using jpeg->m2m_dev after freeing it in > mtk_jpeg_remove, it will cause a UAF bug. > > static int mtk_jpeg_release(struct file *file) > { > struct mtk_jpeg_dev *jpeg = video_drvdata(file); > struct mtk_jpeg_ctx *ctx = mtk_jpeg_fh_to_ctx(file->private_data); > > mutex_lock(&jpeg->lock); > v4l2_ctrl_handler_free(&ctx->ctrl_hdl); > [1] v4l2_m2m_ctx_release(ctx->fh.m2m_ctx); > v4l2_fh_del(&ctx->fh); > v4l2_fh_exit(&ctx->fh); > kfree(ctx); > mutex_unlock(&jpeg->lock); > return 0; > } > > void v4l2_m2m_ctx_release(struct v4l2_m2m_ctx *m2m_ctx) > { > /* wait until the current context is dequeued from job_queue */ > [2] v4l2_m2m_cancel_job(m2m_ctx); > > vb2_queue_release(&m2m_ctx->cap_q_ctx.q); > vb2_queue_release(&m2m_ctx->out_q_ctx.q); > > kfree(m2m_ctx); > } > > Note that all of this is static analysis, which may be false > positive. > Feel free to tell me if there is something I've missed. > > Regard, > Zheng Dear Zheng, You set up an application scenario: cpu1 is using the mtk-jpeg driver and timeout work has been scheduled. At the same time cpu0 wanted to remove the mtk-jpeg driver, which caused the UAF bug. I wonder if such an irrational application scenario could exist. This scenario, as you described, not only leads to the problems you mentioned, but also to output&capture memory leaks and unreleased resources, such as spinlock. Typically, if we want to remove the driver, we firstly do stop streaming, which ensures that the worker has been canceled. I agree with your changes from the perspective of strengthening the robustness of the driver code. Regards, Kyrie. > > Irui Wang (王瑞) <Irui.Wang@mediatek.com> 于2023年3月7日周二 18:23写道: > > > > Dear Angelo and Zheng, > > > > Thanks for your patch and comments. > > > > Dear Kyrie, > > > > Please help to check this, thanks. > > > > Best Regards > > > > On Tue, 2023-03-07 at 10:49 +0100, AngeloGioacchino Del Regno > > wrote: > > > Il 07/03/23 10:27, Zheng Hacker ha scritto: > > > > Hi, > > > > > > > > Is there anyone who can help with this? I can provide more > > > > details > > > > like invoking chain if needed. > > > > > > > > > > Providing more details is always good. Please do. > > > > > > Meanwhile, adding Irui Wang to the loop: he's doing mtk-jpeg. > > > > > > Regards, > > > Angelo > > > > > > > Thanks, > > > > Zheng > > > > > > > > Zheng Wang <zyytlz.wz@163.com> 于2023年3月6日周一 14:28写道: > > > > > > > > > > In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with > > > > > mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run > > > > > and mtk_jpeg_enc_device_run may be called to start the > > > > > work. > > > > > If we remove the module which will call mtk_jpeg_remove > > > > > to make cleanup, there may be a unfinished work. The > > > > > possible sequence is as follows, which will cause a > > > > > typical UAF bug. > > > > > > > > > > Fix it by canceling the work before cleanup in the > > > > > mtk_jpeg_remove > > > > > > > > > > CPU0 CPU1 > > > > > > > > > > |mtk_jpeg_job_timeout_work > > > > > mtk_jpeg_remove | > > > > > v4l2_m2m_release | > > > > > kfree(m2m_dev); | > > > > > | > > > > > | v4l2_m2m_get_curr_priv > > > > > | m2m_dev->curr_ctx //use > > > > > > > > > > Signed-off-by: Zheng Wang <zyytlz.wz@163.com> > > > > > --- > > > > > drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c | 2 +- > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > diff --git > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > > > > > index 969516a940ba..364513e7897e 100644 > > > > > --- a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > > > > > +++ b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > > > > > @@ -1793,7 +1793,7 @@ static int mtk_jpeg_probe(struct > > > > > platform_device *pdev) > > > > > static int mtk_jpeg_remove(struct platform_device *pdev) > > > > > { > > > > > struct mtk_jpeg_dev *jpeg = > > > > > platform_get_drvdata(pdev); > > > > > - > > > > > + cancel_delayed_work(&jpeg->job_timeout_work); > > > > > pm_runtime_disable(&pdev->dev); > > > > > video_unregister_device(jpeg->vdev); > > > > > v4l2_m2m_release(jpeg->m2m_dev); > > > > > -- > > > > > 2.25.1 > > > > > > > > > > > > > >
Hi Kyrie, Thank you for your careful analysis and response. I still have some areas that I don't quite understand and would like to ask for clarification. That is, how do the function pointers for stop streaming, initialized as mtk_jpeg_enc_stop_streaming and mtk_jpeg_dec_stop_streaming, ensure that the worker is canceled? I would greatly appreciate your response. Best regards, Zheng Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年3月8日周三 10:02写道: > > On Tue, 2023-03-07 at 23:03 +0800, Zheng Hacker wrote: > > The timer function was set in mtk_jpeg_probe with > > mtk_jpeg_job_timeout_work function. > > And the worker is started in mtk_jpeg_dec_device_run. > > There are two functions (mtk_jpeg_enc_irq and mtk_jpeg_dec_irq) which > > may cancel the worker. > > They are used as IRQ handler function which is saved as function > > pointer in a variable. > > In mtk_jpeg_probe, they are registered by devm_request_irq: > > > > ret = devm_request_irq(&pdev->dev, > > jpeg_irq, > > jpeg->variant->irq_handler, > > 0, > > pdev->name, jpeg); > > if (ret) { > > dev_err(&pdev->dev, "Failed to request jpeg_irq %d (%d)\n", > > jpeg_irq, ret); > > return ret; > > } > > > > However, if we remove the module without triggering the irq, the > > worker will never be removed. > > > > As for the schedule, mtk_jpeg_dec_device_run and > > mtk_jpeg_enc_device_run will start the worker. > > The schedule invoking is quite complicated. As far as I know, the > > invoking chain is as follows: > > > > v4l2_m2m_init->v4l2_m2m_device_run_work->v4l2_m2m_try_run > > > > the v4l2_m2m_device_run_work function is also a worker which is set > > in > > v4l2_m2m_init and started in > > v4l2_m2m_schedule_next_job. > > > > Before calling remove function, the mtk_jpeg_release was invoked to > > release the related resource. > > > > v4l2_m2m_cancel_job will cancel the job by calling > > m2m_dev->m2m_ops->job_abort(m2m_ctx->priv). > > > > But this will only cancel the current queue by > > list_del(&m2m_dev->curr_ctx->queue); > > > > I think this can not cancel the posted task mentioned before. So I > > think if mtk_jpeg_job_timeout_work > > > > is working on, and using jpeg->m2m_dev after freeing it in > > mtk_jpeg_remove, it will cause a UAF bug. > > > > static int mtk_jpeg_release(struct file *file) > > { > > struct mtk_jpeg_dev *jpeg = video_drvdata(file); > > struct mtk_jpeg_ctx *ctx = mtk_jpeg_fh_to_ctx(file->private_data); > > > > mutex_lock(&jpeg->lock); > > v4l2_ctrl_handler_free(&ctx->ctrl_hdl); > > [1] v4l2_m2m_ctx_release(ctx->fh.m2m_ctx); > > v4l2_fh_del(&ctx->fh); > > v4l2_fh_exit(&ctx->fh); > > kfree(ctx); > > mutex_unlock(&jpeg->lock); > > return 0; > > } > > > > void v4l2_m2m_ctx_release(struct v4l2_m2m_ctx *m2m_ctx) > > { > > /* wait until the current context is dequeued from job_queue */ > > [2] v4l2_m2m_cancel_job(m2m_ctx); > > > > vb2_queue_release(&m2m_ctx->cap_q_ctx.q); > > vb2_queue_release(&m2m_ctx->out_q_ctx.q); > > > > kfree(m2m_ctx); > > } > > > > Note that all of this is static analysis, which may be false > > positive. > > Feel free to tell me if there is something I've missed. > > > > Regard, > > Zheng > > Dear Zheng, > > You set up an application scenario: > cpu1 is using the mtk-jpeg driver and timeout work has been scheduled. > At the same time cpu0 wanted to remove the mtk-jpeg driver, which > caused the UAF bug. > I wonder if such an irrational application scenario could exist. This > scenario, as you described, not only leads to the problems you > mentioned, but also to output&capture memory leaks and unreleased > resources, such as spinlock. > Typically, if we want to remove the driver, we firstly do stop > streaming, which ensures that the worker has been canceled. > I agree with your changes from the perspective of strengthening the > robustness of the driver code. > > Regards, > Kyrie. > > > > Irui Wang (王瑞) <Irui.Wang@mediatek.com> 于2023年3月7日周二 18:23写道: > > > > > > Dear Angelo and Zheng, > > > > > > Thanks for your patch and comments. > > > > > > Dear Kyrie, > > > > > > Please help to check this, thanks. > > > > > > Best Regards > > > > > > On Tue, 2023-03-07 at 10:49 +0100, AngeloGioacchino Del Regno > > > wrote: > > > > Il 07/03/23 10:27, Zheng Hacker ha scritto: > > > > > Hi, > > > > > > > > > > Is there anyone who can help with this? I can provide more > > > > > details > > > > > like invoking chain if needed. > > > > > > > > > > > > > Providing more details is always good. Please do. > > > > > > > > Meanwhile, adding Irui Wang to the loop: he's doing mtk-jpeg. > > > > > > > > Regards, > > > > Angelo > > > > > > > > > Thanks, > > > > > Zheng > > > > > > > > > > Zheng Wang <zyytlz.wz@163.com> 于2023年3月6日周一 14:28写道: > > > > > > > > > > > > In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with > > > > > > mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run > > > > > > and mtk_jpeg_enc_device_run may be called to start the > > > > > > work. > > > > > > If we remove the module which will call mtk_jpeg_remove > > > > > > to make cleanup, there may be a unfinished work. The > > > > > > possible sequence is as follows, which will cause a > > > > > > typical UAF bug. > > > > > > > > > > > > Fix it by canceling the work before cleanup in the > > > > > > mtk_jpeg_remove > > > > > > > > > > > > CPU0 CPU1 > > > > > > > > > > > > |mtk_jpeg_job_timeout_work > > > > > > mtk_jpeg_remove | > > > > > > v4l2_m2m_release | > > > > > > kfree(m2m_dev); | > > > > > > | > > > > > > | v4l2_m2m_get_curr_priv > > > > > > | m2m_dev->curr_ctx //use > > > > > > > > > > > > Signed-off-by: Zheng Wang <zyytlz.wz@163.com> > > > > > > --- > > > > > > drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c | 2 +- > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > > > diff --git > > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > > > > > > index 969516a940ba..364513e7897e 100644 > > > > > > --- a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > > > > > > +++ b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > > > > > > @@ -1793,7 +1793,7 @@ static int mtk_jpeg_probe(struct > > > > > > platform_device *pdev) > > > > > > static int mtk_jpeg_remove(struct platform_device *pdev) > > > > > > { > > > > > > struct mtk_jpeg_dev *jpeg = > > > > > > platform_get_drvdata(pdev); > > > > > > - > > > > > > + cancel_delayed_work(&jpeg->job_timeout_work); > > > > > > pm_runtime_disable(&pdev->dev); > > > > > > video_unregister_device(jpeg->vdev); > > > > > > v4l2_m2m_release(jpeg->m2m_dev); > > > > > > -- > > > > > > 2.25.1 > > > > > > > > > > > > > > > > > >
On Wed, 2023-03-08 at 10:20 +0800, Zheng Hacker wrote: > Hi Kyrie, > > Thank you for your careful analysis and response. I still have some > areas that I don't quite understand and would like to ask for > clarification. That is, how do the function pointers for stop > streaming, initialized as mtk_jpeg_enc_stop_streaming and > mtk_jpeg_dec_stop_streaming, ensure that the worker is canceled? I > would greatly appreciate your response. > > Best regards, > Zheng Dear zheng, For stop streaming, what I mean is that stoppping jpeg decoding or encoding. Ok, let me introduce the sw flow of stop streaming: Firstly, the app will call v4l2_m2m_ioctl_streamoff, which will call v4l2_m2m_cancel_job, if it finds a job running(as you note, cpu1 is running), it will wait event, the event is wake up by v4l2_m2m_job_finish function. And v4l2_m2m_job_finish is called by jpeg dec/enc irq handler, which means that the waitting would result mtk hw to finish dec/enc, irq will occur and irq handler would cancel timeout worker. The follow is shown as blow. v4l2_m2m_ioctl_streamoff v4l2_m2m_cancel_job mtk_jpeg_enc_irq/mtk_jpeg_dec_irq wait evnet <------ wake up ------v4l2_m2m_job_finish cancel timeout work As mentioned above, if it is normal stop streaming action, there will be no happen that the timeout worker does not canceled. But if mtk_jpeg_remove is called directly without above flow, it would cause lots of issues. Regards, Kyrie. > > Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年3月8日周三 10:02写道: > > > > On Tue, 2023-03-07 at 23:03 +0800, Zheng Hacker wrote: > > > The timer function was set in mtk_jpeg_probe with > > > mtk_jpeg_job_timeout_work function. > > > And the worker is started in mtk_jpeg_dec_device_run. > > > There are two functions (mtk_jpeg_enc_irq and mtk_jpeg_dec_irq) > > > which > > > may cancel the worker. > > > They are used as IRQ handler function which is saved as function > > > pointer in a variable. > > > In mtk_jpeg_probe, they are registered by devm_request_irq: > > > > > > ret = devm_request_irq(&pdev->dev, > > > jpeg_irq, > > > jpeg->variant->irq_handler, > > > 0, > > > pdev->name, jpeg); > > > if (ret) { > > > dev_err(&pdev->dev, "Failed to request jpeg_irq %d (%d)\n", > > > jpeg_irq, ret); > > > return ret; > > > } > > > > > > However, if we remove the module without triggering the irq, the > > > worker will never be removed. > > > > > > As for the schedule, mtk_jpeg_dec_device_run and > > > mtk_jpeg_enc_device_run will start the worker. > > > The schedule invoking is quite complicated. As far as I know, the > > > invoking chain is as follows: > > > > > > v4l2_m2m_init->v4l2_m2m_device_run_work->v4l2_m2m_try_run > > > > > > the v4l2_m2m_device_run_work function is also a worker which is > > > set > > > in > > > v4l2_m2m_init and started in > > > v4l2_m2m_schedule_next_job. > > > > > > Before calling remove function, the mtk_jpeg_release was invoked > > > to > > > release the related resource. > > > > > > v4l2_m2m_cancel_job will cancel the job by calling > > > m2m_dev->m2m_ops->job_abort(m2m_ctx->priv). > > > > > > But this will only cancel the current queue by > > > list_del(&m2m_dev->curr_ctx->queue); > > > > > > I think this can not cancel the posted task mentioned before. So > > > I > > > think if mtk_jpeg_job_timeout_work > > > > > > is working on, and using jpeg->m2m_dev after freeing it in > > > mtk_jpeg_remove, it will cause a UAF bug. > > > > > > static int mtk_jpeg_release(struct file *file) > > > { > > > struct mtk_jpeg_dev *jpeg = video_drvdata(file); > > > struct mtk_jpeg_ctx *ctx = mtk_jpeg_fh_to_ctx(file- > > > >private_data); > > > > > > mutex_lock(&jpeg->lock); > > > v4l2_ctrl_handler_free(&ctx->ctrl_hdl); > > > [1] v4l2_m2m_ctx_release(ctx->fh.m2m_ctx); > > > v4l2_fh_del(&ctx->fh); > > > v4l2_fh_exit(&ctx->fh); > > > kfree(ctx); > > > mutex_unlock(&jpeg->lock); > > > return 0; > > > } > > > > > > void v4l2_m2m_ctx_release(struct v4l2_m2m_ctx *m2m_ctx) > > > { > > > /* wait until the current context is dequeued from job_queue */ > > > [2] v4l2_m2m_cancel_job(m2m_ctx); > > > > > > vb2_queue_release(&m2m_ctx->cap_q_ctx.q); > > > vb2_queue_release(&m2m_ctx->out_q_ctx.q); > > > > > > kfree(m2m_ctx); > > > } > > > > > > Note that all of this is static analysis, which may be false > > > positive. > > > Feel free to tell me if there is something I've missed. > > > > > > Regard, > > > Zheng > > > > Dear Zheng, > > > > You set up an application scenario: > > cpu1 is using the mtk-jpeg driver and timeout work has been > > scheduled. > > At the same time cpu0 wanted to remove the mtk-jpeg driver, which > > caused the UAF bug. > > I wonder if such an irrational application scenario could exist. > > This > > scenario, as you described, not only leads to the problems you > > mentioned, but also to output&capture memory leaks and unreleased > > resources, such as spinlock. > > Typically, if we want to remove the driver, we firstly do stop > > streaming, which ensures that the worker has been canceled. > > I agree with your changes from the perspective of strengthening the > > robustness of the driver code. > > > > Regards, > > Kyrie. > > > > > > Irui Wang (王瑞) <Irui.Wang@mediatek.com> 于2023年3月7日周二 18:23写道: > > > > > > > > Dear Angelo and Zheng, > > > > > > > > Thanks for your patch and comments. > > > > > > > > Dear Kyrie, > > > > > > > > Please help to check this, thanks. > > > > > > > > Best Regards > > > > > > > > On Tue, 2023-03-07 at 10:49 +0100, AngeloGioacchino Del Regno > > > > wrote: > > > > > Il 07/03/23 10:27, Zheng Hacker ha scritto: > > > > > > Hi, > > > > > > > > > > > > Is there anyone who can help with this? I can provide more > > > > > > details > > > > > > like invoking chain if needed. > > > > > > > > > > > > > > > > Providing more details is always good. Please do. > > > > > > > > > > Meanwhile, adding Irui Wang to the loop: he's doing mtk-jpeg. > > > > > > > > > > Regards, > > > > > Angelo > > > > > > > > > > > Thanks, > > > > > > Zheng > > > > > > > > > > > > Zheng Wang <zyytlz.wz@163.com> 于2023年3月6日周一 14:28写道: > > > > > > > > > > > > > > In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with > > > > > > > mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run > > > > > > > and mtk_jpeg_enc_device_run may be called to start the > > > > > > > work. > > > > > > > If we remove the module which will call mtk_jpeg_remove > > > > > > > to make cleanup, there may be a unfinished work. The > > > > > > > possible sequence is as follows, which will cause a > > > > > > > typical UAF bug. > > > > > > > > > > > > > > Fix it by canceling the work before cleanup in the > > > > > > > mtk_jpeg_remove > > > > > > > > > > > > > > CPU0 CPU1 > > > > > > > > > > > > > > |mtk_jpeg_job_timeout_work > > > > > > > mtk_jpeg_remove | > > > > > > > v4l2_m2m_release | > > > > > > > kfree(m2m_dev); | > > > > > > > | > > > > > > > | v4l2_m2m_get_curr_priv > > > > > > > | m2m_dev->curr_ctx //use > > > > > > > > > > > > > > Signed-off-by: Zheng Wang <zyytlz.wz@163.com> > > > > > > > --- > > > > > > > drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c | > > > > > > > 2 +- > > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > > > > > diff --git > > > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > > > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > > > > > > > index 969516a940ba..364513e7897e 100644 > > > > > > > --- > > > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > > > > > > > +++ > > > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > > > > > > > @@ -1793,7 +1793,7 @@ static int mtk_jpeg_probe(struct > > > > > > > platform_device *pdev) > > > > > > > static int mtk_jpeg_remove(struct platform_device > > > > > > > *pdev) > > > > > > > { > > > > > > > struct mtk_jpeg_dev *jpeg = > > > > > > > platform_get_drvdata(pdev); > > > > > > > - > > > > > > > + cancel_delayed_work(&jpeg->job_timeout_work); > > > > > > > pm_runtime_disable(&pdev->dev); > > > > > > > video_unregister_device(jpeg->vdev); > > > > > > > v4l2_m2m_release(jpeg->m2m_dev); > > > > > > > -- > > > > > > > 2.25.1 > > > > > > > > > > > > > > > > > > > > > >
Hi Kyrie, Thanks for your quick response and detailed information. I understand more about the design with your explanation. As you say, there shouldn't be any issues with the normal flow. And if the attacker use some tricks to enhance the race window and call remove function directly as [1] mentioned, this might trigger uaf bug unexpectedly. Besides, I found another code path which may call v4l2_m2m_cancel_job and finally make the situation looks safe. The invoking chain is mtk_jpeg_release v4l2_m2m_ctx_release v4l2_m2m_cancel_job If the attacker have to call mtk_jpeg_release function before the remove function, it might be harmless. But I'm not sure about that. [1] https://www.usenix.org/system/files/sec21-lee-yoochan.pdf Best regards, Zheng Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年3月8日周三 11:32写道: > > On Wed, 2023-03-08 at 10:20 +0800, Zheng Hacker wrote: > > Hi Kyrie, > > > > Thank you for your careful analysis and response. I still have some > > areas that I don't quite understand and would like to ask for > > clarification. That is, how do the function pointers for stop > > streaming, initialized as mtk_jpeg_enc_stop_streaming and > > mtk_jpeg_dec_stop_streaming, ensure that the worker is canceled? I > > would greatly appreciate your response. > > > > Best regards, > > Zheng > > Dear zheng, > > For stop streaming, what I mean is that stoppping jpeg decoding or > encoding. > Ok, let me introduce the sw flow of stop streaming: > Firstly, the app will call v4l2_m2m_ioctl_streamoff, which will call > v4l2_m2m_cancel_job, if it finds a job running(as you note, cpu1 is > running), it will wait event, the event is wake up by > v4l2_m2m_job_finish function. And v4l2_m2m_job_finish is called by jpeg > dec/enc irq handler, which means that the waitting would result mtk hw > to finish dec/enc, irq will occur and irq handler would cancel timeout > worker. The follow is shown as blow. > v4l2_m2m_ioctl_streamoff > v4l2_m2m_cancel_job mtk_jpeg_enc_irq/mtk_jpeg_dec_irq > wait evnet <------ wake up ------v4l2_m2m_job_finish > cancel timeout work > > As mentioned above, if it is normal stop streaming action, there will > be no happen that the timeout worker does not canceled. > > But if mtk_jpeg_remove is called directly without above flow, it would > cause lots of issues. > > Regards, > Kyrie. > > > > Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年3月8日周三 10:02写道: > > > > > > On Tue, 2023-03-07 at 23:03 +0800, Zheng Hacker wrote: > > > > The timer function was set in mtk_jpeg_probe with > > > > mtk_jpeg_job_timeout_work function. > > > > And the worker is started in mtk_jpeg_dec_device_run. > > > > There are two functions (mtk_jpeg_enc_irq and mtk_jpeg_dec_irq) > > > > which > > > > may cancel the worker. > > > > They are used as IRQ handler function which is saved as function > > > > pointer in a variable. > > > > In mtk_jpeg_probe, they are registered by devm_request_irq: > > > > > > > > ret = devm_request_irq(&pdev->dev, > > > > jpeg_irq, > > > > jpeg->variant->irq_handler, > > > > 0, > > > > pdev->name, jpeg); > > > > if (ret) { > > > > dev_err(&pdev->dev, "Failed to request jpeg_irq %d (%d)\n", > > > > jpeg_irq, ret); > > > > return ret; > > > > } > > > > > > > > However, if we remove the module without triggering the irq, the > > > > worker will never be removed. > > > > > > > > As for the schedule, mtk_jpeg_dec_device_run and > > > > mtk_jpeg_enc_device_run will start the worker. > > > > The schedule invoking is quite complicated. As far as I know, the > > > > invoking chain is as follows: > > > > > > > > v4l2_m2m_init->v4l2_m2m_device_run_work->v4l2_m2m_try_run > > > > > > > > the v4l2_m2m_device_run_work function is also a worker which is > > > > set > > > > in > > > > v4l2_m2m_init and started in > > > > v4l2_m2m_schedule_next_job. > > > > > > > > Before calling remove function, the mtk_jpeg_release was invoked > > > > to > > > > release the related resource. > > > > > > > > v4l2_m2m_cancel_job will cancel the job by calling > > > > m2m_dev->m2m_ops->job_abort(m2m_ctx->priv). > > > > > > > > But this will only cancel the current queue by > > > > list_del(&m2m_dev->curr_ctx->queue); > > > > > > > > I think this can not cancel the posted task mentioned before. So > > > > I > > > > think if mtk_jpeg_job_timeout_work > > > > > > > > is working on, and using jpeg->m2m_dev after freeing it in > > > > mtk_jpeg_remove, it will cause a UAF bug. > > > > > > > > static int mtk_jpeg_release(struct file *file) > > > > { > > > > struct mtk_jpeg_dev *jpeg = video_drvdata(file); > > > > struct mtk_jpeg_ctx *ctx = mtk_jpeg_fh_to_ctx(file- > > > > >private_data); > > > > > > > > mutex_lock(&jpeg->lock); > > > > v4l2_ctrl_handler_free(&ctx->ctrl_hdl); > > > > [1] v4l2_m2m_ctx_release(ctx->fh.m2m_ctx); > > > > v4l2_fh_del(&ctx->fh); > > > > v4l2_fh_exit(&ctx->fh); > > > > kfree(ctx); > > > > mutex_unlock(&jpeg->lock); > > > > return 0; > > > > } > > > > > > > > void v4l2_m2m_ctx_release(struct v4l2_m2m_ctx *m2m_ctx) > > > > { > > > > /* wait until the current context is dequeued from job_queue */ > > > > [2] v4l2_m2m_cancel_job(m2m_ctx); > > > > > > > > vb2_queue_release(&m2m_ctx->cap_q_ctx.q); > > > > vb2_queue_release(&m2m_ctx->out_q_ctx.q); > > > > > > > > kfree(m2m_ctx); > > > > } > > > > > > > > Note that all of this is static analysis, which may be false > > > > positive. > > > > Feel free to tell me if there is something I've missed. > > > > > > > > Regard, > > > > Zheng > > > > > > Dear Zheng, > > > > > > You set up an application scenario: > > > cpu1 is using the mtk-jpeg driver and timeout work has been > > > scheduled. > > > At the same time cpu0 wanted to remove the mtk-jpeg driver, which > > > caused the UAF bug. > > > I wonder if such an irrational application scenario could exist. > > > This > > > scenario, as you described, not only leads to the problems you > > > mentioned, but also to output&capture memory leaks and unreleased > > > resources, such as spinlock. > > > Typically, if we want to remove the driver, we firstly do stop > > > streaming, which ensures that the worker has been canceled. > > > I agree with your changes from the perspective of strengthening the > > > robustness of the driver code. > > > > > > Regards, > > > Kyrie. > > > > > > > > Irui Wang (王瑞) <Irui.Wang@mediatek.com> 于2023年3月7日周二 18:23写道: > > > > > > > > > > Dear Angelo and Zheng, > > > > > > > > > > Thanks for your patch and comments. > > > > > > > > > > Dear Kyrie, > > > > > > > > > > Please help to check this, thanks. > > > > > > > > > > Best Regards > > > > > > > > > > On Tue, 2023-03-07 at 10:49 +0100, AngeloGioacchino Del Regno > > > > > wrote: > > > > > > Il 07/03/23 10:27, Zheng Hacker ha scritto: > > > > > > > Hi, > > > > > > > > > > > > > > Is there anyone who can help with this? I can provide more > > > > > > > details > > > > > > > like invoking chain if needed. > > > > > > > > > > > > > > > > > > > Providing more details is always good. Please do. > > > > > > > > > > > > Meanwhile, adding Irui Wang to the loop: he's doing mtk-jpeg. > > > > > > > > > > > > Regards, > > > > > > Angelo > > > > > > > > > > > > > Thanks, > > > > > > > Zheng > > > > > > > > > > > > > > Zheng Wang <zyytlz.wz@163.com> 于2023年3月6日周一 14:28写道: > > > > > > > > > > > > > > > > In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with > > > > > > > > mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run > > > > > > > > and mtk_jpeg_enc_device_run may be called to start the > > > > > > > > work. > > > > > > > > If we remove the module which will call mtk_jpeg_remove > > > > > > > > to make cleanup, there may be a unfinished work. The > > > > > > > > possible sequence is as follows, which will cause a > > > > > > > > typical UAF bug. > > > > > > > > > > > > > > > > Fix it by canceling the work before cleanup in the > > > > > > > > mtk_jpeg_remove > > > > > > > > > > > > > > > > CPU0 CPU1 > > > > > > > > > > > > > > > > |mtk_jpeg_job_timeout_work > > > > > > > > mtk_jpeg_remove | > > > > > > > > v4l2_m2m_release | > > > > > > > > kfree(m2m_dev); | > > > > > > > > | > > > > > > > > | v4l2_m2m_get_curr_priv > > > > > > > > | m2m_dev->curr_ctx //use > > > > > > > > > > > > > > > > Signed-off-by: Zheng Wang <zyytlz.wz@163.com> > > > > > > > > --- > > > > > > > > drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c | > > > > > > > > 2 +- > > > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > > > > > > > diff --git > > > > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > > > > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > > > > > > > > index 969516a940ba..364513e7897e 100644 > > > > > > > > --- > > > > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > > > > > > > > +++ > > > > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > > > > > > > > @@ -1793,7 +1793,7 @@ static int mtk_jpeg_probe(struct > > > > > > > > platform_device *pdev) > > > > > > > > static int mtk_jpeg_remove(struct platform_device > > > > > > > > *pdev) > > > > > > > > { > > > > > > > > struct mtk_jpeg_dev *jpeg = > > > > > > > > platform_get_drvdata(pdev); > > > > > > > > - > > > > > > > > + cancel_delayed_work(&jpeg->job_timeout_work); > > > > > > > > pm_runtime_disable(&pdev->dev); > > > > > > > > video_unregister_device(jpeg->vdev); > > > > > > > > v4l2_m2m_release(jpeg->m2m_dev); > > > > > > > > -- > > > > > > > > 2.25.1 > > > > > > > > > > > > > > > > > > > > > > > > > >
Hi Kyrie, After reviewing the code, I found anothe possible code path. As I am not familiar with the module. It has high possibility it's wrong. Could please help me check this? Very much appreciated for your valuable time. In summary, mtk_jpegdec_worker was set in mtk_jpeg_open and started in mtk_jpeg_multicore_dec_device_run, which made it running on cpu1. Inside the mtk_jpeg_multicore_dec_device_run, it will call schedule_delayed_work to start the timeout_work, which will make it running on cpu2. Meanwhile, we can call mtk_jpeg_release to cancel the job. But there might be a race between mtk_jpegdec_worker and v4l2_m2m_cancel_job. It may call v4l2_m2m_job_finish too early to wake up the event. The remove will go on, the other race is as described earlier. cpu0 cpu1 cpu2 (1)->device_run mtk_jpeg_multicore _dec_device_run queue_work (jpeg->workqueue, &ctx->jpeg_work); (2)mtk_jpegdec_worker (3)mtk_jpeg_release v4l2_m2m_cancel_job wait event schedule_delayed_work (4)mtk_jpeg_job_timeout_work (5)v4l2_m2m_job_finish wake up (6)mtk_jpeg_remove v4l2_m2m_release kfree(m2m_dev) (7)v4l2_m2m_get_curr_priv Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年3月8日周三 11:32写道: > > On Wed, 2023-03-08 at 10:20 +0800, Zheng Hacker wrote: > > Hi Kyrie, > > > > Thank you for your careful analysis and response. I still have some > > areas that I don't quite understand and would like to ask for > > clarification. That is, how do the function pointers for stop > > streaming, initialized as mtk_jpeg_enc_stop_streaming and > > mtk_jpeg_dec_stop_streaming, ensure that the worker is canceled? I > > would greatly appreciate your response. > > > > Best regards, > > Zheng > > Dear zheng, > > For stop streaming, what I mean is that stoppping jpeg decoding or > encoding. > Ok, let me introduce the sw flow of stop streaming: > Firstly, the app will call v4l2_m2m_ioctl_streamoff, which will call > v4l2_m2m_cancel_job, if it finds a job running(as you note, cpu1 is > running), it will wait event, the event is wake up by > v4l2_m2m_job_finish function. And v4l2_m2m_job_finish is called by jpeg > dec/enc irq handler, which means that the waitting would result mtk hw > to finish dec/enc, irq will occur and irq handler would cancel timeout > worker. The follow is shown as blow. > v4l2_m2m_ioctl_streamoff > v4l2_m2m_cancel_job mtk_jpeg_enc_irq/mtk_jpeg_dec_irq > wait evnet <------ wake up ------v4l2_m2m_job_finish > cancel timeout work > > As mentioned above, if it is normal stop streaming action, there will > be no happen that the timeout worker does not canceled. > > But if mtk_jpeg_remove is called directly without above flow, it would > cause lots of issues. > > Regards, > Kyrie. > > > > Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年3月8日周三 10:02写道: > > > > > > On Tue, 2023-03-07 at 23:03 +0800, Zheng Hacker wrote: > > > > The timer function was set in mtk_jpeg_probe with > > > > mtk_jpeg_job_timeout_work function. > > > > And the worker is started in mtk_jpeg_dec_device_run. > > > > There are two functions (mtk_jpeg_enc_irq and mtk_jpeg_dec_irq) > > > > which > > > > may cancel the worker. > > > > They are used as IRQ handler function which is saved as function > > > > pointer in a variable. > > > > In mtk_jpeg_probe, they are registered by devm_request_irq: > > > > > > > > ret = devm_request_irq(&pdev->dev, > > > > jpeg_irq, > > > > jpeg->variant->irq_handler, > > > > 0, > > > > pdev->name, jpeg); > > > > if (ret) { > > > > dev_err(&pdev->dev, "Failed to request jpeg_irq %d (%d)\n", > > > > jpeg_irq, ret); > > > > return ret; > > > > } > > > > > > > > However, if we remove the module without triggering the irq, the > > > > worker will never be removed. > > > > > > > > As for the schedule, mtk_jpeg_dec_device_run and > > > > mtk_jpeg_enc_device_run will start the worker. > > > > The schedule invoking is quite complicated. As far as I know, the > > > > invoking chain is as follows: > > > > > > > > v4l2_m2m_init->v4l2_m2m_device_run_work->v4l2_m2m_try_run > > > > > > > > the v4l2_m2m_device_run_work function is also a worker which is > > > > set > > > > in > > > > v4l2_m2m_init and started in > > > > v4l2_m2m_schedule_next_job. > > > > > > > > Before calling remove function, the mtk_jpeg_release was invoked > > > > to > > > > release the related resource. > > > > > > > > v4l2_m2m_cancel_job will cancel the job by calling > > > > m2m_dev->m2m_ops->job_abort(m2m_ctx->priv). > > > > > > > > But this will only cancel the current queue by > > > > list_del(&m2m_dev->curr_ctx->queue); > > > > > > > > I think this can not cancel the posted task mentioned before. So > > > > I > > > > think if mtk_jpeg_job_timeout_work > > > > > > > > is working on, and using jpeg->m2m_dev after freeing it in > > > > mtk_jpeg_remove, it will cause a UAF bug. > > > > > > > > static int mtk_jpeg_release(struct file *file) > > > > { > > > > struct mtk_jpeg_dev *jpeg = video_drvdata(file); > > > > struct mtk_jpeg_ctx *ctx = mtk_jpeg_fh_to_ctx(file- > > > > >private_data); > > > > > > > > mutex_lock(&jpeg->lock); > > > > v4l2_ctrl_handler_free(&ctx->ctrl_hdl); > > > > [1] v4l2_m2m_ctx_release(ctx->fh.m2m_ctx); > > > > v4l2_fh_del(&ctx->fh); > > > > v4l2_fh_exit(&ctx->fh); > > > > kfree(ctx); > > > > mutex_unlock(&jpeg->lock); > > > > return 0; > > > > } > > > > > > > > void v4l2_m2m_ctx_release(struct v4l2_m2m_ctx *m2m_ctx) > > > > { > > > > /* wait until the current context is dequeued from job_queue */ > > > > [2] v4l2_m2m_cancel_job(m2m_ctx); > > > > > > > > vb2_queue_release(&m2m_ctx->cap_q_ctx.q); > > > > vb2_queue_release(&m2m_ctx->out_q_ctx.q); > > > > > > > > kfree(m2m_ctx); > > > > } > > > > > > > > Note that all of this is static analysis, which may be false > > > > positive. > > > > Feel free to tell me if there is something I've missed. > > > > > > > > Regard, > > > > Zheng > > > > > > Dear Zheng, > > > > > > You set up an application scenario: > > > cpu1 is using the mtk-jpeg driver and timeout work has been > > > scheduled. > > > At the same time cpu0 wanted to remove the mtk-jpeg driver, which > > > caused the UAF bug. > > > I wonder if such an irrational application scenario could exist. > > > This > > > scenario, as you described, not only leads to the problems you > > > mentioned, but also to output&capture memory leaks and unreleased > > > resources, such as spinlock. > > > Typically, if we want to remove the driver, we firstly do stop > > > streaming, which ensures that the worker has been canceled. > > > I agree with your changes from the perspective of strengthening the > > > robustness of the driver code. > > > > > > Regards, > > > Kyrie. > > > > > > > > Irui Wang (王瑞) <Irui.Wang@mediatek.com> 于2023年3月7日周二 18:23写道: > > > > > > > > > > Dear Angelo and Zheng, > > > > > > > > > > Thanks for your patch and comments. > > > > > > > > > > Dear Kyrie, > > > > > > > > > > Please help to check this, thanks. > > > > > > > > > > Best Regards > > > > > > > > > > On Tue, 2023-03-07 at 10:49 +0100, AngeloGioacchino Del Regno > > > > > wrote: > > > > > > Il 07/03/23 10:27, Zheng Hacker ha scritto: > > > > > > > Hi, > > > > > > > > > > > > > > Is there anyone who can help with this? I can provide more > > > > > > > details > > > > > > > like invoking chain if needed. > > > > > > > > > > > > > > > > > > > Providing more details is always good. Please do. > > > > > > > > > > > > Meanwhile, adding Irui Wang to the loop: he's doing mtk-jpeg. > > > > > > > > > > > > Regards, > > > > > > Angelo > > > > > > > > > > > > > Thanks, > > > > > > > Zheng > > > > > > > > > > > > > > Zheng Wang <zyytlz.wz@163.com> 于2023年3月6日周一 14:28写道: > > > > > > > > > > > > > > > > In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with > > > > > > > > mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run > > > > > > > > and mtk_jpeg_enc_device_run may be called to start the > > > > > > > > work. > > > > > > > > If we remove the module which will call mtk_jpeg_remove > > > > > > > > to make cleanup, there may be a unfinished work. The > > > > > > > > possible sequence is as follows, which will cause a > > > > > > > > typical UAF bug. > > > > > > > > > > > > > > > > Fix it by canceling the work before cleanup in the > > > > > > > > mtk_jpeg_remove > > > > > > > > > > > > > > > > CPU0 CPU1 > > > > > > > > > > > > > > > > |mtk_jpeg_job_timeout_work > > > > > > > > mtk_jpeg_remove | > > > > > > > > v4l2_m2m_release | > > > > > > > > kfree(m2m_dev); | > > > > > > > > | > > > > > > > > | v4l2_m2m_get_curr_priv > > > > > > > > | m2m_dev->curr_ctx //use > > > > > > > > > > > > > > > > Signed-off-by: Zheng Wang <zyytlz.wz@163.com> > > > > > > > > --- > > > > > > > > drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c | > > > > > > > > 2 +- > > > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > > > > > > > diff --git > > > > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > > > > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > > > > > > > > index 969516a940ba..364513e7897e 100644 > > > > > > > > --- > > > > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > > > > > > > > +++ > > > > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > > > > > > > > @@ -1793,7 +1793,7 @@ static int mtk_jpeg_probe(struct > > > > > > > > platform_device *pdev) > > > > > > > > static int mtk_jpeg_remove(struct platform_device > > > > > > > > *pdev) > > > > > > > > { > > > > > > > > struct mtk_jpeg_dev *jpeg = > > > > > > > > platform_get_drvdata(pdev); > > > > > > > > - > > > > > > > > + cancel_delayed_work(&jpeg->job_timeout_work); > > > > > > > > pm_runtime_disable(&pdev->dev); > > > > > > > > video_unregister_device(jpeg->vdev); > > > > > > > > v4l2_m2m_release(jpeg->m2m_dev); > > > > > > > > -- > > > > > > > > 2.25.1 > > > > > > > > > > > > > > > > > > > > > > > > > >
On Wed, 2023-03-08 at 14:10 +0800, Zheng Hacker wrote: > Hi Kyrie, > > After reviewing the code, I found anothe possible code path. As I am > not familiar with the module. It has high possibility it's wrong. > Could please help me check this? Very much appreciated for your > valuable time. > > In summary, mtk_jpegdec_worker was set in mtk_jpeg_open and started > in > mtk_jpeg_multicore_dec_device_run, which made it running on cpu1. > Inside the mtk_jpeg_multicore_dec_device_run, it will call > schedule_delayed_work to start the timeout_work, which will make it > running on cpu2. Meanwhile, we can call > mtk_jpeg_release to cancel the job. But there might be a race between > mtk_jpegdec_worker and v4l2_m2m_cancel_job. It may call > v4l2_m2m_job_finish too early to wake up the event. > The remove will go on, the other race is as described earlier. > > cpu0 cpu1 cpu2 > (1)->device_run > mtk_jpeg_multicore > _dec_device_run > queue_work > (jpeg->workqueue, > &ctx->jpeg_work); > (2)mtk_jpegdec_worker > (3)mtk_jpeg_release > v4l2_m2m_cancel_job > wait event > > schedule_delayed_work > (4)mtk_jpeg_job_timeout_w > ork > (5)v4l2_m2m_job_finish > wake up > (6)mtk_jpeg_remove > v4l2_m2m_release > kfree(m2m_dev) > (7)v4l2_m2m_get_curr_priv > Dear zheng, The mtk_jpeg_multicore_dec_device_run function is used for multi-hw jpeg decoding. Instead of scheduling mtk_jpeg_job_timeout_work, mtk_jpegdec_worker is scheduled in this function. The mtk_jpeg_dec_device_run function is used for single hw jpeg decoding, which schedules mtk_jpeg_job_timeout_work. A driver is either a single hw driver or a multi-hw driver and cannot represent both at the same time. mtk_jpeg_job_timeout_work and mtk_jpegdec_worker cannot be scheduled at the same time. So mtk_jpeg_job_timeout_work calls v4l2_m2m_job_finish would not cause competition between the mtk_jpegdec_worker and v4l2_m2m_cancel_job. Regards, Kyrie. > Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年3月8日周三 11:32写道: > > > > On Wed, 2023-03-08 at 10:20 +0800, Zheng Hacker wrote: > > > Hi Kyrie, > > > > > > Thank you for your careful analysis and response. I still have > > > some > > > areas that I don't quite understand and would like to ask for > > > clarification. That is, how do the function pointers for stop > > > streaming, initialized as mtk_jpeg_enc_stop_streaming and > > > mtk_jpeg_dec_stop_streaming, ensure that the worker is canceled? > > > I > > > would greatly appreciate your response. > > > > > > Best regards, > > > Zheng > > > > Dear zheng, > > > > For stop streaming, what I mean is that stoppping jpeg decoding or > > encoding. > > Ok, let me introduce the sw flow of stop streaming: > > Firstly, the app will call v4l2_m2m_ioctl_streamoff, which will > > call > > v4l2_m2m_cancel_job, if it finds a job running(as you note, cpu1 is > > running), it will wait event, the event is wake up by > > v4l2_m2m_job_finish function. And v4l2_m2m_job_finish is called by > > jpeg > > dec/enc irq handler, which means that the waitting would result mtk > > hw > > to finish dec/enc, irq will occur and irq handler would cancel > > timeout > > worker. The follow is shown as blow. > > v4l2_m2m_ioctl_streamoff > > v4l2_m2m_cancel_job mtk_jpeg_enc_irq/mtk_jpeg_dec > > _irq > > wait evnet <------ wake up ------v4l2_m2m_job_finish > > cancel timeout work > > > > As mentioned above, if it is normal stop streaming action, there > > will > > be no happen that the timeout worker does not canceled. > > > > But if mtk_jpeg_remove is called directly without above flow, it > > would > > cause lots of issues. > > > > Regards, > > Kyrie. > > > > > > Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年3月8日周三 10:02写道: > > > > > > > > On Tue, 2023-03-07 at 23:03 +0800, Zheng Hacker wrote: > > > > > The timer function was set in mtk_jpeg_probe with > > > > > mtk_jpeg_job_timeout_work function. > > > > > And the worker is started in mtk_jpeg_dec_device_run. > > > > > There are two functions (mtk_jpeg_enc_irq and > > > > > mtk_jpeg_dec_irq) > > > > > which > > > > > may cancel the worker. > > > > > They are used as IRQ handler function which is saved as > > > > > function > > > > > pointer in a variable. > > > > > In mtk_jpeg_probe, they are registered by devm_request_irq: > > > > > > > > > > ret = devm_request_irq(&pdev->dev, > > > > > jpeg_irq, > > > > > jpeg->variant->irq_handler, > > > > > 0, > > > > > pdev->name, jpeg); > > > > > if (ret) { > > > > > dev_err(&pdev->dev, "Failed to request jpeg_irq %d > > > > > (%d)\n", > > > > > jpeg_irq, ret); > > > > > return ret; > > > > > } > > > > > > > > > > However, if we remove the module without triggering the irq, > > > > > the > > > > > worker will never be removed. > > > > > > > > > > As for the schedule, mtk_jpeg_dec_device_run and > > > > > mtk_jpeg_enc_device_run will start the worker. > > > > > The schedule invoking is quite complicated. As far as I know, > > > > > the > > > > > invoking chain is as follows: > > > > > > > > > > v4l2_m2m_init->v4l2_m2m_device_run_work->v4l2_m2m_try_run > > > > > > > > > > the v4l2_m2m_device_run_work function is also a worker which > > > > > is > > > > > set > > > > > in > > > > > v4l2_m2m_init and started in > > > > > v4l2_m2m_schedule_next_job. > > > > > > > > > > Before calling remove function, the mtk_jpeg_release was > > > > > invoked > > > > > to > > > > > release the related resource. > > > > > > > > > > v4l2_m2m_cancel_job will cancel the job by calling > > > > > m2m_dev->m2m_ops->job_abort(m2m_ctx->priv). > > > > > > > > > > But this will only cancel the current queue by > > > > > list_del(&m2m_dev->curr_ctx->queue); > > > > > > > > > > I think this can not cancel the posted task mentioned before. > > > > > So > > > > > I > > > > > think if mtk_jpeg_job_timeout_work > > > > > > > > > > is working on, and using jpeg->m2m_dev after freeing it in > > > > > mtk_jpeg_remove, it will cause a UAF bug. > > > > > > > > > > static int mtk_jpeg_release(struct file *file) > > > > > { > > > > > struct mtk_jpeg_dev *jpeg = video_drvdata(file); > > > > > struct mtk_jpeg_ctx *ctx = mtk_jpeg_fh_to_ctx(file- > > > > > > private_data); > > > > > > > > > > mutex_lock(&jpeg->lock); > > > > > v4l2_ctrl_handler_free(&ctx->ctrl_hdl); > > > > > [1] v4l2_m2m_ctx_release(ctx->fh.m2m_ctx); > > > > > v4l2_fh_del(&ctx->fh); > > > > > v4l2_fh_exit(&ctx->fh); > > > > > kfree(ctx); > > > > > mutex_unlock(&jpeg->lock); > > > > > return 0; > > > > > } > > > > > > > > > > void v4l2_m2m_ctx_release(struct v4l2_m2m_ctx *m2m_ctx) > > > > > { > > > > > /* wait until the current context is dequeued from > > > > > job_queue */ > > > > > [2] v4l2_m2m_cancel_job(m2m_ctx); > > > > > > > > > > vb2_queue_release(&m2m_ctx->cap_q_ctx.q); > > > > > vb2_queue_release(&m2m_ctx->out_q_ctx.q); > > > > > > > > > > kfree(m2m_ctx); > > > > > } > > > > > > > > > > Note that all of this is static analysis, which may be false > > > > > positive. > > > > > Feel free to tell me if there is something I've missed. > > > > > > > > > > Regard, > > > > > Zheng > > > > > > > > Dear Zheng, > > > > > > > > You set up an application scenario: > > > > cpu1 is using the mtk-jpeg driver and timeout work has been > > > > scheduled. > > > > At the same time cpu0 wanted to remove the mtk-jpeg driver, > > > > which > > > > caused the UAF bug. > > > > I wonder if such an irrational application scenario could > > > > exist. > > > > This > > > > scenario, as you described, not only leads to the problems you > > > > mentioned, but also to output&capture memory leaks and > > > > unreleased > > > > resources, such as spinlock. > > > > Typically, if we want to remove the driver, we firstly do stop > > > > streaming, which ensures that the worker has been canceled. > > > > I agree with your changes from the perspective of strengthening > > > > the > > > > robustness of the driver code. > > > > > > > > Regards, > > > > Kyrie. > > > > > > > > > > Irui Wang (王瑞) <Irui.Wang@mediatek.com> 于2023年3月7日周二 18:23写道: > > > > > > > > > > > > Dear Angelo and Zheng, > > > > > > > > > > > > Thanks for your patch and comments. > > > > > > > > > > > > Dear Kyrie, > > > > > > > > > > > > Please help to check this, thanks. > > > > > > > > > > > > Best Regards > > > > > > > > > > > > On Tue, 2023-03-07 at 10:49 +0100, AngeloGioacchino Del > > > > > > Regno > > > > > > wrote: > > > > > > > Il 07/03/23 10:27, Zheng Hacker ha scritto: > > > > > > > > Hi, > > > > > > > > > > > > > > > > Is there anyone who can help with this? I can provide > > > > > > > > more > > > > > > > > details > > > > > > > > like invoking chain if needed. > > > > > > > > > > > > > > > > > > > > > > Providing more details is always good. Please do. > > > > > > > > > > > > > > Meanwhile, adding Irui Wang to the loop: he's doing mtk- > > > > > > > jpeg. > > > > > > > > > > > > > > Regards, > > > > > > > Angelo > > > > > > > > > > > > > > > Thanks, > > > > > > > > Zheng > > > > > > > > > > > > > > > > Zheng Wang <zyytlz.wz@163.com> 于2023年3月6日周一 14:28写道: > > > > > > > > > > > > > > > > > > In mtk_jpeg_probe, &jpeg->job_timeout_work is bound > > > > > > > > > with > > > > > > > > > mtk_jpeg_job_timeout_work. Then > > > > > > > > > mtk_jpeg_dec_device_run > > > > > > > > > and mtk_jpeg_enc_device_run may be called to start > > > > > > > > > the > > > > > > > > > work. > > > > > > > > > If we remove the module which will call > > > > > > > > > mtk_jpeg_remove > > > > > > > > > to make cleanup, there may be a unfinished work. The > > > > > > > > > possible sequence is as follows, which will cause a > > > > > > > > > typical UAF bug. > > > > > > > > > > > > > > > > > > Fix it by canceling the work before cleanup in the > > > > > > > > > mtk_jpeg_remove > > > > > > > > > > > > > > > > > > CPU0 CPU1 > > > > > > > > > > > > > > > > > > |mtk_jpeg_job_timeout_work > > > > > > > > > mtk_jpeg_remove | > > > > > > > > > v4l2_m2m_release | > > > > > > > > > kfree(m2m_dev); | > > > > > > > > > | > > > > > > > > > | v4l2_m2m_get_curr_priv > > > > > > > > > | m2m_dev->curr_ctx //use > > > > > > > > > > > > > > > > > > Signed-off-by: Zheng Wang <zyytlz.wz@163.com> > > > > > > > > > --- > > > > > > > > > > > > > > > > > > drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > > > > > > > > > | > > > > > > > > > 2 +- > > > > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > > > > > > > > > diff --git > > > > > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core. > > > > > > > > > c > > > > > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core. > > > > > > > > > c > > > > > > > > > index 969516a940ba..364513e7897e 100644 > > > > > > > > > --- > > > > > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core. > > > > > > > > > c > > > > > > > > > +++ > > > > > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core. > > > > > > > > > c > > > > > > > > > @@ -1793,7 +1793,7 @@ static int > > > > > > > > > mtk_jpeg_probe(struct > > > > > > > > > platform_device *pdev) > > > > > > > > > static int mtk_jpeg_remove(struct platform_device > > > > > > > > > *pdev) > > > > > > > > > { > > > > > > > > > struct mtk_jpeg_dev *jpeg = > > > > > > > > > platform_get_drvdata(pdev); > > > > > > > > > - > > > > > > > > > + cancel_delayed_work(&jpeg->job_timeout_work); > > > > > > > > > pm_runtime_disable(&pdev->dev); > > > > > > > > > video_unregister_device(jpeg->vdev); > > > > > > > > > v4l2_m2m_release(jpeg->m2m_dev); > > > > > > > > > -- > > > > > > > > > 2.25.1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
Dear Kyrie, Sorry for my misunderstanding. I've seen the timeout worker is bound with mtk_jpegdec_timeout_work rather than mtk_jpeg_job_timeout_work.So the competition won't happen. Back to the beginning scene, I still don't know if it's a UAF issue or not. The normal schedule is very strong to me. If we can call mtk_jpeg_remove directly without calling mtk_jpeg_release, The UAF is still possible. Otherwis, I think it's safe here. Best regards, Zheng Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年3月8日周三 15:39写道: > > On Wed, 2023-03-08 at 14:10 +0800, Zheng Hacker wrote: > > Hi Kyrie, > > > > After reviewing the code, I found anothe possible code path. As I am > > not familiar with the module. It has high possibility it's wrong. > > Could please help me check this? Very much appreciated for your > > valuable time. > > > > In summary, mtk_jpegdec_worker was set in mtk_jpeg_open and started > > in > > mtk_jpeg_multicore_dec_device_run, which made it running on cpu1. > > Inside the mtk_jpeg_multicore_dec_device_run, it will call > > schedule_delayed_work to start the timeout_work, which will make it > > running on cpu2. Meanwhile, we can call > > mtk_jpeg_release to cancel the job. But there might be a race between > > mtk_jpegdec_worker and v4l2_m2m_cancel_job. It may call > > v4l2_m2m_job_finish too early to wake up the event. > > The remove will go on, the other race is as described earlier. > > > > cpu0 cpu1 cpu2 > > (1)->device_run > > mtk_jpeg_multicore > > _dec_device_run > > queue_work > > (jpeg->workqueue, > > &ctx->jpeg_work); > > (2)mtk_jpegdec_worker > > (3)mtk_jpeg_release > > v4l2_m2m_cancel_job > > wait event > > > > schedule_delayed_work > > (4)mtk_jpeg_job_timeout_w > > ork > > (5)v4l2_m2m_job_finish > > wake up > > (6)mtk_jpeg_remove > > v4l2_m2m_release > > kfree(m2m_dev) > > (7)v4l2_m2m_get_curr_priv > > > Dear zheng, > > The mtk_jpeg_multicore_dec_device_run function is used for multi-hw > jpeg decoding. Instead of scheduling mtk_jpeg_job_timeout_work, > mtk_jpegdec_worker is scheduled in this function. > > The mtk_jpeg_dec_device_run function is used for single hw jpeg > decoding, which schedules mtk_jpeg_job_timeout_work. > > A driver is either a single hw driver or a multi-hw driver and cannot > represent both at the same time. > mtk_jpeg_job_timeout_work and mtk_jpegdec_worker cannot be scheduled at > the same time. > So mtk_jpeg_job_timeout_work calls v4l2_m2m_job_finish would not cause > competition between the mtk_jpegdec_worker and v4l2_m2m_cancel_job. > > Regards, > Kyrie. > > > > Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年3月8日周三 11:32写道: > > > > > > On Wed, 2023-03-08 at 10:20 +0800, Zheng Hacker wrote: > > > > Hi Kyrie, > > > > > > > > Thank you for your careful analysis and response. I still have > > > > some > > > > areas that I don't quite understand and would like to ask for > > > > clarification. That is, how do the function pointers for stop > > > > streaming, initialized as mtk_jpeg_enc_stop_streaming and > > > > mtk_jpeg_dec_stop_streaming, ensure that the worker is canceled? > > > > I > > > > would greatly appreciate your response. > > > > > > > > Best regards, > > > > Zheng > > > > > > Dear zheng, > > > > > > For stop streaming, what I mean is that stoppping jpeg decoding or > > > encoding. > > > Ok, let me introduce the sw flow of stop streaming: > > > Firstly, the app will call v4l2_m2m_ioctl_streamoff, which will > > > call > > > v4l2_m2m_cancel_job, if it finds a job running(as you note, cpu1 is > > > running), it will wait event, the event is wake up by > > > v4l2_m2m_job_finish function. And v4l2_m2m_job_finish is called by > > > jpeg > > > dec/enc irq handler, which means that the waitting would result mtk > > > hw > > > to finish dec/enc, irq will occur and irq handler would cancel > > > timeout > > > worker. The follow is shown as blow. > > > v4l2_m2m_ioctl_streamoff > > > v4l2_m2m_cancel_job mtk_jpeg_enc_irq/mtk_jpeg_dec > > > _irq > > > wait evnet <------ wake up ------v4l2_m2m_job_finish > > > cancel timeout work > > > > > > As mentioned above, if it is normal stop streaming action, there > > > will > > > be no happen that the timeout worker does not canceled. > > > > > > But if mtk_jpeg_remove is called directly without above flow, it > > > would > > > cause lots of issues. > > > > > > Regards, > > > Kyrie. > > > > > > > > Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年3月8日周三 10:02写道: > > > > > > > > > > On Tue, 2023-03-07 at 23:03 +0800, Zheng Hacker wrote: > > > > > > The timer function was set in mtk_jpeg_probe with > > > > > > mtk_jpeg_job_timeout_work function. > > > > > > And the worker is started in mtk_jpeg_dec_device_run. > > > > > > There are two functions (mtk_jpeg_enc_irq and > > > > > > mtk_jpeg_dec_irq) > > > > > > which > > > > > > may cancel the worker. > > > > > > They are used as IRQ handler function which is saved as > > > > > > function > > > > > > pointer in a variable. > > > > > > In mtk_jpeg_probe, they are registered by devm_request_irq: > > > > > > > > > > > > ret = devm_request_irq(&pdev->dev, > > > > > > jpeg_irq, > > > > > > jpeg->variant->irq_handler, > > > > > > 0, > > > > > > pdev->name, jpeg); > > > > > > if (ret) { > > > > > > dev_err(&pdev->dev, "Failed to request jpeg_irq %d > > > > > > (%d)\n", > > > > > > jpeg_irq, ret); > > > > > > return ret; > > > > > > } > > > > > > > > > > > > However, if we remove the module without triggering the irq, > > > > > > the > > > > > > worker will never be removed. > > > > > > > > > > > > As for the schedule, mtk_jpeg_dec_device_run and > > > > > > mtk_jpeg_enc_device_run will start the worker. > > > > > > The schedule invoking is quite complicated. As far as I know, > > > > > > the > > > > > > invoking chain is as follows: > > > > > > > > > > > > v4l2_m2m_init->v4l2_m2m_device_run_work->v4l2_m2m_try_run > > > > > > > > > > > > the v4l2_m2m_device_run_work function is also a worker which > > > > > > is > > > > > > set > > > > > > in > > > > > > v4l2_m2m_init and started in > > > > > > v4l2_m2m_schedule_next_job. > > > > > > > > > > > > Before calling remove function, the mtk_jpeg_release was > > > > > > invoked > > > > > > to > > > > > > release the related resource. > > > > > > > > > > > > v4l2_m2m_cancel_job will cancel the job by calling > > > > > > m2m_dev->m2m_ops->job_abort(m2m_ctx->priv). > > > > > > > > > > > > But this will only cancel the current queue by > > > > > > list_del(&m2m_dev->curr_ctx->queue); > > > > > > > > > > > > I think this can not cancel the posted task mentioned before. > > > > > > So > > > > > > I > > > > > > think if mtk_jpeg_job_timeout_work > > > > > > > > > > > > is working on, and using jpeg->m2m_dev after freeing it in > > > > > > mtk_jpeg_remove, it will cause a UAF bug. > > > > > > > > > > > > static int mtk_jpeg_release(struct file *file) > > > > > > { > > > > > > struct mtk_jpeg_dev *jpeg = video_drvdata(file); > > > > > > struct mtk_jpeg_ctx *ctx = mtk_jpeg_fh_to_ctx(file- > > > > > > > private_data); > > > > > > > > > > > > mutex_lock(&jpeg->lock); > > > > > > v4l2_ctrl_handler_free(&ctx->ctrl_hdl); > > > > > > [1] v4l2_m2m_ctx_release(ctx->fh.m2m_ctx); > > > > > > v4l2_fh_del(&ctx->fh); > > > > > > v4l2_fh_exit(&ctx->fh); > > > > > > kfree(ctx); > > > > > > mutex_unlock(&jpeg->lock); > > > > > > return 0; > > > > > > } > > > > > > > > > > > > void v4l2_m2m_ctx_release(struct v4l2_m2m_ctx *m2m_ctx) > > > > > > { > > > > > > /* wait until the current context is dequeued from > > > > > > job_queue */ > > > > > > [2] v4l2_m2m_cancel_job(m2m_ctx); > > > > > > > > > > > > vb2_queue_release(&m2m_ctx->cap_q_ctx.q); > > > > > > vb2_queue_release(&m2m_ctx->out_q_ctx.q); > > > > > > > > > > > > kfree(m2m_ctx); > > > > > > } > > > > > > > > > > > > Note that all of this is static analysis, which may be false > > > > > > positive. > > > > > > Feel free to tell me if there is something I've missed. > > > > > > > > > > > > Regard, > > > > > > Zheng > > > > > > > > > > Dear Zheng, > > > > > > > > > > You set up an application scenario: > > > > > cpu1 is using the mtk-jpeg driver and timeout work has been > > > > > scheduled. > > > > > At the same time cpu0 wanted to remove the mtk-jpeg driver, > > > > > which > > > > > caused the UAF bug. > > > > > I wonder if such an irrational application scenario could > > > > > exist. > > > > > This > > > > > scenario, as you described, not only leads to the problems you > > > > > mentioned, but also to output&capture memory leaks and > > > > > unreleased > > > > > resources, such as spinlock. > > > > > Typically, if we want to remove the driver, we firstly do stop > > > > > streaming, which ensures that the worker has been canceled. > > > > > I agree with your changes from the perspective of strengthening > > > > > the > > > > > robustness of the driver code. > > > > > > > > > > Regards, > > > > > Kyrie. > > > > > > > > > > > > Irui Wang (王瑞) <Irui.Wang@mediatek.com> 于2023年3月7日周二 18:23写道: > > > > > > > > > > > > > > Dear Angelo and Zheng, > > > > > > > > > > > > > > Thanks for your patch and comments. > > > > > > > > > > > > > > Dear Kyrie, > > > > > > > > > > > > > > Please help to check this, thanks. > > > > > > > > > > > > > > Best Regards > > > > > > > > > > > > > > On Tue, 2023-03-07 at 10:49 +0100, AngeloGioacchino Del > > > > > > > Regno > > > > > > > wrote: > > > > > > > > Il 07/03/23 10:27, Zheng Hacker ha scritto: > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > Is there anyone who can help with this? I can provide > > > > > > > > > more > > > > > > > > > details > > > > > > > > > like invoking chain if needed. > > > > > > > > > > > > > > > > > > > > > > > > > Providing more details is always good. Please do. > > > > > > > > > > > > > > > > Meanwhile, adding Irui Wang to the loop: he's doing mtk- > > > > > > > > jpeg. > > > > > > > > > > > > > > > > Regards, > > > > > > > > Angelo > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > Zheng > > > > > > > > > > > > > > > > > > Zheng Wang <zyytlz.wz@163.com> 于2023年3月6日周一 14:28写道: > > > > > > > > > > > > > > > > > > > > In mtk_jpeg_probe, &jpeg->job_timeout_work is bound > > > > > > > > > > with > > > > > > > > > > mtk_jpeg_job_timeout_work. Then > > > > > > > > > > mtk_jpeg_dec_device_run > > > > > > > > > > and mtk_jpeg_enc_device_run may be called to start > > > > > > > > > > the > > > > > > > > > > work. > > > > > > > > > > If we remove the module which will call > > > > > > > > > > mtk_jpeg_remove > > > > > > > > > > to make cleanup, there may be a unfinished work. The > > > > > > > > > > possible sequence is as follows, which will cause a > > > > > > > > > > typical UAF bug. > > > > > > > > > > > > > > > > > > > > Fix it by canceling the work before cleanup in the > > > > > > > > > > mtk_jpeg_remove > > > > > > > > > > > > > > > > > > > > CPU0 CPU1 > > > > > > > > > > > > > > > > > > > > |mtk_jpeg_job_timeout_work > > > > > > > > > > mtk_jpeg_remove | > > > > > > > > > > v4l2_m2m_release | > > > > > > > > > > kfree(m2m_dev); | > > > > > > > > > > | > > > > > > > > > > | v4l2_m2m_get_curr_priv > > > > > > > > > > | m2m_dev->curr_ctx //use > > > > > > > > > > > > > > > > > > > > Signed-off-by: Zheng Wang <zyytlz.wz@163.com> > > > > > > > > > > --- > > > > > > > > > > > > > > > > > > > > drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > > > > > > > > > > | > > > > > > > > > > 2 +- > > > > > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > > > > > > > > > > > diff --git > > > > > > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core. > > > > > > > > > > c > > > > > > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core. > > > > > > > > > > c > > > > > > > > > > index 969516a940ba..364513e7897e 100644 > > > > > > > > > > --- > > > > > > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core. > > > > > > > > > > c > > > > > > > > > > +++ > > > > > > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core. > > > > > > > > > > c > > > > > > > > > > @@ -1793,7 +1793,7 @@ static int > > > > > > > > > > mtk_jpeg_probe(struct > > > > > > > > > > platform_device *pdev) > > > > > > > > > > static int mtk_jpeg_remove(struct platform_device > > > > > > > > > > *pdev) > > > > > > > > > > { > > > > > > > > > > struct mtk_jpeg_dev *jpeg = > > > > > > > > > > platform_get_drvdata(pdev); > > > > > > > > > > - > > > > > > > > > > + cancel_delayed_work(&jpeg->job_timeout_work); > > > > > > > > > > pm_runtime_disable(&pdev->dev); > > > > > > > > > > video_unregister_device(jpeg->vdev); > > > > > > > > > > v4l2_m2m_release(jpeg->m2m_dev); > > > > > > > > > > -- > > > > > > > > > > 2.25.1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
Dear Kyrie, Guenter gave some advice about this issue in [1]. As we are not familiar with the code here, could you please see if the bug really exists? [1] https://lore.kernel.org/all/e03134f9-9433-ab6b-170a-8ce752fccdeb@roeck-us.net/ Best regards, Zheng Zheng Hacker <hackerzheng666@gmail.com> 于2023年3月8日周三 17:11写道: > > Dear Kyrie, > > Sorry for my misunderstanding. I've seen the timeout worker is bound > with mtk_jpegdec_timeout_work rather than mtk_jpeg_job_timeout_work.So > the competition won't happen. > > Back to the beginning scene, I still don't know if it's a UAF issue or > not. The normal schedule is very strong to me. If we can call > mtk_jpeg_remove directly without calling mtk_jpeg_release, > The UAF is still possible. Otherwis, I think it's safe here. > > Best regards, > Zheng > > Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年3月8日周三 15:39写道: > > > > On Wed, 2023-03-08 at 14:10 +0800, Zheng Hacker wrote: > > > Hi Kyrie, > > > > > > After reviewing the code, I found anothe possible code path. As I am > > > not familiar with the module. It has high possibility it's wrong. > > > Could please help me check this? Very much appreciated for your > > > valuable time. > > > > > > In summary, mtk_jpegdec_worker was set in mtk_jpeg_open and started > > > in > > > mtk_jpeg_multicore_dec_device_run, which made it running on cpu1. > > > Inside the mtk_jpeg_multicore_dec_device_run, it will call > > > schedule_delayed_work to start the timeout_work, which will make it > > > running on cpu2. Meanwhile, we can call > > > mtk_jpeg_release to cancel the job. But there might be a race between > > > mtk_jpegdec_worker and v4l2_m2m_cancel_job. It may call > > > v4l2_m2m_job_finish too early to wake up the event. > > > The remove will go on, the other race is as described earlier. > > > > > > cpu0 cpu1 cpu2 > > > (1)->device_run > > > mtk_jpeg_multicore > > > _dec_device_run > > > queue_work > > > (jpeg->workqueue, > > > &ctx->jpeg_work); > > > (2)mtk_jpegdec_worker > > > (3)mtk_jpeg_release > > > v4l2_m2m_cancel_job > > > wait event > > > > > > schedule_delayed_work > > > (4)mtk_jpeg_job_timeout_w > > > ork > > > (5)v4l2_m2m_job_finish > > > wake up > > > (6)mtk_jpeg_remove > > > v4l2_m2m_release > > > kfree(m2m_dev) > > > (7)v4l2_m2m_get_curr_priv > > > > > Dear zheng, > > > > The mtk_jpeg_multicore_dec_device_run function is used for multi-hw > > jpeg decoding. Instead of scheduling mtk_jpeg_job_timeout_work, > > mtk_jpegdec_worker is scheduled in this function. > > > > The mtk_jpeg_dec_device_run function is used for single hw jpeg > > decoding, which schedules mtk_jpeg_job_timeout_work. > > > > A driver is either a single hw driver or a multi-hw driver and cannot > > represent both at the same time. > > mtk_jpeg_job_timeout_work and mtk_jpegdec_worker cannot be scheduled at > > the same time. > > So mtk_jpeg_job_timeout_work calls v4l2_m2m_job_finish would not cause > > competition between the mtk_jpegdec_worker and v4l2_m2m_cancel_job. > > > > Regards, > > Kyrie. > > > > > > > Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年3月8日周三 11:32写道: > > > > > > > > On Wed, 2023-03-08 at 10:20 +0800, Zheng Hacker wrote: > > > > > Hi Kyrie, > > > > > > > > > > Thank you for your careful analysis and response. I still have > > > > > some > > > > > areas that I don't quite understand and would like to ask for > > > > > clarification. That is, how do the function pointers for stop > > > > > streaming, initialized as mtk_jpeg_enc_stop_streaming and > > > > > mtk_jpeg_dec_stop_streaming, ensure that the worker is canceled? > > > > > I > > > > > would greatly appreciate your response. > > > > > > > > > > Best regards, > > > > > Zheng > > > > > > > > Dear zheng, > > > > > > > > For stop streaming, what I mean is that stoppping jpeg decoding or > > > > encoding. > > > > Ok, let me introduce the sw flow of stop streaming: > > > > Firstly, the app will call v4l2_m2m_ioctl_streamoff, which will > > > > call > > > > v4l2_m2m_cancel_job, if it finds a job running(as you note, cpu1 is > > > > running), it will wait event, the event is wake up by > > > > v4l2_m2m_job_finish function. And v4l2_m2m_job_finish is called by > > > > jpeg > > > > dec/enc irq handler, which means that the waitting would result mtk > > > > hw > > > > to finish dec/enc, irq will occur and irq handler would cancel > > > > timeout > > > > worker. The follow is shown as blow. > > > > v4l2_m2m_ioctl_streamoff > > > > v4l2_m2m_cancel_job mtk_jpeg_enc_irq/mtk_jpeg_dec > > > > _irq > > > > wait evnet <------ wake up ------v4l2_m2m_job_finish > > > > cancel timeout work > > > > > > > > As mentioned above, if it is normal stop streaming action, there > > > > will > > > > be no happen that the timeout worker does not canceled. > > > > > > > > But if mtk_jpeg_remove is called directly without above flow, it > > > > would > > > > cause lots of issues. > > > > > > > > Regards, > > > > Kyrie. > > > > > > > > > > Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年3月8日周三 10:02写道: > > > > > > > > > > > > On Tue, 2023-03-07 at 23:03 +0800, Zheng Hacker wrote: > > > > > > > The timer function was set in mtk_jpeg_probe with > > > > > > > mtk_jpeg_job_timeout_work function. > > > > > > > And the worker is started in mtk_jpeg_dec_device_run. > > > > > > > There are two functions (mtk_jpeg_enc_irq and > > > > > > > mtk_jpeg_dec_irq) > > > > > > > which > > > > > > > may cancel the worker. > > > > > > > They are used as IRQ handler function which is saved as > > > > > > > function > > > > > > > pointer in a variable. > > > > > > > In mtk_jpeg_probe, they are registered by devm_request_irq: > > > > > > > > > > > > > > ret = devm_request_irq(&pdev->dev, > > > > > > > jpeg_irq, > > > > > > > jpeg->variant->irq_handler, > > > > > > > 0, > > > > > > > pdev->name, jpeg); > > > > > > > if (ret) { > > > > > > > dev_err(&pdev->dev, "Failed to request jpeg_irq %d > > > > > > > (%d)\n", > > > > > > > jpeg_irq, ret); > > > > > > > return ret; > > > > > > > } > > > > > > > > > > > > > > However, if we remove the module without triggering the irq, > > > > > > > the > > > > > > > worker will never be removed. > > > > > > > > > > > > > > As for the schedule, mtk_jpeg_dec_device_run and > > > > > > > mtk_jpeg_enc_device_run will start the worker. > > > > > > > The schedule invoking is quite complicated. As far as I know, > > > > > > > the > > > > > > > invoking chain is as follows: > > > > > > > > > > > > > > v4l2_m2m_init->v4l2_m2m_device_run_work->v4l2_m2m_try_run > > > > > > > > > > > > > > the v4l2_m2m_device_run_work function is also a worker which > > > > > > > is > > > > > > > set > > > > > > > in > > > > > > > v4l2_m2m_init and started in > > > > > > > v4l2_m2m_schedule_next_job. > > > > > > > > > > > > > > Before calling remove function, the mtk_jpeg_release was > > > > > > > invoked > > > > > > > to > > > > > > > release the related resource. > > > > > > > > > > > > > > v4l2_m2m_cancel_job will cancel the job by calling > > > > > > > m2m_dev->m2m_ops->job_abort(m2m_ctx->priv). > > > > > > > > > > > > > > But this will only cancel the current queue by > > > > > > > list_del(&m2m_dev->curr_ctx->queue); > > > > > > > > > > > > > > I think this can not cancel the posted task mentioned before. > > > > > > > So > > > > > > > I > > > > > > > think if mtk_jpeg_job_timeout_work > > > > > > > > > > > > > > is working on, and using jpeg->m2m_dev after freeing it in > > > > > > > mtk_jpeg_remove, it will cause a UAF bug. > > > > > > > > > > > > > > static int mtk_jpeg_release(struct file *file) > > > > > > > { > > > > > > > struct mtk_jpeg_dev *jpeg = video_drvdata(file); > > > > > > > struct mtk_jpeg_ctx *ctx = mtk_jpeg_fh_to_ctx(file- > > > > > > > > private_data); > > > > > > > > > > > > > > mutex_lock(&jpeg->lock); > > > > > > > v4l2_ctrl_handler_free(&ctx->ctrl_hdl); > > > > > > > [1] v4l2_m2m_ctx_release(ctx->fh.m2m_ctx); > > > > > > > v4l2_fh_del(&ctx->fh); > > > > > > > v4l2_fh_exit(&ctx->fh); > > > > > > > kfree(ctx); > > > > > > > mutex_unlock(&jpeg->lock); > > > > > > > return 0; > > > > > > > } > > > > > > > > > > > > > > void v4l2_m2m_ctx_release(struct v4l2_m2m_ctx *m2m_ctx) > > > > > > > { > > > > > > > /* wait until the current context is dequeued from > > > > > > > job_queue */ > > > > > > > [2] v4l2_m2m_cancel_job(m2m_ctx); > > > > > > > > > > > > > > vb2_queue_release(&m2m_ctx->cap_q_ctx.q); > > > > > > > vb2_queue_release(&m2m_ctx->out_q_ctx.q); > > > > > > > > > > > > > > kfree(m2m_ctx); > > > > > > > } > > > > > > > > > > > > > > Note that all of this is static analysis, which may be false > > > > > > > positive. > > > > > > > Feel free to tell me if there is something I've missed. > > > > > > > > > > > > > > Regard, > > > > > > > Zheng > > > > > > > > > > > > Dear Zheng, > > > > > > > > > > > > You set up an application scenario: > > > > > > cpu1 is using the mtk-jpeg driver and timeout work has been > > > > > > scheduled. > > > > > > At the same time cpu0 wanted to remove the mtk-jpeg driver, > > > > > > which > > > > > > caused the UAF bug. > > > > > > I wonder if such an irrational application scenario could > > > > > > exist. > > > > > > This > > > > > > scenario, as you described, not only leads to the problems you > > > > > > mentioned, but also to output&capture memory leaks and > > > > > > unreleased > > > > > > resources, such as spinlock. > > > > > > Typically, if we want to remove the driver, we firstly do stop > > > > > > streaming, which ensures that the worker has been canceled. > > > > > > I agree with your changes from the perspective of strengthening > > > > > > the > > > > > > robustness of the driver code. > > > > > > > > > > > > Regards, > > > > > > Kyrie. > > > > > > > > > > > > > > Irui Wang (王瑞) <Irui.Wang@mediatek.com> 于2023年3月7日周二 18:23写道: > > > > > > > > > > > > > > > > Dear Angelo and Zheng, > > > > > > > > > > > > > > > > Thanks for your patch and comments. > > > > > > > > > > > > > > > > Dear Kyrie, > > > > > > > > > > > > > > > > Please help to check this, thanks. > > > > > > > > > > > > > > > > Best Regards > > > > > > > > > > > > > > > > On Tue, 2023-03-07 at 10:49 +0100, AngeloGioacchino Del > > > > > > > > Regno > > > > > > > > wrote: > > > > > > > > > Il 07/03/23 10:27, Zheng Hacker ha scritto: > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > Is there anyone who can help with this? I can provide > > > > > > > > > > more > > > > > > > > > > details > > > > > > > > > > like invoking chain if needed. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Providing more details is always good. Please do. > > > > > > > > > > > > > > > > > > Meanwhile, adding Irui Wang to the loop: he's doing mtk- > > > > > > > > > jpeg. > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > Angelo > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > Zheng > > > > > > > > > > > > > > > > > > > > Zheng Wang <zyytlz.wz@163.com> 于2023年3月6日周一 14:28写道: > > > > > > > > > > > > > > > > > > > > > > In mtk_jpeg_probe, &jpeg->job_timeout_work is bound > > > > > > > > > > > with > > > > > > > > > > > mtk_jpeg_job_timeout_work. Then > > > > > > > > > > > mtk_jpeg_dec_device_run > > > > > > > > > > > and mtk_jpeg_enc_device_run may be called to start > > > > > > > > > > > the > > > > > > > > > > > work. > > > > > > > > > > > If we remove the module which will call > > > > > > > > > > > mtk_jpeg_remove > > > > > > > > > > > to make cleanup, there may be a unfinished work. The > > > > > > > > > > > possible sequence is as follows, which will cause a > > > > > > > > > > > typical UAF bug. > > > > > > > > > > > > > > > > > > > > > > Fix it by canceling the work before cleanup in the > > > > > > > > > > > mtk_jpeg_remove > > > > > > > > > > > > > > > > > > > > > > CPU0 CPU1 > > > > > > > > > > > > > > > > > > > > > > |mtk_jpeg_job_timeout_work > > > > > > > > > > > mtk_jpeg_remove | > > > > > > > > > > > v4l2_m2m_release | > > > > > > > > > > > kfree(m2m_dev); | > > > > > > > > > > > | > > > > > > > > > > > | v4l2_m2m_get_curr_priv > > > > > > > > > > > | m2m_dev->curr_ctx //use > > > > > > > > > > > > > > > > > > > > > > Signed-off-by: Zheng Wang <zyytlz.wz@163.com> > > > > > > > > > > > --- > > > > > > > > > > > > > > > > > > > > > > drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c > > > > > > > > > > > | > > > > > > > > > > > 2 +- > > > > > > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > > > > > > > > > > > > > diff --git > > > > > > > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core. > > > > > > > > > > > c > > > > > > > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core. > > > > > > > > > > > c > > > > > > > > > > > index 969516a940ba..364513e7897e 100644 > > > > > > > > > > > --- > > > > > > > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core. > > > > > > > > > > > c > > > > > > > > > > > +++ > > > > > > > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core. > > > > > > > > > > > c > > > > > > > > > > > @@ -1793,7 +1793,7 @@ static int > > > > > > > > > > > mtk_jpeg_probe(struct > > > > > > > > > > > platform_device *pdev) > > > > > > > > > > > static int mtk_jpeg_remove(struct platform_device > > > > > > > > > > > *pdev) > > > > > > > > > > > { > > > > > > > > > > > struct mtk_jpeg_dev *jpeg = > > > > > > > > > > > platform_get_drvdata(pdev); > > > > > > > > > > > - > > > > > > > > > > > + cancel_delayed_work(&jpeg->job_timeout_work); > > > > > > > > > > > pm_runtime_disable(&pdev->dev); > > > > > > > > > > > video_unregister_device(jpeg->vdev); > > > > > > > > > > > v4l2_m2m_release(jpeg->m2m_dev); > > > > > > > > > > > -- > > > > > > > > > > > 2.25.1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
On Thu, 2023-03-09 at 15:31 +0800, Zheng Hacker wrote: > Dear Kyrie, > > Guenter gave some advice about this issue in [1]. As we are not > familiar with the code here, could you please see if the bug really > exists? > > [1] > https://lore.kernel.org/all/e03134f9-9433-ab6b-170a-8ce752fccdeb@roeck-us.net/ > > Best regards, > Zheng Dear zheng, I have no more questions for your patch, thank you for your contribution to mtk jpeg driver. Regards, Kyrie. > > > > Zheng Hacker <hackerzheng666@gmail.com> 于2023年3月8日周三 17:11写道: > > > > Dear Kyrie, > > > > Sorry for my misunderstanding. I've seen the timeout worker is > > bound > > with mtk_jpegdec_timeout_work rather than > > mtk_jpeg_job_timeout_work.So > > the competition won't happen. > > > > Back to the beginning scene, I still don't know if it's a UAF issue > > or > > not. The normal schedule is very strong to me. If we can call > > mtk_jpeg_remove directly without calling mtk_jpeg_release, > > The UAF is still possible. Otherwis, I think it's safe here. > > > > Best regards, > > Zheng > > > > Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年3月8日周三 15:39写道: > > > > > > On Wed, 2023-03-08 at 14:10 +0800, Zheng Hacker wrote: > > > > Hi Kyrie, > > > > > > > > After reviewing the code, I found anothe possible code path. As > > > > I am > > > > not familiar with the module. It has high possibility it's > > > > wrong. > > > > Could please help me check this? Very much appreciated for your > > > > valuable time. > > > > > > > > In summary, mtk_jpegdec_worker was set in mtk_jpeg_open and > > > > started > > > > in > > > > mtk_jpeg_multicore_dec_device_run, which made it running on > > > > cpu1. > > > > Inside the mtk_jpeg_multicore_dec_device_run, it will call > > > > schedule_delayed_work to start the timeout_work, which will > > > > make it > > > > running on cpu2. Meanwhile, we can call > > > > mtk_jpeg_release to cancel the job. But there might be a race > > > > between > > > > mtk_jpegdec_worker and v4l2_m2m_cancel_job. It may call > > > > v4l2_m2m_job_finish too early to wake up the event. > > > > The remove will go on, the other race is as described earlier. > > > > > > > > cpu0 cpu1 cpu2 > > > > (1)->device_run > > > > mtk_jpeg_multicore > > > > _dec_device_run > > > > queue_work > > > > (jpeg->workqueue, > > > > &ctx->jpeg_work); > > > > (2)mtk_jpegdec_worker > > > > (3)mtk_jpeg_release > > > > v4l2_m2m_cancel_job > > > > wait event > > > > > > > > schedule_delayed_work > > > > (4)mtk_jpeg_job_tim > > > > eout_w > > > > ork > > > > (5)v4l2_m2m_job_finish > > > > wake up > > > > (6)mtk_jpeg_remove > > > > v4l2_m2m_release > > > > kfree(m2m_dev) > > > > (7)v4l2_m2m_get_cur > > > > r_priv > > > > > > > > > > Dear zheng, > > > > > > The mtk_jpeg_multicore_dec_device_run function is used for multi- > > > hw > > > jpeg decoding. Instead of scheduling mtk_jpeg_job_timeout_work, > > > mtk_jpegdec_worker is scheduled in this function. > > > > > > The mtk_jpeg_dec_device_run function is used for single hw jpeg > > > decoding, which schedules mtk_jpeg_job_timeout_work. > > > > > > A driver is either a single hw driver or a multi-hw driver and > > > cannot > > > represent both at the same time. > > > mtk_jpeg_job_timeout_work and mtk_jpegdec_worker cannot be > > > scheduled at > > > the same time. > > > So mtk_jpeg_job_timeout_work calls v4l2_m2m_job_finish would not > > > cause > > > competition between the mtk_jpegdec_worker and > > > v4l2_m2m_cancel_job. > > > > > > Regards, > > > Kyrie. > > > > > > > > > > Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年3月8日周三 11:32写道: > > > > > > > > > > On Wed, 2023-03-08 at 10:20 +0800, Zheng Hacker wrote: > > > > > > Hi Kyrie, > > > > > > > > > > > > Thank you for your careful analysis and response. I still > > > > > > have > > > > > > some > > > > > > areas that I don't quite understand and would like to ask > > > > > > for > > > > > > clarification. That is, how do the function pointers for > > > > > > stop > > > > > > streaming, initialized as mtk_jpeg_enc_stop_streaming and > > > > > > mtk_jpeg_dec_stop_streaming, ensure that the worker is > > > > > > canceled? > > > > > > I > > > > > > would greatly appreciate your response. > > > > > > > > > > > > Best regards, > > > > > > Zheng > > > > > > > > > > Dear zheng, > > > > > > > > > > For stop streaming, what I mean is that stoppping jpeg > > > > > decoding or > > > > > encoding. > > > > > Ok, let me introduce the sw flow of stop streaming: > > > > > Firstly, the app will call v4l2_m2m_ioctl_streamoff, which > > > > > will > > > > > call > > > > > v4l2_m2m_cancel_job, if it finds a job running(as you note, > > > > > cpu1 is > > > > > running), it will wait event, the event is wake up by > > > > > v4l2_m2m_job_finish function. And v4l2_m2m_job_finish is > > > > > called by > > > > > jpeg > > > > > dec/enc irq handler, which means that the waitting would > > > > > result mtk > > > > > hw > > > > > to finish dec/enc, irq will occur and irq handler would > > > > > cancel > > > > > timeout > > > > > worker. The follow is shown as blow. > > > > > v4l2_m2m_ioctl_streamoff > > > > > v4l2_m2m_cancel_job mtk_jpeg_enc_irq/mtk_jp > > > > > eg_dec > > > > > _irq > > > > > wait evnet <------ wake up ------v4l2_m2m_job_finish > > > > > cancel timeout work > > > > > > > > > > As mentioned above, if it is normal stop streaming action, > > > > > there > > > > > will > > > > > be no happen that the timeout worker does not canceled. > > > > > > > > > > But if mtk_jpeg_remove is called directly without above flow, > > > > > it > > > > > would > > > > > cause lots of issues. > > > > > > > > > > Regards, > > > > > Kyrie. > > > > > > > > > > > > Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年3月8日周三 10:02写道: > > > > > > > > > > > > > > On Tue, 2023-03-07 at 23:03 +0800, Zheng Hacker wrote: > > > > > > > > The timer function was set in mtk_jpeg_probe with > > > > > > > > mtk_jpeg_job_timeout_work function. > > > > > > > > And the worker is started in mtk_jpeg_dec_device_run. > > > > > > > > There are two functions (mtk_jpeg_enc_irq and > > > > > > > > mtk_jpeg_dec_irq) > > > > > > > > which > > > > > > > > may cancel the worker. > > > > > > > > They are used as IRQ handler function which is saved as > > > > > > > > function > > > > > > > > pointer in a variable. > > > > > > > > In mtk_jpeg_probe, they are registered by > > > > > > > > devm_request_irq: > > > > > > > > > > > > > > > > ret = devm_request_irq(&pdev->dev, > > > > > > > > jpeg_irq, > > > > > > > > jpeg->variant->irq_handler, > > > > > > > > 0, > > > > > > > > pdev->name, jpeg); > > > > > > > > if (ret) { > > > > > > > > dev_err(&pdev->dev, "Failed to request jpeg_irq > > > > > > > > %d > > > > > > > > (%d)\n", > > > > > > > > jpeg_irq, ret); > > > > > > > > return ret; > > > > > > > > } > > > > > > > > > > > > > > > > However, if we remove the module without triggering the > > > > > > > > irq, > > > > > > > > the > > > > > > > > worker will never be removed. > > > > > > > > > > > > > > > > As for the schedule, mtk_jpeg_dec_device_run and > > > > > > > > mtk_jpeg_enc_device_run will start the worker. > > > > > > > > The schedule invoking is quite complicated. As far as I > > > > > > > > know, > > > > > > > > the > > > > > > > > invoking chain is as follows: > > > > > > > > > > > > > > > > v4l2_m2m_init->v4l2_m2m_device_run_work- > > > > > > > > >v4l2_m2m_try_run > > > > > > > > > > > > > > > > the v4l2_m2m_device_run_work function is also a worker > > > > > > > > which > > > > > > > > is > > > > > > > > set > > > > > > > > in > > > > > > > > v4l2_m2m_init and started in > > > > > > > > v4l2_m2m_schedule_next_job. > > > > > > > > > > > > > > > > Before calling remove function, the mtk_jpeg_release > > > > > > > > was > > > > > > > > invoked > > > > > > > > to > > > > > > > > release the related resource. > > > > > > > > > > > > > > > > v4l2_m2m_cancel_job will cancel the job by calling > > > > > > > > m2m_dev->m2m_ops->job_abort(m2m_ctx->priv). > > > > > > > > > > > > > > > > But this will only cancel the current queue by > > > > > > > > list_del(&m2m_dev->curr_ctx->queue); > > > > > > > > > > > > > > > > I think this can not cancel the posted task mentioned > > > > > > > > before. > > > > > > > > So > > > > > > > > I > > > > > > > > think if mtk_jpeg_job_timeout_work > > > > > > > > > > > > > > > > is working on, and using jpeg->m2m_dev after freeing it > > > > > > > > in > > > > > > > > mtk_jpeg_remove, it will cause a UAF bug. > > > > > > > > > > > > > > > > static int mtk_jpeg_release(struct file *file) > > > > > > > > { > > > > > > > > struct mtk_jpeg_dev *jpeg = video_drvdata(file); > > > > > > > > struct mtk_jpeg_ctx *ctx = mtk_jpeg_fh_to_ctx(file- > > > > > > > > > private_data); > > > > > > > > > > > > > > > > mutex_lock(&jpeg->lock); > > > > > > > > v4l2_ctrl_handler_free(&ctx->ctrl_hdl); > > > > > > > > [1] v4l2_m2m_ctx_release(ctx->fh.m2m_ctx); > > > > > > > > v4l2_fh_del(&ctx->fh); > > > > > > > > v4l2_fh_exit(&ctx->fh); > > > > > > > > kfree(ctx); > > > > > > > > mutex_unlock(&jpeg->lock); > > > > > > > > return 0; > > > > > > > > } > > > > > > > > > > > > > > > > void v4l2_m2m_ctx_release(struct v4l2_m2m_ctx *m2m_ctx) > > > > > > > > { > > > > > > > > /* wait until the current context is dequeued from > > > > > > > > job_queue */ > > > > > > > > [2] v4l2_m2m_cancel_job(m2m_ctx); > > > > > > > > > > > > > > > > vb2_queue_release(&m2m_ctx->cap_q_ctx.q); > > > > > > > > vb2_queue_release(&m2m_ctx->out_q_ctx.q); > > > > > > > > > > > > > > > > kfree(m2m_ctx); > > > > > > > > } > > > > > > > > > > > > > > > > Note that all of this is static analysis, which may be > > > > > > > > false > > > > > > > > positive. > > > > > > > > Feel free to tell me if there is something I've missed. > > > > > > > > > > > > > > > > Regard, > > > > > > > > Zheng > > > > > > > > > > > > > > Dear Zheng, > > > > > > > > > > > > > > You set up an application scenario: > > > > > > > cpu1 is using the mtk-jpeg driver and timeout work has > > > > > > > been > > > > > > > scheduled. > > > > > > > At the same time cpu0 wanted to remove the mtk-jpeg > > > > > > > driver, > > > > > > > which > > > > > > > caused the UAF bug. > > > > > > > I wonder if such an irrational application scenario could > > > > > > > exist. > > > > > > > This > > > > > > > scenario, as you described, not only leads to the > > > > > > > problems you > > > > > > > mentioned, but also to output&capture memory leaks and > > > > > > > unreleased > > > > > > > resources, such as spinlock. > > > > > > > Typically, if we want to remove the driver, we firstly do > > > > > > > stop > > > > > > > streaming, which ensures that the worker has been > > > > > > > canceled. > > > > > > > I agree with your changes from the perspective of > > > > > > > strengthening > > > > > > > the > > > > > > > robustness of the driver code. > > > > > > > > > > > > > > Regards, > > > > > > > Kyrie. > > > > > > > > > > > > > > > > Irui Wang (王瑞) <Irui.Wang@mediatek.com> 于2023年3月7日周二 > > > > > > > > 18:23写道: > > > > > > > > > > > > > > > > > > Dear Angelo and Zheng, > > > > > > > > > > > > > > > > > > Thanks for your patch and comments. > > > > > > > > > > > > > > > > > > Dear Kyrie, > > > > > > > > > > > > > > > > > > Please help to check this, thanks. > > > > > > > > > > > > > > > > > > Best Regards > > > > > > > > > > > > > > > > > > On Tue, 2023-03-07 at 10:49 +0100, AngeloGioacchino > > > > > > > > > Del > > > > > > > > > Regno > > > > > > > > > wrote: > > > > > > > > > > Il 07/03/23 10:27, Zheng Hacker ha scritto: > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > > > Is there anyone who can help with this? I can > > > > > > > > > > > provide > > > > > > > > > > > more > > > > > > > > > > > details > > > > > > > > > > > like invoking chain if needed. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Providing more details is always good. Please do. > > > > > > > > > > > > > > > > > > > > Meanwhile, adding Irui Wang to the loop: he's doing > > > > > > > > > > mtk- > > > > > > > > > > jpeg. > > > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > > Angelo > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > Zheng > > > > > > > > > > > > > > > > > > > > > > Zheng Wang <zyytlz.wz@163.com> 于2023年3月6日周一 > > > > > > > > > > > 14:28写道: > > > > > > > > > > > > > > > > > > > > > > > > In mtk_jpeg_probe, &jpeg->job_timeout_work is > > > > > > > > > > > > bound > > > > > > > > > > > > with > > > > > > > > > > > > mtk_jpeg_job_timeout_work. Then > > > > > > > > > > > > mtk_jpeg_dec_device_run > > > > > > > > > > > > and mtk_jpeg_enc_device_run may be called to > > > > > > > > > > > > start > > > > > > > > > > > > the > > > > > > > > > > > > work. > > > > > > > > > > > > If we remove the module which will call > > > > > > > > > > > > mtk_jpeg_remove > > > > > > > > > > > > to make cleanup, there may be a unfinished > > > > > > > > > > > > work. The > > > > > > > > > > > > possible sequence is as follows, which will > > > > > > > > > > > > cause a > > > > > > > > > > > > typical UAF bug. > > > > > > > > > > > > > > > > > > > > > > > > Fix it by canceling the work before cleanup in > > > > > > > > > > > > the > > > > > > > > > > > > mtk_jpeg_remove > > > > > > > > > > > > > > > > > > > > > > > > CPU0 CPU1 > > > > > > > > > > > > > > > > > > > > > > > > |mtk_jpeg_job_timeout_work > > > > > > > > > > > > mtk_jpeg_remove | > > > > > > > > > > > > v4l2_m2m_release | > > > > > > > > > > > > kfree(m2m_dev); | > > > > > > > > > > > > | > > > > > > > > > > > > | v4l2_m2m_get_curr_priv > > > > > > > > > > > > | m2m_dev->curr_ctx > > > > > > > > > > > > //use > > > > > > > > > > > > > > > > > > > > > > > > Signed-off-by: Zheng Wang <zyytlz.wz@163.com> > > > > > > > > > > > > --- > > > > > > > > > > > > > > > > > > > > > > > > drivers/media/platform/mediatek/jpeg/mtk_jpeg_c > > > > > > > > > > > > ore.c > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2 +- > > > > > > > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > > > > > > > > > > > > > > > diff --git > > > > > > > > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_jpeg > > > > > > > > > > > > _core. > > > > > > > > > > > > c > > > > > > > > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_jpeg > > > > > > > > > > > > _core. > > > > > > > > > > > > c > > > > > > > > > > > > index 969516a940ba..364513e7897e 100644 > > > > > > > > > > > > --- > > > > > > > > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_jpeg > > > > > > > > > > > > _core. > > > > > > > > > > > > c > > > > > > > > > > > > +++ > > > > > > > > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_jpeg > > > > > > > > > > > > _core. > > > > > > > > > > > > c > > > > > > > > > > > > @@ -1793,7 +1793,7 @@ static int > > > > > > > > > > > > mtk_jpeg_probe(struct > > > > > > > > > > > > platform_device *pdev) > > > > > > > > > > > > static int mtk_jpeg_remove(struct > > > > > > > > > > > > platform_device > > > > > > > > > > > > *pdev) > > > > > > > > > > > > { > > > > > > > > > > > > struct mtk_jpeg_dev *jpeg = > > > > > > > > > > > > platform_get_drvdata(pdev); > > > > > > > > > > > > - > > > > > > > > > > > > + cancel_delayed_work(&jpeg- > > > > > > > > > > > > >job_timeout_work); > > > > > > > > > > > > pm_runtime_disable(&pdev->dev); > > > > > > > > > > > > video_unregister_device(jpeg->vdev); > > > > > > > > > > > > v4l2_m2m_release(jpeg->m2m_dev); > > > > > > > > > > > > -- > > > > > > > > > > > > 2.25.1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年3月14日周二 10:49写道: > > On Thu, 2023-03-09 at 15:31 +0800, Zheng Hacker wrote: > > Dear Kyrie, > > > > Guenter gave some advice about this issue in [1]. As we are not > > familiar with the code here, could you please see if the bug really > > exists? > > > > [1] > > https://lore.kernel.org/all/e03134f9-9433-ab6b-170a-8ce752fccdeb@roeck-us.net/ > > > > Best regards, > > Zheng > Dear zheng, > > I have no more questions for your patch, thank you for your > contribution to mtk jpeg driver. > Thanks for your detailed review and explanation about the code. I have learned a lot from it. Have a nice day :) Best regards, Zheng > > > > > > > > Zheng Hacker <hackerzheng666@gmail.com> 于2023年3月8日周三 17:11写道: > > > > > > Dear Kyrie, > > > > > > Sorry for my misunderstanding. I've seen the timeout worker is > > > bound > > > with mtk_jpegdec_timeout_work rather than > > > mtk_jpeg_job_timeout_work.So > > > the competition won't happen. > > > > > > Back to the beginning scene, I still don't know if it's a UAF issue > > > or > > > not. The normal schedule is very strong to me. If we can call > > > mtk_jpeg_remove directly without calling mtk_jpeg_release, > > > The UAF is still possible. Otherwis, I think it's safe here. > > > > > > Best regards, > > > Zheng > > > > > > Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年3月8日周三 15:39写道: > > > > > > > > On Wed, 2023-03-08 at 14:10 +0800, Zheng Hacker wrote: > > > > > Hi Kyrie, > > > > > > > > > > After reviewing the code, I found anothe possible code path. As > > > > > I am > > > > > not familiar with the module. It has high possibility it's > > > > > wrong. > > > > > Could please help me check this? Very much appreciated for your > > > > > valuable time. > > > > > > > > > > In summary, mtk_jpegdec_worker was set in mtk_jpeg_open and > > > > > started > > > > > in > > > > > mtk_jpeg_multicore_dec_device_run, which made it running on > > > > > cpu1. > > > > > Inside the mtk_jpeg_multicore_dec_device_run, it will call > > > > > schedule_delayed_work to start the timeout_work, which will > > > > > make it > > > > > running on cpu2. Meanwhile, we can call > > > > > mtk_jpeg_release to cancel the job. But there might be a race > > > > > between > > > > > mtk_jpegdec_worker and v4l2_m2m_cancel_job. It may call > > > > > v4l2_m2m_job_finish too early to wake up the event. > > > > > The remove will go on, the other race is as described earlier. > > > > > > > > > > cpu0 cpu1 cpu2 > > > > > (1)->device_run > > > > > mtk_jpeg_multicore > > > > > _dec_device_run > > > > > queue_work > > > > > (jpeg->workqueue, > > > > > &ctx->jpeg_work); > > > > > (2)mtk_jpegdec_worker > > > > > (3)mtk_jpeg_release > > > > > v4l2_m2m_cancel_job > > > > > wait event > > > > > > > > > > schedule_delayed_work > > > > > (4)mtk_jpeg_job_tim > > > > > eout_w > > > > > ork > > > > > (5)v4l2_m2m_job_finish > > > > > wake up > > > > > (6)mtk_jpeg_remove > > > > > v4l2_m2m_release > > > > > kfree(m2m_dev) > > > > > (7)v4l2_m2m_get_cur > > > > > r_priv > > > > > > > > > > > > > Dear zheng, > > > > > > > > The mtk_jpeg_multicore_dec_device_run function is used for multi- > > > > hw > > > > jpeg decoding. Instead of scheduling mtk_jpeg_job_timeout_work, > > > > mtk_jpegdec_worker is scheduled in this function. > > > > > > > > The mtk_jpeg_dec_device_run function is used for single hw jpeg > > > > decoding, which schedules mtk_jpeg_job_timeout_work. > > > > > > > > A driver is either a single hw driver or a multi-hw driver and > > > > cannot > > > > represent both at the same time. > > > > mtk_jpeg_job_timeout_work and mtk_jpegdec_worker cannot be > > > > scheduled at > > > > the same time. > > > > So mtk_jpeg_job_timeout_work calls v4l2_m2m_job_finish would not > > > > cause > > > > competition between the mtk_jpegdec_worker and > > > > v4l2_m2m_cancel_job. > > > > > > > > Regards, > > > > Kyrie. > > > > > > > > > > > > > Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年3月8日周三 11:32写道: > > > > > > > > > > > > On Wed, 2023-03-08 at 10:20 +0800, Zheng Hacker wrote: > > > > > > > Hi Kyrie, > > > > > > > > > > > > > > Thank you for your careful analysis and response. I still > > > > > > > have > > > > > > > some > > > > > > > areas that I don't quite understand and would like to ask > > > > > > > for > > > > > > > clarification. That is, how do the function pointers for > > > > > > > stop > > > > > > > streaming, initialized as mtk_jpeg_enc_stop_streaming and > > > > > > > mtk_jpeg_dec_stop_streaming, ensure that the worker is > > > > > > > canceled? > > > > > > > I > > > > > > > would greatly appreciate your response. > > > > > > > > > > > > > > Best regards, > > > > > > > Zheng > > > > > > > > > > > > Dear zheng, > > > > > > > > > > > > For stop streaming, what I mean is that stoppping jpeg > > > > > > decoding or > > > > > > encoding. > > > > > > Ok, let me introduce the sw flow of stop streaming: > > > > > > Firstly, the app will call v4l2_m2m_ioctl_streamoff, which > > > > > > will > > > > > > call > > > > > > v4l2_m2m_cancel_job, if it finds a job running(as you note, > > > > > > cpu1 is > > > > > > running), it will wait event, the event is wake up by > > > > > > v4l2_m2m_job_finish function. And v4l2_m2m_job_finish is > > > > > > called by > > > > > > jpeg > > > > > > dec/enc irq handler, which means that the waitting would > > > > > > result mtk > > > > > > hw > > > > > > to finish dec/enc, irq will occur and irq handler would > > > > > > cancel > > > > > > timeout > > > > > > worker. The follow is shown as blow. > > > > > > v4l2_m2m_ioctl_streamoff > > > > > > v4l2_m2m_cancel_job mtk_jpeg_enc_irq/mtk_jp > > > > > > eg_dec > > > > > > _irq > > > > > > wait evnet <------ wake up ------v4l2_m2m_job_finish > > > > > > cancel timeout work > > > > > > > > > > > > As mentioned above, if it is normal stop streaming action, > > > > > > there > > > > > > will > > > > > > be no happen that the timeout worker does not canceled. > > > > > > > > > > > > But if mtk_jpeg_remove is called directly without above flow, > > > > > > it > > > > > > would > > > > > > cause lots of issues. > > > > > > > > > > > > Regards, > > > > > > Kyrie. > > > > > > > > > > > > > > Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年3月8日周三 10:02写道: > > > > > > > > > > > > > > > > On Tue, 2023-03-07 at 23:03 +0800, Zheng Hacker wrote: > > > > > > > > > The timer function was set in mtk_jpeg_probe with > > > > > > > > > mtk_jpeg_job_timeout_work function. > > > > > > > > > And the worker is started in mtk_jpeg_dec_device_run. > > > > > > > > > There are two functions (mtk_jpeg_enc_irq and > > > > > > > > > mtk_jpeg_dec_irq) > > > > > > > > > which > > > > > > > > > may cancel the worker. > > > > > > > > > They are used as IRQ handler function which is saved as > > > > > > > > > function > > > > > > > > > pointer in a variable. > > > > > > > > > In mtk_jpeg_probe, they are registered by > > > > > > > > > devm_request_irq: > > > > > > > > > > > > > > > > > > ret = devm_request_irq(&pdev->dev, > > > > > > > > > jpeg_irq, > > > > > > > > > jpeg->variant->irq_handler, > > > > > > > > > 0, > > > > > > > > > pdev->name, jpeg); > > > > > > > > > if (ret) { > > > > > > > > > dev_err(&pdev->dev, "Failed to request jpeg_irq > > > > > > > > > %d > > > > > > > > > (%d)\n", > > > > > > > > > jpeg_irq, ret); > > > > > > > > > return ret; > > > > > > > > > } > > > > > > > > > > > > > > > > > > However, if we remove the module without triggering the > > > > > > > > > irq, > > > > > > > > > the > > > > > > > > > worker will never be removed. > > > > > > > > > > > > > > > > > > As for the schedule, mtk_jpeg_dec_device_run and > > > > > > > > > mtk_jpeg_enc_device_run will start the worker. > > > > > > > > > The schedule invoking is quite complicated. As far as I > > > > > > > > > know, > > > > > > > > > the > > > > > > > > > invoking chain is as follows: > > > > > > > > > > > > > > > > > > v4l2_m2m_init->v4l2_m2m_device_run_work- > > > > > > > > > >v4l2_m2m_try_run > > > > > > > > > > > > > > > > > > the v4l2_m2m_device_run_work function is also a worker > > > > > > > > > which > > > > > > > > > is > > > > > > > > > set > > > > > > > > > in > > > > > > > > > v4l2_m2m_init and started in > > > > > > > > > v4l2_m2m_schedule_next_job. > > > > > > > > > > > > > > > > > > Before calling remove function, the mtk_jpeg_release > > > > > > > > > was > > > > > > > > > invoked > > > > > > > > > to > > > > > > > > > release the related resource. > > > > > > > > > > > > > > > > > > v4l2_m2m_cancel_job will cancel the job by calling > > > > > > > > > m2m_dev->m2m_ops->job_abort(m2m_ctx->priv). > > > > > > > > > > > > > > > > > > But this will only cancel the current queue by > > > > > > > > > list_del(&m2m_dev->curr_ctx->queue); > > > > > > > > > > > > > > > > > > I think this can not cancel the posted task mentioned > > > > > > > > > before. > > > > > > > > > So > > > > > > > > > I > > > > > > > > > think if mtk_jpeg_job_timeout_work > > > > > > > > > > > > > > > > > > is working on, and using jpeg->m2m_dev after freeing it > > > > > > > > > in > > > > > > > > > mtk_jpeg_remove, it will cause a UAF bug. > > > > > > > > > > > > > > > > > > static int mtk_jpeg_release(struct file *file) > > > > > > > > > { > > > > > > > > > struct mtk_jpeg_dev *jpeg = video_drvdata(file); > > > > > > > > > struct mtk_jpeg_ctx *ctx = mtk_jpeg_fh_to_ctx(file- > > > > > > > > > > private_data); > > > > > > > > > > > > > > > > > > mutex_lock(&jpeg->lock); > > > > > > > > > v4l2_ctrl_handler_free(&ctx->ctrl_hdl); > > > > > > > > > [1] v4l2_m2m_ctx_release(ctx->fh.m2m_ctx); > > > > > > > > > v4l2_fh_del(&ctx->fh); > > > > > > > > > v4l2_fh_exit(&ctx->fh); > > > > > > > > > kfree(ctx); > > > > > > > > > mutex_unlock(&jpeg->lock); > > > > > > > > > return 0; > > > > > > > > > } > > > > > > > > > > > > > > > > > > void v4l2_m2m_ctx_release(struct v4l2_m2m_ctx *m2m_ctx) > > > > > > > > > { > > > > > > > > > /* wait until the current context is dequeued from > > > > > > > > > job_queue */ > > > > > > > > > [2] v4l2_m2m_cancel_job(m2m_ctx); > > > > > > > > > > > > > > > > > > vb2_queue_release(&m2m_ctx->cap_q_ctx.q); > > > > > > > > > vb2_queue_release(&m2m_ctx->out_q_ctx.q); > > > > > > > > > > > > > > > > > > kfree(m2m_ctx); > > > > > > > > > } > > > > > > > > > > > > > > > > > > Note that all of this is static analysis, which may be > > > > > > > > > false > > > > > > > > > positive. > > > > > > > > > Feel free to tell me if there is something I've missed. > > > > > > > > > > > > > > > > > > Regard, > > > > > > > > > Zheng > > > > > > > > > > > > > > > > Dear Zheng, > > > > > > > > > > > > > > > > You set up an application scenario: > > > > > > > > cpu1 is using the mtk-jpeg driver and timeout work has > > > > > > > > been > > > > > > > > scheduled. > > > > > > > > At the same time cpu0 wanted to remove the mtk-jpeg > > > > > > > > driver, > > > > > > > > which > > > > > > > > caused the UAF bug. > > > > > > > > I wonder if such an irrational application scenario could > > > > > > > > exist. > > > > > > > > This > > > > > > > > scenario, as you described, not only leads to the > > > > > > > > problems you > > > > > > > > mentioned, but also to output&capture memory leaks and > > > > > > > > unreleased > > > > > > > > resources, such as spinlock. > > > > > > > > Typically, if we want to remove the driver, we firstly do > > > > > > > > stop > > > > > > > > streaming, which ensures that the worker has been > > > > > > > > canceled. > > > > > > > > I agree with your changes from the perspective of > > > > > > > > strengthening > > > > > > > > the > > > > > > > > robustness of the driver code. > > > > > > > > > > > > > > > > Regards, > > > > > > > > Kyrie. > > > > > > > > > > > > > > > > > > Irui Wang (王瑞) <Irui.Wang@mediatek.com> 于2023年3月7日周二 > > > > > > > > > 18:23写道: > > > > > > > > > > > > > > > > > > > > Dear Angelo and Zheng, > > > > > > > > > > > > > > > > > > > > Thanks for your patch and comments. > > > > > > > > > > > > > > > > > > > > Dear Kyrie, > > > > > > > > > > > > > > > > > > > > Please help to check this, thanks. > > > > > > > > > > > > > > > > > > > > Best Regards > > > > > > > > > > > > > > > > > > > > On Tue, 2023-03-07 at 10:49 +0100, AngeloGioacchino > > > > > > > > > > Del > > > > > > > > > > Regno > > > > > > > > > > wrote: > > > > > > > > > > > Il 07/03/23 10:27, Zheng Hacker ha scritto: > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > > > > > Is there anyone who can help with this? I can > > > > > > > > > > > > provide > > > > > > > > > > > > more > > > > > > > > > > > > details > > > > > > > > > > > > like invoking chain if needed. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Providing more details is always good. Please do. > > > > > > > > > > > > > > > > > > > > > > Meanwhile, adding Irui Wang to the loop: he's doing > > > > > > > > > > > mtk- > > > > > > > > > > > jpeg. > > > > > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > > > Angelo > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > Zheng > > > > > > > > > > > > > > > > > > > > > > > > Zheng Wang <zyytlz.wz@163.com> 于2023年3月6日周一 > > > > > > > > > > > > 14:28写道: > > > > > > > > > > > > > > > > > > > > > > > > > > In mtk_jpeg_probe, &jpeg->job_timeout_work is > > > > > > > > > > > > > bound > > > > > > > > > > > > > with > > > > > > > > > > > > > mtk_jpeg_job_timeout_work. Then > > > > > > > > > > > > > mtk_jpeg_dec_device_run > > > > > > > > > > > > > and mtk_jpeg_enc_device_run may be called to > > > > > > > > > > > > > start > > > > > > > > > > > > > the > > > > > > > > > > > > > work. > > > > > > > > > > > > > If we remove the module which will call > > > > > > > > > > > > > mtk_jpeg_remove > > > > > > > > > > > > > to make cleanup, there may be a unfinished > > > > > > > > > > > > > work. The > > > > > > > > > > > > > possible sequence is as follows, which will > > > > > > > > > > > > > cause a > > > > > > > > > > > > > typical UAF bug. > > > > > > > > > > > > > > > > > > > > > > > > > > Fix it by canceling the work before cleanup in > > > > > > > > > > > > > the > > > > > > > > > > > > > mtk_jpeg_remove > > > > > > > > > > > > > > > > > > > > > > > > > > CPU0 CPU1 > > > > > > > > > > > > > > > > > > > > > > > > > > |mtk_jpeg_job_timeout_work > > > > > > > > > > > > > mtk_jpeg_remove | > > > > > > > > > > > > > v4l2_m2m_release | > > > > > > > > > > > > > kfree(m2m_dev); | > > > > > > > > > > > > > | > > > > > > > > > > > > > | v4l2_m2m_get_curr_priv > > > > > > > > > > > > > | m2m_dev->curr_ctx > > > > > > > > > > > > > //use > > > > > > > > > > > > > > > > > > > > > > > > > > Signed-off-by: Zheng Wang <zyytlz.wz@163.com> > > > > > > > > > > > > > --- > > > > > > > > > > > > > > > > > > > > > > > > > > drivers/media/platform/mediatek/jpeg/mtk_jpeg_c > > > > > > > > > > > > > ore.c > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2 +- > > > > > > > > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > > > > > > > > > > > > > > > > > diff --git > > > > > > > > > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_jpeg > > > > > > > > > > > > > _core. > > > > > > > > > > > > > c > > > > > > > > > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_jpeg > > > > > > > > > > > > > _core. > > > > > > > > > > > > > c > > > > > > > > > > > > > index 969516a940ba..364513e7897e 100644 > > > > > > > > > > > > > --- > > > > > > > > > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_jpeg > > > > > > > > > > > > > _core. > > > > > > > > > > > > > c > > > > > > > > > > > > > +++ > > > > > > > > > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_jpeg > > > > > > > > > > > > > _core. > > > > > > > > > > > > > c > > > > > > > > > > > > > @@ -1793,7 +1793,7 @@ static int > > > > > > > > > > > > > mtk_jpeg_probe(struct > > > > > > > > > > > > > platform_device *pdev) > > > > > > > > > > > > > static int mtk_jpeg_remove(struct > > > > > > > > > > > > > platform_device > > > > > > > > > > > > > *pdev) > > > > > > > > > > > > > { > > > > > > > > > > > > > struct mtk_jpeg_dev *jpeg = > > > > > > > > > > > > > platform_get_drvdata(pdev); > > > > > > > > > > > > > - > > > > > > > > > > > > > + cancel_delayed_work(&jpeg- > > > > > > > > > > > > > >job_timeout_work); > > > > > > > > > > > > > pm_runtime_disable(&pdev->dev); > > > > > > > > > > > > > video_unregister_device(jpeg->vdev); > > > > > > > > > > > > > v4l2_m2m_release(jpeg->m2m_dev); > > > > > > > > > > > > > -- > > > > > > > > > > > > > 2.25.1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
On Tue, 2023-03-14 at 11:37 +0800, Zheng Hacker wrote: > Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年3月14日周二 10:49写道: > > > > On Thu, 2023-03-09 at 15:31 +0800, Zheng Hacker wrote: > > > Dear Kyrie, > > > > > > Guenter gave some advice about this issue in [1]. As we are not > > > familiar with the code here, could you please see if the bug > > > really > > > exists? > > > > > > [1] > > > https://lore.kernel.org/all/e03134f9-9433-ab6b-170a-8ce752fccdeb@roeck-us.net/ > > > > > > Best regards, > > > Zheng > > > > Dear zheng, > > > > I have no more questions for your patch, thank you for your > > contribution to mtk jpeg driver. > > > > Thanks for your detailed review and explanation about the code. I > have > learned a lot > from it. Have a nice day :) > > Best regards, > Zheng Dear Zheng, I think that cancel_delayed_work, added in remove function, should change to cancel_delayed_work_sync to solve UAF. The corresponding operation flow is shown as follow: CPU0 CPU1 |mtk_jpeg_job_timeout_work mtk_jpeg_remove | cancel_delayed_work | v4l2_m2m_release | kfree(m2m_dev); | | | v4l2_m2m_get_curr_priv | m2m_dev->curr_ctx //use If cancel_delayed_work is called in async mode like above, and the m2m_dev is freed. But it is used in mtk_jpeg_job_timeout_work, this calling may cause UAF. Thanks. > > > > > > > > > > > > > Zheng Hacker <hackerzheng666@gmail.com> 于2023年3月8日周三 17:11写道: > > > > > > > > Dear Kyrie, > > > > > > > > Sorry for my misunderstanding. I've seen the timeout worker is > > > > bound > > > > with mtk_jpegdec_timeout_work rather than > > > > mtk_jpeg_job_timeout_work.So > > > > the competition won't happen. > > > > > > > > Back to the beginning scene, I still don't know if it's a UAF > > > > issue > > > > or > > > > not. The normal schedule is very strong to me. If we can call > > > > mtk_jpeg_remove directly without calling mtk_jpeg_release, > > > > The UAF is still possible. Otherwis, I think it's safe here. > > > > > > > > Best regards, > > > > Zheng > > > > > > > > Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年3月8日周三 15:39写道: > > > > > > > > > > On Wed, 2023-03-08 at 14:10 +0800, Zheng Hacker wrote: > > > > > > Hi Kyrie, > > > > > > > > > > > > After reviewing the code, I found anothe possible code > > > > > > path. As > > > > > > I am > > > > > > not familiar with the module. It has high possibility it's > > > > > > wrong. > > > > > > Could please help me check this? Very much appreciated for > > > > > > your > > > > > > valuable time. > > > > > > > > > > > > In summary, mtk_jpegdec_worker was set in mtk_jpeg_open and > > > > > > started > > > > > > in > > > > > > mtk_jpeg_multicore_dec_device_run, which made it running on > > > > > > cpu1. > > > > > > Inside the mtk_jpeg_multicore_dec_device_run, it will call > > > > > > schedule_delayed_work to start the timeout_work, which > > > > > > will > > > > > > make it > > > > > > running on cpu2. Meanwhile, we can call > > > > > > mtk_jpeg_release to cancel the job. But there might be a > > > > > > race > > > > > > between > > > > > > mtk_jpegdec_worker and v4l2_m2m_cancel_job. It may call > > > > > > v4l2_m2m_job_finish too early to wake up the event. > > > > > > The remove will go on, the other race is as described > > > > > > earlier. > > > > > > > > > > > > cpu0 cpu1 cpu2 > > > > > > (1)->device_run > > > > > > mtk_jpeg_multicore > > > > > > _dec_device_run > > > > > > queue_work > > > > > > (jpeg->workqueue, > > > > > > &ctx->jpeg_work); > > > > > > (2)mtk_jpegdec_worker > > > > > > (3)mtk_jpeg_release > > > > > > v4l2_m2m_cancel_job > > > > > > wait event > > > > > > > > > > > > schedule_delayed_work > > > > > > (4)mtk_jpeg_job > > > > > > _tim > > > > > > eout_w > > > > > > ork > > > > > > (5)v4l2_m2m_job_finish > > > > > > wake up > > > > > > (6)mtk_jpeg_remove > > > > > > v4l2_m2m_release > > > > > > kfree(m2m_dev) > > > > > > (7)v4l2_m2m_get > > > > > > _cur > > > > > > r_priv > > > > > > > > > > > > > > > > Dear zheng, > > > > > > > > > > The mtk_jpeg_multicore_dec_device_run function is used for > > > > > multi- > > > > > hw > > > > > jpeg decoding. Instead of scheduling > > > > > mtk_jpeg_job_timeout_work, > > > > > mtk_jpegdec_worker is scheduled in this function. > > > > > > > > > > The mtk_jpeg_dec_device_run function is used for single hw > > > > > jpeg > > > > > decoding, which schedules mtk_jpeg_job_timeout_work. > > > > > > > > > > A driver is either a single hw driver or a multi-hw driver > > > > > and > > > > > cannot > > > > > represent both at the same time. > > > > > mtk_jpeg_job_timeout_work and mtk_jpegdec_worker cannot be > > > > > scheduled at > > > > > the same time. > > > > > So mtk_jpeg_job_timeout_work calls v4l2_m2m_job_finish would > > > > > not > > > > > cause > > > > > competition between the mtk_jpegdec_worker and > > > > > v4l2_m2m_cancel_job. > > > > > > > > > > Regards, > > > > > Kyrie. > > > > > > > > > > > > > > > > Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年3月8日周三 11:32写道: > > > > > > > > > > > > > > On Wed, 2023-03-08 at 10:20 +0800, Zheng Hacker wrote: > > > > > > > > Hi Kyrie, > > > > > > > > > > > > > > > > Thank you for your careful analysis and response. I > > > > > > > > still > > > > > > > > have > > > > > > > > some > > > > > > > > areas that I don't quite understand and would like to > > > > > > > > ask > > > > > > > > for > > > > > > > > clarification. That is, how do the function pointers > > > > > > > > for > > > > > > > > stop > > > > > > > > streaming, initialized as mtk_jpeg_enc_stop_streaming > > > > > > > > and > > > > > > > > mtk_jpeg_dec_stop_streaming, ensure that the worker is > > > > > > > > canceled? > > > > > > > > I > > > > > > > > would greatly appreciate your response. > > > > > > > > > > > > > > > > Best regards, > > > > > > > > Zheng > > > > > > > > > > > > > > Dear zheng, > > > > > > > > > > > > > > For stop streaming, what I mean is that stoppping jpeg > > > > > > > decoding or > > > > > > > encoding. > > > > > > > Ok, let me introduce the sw flow of stop streaming: > > > > > > > Firstly, the app will call v4l2_m2m_ioctl_streamoff, > > > > > > > which > > > > > > > will > > > > > > > call > > > > > > > v4l2_m2m_cancel_job, if it finds a job running(as you > > > > > > > note, > > > > > > > cpu1 is > > > > > > > running), it will wait event, the event is wake up by > > > > > > > v4l2_m2m_job_finish function. And v4l2_m2m_job_finish is > > > > > > > called by > > > > > > > jpeg > > > > > > > dec/enc irq handler, which means that the waitting would > > > > > > > result mtk > > > > > > > hw > > > > > > > to finish dec/enc, irq will occur and irq handler would > > > > > > > cancel > > > > > > > timeout > > > > > > > worker. The follow is shown as blow. > > > > > > > v4l2_m2m_ioctl_streamoff > > > > > > > v4l2_m2m_cancel_job mtk_jpeg_enc_irq/mt > > > > > > > k_jp > > > > > > > eg_dec > > > > > > > _irq > > > > > > > wait evnet <------ wake up ------ > > > > > > > v4l2_m2m_job_finish > > > > > > > cancel timeout > > > > > > > work > > > > > > > > > > > > > > As mentioned above, if it is normal stop streaming > > > > > > > action, > > > > > > > there > > > > > > > will > > > > > > > be no happen that the timeout worker does not canceled. > > > > > > > > > > > > > > But if mtk_jpeg_remove is called directly without above > > > > > > > flow, > > > > > > > it > > > > > > > would > > > > > > > cause lots of issues. > > > > > > > > > > > > > > Regards, > > > > > > > Kyrie. > > > > > > > > > > > > > > > > Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年3月8日周三 > > > > > > > > 10:02写道: > > > > > > > > > > > > > > > > > > On Tue, 2023-03-07 at 23:03 +0800, Zheng Hacker > > > > > > > > > wrote: > > > > > > > > > > The timer function was set in mtk_jpeg_probe with > > > > > > > > > > mtk_jpeg_job_timeout_work function. > > > > > > > > > > And the worker is started in > > > > > > > > > > mtk_jpeg_dec_device_run. > > > > > > > > > > There are two functions (mtk_jpeg_enc_irq and > > > > > > > > > > mtk_jpeg_dec_irq) > > > > > > > > > > which > > > > > > > > > > may cancel the worker. > > > > > > > > > > They are used as IRQ handler function which is > > > > > > > > > > saved as > > > > > > > > > > function > > > > > > > > > > pointer in a variable. > > > > > > > > > > In mtk_jpeg_probe, they are registered by > > > > > > > > > > devm_request_irq: > > > > > > > > > > > > > > > > > > > > ret = devm_request_irq(&pdev->dev, > > > > > > > > > > jpeg_irq, > > > > > > > > > > jpeg->variant->irq_handler, > > > > > > > > > > 0, > > > > > > > > > > pdev->name, jpeg); > > > > > > > > > > if (ret) { > > > > > > > > > > dev_err(&pdev->dev, "Failed to request > > > > > > > > > > jpeg_irq > > > > > > > > > > %d > > > > > > > > > > (%d)\n", > > > > > > > > > > jpeg_irq, ret); > > > > > > > > > > return ret; > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > However, if we remove the module without triggering > > > > > > > > > > the > > > > > > > > > > irq, > > > > > > > > > > the > > > > > > > > > > worker will never be removed. > > > > > > > > > > > > > > > > > > > > As for the schedule, mtk_jpeg_dec_device_run and > > > > > > > > > > mtk_jpeg_enc_device_run will start the worker. > > > > > > > > > > The schedule invoking is quite complicated. As far > > > > > > > > > > as I > > > > > > > > > > know, > > > > > > > > > > the > > > > > > > > > > invoking chain is as follows: > > > > > > > > > > > > > > > > > > > > v4l2_m2m_init->v4l2_m2m_device_run_work- > > > > > > > > > > > v4l2_m2m_try_run > > > > > > > > > > > > > > > > > > > > the v4l2_m2m_device_run_work function is also a > > > > > > > > > > worker > > > > > > > > > > which > > > > > > > > > > is > > > > > > > > > > set > > > > > > > > > > in > > > > > > > > > > v4l2_m2m_init and started in > > > > > > > > > > v4l2_m2m_schedule_next_job. > > > > > > > > > > > > > > > > > > > > Before calling remove function, > > > > > > > > > > the mtk_jpeg_release > > > > > > > > > > was > > > > > > > > > > invoked > > > > > > > > > > to > > > > > > > > > > release the related resource. > > > > > > > > > > > > > > > > > > > > v4l2_m2m_cancel_job will cancel the job by calling > > > > > > > > > > m2m_dev->m2m_ops->job_abort(m2m_ctx->priv). > > > > > > > > > > > > > > > > > > > > But this will only cancel the current queue by > > > > > > > > > > list_del(&m2m_dev->curr_ctx->queue); > > > > > > > > > > > > > > > > > > > > I think this can not cancel the posted task > > > > > > > > > > mentioned > > > > > > > > > > before. > > > > > > > > > > So > > > > > > > > > > I > > > > > > > > > > think if mtk_jpeg_job_timeout_work > > > > > > > > > > > > > > > > > > > > is working on, and using jpeg->m2m_dev after > > > > > > > > > > freeing it > > > > > > > > > > in > > > > > > > > > > mtk_jpeg_remove, it will cause a UAF bug. > > > > > > > > > > > > > > > > > > > > static int mtk_jpeg_release(struct file *file) > > > > > > > > > > { > > > > > > > > > > struct mtk_jpeg_dev *jpeg = video_drvdata(file); > > > > > > > > > > struct mtk_jpeg_ctx *ctx = > > > > > > > > > > mtk_jpeg_fh_to_ctx(file- > > > > > > > > > > > private_data); > > > > > > > > > > > > > > > > > > > > mutex_lock(&jpeg->lock); > > > > > > > > > > v4l2_ctrl_handler_free(&ctx->ctrl_hdl); > > > > > > > > > > [1] v4l2_m2m_ctx_release(ctx->fh.m2m_ctx); > > > > > > > > > > v4l2_fh_del(&ctx->fh); > > > > > > > > > > v4l2_fh_exit(&ctx->fh); > > > > > > > > > > kfree(ctx); > > > > > > > > > > mutex_unlock(&jpeg->lock); > > > > > > > > > > return 0; > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > void v4l2_m2m_ctx_release(struct v4l2_m2m_ctx > > > > > > > > > > *m2m_ctx) > > > > > > > > > > { > > > > > > > > > > /* wait until the current context is dequeued > > > > > > > > > > from > > > > > > > > > > job_queue */ > > > > > > > > > > [2] v4l2_m2m_cancel_job(m2m_ctx); > > > > > > > > > > > > > > > > > > > > vb2_queue_release(&m2m_ctx->cap_q_ctx.q); > > > > > > > > > > vb2_queue_release(&m2m_ctx->out_q_ctx.q); > > > > > > > > > > > > > > > > > > > > kfree(m2m_ctx); > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > Note that all of this is static analysis, which may > > > > > > > > > > be > > > > > > > > > > false > > > > > > > > > > positive. > > > > > > > > > > Feel free to tell me if there is something I've > > > > > > > > > > missed. > > > > > > > > > > > > > > > > > > > > Regard, > > > > > > > > > > Zheng > > > > > > > > > > > > > > > > > > Dear Zheng, > > > > > > > > > > > > > > > > > > You set up an application scenario: > > > > > > > > > cpu1 is using the mtk-jpeg driver and timeout work > > > > > > > > > has > > > > > > > > > been > > > > > > > > > scheduled. > > > > > > > > > At the same time cpu0 wanted to remove the mtk-jpeg > > > > > > > > > driver, > > > > > > > > > which > > > > > > > > > caused the UAF bug. > > > > > > > > > I wonder if such an irrational application scenario > > > > > > > > > could > > > > > > > > > exist. > > > > > > > > > This > > > > > > > > > scenario, as you described, not only leads to the > > > > > > > > > problems you > > > > > > > > > mentioned, but also to output&capture memory leaks > > > > > > > > > and > > > > > > > > > unreleased > > > > > > > > > resources, such as spinlock. > > > > > > > > > Typically, if we want to remove the driver, we > > > > > > > > > firstly do > > > > > > > > > stop > > > > > > > > > streaming, which ensures that the worker has been > > > > > > > > > canceled. > > > > > > > > > I agree with your changes from the perspective of > > > > > > > > > strengthening > > > > > > > > > the > > > > > > > > > robustness of the driver code. > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > Kyrie. > > > > > > > > > > > > > > > > > > > > Irui Wang (王瑞) <Irui.Wang@mediatek.com> > > > > > > > > > > 于2023年3月7日周二 > > > > > > > > > > 18:23写道: > > > > > > > > > > > > > > > > > > > > > > Dear Angelo and Zheng, > > > > > > > > > > > > > > > > > > > > > > Thanks for your patch and comments. > > > > > > > > > > > > > > > > > > > > > > Dear Kyrie, > > > > > > > > > > > > > > > > > > > > > > Please help to check this, thanks. > > > > > > > > > > > > > > > > > > > > > > Best Regards > > > > > > > > > > > > > > > > > > > > > > On Tue, 2023-03-07 at 10:49 +0100, > > > > > > > > > > > AngeloGioacchino > > > > > > > > > > > Del > > > > > > > > > > > Regno > > > > > > > > > > > wrote: > > > > > > > > > > > > Il 07/03/23 10:27, Zheng Hacker ha scritto: > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > > > > > > > Is there anyone who can help with this? I can > > > > > > > > > > > > > provide > > > > > > > > > > > > > more > > > > > > > > > > > > > details > > > > > > > > > > > > > like invoking chain if needed. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Providing more details is always good. Please > > > > > > > > > > > > do. > > > > > > > > > > > > > > > > > > > > > > > > Meanwhile, adding Irui Wang to the loop: he's > > > > > > > > > > > > doing > > > > > > > > > > > > mtk- > > > > > > > > > > > > jpeg. > > > > > > > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > > > > Angelo > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > Zheng > > > > > > > > > > > > > > > > > > > > > > > > > > Zheng Wang <zyytlz.wz@163.com> 于2023年3月6日周一 > > > > > > > > > > > > > 14:28写道: > > > > > > > > > > > > > > > > > > > > > > > > > > > > In mtk_jpeg_probe, &jpeg->job_timeout_work > > > > > > > > > > > > > > is > > > > > > > > > > > > > > bound > > > > > > > > > > > > > > with > > > > > > > > > > > > > > mtk_jpeg_job_timeout_work. Then > > > > > > > > > > > > > > mtk_jpeg_dec_device_run > > > > > > > > > > > > > > and mtk_jpeg_enc_device_run may be called > > > > > > > > > > > > > > to > > > > > > > > > > > > > > start > > > > > > > > > > > > > > the > > > > > > > > > > > > > > work. > > > > > > > > > > > > > > If we remove the module which will call > > > > > > > > > > > > > > mtk_jpeg_remove > > > > > > > > > > > > > > to make cleanup, there may be a unfinished > > > > > > > > > > > > > > work. The > > > > > > > > > > > > > > possible sequence is as follows, which will > > > > > > > > > > > > > > cause a > > > > > > > > > > > > > > typical UAF bug. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Fix it by canceling the work before cleanup > > > > > > > > > > > > > > in > > > > > > > > > > > > > > the > > > > > > > > > > > > > > mtk_jpeg_remove > > > > > > > > > > > > > > > > > > > > > > > > > > > > CPU0 CPU1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > |mtk_jpeg_job_timeout_ > > > > > > > > > > > > > > work > > > > > > > > > > > > > > mtk_jpeg_remove | > > > > > > > > > > > > > > v4l2_m2m_release | > > > > > > > > > > > > > > kfree(m2m_dev); | > > > > > > > > > > > > > > | > > > > > > > > > > > > > > | > > > > > > > > > > > > > > v4l2_m2m_get_curr_priv > > > > > > > > > > > > > > | m2m_dev->curr_ctx > > > > > > > > > > > > > > //use > > > > > > > > > > > > > > > > > > > > > > > > > > > > Signed-off-by: Zheng Wang < > > > > > > > > > > > > > > zyytlz.wz@163.com> > > > > > > > > > > > > > > --- > > > > > > > > > > > > > > > > > > > > > > > > > > > > drivers/media/platform/mediatek/jpeg/mtk_jp > > > > > > > > > > > > > > eg_c > > > > > > > > > > > > > > ore.c > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2 +- > > > > > > > > > > > > > > 1 file changed, 1 insertion(+), 1 > > > > > > > > > > > > > > deletion(-) > > > > > > > > > > > > > > > > > > > > > > > > > > > > diff --git > > > > > > > > > > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_ > > > > > > > > > > > > > > jpeg > > > > > > > > > > > > > > _core. > > > > > > > > > > > > > > c > > > > > > > > > > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_ > > > > > > > > > > > > > > jpeg > > > > > > > > > > > > > > _core. > > > > > > > > > > > > > > c > > > > > > > > > > > > > > index 969516a940ba..364513e7897e 100644 > > > > > > > > > > > > > > --- > > > > > > > > > > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_ > > > > > > > > > > > > > > jpeg > > > > > > > > > > > > > > _core. > > > > > > > > > > > > > > c > > > > > > > > > > > > > > +++ > > > > > > > > > > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_ > > > > > > > > > > > > > > jpeg > > > > > > > > > > > > > > _core. > > > > > > > > > > > > > > c > > > > > > > > > > > > > > @@ -1793,7 +1793,7 @@ static int > > > > > > > > > > > > > > mtk_jpeg_probe(struct > > > > > > > > > > > > > > platform_device *pdev) > > > > > > > > > > > > > > static int mtk_jpeg_remove(struct > > > > > > > > > > > > > > platform_device > > > > > > > > > > > > > > *pdev) > > > > > > > > > > > > > > { > > > > > > > > > > > > > > struct mtk_jpeg_dev *jpeg = > > > > > > > > > > > > > > platform_get_drvdata(pdev); > > > > > > > > > > > > > > - > > > > > > > > > > > > > > + cancel_delayed_work(&jpeg- > > > > > > > > > > > > > > > job_timeout_work); > > > > > > > > > > > > > > > > > > > > > > > > > > > > pm_runtime_disable(&pdev->dev); > > > > > > > > > > > > > > video_unregister_device(jpeg- > > > > > > > > > > > > > > >vdev); > > > > > > > > > > > > > > v4l2_m2m_release(jpeg->m2m_dev); > > > > > > > > > > > > > > -- > > > > > > > > > > > > > > 2.25.1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
Hi Kyrie, I think your analysis is right, I'll make the change in the next version. Best regards, Zheng Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年5月12日周五 09:24写道: > > On Tue, 2023-03-14 at 11:37 +0800, Zheng Hacker wrote: > > Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年3月14日周二 10:49写道: > > > > > > On Thu, 2023-03-09 at 15:31 +0800, Zheng Hacker wrote: > > > > Dear Kyrie, > > > > > > > > Guenter gave some advice about this issue in [1]. As we are not > > > > familiar with the code here, could you please see if the bug > > > > really > > > > exists? > > > > > > > > [1] > > > > > https://lore.kernel.org/all/e03134f9-9433-ab6b-170a-8ce752fccdeb@roeck-us.net/ > > > > > > > > Best regards, > > > > Zheng > > > > > > Dear zheng, > > > > > > I have no more questions for your patch, thank you for your > > > contribution to mtk jpeg driver. > > > > > > > Thanks for your detailed review and explanation about the code. I > > have > > learned a lot > > from it. Have a nice day :) > > > > Best regards, > > Zheng > Dear Zheng, > > I think that cancel_delayed_work, added in remove function, should > change to cancel_delayed_work_sync to solve UAF. The corresponding > operation flow is shown as follow: > CPU0 CPU1 > |mtk_jpeg_job_timeout_work > mtk_jpeg_remove | > cancel_delayed_work | > v4l2_m2m_release | > kfree(m2m_dev); | > | > | v4l2_m2m_get_curr_priv > | m2m_dev->curr_ctx //use > > If cancel_delayed_work is called in async mode like above, and the > m2m_dev is freed. But it is used in mtk_jpeg_job_timeout_work, this > calling may cause UAF. > > Thanks. > > > > > > > > > > > > > > > > > > Zheng Hacker <hackerzheng666@gmail.com> 于2023年3月8日周三 17:11写道: > > > > > > > > > > Dear Kyrie, > > > > > > > > > > Sorry for my misunderstanding. I've seen the timeout worker is > > > > > bound > > > > > with mtk_jpegdec_timeout_work rather than > > > > > mtk_jpeg_job_timeout_work.So > > > > > the competition won't happen. > > > > > > > > > > Back to the beginning scene, I still don't know if it's a UAF > > > > > issue > > > > > or > > > > > not. The normal schedule is very strong to me. If we can call > > > > > mtk_jpeg_remove directly without calling mtk_jpeg_release, > > > > > The UAF is still possible. Otherwis, I think it's safe here. > > > > > > > > > > Best regards, > > > > > Zheng > > > > > > > > > > Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年3月8日周三 15:39写道: > > > > > > > > > > > > On Wed, 2023-03-08 at 14:10 +0800, Zheng Hacker wrote: > > > > > > > Hi Kyrie, > > > > > > > > > > > > > > After reviewing the code, I found anothe possible code > > > > > > > path. As > > > > > > > I am > > > > > > > not familiar with the module. It has high possibility it's > > > > > > > wrong. > > > > > > > Could please help me check this? Very much appreciated for > > > > > > > your > > > > > > > valuable time. > > > > > > > > > > > > > > In summary, mtk_jpegdec_worker was set in mtk_jpeg_open and > > > > > > > started > > > > > > > in > > > > > > > mtk_jpeg_multicore_dec_device_run, which made it running on > > > > > > > cpu1. > > > > > > > Inside the mtk_jpeg_multicore_dec_device_run, it will call > > > > > > > schedule_delayed_work to start the timeout_work, which > > > > > > > will > > > > > > > make it > > > > > > > running on cpu2. Meanwhile, we can call > > > > > > > mtk_jpeg_release to cancel the job. But there might be a > > > > > > > race > > > > > > > between > > > > > > > mtk_jpegdec_worker and v4l2_m2m_cancel_job. It may call > > > > > > > v4l2_m2m_job_finish too early to wake up the event. > > > > > > > The remove will go on, the other race is as described > > > > > > > earlier. > > > > > > > > > > > > > > cpu0 cpu1 cpu2 > > > > > > > (1)->device_run > > > > > > > mtk_jpeg_multicore > > > > > > > _dec_device_run > > > > > > > queue_work > > > > > > > (jpeg->workqueue, > > > > > > > &ctx->jpeg_work); > > > > > > > (2)mtk_jpegdec_worker > > > > > > > (3)mtk_jpeg_release > > > > > > > v4l2_m2m_cancel_job > > > > > > > wait event > > > > > > > > > > > > > > schedule_delayed_work > > > > > > > (4)mtk_jpeg_job > > > > > > > _tim > > > > > > > eout_w > > > > > > > ork > > > > > > > (5)v4l2_m2m_job_finish > > > > > > > wake up > > > > > > > (6)mtk_jpeg_remove > > > > > > > v4l2_m2m_release > > > > > > > kfree(m2m_dev) > > > > > > > (7)v4l2_m2m_get > > > > > > > _cur > > > > > > > r_priv > > > > > > > > > > > > > > > > > > > Dear zheng, > > > > > > > > > > > > The mtk_jpeg_multicore_dec_device_run function is used for > > > > > > multi- > > > > > > hw > > > > > > jpeg decoding. Instead of scheduling > > > > > > mtk_jpeg_job_timeout_work, > > > > > > mtk_jpegdec_worker is scheduled in this function. > > > > > > > > > > > > The mtk_jpeg_dec_device_run function is used for single hw > > > > > > jpeg > > > > > > decoding, which schedules mtk_jpeg_job_timeout_work. > > > > > > > > > > > > A driver is either a single hw driver or a multi-hw driver > > > > > > and > > > > > > cannot > > > > > > represent both at the same time. > > > > > > mtk_jpeg_job_timeout_work and mtk_jpegdec_worker cannot be > > > > > > scheduled at > > > > > > the same time. > > > > > > So mtk_jpeg_job_timeout_work calls v4l2_m2m_job_finish would > > > > > > not > > > > > > cause > > > > > > competition between the mtk_jpegdec_worker and > > > > > > v4l2_m2m_cancel_job. > > > > > > > > > > > > Regards, > > > > > > Kyrie. > > > > > > > > > > > > > > > > > > > Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年3月8日周三 11:32写道: > > > > > > > > > > > > > > > > On Wed, 2023-03-08 at 10:20 +0800, Zheng Hacker wrote: > > > > > > > > > Hi Kyrie, > > > > > > > > > > > > > > > > > > Thank you for your careful analysis and response. I > > > > > > > > > still > > > > > > > > > have > > > > > > > > > some > > > > > > > > > areas that I don't quite understand and would like to > > > > > > > > > ask > > > > > > > > > for > > > > > > > > > clarification. That is, how do the function pointers > > > > > > > > > for > > > > > > > > > stop > > > > > > > > > streaming, initialized as mtk_jpeg_enc_stop_streaming > > > > > > > > > and > > > > > > > > > mtk_jpeg_dec_stop_streaming, ensure that the worker is > > > > > > > > > canceled? > > > > > > > > > I > > > > > > > > > would greatly appreciate your response. > > > > > > > > > > > > > > > > > > Best regards, > > > > > > > > > Zheng > > > > > > > > > > > > > > > > Dear zheng, > > > > > > > > > > > > > > > > For stop streaming, what I mean is that stoppping jpeg > > > > > > > > decoding or > > > > > > > > encoding. > > > > > > > > Ok, let me introduce the sw flow of stop streaming: > > > > > > > > Firstly, the app will call v4l2_m2m_ioctl_streamoff, > > > > > > > > which > > > > > > > > will > > > > > > > > call > > > > > > > > v4l2_m2m_cancel_job, if it finds a job running(as you > > > > > > > > note, > > > > > > > > cpu1 is > > > > > > > > running), it will wait event, the event is wake up by > > > > > > > > v4l2_m2m_job_finish function. And v4l2_m2m_job_finish is > > > > > > > > called by > > > > > > > > jpeg > > > > > > > > dec/enc irq handler, which means that the waitting would > > > > > > > > result mtk > > > > > > > > hw > > > > > > > > to finish dec/enc, irq will occur and irq handler would > > > > > > > > cancel > > > > > > > > timeout > > > > > > > > worker. The follow is shown as blow. > > > > > > > > v4l2_m2m_ioctl_streamoff > > > > > > > > v4l2_m2m_cancel_job mtk_jpeg_enc_irq/mt > > > > > > > > k_jp > > > > > > > > eg_dec > > > > > > > > _irq > > > > > > > > wait evnet <------ wake up ------ > > > > > > > > v4l2_m2m_job_finish > > > > > > > > cancel timeout > > > > > > > > work > > > > > > > > > > > > > > > > As mentioned above, if it is normal stop streaming > > > > > > > > action, > > > > > > > > there > > > > > > > > will > > > > > > > > be no happen that the timeout worker does not canceled. > > > > > > > > > > > > > > > > But if mtk_jpeg_remove is called directly without above > > > > > > > > flow, > > > > > > > > it > > > > > > > > would > > > > > > > > cause lots of issues. > > > > > > > > > > > > > > > > Regards, > > > > > > > > Kyrie. > > > > > > > > > > > > > > > > > > Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年3月8日周三 > > > > > > > > > 10:02写道: > > > > > > > > > > > > > > > > > > > > On Tue, 2023-03-07 at 23:03 +0800, Zheng Hacker > > > > > > > > > > wrote: > > > > > > > > > > > The timer function was set in mtk_jpeg_probe with > > > > > > > > > > > mtk_jpeg_job_timeout_work function. > > > > > > > > > > > And the worker is started in > > > > > > > > > > > mtk_jpeg_dec_device_run. > > > > > > > > > > > There are two functions (mtk_jpeg_enc_irq and > > > > > > > > > > > mtk_jpeg_dec_irq) > > > > > > > > > > > which > > > > > > > > > > > may cancel the worker. > > > > > > > > > > > They are used as IRQ handler function which is > > > > > > > > > > > saved as > > > > > > > > > > > function > > > > > > > > > > > pointer in a variable. > > > > > > > > > > > In mtk_jpeg_probe, they are registered by > > > > > > > > > > > devm_request_irq: > > > > > > > > > > > > > > > > > > > > > > ret = devm_request_irq(&pdev->dev, > > > > > > > > > > > jpeg_irq, > > > > > > > > > > > jpeg->variant->irq_handler, > > > > > > > > > > > 0, > > > > > > > > > > > pdev->name, jpeg); > > > > > > > > > > > if (ret) { > > > > > > > > > > > dev_err(&pdev->dev, "Failed to request > > > > > > > > > > > jpeg_irq > > > > > > > > > > > %d > > > > > > > > > > > (%d)\n", > > > > > > > > > > > jpeg_irq, ret); > > > > > > > > > > > return ret; > > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > However, if we remove the module without triggering > > > > > > > > > > > the > > > > > > > > > > > irq, > > > > > > > > > > > the > > > > > > > > > > > worker will never be removed. > > > > > > > > > > > > > > > > > > > > > > As for the schedule, mtk_jpeg_dec_device_run and > > > > > > > > > > > mtk_jpeg_enc_device_run will start the worker. > > > > > > > > > > > The schedule invoking is quite complicated. As far > > > > > > > > > > > as I > > > > > > > > > > > know, > > > > > > > > > > > the > > > > > > > > > > > invoking chain is as follows: > > > > > > > > > > > > > > > > > > > > > > v4l2_m2m_init->v4l2_m2m_device_run_work- > > > > > > > > > > > > v4l2_m2m_try_run > > > > > > > > > > > > > > > > > > > > > > the v4l2_m2m_device_run_work function is also a > > > > > > > > > > > worker > > > > > > > > > > > which > > > > > > > > > > > is > > > > > > > > > > > set > > > > > > > > > > > in > > > > > > > > > > > v4l2_m2m_init and started in > > > > > > > > > > > v4l2_m2m_schedule_next_job. > > > > > > > > > > > > > > > > > > > > > > Before calling remove function, > > > > > > > > > > > the mtk_jpeg_release > > > > > > > > > > > was > > > > > > > > > > > invoked > > > > > > > > > > > to > > > > > > > > > > > release the related resource. > > > > > > > > > > > > > > > > > > > > > > v4l2_m2m_cancel_job will cancel the job by calling > > > > > > > > > > > m2m_dev->m2m_ops->job_abort(m2m_ctx->priv). > > > > > > > > > > > > > > > > > > > > > > But this will only cancel the current queue by > > > > > > > > > > > list_del(&m2m_dev->curr_ctx->queue); > > > > > > > > > > > > > > > > > > > > > > I think this can not cancel the posted task > > > > > > > > > > > mentioned > > > > > > > > > > > before. > > > > > > > > > > > So > > > > > > > > > > > I > > > > > > > > > > > think if mtk_jpeg_job_timeout_work > > > > > > > > > > > > > > > > > > > > > > is working on, and using jpeg->m2m_dev after > > > > > > > > > > > freeing it > > > > > > > > > > > in > > > > > > > > > > > mtk_jpeg_remove, it will cause a UAF bug. > > > > > > > > > > > > > > > > > > > > > > static int mtk_jpeg_release(struct file *file) > > > > > > > > > > > { > > > > > > > > > > > struct mtk_jpeg_dev *jpeg = video_drvdata(file); > > > > > > > > > > > struct mtk_jpeg_ctx *ctx = > > > > > > > > > > > mtk_jpeg_fh_to_ctx(file- > > > > > > > > > > > > private_data); > > > > > > > > > > > > > > > > > > > > > > mutex_lock(&jpeg->lock); > > > > > > > > > > > v4l2_ctrl_handler_free(&ctx->ctrl_hdl); > > > > > > > > > > > [1] v4l2_m2m_ctx_release(ctx->fh.m2m_ctx); > > > > > > > > > > > v4l2_fh_del(&ctx->fh); > > > > > > > > > > > v4l2_fh_exit(&ctx->fh); > > > > > > > > > > > kfree(ctx); > > > > > > > > > > > mutex_unlock(&jpeg->lock); > > > > > > > > > > > return 0; > > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > void v4l2_m2m_ctx_release(struct v4l2_m2m_ctx > > > > > > > > > > > *m2m_ctx) > > > > > > > > > > > { > > > > > > > > > > > /* wait until the current context is dequeued > > > > > > > > > > > from > > > > > > > > > > > job_queue */ > > > > > > > > > > > [2] v4l2_m2m_cancel_job(m2m_ctx); > > > > > > > > > > > > > > > > > > > > > > vb2_queue_release(&m2m_ctx->cap_q_ctx.q); > > > > > > > > > > > vb2_queue_release(&m2m_ctx->out_q_ctx.q); > > > > > > > > > > > > > > > > > > > > > > kfree(m2m_ctx); > > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > Note that all of this is static analysis, which may > > > > > > > > > > > be > > > > > > > > > > > false > > > > > > > > > > > positive. > > > > > > > > > > > Feel free to tell me if there is something I've > > > > > > > > > > > missed. > > > > > > > > > > > > > > > > > > > > > > Regard, > > > > > > > > > > > Zheng > > > > > > > > > > > > > > > > > > > > Dear Zheng, > > > > > > > > > > > > > > > > > > > > You set up an application scenario: > > > > > > > > > > cpu1 is using the mtk-jpeg driver and timeout work > > > > > > > > > > has > > > > > > > > > > been > > > > > > > > > > scheduled. > > > > > > > > > > At the same time cpu0 wanted to remove the mtk-jpeg > > > > > > > > > > driver, > > > > > > > > > > which > > > > > > > > > > caused the UAF bug. > > > > > > > > > > I wonder if such an irrational application scenario > > > > > > > > > > could > > > > > > > > > > exist. > > > > > > > > > > This > > > > > > > > > > scenario, as you described, not only leads to the > > > > > > > > > > problems you > > > > > > > > > > mentioned, but also to output&capture memory leaks > > > > > > > > > > and > > > > > > > > > > unreleased > > > > > > > > > > resources, such as spinlock. > > > > > > > > > > Typically, if we want to remove the driver, we > > > > > > > > > > firstly do > > > > > > > > > > stop > > > > > > > > > > streaming, which ensures that the worker has been > > > > > > > > > > canceled. > > > > > > > > > > I agree with your changes from the perspective of > > > > > > > > > > strengthening > > > > > > > > > > the > > > > > > > > > > robustness of the driver code. > > > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > > Kyrie. > > > > > > > > > > > > > > > > > > > > > > Irui Wang (王瑞) <Irui.Wang@mediatek.com> > > > > > > > > > > > 于2023年3月7日周二 > > > > > > > > > > > 18:23写道: > > > > > > > > > > > > > > > > > > > > > > > > Dear Angelo and Zheng, > > > > > > > > > > > > > > > > > > > > > > > > Thanks for your patch and comments. > > > > > > > > > > > > > > > > > > > > > > > > Dear Kyrie, > > > > > > > > > > > > > > > > > > > > > > > > Please help to check this, thanks. > > > > > > > > > > > > > > > > > > > > > > > > Best Regards > > > > > > > > > > > > > > > > > > > > > > > > On Tue, 2023-03-07 at 10:49 +0100, > > > > > > > > > > > > AngeloGioacchino > > > > > > > > > > > > Del > > > > > > > > > > > > Regno > > > > > > > > > > > > wrote: > > > > > > > > > > > > > Il 07/03/23 10:27, Zheng Hacker ha scritto: > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > > > > > > > > > Is there anyone who can help with this? I can > > > > > > > > > > > > > > provide > > > > > > > > > > > > > > more > > > > > > > > > > > > > > details > > > > > > > > > > > > > > like invoking chain if needed. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Providing more details is always good. Please > > > > > > > > > > > > > do. > > > > > > > > > > > > > > > > > > > > > > > > > > Meanwhile, adding Irui Wang to the loop: he's > > > > > > > > > > > > > doing > > > > > > > > > > > > > mtk- > > > > > > > > > > > > > jpeg. > > > > > > > > > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > > > > > Angelo > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > Zheng > > > > > > > > > > > > > > > > > > > > > > > > > > > > Zheng Wang <zyytlz.wz@163.com> 于2023年3月6日周一 > > > > > > > > > > > > > > 14:28写道: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > In mtk_jpeg_probe, &jpeg->job_timeout_work > > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > bound > > > > > > > > > > > > > > > with > > > > > > > > > > > > > > > mtk_jpeg_job_timeout_work. Then > > > > > > > > > > > > > > > mtk_jpeg_dec_device_run > > > > > > > > > > > > > > > and mtk_jpeg_enc_device_run may be called > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > start > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > work. > > > > > > > > > > > > > > > If we remove the module which will call > > > > > > > > > > > > > > > mtk_jpeg_remove > > > > > > > > > > > > > > > to make cleanup, there may be a unfinished > > > > > > > > > > > > > > > work. The > > > > > > > > > > > > > > > possible sequence is as follows, which will > > > > > > > > > > > > > > > cause a > > > > > > > > > > > > > > > typical UAF bug. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Fix it by canceling the work before cleanup > > > > > > > > > > > > > > > in > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > mtk_jpeg_remove > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > CPU0 CPU1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > |mtk_jpeg_job_timeout_ > > > > > > > > > > > > > > > work > > > > > > > > > > > > > > > mtk_jpeg_remove | > > > > > > > > > > > > > > > v4l2_m2m_release | > > > > > > > > > > > > > > > kfree(m2m_dev); | > > > > > > > > > > > > > > > | > > > > > > > > > > > > > > > | > > > > > > > > > > > > > > > v4l2_m2m_get_curr_priv > > > > > > > > > > > > > > > | m2m_dev->curr_ctx > > > > > > > > > > > > > > > //use > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Signed-off-by: Zheng Wang < > > > > > > > > > > > > > > > zyytlz.wz@163.com> > > > > > > > > > > > > > > > --- > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > drivers/media/platform/mediatek/jpeg/mtk_jp > > > > > > > > > > > > > > > eg_c > > > > > > > > > > > > > > > ore.c > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2 +- > > > > > > > > > > > > > > > 1 file changed, 1 insertion(+), 1 > > > > > > > > > > > > > > > deletion(-) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > diff --git > > > > > > > > > > > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_ > > > > > > > > > > > > > > > jpeg > > > > > > > > > > > > > > > _core. > > > > > > > > > > > > > > > c > > > > > > > > > > > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_ > > > > > > > > > > > > > > > jpeg > > > > > > > > > > > > > > > _core. > > > > > > > > > > > > > > > c > > > > > > > > > > > > > > > index 969516a940ba..364513e7897e 100644 > > > > > > > > > > > > > > > --- > > > > > > > > > > > > > > > a/drivers/media/platform/mediatek/jpeg/mtk_ > > > > > > > > > > > > > > > jpeg > > > > > > > > > > > > > > > _core. > > > > > > > > > > > > > > > c > > > > > > > > > > > > > > > +++ > > > > > > > > > > > > > > > b/drivers/media/platform/mediatek/jpeg/mtk_ > > > > > > > > > > > > > > > jpeg > > > > > > > > > > > > > > > _core. > > > > > > > > > > > > > > > c > > > > > > > > > > > > > > > @@ -1793,7 +1793,7 @@ static int > > > > > > > > > > > > > > > mtk_jpeg_probe(struct > > > > > > > > > > > > > > > platform_device *pdev) > > > > > > > > > > > > > > > static int mtk_jpeg_remove(struct > > > > > > > > > > > > > > > platform_device > > > > > > > > > > > > > > > *pdev) > > > > > > > > > > > > > > > { > > > > > > > > > > > > > > > struct mtk_jpeg_dev *jpeg = > > > > > > > > > > > > > > > platform_get_drvdata(pdev); > > > > > > > > > > > > > > > - > > > > > > > > > > > > > > > + cancel_delayed_work(&jpeg- > > > > > > > > > > > > > > > > job_timeout_work); > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > pm_runtime_disable(&pdev->dev); > > > > > > > > > > > > > > > video_unregister_device(jpeg- > > > > > > > > > > > > > > > >vdev); > > > > > > > > > > > > > > > v4l2_m2m_release(jpeg->m2m_dev); > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > > > 2.25.1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
On Mon, 2023-05-15 at 10:57 +0800, Zheng Hacker wrote: > External email : Please do not click links or open attachments until > you have verified the sender or the content. > > > Hi Kyrie, > > I think your analysis is right, I'll make the change in the next > version. > > Best regards, > Zheng > Dear Zheng, Could you please update the next version at any convenient time? Thanks. Bset Regards, Kyrie. > Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年5月12日周五 09:24写道: > > > > On Tue, 2023-03-14 at 11:37 +0800, Zheng Hacker wrote: > > > Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年3月14日周二 10:49写道: > > > > > > > > On Thu, 2023-03-09 at 15:31 +0800, Zheng Hacker wrote: > > > > > Dear Kyrie, > > > > > > > > > > Guenter gave some advice about this issue in [1]. As we are > > > > > not > > > > > familiar with the code here, could you please see if the bug > > > > > really > > > > > exists? > > > > > > > > > > [1] > > > > > > > > > https://lore.kernel.org/all/e03134f9-9433-ab6b-170a-8ce752fccdeb@roeck-us.net/ > > > > > > > > > > Best regards, > > > > > Zheng > > > > > > > > Dear zheng, > > > > > > > > I have no more questions for your patch, thank you for your > > > > contribution to mtk jpeg driver. > > > > > > > > > > Thanks for your detailed review and explanation about the code. I > > > have > > > learned a lot > > > from it. Have a nice day :) > > > > > > Best regards, > > > Zheng > > > > Dear Zheng, > > > > I think that cancel_delayed_work, added in remove function, should > > change to cancel_delayed_work_sync to solve UAF. The corresponding > > operation flow is shown as follow: > > CPU0 CPU1 > > |mtk_jpeg_job_timeout_work > > mtk_jpeg_remove | > > cancel_delayed_work | > > v4l2_m2m_release | > > kfree(m2m_dev); | > > | > > | v4l2_m2m_get_curr_priv > > | m2m_dev->curr_ctx //use > > > > If cancel_delayed_work is called in async mode like above, and the > > m2m_dev is freed. But it is used in mtk_jpeg_job_timeout_work, this > > calling may cause UAF. > > > > Thanks. > > > > > > > > > > > > > > > > > > > > > > > Zheng Hacker <hackerzheng666@gmail.com> 于2023年3月8日周三 17:11写道: > > > > > > > > > > > > Dear Kyrie, > > > > > > > > > > > > Sorry for my misunderstanding. I've seen the timeout worker > > > > > > is > > > > > > bound > > > > > > with mtk_jpegdec_timeout_work rather than > > > > > > mtk_jpeg_job_timeout_work.So > > > > > > the competition won't happen. > > > > > > > > > > > > Back to the beginning scene, I still don't know if it's a > > > > > > UAF > > > > > > issue > > > > > > or > > > > > > not. The normal schedule is very strong to me. If we can > > > > > > call > > > > > > mtk_jpeg_remove directly without calling mtk_jpeg_release, > > > > > > The UAF is still possible. Otherwis, I think it's safe > > > > > > here. > > > > > > > > > > > > Best regards, > > > > > > Zheng > > > > > > > > > > > > Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年3月8日周三 15:39写道: > > > > > > > > > > > > > > On Wed, 2023-03-08 at 14:10 +0800, Zheng Hacker wrote: > > > > > > > > Hi Kyrie, > > > > > > > > > > > > > > > > After reviewing the code, I found anothe possible code > > > > > > > > path. As > > > > > > > > I am > > > > > > > > not familiar with the module. It has high possibility > > > > > > > > it's > > > > > > > > wrong. > > > > > > > > Could please help me check this? Very much appreciated > > > > > > > > for > > > > > > > > your > > > > > > > > valuable time. > > > > > > > > > > > > > > > > In summary, mtk_jpegdec_worker was set in mtk_jpeg_open > > > > > > > > and > > > > > > > > started > > > > > > > > in > > > > > > > > mtk_jpeg_multicore_dec_device_run, which made it > > > > > > > > running on > > > > > > > > cpu1. > > > > > > > > Inside the mtk_jpeg_multicore_dec_device_run, it will > > > > > > > > call > > > > > > > > schedule_delayed_work to start the timeout_work, which > > > > > > > > will > > > > > > > > make it > > > > > > > > running on cpu2. Meanwhile, we can call > > > > > > > > mtk_jpeg_release to cancel the job. But there might be > > > > > > > > a > > > > > > > > race > > > > > > > > between > > > > > > > > mtk_jpegdec_worker and v4l2_m2m_cancel_job. It may call > > > > > > > > v4l2_m2m_job_finish too early to wake up the event. > > > > > > > > The remove will go on, the other race is as described > > > > > > > > earlier. > > > > > > > > > > > > > > > > cpu0 cpu1 cpu2 > > > > > > > > (1)->device_run > > > > > > > > mtk_jpeg_multicore > > > > > > > > _dec_device_run > > > > > > > > queue_work > > > > > > > > (jpeg->workqueue, > > > > > > > > &ctx->jpeg_work); > > > > > > > > (2)mtk_jpegdec_worker > > > > > > > > (3)mtk_jpeg_release > > > > > > > > v4l2_m2m_cancel_job > > > > > > > > wait event > > > > > > > > > > > > > > > > schedule_delayed_work > > > > > > > > (4)mtk_jpeg > > > > > > > > _job > > > > > > > > _tim > > > > > > > > eout_w > > > > > > > > ork > > > > > > > > (5)v4l2_m2m_job_finish > > > > > > > > wake up > > > > > > > > (6)mtk_jpeg_remove > > > > > > > > v4l2_m2m_release > > > > > > > > kfree(m2m_dev) > > > > > > > > (7)v4l2_m2m > > > > > > > > _get > > > > > > > > _cur > > > > > > > > r_priv > > > > > > > > > > > > > > > > > > > > > > Dear zheng, > > > > > > > > > > > > > > The mtk_jpeg_multicore_dec_device_run function is used > > > > > > > for > > > > > > > multi- > > > > > > > hw > > > > > > > jpeg decoding. Instead of scheduling > > > > > > > mtk_jpeg_job_timeout_work, > > > > > > > mtk_jpegdec_worker is scheduled in this function. > > > > > > > > > > > > > > The mtk_jpeg_dec_device_run function is used for single > > > > > > > hw > > > > > > > jpeg > > > > > > > decoding, which schedules mtk_jpeg_job_timeout_work. > > > > > > > > > > > > > > A driver is either a single hw driver or a multi-hw > > > > > > > driver > > > > > > > and > > > > > > > cannot > > > > > > > represent both at the same time. > > > > > > > mtk_jpeg_job_timeout_work and mtk_jpegdec_worker cannot > > > > > > > be > > > > > > > scheduled at > > > > > > > the same time. > > > > > > > So mtk_jpeg_job_timeout_work calls v4l2_m2m_job_finish > > > > > > > would > > > > > > > not > > > > > > > cause > > > > > > > competition between the mtk_jpegdec_worker and > > > > > > > v4l2_m2m_cancel_job. > > > > > > > > > > > > > > Regards, > > > > > > > Kyrie. > > > > > > > > > > > > > > > > > > > > > > Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年3月8日周三 > > > > > > > > 11:32写道: > > > > > > > > > > > > > > > > > > On Wed, 2023-03-08 at 10:20 +0800, Zheng Hacker > > > > > > > > > wrote: > > > > > > > > > > Hi Kyrie, > > > > > > > > > > > > > > > > > > > > Thank you for your careful analysis and response. I > > > > > > > > > > still > > > > > > > > > > have > > > > > > > > > > some > > > > > > > > > > areas that I don't quite understand and would like > > > > > > > > > > to > > > > > > > > > > ask > > > > > > > > > > for > > > > > > > > > > clarification. That is, how do the function > > > > > > > > > > pointers > > > > > > > > > > for > > > > > > > > > > stop > > > > > > > > > > streaming, initialized as > > > > > > > > > > mtk_jpeg_enc_stop_streaming > > > > > > > > > > and > > > > > > > > > > mtk_jpeg_dec_stop_streaming, ensure that the worker > > > > > > > > > > is > > > > > > > > > > canceled? > > > > > > > > > > I > > > > > > > > > > would greatly appreciate your response. > > > > > > > > > > > > > > > > > > > > Best regards, > > > > > > > > > > Zheng > > > > > > > > > > > > > > > > > > Dear zheng, > > > > > > > > > > > > > > > > > > For stop streaming, what I mean is that stoppping > > > > > > > > > jpeg > > > > > > > > > decoding or > > > > > > > > > encoding. > > > > > > > > > Ok, let me introduce the sw flow of stop streaming: > > > > > > > > > Firstly, the app will call v4l2_m2m_ioctl_streamoff, > > > > > > > > > which > > > > > > > > > will > > > > > > > > > call > > > > > > > > > v4l2_m2m_cancel_job, if it finds a job running(as you > > > > > > > > > note, > > > > > > > > > cpu1 is > > > > > > > > > running), it will wait event, the event is wake up by > > > > > > > > > v4l2_m2m_job_finish function. And v4l2_m2m_job_finish > > > > > > > > > is > > > > > > > > > called by > > > > > > > > > jpeg > > > > > > > > > dec/enc irq handler, which means that the waitting > > > > > > > > > would > > > > > > > > > result mtk > > > > > > > > > hw > > > > > > > > > to finish dec/enc, irq will occur and irq handler > > > > > > > > > would > > > > > > > > > cancel > > > > > > > > > timeout > > > > > > > > > worker. The follow is shown as blow. > > > > > > > > > v4l2_m2m_ioctl_streamoff > > > > > > > > > v4l2_m2m_cancel_job mtk_jpeg_enc_ir > > > > > > > > > q/mt > > > > > > > > > k_jp > > > > > > > > > eg_dec > > > > > > > > > _irq > > > > > > > > > wait evnet <------ wake up ------ > > > > > > > > > v4l2_m2m_job_finish > > > > > > > > > cancel > > > > > > > > > timeout > > > > > > > > > work > > > > > > > > > > > > > > > > > > As mentioned above, if it is normal stop streaming > > > > > > > > > action, > > > > > > > > > there > > > > > > > > > will > > > > > > > > > be no happen that the timeout worker does not > > > > > > > > > canceled. > > > > > > > > > > > > > > > > > > But if mtk_jpeg_remove is called directly without > > > > > > > > > above > > > > > > > > > flow, > > > > > > > > > it > > > > > > > > > would > > > > > > > > > cause lots of issues. > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > Kyrie. > > > > > > > > > > > > > > > > > > > > Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年3月8日周三 > > > > > > > > > > 10:02写道: > > > > > > > > > > > > > > > > > > > > > > On Tue, 2023-03-07 at 23:03 +0800, Zheng Hacker > > > > > > > > > > > wrote: > > > > > > > > > > > > The timer function was set in mtk_jpeg_probe > > > > > > > > > > > > with > > > > > > > > > > > > mtk_jpeg_job_timeout_work function. > > > > > > > > > > > > And the worker is started in > > > > > > > > > > > > mtk_jpeg_dec_device_run. > > > > > > > > > > > > There are two functions (mtk_jpeg_enc_irq and > > > > > > > > > > > > mtk_jpeg_dec_irq) > > > > > > > > > > > > which > > > > > > > > > > > > may cancel the worker. > > > > > > > > > > > > They are used as IRQ handler function which is > > > > > > > > > > > > saved as > > > > > > > > > > > > function > > > > > > > > > > > > pointer in a variable. > > > > > > > > > > > > In mtk_jpeg_probe, they are registered by > > > > > > > > > > > > devm_request_irq: > > > > > > > > > > > > > > > > > > > > > > > > ret = devm_request_irq(&pdev->dev, > > > > > > > > > > > > jpeg_irq, > > > > > > > > > > > > jpeg->variant->irq_handler, > > > > > > > > > > > > 0, > > > > > > > > > > > > pdev->name, jpeg); > > > > > > > > > > > > if (ret) { > > > > > > > > > > > > dev_err(&pdev->dev, "Failed to request > > > > > > > > > > > > jpeg_irq > > > > > > > > > > > > %d > > > > > > > > > > > > (%d)\n", > > > > > > > > > > > > jpeg_irq, ret); > > > > > > > > > > > > return ret; > > > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > > > However, if we remove the module without > > > > > > > > > > > > triggering > > > > > > > > > > > > the > > > > > > > > > > > > irq, > > > > > > > > > > > > the > > > > > > > > > > > > worker will never be removed. > > > > > > > > > > > > > > > > > > > > > > > > As for the schedule, mtk_jpeg_dec_device_run > > > > > > > > > > > > and > > > > > > > > > > > > mtk_jpeg_enc_device_run will start the worker. > > > > > > > > > > > > The schedule invoking is quite complicated. As > > > > > > > > > > > > far > > > > > > > > > > > > as I > > > > > > > > > > > > know, > > > > > > > > > > > > the > > > > > > > > > > > > invoking chain is as follows: > > > > > > > > > > > > > > > > > > > > > > > > v4l2_m2m_init->v4l2_m2m_device_run_work- > > > > > > > > > > > > > v4l2_m2m_try_run > > > > > > > > > > > > > > > > > > > > > > > > the v4l2_m2m_device_run_work function is also a > > > > > > > > > > > > worker > > > > > > > > > > > > which > > > > > > > > > > > > is > > > > > > > > > > > > set > > > > > > > > > > > > in > > > > > > > > > > > > v4l2_m2m_init and started in > > > > > > > > > > > > v4l2_m2m_schedule_next_job. > > > > > > > > > > > > > > > > > > > > > > > > Before calling remove function, > > > > > > > > > > > > the mtk_jpeg_release > > > > > > > > > > > > was > > > > > > > > > > > > invoked > > > > > > > > > > > > to > > > > > > > > > > > > release the related resource. > > > > > > > > > > > > > > > > > > > > > > > > v4l2_m2m_cancel_job will cancel the job by > > > > > > > > > > > > calling > > > > > > > > > > > > m2m_dev->m2m_ops->job_abort(m2m_ctx->priv). > > > > > > > > > > > > > > > > > > > > > > > > But this will only cancel the current queue by > > > > > > > > > > > > list_del(&m2m_dev->curr_ctx->queue); > > > > > > > > > > > > > > > > > > > > > > > > I think this can not cancel the posted task > > > > > > > > > > > > mentioned > > > > > > > > > > > > before. > > > > > > > > > > > > So > > > > > > > > > > > > I > > > > > > > > > > > > think if mtk_jpeg_job_timeout_work > > > > > > > > > > > > > > > > > > > > > > > > is working on, and using jpeg->m2m_dev after > > > > > > > > > > > > freeing it > > > > > > > > > > > > in > > > > > > > > > > > > mtk_jpeg_remove, it will cause a UAF bug. > > > > > > > > > > > > > > > > > > > > > > > > static int mtk_jpeg_release(struct file *file) > > > > > > > > > > > > { > > > > > > > > > > > > struct mtk_jpeg_dev *jpeg = > > > > > > > > > > > > video_drvdata(file); > > > > > > > > > > > > struct mtk_jpeg_ctx *ctx = > > > > > > > > > > > > mtk_jpeg_fh_to_ctx(file- > > > > > > > > > > > > > private_data); > > > > > > > > > > > > > > > > > > > > > > > > mutex_lock(&jpeg->lock); > > > > > > > > > > > > v4l2_ctrl_handler_free(&ctx->ctrl_hdl); > > > > > > > > > > > > [1] v4l2_m2m_ctx_release(ctx->fh.m2m_ctx); > > > > > > > > > > > > v4l2_fh_del(&ctx->fh); > > > > > > > > > > > > v4l2_fh_exit(&ctx->fh); > > > > > > > > > > > > kfree(ctx); > > > > > > > > > > > > mutex_unlock(&jpeg->lock); > > > > > > > > > > > > return 0; > > > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > > > void v4l2_m2m_ctx_release(struct v4l2_m2m_ctx > > > > > > > > > > > > *m2m_ctx) > > > > > > > > > > > > { > > > > > > > > > > > > /* wait until the current context is dequeued > > > > > > > > > > > > from > > > > > > > > > > > > job_queue */ > > > > > > > > > > > > [2] v4l2_m2m_cancel_job(m2m_ctx); > > > > > > > > > > > > > > > > > > > > > > > > vb2_queue_release(&m2m_ctx->cap_q_ctx.q); > > > > > > > > > > > > vb2_queue_release(&m2m_ctx->out_q_ctx.q); > > > > > > > > > > > > > > > > > > > > > > > > kfree(m2m_ctx); > > > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > > > Note that all of this is static analysis, which > > > > > > > > > > > > may > > > > > > > > > > > > be > > > > > > > > > > > > false > > > > > > > > > > > > positive. > > > > > > > > > > > > Feel free to tell me if there is something I've > > > > > > > > > > > > missed. > > > > > > > > > > > > > > > > > > > > > > > > Regard, > > > > > > > > > > > > Zheng > > > > > > > > > > > > > > > > > > > > > > Dear Zheng, > > > > > > > > > > > > > > > > > > > > > > You set up an application scenario: > > > > > > > > > > > cpu1 is using the mtk-jpeg driver and timeout > > > > > > > > > > > work > > > > > > > > > > > has > > > > > > > > > > > been > > > > > > > > > > > scheduled. > > > > > > > > > > > At the same time cpu0 wanted to remove the mtk- > > > > > > > > > > > jpeg > > > > > > > > > > > driver, > > > > > > > > > > > which > > > > > > > > > > > caused the UAF bug. > > > > > > > > > > > I wonder if such an irrational application > > > > > > > > > > > scenario > > > > > > > > > > > could > > > > > > > > > > > exist. > > > > > > > > > > > This > > > > > > > > > > > scenario, as you described, not only leads to the > > > > > > > > > > > problems you > > > > > > > > > > > mentioned, but also to output&capture memory > > > > > > > > > > > leaks > > > > > > > > > > > and > > > > > > > > > > > unreleased > > > > > > > > > > > resources, such as spinlock. > > > > > > > > > > > Typically, if we want to remove the driver, we > > > > > > > > > > > firstly do > > > > > > > > > > > stop > > > > > > > > > > > streaming, which ensures that the worker has been > > > > > > > > > > > canceled. > > > > > > > > > > > I agree with your changes from the perspective of > > > > > > > > > > > strengthening > > > > > > > > > > > the > > > > > > > > > > > robustness of the driver code. > > > > > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > > > Kyrie. > > > > > > > > > > > > > > > > > > > > > > > > Irui Wang (王瑞) <Irui.Wang@mediatek.com> > > > > > > > > > > > > 于2023年3月7日周二 > > > > > > > > > > > > 18:23写道: > > > > > > > > > > > > > > > > > > > > > > > > > > Dear Angelo and Zheng, > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for your patch and comments. > > > > > > > > > > > > > > > > > > > > > > > > > > Dear Kyrie, > > > > > > > > > > > > > > > > > > > > > > > > > > Please help to check this, thanks. > > > > > > > > > > > > > > > > > > > > > > > > > > Best Regards > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, 2023-03-07 at 10:49 +0100, > > > > > > > > > > > > > AngeloGioacchino > > > > > > > > > > > > > Del > > > > > > > > > > > > > Regno > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > Il 07/03/23 10:27, Zheng Hacker ha scritto: > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Is there anyone who can help with this? I > > > > > > > > > > > > > > > can > > > > > > > > > > > > > > > provide > > > > > > > > > > > > > > > more > > > > > > > > > > > > > > > details > > > > > > > > > > > > > > > like invoking chain if needed. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Providing more details is always good. > > > > > > > > > > > > > > Please > > > > > > > > > > > > > > do. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Meanwhile, adding Irui Wang to the loop: > > > > > > > > > > > > > > he's > > > > > > > > > > > > > > doing > > > > > > > > > > > > > > mtk- > > > > > > > > > > > > > > jpeg. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > > > > > > Angelo > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > Zheng > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Zheng Wang <zyytlz.wz@163.com> > > > > > > > > > > > > > > > 于2023年3月6日周一 > > > > > > > > > > > > > > > 14:28写道: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > In mtk_jpeg_probe, &jpeg- > > > > > > > > > > > > > > > > >job_timeout_work > > > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > > bound > > > > > > > > > > > > > > > > with > > > > > > > > > > > > > > > > mtk_jpeg_job_timeout_work. Then > > > > > > > > > > > > > > > > mtk_jpeg_dec_device_run > > > > > > > > > > > > > > > > and mtk_jpeg_enc_device_run may be > > > > > > > > > > > > > > > > called > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > start > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > work. > > > > > > > > > > > > > > > > If we remove the module which will call > > > > > > > > > > > > > > > > mtk_jpeg_remove > > > > > > > > > > > > > > > > to make cleanup, there may be a > > > > > > > > > > > > > > > > unfinished > > > > > > > > > > > > > > > > work. The > > > > > > > > > > > > > > > > possible sequence is as follows, which > > > > > > > > > > > > > > > > will > > > > > > > > > > > > > > > > cause a > > > > > > > > > > > > > > > > typical UAF bug. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Fix it by canceling the work before > > > > > > > > > > > > > > > > cleanup > > > > > > > > > > > > > > > > in > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > mtk_jpeg_remove > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > CPU0 CPU1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > |mtk_jpeg_job_time > > > > > > > > > > > > > > > > out_ > > > > > > > > > > > > > > > > work > > > > > > > > > > > > > > > > mtk_jpeg_remove | > > > > > > > > > > > > > > > > v4l2_m2m_release | > > > > > > > > > > > > > > > > kfree(m2m_dev); | > > > > > > > > > > > > > > > > | > > > > > > > > > > > > > > > > | > > > > > > > > > > > > > > > > v4l2_m2m_get_curr_priv > > > > > > > > > > > > > > > > | m2m_dev- > > > > > > > > > > > > > > > > >curr_ctx > > > > > > > > > > > > > > > > //use > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Signed-off-by: Zheng Wang < > > > > > > > > > > > > > > > > zyytlz.wz@163.com> > > > > > > > > > > > > > > > > --- > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > drivers/media/platform/mediatek/jpeg/mt > > > > > > > > > > > > > > > > k_jp > > > > > > > > > > > > > > > > eg_c > > > > > > > > > > > > > > > > ore.c > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2 +- > > > > > > > > > > > > > > > > 1 file changed, 1 insertion(+), 1 > > > > > > > > > > > > > > > > deletion(-) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > diff --git > > > > > > > > > > > > > > > > a/drivers/media/platform/mediatek/jpeg/ > > > > > > > > > > > > > > > > mtk_ > > > > > > > > > > > > > > > > jpeg > > > > > > > > > > > > > > > > _core. > > > > > > > > > > > > > > > > c > > > > > > > > > > > > > > > > b/drivers/media/platform/mediatek/jpeg/ > > > > > > > > > > > > > > > > mtk_ > > > > > > > > > > > > > > > > jpeg > > > > > > > > > > > > > > > > _core. > > > > > > > > > > > > > > > > c > > > > > > > > > > > > > > > > index 969516a940ba..364513e7897e 100644 > > > > > > > > > > > > > > > > --- > > > > > > > > > > > > > > > > a/drivers/media/platform/mediatek/jpeg/ > > > > > > > > > > > > > > > > mtk_ > > > > > > > > > > > > > > > > jpeg > > > > > > > > > > > > > > > > _core. > > > > > > > > > > > > > > > > c > > > > > > > > > > > > > > > > +++ > > > > > > > > > > > > > > > > b/drivers/media/platform/mediatek/jpeg/ > > > > > > > > > > > > > > > > mtk_ > > > > > > > > > > > > > > > > jpeg > > > > > > > > > > > > > > > > _core. > > > > > > > > > > > > > > > > c > > > > > > > > > > > > > > > > @@ -1793,7 +1793,7 @@ static int > > > > > > > > > > > > > > > > mtk_jpeg_probe(struct > > > > > > > > > > > > > > > > platform_device *pdev) > > > > > > > > > > > > > > > > static int mtk_jpeg_remove(struct > > > > > > > > > > > > > > > > platform_device > > > > > > > > > > > > > > > > *pdev) > > > > > > > > > > > > > > > > { > > > > > > > > > > > > > > > > struct mtk_jpeg_dev *jpeg = > > > > > > > > > > > > > > > > platform_get_drvdata(pdev); > > > > > > > > > > > > > > > > - > > > > > > > > > > > > > > > > + cancel_delayed_work(&jpeg- > > > > > > > > > > > > > > > > > job_timeout_work); > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > pm_runtime_disable(&pdev- > > > > > > > > > > > > > > > > >dev); > > > > > > > > > > > > > > > > video_unregister_device(jpeg- > > > > > > > > > > > > > > > > > vdev); > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > v4l2_m2m_release(jpeg- > > > > > > > > > > > > > > > > >m2m_dev); > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > > > > 2.25.1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
Hi Kyrie, Thanks for your kind reminder. I'll write the patch as soon as I can. Best regards, Zheng Wang Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年6月15日周四 10:40写道: > > On Mon, 2023-05-15 at 10:57 +0800, Zheng Hacker wrote: > > External email : Please do not click links or open attachments until > > you have verified the sender or the content. > > > > > > Hi Kyrie, > > > > I think your analysis is right, I'll make the change in the next > > version. > > > > Best regards, > > Zheng > > > Dear Zheng, > > Could you please update the next version at any convenient time? > > Thanks. > > Bset Regards, > Kyrie. > > > Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年5月12日周五 09:24写道: > > > > > > On Tue, 2023-03-14 at 11:37 +0800, Zheng Hacker wrote: > > > > Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年3月14日周二 10:49写道: > > > > > > > > > > On Thu, 2023-03-09 at 15:31 +0800, Zheng Hacker wrote: > > > > > > Dear Kyrie, > > > > > > > > > > > > Guenter gave some advice about this issue in [1]. As we are > > > > > > not > > > > > > familiar with the code here, could you please see if the bug > > > > > > really > > > > > > exists? > > > > > > > > > > > > [1] > > > > > > > > > > > > > https://lore.kernel.org/all/e03134f9-9433-ab6b-170a-8ce752fccdeb@roeck-us.net/ > > > > > > > > > > > > Best regards, > > > > > > Zheng > > > > > > > > > > Dear zheng, > > > > > > > > > > I have no more questions for your patch, thank you for your > > > > > contribution to mtk jpeg driver. > > > > > > > > > > > > > Thanks for your detailed review and explanation about the code. I > > > > have > > > > learned a lot > > > > from it. Have a nice day :) > > > > > > > > Best regards, > > > > Zheng > > > > > > Dear Zheng, > > > > > > I think that cancel_delayed_work, added in remove function, should > > > change to cancel_delayed_work_sync to solve UAF. The corresponding > > > operation flow is shown as follow: > > > CPU0 CPU1 > > > |mtk_jpeg_job_timeout_work > > > mtk_jpeg_remove | > > > cancel_delayed_work | > > > v4l2_m2m_release | > > > kfree(m2m_dev); | > > > | > > > | v4l2_m2m_get_curr_priv > > > | m2m_dev->curr_ctx //use > > > > > > If cancel_delayed_work is called in async mode like above, and the > > > m2m_dev is freed. But it is used in mtk_jpeg_job_timeout_work, this > > > calling may cause UAF. > > > > > > Thanks. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Zheng Hacker <hackerzheng666@gmail.com> 于2023年3月8日周三 17:11写道: > > > > > > > > > > > > > > Dear Kyrie, > > > > > > > > > > > > > > Sorry for my misunderstanding. I've seen the timeout worker > > > > > > > is > > > > > > > bound > > > > > > > with mtk_jpegdec_timeout_work rather than > > > > > > > mtk_jpeg_job_timeout_work.So > > > > > > > the competition won't happen. > > > > > > > > > > > > > > Back to the beginning scene, I still don't know if it's a > > > > > > > UAF > > > > > > > issue > > > > > > > or > > > > > > > not. The normal schedule is very strong to me. If we can > > > > > > > call > > > > > > > mtk_jpeg_remove directly without calling mtk_jpeg_release, > > > > > > > The UAF is still possible. Otherwis, I think it's safe > > > > > > > here. > > > > > > > > > > > > > > Best regards, > > > > > > > Zheng > > > > > > > > > > > > > > Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年3月8日周三 15:39写道: > > > > > > > > > > > > > > > > On Wed, 2023-03-08 at 14:10 +0800, Zheng Hacker wrote: > > > > > > > > > Hi Kyrie, > > > > > > > > > > > > > > > > > > After reviewing the code, I found anothe possible code > > > > > > > > > path. As > > > > > > > > > I am > > > > > > > > > not familiar with the module. It has high possibility > > > > > > > > > it's > > > > > > > > > wrong. > > > > > > > > > Could please help me check this? Very much appreciated > > > > > > > > > for > > > > > > > > > your > > > > > > > > > valuable time. > > > > > > > > > > > > > > > > > > In summary, mtk_jpegdec_worker was set in mtk_jpeg_open > > > > > > > > > and > > > > > > > > > started > > > > > > > > > in > > > > > > > > > mtk_jpeg_multicore_dec_device_run, which made it > > > > > > > > > running on > > > > > > > > > cpu1. > > > > > > > > > Inside the mtk_jpeg_multicore_dec_device_run, it will > > > > > > > > > call > > > > > > > > > schedule_delayed_work to start the timeout_work, which > > > > > > > > > will > > > > > > > > > make it > > > > > > > > > running on cpu2. Meanwhile, we can call > > > > > > > > > mtk_jpeg_release to cancel the job. But there might be > > > > > > > > > a > > > > > > > > > race > > > > > > > > > between > > > > > > > > > mtk_jpegdec_worker and v4l2_m2m_cancel_job. It may call > > > > > > > > > v4l2_m2m_job_finish too early to wake up the event. > > > > > > > > > The remove will go on, the other race is as described > > > > > > > > > earlier. > > > > > > > > > > > > > > > > > > cpu0 cpu1 cpu2 > > > > > > > > > (1)->device_run > > > > > > > > > mtk_jpeg_multicore > > > > > > > > > _dec_device_run > > > > > > > > > queue_work > > > > > > > > > (jpeg->workqueue, > > > > > > > > > &ctx->jpeg_work); > > > > > > > > > (2)mtk_jpegdec_worker > > > > > > > > > (3)mtk_jpeg_release > > > > > > > > > v4l2_m2m_cancel_job > > > > > > > > > wait event > > > > > > > > > > > > > > > > > > schedule_delayed_work > > > > > > > > > (4)mtk_jpeg > > > > > > > > > _job > > > > > > > > > _tim > > > > > > > > > eout_w > > > > > > > > > ork > > > > > > > > > (5)v4l2_m2m_job_finish > > > > > > > > > wake up > > > > > > > > > (6)mtk_jpeg_remove > > > > > > > > > v4l2_m2m_release > > > > > > > > > kfree(m2m_dev) > > > > > > > > > (7)v4l2_m2m > > > > > > > > > _get > > > > > > > > > _cur > > > > > > > > > r_priv > > > > > > > > > > > > > > > > > > > > > > > > > Dear zheng, > > > > > > > > > > > > > > > > The mtk_jpeg_multicore_dec_device_run function is used > > > > > > > > for > > > > > > > > multi- > > > > > > > > hw > > > > > > > > jpeg decoding. Instead of scheduling > > > > > > > > mtk_jpeg_job_timeout_work, > > > > > > > > mtk_jpegdec_worker is scheduled in this function. > > > > > > > > > > > > > > > > The mtk_jpeg_dec_device_run function is used for single > > > > > > > > hw > > > > > > > > jpeg > > > > > > > > decoding, which schedules mtk_jpeg_job_timeout_work. > > > > > > > > > > > > > > > > A driver is either a single hw driver or a multi-hw > > > > > > > > driver > > > > > > > > and > > > > > > > > cannot > > > > > > > > represent both at the same time. > > > > > > > > mtk_jpeg_job_timeout_work and mtk_jpegdec_worker cannot > > > > > > > > be > > > > > > > > scheduled at > > > > > > > > the same time. > > > > > > > > So mtk_jpeg_job_timeout_work calls v4l2_m2m_job_finish > > > > > > > > would > > > > > > > > not > > > > > > > > cause > > > > > > > > competition between the mtk_jpegdec_worker and > > > > > > > > v4l2_m2m_cancel_job. > > > > > > > > > > > > > > > > Regards, > > > > > > > > Kyrie. > > > > > > > > > > > > > > > > > > > > > > > > > Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年3月8日周三 > > > > > > > > > 11:32写道: > > > > > > > > > > > > > > > > > > > > On Wed, 2023-03-08 at 10:20 +0800, Zheng Hacker > > > > > > > > > > wrote: > > > > > > > > > > > Hi Kyrie, > > > > > > > > > > > > > > > > > > > > > > Thank you for your careful analysis and response. I > > > > > > > > > > > still > > > > > > > > > > > have > > > > > > > > > > > some > > > > > > > > > > > areas that I don't quite understand and would like > > > > > > > > > > > to > > > > > > > > > > > ask > > > > > > > > > > > for > > > > > > > > > > > clarification. That is, how do the function > > > > > > > > > > > pointers > > > > > > > > > > > for > > > > > > > > > > > stop > > > > > > > > > > > streaming, initialized as > > > > > > > > > > > mtk_jpeg_enc_stop_streaming > > > > > > > > > > > and > > > > > > > > > > > mtk_jpeg_dec_stop_streaming, ensure that the worker > > > > > > > > > > > is > > > > > > > > > > > canceled? > > > > > > > > > > > I > > > > > > > > > > > would greatly appreciate your response. > > > > > > > > > > > > > > > > > > > > > > Best regards, > > > > > > > > > > > Zheng > > > > > > > > > > > > > > > > > > > > Dear zheng, > > > > > > > > > > > > > > > > > > > > For stop streaming, what I mean is that stoppping > > > > > > > > > > jpeg > > > > > > > > > > decoding or > > > > > > > > > > encoding. > > > > > > > > > > Ok, let me introduce the sw flow of stop streaming: > > > > > > > > > > Firstly, the app will call v4l2_m2m_ioctl_streamoff, > > > > > > > > > > which > > > > > > > > > > will > > > > > > > > > > call > > > > > > > > > > v4l2_m2m_cancel_job, if it finds a job running(as you > > > > > > > > > > note, > > > > > > > > > > cpu1 is > > > > > > > > > > running), it will wait event, the event is wake up by > > > > > > > > > > v4l2_m2m_job_finish function. And v4l2_m2m_job_finish > > > > > > > > > > is > > > > > > > > > > called by > > > > > > > > > > jpeg > > > > > > > > > > dec/enc irq handler, which means that the waitting > > > > > > > > > > would > > > > > > > > > > result mtk > > > > > > > > > > hw > > > > > > > > > > to finish dec/enc, irq will occur and irq handler > > > > > > > > > > would > > > > > > > > > > cancel > > > > > > > > > > timeout > > > > > > > > > > worker. The follow is shown as blow. > > > > > > > > > > v4l2_m2m_ioctl_streamoff > > > > > > > > > > v4l2_m2m_cancel_job mtk_jpeg_enc_ir > > > > > > > > > > q/mt > > > > > > > > > > k_jp > > > > > > > > > > eg_dec > > > > > > > > > > _irq > > > > > > > > > > wait evnet <------ wake up ------ > > > > > > > > > > v4l2_m2m_job_finish > > > > > > > > > > cancel > > > > > > > > > > timeout > > > > > > > > > > work > > > > > > > > > > > > > > > > > > > > As mentioned above, if it is normal stop streaming > > > > > > > > > > action, > > > > > > > > > > there > > > > > > > > > > will > > > > > > > > > > be no happen that the timeout worker does not > > > > > > > > > > canceled. > > > > > > > > > > > > > > > > > > > > But if mtk_jpeg_remove is called directly without > > > > > > > > > > above > > > > > > > > > > flow, > > > > > > > > > > it > > > > > > > > > > would > > > > > > > > > > cause lots of issues. > > > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > > Kyrie. > > > > > > > > > > > > > > > > > > > > > > Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年3月8日周三 > > > > > > > > > > > 10:02写道: > > > > > > > > > > > > > > > > > > > > > > > > On Tue, 2023-03-07 at 23:03 +0800, Zheng Hacker > > > > > > > > > > > > wrote: > > > > > > > > > > > > > The timer function was set in mtk_jpeg_probe > > > > > > > > > > > > > with > > > > > > > > > > > > > mtk_jpeg_job_timeout_work function. > > > > > > > > > > > > > And the worker is started in > > > > > > > > > > > > > mtk_jpeg_dec_device_run. > > > > > > > > > > > > > There are two functions (mtk_jpeg_enc_irq and > > > > > > > > > > > > > mtk_jpeg_dec_irq) > > > > > > > > > > > > > which > > > > > > > > > > > > > may cancel the worker. > > > > > > > > > > > > > They are used as IRQ handler function which is > > > > > > > > > > > > > saved as > > > > > > > > > > > > > function > > > > > > > > > > > > > pointer in a variable. > > > > > > > > > > > > > In mtk_jpeg_probe, they are registered by > > > > > > > > > > > > > devm_request_irq: > > > > > > > > > > > > > > > > > > > > > > > > > > ret = devm_request_irq(&pdev->dev, > > > > > > > > > > > > > jpeg_irq, > > > > > > > > > > > > > jpeg->variant->irq_handler, > > > > > > > > > > > > > 0, > > > > > > > > > > > > > pdev->name, jpeg); > > > > > > > > > > > > > if (ret) { > > > > > > > > > > > > > dev_err(&pdev->dev, "Failed to request > > > > > > > > > > > > > jpeg_irq > > > > > > > > > > > > > %d > > > > > > > > > > > > > (%d)\n", > > > > > > > > > > > > > jpeg_irq, ret); > > > > > > > > > > > > > return ret; > > > > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > > > > > However, if we remove the module without > > > > > > > > > > > > > triggering > > > > > > > > > > > > > the > > > > > > > > > > > > > irq, > > > > > > > > > > > > > the > > > > > > > > > > > > > worker will never be removed. > > > > > > > > > > > > > > > > > > > > > > > > > > As for the schedule, mtk_jpeg_dec_device_run > > > > > > > > > > > > > and > > > > > > > > > > > > > mtk_jpeg_enc_device_run will start the worker. > > > > > > > > > > > > > The schedule invoking is quite complicated. As > > > > > > > > > > > > > far > > > > > > > > > > > > > as I > > > > > > > > > > > > > know, > > > > > > > > > > > > > the > > > > > > > > > > > > > invoking chain is as follows: > > > > > > > > > > > > > > > > > > > > > > > > > > v4l2_m2m_init->v4l2_m2m_device_run_work- > > > > > > > > > > > > > > v4l2_m2m_try_run > > > > > > > > > > > > > > > > > > > > > > > > > > the v4l2_m2m_device_run_work function is also a > > > > > > > > > > > > > worker > > > > > > > > > > > > > which > > > > > > > > > > > > > is > > > > > > > > > > > > > set > > > > > > > > > > > > > in > > > > > > > > > > > > > v4l2_m2m_init and started in > > > > > > > > > > > > > v4l2_m2m_schedule_next_job. > > > > > > > > > > > > > > > > > > > > > > > > > > Before calling remove function, > > > > > > > > > > > > > the mtk_jpeg_release > > > > > > > > > > > > > was > > > > > > > > > > > > > invoked > > > > > > > > > > > > > to > > > > > > > > > > > > > release the related resource. > > > > > > > > > > > > > > > > > > > > > > > > > > v4l2_m2m_cancel_job will cancel the job by > > > > > > > > > > > > > calling > > > > > > > > > > > > > m2m_dev->m2m_ops->job_abort(m2m_ctx->priv). > > > > > > > > > > > > > > > > > > > > > > > > > > But this will only cancel the current queue by > > > > > > > > > > > > > list_del(&m2m_dev->curr_ctx->queue); > > > > > > > > > > > > > > > > > > > > > > > > > > I think this can not cancel the posted task > > > > > > > > > > > > > mentioned > > > > > > > > > > > > > before. > > > > > > > > > > > > > So > > > > > > > > > > > > > I > > > > > > > > > > > > > think if mtk_jpeg_job_timeout_work > > > > > > > > > > > > > > > > > > > > > > > > > > is working on, and using jpeg->m2m_dev after > > > > > > > > > > > > > freeing it > > > > > > > > > > > > > in > > > > > > > > > > > > > mtk_jpeg_remove, it will cause a UAF bug. > > > > > > > > > > > > > > > > > > > > > > > > > > static int mtk_jpeg_release(struct file *file) > > > > > > > > > > > > > { > > > > > > > > > > > > > struct mtk_jpeg_dev *jpeg = > > > > > > > > > > > > > video_drvdata(file); > > > > > > > > > > > > > struct mtk_jpeg_ctx *ctx = > > > > > > > > > > > > > mtk_jpeg_fh_to_ctx(file- > > > > > > > > > > > > > > private_data); > > > > > > > > > > > > > > > > > > > > > > > > > > mutex_lock(&jpeg->lock); > > > > > > > > > > > > > v4l2_ctrl_handler_free(&ctx->ctrl_hdl); > > > > > > > > > > > > > [1] v4l2_m2m_ctx_release(ctx->fh.m2m_ctx); > > > > > > > > > > > > > v4l2_fh_del(&ctx->fh); > > > > > > > > > > > > > v4l2_fh_exit(&ctx->fh); > > > > > > > > > > > > > kfree(ctx); > > > > > > > > > > > > > mutex_unlock(&jpeg->lock); > > > > > > > > > > > > > return 0; > > > > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > > > > > void v4l2_m2m_ctx_release(struct v4l2_m2m_ctx > > > > > > > > > > > > > *m2m_ctx) > > > > > > > > > > > > > { > > > > > > > > > > > > > /* wait until the current context is dequeued > > > > > > > > > > > > > from > > > > > > > > > > > > > job_queue */ > > > > > > > > > > > > > [2] v4l2_m2m_cancel_job(m2m_ctx); > > > > > > > > > > > > > > > > > > > > > > > > > > vb2_queue_release(&m2m_ctx->cap_q_ctx.q); > > > > > > > > > > > > > vb2_queue_release(&m2m_ctx->out_q_ctx.q); > > > > > > > > > > > > > > > > > > > > > > > > > > kfree(m2m_ctx); > > > > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > > > > > Note that all of this is static analysis, which > > > > > > > > > > > > > may > > > > > > > > > > > > > be > > > > > > > > > > > > > false > > > > > > > > > > > > > positive. > > > > > > > > > > > > > Feel free to tell me if there is something I've > > > > > > > > > > > > > missed. > > > > > > > > > > > > > > > > > > > > > > > > > > Regard, > > > > > > > > > > > > > Zheng > > > > > > > > > > > > > > > > > > > > > > > > Dear Zheng, > > > > > > > > > > > > > > > > > > > > > > > > You set up an application scenario: > > > > > > > > > > > > cpu1 is using the mtk-jpeg driver and timeout > > > > > > > > > > > > work > > > > > > > > > > > > has > > > > > > > > > > > > been > > > > > > > > > > > > scheduled. > > > > > > > > > > > > At the same time cpu0 wanted to remove the mtk- > > > > > > > > > > > > jpeg > > > > > > > > > > > > driver, > > > > > > > > > > > > which > > > > > > > > > > > > caused the UAF bug. > > > > > > > > > > > > I wonder if such an irrational application > > > > > > > > > > > > scenario > > > > > > > > > > > > could > > > > > > > > > > > > exist. > > > > > > > > > > > > This > > > > > > > > > > > > scenario, as you described, not only leads to the > > > > > > > > > > > > problems you > > > > > > > > > > > > mentioned, but also to output&capture memory > > > > > > > > > > > > leaks > > > > > > > > > > > > and > > > > > > > > > > > > unreleased > > > > > > > > > > > > resources, such as spinlock. > > > > > > > > > > > > Typically, if we want to remove the driver, we > > > > > > > > > > > > firstly do > > > > > > > > > > > > stop > > > > > > > > > > > > streaming, which ensures that the worker has been > > > > > > > > > > > > canceled. > > > > > > > > > > > > I agree with your changes from the perspective of > > > > > > > > > > > > strengthening > > > > > > > > > > > > the > > > > > > > > > > > > robustness of the driver code. > > > > > > > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > > > > Kyrie. > > > > > > > > > > > > > > > > > > > > > > > > > > Irui Wang (王瑞) <Irui.Wang@mediatek.com> > > > > > > > > > > > > > 于2023年3月7日周二 > > > > > > > > > > > > > 18:23写道: > > > > > > > > > > > > > > > > > > > > > > > > > > > > Dear Angelo and Zheng, > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for your patch and comments. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Dear Kyrie, > > > > > > > > > > > > > > > > > > > > > > > > > > > > Please help to check this, thanks. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Best Regards > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, 2023-03-07 at 10:49 +0100, > > > > > > > > > > > > > > AngeloGioacchino > > > > > > > > > > > > > > Del > > > > > > > > > > > > > > Regno > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > Il 07/03/23 10:27, Zheng Hacker ha scritto: > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Is there anyone who can help with this? I > > > > > > > > > > > > > > > > can > > > > > > > > > > > > > > > > provide > > > > > > > > > > > > > > > > more > > > > > > > > > > > > > > > > details > > > > > > > > > > > > > > > > like invoking chain if needed. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Providing more details is always good. > > > > > > > > > > > > > > > Please > > > > > > > > > > > > > > > do. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Meanwhile, adding Irui Wang to the loop: > > > > > > > > > > > > > > > he's > > > > > > > > > > > > > > > doing > > > > > > > > > > > > > > > mtk- > > > > > > > > > > > > > > > jpeg. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > > > > > > > Angelo > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > Zheng > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Zheng Wang <zyytlz.wz@163.com> > > > > > > > > > > > > > > > > 于2023年3月6日周一 > > > > > > > > > > > > > > > > 14:28写道: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > In mtk_jpeg_probe, &jpeg- > > > > > > > > > > > > > > > > > >job_timeout_work > > > > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > > > bound > > > > > > > > > > > > > > > > > with > > > > > > > > > > > > > > > > > mtk_jpeg_job_timeout_work. Then > > > > > > > > > > > > > > > > > mtk_jpeg_dec_device_run > > > > > > > > > > > > > > > > > and mtk_jpeg_enc_device_run may be > > > > > > > > > > > > > > > > > called > > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > start > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > work. > > > > > > > > > > > > > > > > > If we remove the module which will call > > > > > > > > > > > > > > > > > mtk_jpeg_remove > > > > > > > > > > > > > > > > > to make cleanup, there may be a > > > > > > > > > > > > > > > > > unfinished > > > > > > > > > > > > > > > > > work. The > > > > > > > > > > > > > > > > > possible sequence is as follows, which > > > > > > > > > > > > > > > > > will > > > > > > > > > > > > > > > > > cause a > > > > > > > > > > > > > > > > > typical UAF bug. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Fix it by canceling the work before > > > > > > > > > > > > > > > > > cleanup > > > > > > > > > > > > > > > > > in > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > mtk_jpeg_remove > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > CPU0 CPU1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > |mtk_jpeg_job_time > > > > > > > > > > > > > > > > > out_ > > > > > > > > > > > > > > > > > work > > > > > > > > > > > > > > > > > mtk_jpeg_remove | > > > > > > > > > > > > > > > > > v4l2_m2m_release | > > > > > > > > > > > > > > > > > kfree(m2m_dev); | > > > > > > > > > > > > > > > > > | > > > > > > > > > > > > > > > > > | > > > > > > > > > > > > > > > > > v4l2_m2m_get_curr_priv > > > > > > > > > > > > > > > > > | m2m_dev- > > > > > > > > > > > > > > > > > >curr_ctx > > > > > > > > > > > > > > > > > //use > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Signed-off-by: Zheng Wang < > > > > > > > > > > > > > > > > > zyytlz.wz@163.com> > > > > > > > > > > > > > > > > > --- > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > drivers/media/platform/mediatek/jpeg/mt > > > > > > > > > > > > > > > > > k_jp > > > > > > > > > > > > > > > > > eg_c > > > > > > > > > > > > > > > > > ore.c > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2 +- > > > > > > > > > > > > > > > > > 1 file changed, 1 insertion(+), 1 > > > > > > > > > > > > > > > > > deletion(-) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > diff --git > > > > > > > > > > > > > > > > > a/drivers/media/platform/mediatek/jpeg/ > > > > > > > > > > > > > > > > > mtk_ > > > > > > > > > > > > > > > > > jpeg > > > > > > > > > > > > > > > > > _core. > > > > > > > > > > > > > > > > > c > > > > > > > > > > > > > > > > > b/drivers/media/platform/mediatek/jpeg/ > > > > > > > > > > > > > > > > > mtk_ > > > > > > > > > > > > > > > > > jpeg > > > > > > > > > > > > > > > > > _core. > > > > > > > > > > > > > > > > > c > > > > > > > > > > > > > > > > > index 969516a940ba..364513e7897e 100644 > > > > > > > > > > > > > > > > > --- > > > > > > > > > > > > > > > > > a/drivers/media/platform/mediatek/jpeg/ > > > > > > > > > > > > > > > > > mtk_ > > > > > > > > > > > > > > > > > jpeg > > > > > > > > > > > > > > > > > _core. > > > > > > > > > > > > > > > > > c > > > > > > > > > > > > > > > > > +++ > > > > > > > > > > > > > > > > > b/drivers/media/platform/mediatek/jpeg/ > > > > > > > > > > > > > > > > > mtk_ > > > > > > > > > > > > > > > > > jpeg > > > > > > > > > > > > > > > > > _core. > > > > > > > > > > > > > > > > > c > > > > > > > > > > > > > > > > > @@ -1793,7 +1793,7 @@ static int > > > > > > > > > > > > > > > > > mtk_jpeg_probe(struct > > > > > > > > > > > > > > > > > platform_device *pdev) > > > > > > > > > > > > > > > > > static int mtk_jpeg_remove(struct > > > > > > > > > > > > > > > > > platform_device > > > > > > > > > > > > > > > > > *pdev) > > > > > > > > > > > > > > > > > { > > > > > > > > > > > > > > > > > struct mtk_jpeg_dev *jpeg = > > > > > > > > > > > > > > > > > platform_get_drvdata(pdev); > > > > > > > > > > > > > > > > > - > > > > > > > > > > > > > > > > > + cancel_delayed_work(&jpeg- > > > > > > > > > > > > > > > > > > job_timeout_work); > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > pm_runtime_disable(&pdev- > > > > > > > > > > > > > > > > > >dev); > > > > > > > > > > > > > > > > > video_unregister_device(jpeg- > > > > > > > > > > > > > > > > > > vdev); > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > v4l2_m2m_release(jpeg- > > > > > > > > > > > > > > > > > >m2m_dev); > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > > > > > 2.25.1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
Hello, I sent the patch-v2, but found some mail was rejected due to Network error. Here is the newest patch link: https://lore.kernel.org/all/20230624145050.2471885-1-zyytlz.wz@163.com/ Zheng Hacker <hackerzheng666@gmail.com> 于2023年6月15日周四 10:58写道: > > Hi Kyrie, > > Thanks for your kind reminder. I'll write the patch as soon as I can. > > Best regards, > Zheng Wang > > Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年6月15日周四 10:40写道: > > > > On Mon, 2023-05-15 at 10:57 +0800, Zheng Hacker wrote: > > > External email : Please do not click links or open attachments until > > > you have verified the sender or the content. > > > > > > > > > Hi Kyrie, > > > > > > I think your analysis is right, I'll make the change in the next > > > version. > > > > > > Best regards, > > > Zheng > > > > > Dear Zheng, > > > > Could you please update the next version at any convenient time? > > > > Thanks. > > > > Bset Regards, > > Kyrie. > > > > > Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年5月12日周五 09:24写道: > > > > > > > > On Tue, 2023-03-14 at 11:37 +0800, Zheng Hacker wrote: > > > > > Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年3月14日周二 10:49写道: > > > > > > > > > > > > On Thu, 2023-03-09 at 15:31 +0800, Zheng Hacker wrote: > > > > > > > Dear Kyrie, > > > > > > > > > > > > > > Guenter gave some advice about this issue in [1]. As we are > > > > > > > not > > > > > > > familiar with the code here, could you please see if the bug > > > > > > > really > > > > > > > exists? > > > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > https://lore.kernel.org/all/e03134f9-9433-ab6b-170a-8ce752fccdeb@roeck-us.net/ > > > > > > > > > > > > > > Best regards, > > > > > > > Zheng > > > > > > > > > > > > Dear zheng, > > > > > > > > > > > > I have no more questions for your patch, thank you for your > > > > > > contribution to mtk jpeg driver. > > > > > > > > > > > > > > > > Thanks for your detailed review and explanation about the code. I > > > > > have > > > > > learned a lot > > > > > from it. Have a nice day :) > > > > > > > > > > Best regards, > > > > > Zheng > > > > > > > > Dear Zheng, > > > > > > > > I think that cancel_delayed_work, added in remove function, should > > > > change to cancel_delayed_work_sync to solve UAF. The corresponding > > > > operation flow is shown as follow: > > > > CPU0 CPU1 > > > > |mtk_jpeg_job_timeout_work > > > > mtk_jpeg_remove | > > > > cancel_delayed_work | > > > > v4l2_m2m_release | > > > > kfree(m2m_dev); | > > > > | > > > > | v4l2_m2m_get_curr_priv > > > > | m2m_dev->curr_ctx //use > > > > > > > > If cancel_delayed_work is called in async mode like above, and the > > > > m2m_dev is freed. But it is used in mtk_jpeg_job_timeout_work, this > > > > calling may cause UAF. > > > > > > > > Thanks. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Zheng Hacker <hackerzheng666@gmail.com> 于2023年3月8日周三 17:11写道: > > > > > > > > > > > > > > > > Dear Kyrie, > > > > > > > > > > > > > > > > Sorry for my misunderstanding. I've seen the timeout worker > > > > > > > > is > > > > > > > > bound > > > > > > > > with mtk_jpegdec_timeout_work rather than > > > > > > > > mtk_jpeg_job_timeout_work.So > > > > > > > > the competition won't happen. > > > > > > > > > > > > > > > > Back to the beginning scene, I still don't know if it's a > > > > > > > > UAF > > > > > > > > issue > > > > > > > > or > > > > > > > > not. The normal schedule is very strong to me. If we can > > > > > > > > call > > > > > > > > mtk_jpeg_remove directly without calling mtk_jpeg_release, > > > > > > > > The UAF is still possible. Otherwis, I think it's safe > > > > > > > > here. > > > > > > > > > > > > > > > > Best regards, > > > > > > > > Zheng > > > > > > > > > > > > > > > > Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年3月8日周三 15:39写道: > > > > > > > > > > > > > > > > > > On Wed, 2023-03-08 at 14:10 +0800, Zheng Hacker wrote: > > > > > > > > > > Hi Kyrie, > > > > > > > > > > > > > > > > > > > > After reviewing the code, I found anothe possible code > > > > > > > > > > path. As > > > > > > > > > > I am > > > > > > > > > > not familiar with the module. It has high possibility > > > > > > > > > > it's > > > > > > > > > > wrong. > > > > > > > > > > Could please help me check this? Very much appreciated > > > > > > > > > > for > > > > > > > > > > your > > > > > > > > > > valuable time. > > > > > > > > > > > > > > > > > > > > In summary, mtk_jpegdec_worker was set in mtk_jpeg_open > > > > > > > > > > and > > > > > > > > > > started > > > > > > > > > > in > > > > > > > > > > mtk_jpeg_multicore_dec_device_run, which made it > > > > > > > > > > running on > > > > > > > > > > cpu1. > > > > > > > > > > Inside the mtk_jpeg_multicore_dec_device_run, it will > > > > > > > > > > call > > > > > > > > > > schedule_delayed_work to start the timeout_work, which > > > > > > > > > > will > > > > > > > > > > make it > > > > > > > > > > running on cpu2. Meanwhile, we can call > > > > > > > > > > mtk_jpeg_release to cancel the job. But there might be > > > > > > > > > > a > > > > > > > > > > race > > > > > > > > > > between > > > > > > > > > > mtk_jpegdec_worker and v4l2_m2m_cancel_job. It may call > > > > > > > > > > v4l2_m2m_job_finish too early to wake up the event. > > > > > > > > > > The remove will go on, the other race is as described > > > > > > > > > > earlier. > > > > > > > > > > > > > > > > > > > > cpu0 cpu1 cpu2 > > > > > > > > > > (1)->device_run > > > > > > > > > > mtk_jpeg_multicore > > > > > > > > > > _dec_device_run > > > > > > > > > > queue_work > > > > > > > > > > (jpeg->workqueue, > > > > > > > > > > &ctx->jpeg_work); > > > > > > > > > > (2)mtk_jpegdec_worker > > > > > > > > > > (3)mtk_jpeg_release > > > > > > > > > > v4l2_m2m_cancel_job > > > > > > > > > > wait event > > > > > > > > > > > > > > > > > > > > schedule_delayed_work > > > > > > > > > > (4)mtk_jpeg > > > > > > > > > > _job > > > > > > > > > > _tim > > > > > > > > > > eout_w > > > > > > > > > > ork > > > > > > > > > > (5)v4l2_m2m_job_finish > > > > > > > > > > wake up > > > > > > > > > > (6)mtk_jpeg_remove > > > > > > > > > > v4l2_m2m_release > > > > > > > > > > kfree(m2m_dev) > > > > > > > > > > (7)v4l2_m2m > > > > > > > > > > _get > > > > > > > > > > _cur > > > > > > > > > > r_priv > > > > > > > > > > > > > > > > > > > > > > > > > > > > Dear zheng, > > > > > > > > > > > > > > > > > > The mtk_jpeg_multicore_dec_device_run function is used > > > > > > > > > for > > > > > > > > > multi- > > > > > > > > > hw > > > > > > > > > jpeg decoding. Instead of scheduling > > > > > > > > > mtk_jpeg_job_timeout_work, > > > > > > > > > mtk_jpegdec_worker is scheduled in this function. > > > > > > > > > > > > > > > > > > The mtk_jpeg_dec_device_run function is used for single > > > > > > > > > hw > > > > > > > > > jpeg > > > > > > > > > decoding, which schedules mtk_jpeg_job_timeout_work. > > > > > > > > > > > > > > > > > > A driver is either a single hw driver or a multi-hw > > > > > > > > > driver > > > > > > > > > and > > > > > > > > > cannot > > > > > > > > > represent both at the same time. > > > > > > > > > mtk_jpeg_job_timeout_work and mtk_jpegdec_worker cannot > > > > > > > > > be > > > > > > > > > scheduled at > > > > > > > > > the same time. > > > > > > > > > So mtk_jpeg_job_timeout_work calls v4l2_m2m_job_finish > > > > > > > > > would > > > > > > > > > not > > > > > > > > > cause > > > > > > > > > competition between the mtk_jpegdec_worker and > > > > > > > > > v4l2_m2m_cancel_job. > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > Kyrie. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年3月8日周三 > > > > > > > > > > 11:32写道: > > > > > > > > > > > > > > > > > > > > > > On Wed, 2023-03-08 at 10:20 +0800, Zheng Hacker > > > > > > > > > > > wrote: > > > > > > > > > > > > Hi Kyrie, > > > > > > > > > > > > > > > > > > > > > > > > Thank you for your careful analysis and response. I > > > > > > > > > > > > still > > > > > > > > > > > > have > > > > > > > > > > > > some > > > > > > > > > > > > areas that I don't quite understand and would like > > > > > > > > > > > > to > > > > > > > > > > > > ask > > > > > > > > > > > > for > > > > > > > > > > > > clarification. That is, how do the function > > > > > > > > > > > > pointers > > > > > > > > > > > > for > > > > > > > > > > > > stop > > > > > > > > > > > > streaming, initialized as > > > > > > > > > > > > mtk_jpeg_enc_stop_streaming > > > > > > > > > > > > and > > > > > > > > > > > > mtk_jpeg_dec_stop_streaming, ensure that the worker > > > > > > > > > > > > is > > > > > > > > > > > > canceled? > > > > > > > > > > > > I > > > > > > > > > > > > would greatly appreciate your response. > > > > > > > > > > > > > > > > > > > > > > > > Best regards, > > > > > > > > > > > > Zheng > > > > > > > > > > > > > > > > > > > > > > Dear zheng, > > > > > > > > > > > > > > > > > > > > > > For stop streaming, what I mean is that stoppping > > > > > > > > > > > jpeg > > > > > > > > > > > decoding or > > > > > > > > > > > encoding. > > > > > > > > > > > Ok, let me introduce the sw flow of stop streaming: > > > > > > > > > > > Firstly, the app will call v4l2_m2m_ioctl_streamoff, > > > > > > > > > > > which > > > > > > > > > > > will > > > > > > > > > > > call > > > > > > > > > > > v4l2_m2m_cancel_job, if it finds a job running(as you > > > > > > > > > > > note, > > > > > > > > > > > cpu1 is > > > > > > > > > > > running), it will wait event, the event is wake up by > > > > > > > > > > > v4l2_m2m_job_finish function. And v4l2_m2m_job_finish > > > > > > > > > > > is > > > > > > > > > > > called by > > > > > > > > > > > jpeg > > > > > > > > > > > dec/enc irq handler, which means that the waitting > > > > > > > > > > > would > > > > > > > > > > > result mtk > > > > > > > > > > > hw > > > > > > > > > > > to finish dec/enc, irq will occur and irq handler > > > > > > > > > > > would > > > > > > > > > > > cancel > > > > > > > > > > > timeout > > > > > > > > > > > worker. The follow is shown as blow. > > > > > > > > > > > v4l2_m2m_ioctl_streamoff > > > > > > > > > > > v4l2_m2m_cancel_job mtk_jpeg_enc_ir > > > > > > > > > > > q/mt > > > > > > > > > > > k_jp > > > > > > > > > > > eg_dec > > > > > > > > > > > _irq > > > > > > > > > > > wait evnet <------ wake up ------ > > > > > > > > > > > v4l2_m2m_job_finish > > > > > > > > > > > cancel > > > > > > > > > > > timeout > > > > > > > > > > > work > > > > > > > > > > > > > > > > > > > > > > As mentioned above, if it is normal stop streaming > > > > > > > > > > > action, > > > > > > > > > > > there > > > > > > > > > > > will > > > > > > > > > > > be no happen that the timeout worker does not > > > > > > > > > > > canceled. > > > > > > > > > > > > > > > > > > > > > > But if mtk_jpeg_remove is called directly without > > > > > > > > > > > above > > > > > > > > > > > flow, > > > > > > > > > > > it > > > > > > > > > > > would > > > > > > > > > > > cause lots of issues. > > > > > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > > > Kyrie. > > > > > > > > > > > > > > > > > > > > > > > > Kyrie Wu (吴晗) <Kyrie.Wu@mediatek.com> 于2023年3月8日周三 > > > > > > > > > > > > 10:02写道: > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, 2023-03-07 at 23:03 +0800, Zheng Hacker > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > The timer function was set in mtk_jpeg_probe > > > > > > > > > > > > > > with > > > > > > > > > > > > > > mtk_jpeg_job_timeout_work function. > > > > > > > > > > > > > > And the worker is started in > > > > > > > > > > > > > > mtk_jpeg_dec_device_run. > > > > > > > > > > > > > > There are two functions (mtk_jpeg_enc_irq and > > > > > > > > > > > > > > mtk_jpeg_dec_irq) > > > > > > > > > > > > > > which > > > > > > > > > > > > > > may cancel the worker. > > > > > > > > > > > > > > They are used as IRQ handler function which is > > > > > > > > > > > > > > saved as > > > > > > > > > > > > > > function > > > > > > > > > > > > > > pointer in a variable. > > > > > > > > > > > > > > In mtk_jpeg_probe, they are registered by > > > > > > > > > > > > > > devm_request_irq: > > > > > > > > > > > > > > > > > > > > > > > > > > > > ret = devm_request_irq(&pdev->dev, > > > > > > > > > > > > > > jpeg_irq, > > > > > > > > > > > > > > jpeg->variant->irq_handler, > > > > > > > > > > > > > > 0, > > > > > > > > > > > > > > pdev->name, jpeg); > > > > > > > > > > > > > > if (ret) { > > > > > > > > > > > > > > dev_err(&pdev->dev, "Failed to request > > > > > > > > > > > > > > jpeg_irq > > > > > > > > > > > > > > %d > > > > > > > > > > > > > > (%d)\n", > > > > > > > > > > > > > > jpeg_irq, ret); > > > > > > > > > > > > > > return ret; > > > > > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > > > > > > > However, if we remove the module without > > > > > > > > > > > > > > triggering > > > > > > > > > > > > > > the > > > > > > > > > > > > > > irq, > > > > > > > > > > > > > > the > > > > > > > > > > > > > > worker will never be removed. > > > > > > > > > > > > > > > > > > > > > > > > > > > > As for the schedule, mtk_jpeg_dec_device_run > > > > > > > > > > > > > > and > > > > > > > > > > > > > > mtk_jpeg_enc_device_run will start the worker. > > > > > > > > > > > > > > The schedule invoking is quite complicated. As > > > > > > > > > > > > > > far > > > > > > > > > > > > > > as I > > > > > > > > > > > > > > know, > > > > > > > > > > > > > > the > > > > > > > > > > > > > > invoking chain is as follows: > > > > > > > > > > > > > > > > > > > > > > > > > > > > v4l2_m2m_init->v4l2_m2m_device_run_work- > > > > > > > > > > > > > > > v4l2_m2m_try_run > > > > > > > > > > > > > > > > > > > > > > > > > > > > the v4l2_m2m_device_run_work function is also a > > > > > > > > > > > > > > worker > > > > > > > > > > > > > > which > > > > > > > > > > > > > > is > > > > > > > > > > > > > > set > > > > > > > > > > > > > > in > > > > > > > > > > > > > > v4l2_m2m_init and started in > > > > > > > > > > > > > > v4l2_m2m_schedule_next_job. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Before calling remove function, > > > > > > > > > > > > > > the mtk_jpeg_release > > > > > > > > > > > > > > was > > > > > > > > > > > > > > invoked > > > > > > > > > > > > > > to > > > > > > > > > > > > > > release the related resource. > > > > > > > > > > > > > > > > > > > > > > > > > > > > v4l2_m2m_cancel_job will cancel the job by > > > > > > > > > > > > > > calling > > > > > > > > > > > > > > m2m_dev->m2m_ops->job_abort(m2m_ctx->priv). > > > > > > > > > > > > > > > > > > > > > > > > > > > > But this will only cancel the current queue by > > > > > > > > > > > > > > list_del(&m2m_dev->curr_ctx->queue); > > > > > > > > > > > > > > > > > > > > > > > > > > > > I think this can not cancel the posted task > > > > > > > > > > > > > > mentioned > > > > > > > > > > > > > > before. > > > > > > > > > > > > > > So > > > > > > > > > > > > > > I > > > > > > > > > > > > > > think if mtk_jpeg_job_timeout_work > > > > > > > > > > > > > > > > > > > > > > > > > > > > is working on, and using jpeg->m2m_dev after > > > > > > > > > > > > > > freeing it > > > > > > > > > > > > > > in > > > > > > > > > > > > > > mtk_jpeg_remove, it will cause a UAF bug. > > > > > > > > > > > > > > > > > > > > > > > > > > > > static int mtk_jpeg_release(struct file *file) > > > > > > > > > > > > > > { > > > > > > > > > > > > > > struct mtk_jpeg_dev *jpeg = > > > > > > > > > > > > > > video_drvdata(file); > > > > > > > > > > > > > > struct mtk_jpeg_ctx *ctx = > > > > > > > > > > > > > > mtk_jpeg_fh_to_ctx(file- > > > > > > > > > > > > > > > private_data); > > > > > > > > > > > > > > > > > > > > > > > > > > > > mutex_lock(&jpeg->lock); > > > > > > > > > > > > > > v4l2_ctrl_handler_free(&ctx->ctrl_hdl); > > > > > > > > > > > > > > [1] v4l2_m2m_ctx_release(ctx->fh.m2m_ctx); > > > > > > > > > > > > > > v4l2_fh_del(&ctx->fh); > > > > > > > > > > > > > > v4l2_fh_exit(&ctx->fh); > > > > > > > > > > > > > > kfree(ctx); > > > > > > > > > > > > > > mutex_unlock(&jpeg->lock); > > > > > > > > > > > > > > return 0; > > > > > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > > > > > > > void v4l2_m2m_ctx_release(struct v4l2_m2m_ctx > > > > > > > > > > > > > > *m2m_ctx) > > > > > > > > > > > > > > { > > > > > > > > > > > > > > /* wait until the current context is dequeued > > > > > > > > > > > > > > from > > > > > > > > > > > > > > job_queue */ > > > > > > > > > > > > > > [2] v4l2_m2m_cancel_job(m2m_ctx); > > > > > > > > > > > > > > > > > > > > > > > > > > > > vb2_queue_release(&m2m_ctx->cap_q_ctx.q); > > > > > > > > > > > > > > vb2_queue_release(&m2m_ctx->out_q_ctx.q); > > > > > > > > > > > > > > > > > > > > > > > > > > > > kfree(m2m_ctx); > > > > > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > > > > > > > Note that all of this is static analysis, which > > > > > > > > > > > > > > may > > > > > > > > > > > > > > be > > > > > > > > > > > > > > false > > > > > > > > > > > > > > positive. > > > > > > > > > > > > > > Feel free to tell me if there is something I've > > > > > > > > > > > > > > missed. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Regard, > > > > > > > > > > > > > > Zheng > > > > > > > > > > > > > > > > > > > > > > > > > > Dear Zheng, > > > > > > > > > > > > > > > > > > > > > > > > > > You set up an application scenario: > > > > > > > > > > > > > cpu1 is using the mtk-jpeg driver and timeout > > > > > > > > > > > > > work > > > > > > > > > > > > > has > > > > > > > > > > > > > been > > > > > > > > > > > > > scheduled. > > > > > > > > > > > > > At the same time cpu0 wanted to remove the mtk- > > > > > > > > > > > > > jpeg > > > > > > > > > > > > > driver, > > > > > > > > > > > > > which > > > > > > > > > > > > > caused the UAF bug. > > > > > > > > > > > > > I wonder if such an irrational application > > > > > > > > > > > > > scenario > > > > > > > > > > > > > could > > > > > > > > > > > > > exist. > > > > > > > > > > > > > This > > > > > > > > > > > > > scenario, as you described, not only leads to the > > > > > > > > > > > > > problems you > > > > > > > > > > > > > mentioned, but also to output&capture memory > > > > > > > > > > > > > leaks > > > > > > > > > > > > > and > > > > > > > > > > > > > unreleased > > > > > > > > > > > > > resources, such as spinlock. > > > > > > > > > > > > > Typically, if we want to remove the driver, we > > > > > > > > > > > > > firstly do > > > > > > > > > > > > > stop > > > > > > > > > > > > > streaming, which ensures that the worker has been > > > > > > > > > > > > > canceled. > > > > > > > > > > > > > I agree with your changes from the perspective of > > > > > > > > > > > > > strengthening > > > > > > > > > > > > > the > > > > > > > > > > > > > robustness of the driver code. > > > > > > > > > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > > > > > Kyrie. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Irui Wang (王瑞) <Irui.Wang@mediatek.com> > > > > > > > > > > > > > > 于2023年3月7日周二 > > > > > > > > > > > > > > 18:23写道: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Dear Angelo and Zheng, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for your patch and comments. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Dear Kyrie, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Please help to check this, thanks. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Best Regards > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, 2023-03-07 at 10:49 +0100, > > > > > > > > > > > > > > > AngeloGioacchino > > > > > > > > > > > > > > > Del > > > > > > > > > > > > > > > Regno > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > Il 07/03/23 10:27, Zheng Hacker ha scritto: > > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Is there anyone who can help with this? I > > > > > > > > > > > > > > > > > can > > > > > > > > > > > > > > > > > provide > > > > > > > > > > > > > > > > > more > > > > > > > > > > > > > > > > > details > > > > > > > > > > > > > > > > > like invoking chain if needed. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Providing more details is always good. > > > > > > > > > > > > > > > > Please > > > > > > > > > > > > > > > > do. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Meanwhile, adding Irui Wang to the loop: > > > > > > > > > > > > > > > > he's > > > > > > > > > > > > > > > > doing > > > > > > > > > > > > > > > > mtk- > > > > > > > > > > > > > > > > jpeg. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > > > > > > > > Angelo > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > Zheng > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Zheng Wang <zyytlz.wz@163.com> > > > > > > > > > > > > > > > > > 于2023年3月6日周一 > > > > > > > > > > > > > > > > > 14:28写道: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > In mtk_jpeg_probe, &jpeg- > > > > > > > > > > > > > > > > > > >job_timeout_work > > > > > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > > > > bound > > > > > > > > > > > > > > > > > > with > > > > > > > > > > > > > > > > > > mtk_jpeg_job_timeout_work. Then > > > > > > > > > > > > > > > > > > mtk_jpeg_dec_device_run > > > > > > > > > > > > > > > > > > and mtk_jpeg_enc_device_run may be > > > > > > > > > > > > > > > > > > called > > > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > start > > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > work. > > > > > > > > > > > > > > > > > > If we remove the module which will call > > > > > > > > > > > > > > > > > > mtk_jpeg_remove > > > > > > > > > > > > > > > > > > to make cleanup, there may be a > > > > > > > > > > > > > > > > > > unfinished > > > > > > > > > > > > > > > > > > work. The > > > > > > > > > > > > > > > > > > possible sequence is as follows, which > > > > > > > > > > > > > > > > > > will > > > > > > > > > > > > > > > > > > cause a > > > > > > > > > > > > > > > > > > typical UAF bug. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Fix it by canceling the work before > > > > > > > > > > > > > > > > > > cleanup > > > > > > > > > > > > > > > > > > in > > > > > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > > mtk_jpeg_remove > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > CPU0 CPU1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > |mtk_jpeg_job_time > > > > > > > > > > > > > > > > > > out_ > > > > > > > > > > > > > > > > > > work > > > > > > > > > > > > > > > > > > mtk_jpeg_remove | > > > > > > > > > > > > > > > > > > v4l2_m2m_release | > > > > > > > > > > > > > > > > > > kfree(m2m_dev); | > > > > > > > > > > > > > > > > > > | > > > > > > > > > > > > > > > > > > | > > > > > > > > > > > > > > > > > > v4l2_m2m_get_curr_priv > > > > > > > > > > > > > > > > > > | m2m_dev- > > > > > > > > > > > > > > > > > > >curr_ctx > > > > > > > > > > > > > > > > > > //use > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Signed-off-by: Zheng Wang < > > > > > > > > > > > > > > > > > > zyytlz.wz@163.com> > > > > > > > > > > > > > > > > > > --- > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > drivers/media/platform/mediatek/jpeg/mt > > > > > > > > > > > > > > > > > > k_jp > > > > > > > > > > > > > > > > > > eg_c > > > > > > > > > > > > > > > > > > ore.c > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2 +- > > > > > > > > > > > > > > > > > > 1 file changed, 1 insertion(+), 1 > > > > > > > > > > > > > > > > > > deletion(-) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > diff --git > > > > > > > > > > > > > > > > > > a/drivers/media/platform/mediatek/jpeg/ > > > > > > > > > > > > > > > > > > mtk_ > > > > > > > > > > > > > > > > > > jpeg > > > > > > > > > > > > > > > > > > _core. > > > > > > > > > > > > > > > > > > c > > > > > > > > > > > > > > > > > > b/drivers/media/platform/mediatek/jpeg/ > > > > > > > > > > > > > > > > > > mtk_ > > > > > > > > > > > > > > > > > > jpeg > > > > > > > > > > > > > > > > > > _core. > > > > > > > > > > > > > > > > > > c > > > > > > > > > > > > > > > > > > index 969516a940ba..364513e7897e 100644 > > > > > > > > > > > > > > > > > > --- > > > > > > > > > > > > > > > > > > a/drivers/media/platform/mediatek/jpeg/ > > > > > > > > > > > > > > > > > > mtk_ > > > > > > > > > > > > > > > > > > jpeg > > > > > > > > > > > > > > > > > > _core. > > > > > > > > > > > > > > > > > > c > > > > > > > > > > > > > > > > > > +++ > > > > > > > > > > > > > > > > > > b/drivers/media/platform/mediatek/jpeg/ > > > > > > > > > > > > > > > > > > mtk_ > > > > > > > > > > > > > > > > > > jpeg > > > > > > > > > > > > > > > > > > _core. > > > > > > > > > > > > > > > > > > c > > > > > > > > > > > > > > > > > > @@ -1793,7 +1793,7 @@ static int > > > > > > > > > > > > > > > > > > mtk_jpeg_probe(struct > > > > > > > > > > > > > > > > > > platform_device *pdev) > > > > > > > > > > > > > > > > > > static int mtk_jpeg_remove(struct > > > > > > > > > > > > > > > > > > platform_device > > > > > > > > > > > > > > > > > > *pdev) > > > > > > > > > > > > > > > > > > { > > > > > > > > > > > > > > > > > > struct mtk_jpeg_dev *jpeg = > > > > > > > > > > > > > > > > > > platform_get_drvdata(pdev); > > > > > > > > > > > > > > > > > > - > > > > > > > > > > > > > > > > > > + cancel_delayed_work(&jpeg- > > > > > > > > > > > > > > > > > > > job_timeout_work); > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > pm_runtime_disable(&pdev- > > > > > > > > > > > > > > > > > > >dev); > > > > > > > > > > > > > > > > > > video_unregister_device(jpeg- > > > > > > > > > > > > > > > > > > > vdev); > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > v4l2_m2m_release(jpeg- > > > > > > > > > > > > > > > > > > >m2m_dev); > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > > > > > > 2.25.1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
diff --git a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c index 969516a940ba..364513e7897e 100644 --- a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c +++ b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c @@ -1793,7 +1793,7 @@ static int mtk_jpeg_probe(struct platform_device *pdev) static int mtk_jpeg_remove(struct platform_device *pdev) { struct mtk_jpeg_dev *jpeg = platform_get_drvdata(pdev); - + cancel_delayed_work(&jpeg->job_timeout_work); pm_runtime_disable(&pdev->dev); video_unregister_device(jpeg->vdev); v4l2_m2m_release(jpeg->m2m_dev);