media: mtk-jpeg: Fix use after free bug due to uncanceled work

Message ID 20230302093715.811758-1-zyytlz.wz@163.com (mailing list archive)
State Superseded
Delegated to: Hans Verkuil

Commit Message

Zheng Wang March 2, 2023, 9:37 a.m. UTC
In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with
mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run
and mtk_jpeg_enc_device_run may be called to start the
work.
If we remove the module, which calls mtk_jpeg_remove to
clean up, there may be unfinished work. The possible
sequence is as follows, which causes a typical
use-after-free (UAF) bug.

Fix it by canceling the work before cleanup in mtk_jpeg_remove.

CPU0                  CPU1

                    |mtk_jpeg_job_timeout_work
mtk_jpeg_remove     |
  v4l2_m2m_release  |
    kfree(m2m_dev); |
                    |
                    | v4l2_m2m_get_curr_priv
                    |   m2m_dev->curr_ctx //use

Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
---
 drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

Guenter Roeck March 9, 2023, 12:27 a.m. UTC | #1
On Thu, Mar 02, 2023 at 05:37:15PM +0800, Zheng Wang wrote:
> In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with
> mtk_jpeg_job_timeout_work. Then mtk_jpeg_dec_device_run
> and mtk_jpeg_enc_device_run may be called to start the
> work.
> If we remove the module, which calls mtk_jpeg_remove to
> clean up, there may be unfinished work. The possible
> sequence is as follows, which causes a typical
> use-after-free (UAF) bug.
> 
> Fix it by canceling the work before cleanup in mtk_jpeg_remove.
> 
> CPU0                  CPU1
> 
>                     |mtk_jpeg_job_timeout_work
> mtk_jpeg_remove     |
>   v4l2_m2m_release  |
>     kfree(m2m_dev); |
>                     |
>                     | v4l2_m2m_get_curr_priv
>                     |   m2m_dev->curr_ctx //use
> 
> Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
> ---
>  drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> index 969516a940ba..364513e7897e 100644
> --- a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> +++ b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
> @@ -1793,7 +1793,7 @@ static int mtk_jpeg_probe(struct platform_device *pdev)
>  static int mtk_jpeg_remove(struct platform_device *pdev)
>  {
>  	struct mtk_jpeg_dev *jpeg = platform_get_drvdata(pdev);
> -
> +	cancel_delayed_work(&jpeg->job_timeout_work);

The empty line is needed (coding style). Also, this doesn't cancel
the worker if it is already running. This should probably be
cancel_delayed_work_sync(). Even then the question is if it is
possible that new work is queued before the device is unregistered.

Guenter

>  	pm_runtime_disable(&pdev->dev);
>  	video_unregister_device(jpeg->vdev);
>  	v4l2_m2m_release(jpeg->m2m_dev);
> -- 
> 2.25.1
>
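
The difference raised in the review comment above can be modeled in a few lines of plain C. This is a userspace sketch, not kernel code: `struct model_work`, `model_cancel()`, `model_cancel_sync()` and `safe_to_free()` are hypothetical names invented for illustration, and waiting for the handler is reduced to a state change. The only behaviour it assumes is that cancel_delayed_work() may return while the callback is still running, whereas cancel_delayed_work_sync() returns only after the callback has finished.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; not the kernel's struct delayed_work. */
enum work_state { WORK_IDLE, WORK_PENDING, WORK_RUNNING };

struct model_work {
	enum work_state state;
};

/*
 * Models cancel_delayed_work(): dequeues a pending item and returns true,
 * but does nothing about a handler that has already started.
 */
static bool model_cancel(struct model_work *w)
{
	if (w->state == WORK_PENDING) {
		w->state = WORK_IDLE;
		return true;
	}
	return false;
}

/*
 * Models cancel_delayed_work_sync(): same as above, and in addition it
 * does not return until a running handler has finished. The wait is
 * modeled by letting the handler complete here.
 */
static bool model_cancel_sync(struct model_work *w)
{
	bool was_pending = model_cancel(w);

	if (w->state == WORK_RUNNING)
		w->state = WORK_IDLE;	/* handler ran to completion */
	return was_pending;
}

/* Data used by the handler may only be freed once the work is idle. */
static bool safe_to_free(const struct model_work *w)
{
	return w->state == WORK_IDLE;
}
```

With the work in WORK_RUNNING, model_cancel() returns false and safe_to_free() stays false, which corresponds to kfree(m2m_dev) racing with the handler in the diagram. After model_cancel_sync() the work is idle. The model does not cover the second concern, that new work might be queued again before the device is unregistered.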
  
Zheng Hacker March 9, 2023, 3:58 a.m. UTC | #2
Hi,

Thanks for your reply. I think you're right. I don't know whether there is
another way to stop new work from being enqueued. Could you please give me
some advice on the fix?

Regards,
Zheng

  
Guenter Roeck March 9, 2023, 5:30 a.m. UTC | #3
On 3/8/23 19:58, Zheng Hacker wrote:
> Hi,
> 
> Thanks for your reply. I think you're right. I don't know whether there is
> another way to stop new work from being enqueued. Could you please give me
> some advice on the fix?
> 

Top-posting is discouraged.

Anyway -
I don't know the code well enough to suggest a solution.
It all depends on the driver architecture. The maintainers might
have a better idea.

A worse problem appears to be that the worker is also canceled
from mtk_jpeg_enc_irq() and mtk_jpeg_dec_irq(). Those are non-threaded
interrupt handlers which, as far as I know, must not sleep and thus
cannot call cancel_delayed_work_sync(). I have no idea how to solve
that problem either.

Guenter
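
The constraint described in this message can be stated as a small plain C model. It is a sketch only: `enum exec_ctx`, `ctx_may_sleep()` and `model_try_cancel()` are hypothetical names, not kernel APIs. The only assumption encoded is the one stated above, that a non-threaded interrupt handler must not sleep and so cannot use a cancel variant that waits for the handler.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical execution contexts for the model. */
enum exec_ctx { CTX_PROCESS, CTX_HARDIRQ };

/* A non-threaded interrupt handler must not sleep. */
static bool ctx_may_sleep(enum exec_ctx ctx)
{
	return ctx == CTX_PROCESS;
}

/*
 * Returns 0 if the requested cancel variant is allowed in this context,
 * -1 if it would have to sleep where sleeping is forbidden.
 * wait_for_handler == true stands for the _sync variant.
 */
static int model_try_cancel(enum exec_ctx ctx, bool wait_for_handler)
{
	if (wait_for_handler && !ctx_may_sleep(ctx))
		return -1;
	return 0;
}
```

In this model only the non-waiting cancel is permitted from CTX_HARDIRQ, so the interrupt paths cannot simply be switched to the waiting variant. A path that may sleep, such as the remove callback, can use either.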

  
Zheng Hacker March 9, 2023, 7:18 a.m. UTC | #4
Guenter Roeck <linux@roeck-us.net> wrote on Thu, Mar 9, 2023 at 13:30:
>
> On 3/8/23 19:58, Zheng Hacker wrote:
> > Hi,
> >
> > Thanks for your reply. I think you're right. I don't know whether there is
> > another way to stop new work from being enqueued. Could you please give me
> > some advice on the fix?
> >
>
> Top-posting is discouraged.
>

Sorry, I forgot that. Thanks for the kind reminder.

> Anyway -
> I don't know the code well enough to suggest a solution.
> It all depends on the driver architecture. The maintainers might
> have a better idea.
>

Yes, some of the developers involved have reached out and discussed fixes with me.

> A worse problem appears to be that the worker is also canceled
> from mtk_jpeg_enc_irq() and mtk_jpeg_dec_irq(). Those are non-threaded
> interrupt handlers which, as far as I know, must not sleep and thus
> cannot call cancel_delayed_work_sync(). I have no idea how to solve
> that problem either.
>

I'd be glad to pass along your thoughts and recommendations to the
relevant parties.

Best regards,
Zheng
  

Patch

diff --git a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
index 969516a940ba..364513e7897e 100644
--- a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
+++ b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
@@ -1793,7 +1793,7 @@  static int mtk_jpeg_probe(struct platform_device *pdev)
 static int mtk_jpeg_remove(struct platform_device *pdev)
 {
 	struct mtk_jpeg_dev *jpeg = platform_get_drvdata(pdev);
-
+	cancel_delayed_work(&jpeg->job_timeout_work);
 	pm_runtime_disable(&pdev->dev);
 	video_unregister_device(jpeg->vdev);
 	v4l2_m2m_release(jpeg->m2m_dev);