[RFC] drm/radeon: rework to new fence interface

Message ID	5211F0C5.2040705@canonical.com (mailing list archive)
State	Not Applicable, archived
Headers	Message-ID: <5211F0C5.2040705@canonical.com> Date: Mon, 19 Aug 2013 12:17:41 +0200 From: Maarten Lankhorst <maarten.lankhorst@canonical.com> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130623 Thunderbird/17.0.7 MIME-Version: 1.0 To: dri-devel@lists.freedesktop.org CC: linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org, linaro-mm-sig@lists.linaro.org, linux-media@vger.kernel.org, Alex Deucher <alexander.deucher@amd.com> Subject: [RFC PATCH] drm/radeon: rework to new fence interface References: <20130815124308.14812.58197.stgit@patser> In-Reply-To: <20130815124308.14812.58197.stgit@patser> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: linux-media-owner@vger.kernel.org Precedence: bulk Antispam-Engine: 2.7.2.2107409, Antispam-Data: 2013.8.19.100625

Message ID

5211F0C5.2040705@canonical.com (mailing list archive)

State

Not Applicable, archived

Headers

Message-ID: <5211F0C5.2040705@canonical.com>
Date: Mon, 19 Aug 2013 12:17:41 +0200
From: Maarten Lankhorst <maarten.lankhorst@canonical.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
	rv:17.0) Gecko/20130623 Thunderbird/17.0.7
MIME-Version: 1.0
To: dri-devel@lists.freedesktop.org
CC: linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
	linaro-mm-sig@lists.linaro.org, linux-media@vger.kernel.org,
	Alex Deucher <alexander.deucher@amd.com>
Subject: [RFC PATCH] drm/radeon: rework to new fence interface
References: <20130815124308.14812.58197.stgit@patser>
In-Reply-To: <20130815124308.14812.58197.stgit@patser>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Sender: linux-media-owner@vger.kernel.org
Precedence: bulk

Commit Message

Maarten Lankhorst Aug. 19, 2013, 10:17 a.m. UTC

  Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
---

--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Christian König Aug. 19, 2013, 12:35 p.m. UTC | #1

Am 19.08.2013 12:17, schrieb Maarten Lankhorst:
> [SNIP]
> @@ -190,25 +225,24 @@ void radeon_fence_process(struct radeon_device *rdev, int ring)
>   		}
>   	} while (atomic64_xchg(&rdev->fence_drv[ring].last_seq, seq) > seq);
>   
> -	if (wake) {
> +	if (wake)
>   		rdev->fence_drv[ring].last_activity = jiffies;
> -		wake_up_all(&rdev->fence_queue);
> -	}
> +	return wake;
>   }

Very bad idea, when sequence numbers change, you always want to wake up 
the whole fence queue here.

> [SNIP]
> +/**
> + * radeon_fence_enable_signaling - enable signalling on fence
> + * @fence: fence
> + *
> + * This function is called with fence_queue lock held, and adds a callback
> + * to fence_queue that checks if this fence is signaled, and if so it
> + * signals the fence and removes itself.
> + */
> +static bool radeon_fence_enable_signaling(struct fence *f)
> +{
> +	struct radeon_fence *fence = to_radeon_fence(f);
> +
> +	if (atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq) >= fence->seq ||
> +	    !fence->rdev->ddev->irq_enabled)
> +		return false;
> +

Do I get that right that you rely on IRQs to be enabled and working 
here? Cause that would be a quite bad idea from the conceptual side.

> +	radeon_irq_kms_sw_irq_get(fence->rdev, fence->ring);
> +
> +	if (__radeon_fence_process(fence->rdev, fence->ring))
> +		wake_up_all_locked(&fence->rdev->fence_queue);
> +
> +	/* did fence get signaled after we enabled the sw irq? */
> +	if (atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq) >= fence->seq) {
> +		radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
> +		return false;
> +	}
> +
> +	fence->fence_wake.flags = 0;
> +	fence->fence_wake.private = NULL;
> +	fence->fence_wake.func = radeon_fence_check_signaled;
> +	__add_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
> +	fence_get(f);
> +
> +	return true;
> +}
> +
>   /**
>    * radeon_fence_signaled - check if a fence has signaled
>    *
>

Christian.
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Maarten Lankhorst Aug. 19, 2013, 7:37 p.m. UTC | #2

Op 19-08-13 14:35, Christian König schreef:
> Am 19.08.2013 12:17, schrieb Maarten Lankhorst:
>> [SNIP]
>> @@ -190,25 +225,24 @@ void radeon_fence_process(struct radeon_device *rdev, int ring)
>>           }
>>       } while (atomic64_xchg(&rdev->fence_drv[ring].last_seq, seq) > seq);
>>   -    if (wake) {
>> +    if (wake)
>>           rdev->fence_drv[ring].last_activity = jiffies;
>> -        wake_up_all(&rdev->fence_queue);
>> -    }
>> +    return wake;
>>   }
>
> Very bad idea, when sequence numbers change, you always want to wake up the whole fence queue here.
Yes, and the callers of this function call wake_up_all or wake_up_all_locked themselves, based on the return value..

>> [SNIP]
>> +/**
>> + * radeon_fence_enable_signaling - enable signalling on fence
>> + * @fence: fence
>> + *
>> + * This function is called with fence_queue lock held, and adds a callback
>> + * to fence_queue that checks if this fence is signaled, and if so it
>> + * signals the fence and removes itself.
>> + */
>> +static bool radeon_fence_enable_signaling(struct fence *f)
>> +{
>> +    struct radeon_fence *fence = to_radeon_fence(f);
>> +
>> +    if (atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq) >= fence->seq ||
>> +        !fence->rdev->ddev->irq_enabled)
>> +        return false;
>> +
>
> Do I get that right that you rely on IRQs to be enabled and working here? Cause that would be a quite bad idea from the conceptual side.
For cross-device synchronization it would be nice to have working irqs, it allows signalling fences faster,
and it allows for callbacks on completion to be called. For internal usage it's no more required than it was before.
>> +    radeon_irq_kms_sw_irq_get(fence->rdev, fence->ring);
>> +
>> +    if (__radeon_fence_process(fence->rdev, fence->ring))
>> +        wake_up_all_locked(&fence->rdev->fence_queue);
>> +
>> +    /* did fence get signaled after we enabled the sw irq? */
>> +    if (atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq) >= fence->seq) {
>> +        radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
>> +        return false;
>> +    }
>> +
>> +    fence->fence_wake.flags = 0;
>> +    fence->fence_wake.private = NULL;
>> +    fence->fence_wake.func = radeon_fence_check_signaled;
>> +    __add_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
>> +    fence_get(f);
>> +
>> +    return true;
>> +}
>> +
>>   /**
>>    * radeon_fence_signaled - check if a fence has signaled
>>    *
>>
>
> Christian.
>
~Maarten

--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Christian König Aug. 20, 2013, 8:37 a.m. UTC | #3

Am 19.08.2013 21:37, schrieb Maarten Lankhorst:
> Op 19-08-13 14:35, Christian König schreef:
>> Am 19.08.2013 12:17, schrieb Maarten Lankhorst:
>>> [SNIP]
>>> @@ -190,25 +225,24 @@ void radeon_fence_process(struct radeon_device *rdev, int ring)
>>>            }
>>>        } while (atomic64_xchg(&rdev->fence_drv[ring].last_seq, seq) > seq);
>>>    -    if (wake) {
>>> +    if (wake)
>>>            rdev->fence_drv[ring].last_activity = jiffies;
>>> -        wake_up_all(&rdev->fence_queue);
>>> -    }
>>> +    return wake;
>>>    }
>> Very bad idea, when sequence numbers change, you always want to wake up the whole fence queue here.
> Yes, and the callers of this function call wake_up_all or wake_up_all_locked themselves, based on the return value..

And as I said that's a very bad idea. The fence processing shouldn't be 
called with any locks held and should be self responsible for activating 
any waiters.

>
>>> [SNIP]
>>> +/**
>>> + * radeon_fence_enable_signaling - enable signalling on fence
>>> + * @fence: fence
>>> + *
>>> + * This function is called with fence_queue lock held, and adds a callback
>>> + * to fence_queue that checks if this fence is signaled, and if so it
>>> + * signals the fence and removes itself.
>>> + */
>>> +static bool radeon_fence_enable_signaling(struct fence *f)
>>> +{
>>> +    struct radeon_fence *fence = to_radeon_fence(f);
>>> +
>>> +    if (atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq) >= fence->seq ||
>>> +        !fence->rdev->ddev->irq_enabled)
>>> +        return false;
>>> +
>> Do I get that right that you rely on IRQs to be enabled and working here? Cause that would be a quite bad idea from the conceptual side.
> For cross-device synchronization it would be nice to have working irqs, it allows signalling fences faster,
> and it allows for callbacks on completion to be called. For internal usage it's no more required than it was before.

That's a big NAK.

The fence processing is actually very fine tuned to avoid IRQs and as 
far as I can see you just leave them enabled by decrementing the atomic 
from IRQ context. Additional to that we need allot of special handling 
in case of a hardware lockup here, which isn't done if you abuse the 
fence interface like this.

Also your approach of leaking the IRQ context outside of the driver is a 
very bad idea from the conceptual side. Please don't modify the fence 
interface at all and instead use the wait functions already exposed by 
radeon_fence.c. If you need some kind of signaling mechanism then wait 
inside a workqueue instead.

Christian.
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Maarten Lankhorst Aug. 20, 2013, 9:36 a.m. UTC | #4

Op 20-08-13 10:37, Christian König schreef:
> Am 19.08.2013 21:37, schrieb Maarten Lankhorst:
>> Op 19-08-13 14:35, Christian König schreef:
>>> Am 19.08.2013 12:17, schrieb Maarten Lankhorst:
>>>> [SNIP]
>>>> @@ -190,25 +225,24 @@ void radeon_fence_process(struct radeon_device *rdev, int ring)
>>>>            }
>>>>        } while (atomic64_xchg(&rdev->fence_drv[ring].last_seq, seq) > seq);
>>>>    -    if (wake) {
>>>> +    if (wake)
>>>>            rdev->fence_drv[ring].last_activity = jiffies;
>>>> -        wake_up_all(&rdev->fence_queue);
>>>> -    }
>>>> +    return wake;
>>>>    }
>>> Very bad idea, when sequence numbers change, you always want to wake up the whole fence queue here.
>> Yes, and the callers of this function call wake_up_all or wake_up_all_locked themselves, based on the return value..
>
> And as I said that's a very bad idea. The fence processing shouldn't be called with any locks held and should be self responsible for activating any waiters.
The call point (enable_signaling) only needs to know whether its own counter has passed or not. This prevents the race where the counter
has elapsed, but the irq was not yet enabled.

I don't really care if enable_signaling updates last_seq or not, it only needs to check if it's own fence has been signaled after enabling sw_irqs.

>>
>>>> [SNIP]
>>>> +/**
>>>> + * radeon_fence_enable_signaling - enable signalling on fence
>>>> + * @fence: fence
>>>> + *
>>>> + * This function is called with fence_queue lock held, and adds a callback
>>>> + * to fence_queue that checks if this fence is signaled, and if so it
>>>> + * signals the fence and removes itself.
>>>> + */
>>>> +static bool radeon_fence_enable_signaling(struct fence *f)
>>>> +{
>>>> +    struct radeon_fence *fence = to_radeon_fence(f);
>>>> +
>>>> +    if (atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq) >= fence->seq ||
>>>> +        !fence->rdev->ddev->irq_enabled)
>>>> +        return false;
>>>> +
>>> Do I get that right that you rely on IRQs to be enabled and working here? Cause that would be a quite bad idea from the conceptual side.
>> For cross-device synchronization it would be nice to have working irqs, it allows signalling fences faster,
>> and it allows for callbacks on completion to be called. For internal usage it's no more required than it was before.
>
> That's a big NAK.
>
> The fence processing is actually very fine tuned to avoid IRQs and as far as I can see you just leave them enabled by decrementing the atomic from IRQ context. Additional to that we need allot of special handling in case of a hardware lockup here, which isn't done if you abuse the fence interface like this.
I think it's not needed to leave the irq enabled, it's a leftover from when I was debugging the mac and no interrupt occurred at all.

> Also your approach of leaking the IRQ context outside of the driver is a very bad idea from the conceptual side. Please don't modify the fence interface at all and instead use the wait functions already exposed by radeon_fence.c. If you need some kind of signaling mechanism then wait inside a workqueue instead.
The fence takes up the role of a single shot workqueue here. Manually resetting the counter and calling wake_up_all would end up waking all active fences, there's no special handling needed inside radeon for this.
The fence api does provide a synchronous wait function, but this causes a stall of whomever waits on it. When I was testing this with intel I used the fence callback to poke a register in i915, this allowed it to not block until it hits the wait op in the command stream, and even then only if the callback was not called first.

It's documented that the callbacks can be called from any context and will be called with irqs disabled, so nothing scary should be done. The kernel provides enough debug mechanisms to find any violators.
PROVE_LOCKING and DEBUG_ATOMIC_SLEEP for example.

~Maarten

--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Christian König Aug. 20, 2013, 9:51 a.m. UTC | #5

Am 20.08.2013 11:36, schrieb Maarten Lankhorst:
[SNIP]

>>>>> [SNIP]
>>>>> +/**
>>>>> + * radeon_fence_enable_signaling - enable signalling on fence
>>>>> + * @fence: fence
>>>>> + *
>>>>> + * This function is called with fence_queue lock held, and adds a callback
>>>>> + * to fence_queue that checks if this fence is signaled, and if so it
>>>>> + * signals the fence and removes itself.
>>>>> + */
>>>>> +static bool radeon_fence_enable_signaling(struct fence *f)
>>>>> +{
>>>>> +    struct radeon_fence *fence = to_radeon_fence(f);
>>>>> +
>>>>> +    if (atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq) >= fence->seq ||
>>>>> +        !fence->rdev->ddev->irq_enabled)
>>>>> +        return false;
>>>>> +
>>>> Do I get that right that you rely on IRQs to be enabled and working here? Cause that would be a quite bad idea from the conceptual side.
>>> For cross-device synchronization it would be nice to have working irqs, it allows signalling fences faster,
>>> and it allows for callbacks on completion to be called. For internal usage it's no more required than it was before.
>> That's a big NAK.
>>
>> The fence processing is actually very fine tuned to avoid IRQs and as far as I can see you just leave them enabled by decrementing the atomic from IRQ context. Additional to that we need allot of special handling in case of a hardware lockup here, which isn't done if you abuse the fence interface like this.
> I think it's not needed to leave the irq enabled, it's a leftover from when I was debugging the mac and no interrupt occurred at all.
>
>> Also your approach of leaking the IRQ context outside of the driver is a very bad idea from the conceptual side. Please don't modify the fence interface at all and instead use the wait functions already exposed by radeon_fence.c. If you need some kind of signaling mechanism then wait inside a workqueue instead.
> The fence takes up the role of a single shot workqueue here. Manually resetting the counter and calling wake_up_all would end up waking all active fences, there's no special handling needed inside radeon for this.

Yeah that's actually the point here, you NEED to activate ALL fences, 
otherwise the fence handling inside the driver won't work.

> The fence api does provide a synchronous wait function, but this causes a stall of whomever waits on it.

Which is perfectly fine. What actually is the use case of not stalling a 
process who wants to wait for something?

> When I was testing this with intel I used the fence callback to poke a register in i915, this allowed it to not block until it hits the wait op in the command stream, and even then only if the callback was not called first.
>
> It's documented that the callbacks can be called from any context and will be called with irqs disabled, so nothing scary should be done. The kernel provides enough debug mechanisms to find any violators.
> PROVE_LOCKING and DEBUG_ATOMIC_SLEEP for example.

No thanks, we even abandoned that concept internal in the driver. Please 
use the blocking wait functions instead.

Christian.
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Maarten Lankhorst Aug. 20, 2013, 1:21 p.m. UTC | #6

Op 20-08-13 11:51, Christian König schreef:
> Am 20.08.2013 11:36, schrieb Maarten Lankhorst:
> [SNIP]
>
>>>>>> [SNIP]
>>>>>> +/**
>>>>>> + * radeon_fence_enable_signaling - enable signalling on fence
>>>>>> + * @fence: fence
>>>>>> + *
>>>>>> + * This function is called with fence_queue lock held, and adds a callback
>>>>>> + * to fence_queue that checks if this fence is signaled, and if so it
>>>>>> + * signals the fence and removes itself.
>>>>>> + */
>>>>>> +static bool radeon_fence_enable_signaling(struct fence *f)
>>>>>> +{
>>>>>> +    struct radeon_fence *fence = to_radeon_fence(f);
>>>>>> +
>>>>>> +    if (atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq) >= fence->seq ||
>>>>>> +        !fence->rdev->ddev->irq_enabled)
>>>>>> +        return false;
>>>>>> +
>>>>> Do I get that right that you rely on IRQs to be enabled and working here? Cause that would be a quite bad idea from the conceptual side.
>>>> For cross-device synchronization it would be nice to have working irqs, it allows signalling fences faster,
>>>> and it allows for callbacks on completion to be called. For internal usage it's no more required than it was before.
>>> That's a big NAK.
>>>
>>> The fence processing is actually very fine tuned to avoid IRQs and as far as I can see you just leave them enabled by decrementing the atomic from IRQ context. Additional to that we need allot of special handling in case of a hardware lockup here, which isn't done if you abuse the fence interface like this.
>> I think it's not needed to leave the irq enabled, it's a leftover from when I was debugging the mac and no interrupt occurred at all.
>>
>>> Also your approach of leaking the IRQ context outside of the driver is a very bad idea from the conceptual side. Please don't modify the fence interface at all and instead use the wait functions already exposed by radeon_fence.c. If you need some kind of signaling mechanism then wait inside a workqueue instead.
>> The fence takes up the role of a single shot workqueue here. Manually resetting the counter and calling wake_up_all would end up waking all active fences, there's no special handling needed inside radeon for this.
>
> Yeah that's actually the point here, you NEED to activate ALL fences, otherwise the fence handling inside the driver won't work.
It's done in a lazy fashion. If there's no need for an activated fence the interrupt will not be enabled.
>> The fence api does provide a synchronous wait function, but this causes a stall of whomever waits on it.
>
> Which is perfectly fine. What actually is the use case of not stalling a process who wants to wait for something?
Does radeon call ttm_bo_wait on all bo's before doing a command submission? No? Why should other drivers do that..

>> When I was testing this with intel I used the fence callback to poke a register in i915, this allowed it to not block until it hits the wait op in the command stream, and even then only if the callback was not called first.
>>
>> It's documented that the callbacks can be called from any context and will be called with irqs disabled, so nothing scary should be done. The kernel provides enough debug mechanisms to find any violators.
>> PROVE_LOCKING and DEBUG_ATOMIC_SLEEP for example.
>
> No thanks, we even abandoned that concept internal in the driver. Please use the blocking wait functions instead.
No, this just stalls all gpu's that share a bo.

The idea is to provide a standardized api so bo's can be synchronized without stalling. The first step to this is ww_mutex.
If this lock is shared between multiple gpu's the same object can be reserved between multiple devices without causing
a deadlock with circular dependencies. With some small patches it's possible to do this already between multiple drivers
that use ttm. ttm_bo_reserve, ttm_bo_unreserve and all the other code dealing with ttm reservations have been converted
to use ww_mutex locking.

Fencing is the next step. When all buffers are locked a callback should be added to any previous fence, and a single new fence
signaling completion of the command submission should be placed on all locked objects. Because the common path is that no
objects are shared, the callback and FIFO stalling will only be needed for dma-bufs. When all callbacks have fired the FIFO can be
unblocked. This prevents having to sync the gpu to the cpu. If a bo is submitted to 1 gpu, and then immediately to another it will not
stall unless needed. For example in a optimus configuration an application could copy a rendered frame from VRAM to a shared
dma-buf (xorg's buffer), then have Xorg copying it again (on intel's gpu) from the dma-buf to a framebuffer .

~Maarten


--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Christian König Aug. 20, 2013, 2:16 p.m. UTC | #7

Am 20.08.2013 15:21, schrieb Maarten Lankhorst:
> Op 20-08-13 11:51, Christian König schreef:
>> Am 20.08.2013 11:36, schrieb Maarten Lankhorst:
>> [SNIP]
>>
>>>>>>> [SNIP]
>>>>>>> +/**
>>>>>>> + * radeon_fence_enable_signaling - enable signalling on fence
>>>>>>> + * @fence: fence
>>>>>>> + *
>>>>>>> + * This function is called with fence_queue lock held, and adds a callback
>>>>>>> + * to fence_queue that checks if this fence is signaled, and if so it
>>>>>>> + * signals the fence and removes itself.
>>>>>>> + */
>>>>>>> +static bool radeon_fence_enable_signaling(struct fence *f)
>>>>>>> +{
>>>>>>> +    struct radeon_fence *fence = to_radeon_fence(f);
>>>>>>> +
>>>>>>> +    if (atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq) >= fence->seq ||
>>>>>>> +        !fence->rdev->ddev->irq_enabled)
>>>>>>> +        return false;
>>>>>>> +
>>>>>> Do I get that right that you rely on IRQs to be enabled and working here? Cause that would be a quite bad idea from the conceptual side.
>>>>> For cross-device synchronization it would be nice to have working irqs, it allows signalling fences faster,
>>>>> and it allows for callbacks on completion to be called. For internal usage it's no more required than it was before.
>>>> That's a big NAK.
>>>>
>>>> The fence processing is actually very fine tuned to avoid IRQs and as far as I can see you just leave them enabled by decrementing the atomic from IRQ context. Additional to that we need allot of special handling in case of a hardware lockup here, which isn't done if you abuse the fence interface like this.
>>> I think it's not needed to leave the irq enabled, it's a leftover from when I was debugging the mac and no interrupt occurred at all.
>>>
>>>> Also your approach of leaking the IRQ context outside of the driver is a very bad idea from the conceptual side. Please don't modify the fence interface at all and instead use the wait functions already exposed by radeon_fence.c. If you need some kind of signaling mechanism then wait inside a workqueue instead.
>>> The fence takes up the role of a single shot workqueue here. Manually resetting the counter and calling wake_up_all would end up waking all active fences, there's no special handling needed inside radeon for this.
>> Yeah that's actually the point here, you NEED to activate ALL fences, otherwise the fence handling inside the driver won't work.
> It's done in a lazy fashion. If there's no need for an activated fence the interrupt will not be enabled.
>>> The fence api does provide a synchronous wait function, but this causes a stall of whomever waits on it.
>> Which is perfectly fine. What actually is the use case of not stalling a process who wants to wait for something?
> Does radeon call ttm_bo_wait on all bo's before doing a command submission? No? Why should other drivers do that..

Sure it does if hardware synchronization isn't supported.

>>> When I was testing this with intel I used the fence callback to poke a register in i915, this allowed it to not block until it hits the wait op in the command stream, and even then only if the callback was not called first.
>>>
>>> It's documented that the callbacks can be called from any context and will be called with irqs disabled, so nothing scary should be done. The kernel provides enough debug mechanisms to find any violators.
>>> PROVE_LOCKING and DEBUG_ATOMIC_SLEEP for example.
>> No thanks, we even abandoned that concept internal in the driver. Please use the blocking wait functions instead.
> No, this just stalls all gpu's that share a bo.
>
> The idea is to provide a standardized api so bo's can be synchronized without stalling. The first step to this is ww_mutex.
> If this lock is shared between multiple gpu's the same object can be reserved between multiple devices without causing
> a deadlock with circular dependencies. With some small patches it's possible to do this already between multiple drivers
> that use ttm. ttm_bo_reserve, ttm_bo_unreserve and all the other code dealing with ttm reservations have been converted
> to use ww_mutex locking.
>
> Fencing is the next step. When all buffers are locked a callback should be added to any previous fence, and a single new fence
> signaling completion of the command submission should be placed on all locked objects. Because the common path is that no
> objects are shared, the callback and FIFO stalling will only be needed for dma-bufs. When all callbacks have fired the FIFO can be
> unblocked. This prevents having to sync the gpu to the cpu. If a bo is submitted to 1 gpu, and then immediately to another it will not
> stall unless needed. For example in a optimus configuration an application could copy a rendered frame from VRAM to a shared
> dma-buf (xorg's buffer), then have Xorg copying it again (on intel's gpu) from the dma-buf to a framebuffer .

Yeah, that's the same concept we used to have for multiple rings, to 
avoid stalling if a buffer is currently used in a different part of the 
GPU than the current command submission wants it to. After allot of 
internal discussion we came to the conclusion that it just doesn't worth 
the effort.

Have you thought over all the consequences that are implied by having a 
serialized stream of command submissions?

When a buffer is in use by another device you basically have only two 
options: Either block for the buffer to become unused or use a hardware 
method (semaphores) to sync up your operations. Of course you can 
optimize by using a multiple reader / single writers model, but for this 
you basically just have to teach ttm that there might be more than one 
fence for each bo.

Anyway those things are only optimizations, and in the end it always 
comes down to blocking the command submission if there is no other way 
of hardware synchronization and I absolutely don't see any reason why we 
shouldn't do this with just a normal blocking call to a waitqueue.

Christian.
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

diff mbox

Patch

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 9f19259..971284e 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -64,6 +64,7 @@ 
 #include <linux/wait.h>
 #include <linux/list.h>
 #include <linux/kref.h>
+#include <linux/fence.h>
 
 #include <ttm/ttm_bo_api.h>
 #include <ttm/ttm_bo_driver.h>
@@ -114,9 +115,6 @@  extern int radeon_aspm;
 /* max number of rings */
 #define RADEON_NUM_RINGS			6
 
-/* fence seq are set to this number when signaled */
-#define RADEON_FENCE_SIGNALED_SEQ		0LL
-
 /* internal ring indices */
 /* r1xx+ has gfx CP ring */
 #define RADEON_RING_TYPE_GFX_INDEX	0
@@ -285,12 +283,15 @@  struct radeon_fence_driver {
 };
 
 struct radeon_fence {
+	struct fence base;
+
 	struct radeon_device		*rdev;
-	struct kref			kref;
 	/* protected by radeon_fence.lock */
 	uint64_t			seq;
 	/* RB, DMA, etc. */
 	unsigned			ring;
+
+	wait_queue_t fence_wake;
 };
 
 int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring);
@@ -2039,6 +2040,7 @@  struct radeon_device {
 	struct radeon_mman		mman;
 	struct radeon_fence_driver	fence_drv[RADEON_NUM_RINGS];
 	wait_queue_head_t		fence_queue;
+	unsigned			fence_context;
 	struct mutex			ring_lock;
 	struct radeon_ring		ring[RADEON_NUM_RINGS];
 	bool				ib_pool_ready;
@@ -2117,11 +2119,6 @@  u32 cik_mm_rdoorbell(struct radeon_device *rdev, u32 offset);
 void cik_mm_wdoorbell(struct radeon_device *rdev, u32 offset, u32 v);
 
 /*
- * Cast helper
- */
-#define to_radeon_fence(p) ((struct radeon_fence *)(p))
-
-/*
  * Registers read & write functions.
  */
 #define RREG8(reg) readb((rdev->rmmio) + (reg))
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 63398ae..d76a187 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1150,6 +1150,7 @@  int radeon_device_init(struct radeon_device *rdev,
 	for (i = 0; i < RADEON_NUM_RINGS; i++) {
 		rdev->ring[i].idx = i;
 	}
+	rdev->fence_context = fence_context_alloc(RADEON_NUM_RINGS);
 
 	DRM_INFO("initializing kernel modesetting (%s 0x%04X:0x%04X 0x%04X:0x%04X).\n",
 		radeon_family_name[rdev->family], pdev->vendor, pdev->device,
diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index ddb8f8e..92a1576 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -39,6 +39,15 @@ 
 #include "radeon.h"
 #include "radeon_trace.h"
 
+static const struct fence_ops radeon_fence_ops;
+
+#define to_radeon_fence(p) \
+	({								\
+		struct radeon_fence *__f;				\
+		__f = container_of((p), struct radeon_fence, base);	\
+		__f->base.ops == &radeon_fence_ops ? __f : NULL;	\
+	})
+
 /*
  * Fences
  * Fences mark an event in the GPUs pipeline and are used
@@ -111,14 +120,17 @@  int radeon_fence_emit(struct radeon_device *rdev,
 		      struct radeon_fence **fence,
 		      int ring)
 {
+	u64 seq = ++rdev->fence_drv[ring].sync_seq[ring];
+
 	/* we are protected by the ring emission mutex */
 	*fence = kmalloc(sizeof(struct radeon_fence), GFP_KERNEL);
 	if ((*fence) == NULL) {
 		return -ENOMEM;
 	}
-	kref_init(&((*fence)->kref));
+	__fence_init(&(*fence)->base, &radeon_fence_ops,
+		     &rdev->fence_queue.lock, rdev->fence_context + ring, seq);
 	(*fence)->rdev = rdev;
-	(*fence)->seq = ++rdev->fence_drv[ring].sync_seq[ring];
+	(*fence)->seq = seq;
 	(*fence)->ring = ring;
 	radeon_fence_ring_emit(rdev, ring, *fence);
 	trace_radeon_fence_emit(rdev->ddev, (*fence)->seq);
@@ -126,15 +138,38 @@  int radeon_fence_emit(struct radeon_device *rdev,
 }
 
 /**
- * radeon_fence_process - process a fence
+ * radeon_fence_check_signaled - callback from fence_queue
  *
- * @rdev: radeon_device pointer
- * @ring: ring index the fence is associated with
- *
- * Checks the current fence value and wakes the fence queue
- * if the sequence number has increased (all asics).
+ * this function is called with fence_queue lock held, which is also used
+ * for the fence locking itself, so unlocked variants are used for
+ * fence_signal, and remove_wait_queue.
  */
-void radeon_fence_process(struct radeon_device *rdev, int ring)
+static int radeon_fence_check_signaled(wait_queue_t *wait, unsigned mode, int flags, void *key)
+{
+	struct radeon_fence *fence;
+	u64 seq;
+
+	fence = container_of(wait, struct radeon_fence, fence_wake);
+
+	seq = atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq);
+	if (seq >= fence->seq) {
+		int ret = __fence_signal(&fence->base);
+
+		if (!ret)
+			FENCE_TRACE(&fence->base, "signaled from irq context\n");
+		else
+			FENCE_TRACE(&fence->base, "was already signaled\n");
+
+		/* probably a bad idea to call this from the irq handler, so lets not.. */
+		atomic_dec(&fence->rdev->irq.ring_int[fence->ring]);
+		__remove_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
+		fence_put(&fence->base);
+	} else
+		FENCE_TRACE(&fence->base, "pending\n");
+	return 0;
+}
+
+static bool __radeon_fence_process(struct radeon_device *rdev, int ring)
 {
 	uint64_t seq, last_seq, last_emitted;
 	unsigned count_loop = 0;
@@ -190,25 +225,24 @@  void radeon_fence_process(struct radeon_device *rdev, int ring)
 		}
 	} while (atomic64_xchg(&rdev->fence_drv[ring].last_seq, seq) > seq);
 
-	if (wake) {
+	if (wake)
 		rdev->fence_drv[ring].last_activity = jiffies;
-		wake_up_all(&rdev->fence_queue);
-	}
+	return wake;
 }
 
 /**
- * radeon_fence_destroy - destroy a fence
+ * radeon_fence_process - process a fence
  *
- * @kref: fence kref
+ * @rdev: radeon_device pointer
+ * @ring: ring index the fence is associated with
  *
- * Frees the fence object (all asics).
+ * Checks the current fence value and wakes the fence queue
+ * if the sequence number has increased (all asics).
  */
-static void radeon_fence_destroy(struct kref *kref)
+void radeon_fence_process(struct radeon_device *rdev, int ring)
 {
-	struct radeon_fence *fence;
-
-	fence = container_of(kref, struct radeon_fence, kref);
-	kfree(fence);
+	if (__radeon_fence_process(rdev, ring))
+		wake_up_all(&rdev->fence_queue);
 }
 
 /**
@@ -239,6 +273,49 @@  static bool radeon_fence_seq_signaled(struct radeon_device *rdev,
 	return false;
 }
 
+static bool __radeon_fence_signaled(struct fence *f)
+{
+	struct radeon_fence *fence = to_radeon_fence(f);
+
+	return radeon_fence_seq_signaled(fence->rdev, fence->seq, fence->ring);
+}
+
+/**
+ * radeon_fence_enable_signaling - enable signalling on fence
+ * @fence: fence
+ *
+ * This function is called with fence_queue lock held, and adds a callback
+ * to fence_queue that checks if this fence is signaled, and if so it
+ * signals the fence and removes itself.
+ */
+static bool radeon_fence_enable_signaling(struct fence *f)
+{
+	struct radeon_fence *fence = to_radeon_fence(f);
+
+	if (atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq) >= fence->seq ||
+	    !fence->rdev->ddev->irq_enabled)
+		return false;
+
+	radeon_irq_kms_sw_irq_get(fence->rdev, fence->ring);
+
+	if (__radeon_fence_process(fence->rdev, fence->ring))
+		wake_up_all_locked(&fence->rdev->fence_queue);
+
+	/* did fence get signaled after we enabled the sw irq? */
+	if (atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq) >= fence->seq) {
+		radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
+		return false;
+	}
+
+	fence->fence_wake.flags = 0;
+	fence->fence_wake.private = NULL;
+	fence->fence_wake.func = radeon_fence_check_signaled;
+	__add_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
+	fence_get(f);
+
+	return true;
+}
+
 /**
  * radeon_fence_signaled - check if a fence has signaled
  *
@@ -252,11 +329,13 @@  bool radeon_fence_signaled(struct radeon_fence *fence)
 	if (!fence) {
 		return true;
 	}
-	if (fence->seq == RADEON_FENCE_SIGNALED_SEQ) {
-		return true;
-	}
+
 	if (radeon_fence_seq_signaled(fence->rdev, fence->seq, fence->ring)) {
-		fence->seq = RADEON_FENCE_SIGNALED_SEQ;
+		int ret;
+
+		ret = fence_signal(&fence->base);
+		if (!ret)
+			FENCE_TRACE(&fence->base, "signaled from radeon_fence_signaled\n");
 		return true;
 	}
 	return false;
@@ -379,7 +458,7 @@  static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 target_seq,
  * radeon_fence_wait - wait for a fence to signal
  *
  * @fence: radeon fence object
- * @intr: use interruptable sleep
+ * @intr: use interruptible sleep
  *
  * Wait for the requested fence to signal (all asics).
  * @intr selects whether to use interruptable (true) or non-interruptable
@@ -390,17 +469,17 @@  int radeon_fence_wait(struct radeon_fence *fence, bool intr)
 {
 	int r;
 
-	if (fence == NULL) {
-		WARN(1, "Querying an invalid fence : %p !\n", fence);
-		return -EINVAL;
-	}
+	if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->base.flags))
+		return 0;
 
 	r = radeon_fence_wait_seq(fence->rdev, fence->seq,
 				  fence->ring, intr, true);
 	if (r) {
 		return r;
 	}
-	fence->seq = RADEON_FENCE_SIGNALED_SEQ;
+	r = fence_signal(&fence->base);
+	if (!r)
+		FENCE_TRACE(&fence->base, "signaled from fence_wait\n");
 	return 0;
 }
 
@@ -567,8 +646,8 @@  int radeon_fence_wait_any(struct radeon_device *rdev,
 			continue;
 		}
 
-		if (fences[i]->seq == RADEON_FENCE_SIGNALED_SEQ) {
-			/* something was allready signaled */
+		if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fences[i]->base.flags)) {
+			/* already signaled */
 			return 0;
 		}
 
@@ -641,7 +720,7 @@  int radeon_fence_wait_empty_locked(struct radeon_device *rdev, int ring)
  */
 struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence)
 {
-	kref_get(&fence->kref);
+	fence_get(&fence->base);
 	return fence;
 }
 
@@ -657,9 +736,8 @@  void radeon_fence_unref(struct radeon_fence **fence)
 	struct radeon_fence *tmp = *fence;
 
 	*fence = NULL;
-	if (tmp) {
-		kref_put(&tmp->kref, radeon_fence_destroy);
-	}
+	if (tmp)
+		fence_put(&tmp->base);
 }
 
 /**
@@ -947,3 +1025,10 @@  int radeon_debugfs_fence_init(struct radeon_device *rdev)
 	return 0;
 #endif
 }
+
+static const struct fence_ops radeon_fence_ops = {
+	.enable_signaling = radeon_fence_enable_signaling,
+	.signaled = __radeon_fence_signaled,
+	.wait = fence_default_wait,
+	.release = NULL,
+};