[REPOST,4/8] android: convert sync to fence api, v5

Message ID 20140618103711.15728.97842.stgit@patser (mailing list archive)
State Not Applicable, archived
Headers

Commit Message

Maarten Lankhorst June 18, 2014, 10:37 a.m. UTC
  Just to show it's easy.

Android syncpoints can be mapped to a timeline. This removes the need
to maintain a separate api for synchronization. I've left the android
trace events in place, but the core fence events should already be
sufficient for debugging.

v2:
- Call fence_remove_callback in sync_fence_free if not all fences have fired.
v3:
- Merge Colin Cross' bugfixes, and the android fence merge optimization.
v4:
- Merge with the upstream fixes.
v5:
- Fix small style issues pointed out by Thomas Hellstrom.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Acked-by: John Stultz <john.stultz@linaro.org>
---
 drivers/staging/android/Kconfig      |    1 
 drivers/staging/android/Makefile     |    2 
 drivers/staging/android/sw_sync.c    |    6 
 drivers/staging/android/sync.c       |  913 +++++++++++-----------------------
 drivers/staging/android/sync.h       |   79 ++-
 drivers/staging/android/sync_debug.c |  247 +++++++++
 drivers/staging/android/trace/sync.h |   12 
 7 files changed, 609 insertions(+), 651 deletions(-)
 create mode 100644 drivers/staging/android/sync_debug.c


--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
  

Comments

Greg KH June 19, 2014, 1:15 a.m. UTC | #1
On Wed, Jun 18, 2014 at 12:37:11PM +0200, Maarten Lankhorst wrote:
> Just to show it's easy.
> 
> Android syncpoints can be mapped to a timeline. This removes the need
> to maintain a separate api for synchronization. I've left the android
> trace events in place, but the core fence events should already be
> sufficient for debugging.
> 
> v2:
> - Call fence_remove_callback in sync_fence_free if not all fences have fired.
> v3:
> - Merge Colin Cross' bugfixes, and the android fence merge optimization.
> v4:
> - Merge with the upstream fixes.
> v5:
> - Fix small style issues pointed out by Thomas Hellstrom.
> 
> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
> Acked-by: John Stultz <john.stultz@linaro.org>
> ---
>  drivers/staging/android/Kconfig      |    1 
>  drivers/staging/android/Makefile     |    2 
>  drivers/staging/android/sw_sync.c    |    6 
>  drivers/staging/android/sync.c       |  913 +++++++++++-----------------------
>  drivers/staging/android/sync.h       |   79 ++-
>  drivers/staging/android/sync_debug.c |  247 +++++++++
>  drivers/staging/android/trace/sync.h |   12 
>  7 files changed, 609 insertions(+), 651 deletions(-)
>  create mode 100644 drivers/staging/android/sync_debug.c

With these changes, can we pull the android sync logic out of
drivers/staging/ now?

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
  
Daniel Vetter June 19, 2014, 6:37 a.m. UTC | #2
On Wed, Jun 18, 2014 at 06:15:56PM -0700, Greg KH wrote:
> On Wed, Jun 18, 2014 at 12:37:11PM +0200, Maarten Lankhorst wrote:
> > Just to show it's easy.
> > 
> > Android syncpoints can be mapped to a timeline. This removes the need
> > to maintain a separate api for synchronization. I've left the android
> > trace events in place, but the core fence events should already be
> > sufficient for debugging.
> > 
> > v2:
> > - Call fence_remove_callback in sync_fence_free if not all fences have fired.
> > v3:
> > - Merge Colin Cross' bugfixes, and the android fence merge optimization.
> > v4:
> > - Merge with the upstream fixes.
> > v5:
> > - Fix small style issues pointed out by Thomas Hellstrom.
> > 
> > Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
> > Acked-by: John Stultz <john.stultz@linaro.org>
> > ---
> >  drivers/staging/android/Kconfig      |    1 
> >  drivers/staging/android/Makefile     |    2 
> >  drivers/staging/android/sw_sync.c    |    6 
> >  drivers/staging/android/sync.c       |  913 +++++++++++-----------------------
> >  drivers/staging/android/sync.h       |   79 ++-
> >  drivers/staging/android/sync_debug.c |  247 +++++++++
> >  drivers/staging/android/trace/sync.h |   12 
> >  7 files changed, 609 insertions(+), 651 deletions(-)
> >  create mode 100644 drivers/staging/android/sync_debug.c
> 
> With these changes, can we pull the android sync logic out of
> drivers/staging/ now?

Afaik the google guys never really looked at this and acked it. So I'm not
sure whether they'll follow along. The other issue I have as the
maintainer of gfx driver is that I don't want to implement support for two
different sync object primitives (once for dma-buf and once for android
syncpts), and my impression thus far has been that even with this we're
not there.

I'm trying to get our own android guys to upstream their i915 syncpts
support, but thus far I haven't managed to convince them to throw people's
time at this.

It looks like a step into the right direction, but until I have the proof
in the form of i915 patches that I won't have to support 2 gfx fencing
frameworks I'm opposed to de-staging android syncpts. Ofc someone else
could do that too, but besides i915 I don't see a full-fledged (modeset
side only kinda doesn't count) upstream gfx driver shipping on android.
-Daniel
  
Thierry Reding June 19, 2014, 11:48 a.m. UTC | #3
On Thu, Jun 19, 2014 at 08:37:27AM +0200, Daniel Vetter wrote:
> On Wed, Jun 18, 2014 at 06:15:56PM -0700, Greg KH wrote:
> > On Wed, Jun 18, 2014 at 12:37:11PM +0200, Maarten Lankhorst wrote:
> > > Just to show it's easy.
> > > 
> > > Android syncpoints can be mapped to a timeline. This removes the need
> > > to maintain a separate api for synchronization. I've left the android
> > > trace events in place, but the core fence events should already be
> > > sufficient for debugging.
> > > 
> > > v2:
> > > - Call fence_remove_callback in sync_fence_free if not all fences have fired.
> > > v3:
> > > - Merge Colin Cross' bugfixes, and the android fence merge optimization.
> > > v4:
> > > - Merge with the upstream fixes.
> > > v5:
> > > - Fix small style issues pointed out by Thomas Hellstrom.
> > > 
> > > Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
> > > Acked-by: John Stultz <john.stultz@linaro.org>
> > > ---
> > >  drivers/staging/android/Kconfig      |    1 
> > >  drivers/staging/android/Makefile     |    2 
> > >  drivers/staging/android/sw_sync.c    |    6 
> > >  drivers/staging/android/sync.c       |  913 +++++++++++-----------------------
> > >  drivers/staging/android/sync.h       |   79 ++-
> > >  drivers/staging/android/sync_debug.c |  247 +++++++++
> > >  drivers/staging/android/trace/sync.h |   12 
> > >  7 files changed, 609 insertions(+), 651 deletions(-)
> > >  create mode 100644 drivers/staging/android/sync_debug.c
> > 
> > With these changes, can we pull the android sync logic out of
> > drivers/staging/ now?
> 
> Afaik the google guys never really looked at this and acked it. So I'm not
> sure whether they'll follow along. The other issue I have as the
> maintainer of gfx driver is that I don't want to implement support for two
> different sync object primitives (once for dma-buf and once for android
> syncpts), and my impression thus far has been that even with this we're
> not there.
> 
> I'm trying to get our own android guys to upstream their i915 syncpts
> support, but thus far I haven't managed to convince them to throw people's
> time at this.

This has been discussed a fair bit internally recently and some of our
GPU experts have raised concerns that this may result in seriously
degraded performance in our proprietary graphics stack. Now I don't care
very much for the proprietary graphics stack, but by extension I would
assume that the same restrictions are relevant for any open-source
driver as well.

I'm still trying to fully understand all the implications and at the
same time get some of the people who raised concerns to join in this
discussion. As I understand it the concern is mostly about explicit vs.
implicit synchronization and having this mechanism in the kernel will
implicitly synchronize all accesses to these buffers even in cases where
it's not needed (read vs. write locks, etc.). In one particular instance
it was even mentioned that this kind of implicit synchronization can
lead to deadlocks in some use-cases (this was mentioned for Android
compositing, but I suspect that the same may happen for Wayland or X
compositors).

Thierry
  
Daniel Vetter June 19, 2014, 12:28 p.m. UTC | #4
On Thu, Jun 19, 2014 at 1:48 PM, Thierry Reding
<thierry.reding@gmail.com> wrote:
>> > With these changes, can we pull the android sync logic out of
>> > drivers/staging/ now?
>>
>> Afaik the google guys never really looked at this and acked it. So I'm not
>> sure whether they'll follow along. The other issue I have as the
>> maintainer of gfx driver is that I don't want to implement support for two
>> different sync object primitives (once for dma-buf and once for android
>> syncpts), and my impression thus far has been that even with this we're
>> not there.
>>
>> I'm trying to get our own android guys to upstream their i915 syncpts
>> support, but thus far I haven't managed to convince them to throw people's
>> time at this.
>
> This has been discussed a fair bit internally recently and some of our
> GPU experts have raised concerns that this may result in seriously
> degraded performance in our proprietary graphics stack. Now I don't care
> very much for the proprietary graphics stack, but by extension I would
> assume that the same restrictions are relevant for any open-source
> driver as well.
>
> I'm still trying to fully understand all the implications and at the
> same time get some of the people who raised concerns to join in this
> discussion. As I understand it the concern is mostly about explicit vs.
> implicit synchronization and having this mechanism in the kernel will
> implicitly synchronize all accesses to these buffers even in cases where
> it's not needed (read vs. write locks, etc.). In one particular instance
> it was even mentioned that this kind of implicit synchronization can
> lead to deadlocks in some use-cases (this was mentioned for Android
> compositing, but I suspect that the same may happen for Wayland or X
> compositors).

Well the implicit fences here actually can't deadlock. That's the
entire point behind using ww mutexes. I've also heard tons of
complaints about implicit enforced syncing (especially from opencl
people), but in the end drivers and always expose unsynchronized
access for specific cases. We do that in i915 for upload buffers and
other fun stuff. This is about shared stuff across different drivers
and different processes.

I also expect that i915 will loose implicit syncing in a few upcoming
hw modes because explicit syncing is a more natural fit there.

All that isn't about the discussion at hand imo since no matter what
i915 needs to have on internal representation for a bit of gpu work,
and afaics right now we don't have that. With this patch android
syncpts use Maarten's fences internally, but I can't freely exchange
one for the other. So in i915 I still expect to get stuck with both of
them, which is one too many.

The other issue (and I haven't dug into details that much) I have with
syncpts are some of the interface choices. Apparently you can commit a
fence after creation (or at least the hw composer interface works like
that) which means userspace can construct deadlocks with android
syncpts if I'm not super careful in my driver. I haven't seen any
generic code to do that, so I presume everyone just blindly trusts
surface-flinger to not do that. Speaks of the average quality of an
android gfx driver if the kernel is less trusted than the compositor
in userspace ...

There's a few other things like exposing timestamps (which are tricky
to do right, our driver is littered with wrong attempts) and other
details.

Finally I've never seen anyone from google or any android product guy
push a real driver enabling for syncpts to upstream, and google itself
has a bit a history of constantly exchanging their gfx framework for
the next best thing. So I really doubt this is worthwhile to pursue in
upstream with our essentially eternal api guarantees. At least until
we see serious uptake from vendors and gfx driver guys. Unfortunately
the Intel android folks are no exception here and haven't pushed
anything like this in my direction yet at all. Despite multiple pokes
from my side.

So from my side I think we should move ahead with Maarten's work and
figure the android side out once there's real interest.
-Daniel
  
Colin Cross June 19, 2014, 3:22 p.m. UTC | #5
On Wed, Jun 18, 2014 at 11:37 PM, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Wed, Jun 18, 2014 at 06:15:56PM -0700, Greg KH wrote:
>> On Wed, Jun 18, 2014 at 12:37:11PM +0200, Maarten Lankhorst wrote:
>> > Just to show it's easy.
>> >
>> > Android syncpoints can be mapped to a timeline. This removes the need
>> > to maintain a separate api for synchronization. I've left the android
>> > trace events in place, but the core fence events should already be
>> > sufficient for debugging.
>> >
>> > v2:
>> > - Call fence_remove_callback in sync_fence_free if not all fences have fired.
>> > v3:
>> > - Merge Colin Cross' bugfixes, and the android fence merge optimization.
>> > v4:
>> > - Merge with the upstream fixes.
>> > v5:
>> > - Fix small style issues pointed out by Thomas Hellstrom.
>> >
>> > Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
>> > Acked-by: John Stultz <john.stultz@linaro.org>
>> > ---
>> >  drivers/staging/android/Kconfig      |    1
>> >  drivers/staging/android/Makefile     |    2
>> >  drivers/staging/android/sw_sync.c    |    6
>> >  drivers/staging/android/sync.c       |  913 +++++++++++-----------------------
>> >  drivers/staging/android/sync.h       |   79 ++-
>> >  drivers/staging/android/sync_debug.c |  247 +++++++++
>> >  drivers/staging/android/trace/sync.h |   12
>> >  7 files changed, 609 insertions(+), 651 deletions(-)
>> >  create mode 100644 drivers/staging/android/sync_debug.c
>>
>> With these changes, can we pull the android sync logic out of
>> drivers/staging/ now?
>
> Afaik the google guys never really looked at this and acked it. So I'm not
> sure whether they'll follow along. The other issue I have as the
> maintainer of gfx driver is that I don't want to implement support for two
> different sync object primitives (once for dma-buf and once for android
> syncpts), and my impression thus far has been that even with this we're
> not there.

We have tested these patches to use dma fences to back the android
sync driver and not found any major issues.  However, my understanding
is that dma fences are designed for implicit sync, and explicit sync
through the android sync driver is bolted on the side to share code.
Android is not moving away from explicit sync, but we do wrap all of
our userspace sync accesses through libsync
(https://android.googlesource.com/platform/system/core/+/master/libsync/sync.c,
ignore the sw_sync parts), so if the kernel supported a slightly
different userspace explicit sync interface we could adapt to it
fairly easily.  All we require is that individual kernel drivers need
to be able to accept work alongisde an fd to wait on, and to return an
fd that will signal when the work is done, and that userspace has some
way to merge two of those fds, wait on an fd, and get some debugging
info from an fd.  However, this patch set doesn't do that, it has no
way to export a dma fence as an fd except through the android sync
driver, so it is not yet ready to fully replace android sync.

> I'm trying to get our own android guys to upstream their i915 syncpts
> support, but thus far I haven't managed to convince them to throw people's
> time at this.
>
> It looks like a step into the right direction, but until I have the proof
> in the form of i915 patches that I won't have to support 2 gfx fencing
> frameworks I'm opposed to de-staging android syncpts. Ofc someone else
> could do that too, but besides i915 I don't see a full-fledged (modeset
> side only kinda doesn't count) upstream gfx driver shipping on android.
> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
  
Colin Cross June 19, 2014, 3:35 p.m. UTC | #6
On Thu, Jun 19, 2014 at 5:28 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Thu, Jun 19, 2014 at 1:48 PM, Thierry Reding
> <thierry.reding@gmail.com> wrote:
>>> > With these changes, can we pull the android sync logic out of
>>> > drivers/staging/ now?
>>>
>>> Afaik the google guys never really looked at this and acked it. So I'm not
>>> sure whether they'll follow along. The other issue I have as the
>>> maintainer of gfx driver is that I don't want to implement support for two
>>> different sync object primitives (once for dma-buf and once for android
>>> syncpts), and my impression thus far has been that even with this we're
>>> not there.
>>>
>>> I'm trying to get our own android guys to upstream their i915 syncpts
>>> support, but thus far I haven't managed to convince them to throw people's
>>> time at this.
>>
>> This has been discussed a fair bit internally recently and some of our
>> GPU experts have raised concerns that this may result in seriously
>> degraded performance in our proprietary graphics stack. Now I don't care
>> very much for the proprietary graphics stack, but by extension I would
>> assume that the same restrictions are relevant for any open-source
>> driver as well.
>>
>> I'm still trying to fully understand all the implications and at the
>> same time get some of the people who raised concerns to join in this
>> discussion. As I understand it the concern is mostly about explicit vs.
>> implicit synchronization and having this mechanism in the kernel will
>> implicitly synchronize all accesses to these buffers even in cases where
>> it's not needed (read vs. write locks, etc.). In one particular instance
>> it was even mentioned that this kind of implicit synchronization can
>> lead to deadlocks in some use-cases (this was mentioned for Android
>> compositing, but I suspect that the same may happen for Wayland or X
>> compositors).
>
> Well the implicit fences here actually can't deadlock. That's the
> entire point behind using ww mutexes. I've also heard tons of
> complaints about implicit enforced syncing (especially from opencl
> people), but in the end drivers and always expose unsynchronized
> access for specific cases. We do that in i915 for upload buffers and
> other fun stuff. This is about shared stuff across different drivers
> and different processes.
>
> I also expect that i915 will loose implicit syncing in a few upcoming
> hw modes because explicit syncing is a more natural fit there.
>
> All that isn't about the discussion at hand imo since no matter what
> i915 needs to have on internal representation for a bit of gpu work,
> and afaics right now we don't have that. With this patch android
> syncpts use Maarten's fences internally, but I can't freely exchange
> one for the other. So in i915 I still expect to get stuck with both of
> them, which is one too many.
>
> The other issue (and I haven't dug into details that much) I have with
> syncpts are some of the interface choices. Apparently you can commit a
> fence after creation (or at least the hw composer interface works like
> that) which means userspace can construct deadlocks with android
> syncpts if I'm not super careful in my driver. I haven't seen any
> generic code to do that, so I presume everyone just blindly trusts
> surface-flinger to not do that. Speaks of the average quality of an
> android gfx driver if the kernel is less trusted than the compositor
> in userspace ...

Android sync is designed not to allow userspace to deadlock the
kernel, a sync_pt should only get created by the kernel when it has
received a chunk of work that it expects to complete in the near
future.  The CONFIG_SW_SYNC_USER driver violates that by allowing
userspace to create and signal arbitrary sync points, but that is
intended only for testing sync.

> There's a few other things like exposing timestamps (which are tricky
> to do right, our driver is littered with wrong attempts) and other
> details.

Timestamps are necessary for vsync synchronization to reduce the frame latency.

> Finally I've never seen anyone from google or any android product guy
> push a real driver enabling for syncpts to upstream, and google itself
> has a bit a history of constantly exchanging their gfx framework for
> the next best thing. So I really doubt this is worthwhile to pursue in
> upstream with our essentially eternal api guarantees. At least until
> we see serious uptake from vendors and gfx driver guys. Unfortunately
> the Intel android folks are no exception here and haven't pushed
> anything like this in my direction yet at all. Despite multiple pokes
> from my side.

As far as I know, every SoC vendor that supports android is using sync
now, but none of them have succeeded in pushing their drivers upstream
for a variety of other reasons (interfaces only used by closed source
userspaces, KMS/DRM vs ADF, ION, etc.).

> So from my side I think we should move ahead with Maarten's work and
> figure the android side out once there's real interest.

As long as that doesn't involve removing the Android sync interfaces
from staging until dma fence fds are supported, that's fine with me.
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
  
Daniel Vetter June 19, 2014, 4:34 p.m. UTC | #7
On Thu, Jun 19, 2014 at 08:35:29AM -0700, Colin Cross wrote:
> On Thu, Jun 19, 2014 at 5:28 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
> > On Thu, Jun 19, 2014 at 1:48 PM, Thierry Reding
> > <thierry.reding@gmail.com> wrote:
> >>> > With these changes, can we pull the android sync logic out of
> >>> > drivers/staging/ now?
> >>>
> >>> Afaik the google guys never really looked at this and acked it. So I'm not
> >>> sure whether they'll follow along. The other issue I have as the
> >>> maintainer of gfx driver is that I don't want to implement support for two
> >>> different sync object primitives (once for dma-buf and once for android
> >>> syncpts), and my impression thus far has been that even with this we're
> >>> not there.
> >>>
> >>> I'm trying to get our own android guys to upstream their i915 syncpts
> >>> support, but thus far I haven't managed to convince them to throw people's
> >>> time at this.
> >>
> >> This has been discussed a fair bit internally recently and some of our
> >> GPU experts have raised concerns that this may result in seriously
> >> degraded performance in our proprietary graphics stack. Now I don't care
> >> very much for the proprietary graphics stack, but by extension I would
> >> assume that the same restrictions are relevant for any open-source
> >> driver as well.
> >>
> >> I'm still trying to fully understand all the implications and at the
> >> same time get some of the people who raised concerns to join in this
> >> discussion. As I understand it the concern is mostly about explicit vs.
> >> implicit synchronization and having this mechanism in the kernel will
> >> implicitly synchronize all accesses to these buffers even in cases where
> >> it's not needed (read vs. write locks, etc.). In one particular instance
> >> it was even mentioned that this kind of implicit synchronization can
> >> lead to deadlocks in some use-cases (this was mentioned for Android
> >> compositing, but I suspect that the same may happen for Wayland or X
> >> compositors).
> >
> > Well the implicit fences here actually can't deadlock. That's the
> > entire point behind using ww mutexes. I've also heard tons of
> > complaints about implicit enforced syncing (especially from opencl
> > people), but in the end drivers and always expose unsynchronized
> > access for specific cases. We do that in i915 for upload buffers and
> > other fun stuff. This is about shared stuff across different drivers
> > and different processes.
> >
> > I also expect that i915 will loose implicit syncing in a few upcoming
> > hw modes because explicit syncing is a more natural fit there.
> >
> > All that isn't about the discussion at hand imo since no matter what
> > i915 needs to have on internal representation for a bit of gpu work,
> > and afaics right now we don't have that. With this patch android
> > syncpts use Maarten's fences internally, but I can't freely exchange
> > one for the other. So in i915 I still expect to get stuck with both of
> > them, which is one too many.
> >
> > The other issue (and I haven't dug into details that much) I have with
> > syncpts are some of the interface choices. Apparently you can commit a
> > fence after creation (or at least the hw composer interface works like
> > that) which means userspace can construct deadlocks with android
> > syncpts if I'm not super careful in my driver. I haven't seen any
> > generic code to do that, so I presume everyone just blindly trusts
> > surface-flinger to not do that. Speaks of the average quality of an
> > android gfx driver if the kernel is less trusted than the compositor
> > in userspace ...
> 
> Android sync is designed not to allow userspace to deadlock the
> kernel, a sync_pt should only get created by the kernel when it has
> received a chunk of work that it expects to complete in the near
> future.  The CONFIG_SW_SYNC_USER driver violates that by allowing
> userspace to create and signal arbitrary sync points, but that is
> intended only for testing sync.

Ok, that makes sense. As long as we sufficiently taint the kernel and hide
the sw_sync framework we should be good. I was confused by the hw composer
interface spec which seemed to suggest that the fences for a screen update
completion should be attached before surfaceflinger commits the state. But
I never looked at an implemention so guess that impression is wrong.

> > There's a few other things like exposing timestamps (which are tricky
> > to do right, our driver is littered with wrong attempts) and other
> > details.
> 
> Timestamps are necessary for vsync synchronization to reduce the frame latency.

I'm not against timestamps (we have them for drm vblank events too after
all), just would like for them to be optional. And we need to give
userspace very clear indication which hw clock the timestamp was based on
(or whether we're using the clock_monotonic system clock) to make sure
debug and profiling tools can properly align things. Because hw clocks
always get out of sync. Last time I've looked at the syncpt ioctls that
part was missing.

> > Finally I've never seen anyone from google or any android product guy
> > push a real driver enabling for syncpts to upstream, and google itself
> > has a bit a history of constantly exchanging their gfx framework for
> > the next best thing. So I really doubt this is worthwhile to pursue in
> > upstream with our essentially eternal api guarantees. At least until
> > we see serious uptake from vendors and gfx driver guys. Unfortunately
> > the Intel android folks are no exception here and haven't pushed
> > anything like this in my direction yet at all. Despite multiple pokes
> > from my side.
> 
> As far as I know, every SoC vendor that supports android is using sync
> now, but none of them have succeeded in pushing their drivers upstream
> for a variety of other reasons (interfaces only used by closed source
> userspaces, KMS/DRM vs ADF, ION, etc.).

Yeah I know, and it double-frustrates me that I haven't yet managed to
drag our own team into the public. Except for a few bolt-on hacks to make
deadlines the android driver is the upstream one ...

> > So from my side I think we should move ahead with Maarten's work and
> > figure the android side out once there's real interest.
> 
> As long as that doesn't involve removing the Android sync interfaces
> from staging until dma fence fds are supported, that's fine with me.

Nah, definitely not asking for that. I'd just like to have a real
implemention in a real driver merged upstream and the userpsace/kernel ABI
reviewed a bit before I want to sign up for something we'll need to keep
working forever. E.g. I also think we should expose the actual waiting
through poll and friends for neater integration with the usual display
toolkit event loops. egl shys away from such platform specific stuff, but
we can do it.

Like I've said I want to have to deal with one fence primitive in the
lower levels of my driver for scheduling and synchronization and all that.
The fence proposal fits the bill from my pov since the implicit
synchronization with dma-bufs is optional. But the integration with
android syncpts seems to not yet be there really.
-Daniel
  
Thierry Reding June 20, 2014, 8:52 p.m. UTC | #8
On Thu, Jun 19, 2014 at 02:28:14PM +0200, Daniel Vetter wrote:
> On Thu, Jun 19, 2014 at 1:48 PM, Thierry Reding
> <thierry.reding@gmail.com> wrote:
> >> > With these changes, can we pull the android sync logic out of
> >> > drivers/staging/ now?
> >>
> >> Afaik the google guys never really looked at this and acked it. So I'm not
> >> sure whether they'll follow along. The other issue I have as the
> >> maintainer of gfx driver is that I don't want to implement support for two
> >> different sync object primitives (once for dma-buf and once for android
> >> syncpts), and my impression thus far has been that even with this we're
> >> not there.
> >>
> >> I'm trying to get our own android guys to upstream their i915 syncpts
> >> support, but thus far I haven't managed to convince them to throw people's
> >> time at this.
> >
> > This has been discussed a fair bit internally recently and some of our
> > GPU experts have raised concerns that this may result in seriously
> > degraded performance in our proprietary graphics stack. Now I don't care
> > very much for the proprietary graphics stack, but by extension I would
> > assume that the same restrictions are relevant for any open-source
> > driver as well.
> >
> > I'm still trying to fully understand all the implications and at the
> > same time get some of the people who raised concerns to join in this
> > discussion. As I understand it the concern is mostly about explicit vs.
> > implicit synchronization and having this mechanism in the kernel will
> > implicitly synchronize all accesses to these buffers even in cases where
> > it's not needed (read vs. write locks, etc.). In one particular instance
> > it was even mentioned that this kind of implicit synchronization can
> > lead to deadlocks in some use-cases (this was mentioned for Android
> > compositing, but I suspect that the same may happen for Wayland or X
> > compositors).
> 
> Well the implicit fences here actually can't deadlock. That's the
> entire point behind using ww mutexes. I've also heard tons of
> complaints about implicit enforced syncing (especially from opencl
> people), but in the end drivers and always expose unsynchronized
> access for specific cases. We do that in i915 for upload buffers and
> other fun stuff. This is about shared stuff across different drivers
> and different processes.

Tegra K1 needs to share buffers across different drivers even for very
basic use-cases since the GPU and display drivers are separate. So while
I agree that the GPU driver can still use explicit synchronization for
internal work, things aren't that simple in general.

Let me try to reconstruct the use-case that caused the lock on Android:
the compositor uses a hardware overlay to display an image. The system
detects that there's little activity and instructs the compositor to put
everything into one image and scan out only that (for power efficiency).
Now with implicit locking the display driver has a lock on the image, so
the GPU (used for compositing) needs to wait for it before it can
composite everything into one image. But the display driver cannot
release the lock on the image until the final image has been composited
and can be displayed instead.

This may not be technically a deadlock, but it's still a stalemate.
Unless I'm missing something fundamental about DMA fences and ww mutexes
I don't see how you can get out of this situation.

Explicit vs. implicit synchronization may also become more of an issue
as buffers are imported from other sources (such as cameras).

> Finally I've never seen anyone from google or any android product guy
> push a real driver enabling for syncpts to upstream, and google itself
> has a bit a history of constantly exchanging their gfx framework for
> the next best thing. So I really doubt this is worthwhile to pursue in
> upstream with our essentially eternal api guarantees. At least until
> we see serious uptake from vendors and gfx driver guys. Unfortunately
> the Intel android folks are no exception here and haven't pushed
> anything like this in my direction yet at all. Despite multiple pokes
> from my side.
> 
> So from my side I think we should move ahead with Maarten's work and
> figure the android side out once there's real interest.

The downside of that is that we may end up with two different ways to
synchronize buffers if it turns out that we can't make Android (or other
use-cases) work with DMA fence. However I don't think that justifies
postponing this patch set indefinitely.

Thierry
  
Maarten Lankhorst June 23, 2014, 8:45 a.m. UTC | #9
Hey,

op 20-06-14 22:52, Thierry Reding schreef:
> On Thu, Jun 19, 2014 at 02:28:14PM +0200, Daniel Vetter wrote:
>> On Thu, Jun 19, 2014 at 1:48 PM, Thierry Reding
>> <thierry.reding@gmail.com> wrote:
>>>>> With these changes, can we pull the android sync logic out of
>>>>> drivers/staging/ now?
>>>> Afaik the google guys never really looked at this and acked it. So I'm not
>>>> sure whether they'll follow along. The other issue I have as the
>>>> maintainer of gfx driver is that I don't want to implement support for two
>>>> different sync object primitives (once for dma-buf and once for android
>>>> syncpts), and my impression thus far has been that even with this we're
>>>> not there.
>>>>
>>>> I'm trying to get our own android guys to upstream their i915 syncpts
>>>> support, but thus far I haven't managed to convince them to throw people's
>>>> time at this.
>>> This has been discussed a fair bit internally recently and some of our
>>> GPU experts have raised concerns that this may result in seriously
>>> degraded performance in our proprietary graphics stack. Now I don't care
>>> very much for the proprietary graphics stack, but by extension I would
>>> assume that the same restrictions are relevant for any open-source
>>> driver as well.
>>>
>>> I'm still trying to fully understand all the implications and at the
>>> same time get some of the people who raised concerns to join in this
>>> discussion. As I understand it the concern is mostly about explicit vs.
>>> implicit synchronization and having this mechanism in the kernel will
>>> implicitly synchronize all accesses to these buffers even in cases where
>>> it's not needed (read vs. write locks, etc.). In one particular instance
>>> it was even mentioned that this kind of implicit synchronization can
>>> lead to deadlocks in some use-cases (this was mentioned for Android
>>> compositing, but I suspect that the same may happen for Wayland or X
>>> compositors).
>> Well the implicit fences here actually can't deadlock. That's the
>> entire point behind using ww mutexes. I've also heard tons of
>> complaints about implicit enforced syncing (especially from opencl
>> people), but in the end drivers and always expose unsynchronized
>> access for specific cases. We do that in i915 for upload buffers and
>> other fun stuff. This is about shared stuff across different drivers
>> and different processes.
> Tegra K1 needs to share buffers across different drivers even for very
> basic use-cases since the GPU and display drivers are separate. So while
> I agree that the GPU driver can still use explicit synchronization for
> internal work, things aren't that simple in general.
>
> Let me try to reconstruct the use-case that caused the lock on Android:
> the compositor uses a hardware overlay to display an image. The system
> detects that there's little activity and instructs the compositor to put
> everything into one image and scan out only that (for power efficiency).
> Now with implicit locking the display driver has a lock on the image, so
> the GPU (used for compositing) needs to wait for it before it can
> composite everything into one image. But the display driver cannot
> release the lock on the image until the final image has been composited
> and can be displayed instead.
>
> This may not be technically a deadlock, but it's still a stalemate.
> Unless I'm missing something fundamental about DMA fences and ww mutexes
> I don't see how you can get out of this situation.
This sounds like a case for implicit shared fences. ;-) Reading and scanning out would
only wait for the last 'exclusive' fence, not on each other.

But in drivers/drm I can encounter a similar issue, people expect to be able to
overwrite the contents of the currently displayed buffer, so I 'solved' it by not adding
a fence on the buffer, only by waiting for buffer idle before page flipping.
The rationale is that the buffer is pinned internally, and the backing storage cannot
go away until dma_buf_unmap_attachment is called. So when you render to the
current front buffer without queuing a page flip you get exactly what you expect. ;-)

> Explicit vs. implicit synchronization may also become more of an issue
> as buffers are imported from other sources (such as cameras).
Yeah, but the kernel space primitives would in both cases be the same, so drivers don't need to implement 2 separate fencing mechanisms for that. :-)

~Maarten

--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
  
Daniel Vetter July 7, 2014, 1:28 p.m. UTC | #10
On Mon, Jun 23, 2014 at 10:45 AM, Maarten Lankhorst
<maarten.lankhorst@canonical.com> wrote:
> But in drivers/drm I can encounter a similar issue, people expect to be able to
> overwrite the contents of the currently displayed buffer, so I 'solved' it by not adding
> a fence on the buffer, only by waiting for buffer idle before page flipping.
> The rationale is that the buffer is pinned internally, and the backing storage cannot
> go away until dma_buf_unmap_attachment is called. So when you render to the
> current front buffer without queuing a page flip you get exactly what you expect. ;-)

Yeah, scanout buffers are special and imo we should only uses fences
as barriers just around the flip, but _not_ to prevent frontbuffer in
general. Userspace is after all allowed to do that (e.g. with the dumb
bo ioctls). If we'd premanently lock scanout buffers that would indeed
result in hilarity all over the place, no surprises there. And all
current drivers with dynamic memory management solve this through
pinning, but not by restricting write access at all.

So I think we can shrug this scenario off as a non-issue of the "don't
do this" kind ;-) Thierry, is there anything else you've stumbled over
in the tegra k1 enabling work?

I still get the impression that there's an awful lot of
misunderstandings between the explicit and implicit syncing camps and
that we need to do a lot more talking for a better understanding ...

Cheers, Daniel
  

Patch

diff --git a/drivers/staging/android/Kconfig b/drivers/staging/android/Kconfig
index 99e484f845f2..51607e9aa049 100644
--- a/drivers/staging/android/Kconfig
+++ b/drivers/staging/android/Kconfig
@@ -88,6 +88,7 @@  config SYNC
 	bool "Synchronization framework"
 	default n
 	select ANON_INODES
+	select DMA_SHARED_BUFFER
 	---help---
 	  This option enables the framework for synchronization between multiple
 	  drivers.  Sync implementations can take advantage of hardware
diff --git a/drivers/staging/android/Makefile b/drivers/staging/android/Makefile
index 0a01e1914905..517ad5ffa429 100644
--- a/drivers/staging/android/Makefile
+++ b/drivers/staging/android/Makefile
@@ -9,5 +9,5 @@  obj-$(CONFIG_ANDROID_TIMED_OUTPUT)	+= timed_output.o
 obj-$(CONFIG_ANDROID_TIMED_GPIO)	+= timed_gpio.o
 obj-$(CONFIG_ANDROID_LOW_MEMORY_KILLER)	+= lowmemorykiller.o
 obj-$(CONFIG_ANDROID_INTF_ALARM_DEV)	+= alarm-dev.o
-obj-$(CONFIG_SYNC)			+= sync.o
+obj-$(CONFIG_SYNC)			+= sync.o sync_debug.o
 obj-$(CONFIG_SW_SYNC)			+= sw_sync.o
diff --git a/drivers/staging/android/sw_sync.c b/drivers/staging/android/sw_sync.c
index 12a136ec1cec..a76db3ff87cb 100644
--- a/drivers/staging/android/sw_sync.c
+++ b/drivers/staging/android/sw_sync.c
@@ -50,7 +50,7 @@  static struct sync_pt *sw_sync_pt_dup(struct sync_pt *sync_pt)
 {
 	struct sw_sync_pt *pt = (struct sw_sync_pt *) sync_pt;
 	struct sw_sync_timeline *obj =
-		(struct sw_sync_timeline *)sync_pt->parent;
+		(struct sw_sync_timeline *)sync_pt_parent(sync_pt);
 
 	return (struct sync_pt *) sw_sync_pt_create(obj, pt->value);
 }
@@ -59,7 +59,7 @@  static int sw_sync_pt_has_signaled(struct sync_pt *sync_pt)
 {
 	struct sw_sync_pt *pt = (struct sw_sync_pt *)sync_pt;
 	struct sw_sync_timeline *obj =
-		(struct sw_sync_timeline *)sync_pt->parent;
+		(struct sw_sync_timeline *)sync_pt_parent(sync_pt);
 
 	return sw_sync_cmp(obj->value, pt->value) >= 0;
 }
@@ -97,7 +97,6 @@  static void sw_sync_pt_value_str(struct sync_pt *sync_pt,
 				       char *str, int size)
 {
 	struct sw_sync_pt *pt = (struct sw_sync_pt *)sync_pt;
-
 	snprintf(str, size, "%d", pt->value);
 }
 
@@ -157,7 +156,6 @@  static int sw_sync_open(struct inode *inode, struct file *file)
 static int sw_sync_release(struct inode *inode, struct file *file)
 {
 	struct sw_sync_timeline *obj = file->private_data;
-
 	sync_timeline_destroy(&obj->obj);
 	return 0;
 }
diff --git a/drivers/staging/android/sync.c b/drivers/staging/android/sync.c
index 18174f7c871c..70b09b5001ba 100644
--- a/drivers/staging/android/sync.c
+++ b/drivers/staging/android/sync.c
@@ -31,22 +31,13 @@ 
 #define CREATE_TRACE_POINTS
 #include "trace/sync.h"
 
-static void sync_fence_signal_pt(struct sync_pt *pt);
-static int _sync_pt_has_signaled(struct sync_pt *pt);
-static void sync_fence_free(struct kref *kref);
-static void sync_dump(void);
-
-static LIST_HEAD(sync_timeline_list_head);
-static DEFINE_SPINLOCK(sync_timeline_list_lock);
-
-static LIST_HEAD(sync_fence_list_head);
-static DEFINE_SPINLOCK(sync_fence_list_lock);
+static const struct fence_ops android_fence_ops;
+static const struct file_operations sync_fence_fops;
 
 struct sync_timeline *sync_timeline_create(const struct sync_timeline_ops *ops,
 					   int size, const char *name)
 {
 	struct sync_timeline *obj;
-	unsigned long flags;
 
 	if (size < sizeof(struct sync_timeline))
 		return NULL;
@@ -57,17 +48,14 @@  struct sync_timeline *sync_timeline_create(const struct sync_timeline_ops *ops,
 
 	kref_init(&obj->kref);
 	obj->ops = ops;
+	obj->context = fence_context_alloc(1);
 	strlcpy(obj->name, name, sizeof(obj->name));
 
 	INIT_LIST_HEAD(&obj->child_list_head);
-	spin_lock_init(&obj->child_list_lock);
-
 	INIT_LIST_HEAD(&obj->active_list_head);
-	spin_lock_init(&obj->active_list_lock);
+	spin_lock_init(&obj->child_list_lock);
 
-	spin_lock_irqsave(&sync_timeline_list_lock, flags);
-	list_add_tail(&obj->sync_timeline_list, &sync_timeline_list_head);
-	spin_unlock_irqrestore(&sync_timeline_list_lock, flags);
+	sync_timeline_debug_add(obj);
 
 	return obj;
 }
@@ -77,11 +65,8 @@  static void sync_timeline_free(struct kref *kref)
 {
 	struct sync_timeline *obj =
 		container_of(kref, struct sync_timeline, kref);
-	unsigned long flags;
 
-	spin_lock_irqsave(&sync_timeline_list_lock, flags);
-	list_del(&obj->sync_timeline_list);
-	spin_unlock_irqrestore(&sync_timeline_list_lock, flags);
+	sync_timeline_debug_remove(obj);
 
 	if (obj->ops->release_obj)
 		obj->ops->release_obj(obj);
@@ -89,6 +74,16 @@  static void sync_timeline_free(struct kref *kref)
 	kfree(obj);
 }
 
+static void sync_timeline_get(struct sync_timeline *obj)
+{
+	kref_get(&obj->kref);
+}
+
+static void sync_timeline_put(struct sync_timeline *obj)
+{
+	kref_put(&obj->kref, sync_timeline_free);
+}
+
 void sync_timeline_destroy(struct sync_timeline *obj)
 {
 	obj->destroyed = true;
@@ -102,75 +97,33 @@  void sync_timeline_destroy(struct sync_timeline *obj)
 	 * signal any children that their parent is going away.
 	 */
 	sync_timeline_signal(obj);
-
-	kref_put(&obj->kref, sync_timeline_free);
+	sync_timeline_put(obj);
 }
 EXPORT_SYMBOL(sync_timeline_destroy);
 
-static void sync_timeline_add_pt(struct sync_timeline *obj, struct sync_pt *pt)
-{
-	unsigned long flags;
-
-	pt->parent = obj;
-
-	spin_lock_irqsave(&obj->child_list_lock, flags);
-	list_add_tail(&pt->child_list, &obj->child_list_head);
-	spin_unlock_irqrestore(&obj->child_list_lock, flags);
-}
-
-static void sync_timeline_remove_pt(struct sync_pt *pt)
-{
-	struct sync_timeline *obj = pt->parent;
-	unsigned long flags;
-
-	spin_lock_irqsave(&obj->active_list_lock, flags);
-	if (!list_empty(&pt->active_list))
-		list_del_init(&pt->active_list);
-	spin_unlock_irqrestore(&obj->active_list_lock, flags);
-
-	spin_lock_irqsave(&obj->child_list_lock, flags);
-	if (!list_empty(&pt->child_list))
-		list_del_init(&pt->child_list);
-
-	spin_unlock_irqrestore(&obj->child_list_lock, flags);
-}
-
 void sync_timeline_signal(struct sync_timeline *obj)
 {
 	unsigned long flags;
 	LIST_HEAD(signaled_pts);
-	struct list_head *pos, *n;
+	struct sync_pt *pt, *next;
 
 	trace_sync_timeline(obj);
 
-	spin_lock_irqsave(&obj->active_list_lock, flags);
-
-	list_for_each_safe(pos, n, &obj->active_list_head) {
-		struct sync_pt *pt =
-			container_of(pos, struct sync_pt, active_list);
+	spin_lock_irqsave(&obj->child_list_lock, flags);
 
-		if (_sync_pt_has_signaled(pt)) {
-			list_del_init(pos);
-			list_add(&pt->signaled_list, &signaled_pts);
-			kref_get(&pt->fence->kref);
-		}
+	list_for_each_entry_safe(pt, next, &obj->active_list_head,
+				 active_list) {
+		if (__fence_is_signaled(&pt->base))
+			list_del(&pt->active_list);
 	}
 
-	spin_unlock_irqrestore(&obj->active_list_lock, flags);
-
-	list_for_each_safe(pos, n, &signaled_pts) {
-		struct sync_pt *pt =
-			container_of(pos, struct sync_pt, signaled_list);
-
-		list_del_init(pos);
-		sync_fence_signal_pt(pt);
-		kref_put(&pt->fence->kref, sync_fence_free);
-	}
+	spin_unlock_irqrestore(&obj->child_list_lock, flags);
 }
 EXPORT_SYMBOL(sync_timeline_signal);
 
-struct sync_pt *sync_pt_create(struct sync_timeline *parent, int size)
+struct sync_pt *sync_pt_create(struct sync_timeline *obj, int size)
 {
+	unsigned long flags;
 	struct sync_pt *pt;
 
 	if (size < sizeof(struct sync_pt))
@@ -180,87 +133,28 @@  struct sync_pt *sync_pt_create(struct sync_timeline *parent, int size)
 	if (pt == NULL)
 		return NULL;
 
+	spin_lock_irqsave(&obj->child_list_lock, flags);
+	sync_timeline_get(obj);
+	__fence_init(&pt->base, &android_fence_ops, &obj->child_list_lock,
+		     obj->context, ++obj->value);
+	list_add_tail(&pt->child_list, &obj->child_list_head);
 	INIT_LIST_HEAD(&pt->active_list);
-	kref_get(&parent->kref);
-	sync_timeline_add_pt(parent, pt);
-
+	spin_unlock_irqrestore(&obj->child_list_lock, flags);
 	return pt;
 }
 EXPORT_SYMBOL(sync_pt_create);
 
 void sync_pt_free(struct sync_pt *pt)
 {
-	if (pt->parent->ops->free_pt)
-		pt->parent->ops->free_pt(pt);
-
-	sync_timeline_remove_pt(pt);
-
-	kref_put(&pt->parent->kref, sync_timeline_free);
-
-	kfree(pt);
+	fence_put(&pt->base);
 }
 EXPORT_SYMBOL(sync_pt_free);
 
-/* call with pt->parent->active_list_lock held */
-static int _sync_pt_has_signaled(struct sync_pt *pt)
-{
-	int old_status = pt->status;
-
-	if (!pt->status)
-		pt->status = pt->parent->ops->has_signaled(pt);
-
-	if (!pt->status && pt->parent->destroyed)
-		pt->status = -ENOENT;
-
-	if (pt->status != old_status)
-		pt->timestamp = ktime_get();
-
-	return pt->status;
-}
-
-static struct sync_pt *sync_pt_dup(struct sync_pt *pt)
-{
-	return pt->parent->ops->dup(pt);
-}
-
-/* Adds a sync pt to the active queue.  Called when added to a fence */
-static void sync_pt_activate(struct sync_pt *pt)
-{
-	struct sync_timeline *obj = pt->parent;
-	unsigned long flags;
-	int err;
-
-	spin_lock_irqsave(&obj->active_list_lock, flags);
-
-	err = _sync_pt_has_signaled(pt);
-	if (err != 0)
-		goto out;
-
-	list_add_tail(&pt->active_list, &obj->active_list_head);
-
-out:
-	spin_unlock_irqrestore(&obj->active_list_lock, flags);
-}
-
-static int sync_fence_release(struct inode *inode, struct file *file);
-static unsigned int sync_fence_poll(struct file *file, poll_table *wait);
-static long sync_fence_ioctl(struct file *file, unsigned int cmd,
-			     unsigned long arg);
-
-
-static const struct file_operations sync_fence_fops = {
-	.release = sync_fence_release,
-	.poll = sync_fence_poll,
-	.unlocked_ioctl = sync_fence_ioctl,
-	.compat_ioctl = sync_fence_ioctl,
-};
-
-static struct sync_fence *sync_fence_alloc(const char *name)
+static struct sync_fence *sync_fence_alloc(int size, const char *name)
 {
 	struct sync_fence *fence;
-	unsigned long flags;
 
-	fence = kzalloc(sizeof(struct sync_fence), GFP_KERNEL);
+	fence = kzalloc(size, GFP_KERNEL);
 	if (fence == NULL)
 		return NULL;
 
@@ -272,16 +166,8 @@  static struct sync_fence *sync_fence_alloc(const char *name)
 	kref_init(&fence->kref);
 	strlcpy(fence->name, name, sizeof(fence->name));
 
-	INIT_LIST_HEAD(&fence->pt_list_head);
-	INIT_LIST_HEAD(&fence->waiter_list_head);
-	spin_lock_init(&fence->waiter_list_lock);
-
 	init_waitqueue_head(&fence->wq);
 
-	spin_lock_irqsave(&sync_fence_list_lock, flags);
-	list_add_tail(&fence->sync_fence_list, &sync_fence_list_head);
-	spin_unlock_irqrestore(&sync_fence_list_lock, flags);
-
 	return fence;
 
 err:
@@ -289,120 +175,42 @@  err:
 	return NULL;
 }
 
-/* TODO: implement a create which takes more that one sync_pt */
-struct sync_fence *sync_fence_create(const char *name, struct sync_pt *pt)
+static void fence_check_cb_func(struct fence *f, struct fence_cb *cb)
 {
+	struct sync_fence_cb *check;
 	struct sync_fence *fence;
 
-	if (pt->fence)
-		return NULL;
-
-	fence = sync_fence_alloc(name);
-	if (fence == NULL)
-		return NULL;
-
-	pt->fence = fence;
-	list_add(&pt->pt_list, &fence->pt_list_head);
-	sync_pt_activate(pt);
-
-	/*
-	 * signal the fence in case pt was activated before
-	 * sync_pt_activate(pt) was called
-	 */
-	sync_fence_signal_pt(pt);
+	check = container_of(cb, struct sync_fence_cb, cb);
+	fence = check->fence;
 
-	return fence;
+	if (atomic_dec_and_test(&fence->status))
+		wake_up_all(&fence->wq);
 }
-EXPORT_SYMBOL(sync_fence_create);
-
-static int sync_fence_copy_pts(struct sync_fence *dst, struct sync_fence *src)
-{
-	struct list_head *pos;
-
-	list_for_each(pos, &src->pt_list_head) {
-		struct sync_pt *orig_pt =
-			container_of(pos, struct sync_pt, pt_list);
-		struct sync_pt *new_pt = sync_pt_dup(orig_pt);
 
-		if (new_pt == NULL)
-			return -ENOMEM;
-
-		new_pt->fence = dst;
-		list_add(&new_pt->pt_list, &dst->pt_list_head);
-	}
-
-	return 0;
-}
-
-static int sync_fence_merge_pts(struct sync_fence *dst, struct sync_fence *src)
-{
-	struct list_head *src_pos, *dst_pos, *n;
-
-	list_for_each(src_pos, &src->pt_list_head) {
-		struct sync_pt *src_pt =
-			container_of(src_pos, struct sync_pt, pt_list);
-		bool collapsed = false;
-
-		list_for_each_safe(dst_pos, n, &dst->pt_list_head) {
-			struct sync_pt *dst_pt =
-				container_of(dst_pos, struct sync_pt, pt_list);
-			/* collapse two sync_pts on the same timeline
-			 * to a single sync_pt that will signal at
-			 * the later of the two
-			 */
-			if (dst_pt->parent == src_pt->parent) {
-				if (dst_pt->parent->ops->compare(dst_pt, src_pt)
-						 == -1) {
-					struct sync_pt *new_pt =
-						sync_pt_dup(src_pt);
-					if (new_pt == NULL)
-						return -ENOMEM;
-
-					new_pt->fence = dst;
-					list_replace(&dst_pt->pt_list,
-						     &new_pt->pt_list);
-					sync_pt_free(dst_pt);
-				}
-				collapsed = true;
-				break;
-			}
-		}
-
-		if (!collapsed) {
-			struct sync_pt *new_pt = sync_pt_dup(src_pt);
-
-			if (new_pt == NULL)
-				return -ENOMEM;
-
-			new_pt->fence = dst;
-			list_add(&new_pt->pt_list, &dst->pt_list_head);
-		}
-	}
-
-	return 0;
-}
-
-static void sync_fence_detach_pts(struct sync_fence *fence)
+/* TODO: implement a create which takes more that one sync_pt */
+struct sync_fence *sync_fence_create(const char *name, struct sync_pt *pt)
 {
-	struct list_head *pos, *n;
+	struct sync_fence *fence;
 
-	list_for_each_safe(pos, n, &fence->pt_list_head) {
-		struct sync_pt *pt = container_of(pos, struct sync_pt, pt_list);
+	fence = sync_fence_alloc(offsetof(struct sync_fence, cbs[1]), name);
+	if (fence == NULL)
+		return NULL;
 
-		sync_timeline_remove_pt(pt);
-	}
-}
+	fence->num_fences = 1;
+	atomic_set(&fence->status, 1);
 
-static void sync_fence_free_pts(struct sync_fence *fence)
-{
-	struct list_head *pos, *n;
+	fence_get(&pt->base);
+	fence->cbs[0].sync_pt = &pt->base;
+	fence->cbs[0].fence = fence;
+	if (fence_add_callback(&pt->base, &fence->cbs[0].cb,
+			       fence_check_cb_func))
+		atomic_dec(&fence->status);
 
-	list_for_each_safe(pos, n, &fence->pt_list_head) {
-		struct sync_pt *pt = container_of(pos, struct sync_pt, pt_list);
+	sync_fence_debug_add(fence);
 
-		sync_pt_free(pt);
-	}
+	return fence;
 }
+EXPORT_SYMBOL(sync_fence_create);
 
 struct sync_fence *sync_fence_fdget(int fd)
 {
@@ -434,197 +242,155 @@  void sync_fence_install(struct sync_fence *fence, int fd)
 }
 EXPORT_SYMBOL(sync_fence_install);
 
-static int sync_fence_get_status(struct sync_fence *fence)
+static void sync_fence_add_pt(struct sync_fence *fence,
+			      int *i, struct fence *pt)
 {
-	struct list_head *pos;
-	int status = 1;
-
-	list_for_each(pos, &fence->pt_list_head) {
-		struct sync_pt *pt = container_of(pos, struct sync_pt, pt_list);
-		int pt_status = pt->status;
-
-		if (pt_status < 0) {
-			status = pt_status;
-			break;
-		} else if (status == 1) {
-			status = pt_status;
-		}
-	}
+	fence->cbs[*i].sync_pt = pt;
+	fence->cbs[*i].fence = fence;
 
-	return status;
+	if (!fence_add_callback(pt, &fence->cbs[*i].cb, fence_check_cb_func)) {
+		fence_get(pt);
+		(*i)++;
+	}
 }
 
 struct sync_fence *sync_fence_merge(const char *name,
 				    struct sync_fence *a, struct sync_fence *b)
 {
+	int num_fences = a->num_fences + b->num_fences;
 	struct sync_fence *fence;
-	struct list_head *pos;
-	int err;
+	int i, i_a, i_b;
+	unsigned long size = offsetof(struct sync_fence, cbs[num_fences]);
 
-	fence = sync_fence_alloc(name);
+	fence = sync_fence_alloc(size, name);
 	if (fence == NULL)
 		return NULL;
 
-	err = sync_fence_copy_pts(fence, a);
-	if (err < 0)
-		goto err;
+	atomic_set(&fence->status, num_fences);
 
-	err = sync_fence_merge_pts(fence, b);
-	if (err < 0)
-		goto err;
+	/*
+	 * Assume sync_fence a and b are both ordered and have no
+	 * duplicates with the same context.
+	 *
+	 * If a sync_fence can only be created with sync_fence_merge
+	 * and sync_fence_create, this is a reasonable assumption.
+	 */
+	for (i = i_a = i_b = 0; i_a < a->num_fences && i_b < b->num_fences; ) {
+		struct fence *pt_a = a->cbs[i_a].sync_pt;
+		struct fence *pt_b = b->cbs[i_b].sync_pt;
+
+		if (pt_a->context < pt_b->context) {
+			sync_fence_add_pt(fence, &i, pt_a);
+
+			i_a++;
+		} else if (pt_a->context > pt_b->context) {
+			sync_fence_add_pt(fence, &i, pt_b);
 
-	list_for_each(pos, &fence->pt_list_head) {
-		struct sync_pt *pt =
-			container_of(pos, struct sync_pt, pt_list);
-		sync_pt_activate(pt);
+			i_b++;
+		} else {
+			if (pt_a->seqno - pt_b->seqno <= INT_MAX)
+				sync_fence_add_pt(fence, &i, pt_a);
+			else
+				sync_fence_add_pt(fence, &i, pt_b);
+
+			i_a++;
+			i_b++;
+		}
 	}
 
-	/*
-	 * signal the fence in case one of it's pts were activated before
-	 * they were activated
-	 */
-	sync_fence_signal_pt(list_first_entry(&fence->pt_list_head,
-					      struct sync_pt,
-					      pt_list));
+	for (; i_a < a->num_fences; i_a++)
+		sync_fence_add_pt(fence, &i, a->cbs[i_a].sync_pt);
 
+	for (; i_b < b->num_fences; i_b++)
+		sync_fence_add_pt(fence, &i, b->cbs[i_b].sync_pt);
+
+	if (num_fences > i)
+		atomic_sub(num_fences - i, &fence->status);
+	fence->num_fences = i;
+
+	sync_fence_debug_add(fence);
 	return fence;
-err:
-	sync_fence_free_pts(fence);
-	kfree(fence);
-	return NULL;
 }
 EXPORT_SYMBOL(sync_fence_merge);
 
-static void sync_fence_signal_pt(struct sync_pt *pt)
+int sync_fence_wake_up_wq(wait_queue_t *curr, unsigned mode,
+				 int wake_flags, void *key)
 {
-	LIST_HEAD(signaled_waiters);
-	struct sync_fence *fence = pt->fence;
-	struct list_head *pos;
-	struct list_head *n;
-	unsigned long flags;
-	int status;
-
-	status = sync_fence_get_status(fence);
-
-	spin_lock_irqsave(&fence->waiter_list_lock, flags);
-	/*
-	 * this should protect against two threads racing on the signaled
-	 * false -> true transition
-	 */
-	if (status && !fence->status) {
-		list_for_each_safe(pos, n, &fence->waiter_list_head)
-			list_move(pos, &signaled_waiters);
-
-		fence->status = status;
-	} else {
-		status = 0;
-	}
-	spin_unlock_irqrestore(&fence->waiter_list_lock, flags);
+	struct sync_fence_waiter *wait;
 
-	if (status) {
-		list_for_each_safe(pos, n, &signaled_waiters) {
-			struct sync_fence_waiter *waiter =
-				container_of(pos, struct sync_fence_waiter,
-					     waiter_list);
+	wait = container_of(curr, struct sync_fence_waiter, work);
+	list_del_init(&wait->work.task_list);
 
-			list_del(pos);
-			waiter->callback(fence, waiter);
-		}
-		wake_up(&fence->wq);
-	}
+	wait->callback(wait->work.private, wait);
+	return 1;
 }
 
 int sync_fence_wait_async(struct sync_fence *fence,
 			  struct sync_fence_waiter *waiter)
 {
+	int err = atomic_read(&fence->status);
 	unsigned long flags;
-	int err = 0;
 
-	spin_lock_irqsave(&fence->waiter_list_lock, flags);
+	if (err < 0)
+		return err;
 
-	if (fence->status) {
-		err = fence->status;
-		goto out;
-	}
+	if (!err)
+		return 1;
 
-	list_add_tail(&waiter->waiter_list, &fence->waiter_list_head);
-out:
-	spin_unlock_irqrestore(&fence->waiter_list_lock, flags);
+	init_waitqueue_func_entry(&waiter->work, sync_fence_wake_up_wq);
+	waiter->work.private = fence;
 
-	return err;
+	spin_lock_irqsave(&fence->wq.lock, flags);
+	err = atomic_read(&fence->status);
+	if (err > 0)
+		__add_wait_queue_tail(&fence->wq, &waiter->work);
+	spin_unlock_irqrestore(&fence->wq.lock, flags);
+
+	if (err < 0)
+		return err;
+
+	return !err;
 }
 EXPORT_SYMBOL(sync_fence_wait_async);
 
 int sync_fence_cancel_async(struct sync_fence *fence,
 			     struct sync_fence_waiter *waiter)
 {
-	struct list_head *pos;
-	struct list_head *n;
 	unsigned long flags;
-	int ret = -ENOENT;
+	int ret = 0;
 
-	spin_lock_irqsave(&fence->waiter_list_lock, flags);
-	/*
-	 * Make sure waiter is still in waiter_list because it is possible for
-	 * the waiter to be removed from the list while the callback is still
-	 * pending.
-	 */
-	list_for_each_safe(pos, n, &fence->waiter_list_head) {
-		struct sync_fence_waiter *list_waiter =
-			container_of(pos, struct sync_fence_waiter,
-				     waiter_list);
-		if (list_waiter == waiter) {
-			list_del(pos);
-			ret = 0;
-			break;
-		}
-	}
-	spin_unlock_irqrestore(&fence->waiter_list_lock, flags);
+	spin_lock_irqsave(&fence->wq.lock, flags);
+	if (!list_empty(&waiter->work.task_list))
+		list_del_init(&waiter->work.task_list);
+	else
+		ret = -ENOENT;
+	spin_unlock_irqrestore(&fence->wq.lock, flags);
 	return ret;
 }
 EXPORT_SYMBOL(sync_fence_cancel_async);
 
-static bool sync_fence_check(struct sync_fence *fence)
-{
-	/*
-	 * Make sure that reads to fence->status are ordered with the
-	 * wait queue event triggering
-	 */
-	smp_rmb();
-	return fence->status != 0;
-}
-
 int sync_fence_wait(struct sync_fence *fence, long timeout)
 {
-	int err = 0;
-	struct sync_pt *pt;
-
-	trace_sync_wait(fence, 1);
-	list_for_each_entry(pt, &fence->pt_list_head, pt_list)
-		trace_sync_pt(pt);
+	long ret;
+	int i;
 
-	if (timeout > 0) {
+	if (timeout < 0)
+		timeout = MAX_SCHEDULE_TIMEOUT;
+	else
 		timeout = msecs_to_jiffies(timeout);
-		err = wait_event_interruptible_timeout(fence->wq,
-						       sync_fence_check(fence),
-						       timeout);
-	} else if (timeout < 0) {
-		err = wait_event_interruptible(fence->wq,
-					       sync_fence_check(fence));
-	}
-	trace_sync_wait(fence, 0);
 
-	if (err < 0)
-		return err;
-
-	if (fence->status < 0) {
-		pr_info("fence error %d on [%p]\n", fence->status, fence);
-		sync_dump();
-		return fence->status;
-	}
+	trace_sync_wait(fence, 1);
+	for (i = 0; i < fence->num_fences; ++i)
+		trace_sync_pt(fence->cbs[i].sync_pt);
+	ret = wait_event_interruptible_timeout(fence->wq,
+					       atomic_read(&fence->status) <= 0,
+					       timeout);
+	trace_sync_wait(fence, 0);
 
-	if (fence->status == 0) {
-		if (timeout > 0) {
+	if (ret < 0)
+		return ret;
+	else if (ret == 0) {
+		if (timeout) {
 			pr_info("fence timeout on [%p] after %dms\n", fence,
 				jiffies_to_msecs(timeout));
 			sync_dump();
@@ -632,15 +398,136 @@  int sync_fence_wait(struct sync_fence *fence, long timeout)
 		return -ETIME;
 	}
 
-	return 0;
+	ret = atomic_read(&fence->status);
+	if (ret) {
+		pr_info("fence error %ld on [%p]\n", ret, fence);
+		sync_dump();
+	}
+	return ret;
 }
 EXPORT_SYMBOL(sync_fence_wait);
 
+static const char *android_fence_get_driver_name(struct fence *fence)
+{
+	struct sync_pt *pt = container_of(fence, struct sync_pt, base);
+	struct sync_timeline *parent = sync_pt_parent(pt);
+
+	return parent->ops->driver_name;
+}
+
+static const char *android_fence_get_timeline_name(struct fence *fence)
+{
+	struct sync_pt *pt = container_of(fence, struct sync_pt, base);
+	struct sync_timeline *parent = sync_pt_parent(pt);
+
+	return parent->name;
+}
+
+static void android_fence_release(struct fence *fence)
+{
+	struct sync_pt *pt = container_of(fence, struct sync_pt, base);
+	struct sync_timeline *parent = sync_pt_parent(pt);
+	unsigned long flags;
+
+	spin_lock_irqsave(fence->lock, flags);
+	list_del(&pt->child_list);
+	if (WARN_ON_ONCE(!list_empty(&pt->active_list)))
+		list_del(&pt->active_list);
+	spin_unlock_irqrestore(fence->lock, flags);
+
+	if (parent->ops->free_pt)
+		parent->ops->free_pt(pt);
+
+	sync_timeline_put(parent);
+	kfree(pt);
+}
+
+static bool android_fence_signaled(struct fence *fence)
+{
+	struct sync_pt *pt = container_of(fence, struct sync_pt, base);
+	struct sync_timeline *parent = sync_pt_parent(pt);
+	int ret;
+
+	ret = parent->ops->has_signaled(pt);
+	if (ret < 0)
+		fence->status = ret;
+	return ret;
+}
+
+static bool android_fence_enable_signaling(struct fence *fence)
+{
+	struct sync_pt *pt = container_of(fence, struct sync_pt, base);
+	struct sync_timeline *parent = sync_pt_parent(pt);
+
+	if (android_fence_signaled(fence))
+		return false;
+
+	list_add_tail(&pt->active_list, &parent->active_list_head);
+	return true;
+}
+
+static int android_fence_fill_driver_data(struct fence *fence,
+					  void *data, int size)
+{
+	struct sync_pt *pt = container_of(fence, struct sync_pt, base);
+	struct sync_timeline *parent = sync_pt_parent(pt);
+
+	if (!parent->ops->fill_driver_data)
+		return 0;
+	return parent->ops->fill_driver_data(pt, data, size);
+}
+
+static void android_fence_value_str(struct fence *fence,
+				    char *str, int size)
+{
+	struct sync_pt *pt = container_of(fence, struct sync_pt, base);
+	struct sync_timeline *parent = sync_pt_parent(pt);
+
+	if (!parent->ops->pt_value_str) {
+		if (size)
+			*str = 0;
+		return;
+	}
+	parent->ops->pt_value_str(pt, str, size);
+}
+
+static void android_fence_timeline_value_str(struct fence *fence,
+					     char *str, int size)
+{
+	struct sync_pt *pt = container_of(fence, struct sync_pt, base);
+	struct sync_timeline *parent = sync_pt_parent(pt);
+
+	if (!parent->ops->timeline_value_str) {
+		if (size)
+			*str = 0;
+		return;
+	}
+	parent->ops->timeline_value_str(parent, str, size);
+}
+
+static const struct fence_ops android_fence_ops = {
+	.get_driver_name = android_fence_get_driver_name,
+	.get_timeline_name = android_fence_get_timeline_name,
+	.enable_signaling = android_fence_enable_signaling,
+	.signaled = android_fence_signaled,
+	.wait = fence_default_wait,
+	.release = android_fence_release,
+	.fill_driver_data = android_fence_fill_driver_data,
+	.fence_value_str = android_fence_value_str,
+	.timeline_value_str = android_fence_timeline_value_str,
+};
+
 static void sync_fence_free(struct kref *kref)
 {
 	struct sync_fence *fence = container_of(kref, struct sync_fence, kref);
+	int i, status = atomic_read(&fence->status);
 
-	sync_fence_free_pts(fence);
+	for (i = 0; i < fence->num_fences; ++i) {
+		if (status)
+			fence_remove_callback(fence->cbs[i].sync_pt,
+					      &fence->cbs[i].cb);
+		fence_put(fence->cbs[i].sync_pt);
+	}
 
 	kfree(fence);
 }
@@ -648,44 +535,25 @@  static void sync_fence_free(struct kref *kref)
 static int sync_fence_release(struct inode *inode, struct file *file)
 {
 	struct sync_fence *fence = file->private_data;
-	unsigned long flags;
-
-	/*
-	 * We need to remove all ways to access this fence before droping
-	 * our ref.
-	 *
-	 * start with its membership in the global fence list
-	 */
-	spin_lock_irqsave(&sync_fence_list_lock, flags);
-	list_del(&fence->sync_fence_list);
-	spin_unlock_irqrestore(&sync_fence_list_lock, flags);
 
-	/*
-	 * remove its pts from their parents so that sync_timeline_signal()
-	 * can't reference the fence.
-	 */
-	sync_fence_detach_pts(fence);
+	sync_fence_debug_remove(fence);
 
 	kref_put(&fence->kref, sync_fence_free);
-
 	return 0;
 }
 
 static unsigned int sync_fence_poll(struct file *file, poll_table *wait)
 {
 	struct sync_fence *fence = file->private_data;
+	int status;
 
 	poll_wait(file, &fence->wq, wait);
 
-	/*
-	 * Make sure that reads to fence->status are ordered with the
-	 * wait queue event triggering
-	 */
-	smp_rmb();
+	status = atomic_read(&fence->status);
 
-	if (fence->status == 1)
+	if (!status)
 		return POLLIN;
-	else if (fence->status < 0)
+	else if (status < 0)
 		return POLLERR;
 	else
 		return 0;
@@ -750,7 +618,7 @@  err_put_fd:
 	return err;
 }
 
-static int sync_fill_pt_info(struct sync_pt *pt, void *data, int size)
+static int sync_fill_pt_info(struct fence *fence, void *data, int size)
 {
 	struct sync_pt_info *info = data;
 	int ret;
@@ -760,20 +628,24 @@  static int sync_fill_pt_info(struct sync_pt *pt, void *data, int size)
 
 	info->len = sizeof(struct sync_pt_info);
 
-	if (pt->parent->ops->fill_driver_data) {
-		ret = pt->parent->ops->fill_driver_data(pt, info->driver_data,
-							size - sizeof(*info));
+	if (fence->ops->fill_driver_data) {
+		ret = fence->ops->fill_driver_data(fence, info->driver_data,
+						   size - sizeof(*info));
 		if (ret < 0)
 			return ret;
 
 		info->len += ret;
 	}
 
-	strlcpy(info->obj_name, pt->parent->name, sizeof(info->obj_name));
-	strlcpy(info->driver_name, pt->parent->ops->driver_name,
+	strlcpy(info->obj_name, fence->ops->get_timeline_name(fence),
+		sizeof(info->obj_name));
+	strlcpy(info->driver_name, fence->ops->get_driver_name(fence),
 		sizeof(info->driver_name));
-	info->status = pt->status;
-	info->timestamp_ns = ktime_to_ns(pt->timestamp);
+	if (fence_is_signaled(fence))
+		info->status = fence->status >= 0 ? 1 : fence->status;
+	else
+		info->status = 0;
+	info->timestamp_ns = ktime_to_ns(fence->timestamp);
 
 	return info->len;
 }
@@ -782,10 +654,9 @@  static long sync_fence_ioctl_fence_info(struct sync_fence *fence,
 					unsigned long arg)
 {
 	struct sync_fence_info_data *data;
-	struct list_head *pos;
 	__u32 size;
 	__u32 len = 0;
-	int ret;
+	int ret, i;
 
 	if (copy_from_user(&size, (void __user *)arg, sizeof(size)))
 		return -EFAULT;
@@ -801,12 +672,14 @@  static long sync_fence_ioctl_fence_info(struct sync_fence *fence,
 		return -ENOMEM;
 
 	strlcpy(data->name, fence->name, sizeof(data->name));
-	data->status = fence->status;
+	data->status = atomic_read(&fence->status);
+	if (data->status >= 0)
+		data->status = !data->status;
+
 	len = sizeof(struct sync_fence_info_data);
 
-	list_for_each(pos, &fence->pt_list_head) {
-		struct sync_pt *pt =
-			container_of(pos, struct sync_pt, pt_list);
+	for (i = 0; i < fence->num_fences; ++i) {
+		struct fence *pt = fence->cbs[i].sync_pt;
 
 		ret = sync_fill_pt_info(pt, (u8 *)data + len, size - len);
 
@@ -833,7 +706,6 @@  static long sync_fence_ioctl(struct file *file, unsigned int cmd,
 			     unsigned long arg)
 {
 	struct sync_fence *fence = file->private_data;
-
 	switch (cmd) {
 	case SYNC_IOC_WAIT:
 		return sync_fence_ioctl_wait(fence, arg);
@@ -849,181 +721,10 @@  static long sync_fence_ioctl(struct file *file, unsigned int cmd,
 	}
 }
 
-#ifdef CONFIG_DEBUG_FS
-static const char *sync_status_str(int status)
-{
-	if (status > 0)
-		return "signaled";
-	else if (status == 0)
-		return "active";
-	else
-		return "error";
-}
-
-static void sync_print_pt(struct seq_file *s, struct sync_pt *pt, bool fence)
-{
-	int status = pt->status;
-
-	seq_printf(s, "  %s%spt %s",
-		   fence ? pt->parent->name : "",
-		   fence ? "_" : "",
-		   sync_status_str(status));
-	if (pt->status) {
-		struct timeval tv = ktime_to_timeval(pt->timestamp);
-
-		seq_printf(s, "@%ld.%06ld", tv.tv_sec, tv.tv_usec);
-	}
-
-	if (pt->parent->ops->timeline_value_str &&
-	    pt->parent->ops->pt_value_str) {
-		char value[64];
-
-		pt->parent->ops->pt_value_str(pt, value, sizeof(value));
-		seq_printf(s, ": %s", value);
-		if (fence) {
-			pt->parent->ops->timeline_value_str(pt->parent, value,
-						    sizeof(value));
-			seq_printf(s, " / %s", value);
-		}
-	} else if (pt->parent->ops->print_pt) {
-		seq_puts(s, ": ");
-		pt->parent->ops->print_pt(s, pt);
-	}
-
-	seq_puts(s, "\n");
-}
-
-static void sync_print_obj(struct seq_file *s, struct sync_timeline *obj)
-{
-	struct list_head *pos;
-	unsigned long flags;
-
-	seq_printf(s, "%s %s", obj->name, obj->ops->driver_name);
-
-	if (obj->ops->timeline_value_str) {
-		char value[64];
-
-		obj->ops->timeline_value_str(obj, value, sizeof(value));
-		seq_printf(s, ": %s", value);
-	} else if (obj->ops->print_obj) {
-		seq_puts(s, ": ");
-		obj->ops->print_obj(s, obj);
-	}
-
-	seq_puts(s, "\n");
-
-	spin_lock_irqsave(&obj->child_list_lock, flags);
-	list_for_each(pos, &obj->child_list_head) {
-		struct sync_pt *pt =
-			container_of(pos, struct sync_pt, child_list);
-		sync_print_pt(s, pt, false);
-	}
-	spin_unlock_irqrestore(&obj->child_list_lock, flags);
-}
-
-static void sync_print_fence(struct seq_file *s, struct sync_fence *fence)
-{
-	struct list_head *pos;
-	unsigned long flags;
-
-	seq_printf(s, "[%p] %s: %s\n", fence, fence->name,
-		   sync_status_str(fence->status));
-
-	list_for_each(pos, &fence->pt_list_head) {
-		struct sync_pt *pt =
-			container_of(pos, struct sync_pt, pt_list);
-		sync_print_pt(s, pt, true);
-	}
-
-	spin_lock_irqsave(&fence->waiter_list_lock, flags);
-	list_for_each(pos, &fence->waiter_list_head) {
-		struct sync_fence_waiter *waiter =
-			container_of(pos, struct sync_fence_waiter,
-				     waiter_list);
-
-		seq_printf(s, "waiter %pF\n", waiter->callback);
-	}
-	spin_unlock_irqrestore(&fence->waiter_list_lock, flags);
-}
-
-static int sync_debugfs_show(struct seq_file *s, void *unused)
-{
-	unsigned long flags;
-	struct list_head *pos;
-
-	seq_puts(s, "objs:\n--------------\n");
-
-	spin_lock_irqsave(&sync_timeline_list_lock, flags);
-	list_for_each(pos, &sync_timeline_list_head) {
-		struct sync_timeline *obj =
-			container_of(pos, struct sync_timeline,
-				     sync_timeline_list);
-
-		sync_print_obj(s, obj);
-		seq_puts(s, "\n");
-	}
-	spin_unlock_irqrestore(&sync_timeline_list_lock, flags);
-
-	seq_puts(s, "fences:\n--------------\n");
-
-	spin_lock_irqsave(&sync_fence_list_lock, flags);
-	list_for_each(pos, &sync_fence_list_head) {
-		struct sync_fence *fence =
-			container_of(pos, struct sync_fence, sync_fence_list);
-
-		sync_print_fence(s, fence);
-		seq_puts(s, "\n");
-	}
-	spin_unlock_irqrestore(&sync_fence_list_lock, flags);
-	return 0;
-}
-
-static int sync_debugfs_open(struct inode *inode, struct file *file)
-{
-	return single_open(file, sync_debugfs_show, inode->i_private);
-}
-
-static const struct file_operations sync_debugfs_fops = {
-	.open           = sync_debugfs_open,
-	.read           = seq_read,
-	.llseek         = seq_lseek,
-	.release        = single_release,
+static const struct file_operations sync_fence_fops = {
+	.release = sync_fence_release,
+	.poll = sync_fence_poll,
+	.unlocked_ioctl = sync_fence_ioctl,
+	.compat_ioctl = sync_fence_ioctl,
 };
 
-static __init int sync_debugfs_init(void)
-{
-	debugfs_create_file("sync", S_IRUGO, NULL, NULL, &sync_debugfs_fops);
-	return 0;
-}
-late_initcall(sync_debugfs_init);
-
-#define DUMP_CHUNK 256
-static char sync_dump_buf[64 * 1024];
-static void sync_dump(void)
-{
-	struct seq_file s = {
-		.buf = sync_dump_buf,
-		.size = sizeof(sync_dump_buf) - 1,
-	};
-	int i;
-
-	sync_debugfs_show(&s, NULL);
-
-	for (i = 0; i < s.count; i += DUMP_CHUNK) {
-		if ((s.count - i) > DUMP_CHUNK) {
-			char c = s.buf[i + DUMP_CHUNK];
-
-			s.buf[i + DUMP_CHUNK] = 0;
-			pr_cont("%s", s.buf + i);
-			s.buf[i + DUMP_CHUNK] = c;
-		} else {
-			s.buf[s.count] = 0;
-			pr_cont("%s", s.buf + i);
-		}
-	}
-}
-#else
-static void sync_dump(void)
-{
-}
-#endif
diff --git a/drivers/staging/android/sync.h b/drivers/staging/android/sync.h
index eaf57cccf626..66b0f431f63e 100644
--- a/drivers/staging/android/sync.h
+++ b/drivers/staging/android/sync.h
@@ -19,6 +19,7 @@ 
 #include <linux/list.h>
 #include <linux/spinlock.h>
 #include <linux/wait.h>
+#include <linux/fence.h>
 
 #include "uapi/sync.h"
 
@@ -40,8 +41,6 @@  struct sync_fence;
  *			 -1 if a will signal before b
  * @free_pt:		called before sync_pt is freed
  * @release_obj:	called before sync_timeline is freed
- * @print_obj:		deprecated
- * @print_pt:		deprecated
  * @fill_driver_data:	write implementation specific driver data to data.
  *			  should return an error if there is not enough room
  *			  as specified by size.  This information is returned
@@ -67,13 +66,6 @@  struct sync_timeline_ops {
 	/* optional */
 	void (*release_obj)(struct sync_timeline *sync_timeline);
 
-	/* deprecated */
-	void (*print_obj)(struct seq_file *s,
-			  struct sync_timeline *sync_timeline);
-
-	/* deprecated */
-	void (*print_pt)(struct seq_file *s, struct sync_pt *sync_pt);
-
 	/* optional */
 	int (*fill_driver_data)(struct sync_pt *syncpt, void *data, int size);
 
@@ -104,19 +96,21 @@  struct sync_timeline {
 
 	/* protected by child_list_lock */
 	bool			destroyed;
+	int			context, value;
 
 	struct list_head	child_list_head;
 	spinlock_t		child_list_lock;
 
 	struct list_head	active_list_head;
-	spinlock_t		active_list_lock;
 
+#ifdef CONFIG_DEBUG_FS
 	struct list_head	sync_timeline_list;
+#endif
 };
 
 /**
  * struct sync_pt - sync point
- * @parent:		sync_timeline to which this sync_pt belongs
+ * @fence:		base fence class
  * @child_list:		membership in sync_timeline.child_list_head
  * @active_list:	membership in sync_timeline.active_list_head
  * @signaled_list:	membership in temporary signaled_list on stack
@@ -127,19 +121,22 @@  struct sync_timeline {
  *			  signaled or error.
  */
 struct sync_pt {
-	struct sync_timeline		*parent;
-	struct list_head	child_list;
+	struct fence base;
 
+	struct list_head	child_list;
 	struct list_head	active_list;
-	struct list_head	signaled_list;
-
-	struct sync_fence	*fence;
-	struct list_head	pt_list;
+};
 
-	/* protected by parent->active_list_lock */
-	int			status;
+static inline struct sync_timeline *sync_pt_parent(struct sync_pt *pt)
+{
+	return container_of(pt->base.lock, struct sync_timeline,
+			    child_list_lock);
+}
 
-	ktime_t			timestamp;
+struct sync_fence_cb {
+	struct fence_cb cb;
+	struct fence *sync_pt;
+	struct sync_fence *fence;
 };
 
 /**
@@ -149,9 +146,7 @@  struct sync_pt {
  * @name:		name of sync_fence.  Useful for debugging
  * @pt_list_head:	list of sync_pts in the fence.  immutable once fence
  *			  is created
- * @waiter_list_head:	list of asynchronous waiters on this fence
- * @waiter_list_lock:	lock protecting @waiter_list_head and @status
- * @status:		1: signaled, 0:active, <0: error
+ * @status:		0: signaled, >0:active, <0: error
  *
  * @wq:			wait queue for fence signaling
  * @sync_fence_list:	membership in global fence list
@@ -160,17 +155,15 @@  struct sync_fence {
 	struct file		*file;
 	struct kref		kref;
 	char			name[32];
-
-	/* this list is immutable once the fence is created */
-	struct list_head	pt_list_head;
-
-	struct list_head	waiter_list_head;
-	spinlock_t		waiter_list_lock; /* also protects status */
-	int			status;
+#ifdef CONFIG_DEBUG_FS
+	struct list_head	sync_fence_list;
+#endif
+	int num_fences;
 
 	wait_queue_head_t	wq;
+	atomic_t		status;
 
-	struct list_head	sync_fence_list;
+	struct sync_fence_cb	cbs[];
 };
 
 struct sync_fence_waiter;
@@ -184,14 +177,14 @@  typedef void (*sync_callback_t)(struct sync_fence *fence,
  * @callback_data:	pointer to pass to @callback
  */
 struct sync_fence_waiter {
-	struct list_head	waiter_list;
-
-	sync_callback_t		callback;
+	wait_queue_t work;
+	sync_callback_t callback;
 };
 
 static inline void sync_fence_waiter_init(struct sync_fence_waiter *waiter,
 					  sync_callback_t callback)
 {
+	INIT_LIST_HEAD(&waiter->work.task_list);
 	waiter->callback = callback;
 }
 
@@ -341,4 +334,22 @@  int sync_fence_cancel_async(struct sync_fence *fence,
  */
 int sync_fence_wait(struct sync_fence *fence, long timeout);
 
+#ifdef CONFIG_DEBUG_FS
+
+extern void sync_timeline_debug_add(struct sync_timeline *obj);
+extern void sync_timeline_debug_remove(struct sync_timeline *obj);
+extern void sync_fence_debug_add(struct sync_fence *fence);
+extern void sync_fence_debug_remove(struct sync_fence *fence);
+extern void sync_dump(void);
+
+#else
+# define sync_timeline_debug_add(obj)
+# define sync_timeline_debug_remove(obj)
+# define sync_fence_debug_add(fence)
+# define sync_fence_debug_remove(fence)
+# define sync_dump()
+#endif
+int sync_fence_wake_up_wq(wait_queue_t *curr, unsigned mode,
+				 int wake_flags, void *key);
+
 #endif /* _LINUX_SYNC_H */
diff --git a/drivers/staging/android/sync_debug.c b/drivers/staging/android/sync_debug.c
new file mode 100644
index 000000000000..00efaf9d8aa2
--- /dev/null
+++ b/drivers/staging/android/sync_debug.c
@@ -0,0 +1,247 @@ 
+/*
+ * drivers/base/sync.c
+ *
+ * Copyright (C) 2012 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/debugfs.h>
+#include <linux/export.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/kernel.h>
+#include <linux/poll.h>
+#include <linux/sched.h>
+#include <linux/seq_file.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/anon_inodes.h>
+#include "sync.h"
+
+#ifdef CONFIG_DEBUG_FS
+
+static LIST_HEAD(sync_timeline_list_head);
+static DEFINE_SPINLOCK(sync_timeline_list_lock);
+static LIST_HEAD(sync_fence_list_head);
+static DEFINE_SPINLOCK(sync_fence_list_lock);
+
+void sync_timeline_debug_add(struct sync_timeline *obj)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&sync_timeline_list_lock, flags);
+	list_add_tail(&obj->sync_timeline_list, &sync_timeline_list_head);
+	spin_unlock_irqrestore(&sync_timeline_list_lock, flags);
+}
+
+void sync_timeline_debug_remove(struct sync_timeline *obj)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&sync_timeline_list_lock, flags);
+	list_del(&obj->sync_timeline_list);
+	spin_unlock_irqrestore(&sync_timeline_list_lock, flags);
+}
+
+void sync_fence_debug_add(struct sync_fence *fence)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&sync_fence_list_lock, flags);
+	list_add_tail(&fence->sync_fence_list, &sync_fence_list_head);
+	spin_unlock_irqrestore(&sync_fence_list_lock, flags);
+}
+
+void sync_fence_debug_remove(struct sync_fence *fence)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&sync_fence_list_lock, flags);
+	list_del(&fence->sync_fence_list);
+	spin_unlock_irqrestore(&sync_fence_list_lock, flags);
+}
+
+static const char *sync_status_str(int status)
+{
+	if (status == 0)
+		return "signaled";
+	else if (status > 0)
+		return "active";
+	else
+		return "error";
+}
+
+static void sync_print_pt(struct seq_file *s, struct sync_pt *pt, bool fence)
+{
+	int status = 1;
+	struct sync_timeline *parent = sync_pt_parent(pt);
+
+	if (__fence_is_signaled(&pt->base))
+		status = pt->base.status;
+
+	seq_printf(s, "  %s%spt %s",
+		   fence ? parent->name : "",
+		   fence ? "_" : "",
+		   sync_status_str(status));
+
+	if (status <= 0) {
+		struct timeval tv = ktime_to_timeval(pt->base.timestamp);
+		seq_printf(s, "@%ld.%06ld", tv.tv_sec, tv.tv_usec);
+	}
+
+	if (parent->ops->timeline_value_str &&
+	    parent->ops->pt_value_str) {
+		char value[64];
+		parent->ops->pt_value_str(pt, value, sizeof(value));
+		seq_printf(s, ": %s", value);
+		if (fence) {
+			parent->ops->timeline_value_str(parent, value,
+						    sizeof(value));
+			seq_printf(s, " / %s", value);
+		}
+	}
+
+	seq_puts(s, "\n");
+}
+
+static void sync_print_obj(struct seq_file *s, struct sync_timeline *obj)
+{
+	struct list_head *pos;
+	unsigned long flags;
+
+	seq_printf(s, "%s %s", obj->name, obj->ops->driver_name);
+
+	if (obj->ops->timeline_value_str) {
+		char value[64];
+		obj->ops->timeline_value_str(obj, value, sizeof(value));
+		seq_printf(s, ": %s", value);
+	}
+
+	seq_puts(s, "\n");
+
+	spin_lock_irqsave(&obj->child_list_lock, flags);
+	list_for_each(pos, &obj->child_list_head) {
+		struct sync_pt *pt =
+			container_of(pos, struct sync_pt, child_list);
+		sync_print_pt(s, pt, false);
+	}
+	spin_unlock_irqrestore(&obj->child_list_lock, flags);
+}
+
+static void sync_print_fence(struct seq_file *s, struct sync_fence *fence)
+{
+	wait_queue_t *pos;
+	unsigned long flags;
+	int i;
+
+	seq_printf(s, "[%p] %s: %s\n", fence, fence->name,
+		   sync_status_str(atomic_read(&fence->status)));
+
+	for (i = 0; i < fence->num_fences; ++i) {
+		struct sync_pt *pt =
+			container_of(fence->cbs[i].sync_pt,
+				     struct sync_pt, base);
+
+		sync_print_pt(s, pt, true);
+	}
+
+	spin_lock_irqsave(&fence->wq.lock, flags);
+	list_for_each_entry(pos, &fence->wq.task_list, task_list) {
+		struct sync_fence_waiter *waiter;
+
+		if (pos->func != &sync_fence_wake_up_wq)
+			continue;
+
+		waiter = container_of(pos, struct sync_fence_waiter, work);
+
+		seq_printf(s, "waiter %pF\n", waiter->callback);
+	}
+	spin_unlock_irqrestore(&fence->wq.lock, flags);
+}
+
+static int sync_debugfs_show(struct seq_file *s, void *unused)
+{
+	unsigned long flags;
+	struct list_head *pos;
+
+	seq_puts(s, "objs:\n--------------\n");
+
+	spin_lock_irqsave(&sync_timeline_list_lock, flags);
+	list_for_each(pos, &sync_timeline_list_head) {
+		struct sync_timeline *obj =
+			container_of(pos, struct sync_timeline,
+				     sync_timeline_list);
+
+		sync_print_obj(s, obj);
+		seq_puts(s, "\n");
+	}
+	spin_unlock_irqrestore(&sync_timeline_list_lock, flags);
+
+	seq_puts(s, "fences:\n--------------\n");
+
+	spin_lock_irqsave(&sync_fence_list_lock, flags);
+	list_for_each(pos, &sync_fence_list_head) {
+		struct sync_fence *fence =
+			container_of(pos, struct sync_fence, sync_fence_list);
+
+		sync_print_fence(s, fence);
+		seq_puts(s, "\n");
+	}
+	spin_unlock_irqrestore(&sync_fence_list_lock, flags);
+	return 0;
+}
+
+static int sync_debugfs_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, sync_debugfs_show, inode->i_private);
+}
+
+static const struct file_operations sync_debugfs_fops = {
+	.open           = sync_debugfs_open,
+	.read           = seq_read,
+	.llseek         = seq_lseek,
+	.release        = single_release,
+};
+
+static __init int sync_debugfs_init(void)
+{
+	debugfs_create_file("sync", S_IRUGO, NULL, NULL, &sync_debugfs_fops);
+	return 0;
+}
+late_initcall(sync_debugfs_init);
+
+#define DUMP_CHUNK 256
+static char sync_dump_buf[64 * 1024];
+void sync_dump(void)
+{
+	struct seq_file s = {
+		.buf = sync_dump_buf,
+		.size = sizeof(sync_dump_buf) - 1,
+	};
+	int i;
+
+	sync_debugfs_show(&s, NULL);
+
+	for (i = 0; i < s.count; i += DUMP_CHUNK) {
+		if ((s.count - i) > DUMP_CHUNK) {
+			char c = s.buf[i + DUMP_CHUNK];
+			s.buf[i + DUMP_CHUNK] = 0;
+			pr_cont("%s", s.buf + i);
+			s.buf[i + DUMP_CHUNK] = c;
+		} else {
+			s.buf[s.count] = 0;
+			pr_cont("%s", s.buf + i);
+		}
+	}
+}
+
+#endif
diff --git a/drivers/staging/android/trace/sync.h b/drivers/staging/android/trace/sync.h
index 95462359ba57..77edb977a7bf 100644
--- a/drivers/staging/android/trace/sync.h
+++ b/drivers/staging/android/trace/sync.h
@@ -45,7 +45,7 @@  TRACE_EVENT(sync_wait,
 
 	TP_fast_assign(
 			__assign_str(name, fence->name);
-			__entry->status = fence->status;
+			__entry->status = atomic_read(&fence->status);
 			__entry->begin = begin;
 	),
 
@@ -54,19 +54,19 @@  TRACE_EVENT(sync_wait,
 );
 
 TRACE_EVENT(sync_pt,
-	TP_PROTO(struct sync_pt *pt),
+	TP_PROTO(struct fence *pt),
 
 	TP_ARGS(pt),
 
 	TP_STRUCT__entry(
-		__string(timeline, pt->parent->name)
+		__string(timeline, pt->ops->get_timeline_name(pt))
 		__array(char, value, 32)
 	),
 
 	TP_fast_assign(
-		__assign_str(timeline, pt->parent->name);
-		if (pt->parent->ops->pt_value_str) {
-			pt->parent->ops->pt_value_str(pt, __entry->value,
+		__assign_str(timeline, pt->ops->get_timeline_name(pt));
+		if (pt->ops->fence_value_str) {
+			pt->ops->fence_value_str(pt, __entry->value,
 							sizeof(__entry->value));
 		} else {
 			__entry->value[0] = '\0';