Keyword - dav1d

22 June 2020

dav1d 0.7.1

Release 0.7.1

We just published a small release of dav1d, 0.7.1, only one month after 0.7.0.

It is a quick release that fixes a couple of bugs and adds more optimizations for ARM32 and SSE2.

ARM 32-bit

After spending a lot of time on ARM64 during 0.5.0 and 0.7.0, we're now spending some time on the people who are stuck with older phones, still running on 32-bit platforms.

With these new optimizations, we're 28% faster than before when decoding the Chimera sample on a Snapdragon 835.

The result is that we're only 20%-25% slower in 32-bit compared to 64-bit, which is quite a feat.

Compared to gav1, we're now 2x-2.4x faster in 32-bit mode.
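
To keep these percentages straight: "X% faster" is not the same as "X% less time per frame". Here is a minimal Python sketch of the conversions, assuming the figures above refer to frame rate (the post does not say so explicitly). The function names are ours, for illustration only.

```python
def time_saved_from_speedup(percent_faster: float) -> float:
    """Fraction of decode time saved when throughput rises by percent_faster."""
    ratio = 1.0 + percent_faster / 100.0
    return 1.0 - 1.0 / ratio


def relative_speed_when_slower(percent_slower: float) -> float:
    """Throughput relative to the baseline when running percent_slower slower."""
    return 1.0 - percent_slower / 100.0


# 28% more frames per second means roughly 22% less time per frame.
print(f"{time_saved_from_speedup(28):.1%}")

# 20%-25% slower means 32-bit runs at 75%-80% of the 64-bit frame rate.
print(relative_speed_when_slower(25), relative_speed_when_slower(20))
```

Read this way, a 2x-2.4x speedup over another decoder means spending between half and about 42% of its decode time per frame.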

[Graph: dav1d vs gav1 on ARM32]

When comparing with numerous threading options, on a Galaxy S5, from 2014, we can see the following:

[Graph: dav1d vs gav1 on ARM32, by number of threads]

With dav1d 0.7.1, we're able to decode the AV1 Chimera 1080p sample at more than 24 fps on a Galaxy S5 from 2014, running 32-bit Android! Reaching 24 fps does not even use the full CPU!
Once again, we see that the gav1 library has issues with threading.
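
The real-time claim can be read as a per-frame time budget. Here is a small Python sketch of that arithmetic. The 30 fps peak rate is a made-up illustrative number, not a benchmark result from this post, and the function names are hypothetical.

```python
def frame_budget_ms(target_fps: float) -> float:
    """Time available to decode one frame at the target playback rate."""
    return 1000.0 / target_fps


def capacity_share_needed(target_fps: float, peak_decode_fps: float) -> float:
    """Fraction of the decoder's peak throughput needed to sustain target_fps."""
    if peak_decode_fps <= 0:
        raise ValueError("peak_decode_fps must be positive")
    return target_fps / peak_decode_fps


# Each frame must be decoded within this many milliseconds at 24 fps.
print(f"{frame_budget_ms(24):.1f} ms per frame")

# Hypothetical: a decoder that peaks at 30 fps needs 80% of its capacity.
print(f"{capacity_share_needed(24, 30):.0%}")
```

Any peak rate above the playback rate leaves headroom, which is what "does not even use the full CPU" means.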

Desktop

On the desktop, we did some SSE2 optimizations for people whose CPUs lack SSSE3; they should see quite a bump in decoding speed.

We also optimized the scaled mode in AVX2. (This mode is used only by bitstreams that use the spatial scalability feature.)
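
Decoders with hand-written assembly generally pick the fastest code path the CPU supports at run time, which is why SSE2-only machines benefit from SSE2-specific work. The sketch below is a simplified Python illustration of that idea, not dav1d's actual dispatch code. The tier list and the `pick_simd_tier` function are hypothetical.

```python
# Instruction-set tiers ordered from fastest to slowest; plain C is the fallback.
TIERS = ("avx2", "ssse3", "sse2")


def pick_simd_tier(cpu_flags: set) -> str:
    """Return the best supported tier, or "c" when no SIMD tier is available."""
    for tier in TIERS:
        if tier in cpu_flags:
            return tier
    return "c"


# An old CPU with SSE2 but no SSSE3 lands on the SSE2 path.
print(pick_simd_tier({"sse", "sse2"}))
```

On Linux, a flag set like this could be built from the `flags` line of `/proc/cpuinfo`.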

Conclusion

See you soon, for more speed improvements!

PS: thanks again to Nathan for the graphs.

21 May 2020

dav1d 0.7.0: mobile focus

tl;dr

New dav1d release:

  • 10% faster on Intel CPUs with 25% less RAM, assembly finished for 8-bit
  • ARM64 assembly mostly done for 10/12-bit in addition to 8-bit
  • dav1d is twice as fast as gav1 on ARM CPUs, and 4 times faster for 10-bit
  • 1080p AV1 decodable in real time with 2 little cores on a Pixel 1

29 October 2019

dav1d 0.5.1: more speed!

A few reminders about dav1d

If you follow this blog, you should know everything about dav1d.

The VideoLAN, VLC and FFmpeg communities have been working on a new AV1 decoder, dav1d, to be the best and fastest decoder.

0.5.1

2 weeks ago, we released dav1d 0.5.0.

With 0.5.0, we showed that we were between 3x and 5x faster than aomdec on desktop CPUs, including 32-bit CPUs, and between 2.5x and 3x faster on Android and iOS 64-bit phones.

We even showed we were a lot faster than the new gav1 decoder on Android 64bit.

However, there were 2 cases where dav1d was not the best:

  • desktops without SSSE3 support (i.e. very old CPUs), in single-threaded mode,
  • Android phones in 32-bit mode, in single-threaded mode.

0.5.1 is a small release focused on those cases.

0.5.1 is up to 50% faster on SSE2 CPUs, which should make dav1d faster than aomdec in all desktop cases, from plain C to AVX-2.

At the same time, 0.5.1 is up to 41% faster on ARMv7 CPUs, which makes dav1d at least as fast as gav1.

Of course, in multi-threaded mode, we were already faster :)

So, yes, dav1d is now faster than all the other decoders in all cases.

14 October 2019

dav1d 0.5.0 release: fastest!

tl;dr: dav1d is getting even faster

If you want a quick summary of this post, about our AV1 decoder:

  • dav1d is still ready for production, and getting used more and more,
  • dav1d has a speed gain of 12% on ARM64 mobile CPUs,
  • a gain of 22%-40% on SSSE3 processors
  • and another gain of 4-7% on AVX-2 processors, which was already quite fast.

Read the following for more details...

3 May 2019

dav1d 0.3.0 release: even faster!

tl;dr: another fast dav1d release

If you want a quick summary of this post, about our AV1 decoder:

  • dav1d is still ready for production, and getting used more,
  • dav1d has a speed gain of 12% on ARM64 mobile CPUs,
  • a gain of 15%-25% on SSSE3 processors
  • and even a 5% gain on AVX-2 processors, which was already quite fast.

Read the following for more details...

13 March 2019

dav1d shifts up a gear: 0.2 is out!

tl;dr: dav1d has its second release

If you want a quick summary of this post, about our AV1 decoder:

  • dav1d is really ready for production,
  • dav1d has impressive benchmarks on ARM devices,
  • dav1d is now fast on 32-bit desktop processors (SSSE3).

Read the following for more details...

A few reminders about dav1d

If you follow this blog, you should know everything about dav1d.

AV1 is a new video codec by the Alliance for Open Media, which is composed of most of the important Web companies (Google, Facebook, Netflix, Amazon, Microsoft, Mozilla...). AV1 has the potential to be up to 20% better than the HEVC codec, and its patent license is totally free, while HEVC patent licenses are insanely expensive and very confusing.

The VideoLAN, VLC and FFmpeg communities have started to work on a new decoder, sponsored by the Alliance for Open Media, in order to create the reference optimized decoder for AV1.

Second Release

Today, we release the second version of dav1d, called 0.2.1, Antelope.
You can now safely use the decoder on all platforms, with excellent performance.

For the first release, we showed impressive benchmarks for AVX-2 processors, with up to 5x speedups compared to the reference decoder.

In this release, the focus has been on ARM devices (32-bit and 64-bit) and on desktop processors that do not support AVX-2.

It is important to know that the ARM and SSSE3 optimizations are not finished yet. You should expect more performance in the future.

ARM devices

For the ARM devices, we've been doing both ARMv7 and ARMv8 acceleration. We've been testing on iOS, Windows and Android to be sure that it works fine on all OSes.

On ARMv8, we achieve between 150% and 200% of the speed of aomdec:

On ARMv7, we achieve up to 400% of the speed of aomdec on the Snapdragon 410:

It's interesting to see that dav1d is faster in ARMv7 mode than the reference decoder in ARMv8 mode on the same machine.

iOS and iPhones

The playback on iOS is quite important, since those are the fastest ARM devices, and quite widespread:

Depending on the sample, we achieve 1080p at 75 fps on the Summer sample, and 40 fps on more complex samples, like Chimera.

Desktop

For the desktop, we focused on SSSE3 optimizations, because they should cover 98% of the active desktop processors.

We also did optimizations for both 32-bit and 64-bit architectures, and not only for 64-bit, as we did for AVX-2.

In multi-thread scenarios, we show between 2x and 3x gains compared to aomdec:

Get it

You can get the tarball on our FTP: dav1d 0.2.1.

You can get the code and report issues on our gitlab project.

You can also join the project, or sponsor our work, by contacting me :)

Conclusion

dav1d 0.2 is now faster than aomdec on all 4 important architectures (x86/SSSE3, x64/AVX-2, ARMv7, ARMv8).

The speedups we see range from 2x to 5x, and on ARM devices, we are now approaching 1080p60 in software.

We're going to continue acceleration work on SSSE3 and ARM devices, in the next few releases.
