Audio over ethernet is a large space. Folk have tried to skin this CAT (groan) in many ways and on many network layers including layer-1 (custom MAC), layer-2 (raw ethernet frames), layer-3/4 (UDP/RTP/etc.).
For my purposes I'm mainly interested in a layer 2/3 implementation that can run on >= 400 MHz bare-metal with a cycle or two left over for happy-joy-joy DSP fun-times.
Smuggling audio frames over the Internet is not a main focus for me because there's already a most excellent solution to that problem which is easy to implement on even smollish microcontrollers.
So, of the ~50 or so implementations out there, for the purpose of this discussion, we can probably reduce it down to the two (and maaaaybe a 1/2) protocols that have gotten some commercial traction over the years.
1: Dante/Rednet (Layer 3, IP Packets)
Dante is 100% proprietary Australian tech spawned out of an undead Motorola research lab and suspiciously sensible government grants in the late noughties. It's owned by a company called Audinate, who have historically licensed it to high-end commercial audio gear manufacturers for uncomfortably large sums of money.
Until AVB started getting traction it was the standard for doing ethernet audio.
Implementations have historically been FPGA-based because easy license enforcement, guaranteed data rates, the market wasn't big enough to justify an ASIC and the margins were of the type where BOM cost never really features.
"Rednet" is the OEM version of Dante that Focusrite ships with their higher-end interfaces. I think they were the first "popular" 3rd-party manufacturer to license it.
Lately, Audinate have also started aggressively marketing their Dante "Embedded" Platform that's basically an i.MX 8M SBC running Linux and their protocol stack. Probably a response to AVB's growing success.
2: AVB (Layer 2, raw ethernet frames)
The "open" standard for ethernetted professional audio. Hell, you can even grab the sources on the GitHubs!
If you can afford an IEEE subscription the full stack is:
- IEEE 802.1AS: Timing and Synchronization for Time-Sensitive Applications (gPTP);
- IEEE 802.1Qav: Forwarding and Queuing for Time-Sensitive Streams (FQTSS);
- IEEE 802.1Qat: Stream Reservation Protocol (SRP);
- IEEE 802.1BA: Audio Video Bridging (AVB) Systems;
- IEEE 1722: Layer 2 Transport Protocol for Time Sensitive Applications (AV Transport Protocol, AVTP); and
- IEEE 1722.1: Device Discovery, Enumeration, Connection Management and Control Protocol (AVDECC).
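To make the IEEE 1722 layer a bit less abstract: an AVTP audio (AAF) stream packet is just a raw ethernet frame with EtherType 0x22F0 and a small header in front of the PCM payload. Here's a deliberately simplified sketch — the field layout below is abbreviated from memory of the 1722 AAF header and omits several fields, so treat it as illustrative and check the actual spec before trusting any of it:

```python
import struct

AVTP_ETHERTYPE = 0x22F0  # IEEE 1722 AVTP EtherType
AAF_SUBTYPE = 0x02       # AVTP Audio Format (AAF) subtype

def build_aaf_frame(dst_mac: bytes, src_mac: bytes, stream_id: int,
                    avtp_timestamp: int, seq: int, payload: bytes) -> bytes:
    """Build a (simplified!) AVTP AAF frame: ethernet header + a cut-down
    AVTP header + raw PCM payload. Several real header fields (format,
    channels-per-frame, stream_data_length, ...) are omitted here;
    consult IEEE 1722 for the bit-level definitions."""
    eth = dst_mac + src_mac + struct.pack("!H", AVTP_ETHERTYPE)
    sv_version = 0x80  # stream_id-valid bit set, version 0
    avtp = struct.pack("!BBBBQI",
                       AAF_SUBTYPE,      # subtype
                       sv_version,       # sv + version bits
                       seq & 0xFF,       # wrapping sequence number
                       0,                # reserved / timestamp-uncertain
                       stream_id,        # 64-bit stream ID
                       avtp_timestamp)   # presentation time (gPTP-derived)
    return eth + avtp + payload

# 8 silent 24-bit samples to a (MAAP-style multicast) destination address
frame = build_aaf_frame(b"\x91\xe0\xf0\x00\x00\x01", b"\x02" * 6,
                        stream_id=0x0002_0000_0000_0001,
                        avtp_timestamp=123456, seq=0,
                        payload=b"\x00\x00\x00" * 8)
```

The point being: there's no IP, no UDP, no ports — just a MAC header and a timestamped blob, which is exactly why the switches in between need to be AVB-aware.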
Aside: check out what the excellent folk at tweedegolf.nl have been up to: tweedegolf/statime !
In practical terms, however, adding AVB support to your project means adding an XMOS transputer (remember those?) to your BOM and a proprietary firmware license to your per-product cost.
The demo is free though and the devkit is not prohibitively expensive.
You also get USB Audio Class 2 support basically out of the box and their xC programming language is 100% pure awesome for when you need some of that late-80's/early-90's high performance retro-computing action in your life.
One thing about AVB is that, unlike Dante, you can't just run it over any old ethernet switch or NIC.
Specifically, your switch and/or NIC needs to support:
- IEEE 802.1Q: Stream Reservation Protocol (SRP) and Traffic Shaping (FQTSS)
- IEEE 802.1AS: Time Synchronization using Generalized Precision Time Protocol (gPTP) on each AVB enabled Ethernet port
For plugging into your desktop, you'll need something like an Intel i210 or just buy a Mac.
For the switch, unless you're made of money you'll still pay a premium and maybe wait for the supply chain crisis to subside.
2 1/2: AES67 (Layer 3/4)
Mainly of interest because there are long-standing promises going back over a decade now that it will allow folk to interoperate AVB/Dante/Ravenna/etc. gear all on the same network.
There may be some technical and probably more commercial lessons to be learnt from its ongoing lack of success in unifying the world and bringing us all to a state of distributed signals nirvana.
The market certainly seems to be heading in another direction.
Older network audio standards have explored both synchronous clocking (hijacking one twisted pair of the CAT 5 cable for the clock signal!) and asynchronous clock recovery.
For Dante and AVB clocking is isochronous, with clock recovery performed by comparing packet times against an IEEE 1588 PTP clock shared across the network.
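The PTP machinery underneath both protocols boils down to a two-way timestamp exchange: the master stamps when it sends a Sync (t1), the slave stamps its arrival (t2), the slave stamps its Delay_Req (t3), the master stamps that arrival (t4), and, assuming a symmetric path, offset and delay fall out of simple arithmetic. A toy sketch (ignoring all the filtering and servo work a real gPTP implementation does):

```python
def ptp_offset_and_delay(t1: float, t2: float, t3: float, t4: float):
    """Classic IEEE 1588 delay request-response math.

    t1: master's send time of Sync
    t2: slave's receive time of Sync
    t3: slave's send time of Delay_Req
    t4: master's receive time of Delay_Req
    Assumes the network path delay is symmetric."""
    offset = ((t2 - t1) - (t4 - t3)) / 2   # slave clock minus master clock
    delay = ((t2 - t1) + (t4 - t3)) / 2    # one-way path delay
    return offset, delay

# Slave clock runs 1.0 unit ahead of the master; one-way delay is 3.0 units.
offset, delay = ptp_offset_and_delay(t1=0.0, t2=4.0, t3=10.0, t4=12.0)
# offset == 1.0, delay == 3.0
```

Once the slave knows its offset it can place incoming packet timestamps on the shared timeline and discipline its local sample clock accordingly.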
- Dante:
>= 84 μs
- AVB:
<= 2ms
I think there's some kind of life-lesson here if you think about how the protocol designed by FPGA engineers specifies the latency bounds vs how the protocol designed mainly by network engineers does. :laugh:
The channel limit is defined by the speed of the Ethernet link.
For 100Mbps it's around 96 channels of 24-bit samples @ 48kHz.
i.e. usually more than you need but you can always go 1Gbps for those days the entire Berliner Philharmoniker shows up for a session.
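For the curious, the payload-only back-of-envelope arithmetic looks like this. Note it deliberately ignores ethernet/IP headers, padding of samples to 32-bit words, and (for AVB) reserved bandwidth, all of which change the real number — the efficiency factor is a pure hand-wave:

```python
def max_channels(link_bps: int, bits_per_sample: int = 24,
                 sample_rate: int = 48_000, efficiency: float = 1.0) -> int:
    """Raw payload-only channel count for a given link speed.

    Real deployments lose capacity to frame headers, word padding and
    bandwidth reservation, hence the fudge factor."""
    per_channel_bps = bits_per_sample * sample_rate  # 1.152 Mbps per channel
    return int(link_bps * efficiency // per_channel_bps)

print(max_channels(100_000_000))    # 86 channels, payload-only
print(max_channels(1_000_000_000))  # 868 — Berliner Philharmoniker territory
```

Quoted channel counts for real products land in the same ballpark but differ per protocol once framing overhead and packing choices are counted.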
So this is pretty much the status quo. There are two big professional standards for audio over ethernet. One proprietary, the other open.
They're not that different. They're both solving the same problems in many of the same ways.
From a usability point of view Dante is marginally easier to work with than AVB.
It's a bit less "plug&play" (Layer-3 means there are IP addresses etc. that need to be configured) but:
a) the configuration tools are a lot nicer (it's easier to write tools for protocols encapsulated in IP packets than raw ethernet frames)
b) once configured, you don't have to fight it every step of the way every time you reboot the studio. (oh yes, your auto-discovery protocols etc. are also all raw ethernet frames!)
There used to be a big price difference between the two solutions, but eventually they'll reach equilibrium at around the same price point.
Thing is... neither of them is any fun for any of us unless we're building a piece of gear that's going to retail for at least US$ 995 and has an economic model for shipping 10K units over the lifetime of the product.
It's also no fun the moment (and this is an entire blog post of its own) you want to send anything except audio from point A to B. Like control signals. Or OSC. Or DMX. (but maybe MIDI 1.0 if you do your research)
What about the rest of us?
Why can't I supplement my BitWig session with a bunch of Eurorack modules featuring full recall and the ability to arbitrarily route my audio & (audio-rate) control signals between software and hardware?
Why doesn't my guitar distortion pedal just have a PoE ethernet jack that does away with batteries and lets me stick it anywhere in the signal chain without having to fiddle with patch leads?
Why do I still get phone calls from otherwise massively capable friends asking me why their laptop is suddenly the clock master for the entire studio and not their ultra-low-jitter hand-crafted US$ 5 000 oven-controlled [1] oscillator?
I love soldering 1/4" balanced jacks as much as the next person but at some point in life you just have to ask yourself: What's the point of all this gear if we're spending more time trying to make it talk to itself than making music with it? Why didn't we just stop with the acoustic guitar/piano/djembe/violin/didgeridoo/trombone? [2]
In an industry dominated by generational advances why do we keep blocking the smaller instrument makers from participating in a multi-generational process of refining these amazing technologies we keep coming up with by erecting stupid barriers to participation?
Why is it easier (and cheaper) to route low-latency audio across the Internet than across my studio floor?
Why does it cost US$ 1 899 to plug a microphone into my digital audio network? [3]
Why has it become so easy to add MIDI or even USB support to a new design that it's almost an afterthought but audio over ethernet somehow still commands a premium that keeps it out of the hands of anyone but high-end audio interface manufacturers?
Notes:
[1] You can't make this shit up...
[2] Actually, this is still my plan B.
[3] For reference, a really nice way to plug your microphone into an analogue audio network only costs US$ 219.99. If you can't afford that, I'm sure Uli will sell you something for US$ 49.99
I once had a dream.
It's not one of those "if we could all just agree on how to pronounce 'tomato'" dreams.
Standards don't get traction because they're "good" or "solve all the problems"; they get traction because a) they solve an important problem, b) they're really easy to implement and c) they're dirt cheap to manufacture.
So, like MIDI 1.0, it's a modest dream:
- We'd like a common specification for sending and receiving time-stamped, multi-channel audio-rate signals over local ethernet with <= 10ms latency. [1]
- We'd like it very much if everyone could take a nice long break from trying to tell us what our signals can be. [2]
- A reasonably capable engineer should be able to come up with a working implementation from scratch over a month of weekends given only the specification for reference. [3]
- For hardware implementations, we'd prefer not to add more than ~US$20 to our BOM cost. The interface components should be generally available and not require specialized manufacturing equipment. [4]
- While implementations can be licensed under any terms, the specification and a reference implementation should be freely available and not incur any licensing costs. [5]
Notes:
[1] Roughly 96 channels of uncompressed 24-bit words @ 48kHz over 100Mbps is the going rate nowadays. It's a good rate for all kinds of things.
[2] The ability to assign a human-readable name to a channel may already be more semantic content than many musicians would prefer their equipment to possess. It's my responsibility if I feel like plugging my guitar output into my LFO's CV input or sending a packet across the network that only one bespoke MAX patch can parse. Leave me the hell alone. You're not the boss of me.
[3] Okay, I suck. That's how long it took me to implement USB Midi Class from scratch. I know more than one person reading this who could have done it in one weekend :laugh:
[4] Thumbsuck. I figure this is around what it cost to build a MIDI interface in the 80's. The most expensive bit was probably ~$10 for the Z80/6502/8080. It wasn't cheap, but it was cheap enough that no one had to agonize about adding MIDI to their product. Nowadays the cost is basically the price of the DIN socket. So even if we used a dedicated microcontroller and added the cost of a PHY and a crappy RJ-45 jack we could still come in about the same these days? Much less if your design already has a beefy processor? The little manufacturing knowledge I have is waaaayy out of date so please correct me!
[5] This ended up working out rather well for both the original MIDI implementors and everyone who followed.
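To make the modest dream concrete, here's a purely hypothetical wire format in the spirit of the wishlist: a sequence number, a timestamp, a channel count and raw samples shoved into a datagram, and nothing more. Every name and field here is invented for illustration — this is what a "dumbest of pipes" packet could look like, not any existing standard:

```python
import struct

# Invented layout: seq (32 bits), timestamp_ns (64 bits), n_channels (16 bits)
HDR = struct.Struct("!IQH")

def pack_frame(seq: int, timestamp_ns: int, channels: list[bytes]) -> bytes:
    """One packet = header + one raw 24-bit sample per channel.
    All semantics beyond 'time-stamped audio-rate signal' deliberately left out."""
    assert all(len(c) == 3 for c in channels), "one 24-bit sample per channel"
    return HDR.pack(seq, timestamp_ns, len(channels)) + b"".join(channels)

def unpack_frame(pkt: bytes):
    """Inverse of pack_frame: recover (seq, timestamp_ns, samples)."""
    seq, ts, n = HDR.unpack_from(pkt)
    body = pkt[HDR.size:]
    return seq, ts, [body[i * 3:(i + 1) * 3] for i in range(n)]

pkt = pack_frame(7, 1_000_000, [b"\x00\x01\x02", b"\xff\xfe\xfd"])
assert unpack_frame(pkt) == (7, 1_000_000, [b"\x00\x01\x02", b"\xff\xfe\xfd"])
```

A real spec would batch multiple samples per packet and pin down byte order and clock semantics, but the point is how little is actually required of the pipe itself.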
The digital realm is fluid in a way that the material world will never be. Where our muscles and the strength of our materials would otherwise force us to call it a day the digital makes us think we can make it do just one more thing...
So as engineers we each try to solve every problem with our creations.
But we have limits even in the digital realm so not one of us ever manages to complete a creation that solves all the problems.
We may join forces and form committees to persuade other engineers of how it should be done, but soon enough too much water has flowed under the bridge and it becomes impossible to add anything new without changing what has gone before.
What saves us in the physical world are our limitations. We always need to stop before we're done. Nothing we build is easily adapted to another goal. We're working with tools in ways that could never have been planned. We figure out clever ways to duct-tape things together and get the job done. Maybe we even learn to leave wide, smooth areas on our own creations to make it easier for the tape to go on.
This is why I have hope that a digital audio network built from the dumbest of pipes with all the nuance and subtlety of a jack cable could make things just a little easier.
Finally, there's one other emerging standard that doesn't get nearly enough credit or attention and which delivers truly remarkable performance.
Check out AAOC !
You're preaching to the choir here. It's absolutely absurd how AoIP isn't just the standard. It's completely revolutionary IMO
"Why can't I supplement my BitWig session with a bunch of Eurorack modules featuring full recall and the ability to arbitrarly route my audio & (audio-rate) control signals between software and hardware?
Why doesn't my guitar distortion pedal just have a POE ethernet jack that does away with batteries and lets me stick it anywhere in the signal chain without having to fiddle with patch leads?"
Right?
It's funny: long before I was even aware of AES67, I had sketched out a similar concept as a "wouldn't this make a lot of sense?" idea. Then I put it to rest, only sort of stumbling on AES67 a year and a half ago, maybe?
Anyways.
It's been super frustrating. On a whim I bought a MOTU Ultralite AVB and it was kind of shit, frankly poor D/A conversion, and that didn't get me excited about trying to make AVB + Windows happen with an RTXOS.
I settled on a Dante setup and I quite like it but I've been itching for more. For the last month I've been messing around with AES67 in earnest and really like where things are going.
If you're having problems getting any of it up and running, or are worried about needing to buy specific hardware, I've sort of worked around most of it while still maintaining compatibility with reference implementations.
Most of the stuff is scattered around GitHub, but there are a few things that AREN'T well documented about clocking and generating a good master clock (one that doesn't just work for random AES67 implementations but also works with established, more picky ones), and I feel as though there's still a TON to learn.
Let me know what you're working with and I can try to help you out! I seriously did not have to buy much hardware (and indeed could have gotten away with less but again, poor documentation and needing to try shit out)