Audio over ethernet is a large space. Folk have tried to skin this CAT (groan) in many ways and on many network layers including layer-1 (custom MAC), layer-2 (raw ethernet frames), layer-3/4 (UDP/RTP/etc.).
For my purposes I'm mainly interested in a layer 2/3 implementation that can run on >= 400 MHz bare-metal with a cycle or two left over for happy-joy-joy DSP fun-times.
Smuggling audio frames over the Internet is not a main focus for me because there's already a most excellent solution to that problem which is easy to implement on even smollish microcontrollers.
So, of the ~50 or so implementations out there, for the purpose of this discussion, we can probably reduce it down to the two (and maaaaybe a 1/2) protocols that have gotten some commercial traction over the years.
1: Dante/Rednet (Layer 3, IP Packets)
Dante is 100% proprietary Australian tech spawned out of an undead Motorola research lab and suspiciously sensible government grants in the late noughties. It's owned by a company called Audinate, who have historically licensed it to high-end commercial audio gear manufacturers for uncomfortably large sums of money.
Until AVB started getting traction it was the standard for doing ethernet audio.
Implementations have historically been FPGA-based because easy license enforcement, guaranteed data rates, the market wasn't big enough to justify an ASIC and the margins were of the type where BOM cost never really features.
"Rednet" is the OEM version of Dante that Focusrite ships with their higher-end interfaces. I think they were the first "popular" 3rd-party manufacturer to license it.
Lately, Audinate have also started aggressively marketing their Dante "Embedded" Platform that's basically an i.MX 8M SBC running Linux and their protocol stack. Probably a response to AVB's growing success.
2: AVB (Layer 2, raw ethernet frames)
The "open" standard for ethernetted professional audio. Hell, you can even grab the sources on the GitHubs!
If you can afford an IEEE subscription the full stack is:
- IEEE 802.1AS: Timing and Synchronization for Time-Sensitive Applications (gPTP);
- IEEE 802.1Qav: Forwarding and Queuing for Time-Sensitive Streams (FQTSS);
- IEEE 802.1Qat: Stream Reservation Protocol (SRP);
- IEEE 802.1BA: Audio Video Bridging (AVB) Systems;
- IEEE 1722: Layer 2 Transport Protocol for Time Sensitive Applications (AV Transport Protocol, AVTP); and
- IEEE 1722.1: Device Discovery, Enumeration, Connection Management and Control Protocol (AVDECC).
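To make the IEEE 1722 layer a bit less abstract: an AVTP audio (AAF) stream packet is just a raw ethernet frame with EtherType 0x22F0 and a small header in front of the PCM payload. Here's a deliberately simplified sketch — the field layout below is abbreviated from memory of the 1722 AAF header and omits several fields, so treat it as illustrative and check the actual spec before trusting any of it:

```python
import struct

AVTP_ETHERTYPE = 0x22F0  # IEEE 1722 AVTP EtherType
AAF_SUBTYPE = 0x02       # AVTP Audio Format (AAF) subtype

def build_aaf_frame(dst_mac: bytes, src_mac: bytes, stream_id: int,
                    avtp_timestamp: int, seq: int, payload: bytes) -> bytes:
    """Build a (simplified!) AVTP AAF frame: ethernet header + a cut-down
    AVTP header + raw PCM payload. Several real header fields (format,
    channels-per-frame, stream_data_length, ...) are omitted here;
    consult IEEE 1722 for the bit-level definitions."""
    eth = dst_mac + src_mac + struct.pack("!H", AVTP_ETHERTYPE)
    sv_version = 0x80  # stream_id-valid bit set, version 0
    avtp = struct.pack("!BBBBQI",
                       AAF_SUBTYPE,      # subtype
                       sv_version,       # sv + version bits
                       seq & 0xFF,       # wrapping sequence number
                       0,                # reserved / timestamp-uncertain
                       stream_id,        # 64-bit stream ID
                       avtp_timestamp)   # presentation time (gPTP-derived)
    return eth + avtp + payload

# 8 silent 24-bit samples to a (MAAP-style multicast) destination address
frame = build_aaf_frame(b"\x91\xe0\xf0\x00\x00\x01", b"\x02" * 6,
                        stream_id=0x0002_0000_0000_0001,
                        avtp_timestamp=123456, seq=0,
                        payload=b"\x00\x00\x00" * 8)
```

The point being: there's no IP, no UDP, no ports — just a MAC header and a timestamped blob, which is exactly why the switches in between need to be AVB-aware.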
Aside: check out what the excellent folk at tweedegolf.nl have been up to: tweedegolf/statime !
In practical terms, however, adding AVB support to your project means adding an XMOS transputer (remember those?) to your BOM and a proprietary firmware license to your per-product cost.
The demo is free though and the devkit is not prohibitively expensive.
You also get USB Audio Class 2 support basically out of the box and their xC programming language is 100% pure awesome for when you need some of that late-80's/early-90's high performance retro-computing action in your life.
One thing about AVB is that, unlike Dante, you can't just run it over any old ethernet switch or NIC.
Specifically, your switch and/or NIC needs to support:
- IEEE 802.1Q: Stream Reservation Protocol (SRP) and Traffic Shaping (FQTSS)
- IEEE 802.1AS: Time Synchronization using Generalized Precision Time Protocol (gPTP) on each AVB enabled Ethernet port
For plugging into your desktop, you'll need something like an Intel i210 or just buy a Mac.
For the switch, unless you're made of money you'll still pay a premium and maybe wait for the supply chain crisis to subside.
2 1/2: AES67 (Layer 3/4)
Mainly of interest because there are long-standing promises going back over a decade now that it will allow folk to interoperate AVB/Dante/Ravenna/etc. gear all on the same network.
There may be some technical and probably more commercial lessons to be learnt from its ongoing lack of success in unifying the world and bringing us all to a state of distributed signals nirvana.
The market certainly seems to be heading in another direction.
Older network audio standards have explored both synchronous clocking (hijacking one twisted pair of the CAT 5 cable for the clock signal!) and asynchronous clock recovery.
For Dante and AVB clocking is isochronous, with clock recovery performed by comparing packet times against an IEEE 1588 PTP clock shared across the network.
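The PTP machinery underneath both protocols boils down to a two-way timestamp exchange: the master stamps when it sends a Sync (t1), the slave stamps its arrival (t2), the slave stamps its Delay_Req (t3), the master stamps that arrival (t4), and, assuming a symmetric path, offset and delay fall out of simple arithmetic. A toy sketch (ignoring all the filtering and servo work a real gPTP implementation does):

```python
def ptp_offset_and_delay(t1: float, t2: float, t3: float, t4: float):
    """Classic IEEE 1588 delay request-response math.

    t1: master's send time of Sync
    t2: slave's receive time of Sync
    t3: slave's send time of Delay_Req
    t4: master's receive time of Delay_Req
    Assumes the network path delay is symmetric."""
    offset = ((t2 - t1) - (t4 - t3)) / 2   # slave clock minus master clock
    delay = ((t2 - t1) + (t4 - t3)) / 2    # one-way path delay
    return offset, delay

# Slave clock runs 1.0 unit ahead of the master; one-way delay is 3.0 units.
offset, delay = ptp_offset_and_delay(t1=0.0, t2=4.0, t3=10.0, t4=12.0)
# offset == 1.0, delay == 3.0
```

Once the slave knows its offset it can place incoming packet timestamps on the shared timeline and discipline its local sample clock accordingly.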
- Dante:
>= 84 μs
- AVB:
<= 2ms
I think there's some kind of life-lesson here if you think about how the protocol designed by FPGA engineers specifies the latency bounds vs how the protocol designed mainly by network engineers does. :laugh:
The channel limit is defined by the speed of the Ethernet link.
For 100Mbps it's around 96 channels of 24-bit samples @ 48kHz.
i.e. usually more than you need but you can always go 1Gbps for those days the entire Berliner Philharmoniker shows up for a session.
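For the curious, the payload-only back-of-envelope arithmetic looks like this. Note it deliberately ignores ethernet/IP headers, padding of samples to 32-bit words, and (for AVB) reserved bandwidth, all of which change the real number — the efficiency factor is a pure hand-wave:

```python
def max_channels(link_bps: int, bits_per_sample: int = 24,
                 sample_rate: int = 48_000, efficiency: float = 1.0) -> int:
    """Raw payload-only channel count for a given link speed.

    Real deployments lose capacity to frame headers, word padding and
    bandwidth reservation, hence the fudge factor."""
    per_channel_bps = bits_per_sample * sample_rate  # 1.152 Mbps per channel
    return int(link_bps * efficiency // per_channel_bps)

print(max_channels(100_000_000))    # 86 channels, payload-only
print(max_channels(1_000_000_000))  # 868 — Berliner Philharmoniker territory
```

Quoted channel counts for real products land in the same ballpark but differ per protocol once framing overhead and packing choices are counted.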
So this is pretty much the status quo. There are two big professional standards for audio over ethernet. One proprietary, the other open.
They're not that different. They're both solving the same problems in many of the same ways.
From a usability point of view Dante is marginally easier to work with than AVB.
It's a bit less "plug&play" (Layer-3 means there are IP addresses etc. that need to be configured) but:
a) the configuration tools are a lot nicer (it's easier to write tools for protocols encapsulated in IP packets than raw ethernet frames)
b) once configured, you don't have to fight it every step of the way every time you reboot the studio. (oh yes, your auto-discovery protocols etc. are also all raw ethernet frames!)
There used to be a big price difference between the two solutions, but eventually they'll reach equilibrium at around the same price point.
Thing is... neither of them is any fun for any of us unless we're building a piece of gear that's going to retail for at least US$ 995 and has an economic model for shipping 10K units over the lifetime of the product.
It's also no fun the moment (and this is an entire blog post of its own) you want to send anything except audio from point A to B. Like control signals. Or OSC. Or DMX. (but maybe MIDI 1.0 if you do your research)
What about the rest of us?
Why can't I supplement my BitWig session with a bunch of Eurorack modules featuring full recall and the ability to arbitrarily route my audio & (audio-rate) control signals between software and hardware?
Why doesn't my guitar distortion pedal just have a PoE ethernet jack that does away with batteries and lets me stick it anywhere in the signal chain without having to fiddle with patch leads?
Why do I still get phone calls from otherwise massively capable friends asking me why their laptop is suddenly the clock master for the entire studio and not their ultra-low-jitter hand-crafted US$ 5 000 oven-controlled [1] oscillator?
I love soldering 1/4" balanced jacks as much as the next person but at some point in life you just have to ask yourself: What's the point of all this gear if we're spending more time trying to make it talk to itself than making music with it? Why didn't we just stop with the acoustic guitar/piano/djembe/violin/didgeridoo/trombone? [2]
In an industry dominated by generational advances why do we keep blocking the smaller instrument makers from participating in a multi-generational process of refining these amazing technologies we keep coming up with by erecting stupid barriers to participation?
Why is it easier (and cheaper) to route low-latency audio across the Internet than across my studio floor?
Why does it cost US$ 1 899 to plug a microphone into my digital audio network? [3]
Why has it become so easy to add MIDI or even USB support to a new design that it's almost an afterthought but audio over ethernet somehow still commands a premium that keeps it out of the hands of anyone but high-end audio interface manufacturers?
Notes:
[1] You can't make this shit up...
[2] Actually, this is still my plan B.
[3] For reference, a really nice way to plug your microphone into an analogue audio network only costs US$ 219.99. If you can't afford that, I'm sure Uli will sell you something for US$ 49.99
I once had a dream.
It's not one of those "if we could all just agree on how to pronounce 'tomato'" dreams.
Standards don't get traction because they're "good" or "solve all the problems"; they get traction because a) they solve an important problem, b) they're really easy to implement and c) they're dirt cheap to manufacture.
So, like MIDI 1.0, it's a modest dream:
- We'd like a common specification for sending and receiving time-stamped, multi-channel audio-rate signals over local ethernet with <= 10ms latency. [1]
- We'd like it very much if everyone could take a nice long break from trying to tell us what our signals can be. [2]
- A reasonably capable engineer should be able to come up with a working implementation from scratch over a month of weekends given only the specification for reference. [3]
- For hardware implementations, we'd prefer not to add more than ~US$20 to our BOM cost. The interface components should be generally available and not require specialized manufacturing equipment. [4]
- While implementations can be licensed under any terms, the specification and a reference implementation should be freely available and not incur any licensing costs. [5]
Notes:
[1] Roughly 96 channels of uncompressed 24-bit words @ 48kHz over 100Mbps is the going rate nowadays. It's a good rate for all kinds of things.
[2] The ability to assign a human-readable name to a channel may already be more semantic content than many musicians would prefer their equipment to possess. It's my responsibility if I feel like plugging my guitar output into my LFO's CV input or sending a packet across the network that only one bespoke MAX patch can parse. Leave me the hell alone. You're not the boss of me.
[3] Okay, I suck. That's how long it took me to implement USB Midi Class from scratch. I know more than one person reading this who could have done it in one weekend :laugh:
[4] Thumbsuck. I figure this is around what it cost to build a MIDI interface in the 80's. The most expensive bit was probably ~$10 for the Z80/6502/8080. It wasn't cheap, but it was cheap enough that no one had to agonize about adding MIDI to their product. Nowadays the cost is basically the price of the DIN socket. So even if we used a dedicated microcontroller and added the cost of a PHY and a crappy RJ-45 jack we could still come in about the same these days? Much less if your design already has a beefy processor? The little manufacturing knowledge I have is waaaayy out of date so please correct me!
[5] This ended up working out rather well for both the original MIDI implementors and everyone who followed.
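To make the modest dream concrete, here's a purely hypothetical wire format in the spirit of the wishlist: a sequence number, a timestamp, a channel count and raw samples shoved into a datagram, and nothing more. Every name and field here is invented for illustration — this is what a "dumbest of pipes" packet could look like, not any existing standard:

```python
import struct

# Invented layout: seq (32 bits), timestamp_ns (64 bits), n_channels (16 bits)
HDR = struct.Struct("!IQH")

def pack_frame(seq: int, timestamp_ns: int, channels: list[bytes]) -> bytes:
    """One packet = header + one raw 24-bit sample per channel.
    All semantics beyond 'time-stamped audio-rate signal' deliberately left out."""
    assert all(len(c) == 3 for c in channels), "one 24-bit sample per channel"
    return HDR.pack(seq, timestamp_ns, len(channels)) + b"".join(channels)

def unpack_frame(pkt: bytes):
    """Inverse of pack_frame: recover (seq, timestamp_ns, samples)."""
    seq, ts, n = HDR.unpack_from(pkt)
    body = pkt[HDR.size:]
    return seq, ts, [body[i * 3:(i + 1) * 3] for i in range(n)]

pkt = pack_frame(7, 1_000_000, [b"\x00\x01\x02", b"\xff\xfe\xfd"])
assert unpack_frame(pkt) == (7, 1_000_000, [b"\x00\x01\x02", b"\xff\xfe\xfd"])
```

A real spec would batch multiple samples per packet and pin down byte order and clock semantics, but the point is how little is actually required of the pipe itself.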
The digital realm is fluid in a way that the material world will never be. Where our muscles and the strength of our materials would otherwise force us to call it a day the digital makes us think we can make it do just one more thing...
So as engineers we each try to solve every problem with our creations.
But we have limits even in the digital realm so not one of us ever manages to complete a creation that solves all the problems.
We may join forces and form committees to persuade other engineers of how it should be done, but soon enough too much water has flowed under the bridge and it becomes impossible to add anything new without changing what has gone before.
What saves us in the physical world are our limitations. We always need to stop before we're done. Nothing we build is easily adapted to another goal. We're working with tools in ways that could never have been planned. We figure out clever ways to duct-tape things together and get the job done. Maybe we even learn to leave wide, smooth areas on our own creations to make it easier for the tape to go on.
This is why I have hope that a digital audio network built from the dumbest of pipes with all the nuance and subtlety of a jack cable could make things just a little easier.
Finally, there's one other emerging standard that doesn't get nearly enough credit or attention and which delivers truly remarkable performance.
Check out AAOC !
You're preaching to the choir here. It's absolutely absurd how AoIP isn't just the standard. It's completely revolutionary IMO
"Why can't I supplement my BitWig session with a bunch of Eurorack modules featuring full recall and the ability to arbitrarly route my audio & (audio-rate) control signals between software and hardware?
Why doesn't my guitar distortion pedal just have a POE ethernet jack that does away with batteries and lets me stick it anywhere in the signal chain without having to fiddle with patch leads?"
Right?
It's funny: long before I was even aware of AES67, I had sketched out a similar concept as a "wouldn't this make a lot of sense?" idea. Then I put it to rest, only sort of stumbling on AES67 a year and a half ago, maybe?
Anyways.
It's been super frustrating. On a whim I bought a MOTU Ultralite AVB and it was kind of shit, frankly poor D/A conversion, and that didn't get me excited about trying to make AVB + Windows happen with an RTXOS.
I settled on a Dante setup and I quite like it but I've been itching for more. For the last month I've been messing around with AES67 in earnest and really like where things are going.
If you're having problems getting any of it up and running, or are worried about needing to buy specific hardware, I've sort of worked around most of it while still maintaining compatibility with reference implementations.
Most of the stuff is scattered around GitHub, but there are a few things that AREN'T well documented about clocking and generating a good master clock (one that doesn't just work for random AES67 implementations but also works with established, more picky ones), and I feel as though there's still a TON to learn.
Let me know what you're working with and I can try to help you out! I seriously did not have to buy much hardware (and indeed could have gotten away with less but again, poor documentation and needing to try shit out)