Performance of the TCP/IP Stack on Different Hardware Combinations
Data rates for audio transmission on different implementations
To choose the right design for the escher 2 board, I did some research. Most of these assumptions and figures are theoretical and have not been verified yet.
Needed throughput for AUDIO
Because we need low latency, down to 4 samples, we cannot compress the audio much. Low-latency compression that works sample by sample could be more efficient than raw coding, but it cannot achieve high compression rates. For the rest of this analysis we therefore assume raw, uncompressed audio data.
Ethernet/IP framing and the pauses between packets add overhead. Based on our previous experience with escher1 on 10 MBit/s Ethernet, 10 bits on the wire per byte of payload is a good assumption.
Maximum theoretical number of channels for different transmission rates as a rough calculation for further decisions:
Assumption: sample rate 44100 Hz, 16 bit/sample; with overhead, 1 byte (8 bit) needs 10 bit of transmission.
Trans. rate (MBit/s) | channels (calculated / max) | channels with overhead (calculated / max) |
---|---|---|
10 | 14,17 / 14 | 11,33 / 11 |
100 | 141,7 / 141 | 113,3 / 113 |
1000 | 1417 / 1417 | 1133 / 1133 |
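The table can be reproduced with a few lines of arithmetic. The following is a minimal sketch under the assumptions stated above (44100 Hz, 16 bit/sample, 10 bits on the wire per byte); the function and constant names are made up for illustration.

```python
import math

SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
WIRE_BITS_PER_BYTE = 10  # assumption from the text: 8 payload bits cost 10 bits on the wire


def max_channels(link_mbit, wire_bits_per_byte=8):
    """Return (calculated, whole) channel count for a link rate in MBit/s."""
    bits_per_channel = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * wire_bits_per_byte / 8
    calculated = link_mbit * 1_000_000 / bits_per_channel
    return calculated, math.floor(calculated)


for rate in (10, 100, 1000):
    raw = max_channels(rate)
    loaded = max_channels(rate, WIRE_BITS_PER_BYTE)
    print(f"{rate:5d} MBit/s: raw {raw[0]:8.2f} / {raw[1]:4d}, "
          f"with overhead {loaded[0]:8.2f} / {loaded[1]:4d}")
```

The whole-channel values are rounded down, because a partially transmitted channel is of no use.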
With TCP/IP or half duplex we need even more bandwidth, and low latency additionally requires small packet sizes.
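To get a feeling for what small packets cost, the sketch below estimates the wire load when every packet carries only 4 samples per channel, the worst case named above. The per-frame sizes are the standard Ethernet, IPv4 (without options) and UDP values, not figures measured on escher hardware, and `packet_budget` is a hypothetical helper name.

```python
SAMPLE_RATE_HZ = 44_100

# Per-frame costs in bytes for UDP over IPv4 over Ethernet (standard values)
ETH_HEADER = 14
ETH_FCS = 4
ETH_PREAMBLE_SFD = 8
ETH_INTERFRAME_GAP = 12
IPV4_HEADER = 20  # assumption: no IP options
UDP_HEADER = 8
ETH_MIN_PAYLOAD = 46  # shorter payloads are padded to this size


def packet_budget(channels, bytes_per_sample=2, samples_per_packet=4):
    """Estimate packet size, packet rate and wire load for one audio stream."""
    audio_bytes = channels * bytes_per_sample * samples_per_packet
    eth_payload = max(audio_bytes + IPV4_HEADER + UDP_HEADER, ETH_MIN_PAYLOAD)
    wire_bytes = (eth_payload + ETH_HEADER + ETH_FCS
                  + ETH_PREAMBLE_SFD + ETH_INTERFRAME_GAP)
    packets_per_s = SAMPLE_RATE_HZ / samples_per_packet
    return {
        "audio_bytes": audio_bytes,
        "wire_bytes": wire_bytes,
        "packets_per_s": packets_per_s,
        "wire_mbit_per_s": wire_bytes * 8 * packets_per_s / 1e6,
        "packet_time_us": samples_per_packet / SAMPLE_RATE_HZ * 1e6,
    }


for channels in (8, 25, 64):
    print(channels, packet_budget(channels))
```

Under these assumptions the relative overhead grows as packets get smaller, so the 10 bits/byte rule of thumb should be checked against the packet size that is finally chosen.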
Use case 3D Ambisonics 4th order:
As the target use case we build an installation with up to 64 speakers (= channels), with the possibility to query microphone signals recorded near them.
For 4th order we need 25 Ambisonics channels. For direct-to-speaker solutions we could need up to 64 channels in oversampled Ambisonics setups (see Ambisonics theory for more information). Feedback should cover at least 8 microphones at a time.
If we want to mix several UDP streams on the card, we have to multiply by the number of streams to be mixed. Using an overhead of 20%, which means 10 bits per byte, and estimated numbers, we get the following load on the standard Ethernet rates. From our experience, very good behaviour is reached if we do not load the Ethernet beyond 33%:
channels | resolution | samplerate | datarate | 10MBit/s | 100MBit/s | 1GBit/s |
---|---|---|---|---|---|---|
8 | 12 bit | 32000Hz | 3 MBit/s | 30 % | 3 % | 0,3 % |
8 | 16 bit | 44100Hz | 6 MBit/s | 60 % | 6 % | 0,6 % |
25 | 16 bit | 44100Hz | 17 MBit/s | 170 % | 17 % | 1,7 % |
25x4 | 16 bit | 44100Hz | 71 MBit/s | 710 % | 71 % | 7,1 % |
64 | 16 bit | 44100Hz | 45 MBit/s | 450 % | 45 % | 4,5 % |
64x4 | 16 bit | 44100Hz | 180 MBit/s | 1800 % | 180 % | 18,0 % |
8 | 24 bit | 48000Hz | 9 MBit/s | 90 % | 9 % | 0,9 % |
25 | 24 bit | 48000Hz | 28 MBit/s | 280 % | 28 % | 2,8 % |
25x4 | 24 bit | 48000Hz | 115 MBit/s | 1150 % | 115 % | 11,5 % |
64 | 24 bit | 48000Hz | 74 MBit/s | 740 % | 74 % | 7,4 % |
64x4 | 24 bit | 48000Hz | 295 MBit/s | 2950 % | 295 % | 29,5 % |
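The rows of this table follow from channels x streams x resolution x sample rate. The sketch below recomputes a few of them and marks loads above the 33% rule of thumb. Note that the datarate column appears to match the raw product without overhead; the optional `wire_bits_per_byte=10` parameter (a name chosen here for illustration) adds the 10 bits/byte assumption on top.

```python
COMFORT_LIMIT_PERCENT = 33  # rule of thumb from the text
LINKS_MBIT = (10, 100, 1000)


def data_rate_mbit(channels, bits, sample_rate_hz, streams=1, wire_bits_per_byte=8):
    """Data rate in MBit/s; wire_bits_per_byte=10 adds the 10 bits/byte overhead."""
    payload_bits = channels * streams * bits * sample_rate_hz
    return payload_bits * wire_bits_per_byte / 8 / 1e6


def link_load_percent(rate_mbit, link_mbit):
    """Share of the nominal link rate that the stream occupies, in percent."""
    return 100 * rate_mbit / link_mbit


cases = [(8, 16, 44100, 1), (25, 16, 44100, 1), (25, 16, 44100, 4), (64, 24, 48000, 4)]
for channels, bits, fs, streams in cases:
    rate = data_rate_mbit(channels, bits, fs, streams)
    loads = []
    for link in LINKS_MBIT:
        load = link_load_percent(rate, link)
        flag = "" if load <= COMFORT_LIMIT_PERCENT else "!"  # "!" = above the 33% limit
        loads.append(f"{load:7.1f}%{flag}")
    print(f"{channels:2d}x{streams} {bits} bit {fs} Hz: {rate:6.1f} MBit/s ->", "  ".join(loads))
```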
For our Ambisonics use case we need at least 100 MBit/s. However, some (cheap) networks do not reach more than 80% throughput, and less than 50% in half duplex, which is often seen with cheap GBit networks. Since the Ambisonics channels are broadcast in the local network, we can drive many speakers, each of which calculates its own signal from the bus; the number of speakers can exceed the number of channels on the bus.
Another format, 32-bit float, which doubles the data rate of 16 bit, is rarely used in embedded systems without a floating point unit, so we do not consider it here. In installations that do not go beyond 100 dBA we do not need much more than 80 dB signal-to-noise ratio anyway, provided the signals are scaled well.
On the other hand, we have to check whether our device can process this data. GBit/s networks are mostly used only with FPGAs or similar chips with an integrated TCP/IP stack. Here we concentrate on 100 MBit/s, which modern microcontrollers can handle, although they often cannot use the full 100 MBit/s, as shown below.
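As a rough cross-check of the 80 dB figure, the theoretical signal-to-noise ratio of an ideal N-bit quantiser driven by a full-scale sine is commonly approximated as 6.02 N + 1.76 dB. The sketch below applies this textbook approximation to the resolutions used in the table above; it ignores the analogue noise of real converters, and the function name is chosen for illustration.

```python
def quantisation_snr_db(bits):
    """Textbook SNR approximation for an ideal quantiser with a full-scale sine."""
    return 6.02 * bits + 1.76


for bits in (12, 16, 24):
    print(f"{bits} bit: about {quantisation_snr_db(bits):.1f} dB")
```

Under this approximation 16 bit leaves some headroom above 80 dB, while 12 bit falls slightly short of it.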
Throughput of TCP/IP Stack
It is hard to get reliable numbers, since packet drops vary with the timing within the system, so a continuous data stream is assumed. The reaction time to incoming packets is also sometimes too slow for the full channel bandwidth, which is acceptable as long as not all data has to be used and the microcontroller can drop the packets it does not need.
From the help file of the Microchip TCP/IP Stack 5.36.2 (July 14, 2011):
I collected only the relevant data, since TCP/IP is used in its smallest mode with 200 bytes (most common with OSC [OSC]) and UDP has a constant load.
The values are in kBytes/sec as reported for the stack. Taking 100 MBit/sec as 10 MByte/sec, we can see what percentage of this channel is reached with UDP.
uP | Ethernet chip | MIPS | Interface | TCP (kByte/s) | UDP (kByte/s) | UDP share of 100 MBit/s | comment |
---|---|---|---|---|---|---|---|
dsPIC33FJ256GP710 | ENC28J60 | 40 | SPI 8MHz | 75 | 258 | 25,8 % | escher1.0, 10 MBit/s; share is relative to 10 MBit/s |
dsPIC33FJ256GP710 | ENC624J600 | 40 | SPI 8MHz | 94 | 554 | 5,5 % | like in escher 1.0 |
PIC32MX795F512L | ENC624J600 | 80 | SPI 13MHz | 154 | 784 | 7,8 % | |
dsPIC33FJ256GP710 | ENC624J600 | 40 | PSP Mode | 202 | 2038 | 20,4 % | looks good |
PIC32MX360F512L | ENC624J600 | 80 | PSP Mode | 200 | 2071 | 20,7 % | |
PIC32MX795F512L | PHY | 80 | Internal | 393 | 8449 | 84,5 % | |
dsPIC33FJ256GP710 | MRF24WB0M | 40 | SPI 8MHz | 8 | 48 | | WLAN chip |
PIC32MX795F512L | MRF24WB0M | 80 | SPI 20MHz | 9 | 53 | | WLAN chip |
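The percentage column, and what it means in audio channels, can be derived from the measured UDP throughput. This is a minimal sketch assuming 1 kByte = 1000 bytes, 10 bits on the wire per byte, and raw 16 bit / 44100 Hz audio; the helper names are hypothetical.

```python
def percent_of_link(kbytes_per_s, link_mbit=100, wire_bits_per_byte=10):
    """Share of the link reached, taking 100 MBit/s as 10 000 kByte/s."""
    link_kbytes_per_s = link_mbit * 1000 / wire_bits_per_byte
    return 100 * kbytes_per_s / link_kbytes_per_s


def channels_supported(kbytes_per_s, sample_rate_hz=44100, bytes_per_sample=2):
    """Whole number of raw audio channels that fit into the measured throughput."""
    return kbytes_per_s * 1000 // (sample_rate_hz * bytes_per_sample)


# UDP throughput in kByte/s, taken from the table above
measured_udp = {
    "dsPIC33 + ENC624J600, SPI 8MHz": 554,
    "dsPIC33 + ENC624J600, PSP": 2038,
    "PIC32MX795 + PHY, internal": 8449,
}
for setup, kbytes in measured_udp.items():
    print(f"{setup}: {percent_of_link(kbytes):.1f} % of 100 MBit/s, "
          f"about {channels_supported(kbytes)} channels")
```

By this estimate, the 25 channels of the 4th-order use case only fit into the measured throughput of the PIC32 with internal Ethernet; the PSP setups come close but fall short.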
There have been rumours that alternative stacks such as lwIP [lwip] achieve more throughput, but we have not found any numbers for the Microchip products. We will do some experiments later and correct this assumption if needed. The table above also shows that the interface technology between the chips limits throughput considerably, not only the stack.
Estimation of throughput:
Using an overhead of 10 bits per byte (8 data bits : 2 overhead bits), we can transmit 1 MByte/sec via 10 MBit/sec and 10 MBytes/sec via 100 MBit/sec.
Looking at the table above, these numbers are never reached, except with the PIC32's internal 100 MBit/sec Ethernet and PHY interface. So we have to drop packets if the transmission buffer is full and unprocessed. For high performance, if we use only raw Ethernet for data transmission without the TCP/IP stack, I do not know whether we can reach the limit, but it looks more likely.
The SPI interface between the chips has a maximum of 8 MHz, which corresponds to 800 kByte/sec, so only the parallel bus (PSP interface) reaches the optimal output. Modern chips now offer SPI at ~13 MHz, but this will not improve performance dramatically.
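The SPI ceiling can be estimated in the same way as the Ethernet rates. The sketch below assumes one bit per SPI clock cycle and the same 10 bits/byte rule as above, which is what the 800 kByte/sec figure for 8 MHz implies; the measured values are the UDP numbers from the stack table, and the function name is made up for illustration.

```python
def spi_kbytes_per_s(clock_mhz, wire_bits_per_byte=10):
    """Upper bound for SPI throughput in kByte/s at one bit per clock cycle."""
    return clock_mhz * 1000 / wire_bits_per_byte


# (SPI clock in MHz, measured UDP throughput in kByte/s) for the ENC624J600 rows
measurements = [(8, 554), (13, 784)]
for clock_mhz, measured in measurements:
    ceiling = spi_kbytes_per_s(clock_mhz)
    print(f"SPI {clock_mhz} MHz: ceiling {ceiling:.0f} kByte/s, "
          f"measured {measured} kByte/s ({100 * measured / ceiling:.0f} % of ceiling)")
```

Even a faster SPI clock stays well below the roughly 2000 kByte/sec measured in PSP mode, which supports preferring the parallel interface.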
Conclusion
It is really hard to decide what is best without doing experiments. In terms of DSP computing power the dsPIC is much better; in terms of the IP stack, where DSP functions are not used, the PIC32 performs about twice as well. Using the PIC32 with a PHY IC instead of the ENC624J600 could be another solution, but not the cheapest. Ideally, a board with a PIC32 and a dsPICEP would be optimal for performance, but this adds development time, since both have to be programmed and debugged at the same time. Without good preparation and filtering of the data, the transmission between PIC32 and dsPIC could also become a bottleneck.
Notes
1 The decision will be discussed in a new follow-up document.
2 The document was revised on 14.8.2011 because some statements in it were unclear.
[lwip] | http://www.sics.se/~adam/lwip/ |
[OSC] | Open Sound Control, see http://opensoundcontrol.org/ |