Performance of the TCP/IP Stack on Different Hardware Combinations

Some thoughts on the TCP/IP stack and the choice of microcontroller.

Data rates for audio transmission on different implementations

To choose the right design for the escher 2 board, I did some research. Most of these assumptions and figures are theoretical and have not been verified yet.

Required throughput for audio

Since we need latencies as low as 4 samples, we cannot compress the audio very much. Sample-by-sample low-latency compression could be more efficient than raw coding, but it cannot achieve high compression rates. For the rest of this analysis we therefore assume raw, uncompressed audio data.

Ethernet/IP framing and the pauses between packets add overhead. Based on our previous experience with escher1 on 10 Mbit/s Ethernet, 10 bits on the wire per byte of data is a good assumption.

Maximum theoretical number of channels for different transmission rates, as a rough calculation to guide further decisions:

Assumptions:
Sample rate 44100 Hz, 16 bit/sample
1 byte = 8 bit needs 10 bit on the wire with overhead

Rate       Channels, raw     Channels, with overhead
(Mbit/s)   (exact / whole)   (exact / whole)
10         14,17 / 14        11,33 / 11
100        141,7 / 141       113,3 / 113
1000       1417 / 1417       1133 / 1133
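The rough calculation above can be reproduced with a few lines of Python. This is only a sketch under the stated assumptions (44100 Hz, 16 bit, 10 bits on the wire per byte); the function name and its parameters are chosen here for illustration.

```python
def max_channels(link_bits_per_s, sample_rate_hz=44100, bits_per_sample=16,
                 wire_bits_per_byte=8):
    """Theoretical number of raw audio channels that fit on a link.

    wire_bits_per_byte=8 means no overhead; 10 is the overhead
    assumption used in this document.
    """
    wire_bits_per_sample = bits_per_sample / 8 * wire_bits_per_byte
    return link_bits_per_s / (sample_rate_hz * wire_bits_per_sample)


if __name__ == "__main__":
    for mbit in (10, 100, 1000):
        raw = max_channels(mbit * 1e6)
        with_overhead = max_channels(mbit * 1e6, wire_bits_per_byte=10)
        print(f"{mbit:5d} Mbit/s: {raw:8.2f} raw, "
              f"{with_overhead:8.2f} with overhead")
```

Only the whole part of each result is usable, since a fraction of a channel cannot be transmitted.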

With TCP/IP or half duplex we need even more bandwidth, and low latency requires small packet sizes.
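How much small packets cost can be estimated from the fixed per-frame overhead. The sketch below is not taken from the measurements in this document. It assumes UDP over IPv4 over plain Ethernet (no VLAN tag, no IP options) with the usual sizes for preamble, header, frame check sequence and inter-frame gap. It does not check the MTU, so it is only valid for packets that fit into one frame. The function name is chosen for illustration.

```python
ETH_PREAMBLE = 8      # preamble + start-of-frame delimiter, in bytes
ETH_HEADER = 14       # destination MAC, source MAC, EtherType
ETH_FCS = 4           # frame check sequence
ETH_GAP = 12          # minimum inter-frame gap, counted in byte times
ETH_MIN_PAYLOAD = 46  # shorter payloads are padded up to this size
IPV4_HEADER = 20      # assumption: no IP options
UDP_HEADER = 8


def wire_bits_per_audio_byte(channels, samples_per_packet, bytes_per_sample=2):
    """Bits occupied on the wire for each byte of audio in one UDP packet."""
    audio_bytes = channels * samples_per_packet * bytes_per_sample
    eth_payload = max(IPV4_HEADER + UDP_HEADER + audio_bytes, ETH_MIN_PAYLOAD)
    wire_bytes = ETH_PREAMBLE + ETH_HEADER + eth_payload + ETH_FCS + ETH_GAP
    return wire_bytes * 8 / audio_bytes


if __name__ == "__main__":
    for samples in (4, 16, 64):
        bits = wire_bits_per_audio_byte(8, samples)
        print(f"8 channels, {samples:2d} samples/packet: {bits:.2f} bits/byte")
```

Under these assumptions, 8 channels with 4 samples per packet occupy 16.25 bits on the wire per audio byte, while 64 samples per packet come down to about 8.5. This suggests that the 10 bits per byte figure is only reached with somewhat larger packets, so latency and overhead have to be traded against each other.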

Use case 3D Ambisonics 4th order:

The target use case is an installation with up to 64 speakers (= channels), with the possibility to check them via microphone signals picked up near them.

For 4th order we need 25 Ambisonics channels. With direct-to-speaker solutions we could need up to 64 channels on oversampled Ambisonics setups (see Ambisonics theory for more information). Feedback should be possible from at least 8 microphones at a time.

If we want to mix different UDP streams on the card, we have to multiply by the number of streams to be mixed. The table below gives estimated data rates and the resulting load on the standard Ethernet rates. The listed data rates appear to be the raw payload (channels times resolution times sample rate); the overhead of 10 bits per byte, i.e. 20% of the transmitted bits, comes on top. In our experience the network behaves very well as long as the Ethernet load stays below 33%:
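The figure of 25 channels follows from the standard channel count of full-sphere (3D) Ambisonics, which is (order + 1) squared. A minimal sketch, with a function name chosen for illustration:

```python
def ambisonics_channels(order):
    """Number of channels of a full-sphere (3D) Ambisonics signal."""
    return (order + 1) ** 2


if __name__ == "__main__":
    for order in range(1, 5):
        print(f"order {order}: {ambisonics_channels(order)} channels")
```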

Channels  Resolution  Sample rate  Data rate    Load at     Load at      Load at
                                                10 Mbit/s   100 Mbit/s   1 Gbit/s
8         12 bit      32000 Hz     3 Mbit/s     30 %        3 %          0,3 %
8         16 bit      44100 Hz     6 Mbit/s     60 %        6 %          0,6 %
25        16 bit      44100 Hz     17 Mbit/s    170 %       17 %         1,7 %
25x4      16 bit      44100 Hz     71 Mbit/s    710 %       71 %         7,1 %
64        16 bit      44100 Hz     45 Mbit/s    450 %       45 %         4,5 %
64x4      16 bit      44100 Hz     180 Mbit/s   1800 %      180 %        18,0 %
8         24 bit      48000 Hz     9 Mbit/s     90 %        9 %          0,9 %
25        24 bit      48000 Hz     28 Mbit/s    280 %       28 %         2,8 %
25x4      24 bit      48000 Hz     115 Mbit/s   1150 %      115 %        11,5 %
64        24 bit      48000 Hz     74 Mbit/s    740 %       74 %         7,4 %
64x4      24 bit      48000 Hz     295 Mbit/s   2950 %      295 %        29,5 %
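The rows of the table above can be recomputed with the following sketch. It calculates the raw payload rate, which is what the listed data rates correspond to after rounding. The function name and the mix_factor parameter (the "x4" in the table) are chosen here for illustration.

```python
LINK_RATES_MBIT = (10, 100, 1000)


def stream_load(channels, bits, sample_rate_hz, mix_factor=1):
    """Return (payload in Mbit/s, [load in % for 10, 100, 1000 Mbit/s])."""
    mbit = channels * mix_factor * bits * sample_rate_hz / 1e6
    loads = [100 * mbit / link for link in LINK_RATES_MBIT]
    return mbit, loads


if __name__ == "__main__":
    rows = [(8, 12, 32000, 1), (25, 16, 44100, 1), (25, 16, 44100, 4),
            (64, 24, 48000, 1), (64, 24, 48000, 4)]
    for channels, bits, rate, mix in rows:
        mbit, loads = stream_load(channels, bits, rate, mix)
        print(f"{channels}x{mix} {bits} bit {rate} Hz: {mbit:7.2f} Mbit/s, "
              + ", ".join(f"{load:.1f} %" for load in loads))
```

To include the assumed overhead of 10 bits per byte, multiply the payload rate by 10/8. For example, 25 channels at 16 bit and 44100 Hz give 17.64 Mbit/s raw and 22.05 Mbit/s on the wire.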

For our Ambisonics use case we need at least 100 Mbit/s. However, some (cheap) networks do not reach more than 80% throughput, and less than 50% on half duplex, which is often seen with cheap Gbit networks. Since the Ambisonics channels are broadcast in the local network, each speaker can calculate its own signal from the bus, so we can drive many speakers, even more than there are channels on the bus.

Another format, 32-bit float, would double the data rate of 16 bit. It is not used very often in embedded systems without a floating point unit, so we do not consider it here. In any case, in installations that do not exceed 100 dBA we do not need much more than 80 dB signal-to-noise ratio, provided the signals are scaled well.

On the other hand, we have to check whether our device can process this data. Gbit/s networks are mostly used only with FPGAs or similar chips with an integrated TCP/IP stack. Here we concentrate on 100 Mbit/s, which modern microcontrollers can handle, although they often cannot use the link up to its 100 Mbit/s limit, as shown below.
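To relate the 80 dB requirement to the resolutions discussed here, the usual rule of thumb for an ideal uniform quantizer driven by a full-scale sine wave can be used: SNR is about 6.02 dB per bit plus 1.76 dB. This rule is not taken from the measurements in this document. Real converters perform worse, so the values below are upper bounds.

```python
def ideal_snr_db(bits):
    """Theoretical SNR of an ideal quantizer for a full-scale sine, in dB."""
    return 6.02 * bits + 1.76


if __name__ == "__main__":
    for bits in (12, 16, 24):
        print(f"{bits} bit: {ideal_snr_db(bits):.1f} dB")
```

By this estimate 12 bit stays below 80 dB (about 74 dB), while 16 bit leaves some headroom (about 98 dB).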

Throughput of TCP/IP Stack

It is hard to get reliable numbers, since packet drops vary with the timing within the system, so a continuous data stream is assumed. The reaction time to incoming packets is also sometimes too slow for the full channel bandwidth. This is acceptable as long as not all data has to be used and the microcontroller can drop the packets it does not need.

From the Microchip help file (Microchip TCP/IP Stack 5.36.2, July 14, 2011):

I collected only the relevant data: TCP/IP is used in its smallest mode of 200 bytes (most common with OSC [OSC]), and UDP carries a constant load.

The values are the stack throughput in kByte/s. Taking 100 Mbit/s as 10 MByte/s, the percentage column shows how much of this channel is reached with UDP.

uP                 Ethernet    MIPS  Interface   TCP/IP     UDP        UDP share of  Comment
                                                 (kByte/s)  (kByte/s)  100 Mbit/s
dsPIC33FJ256GP710  ENC28J60    40    SPI 8 MHz   75         258        25,8 %        escher 1.0, 10 Mbit/s (share refers to 10 Mbit/s)
dsPIC33FJ256GP710  ENC624J600  40    SPI 8 MHz   94         554        5,5 %         like in escher 1.0
PIC32MX795F512L    ENC624J600  80    SPI 13 MHz  154        784        7,8 %
dsPIC33FJ256GP710  ENC624J600  40    PSP mode    202        2038       20,4 %        looks good
PIC32MX360F512L    ENC624J600  80    PSP mode    200        2071       20,7 %
PIC32MX795F512L    PHY         80    Internal    393        8449       84,5 %
dsPIC33FJ256GP710  MRF24WB0M   40    SPI 8 MHz   8          48                       WLAN chip
PIC32MX795F512L    MRF24WB0M   80    SPI 20 MHz  9          53                       WLAN chip
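The percentage column can be derived from the UDP throughput as follows. This is a sketch using the document's assumption of 10 bits on the wire per byte, so that 100 Mbit/s corresponds to 10000 kByte/s; the function name is chosen for illustration.

```python
def link_share_percent(kbytes_per_s, link_mbit=100, wire_bits_per_byte=10):
    """Share of the link capacity used by a measured throughput, in percent."""
    link_kbytes_per_s = link_mbit * 1000 / wire_bits_per_byte
    return 100 * kbytes_per_s / link_kbytes_per_s


if __name__ == "__main__":
    # UDP throughput in kByte/s, taken from the table above
    print(f"PIC32 internal MAC: {link_share_percent(8449):.1f} %")
    print(f"dsPIC33 + ENC624J600, PSP: {link_share_percent(2038):.1f} %")
    # The ENC28J60 row matches its 25,8 % only against a 10 Mbit/s link
    print(f"dsPIC33 + ENC28J60: {link_share_percent(258, link_mbit=10):.1f} %")
```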

There have been some rumours that alternative stacks such as lwIP [lwip] have more throughput, but we have not found any numbers for the Microchip products. We will do some experiments later and correct this assumption if necessary. The table also shows that the interface technology between the chips is a major limit, not only the stack:

Estimation of throughput:

Assuming an overhead of 10 bits per byte (8:2), we can transmit 1 MByte/s over 10 Mbit/s and 10 MByte/s over 100 Mbit/s.

If we look at the table above, these numbers are never reached, except with the PIC32 with its internal 100 Mbit/s Ethernet and PHY interface. So we have to drop packets if the transmission buffer is full and unprocessed. If we use plain Ethernet for data transmission, without the TCP/IP stack, I do not know whether we can reach the limit, but it looks more likely.

The SPI interface between the chips has a maximum clock of 8 MHz, which corresponds to 800 kByte/s, so only the parallel PSP interface reaches the optimal output. Modern chips now offer SPI at about 13 MHz, but this will not improve performance dramatically.
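The SPI ceiling quoted above follows from the clock rate and the same 10 bits per byte assumption. A small sketch, with a function name chosen for illustration, compares this ceiling with the measured UDP throughput from the table:

```python
def spi_kbytes_per_s(clock_mhz, wire_bits_per_byte=10):
    """Upper bound of SPI throughput in kByte/s, one bit per clock cycle."""
    return clock_mhz * 1000 / wire_bits_per_byte


if __name__ == "__main__":
    # (SPI clock in MHz, measured UDP throughput in kByte/s from the table)
    for clock_mhz, measured in ((8, 554), (13, 784)):
        ceiling = spi_kbytes_per_s(clock_mhz)
        print(f"SPI {clock_mhz} MHz: ceiling {ceiling:.0f} kByte/s, "
              f"measured {measured} kByte/s")
```

Even a 13 MHz SPI is limited to 1300 kByte/s by this estimate, well below the roughly 2000 kByte/s measured with the PSP interface.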

Conclusion

It is really hard to decide what is best without doing experiments. In terms of DSP computing power the dsPIC is much better; in terms of the IP stack, where DSP functions are not used, the PIC32 performs twice as well. Using the PIC32 with PHY ICs instead of the ENC624J600 could be another solution, but not the cheapest. Ideally, a board with a PIC32 and a dsPICEP would be optimal for performance, but it adds development time, since both have to be programmed and debugged at the same time. Without good preparation and filtering of the data, the transmission between PIC32 and dsPIC could also become a bottleneck.

Notes

1 The decision will be discussed in a new follow-up document.

2 The document was revised on 14.8.2011, since some statements in it were unclear.

[lwip] http://www.sics.se/~adam/lwip/
[OSC] Open Sound Control, see http://opensoundcontrol.org/