Performance of the TCP/IP stack on different hardware combinations

Some thoughts on the TCP/IP stack and the choice of microcontroller chip.

Data rates for audio transmission on different implementations

To choose the right design for the escher 2 board, I did some research. Most of these assumptions and figures are theoretical and have not been verified yet.

Needed throughput for AUDIO

Since we need low latency, down to 4 samples, we cannot compress the audio very much. A low-latency compression scheme working sample by sample could be more efficient than raw coding, but it cannot achieve high compression rates. So for the following considerations we assume raw, uncompressed audio data.

Ethernet/IP frames and the pauses between packets add further overhead. From our previous experience with escher1 on 10 MBit/s Ethernet, 10 bits per byte is a good assumption.

Maximum theoretical number of channels for different transmission rates, as a rough calculation for further decisions:

Assumption:
Sample rate 44100 Hz, 16 bit/sample
1 byte = 8 bit needs 10 bit on the wire, including overhead

Trans. rate	channels (raw)	channels (with overhead)
MBit/s	calculated / max	calculated / max
10	14,17 / 14	11,33 / 11
100	141,7 / 141	113,3 / 113
1000	1417 / 1417	1133 / 1133
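The table can be reproduced with a few lines of Python. This is only a sketch of the rough calculation above; the function name `max_channels` is made up for illustration, and the 10 bits per byte are the assumption stated in the text.

```python
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
WIRE_BITS_PER_BYTE = 10  # assumption from the text: 8 data bits cost 10 bits on the wire


def max_channels(link_mbit_per_s, wire_bits_per_byte=8):
    """Theoretical (fractional) number of audio channels that fit on a link."""
    bits_per_channel = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * wire_bits_per_byte / 8
    return link_mbit_per_s * 1_000_000 / bits_per_channel


for link in (10, 100, 1000):
    raw = max_channels(link)
    with_overhead = max_channels(link, WIRE_BITS_PER_BYTE)
    # The usable maximum is the integer part of each value.
    print(f"{link:5d} MBit/s: {raw:8.2f} raw, {with_overhead:8.2f} with overhead")
```

The integer part of each result is the maximum number of whole channels, as in the second number of each table cell.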

With TCP/IP or half duplex we need even more bandwidth, and low latency requires small packet sizes.
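How strongly small packets affect the overhead can be estimated from the fixed per-packet cost of UDP over IPv4 over Ethernet. The following is a rough sketch, not a measurement: it assumes standard header sizes without VLAN tags or IP options, 16-bit samples, and one packet per block of samples. The function names are made up for illustration.

```python
# Fixed per-packet cost in bytes (standard Ethernet, IPv4 without options, UDP)
ETH_PREAMBLE_SFD = 8
ETH_HEADER = 14
ETH_FCS = 4
ETH_INTERFRAME_GAP = 12
ETH_MIN_PAYLOAD = 46      # shorter payloads are padded to this size
IPV4_HEADER = 20
UDP_HEADER = 8


def wire_bytes(audio_bytes):
    """Bytes occupied on the wire by one UDP packet carrying audio_bytes."""
    eth_payload = max(ETH_MIN_PAYLOAD, IPV4_HEADER + UDP_HEADER + audio_bytes)
    return (ETH_PREAMBLE_SFD + ETH_HEADER + eth_payload
            + ETH_FCS + ETH_INTERFRAME_GAP)


def wire_bits_per_audio_byte(channels, samples_per_packet, bytes_per_sample=2):
    """Bits on the wire needed per byte of audio payload."""
    audio = channels * samples_per_packet * bytes_per_sample
    return 8 * wire_bytes(audio) / audio


# 8 channels, packets of 4, 16 and 64 samples each
for samples in (4, 16, 64):
    print(samples, round(wire_bits_per_audio_byte(8, samples), 2))
```

Under these assumptions the 10 bits per byte used above only hold for packets with a few hundred bytes of audio; packets of only 4 samples cost noticeably more.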

Use case 3D Ambisonics 4th order:

The target use case is an installation with up to 64 speakers (= channels), with the possibility to monitor them via microphones placed near them.

For 4th order we need 25 Ambisonics channels. With a direct-to-speaker solution we could need up to 64 channels for oversampled Ambisonics setups (see Ambisonics theory for more information). Feedback should be possible from at least 8 microphones at a time.
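The 25 channels follow from the usual rule for full 3D Ambisonics, where order N needs (N+1)^2 channels. A minimal sketch (the function name is made up for illustration):

```python
def ambisonics_channels(order):
    """Number of channels of a full 3D Ambisonics signal of the given order."""
    return (order + 1) ** 2


for order in range(1, 5):
    print(f"order {order}: {ambisonics_channels(order)} channels")
```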

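If we want to mix different UDP streams on the card, we have to multiply by the number of streams to be mixed (the x4 rows in the table below). The table lists estimated data rates and the resulting fill level of the standard Ethernet rates. The data rates are raw (channels x resolution x sample rate), so the overhead of 20% (10 bits per byte) still comes on top. From our experience, the behaviour is very good as long as we do not load the Ethernet beyond 33%: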

channels	resolution	samplerate	datarate	10MBit/s	100MBit/s	1GBit/s
8	12 bit	32000Hz	3 MBit/s	30 %	3 %	0,3 %
8	16 bit	44100Hz	6 MBit/s	60 %	6 %	0,6 %
25	16 bit	44100Hz	17 MBit/s	170 %	17 %	1,7 %
25x4	16 bit	44100Hz	71 MBit/s	710 %	71 %	7,1 %
64	16 bit	44100Hz	45 MBit/s	450 %	45 %	4,5 %
64x4	16 bit	44100Hz	180 MBit/s	1800 %	180 %	18,0 %
8	24 bit	48000Hz	9 MBit/s	90 %	9 %	0,9 %
25	24 bit	48000Hz	28 MBit/s	280 %	28 %	2,8 %
25x4	24 bit	48000Hz	115 MBit/s	1150 %	115 %	11,5 %
64	24 bit	48000Hz	74 MBit/s	740 %	74 %	7,4 %
64x4	24 bit	48000Hz	295 MBit/s	2950 %	295 %	29,5 %
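The rows of the table can be recomputed as follows. This is a sketch under the assumption that the data rate column is the plain product of channels, mixing factor, resolution and sample rate; the results agree with the table to within rounding. The function names and the check against the 33% guideline are only for illustration.

```python
def stream_rate_mbit(channels, bits, sample_rate_hz, mix_factor=1):
    """Raw audio data rate in MBit/s, without any network overhead."""
    return channels * mix_factor * bits * sample_rate_hz / 1e6


def utilisation_percent(rate_mbit, link_mbit):
    """Share of the link that the stream occupies, in percent."""
    return 100 * rate_mbit / link_mbit


def within_guideline(rate_mbit, link_mbit, limit_percent=33):
    """True if the load stays below the 33% rule of thumb from the text."""
    return utilisation_percent(rate_mbit, link_mbit) <= limit_percent


for channels, mix in ((8, 1), (25, 1), (25, 4), (64, 1), (64, 4)):
    rate = stream_rate_mbit(channels, 16, 44_100, mix)
    shares = [utilisation_percent(rate, link) for link in (10, 100, 1000)]
    print(f"{channels}x{mix}: {rate:6.1f} MBit/s",
          " ".join(f"{s:7.1f} %" for s in shares),
          "ok on 100MBit/s" if within_guideline(rate, 100) else "too much for 100MBit/s")
```

With these numbers, 25 channels at 16 bit and 44100 Hz stay below the 33% guideline on a 100 MBit/s link, while 64 channels do not.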

For our Ambisonics use case we need at least 100 MBit/s. However, some (cheap) networks do not reach more than 80% throughput, and less than 50% on half duplex, which is often seen with cheap GBit networks. Since the Ambisonics channels are broadcast on the local network, each speaker can calculate its own signal from the bus, so we can drive more speakers than there are channels on the bus.

Another format, 32-bit float, which doubles the data rate compared to 16 bit, is rarely used in embedded systems without a floating point unit, so we do not consider it here. In any case, in installations not going beyond 100 dBA we do not need much more than 80 dB signal-to-noise ratio if the signals are scaled well.

On the other hand, we have to check whether our device can process this data. GBit/s networks are mostly used only with FPGAs or similar chips with an integrated TCP/IP stack. Here we concentrate on 100 MBit/s, which modern microcontrollers can handle, although they often cannot use the full 100 MBit/s, as shown below.

Throughput of TCP/IP Stack

It is hard to get reliable numbers, since packet drops vary with the timing within the system, so a continuous data stream is assumed. Also, the reaction time to incoming packets is sometimes too slow for the full channel bandwidth. This is acceptable as long as not all data has to be used and the microcontroller can drop the packets it does not need.

From the Microchip help file:

Microchip TCP/IP Stack 5.36.2 - July 14, 2011

I collected only the relevant data: TCP/IP is used in its smallest mode with 200 bytes (most common with OSC [OSC]), and UDP has a constant load.

The values are in kBytes/sec as given for the stack. Taking 100 MBit/s as 10 MByte/s, we can see what percentage of the channel is reached with UDP.

uP	ethernet	MIPS	Interface	TCPIP kByte/s	UDP kByte/s	UDP % of 100MBit/s	comment
dsPIC33FJ256GP710	ENC28J60	40	SPI 8MHz	75	258	25,8 %	escher1.0, % of 10MBit/s
dsPIC33FJ256GP710	ENC624J600	40	SPI 8MHz	94	554	5,5 %	like in escher 1.0
PIC32MX795F512L	ENC624J600	80	SPI 13MHz	154	784	7,8 %
dsPIC33FJ256GP710	ENC624J600	40	PSP Mode	202	2038	20,4 %	looks good
PIC32MX360F512L	ENC624J600	80	PSP Mode	200	2071	20,7 %
PIC32MX795F512L	PHY	80	Internal	393	8449	84,5 %
dsPIC33FJ256GP710	MRF24WB0M	40	SPI, 8MHz	8	48		WLAN CHIP
PIC32MX795F512L	MRF24WB0M	80	SPI,20MHz	9	53		WLAN CHIP
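To relate these figures to the audio use case, the percentage column and a rough channel count can be computed as below. This is only a sketch: it assumes 1 kByte = 1000 bytes, that the UDP figures are pure payload, and 16-bit samples at 44100 Hz. The function names are made up for illustration.

```python
WIRE_BITS_PER_BYTE = 10   # assumption from the text (8 data bits + 2 overhead)
BYTES_PER_SAMPLE = 2      # 16 bit
SAMPLE_RATE_HZ = 44_100


def link_share_percent(udp_kbyte_per_s, link_mbit_per_s):
    """UDP throughput as a percentage of the usable link rate."""
    link_kbyte_per_s = link_mbit_per_s * 1000 / WIRE_BITS_PER_BYTE
    return 100 * udp_kbyte_per_s / link_kbyte_per_s


def audio_channels(udp_kbyte_per_s):
    """Channels of raw audio that fit into the UDP throughput (fractional)."""
    return udp_kbyte_per_s * 1000 / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)


# (description, UDP kByte/s) taken from the table above
measurements = [
    ("dsPIC33F + ENC624J600, SPI 8MHz", 554),
    ("dsPIC33F + ENC624J600, PSP", 2038),
    ("PIC32MX795 internal MAC + PHY", 8449),
]
for name, udp in measurements:
    print(f"{name}: {link_share_percent(udp, 100):.1f} %, "
          f"about {int(audio_channels(udp))} channels")
```

Under these assumptions only the PIC32 with internal Ethernet has enough UDP throughput for 64 raw channels, while the PSP variants are just below the 25 channels needed for 4th order.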

There have been rumours that alternative stacks, such as lwIP [lwip], have more throughput, but we have not found any numbers for the Microchip products. We will do some experiments later and correct this assumption if necessary. But we can see in the table that the interface technology between the chips is also a strong limit, not only the stack:

Estimation of throughput:

Using an overhead of 10 bits per byte (8:2), we can transmit 1 MByte/s via 10 MBit/s and 10 MByte/s via 100 MBit/s.

If we look at the table above, these numbers are never reached; only the PIC32 with internal 100 MBit/s Ethernet and PHY interface comes close. So we have to drop packets if the transmission buffer is full and unprocessed. For high performance, if we use only Ethernet for data transmission, without the TCP/IP stack, I do not know whether we can reach the limit, but it looks more likely.

The SPI interface between the chips has a maximum of 8 MHz, which is 800 kByte/s, so only the parallel PSP interface reaches the optimal output. On modern chips I have now seen SPI at about 13 MHz, but this will not improve the performance dramatically.
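The SPI limit is simple arithmetic: one bit per clock cycle, and the same 10 bits per byte as above. A minimal sketch (the function name is made up; real SPI transfers to the Ethernet controller add command bytes on top, which are ignored here):

```python
WIRE_BITS_PER_BYTE = 10   # same rough assumption as for Ethernet in the text


def spi_kbyte_per_s(clock_hz, bits_per_byte=WIRE_BITS_PER_BYTE):
    """Upper bound of the SPI throughput in kByte/s, one bit per clock."""
    return clock_hz / bits_per_byte / 1000


PSP_UDP_KBYTE_PER_S = 2038   # dsPIC33F + ENC624J600 in PSP mode, from the table

for clock_mhz in (8, 13, 20):
    bound = spi_kbyte_per_s(clock_mhz * 1_000_000)
    print(f"SPI {clock_mhz} MHz: at most {bound:.0f} kByte/s, "
          f"{'below' if bound < PSP_UDP_KBYTE_PER_S else 'above'} the PSP result")
```

Even at 13 MHz the upper bound stays well below what the PSP interface reaches in the table.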

Conclusion

It is really hard to decide what is best without doing experiments. In terms of DSP computing power the dsPIC is much better; in terms of the IP stack, where DSP functions are not used, the PIC32 performs twice as well. Using the PIC32 with a PHY IC instead of the ENC624J600 could be another solution, but not the cheapest. Ideally, a board with a PIC32 and a dsPICEP would be optimal in performance, but this costs additional development time, since we have to program and debug both at the same time. Without good preparation and filtering of the data, the transmission between PIC32 and dsPIC could also become a bottleneck.

Notes

1 The decision will be discussed in a new follow-up document.

2 Revised the document on 14.8.2011, since some statements in it were unclear.

[lwip]

http://www.sics.se/~adam/lwip/

[OSC]

Open Sound Control, see http://opensoundcontrol.org/