TOE100G-IP Full 100G TCP Offload for Alveo Accelerator Card

Today, large amounts of data must be stored and accessed on servers inside the data center. Because the data sets are large, the communication channel between servers in the same location requires high bandwidth.

The network channel for sharing data among servers installed at different locations is also very important.

Why 100G Ethernet?

If the network connection does not have enough bandwidth, or cannot use the available bandwidth effectively, it becomes the critical bottleneck of the overall system.

A 100G Ethernet connection is an ideal solution to the large-data problem in the data center. Transferring data at 100 Gbps at a reasonable infrastructure cost satisfies the data center's requirements.

However, connecting to a 100G Ethernet system through a standard 100G network card has a drawback. Research shows that performance is limited when a standard NIC transfers a single TCP session.

100G Ethernet Limitation

Only about 68 Gbps, or 68% of the maximum bandwidth of 100G Ethernet, can be achieved.

In addition, network performance sometimes drops because the CPU and the OS switch to handle other tasks.

Next, let us look at the CPU tasks for handling TCP/IP packets when using a standard NIC. The software on the CPU consists of many parts, one for processing each network layer.

Starting from the lowest layer, the software implements the device driver, the network subsystem, the TCP/IP stack, the socket interface, and the application.

Offload Engine By Accelerator Card
Ref: https://www.cs.cornell.edu/~qizhec/paper/tcp_2021.pdf

To address this CPU bottleneck, a complete CPU offload engine implemented on an accelerator card is proposed. Most CPU tasks for handling TCP/IP packets are handled by the TOE100G-IP on the Alveo accelerator card instead.

There are two key hardware blocks inside the Alveo accelerator card: the TOE100G-IP and the DMA engine.

In the sender process, the DMA engine transfers the data from the system memory to the TOE100G-IP.

After that, the TOE100G-IP builds Ethernet packets that contain the application data and transfers them to the target system via 100G Ethernet.

In the receiver process, the TOE100G-IP extracts the application data from the Ethernet packets received on 100G Ethernet.

Next, the DMA engine transfers the application data from the TOE100G-IP to the system memory, where the application can process it.

Let's look at the data flow of the send process in more detail.

First, the TOE Application generates the data, called the TCP payload, and writes it to the main memory.

Next, the TOE Application sends a request to the DMA engine, via the TOE function, to transfer the data from the main memory to the TOE100G-IP.

Finally, the TOE Application sends a request to the TOE100G-IP to create Ethernet frames that contain the TCP payload data and send them to the target system.
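
The three send steps above can be sketched as a small software model. This is only an illustration under stated assumptions: the names MainMemory, DmaEngine, ToeEngine, and send_flow are hypothetical and are not the TOE100G-IP interface, and the 8,960-byte payload per frame assumes a 9,000-byte jumbo MTU with 20-byte IP and TCP headers.

```python
from dataclasses import dataclass, field

# Assumption: 9000-byte jumbo MTU minus 20-byte IP and 20-byte TCP headers.
MAX_PAYLOAD = 8960


@dataclass
class MainMemory:
    """Stands in for the host buffer that the TOE Application fills."""
    buffer: bytes = b""


@dataclass
class ToeEngine:
    """Hypothetical model of the TOE: splits staged data into frame payloads."""
    staged: bytes = b""
    sent_segments: list = field(default_factory=list)

    def send(self) -> None:
        # Step 3: create one frame payload per MAX_PAYLOAD bytes and "send" it.
        for offset in range(0, len(self.staged), MAX_PAYLOAD):
            self.sent_segments.append(self.staged[offset:offset + MAX_PAYLOAD])
        self.staged = b""


class DmaEngine:
    """Hypothetical model of the DMA engine: copies host memory into the TOE."""

    @staticmethod
    def transfer(memory: MainMemory, toe: ToeEngine) -> None:
        # Step 2: move the payload from main memory to the TOE.
        toe.staged += memory.buffer


def send_flow(payload: bytes) -> ToeEngine:
    """Run the three send steps in order and return the TOE model."""
    memory = MainMemory()
    toe = ToeEngine()
    memory.buffer = payload          # Step 1: application writes the TCP payload
    DmaEngine.transfer(memory, toe)  # Step 2: request to the DMA engine
    toe.send()                       # Step 3: request to the TOE
    return toe
```

In this model the CPU only issues requests, while the copy and the framing happen inside the DMA and TOE objects, which mirrors the division of work described above.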

When the test application writes incremental data, performance reaches up to 9,180 MB/s. When the application skips generating incremental data and uses dummy data instead, the peak performance on 100G Ethernet, 12,300 MB/s, can be achieved.
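
To relate these figures to the 100 Gbps line rate, the conversion can be sketched as below. It assumes that 1 MB means 10^6 bytes, which the source does not state. Under that assumption, 12,300 MB/s corresponds to 98.4 Gbps and 9,180 MB/s to about 73.4 Gbps.

```python
LINE_RATE_GBPS = 100.0  # nominal 100G Ethernet line rate


def mbytes_per_s_to_gbps(mb_per_s: float) -> float:
    """Convert MB/s to Gbps, assuming 1 MB = 10**6 bytes (an assumption)."""
    return mb_per_s * 1e6 * 8 / 1e9


for label, rate in [("send, incremental data", 9180), ("send, dummy data", 12300)]:
    gbps = mbytes_per_s_to_gbps(rate)
    share = gbps / LINE_RATE_GBPS
    print(f"{label}: {gbps:.1f} Gbps ({share:.0%} of line rate)")
```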

TOE100G-IP on Alveo Card (Send)

In the receive process, the data flow is reversed.

The TOE100G-IP receives Ethernet frames, extracts the TCP payload, and transfers it to the DMA engine.

Next, the DMA engine uploads the data to the main memory and signals the TOE Application that new data has arrived.

The TOE Application reads the data from the main memory and verifies it. Similar to the send process, performance is about 9,700 MB/s when the application verifies the received data. Without data verification, the application shows peak performance of 12,300 MB/s.
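
The receive steps above can be sketched in the same spirit. This is a simplified model, not the TOE100DMATest implementation: the function names are hypothetical, and the incremental test pattern is assumed to be a byte counter that wraps at 256, since the source does not define it.

```python
from collections import deque


def incremental_pattern(length: int, start: int = 0) -> bytes:
    """Assumed test pattern: a byte counter that wraps at 256."""
    return bytes((start + i) % 256 for i in range(length))


def receive_flow(frame_payloads) -> int:
    """Model the receive steps: TOE extract, DMA upload, notify, verify."""
    main_memory = bytearray()
    notifications = deque()

    for payload in frame_payloads:
        # TOE step: the payload has been extracted from a received frame.
        # DMA step: upload it to main memory and signal the application.
        offset = len(main_memory)
        main_memory += payload
        notifications.append((offset, len(payload)))

    # Application step: read each newly arrived region and verify it.
    verified = 0
    while notifications:
        offset, length = notifications.popleft()
        region = bytes(main_memory[offset:offset + length])
        if region != incremental_pattern(length, start=offset):
            raise ValueError(f"data mismatch at offset {offset}")
        verified += length
    return verified
```

The verification loop is the only part that touches every byte on the CPU, which is consistent with the lower figure reported when verification is enabled.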

TOE100G-IP on Alveo Card (Receive)

Now we show a demo of the TOE100G-IP using two accelerator systems.

Each accelerator system consists of an Alveo card, either a U50 or a U250, and a Turnkey system.

Test Environment Set Up

Run the application, TOE100DMATest, on both Turnkey systems.

The left-side console shows the IP initialized in Server mode. The right-side console shows the IP initialized in Client mode.

To show half-duplex transfer, the left-side console selects the Send data test menu with a transfer size of 256 GB, and a jumbo frame size is applied. The right-side console selects the Receive data test menu. With test data generation and verification disabled, 12,300 MB/s can be achieved.
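
The effect of the jumbo frame setting on the achievable payload rate can be estimated with a short calculation. The sketch below assumes a 9,000-byte MTU for the jumbo frame (the demo does not state the exact size), IPv4 and TCP headers without options, standard Ethernet framing overhead, and 1 MB = 10^6 bytes. Under these assumptions the upper bound is roughly 12,390 MB/s with jumbo frames and roughly 11,870 MB/s with a 1,500-byte MTU, so the measured 12,300 MB/s is close to the bound.

```python
# Per-frame Ethernet overhead in bytes:
# preamble (7) + start-of-frame delimiter (1) + MAC header (14)
# + frame check sequence (4) + inter-frame gap (12).
ETH_OVERHEAD = 7 + 1 + 14 + 4 + 12
IP_TCP_HEADERS = 20 + 20             # IPv4 + TCP headers, no options (assumed)
LINE_RATE_BYTES_PER_S = 100e9 / 8    # 100 Gbps expressed in bytes per second


def max_payload_rate_mb_s(mtu: int) -> float:
    """Upper bound on TCP payload throughput in MB/s for a given MTU."""
    payload = mtu - IP_TCP_HEADERS
    wire_bytes = mtu + ETH_OVERHEAD
    return LINE_RATE_BYTES_PER_S * payload / wire_bytes / 1e6


for mtu in (1500, 9000):
    print(f"MTU {mtu}: up to {max_payload_rate_mb_s(mtu):,.0f} MB/s of TCP payload")
```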

Half Duplex Test U250
Half Duplex Test U50

When running full-duplex transfer, the performance is about 10,000 MB/s.

Full Duplex Test U250
Full Duplex Test U50

The TOE100G-IP with Alveo card demo can be applied to real-time data processing applications. The system can transfer large data in a very short time, which is the core requirement of such applications. When the bandwidth is not enough, the number of 100G Ethernet connections can be increased by adding more Alveo cards.
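
As a rough sizing aid, the number of 100G connections needed for a target bandwidth can be estimated as below. This is a sketch under two assumptions that the source does not confirm: each connection sustains the 12,300 MB/s peak reported in the half-duplex test, and total throughput scales linearly with the number of connections (host memory and PCIe are not a bottleneck). The function name links_needed is hypothetical.

```python
import math

PER_LINK_MB_S = 12_300  # peak half-duplex rate reported for one 100G connection


def links_needed(required_mb_s: float, per_link_mb_s: float = PER_LINK_MB_S) -> int:
    """Return how many 100G connections cover the required rate (linear scaling assumed)."""
    if required_mb_s <= 0 or per_link_mb_s <= 0:
        raise ValueError("rates must be positive")
    return math.ceil(required_mb_s / per_link_mb_s)


# Example: an application that must move 30,000 MB/s.
print(links_needed(30_000))
```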

Real-Time Data Processing

One TOE100G-IP is designed to handle the data of one TCP session. When multiple TCP sessions are required to transfer many data types, multiple TOE100G-IPs and DMA engines can be integrated into the accelerator system.

When the accelerator system needs to support both TCP/IP and UDP/IP, the UDP100G-IP can also be integrated to work together with the TOE100G-IP.

UDP/TCP Data Processing
Youtube channel: https://www.youtube.com/c/Dgwayweb