SocketXpress with TCP/IP accelerator Reference Design
This reference design demonstrates how SocketXpress, a custom Linux network socket C library, offloads TCP processing tasks when communicating over network connections. Working seamlessly with Design Gateway's TOE10GLL IP core (TCP Offload Engine), the system enables applications to achieve significantly higher TCP data transfer speeds while maintaining the same user experience and requiring no application recompilation.

Figure 1 The overview of the SocketXpress with TCP/IP accelerator
Traditional network communication relies on the Linux kernel's network stack to handle TCP protocol processing, including connection management, packet segmentation, acknowledgment handling, and flow control. While this software-based approach provides comprehensive protocol support, it can consume substantial CPU resources during high-throughput data transfers, creating performance bottlenecks on resource-constrained edge devices such as the KR260.
Design Gateway presents a reference design that uses high-performance network IP cores, specifically the TOE10GLL-IP and 10GEMAC-IP, implemented in the FPGA logic of the KR260 to fully utilize the 10G Ethernet bandwidth on a single TCP connection. The system provides two communication paths: a hardware TCP offload path using TOE10GLL-IP for maximum single-connection throughput with minimal CPU overhead, and a standard network stack path for multiple connections and comprehensive protocol support.
The SocketXpress library uses the LD_PRELOAD mechanism to intercept standard socket API calls and routes each call either to the hardware acceleration path or to the standard Linux socket implementation. This approach lets socket-based applications benefit from hardware acceleration while preserving their original socket-based programming model.
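The interception step can be sketched as follows. This is a minimal illustration, assuming glibc's dlsym(RTLD_NEXT, ...) is used to reach the original socket(); the toe_fd bookkeeping, the route_to_toe() helper and the open() flags are hypothetical and not taken from the SocketXpress source.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/socket.h>

/* Hypothetical bookkeeping: fd of the single TOE-backed socket, -1 if free. */
static int toe_fd = -1;

/* Routing rule described in the text: IPv4 TCP goes to the TOE only while
 * the TOE is not already in use; everything else uses the Linux socket. */
static bool route_to_toe(int domain, int type)
{
    int base_type = type & ~(SOCK_NONBLOCK | SOCK_CLOEXEC);
    return domain == AF_INET && base_type == SOCK_STREAM && toe_fd < 0;
}

/* Same name and signature as the libc socket(). When this file is built as a
 * shared object and loaded with LD_PRELOAD, the dynamic linker binds the
 * application's socket() calls here first. */
int socket(int domain, int type, int protocol)
{
    static int (*real_socket)(int, int, int);
    if (!real_socket)
        real_socket = (int (*)(int, int, int))dlsym(RTLD_NEXT, "socket");

    if (route_to_toe(domain, type)) {
        int fd = open("/dev/dg_stack", O_RDWR);
        if (fd >= 0) {
            toe_fd = fd;   /* later wrappers compare fds against this */
            return fd;
        }
        /* Device missing or busy: fall back to the kernel stack. */
    }
    return real_socket(domain, type, protocol);
}
```

Built as a shared object (for example `gcc -shared -fPIC -o libroute_demo.so route_demo.c -ldl`, where the file names are placeholders), it would be loaded the same way as libSocketXpress.so, with LD_PRELOAD set on the application's command line.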
This document is divided into three sections based on system components as shown in Figure 1: Hardware, Kernel space, and User space.
· Hardware: User logic and IP cores for acceleration.
· Kernel space: Device drivers that provide the interface between the hardware accelerators and user-space applications.
a) DG 10GEMAC driver: Ethernet MAC driver with DMA-based packet processing
b) DG TOE driver: Manages DMA-based communication with hardware TCP offload engine
· User space: SocketXpress, a custom socket library that interfaces with the hardware accelerators.
The detailed functionality of each system component is described in the following sections.

Figure 2 SocketXpress with TCP/IP accelerator reference design block diagram
The hardware connects to the CPU system via an AXI4-Lite interface for the control path and an AXI4 interface for the data path.
For the control path, user-space software accesses hardware registers through memory mapping. The AXI4-Lite interfaces are implemented using LAxi2EMAC (for 10GEMAC-IP) and LAxi2TOE (for TOE10GLL-IP), as shown in Figure 2.
For the Tx data path, the AXI DMA moves packet data from CPU DDR memory and streams it into FIFO buffers. On the TOE path, the data flows through the TOE10GLL-IP before reaching a 2-to-1 MUX. The MUX gives priority to the TOE path and forwards the selected data to the 10GEMAC-IP for transmission.
For the Rx data path, packet data from the 10GEMAC-IP is forwarded to both paths. On the TOE path, data passes through the TOE10GLL-IP and PktCombined, then a FIFO, and finally the DMA into CPU DDR memory. On the normal path, packets are first checked by the IPFilter hardware before proceeding to a FIFO and the DMA into CPU DDR memory.
The user interface of the TOE10GLL-IP connects to UserRegTOE within the LAxi2TOE module to control and monitor TOE operations through a register map. UserRegTOE interfaces with the CPU through AsyncAxiReg using a register interface, while the CPU connects to AsyncAxiReg via an AXI4-Lite interface.
The DMA controllers are controlled via AXI4-Lite interfaces and are configured with 128-bit memory-mapped and stream data widths, supporting scatter-gather operation with unaligned transfers. DMA control is implemented by two separate Linux platform drivers.
The TOE10GLL-IP operates in Simple mode and connects to the AMD 10G/25G Ethernet Subsystem through a 32-bit AXI4-Stream interface. An IPFilter module is positioned between the TOE and Ethernet subsystem to filter duplicate TCP connection packets. Additionally, a PktCombined module combines payload packets from the TOE10GLL-IP.
This design includes four clock domains:
· CPUClk: Used for CPU communication via the AXI4-Lite bus.
· UserClk: Clock domain for the AXI DMA data path.
· MacTxClk: Synchronized with the Tx EMAC interface and the Tx user data interface.
· MacRxClk: Synchronized with the Rx EMAC interface and the Rx user data interface.
Details of each module are provided below.
The LAxi2TOE module consists of AsyncAxiReg and UserRegTOE. AsyncAxiReg converts the AXI4-Lite signal interface into a simple register interface with a 32-bit data bus, the same width as the AXI4-Lite data bus. It also includes asynchronous logic to handle clock domain crossing between the CPUClk and UserClk domains, so that the two clock domains can communicate.
The register interface is compatible with a single-port RAM interface for write transactions. For read transactions, it is slightly modified from the RAM interface by adding RdReq and RdValid signals to control read latency. Because the register interface shares one address for write and read transactions, write and read operations cannot be performed simultaneously. The timing diagram of the register interface is shown in Figure 3.

Figure 3 Register Interface Timing Diagram
1) To write a register, the timing is similar to that of a single-port RAM. RegWrEn is set to 1b together with a valid RegAddr (register address in 32-bit units), RegWrData (write data for the register), and RegWrByteEn (write byte enable). The byte enable is four bits wide, and each bit indicates whether one byte of RegWrData is valid: when RegWrByteEn[0], [1], [2], and [3] are set to 1b, RegWrData[7:0], [15:8], [23:16], and [31:24] are valid, respectively.
2) To read a register, AsyncAxiReg sets RegRdReq to 1b together with a valid RegAddr. The slave detects the asserted RegRdReq and starts the read transaction. RegAddr remains unchanged until RegRdValid is set to 1b, since the address selects the returned data through multiple layers of multiplexers. The 32-bit read data is returned when RegRdValid is set to 1b.
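The byte-enable rule in step 1 and the 32-bit addressing of RegAddr can be modelled in a few lines of C. This is a behavioural sketch for illustration, not the HDL of UserRegTOE; reg_write() and reg_addr_from_offset() are hypothetical names, and the offset conversion assumes the address offsets in the register tables are byte offsets.

```c
#include <stdint.h>

/* Behavioural model of a register write: byte lane i of RegWrData is
 * written only when RegWrByteEn[i] is 1b; other lanes keep their old value. */
static uint32_t reg_write(uint32_t old_value, uint32_t wr_data, uint8_t byte_en)
{
    uint32_t mask = 0;
    for (int i = 0; i < 4; i++) {
        if (byte_en & (1u << i))
            mask |= 0xFFu << (8 * i);   /* lane i covers bits [8i+7:8i] */
    }
    return (old_value & ~mask) | (wr_data & mask);
}

/* RegAddr counts 32-bit words, so a byte offset from the register map is
 * divided by 4 (assumption: offsets in the tables are byte offsets). */
static uint32_t reg_addr_from_offset(uint32_t byte_offset)
{
    return byte_offset >> 2;
}
```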
UserRegTOE implements the register file: it writes or reads registers in response to write accesses and read requests from the AsyncAxiReg module. The memory map of UserRegTOE is shown in Table 1.
Table 1 Register map definition of UserRegTOE

| Address offset | Register Name | Description |
|---|---|---|
| TOE10GLL-IP register | | |
| 0x00000 | TOE_RST_INTREG | Wr[0]: Mapped to RstB of TOE10GLL-IP |
| 0x00004 | TOE_OPM_INTREG | Wr[16]: Mapped to ARPICMPEn of TOE10GLL-IP; Wr[1:0]: Mapped to DstMacMode of TOE10GLL-IP |
| 0x00008 | TOE_SML_INTREG | Wr[31:0]: Mapped to SrcMacAddr[31:0] of TOE10GLL-IP |
| 0x0000C | TOE_SMH_INTREG | Wr[15:0]: Mapped to SrcMacAddr[47:32] of TOE10GLL-IP |
| 0x00010 | TOE_DMIL_INTREG | Wr[31:0]: Mapped to DstMacAddr[31:0] of TOE10GLL-IP |
| 0x00014 | TOE_DMIH_INTREG | Wr[15:0]: Mapped to DstMacAddr[47:32] of TOE10GLL-IP |
| 0x00018 | TOE_SIP_INTREG | Wr[31:0]: Mapped to SrcIPAddr of TOE10GLL-IP |
| 0x0001C | TOE_DIP_INTREG | Wr[31:0]: Mapped to DstIPAddr of TOE10GLL-IP |
| 0x00020 | TOE_TMO_INTREG | Wr[31:0]: Mapped to TimeOutSet of TOE10GLL-IP |
| 0x00024 | TOE_TIC_INTREG | Wr[0]: Set '1' to clear the read value of TOE_STS_INTREG[2] |
| 0x00030 | TOE_CMD_INTREG | Wr[1:0]: Mapped to TCPCmd of TOE10GLL-IP |
| 0x00034 | TOE_SPN_INTREG | Wr[15:0]: Mapped to TCPSrcPort[15:0] of TOE10GLL-IP |
| 0x00038 | TOE_DPN_INTREG | Wr[15:0]: Mapped to TCPDstPort[15:0] of TOE10GLL-IP |
| 0x00040 | TOE_VER_INTREG | Rd[31:0]: Mapped to IP version of TOE10GLL-IP |
| 0x00044 | TOE_STS_INTREG | Rd[20:16]: Mapped to IPState of TOE10GLL-IP; Rd[2]: TOE10GLL-IP interrupt, asserted to '1' when IPInt is asserted to '1' and cleared by TOE_TIC_INTREG; Rd[1]: Mapped to TCPConnOn of TOE10GLL-IP; Rd[0]: Mapped to InitFinish of TOE10GLL-IP |
| 0x00048 | TOE_INT_INTREG | Rd[31:0]: Mapped to IntStatus of TOE10GLL-IP |
| 0x0004C | TOE_DMOL_INTREG | Rd[31:0]: Mapped to DstMacAddrOut[31:0] of TOE10GLL-IP |
| 0x00050 | TOE_DMOH_INTREG | Rd[15:0]: Mapped to DstMacAddrOut[47:32] of TOE10GLL-IP |
| IPFilter register | | |
| 0x00054 | FILTER_ENABLE | Wr[0]: Mapped to FilterEn of IPFilter module |
| Ethernet MAC register | | |
| 0x00058 | EMAC_LINKSTATUS | Rd[0]: Mapped to Link status of Ethernet MAC |
| AXI4 Stream data FIFO register | | |
| 0x00060 | DMA_TXFIFO_FLUSH | Wr[0]: Set '1' to force AXIS valid signal to '1' |
| 0x00064 | DMA_TXFIFO_RDCNT | Rd[31:0]: Mapped to axis_rd_data_count of FIFO |
| 0x00068 | DMA_RXFIFO_WRCNT | Rd[31:0]: Mapped to axis_wr_data_count of FIFO |
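Software that reads or writes these registers (for example through the DG_IOread/DG_IOwrite ioctls, whose argument layout is not described here) has to pack and unpack bit fields. The sketch below decodes TOE_STS_INTREG and splits a MAC address across TOE_SML_INTREG and TOE_SMH_INTREG. The struct, the function names and the MAC byte ordering are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Byte offsets taken from Table 1 (subset). */
#define TOE_SML_INTREG 0x00008u
#define TOE_SMH_INTREG 0x0000Cu
#define TOE_STS_INTREG 0x00044u

/* Decoded view of TOE_STS_INTREG; the struct itself is illustrative. */
struct toe_status {
    unsigned ip_state;  /* Rd[20:16] IPState */
    bool irq;           /* Rd[2] interrupt flag, cleared via TOE_TIC_INTREG */
    bool conn_on;       /* Rd[1] TCPConnOn */
    bool init_finish;   /* Rd[0] InitFinish */
};

static struct toe_status decode_toe_status(uint32_t sts)
{
    struct toe_status s;
    s.ip_state = (sts >> 16) & 0x1Fu;
    s.irq = (sts >> 2) & 1u;
    s.conn_on = (sts >> 1) & 1u;
    s.init_finish = sts & 1u;
    return s;
}

/* Split a 48-bit MAC address into the values written to TOE_SML_INTREG
 * (SrcMacAddr[31:0]) and TOE_SMH_INTREG (SrcMacAddr[47:32]). Assumes the
 * first octet of the printed address is the most significant byte. */
static void split_mac(uint64_t mac, uint32_t *sml, uint32_t *smh)
{
    *sml = (uint32_t)(mac & 0xFFFFFFFFu);
    *smh = (uint32_t)((mac >> 32) & 0xFFFFu);
}
```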

Figure 4 Rx path block diagram
The IPFilter module shown in Figure 4 filters duplicate TCP connection packets from the 10GEMAC-IP before they reach the Linux network stack. It is controlled by two signals, FilterEn and FilterIPAddr. When FilterEn is set to '1' and the destination IP address of a packet matches the value in FilterIPAddr, the packet is dropped.
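A software model of the drop decision, for illustration only. It assumes untagged Ethernet II frames and lets non-IPv4 frames (such as ARP) pass; the real IPFilter may apply further checks that are not described here. ipfilter_drop() is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Behavioural model of IPFilter for an untagged Ethernet II frame: returns
 * true when the frame must be dropped from the normal (Linux stack) path. */
static bool ipfilter_drop(bool filter_en, uint32_t filter_ip_addr,
                          const uint8_t *frame, size_t len)
{
    if (!filter_en || len < 34)
        return false;                        /* filter off or no full IPv4 header */
    uint16_t ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
    if (ethertype != 0x0800)
        return false;                        /* non-IPv4 (e.g. ARP) passes */
    /* IPv4 destination address: bytes 16..19 of the IP header, which starts
     * after the 14-byte Ethernet header. */
    uint32_t dst_ip = ((uint32_t)frame[30] << 24) | ((uint32_t)frame[31] << 16) |
                      ((uint32_t)frame[32] << 8) | (uint32_t)frame[33];
    return dst_ip == filter_ip_addr;
}
```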
The PktCombined module illustrated in Figure 4 combines payload packets received from the TOE10GLL-IP to reduce CPU copying operations and lower CPU load. The maximum number of packets to combine is configurable through the generic parameter MaxNumCombinedPkt. PktCombined operates continuously and attempts to combine packets whenever possible by holding the AXI last signal low. The AXI last signal is only asserted to '1' under two conditions: when no incoming packets are available, or when the number of combined packets reaches the MaxNumCombinedPkt limit. Additionally, error packets from the 10GEMAC-IP are dropped within this module.
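The tlast rule can be expressed as a small decision evaluated at the end of each incoming packet. This is a hypothetical software model (struct pkt_combiner and end_of_packet() are illustrative names); it ignores timing and does not model the dropping of error packets.

```c
#include <stdbool.h>

struct pkt_combiner {
    unsigned max_num_combined_pkt;  /* generic MaxNumCombinedPkt */
    unsigned combined;              /* packets merged into the current output */
};

/* Called at the end of each incoming packet. Returns true when the output
 * AXI last signal must be asserted after this packet. */
static bool end_of_packet(struct pkt_combiner *c, bool next_pkt_available)
{
    c->combined++;
    if (!next_pkt_available || c->combined >= c->max_num_combined_pkt) {
        c->combined = 0;   /* the next packet starts a new combined output */
        return true;
    }
    return false;          /* hold last low and merge the next packet */
}
```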
The AXI DMA Controller can be generated using the Vivado IP catalog with the following settings, as shown in Figure 5.
· Enable Scatter Gather Engine : Enable
· Width of Buffer Length Register : 16
· Address Width : 32
· Enable Read Channel : Enable
· Enable Write Channel : Enable
Read Channel
· Number of Channel : 1
· Memory Map Data Width : 128
· Stream Data Width : 128
· Max Burst Size : 256
· Allow Unaligned Transfers : Enable
Write Channel
· Number of Channel : 1
· Memory Map Data Width : 128
· Stream Data Width : 128
· Max Burst Size : 256
· Allow Unaligned Transfers : Enable

Figure 5 Example AXI DMA configuration page
Details of the AXI DMA Controller for UltraScale devices are available at the following link: https://www.xilinx.com/products/intellectual-property/axi_dma.html
The LAxi2EMAC module is connected to the CPU through the AXI4-Lite bus and consists of AsyncAxiReg and UserRegEMAC. UserRegEMAC reads the status registers of the AMD 10G/25G Ethernet Subsystem and generates or clears the interrupt signal in response to write accesses and read requests from the AsyncAxiReg module. The memory map of UserRegEMAC is shown in Table 2.
The link status interrupt is asserted to '1' when a change in link status is detected from the AMD 10G/25G Ethernet Subsystem.
Table 2 Register map definition of UserRegEMAC

| Address offset | Register Name | Description |
|---|---|---|
| 0x00000 | EMAC_LINKSTATUS | Rd[0]: Mapped to Link status of Ethernet MAC |
| 0x00004 | EMAC_IPVERSION | Rd[31:0]: Mapped to IP version of Ethernet MAC |
| 0x00008 | EMAC_CLEAR_IRQ | Wr[0]: Set '1' to clear Interrupt link status |
Details of the TOE10GLL-IP are described in its datasheet: https://dgway.com/products/IP/Lowlatency-IP/dg_toe10gllip_data_sheet_xilinx_en/
The AMD 10G/25G Ethernet Subsystem can be generated using the Vivado IP catalog with the following settings, as shown in Figure 6.
· Select Core : Ethernet MAC+PCS/PMA 32-bit
· Speed : 10.3125G
· Data Path Interface : AXI Stream
· Num of Cores : 1
· Auto Negotiation Logic : None
· Control and Statistic Interface : Control and Status Vectors

Figure 6 Example of AMD 10G/25G Ethernet Subsystem configuration page
Details of the AMD 10G/25G Ethernet Subsystem are available at the following link: https://www.amd.com/products/adaptive-socs-and-fpgas/intellectual-property/ef-di-25gemac.html
This reference design uses the 5.15.0-1027-xilinx-zynqmp kernel image, based on Ubuntu Desktop 22.04 LTS. Device drivers in kernel space facilitate communication between the hardware and user-space software, as shown in Figure 7.

Figure 7 The overview of the SocketXpress with TCP/IP accelerator
The kernel space component consists of two drivers: the DG 10GEMAC driver enables the Linux network stack to communicate with the 10GEMAC-IP, while the DG TOE driver provides a direct interface for user space applications to communicate with the TOE10GLL-IP.
The DG 10GEMAC driver is derived from the Xilinx driver, with unused functions removed and link status detection added. It is a Linux network device driver implemented as a platform driver that integrates with the Linux device tree framework for automatic hardware discovery and resource allocation. When the hardware is loaded, the driver registers with the platform bus and uses device tree matching to detect compatible hardware through the compatible string. The driver supports two Ethernet MAC cores: LL10GEMAC-IP (a low-latency EMAC IP core developed by Design Gateway) and the AMD 10G/25G Ethernet Subsystem.
The driver implements a complete network interface integrated with the Linux network stack. It operates in interrupt-driven mode, with separate IRQ handling for TX completion, RX packet reception, and link status changes. It supports configurable MTU sizes up to 9000 bytes for jumbo Ethernet frames. Link status changes are detected and reported through IRQs, providing real-time network connectivity feedback.
Buffer Descriptor Management and Scatter Gather DMA
The driver uses a Scatter Gather (SG) DMA architecture built on circular rings of buffer descriptors (BDs), which enables efficient handling of non-contiguous memory buffers. Each BD contains control and status fields, the buffer address, and a "next" pointer that chains the descriptors together to form the scatter-gather list. With SG DMA, the hardware processes multiple buffer descriptors in sequence without CPU intervention.
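The ring structure can be illustrated with a simplified descriptor in C. This sketch uses plain pointers for the "next" link, whereas the AXI DMA engine follows physical addresses in descriptors that must meet its alignment rules; struct bd and bd_ring_init() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified buffer descriptor; the real AXI DMA SG descriptor has more
 * fields and strict alignment requirements. */
struct bd {
    struct bd *next;       /* chains descriptors into the SG list */
    uintptr_t buf_addr;    /* buffer address */
    uint32_t control;      /* e.g. buffer length and start/end-of-frame flags */
    uint32_t status;       /* written back by the DMA engine on completion */
};

/* Link n descriptors into a circular ring: the last one points back to the
 * first, so the engine can keep walking the list indefinitely. */
static void bd_ring_init(struct bd *ring, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        ring[i].next = &ring[(i + 1) % n];
        ring[i].buf_addr = 0;
        ring[i].control = 0;
        ring[i].status = 0;
    }
}
```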
Network Stack Integration
The driver integrates seamlessly with the Linux network stack through socket buffer (SKB) management.
For reception, it pre-allocates SKBs using netdev_alloc_skb() and maps their data buffers to DMA-accessible addresses stored in RX descriptors. Upon packet arrival, the driver unmaps the DMA buffer, sets the SKB length with skb_put(), determines the protocol using eth_type_trans(), and delivers the packet to the network stack via netif_receive_skb().
For transmission, the driver handles both linear and fragmented SKBs efficiently. Linear packet data is mapped directly, while fragmented packets use skb_frag_dma_map() to map each fragment individually across multiple buffer descriptors. The driver stores SKB pointers within buffer descriptors to enable proper cleanup during TX completion, using dev_kfree_skb_irq() in interrupt context to free transmitted packets.
For more details, please refer to:
· Official Git repository: https://git.launchpad.net/~canonical-kernel/ubuntu/+source/linux-xilinx-zynqmp/+git/jammy/tree/drivers/net/ethernet/xilinx?h=master-next
The DG TOE driver is a Linux platform driver that provides access to the TCP offload functionality through a character device interface (/dev/dg_stack). This device serves as the interface layer between the SocketXpress library and the TOE10GLL-IP core, replacing traditional kernel network stack operations. The driver registers with the platform bus using device tree matching through the compatible string.
Buffer Descriptor Management and Zero-Copy Architecture
While employing the same Scatter Gather DMA architecture as the EMAC driver with circular buffer descriptor rings, the TOE driver implements a fundamentally different memory management strategy optimized for zero-copy operations. Instead of allocating individual SKBs for each descriptor, the driver allocates large memory regions for both TX and RX operations. These buffers are then subdivided across multiple buffer descriptors, with each BD pointing to its designated segment within the larger memory block. This approach enables direct user-space access to DMA buffers through memory mapping, eliminating costly data copying between kernel and user space.
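The subdivision of one large region across many BDs is plain arithmetic, sketched below under the assumption of equal-sized segments. The 16-bit length check follows from the AXI DMA setting "Width of Buffer Length Register : 16" (at most 65535 bytes per BD). The function names are hypothetical. Because user space maps the same region with mmap(), segment i appears at the same offset in the mapping.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: one DMA region of total_len bytes split into n_bd
 * equal segments, with BD i pointing at base + i * segment length. */
static size_t bd_segment_len(size_t total_len, size_t n_bd)
{
    return total_len / n_bd;
}

static uint64_t bd_segment_addr(uint64_t base, size_t total_len, size_t n_bd,
                                size_t i)
{
    return base + (uint64_t)i * bd_segment_len(total_len, n_bd);
}

/* A 16-bit buffer length register limits one BD to 2^16 - 1 bytes, so each
 * segment must be non-empty and fit within that limit. */
static bool bd_layout_valid(size_t total_len, size_t n_bd)
{
    if (n_bd == 0)
        return false;
    size_t seg = bd_segment_len(total_len, n_bd);
    return seg > 0 && seg <= 0xFFFF;
}
```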
Character Device Interface and IOCTL Commands
Rather than integrating with the Linux network stack, the driver provides a user-kernel interface through a character device.
File Operations:
· poll() : Monitors device readiness. Returns POLLIN when received data is available and POLLOUT when transmit buffer space is available. It also returns a ready state when the connection is closed.
· mmap() : Memory mapping for zero-copy data access. Exposes TX and RX DMA buffers directly to user space with cached memory access, allowing applications to read and write data without kernel buffer copies.
· write() : Buffers data written to the character device. Copies data from user space to an internal kernel buffer for later transmission.
ioctl commands:
· DG_SEND : Passes the user Tx BD buffer pointer to kernel space and updates the DMA BD pointer to start DMA transmission.
· DG_RECV : Retrieves the received packet lengths as an array with one entry per BD.
· DG_UpdateUserRxPtr : Passes the user Rx BD buffer pointer to kernel space.
· DG_GetUserRxPtr : Obtains the current Rx BD buffer pointer from kernel space.
· DG_GetDMATxPtr : Obtains the current Tx BD buffer pointer from kernel space.
· DG_IOread/DG_IOwrite : Provide direct access to hardware registers.
· DG_GET_MAC_ADDR : Retrieves the device MAC address.
· DG_FLUSH_TX/RX : Forces the DMA data buffers to be flushed and cleans up data in the DMA.
· DG_FLUSH_WRBUFFER : Flushes data buffered by write() system calls and returns it to user space.
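The user and DMA pointers exchanged through DG_UpdateUserRxPtr, DG_GetUserRxPtr and DG_RECV suggest the usual producer/consumer arithmetic on a circular ring. The sketch below assumes the pointers are BD indices and that DG_RECV returns one length per BD; the real encoding of these values is not documented here, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Filled Rx BDs not yet consumed by the application, in a ring of n BDs.
 * user_ptr: next BD the application will read; dma_ptr: next BD the DMA
 * engine will fill. Equal pointers mean "nothing pending" here, which
 * assumes the ring is never allowed to become completely full. */
static size_t rx_bds_pending(size_t user_ptr, size_t dma_ptr, size_t n)
{
    return (dma_ptr + n - user_ptr) % n;
}

/* Sum the per-BD lengths over the pending BDs, wrapping around the end of
 * the ring. */
static size_t rx_bytes_pending(const uint32_t *bd_len, size_t user_ptr,
                               size_t dma_ptr, size_t n)
{
    size_t count = rx_bds_pending(user_ptr, dma_ptr, n);
    size_t total = 0;
    for (size_t k = 0; k < count; k++)
        total += bd_len[(user_ptr + k) % n];
    return total;
}
```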
User space includes the SocketXpress library and the TCP testing C program. SocketXpress is a custom Linux network socket C library that works with Design Gateway's TOE10GLL-IP core to offload TCP tasks from the CPU. The TCP testing C program is a socket-based C application for throughput measurement and data integrity verification. It demonstrates how existing applications can switch from the standard Linux socket implementation to the SocketXpress library without source code modifications, showcasing the performance benefits of hardware-accelerated TCP processing.
The SocketXpress library is a custom Linux network socket C library that provides hardware-accelerated TCP processing through Design Gateway's TOE10GLL-IP core. The library uses LD_PRELOAD to intercept standard POSIX socket API calls and standard C library functions, transparently redirecting IPv4 TCP operations to the TOE hardware while remaining compatible with most of the API used by existing applications. When the TOE device is already in use or unavailable, the library automatically falls back to the original Linux socket implementation. Functions that are not intercepted, and intercepted functions called without a SocketXpress file descriptor, continue to use the original Linux implementation. The library has been tested with the applications listed in Table 3.
Table 3 Applications tested with SocketXpress library

| Application Name | Version | Test Scenario |
|---|---|---|
| Iperf | 2.1.5 | Established connection to Iperf server and performed bandwidth performance tests |
| lynx | 2.9.0 | Browsed web content and navigated to external sites |
| curl | 7.81.0 | Retrieved web pages from public websites |
| Telnet | 0.17-44 | Connected to remote server and executed basic terminal commands |
| wget | 1.21.2 | Downloaded web pages and files from remote servers |
| ssh | 8.9p1 | Established secure connection to remote server and executed basic commands |
| scp | 8.9p1 | SCP file upload and download operations |
| ftp | 20210827-4 | FTP file upload and download operations |
| git | 2.34.1 | Cloned remote repository and checked out branches |
| mysql | 8.0.43 | Connected to MySQL database server and queried data |
| links | 2.25 | Browsed web content and navigated to external sites |
Note: The applications listed have been tested for basic usage scenarios. If you encounter any bugs or require support for additional applications, please contact us.
· Socket
int socket(int domain, int type, int protocol);
The socket creation function routes each request either to DG hardware acceleration or to the standard Linux socket. When an application requests an IPv4 TCP socket (AF_INET + SOCK_STREAM) and the TOE device is not already in use, the function creates a custom socket by opening the character device /dev/dg_stack. For all other socket types, or when the TOE device is already in use, the function creates a standard Linux socket using the original socket() system call.
· Bind
int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
The bind operation allows applications to specify the source IP address and port for SocketXpress connections. For a SocketXpress socket, the function extracts the IPv4 address and port from the sockaddr_in structure and stores them internally. These values override the default source IP and port configured through environment variables in subsequent connect() or accept() operations.
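The extraction step can be sketched as follows, assuming the bound values are kept in host byte order in a per-socket structure. struct toe_src and toe_bind() are hypothetical names, and the error codes are illustrative choices rather than documented SocketXpress behaviour.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>

/* Hypothetical per-socket state recorded by bind() for a later connect()
 * or accept(). */
struct toe_src {
    uint32_t ip;     /* host byte order */
    uint16_t port;   /* host byte order */
};

static int toe_bind(struct toe_src *src, const struct sockaddr *addr,
                    socklen_t addrlen)
{
    if (addr == NULL || addrlen < (socklen_t)sizeof(struct sockaddr_in)) {
        errno = EINVAL;
        return -1;
    }
    if (addr->sa_family != AF_INET) {
        errno = EAFNOSUPPORT;    /* only IPv4 is offloaded */
        return -1;
    }
    const struct sockaddr_in *in = (const struct sockaddr_in *)addr;
    src->ip = ntohl(in->sin_addr.s_addr);   /* network -> host byte order */
    src->port = ntohs(in->sin_port);
    return 0;
}
```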
· Close
int close(int fd);
The close operation cleans up SocketXpress connections. For a SocketXpress socket, it soft-resets the TOE10GLL-IP and flushes the remaining data in the DMA buffers, then closes the underlying file descriptor using the original close() system call. This ensures that both software and hardware state are properly reset.
· getpeername
int getpeername(int socket, struct sockaddr *address, socklen_t *address_len);
Returns remote peer address information for established SocketXpress connections. The function populates a sockaddr_in structure with the target IP address and port stored during connection establishment. Note that in server mode, the TOE10GLL-IP hardware does not provide the target port information, so it remains 0.
· getsockname
int getsockname(int socket, struct sockaddr *address, socklen_t *address_len);
Returns local socket address information for SocketXpress connections. The function populates a sockaddr_in structure with the source IP address and port used for the connection.
· fcntl
int fcntl(int fd, int cmd, ...);
The fcntl function handles file control operations for SocketXpress, supporting only non-blocking mode configuration through F_SETFL with O_NONBLOCK flag and status retrieval through F_GETFL.
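Applications normally switch a socket to non-blocking mode with the F_GETFL/F_SETFL pattern below, which stays within the subset SocketXpress handles. This is standard POSIX code; set_nonblocking() is a helper name chosen for the example.

```c
#include <fcntl.h>

/* Read the current status flags, then write them back with O_NONBLOCK set.
 * Returns 0 on success, -1 on error (errno set by fcntl). */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}
```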
· setsockopt
int setsockopt(int sockfd, int level, int optname, const void *optval, socklen_t optlen);
The setsockopt function provides compatibility with standard socket options for SocketXpress. Only the TCP_NODELAY option affects socket behavior.
All other options, including SO_SNDBUF, SO_RCVBUF, SO_TIMESTAMP, SO_REUSEADDR, and TCP_MAXSEG, are stored in the socket_opts structure but have no functional effect.
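Enabling TCP_NODELAY uses the standard setsockopt() call shown below; under SocketXpress this is the one option described as changing behaviour (data is copied to the DMA buffer without merging). enable_nodelay() is an illustrative helper written against the standard POSIX API.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable TCP_NODELAY on a TCP socket. Returns 0 on success, -1 on error. */
static int enable_nodelay(int sockfd)
{
    int one = 1;
    return setsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof one);
}
```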
· getsockopt
int getsockopt(int sockfd, int level, int option_name, void *option_value, socklen_t *option_len);
The getsockopt function retrieves stored socket option values from the socket_opts structure. It returns previously set values for all supported options, plus SO_ERROR (always 0) and SO_TYPE (always SOCK_STREAM).
Socket options initialize with:
· tcp_nodelay = 0
· tcp_maxseg = 1460
· so_sndbuf = 65536
· so_rcvbuf = 65536
· so_timestamp = 0
· so_reuseaddr = 0
The SocketXpress library supports seamless integration with existing applications through LD_PRELOAD, allowing standard Linux socket applications to use hardware-accelerated TCP processing without recompiling. Applications such as lynx, curl, and custom socket programs can be easily switched from standard Linux socket to the SocketXpress library.
The SocketXpress library replaces standard Linux socket functions using the LD_PRELOAD mechanism:
> [Environment variables] LD_PRELOAD=libSocketXpress.so <Application>
The library supports configuration through environment variables that can be specified as additional parameters in the command line:
· SOURCE_IP=<TOE IP>[/subnet_mask]: Specifies the source IP address for TOE10GLL-IP operations, with an optional subnet mask in CIDR notation (e.g., 192.168.11.11/24). Used by applications that do not specify an IP address through bind(). If no subnet mask is specified, it defaults to /24.
· TARGET_IP=<Host IP>: Specifies the target IP address when the FPGA operates in server mode. This sets the expected client IP address for incoming connections.
· SOURCE_PORT=<TOE Port>: Specifies the source port for TOE10GLL-IP operations (default: 60000). For applications that do not specify a port through bind(), this port is used and automatically incremented for each new connection.
· GATEWAY_IP=<Gateway IP>: Specifies the gateway IP address when communication must be routed through a specific gateway. By default, the gateway IP is taken from the Linux ARP table.
· TX_COMBINE=<1|true>: Controls TX packet combining. By default, TX combining is enabled, merging data before transmission for efficiency. Setting this to "0" or "false" disables combining, so data is sent immediately without merging or waiting. This lowers latency but may reduce overall throughput because smaller packets are sent.
· STS_LOG=<0|false>: Enables internal status logging. When set to "1" or "true", the library prints operational messages to the terminal. Logging is disabled by default.
These environment variables provide default values that can be overridden by explicit bind() operations or connection parameters.
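Parsing SOURCE_IP with its optional prefix and the documented /24 default could look like the sketch below. parse_source_ip() is a hypothetical helper, not the library's parser; in practice the input string would come from getenv("SOURCE_IP").

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Parse "a.b.c.d[/prefix]"; the prefix defaults to 24. The address is
 * returned in host byte order. Returns false on malformed input. */
static bool parse_source_ip(const char *text, uint32_t *ip, unsigned *prefix)
{
    char addr[INET_ADDRSTRLEN];
    const char *slash = strchr(text, '/');
    size_t len = slash ? (size_t)(slash - text) : strlen(text);
    if (len == 0 || len >= sizeof addr)
        return false;
    memcpy(addr, text, len);
    addr[len] = '\0';

    struct in_addr in;
    if (inet_pton(AF_INET, addr, &in) != 1)
        return false;              /* requires a full dotted-quad address */
    *ip = ntohl(in.s_addr);

    *prefix = 24;                  /* documented default */
    if (slash) {
        char *end;
        long p = strtol(slash + 1, &end, 10);
        if (end == slash + 1 || *end != '\0' || p < 0 || p > 32)
            return false;
        *prefix = (unsigned)p;
    }
    return true;
}
```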
· connect
int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
The connect operation establishes TCP connections using TOE hardware acceleration. To prevent mode conflicts, it first checks that SocketXpress is not already in server mode, then performs the complete TOE10GLL-IP initialization.
Connection Process:
1. Extracts target IP and port from sockaddr_in structure
2. Configures TOE hardware registers for active open mode
3. Compares the source and target IP addresses to determine whether they are in the same subnet. If they are not, the system makes the TOE send an ARP request to the gateway to obtain the gateway MAC address, then sets the TOE to fixed MAC mode using that gateway MAC address
4. Initiates TOE hardware for TCP connection establishment and enables IPFilter hardware
5. Waits for connection completion with signal interrupt support
6. Updates connection state flags and returns 0 on successful completion
Error returns:
· EBUSY : already in server mode
· ENETUNREACH : link is down
· ETIMEDOUT : connection fails
· EINTR : interrupted by signal
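Step 3 of the connection process reduces to a masked comparison. The sketch below assumes host-byte-order IPv4 addresses and the prefix length from SOURCE_IP; arp_target() is a hypothetical helper naming which address the TOE must resolve: the target itself when it is on the same subnet, otherwise the gateway.

```c
#include <stdbool.h>
#include <stdint.h>

/* Netmask for a prefix length 0..32 (prefix 0 special-cased to avoid a
 * 32-bit shift, which is undefined in C). */
static uint32_t prefix_mask(unsigned prefix)
{
    return prefix == 0 ? 0 : 0xFFFFFFFFu << (32 - prefix);
}

static bool same_subnet(uint32_t src_ip, uint32_t dst_ip, unsigned prefix)
{
    uint32_t m = prefix_mask(prefix);
    return (src_ip & m) == (dst_ip & m);
}

/* Address whose MAC must be resolved before the TCP handshake. */
static uint32_t arp_target(uint32_t src_ip, uint32_t dst_ip, unsigned prefix,
                           uint32_t gateway_ip)
{
    return same_subnet(src_ip, dst_ip, prefix) ? dst_ip : gateway_ip;
}
```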
· accept
int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);
The accept operation implements server-side connection acceptance using TOE hardware. To prevent mode conflicts, it first checks that SocketXpress is not already in client mode, then performs the complete TOE10GLL-IP initialization for passive connection establishment.
Connection Process:
1. Configures TOE hardware registers for passive open mode with ARP/ICMP response
2. Initiates TOE hardware for TCP connection establishment and enables IPFilter hardware
3. Waits for incoming TCP connection with signal interrupt support
4. Populates client address information in provided sockaddr structure upon successful connection
5. Creates and returns a dummy file descriptor (server_fd) for API compatibility
Client Information: On success, the addr structure is populated with the connecting client's IP address. The port is not available from the TOE hardware, so it is set to 0.
Error returns:
· EBUSY : already in client mode
· ENETUNREACH : link is down
· EINTR : interrupted by signal
· send and write
ssize_t send(int sockfd, const void *buf, size_t len, int flags);
ssize_t write(int fd, const void *buf, size_t count);
The send operations transmit data through the TOE hardware in both blocking and non-blocking modes. Because SocketXpress does not support send flags, send() and write() behave identically, and the flags parameter of send() is ignored.
When an application calls send(), the library attempts to merge data before copying it to the DMA buffer for efficiency. When TCP_NODELAY is set through setsockopt(), the data is copied immediately without merging, which lowers latency but may reduce overall throughput because smaller packets are sent. By default, send() operates in blocking mode and retries until all data is transmitted, similar to standard Linux send() behavior. In non-blocking mode, it calls the underlying send function once and returns immediately with an EAGAIN error, possibly after sending part of the data.
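Portable applications usually wrap send() in a loop that tolerates partial writes. The sketch below is standard POSIX code (send_all() is an illustrative name): on a blocking socket it sends everything, and on a non-blocking socket it stops at EAGAIN/EWOULDBLOCK and reports how many bytes went out, so the caller can wait for POLLOUT and retry.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Returns the number of bytes sent (possibly fewer than len on a
 * non-blocking socket), or -1 on a hard error. */
static ssize_t send_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t sent = 0;
    while (sent < len) {
        ssize_t n = send(fd, p + sent, len - sent, 0);
        if (n > 0) {
            sent += (size_t)n;
        } else if (n < 0 && errno == EINTR) {
            continue;                   /* interrupted: retry */
        } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            break;                      /* partial: caller waits and retries */
        } else {
            return -1;                  /* other errors are reported */
        }
    }
    return (ssize_t)sent;
}
```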
· recv and read
ssize_t recv(int sockfd, void *buf, size_t len, int flags);
ssize_t read(int fd, void *buf, size_t count);
The receive operations implement data reception from TOE hardware with support for both blocking and non-blocking modes. read() is equivalent to recv() with flags set to 0.
Supported flags:
· MSG_PEEK: Peek at data without removing it from the buffer
When a user calls recv(), it copies data from the DMA receive buffer to the user buffer up to the specified length. By default, it operates in blocking mode and does not return until data becomes available or the connection is closed. When non-blocking mode is set, it returns immediately with EAGAIN error if no data is currently available.
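MSG_PEEK lets a reader inspect buffered data before consuming it. The helper below is a standard POSIX illustration (peek_then_read() is a hypothetical name) that peeks one byte and then reads it, showing that the peek leaves the data in place.

```c
#include <sys/socket.h>
#include <sys/types.h>

/* Peek at the next byte, then consume it. Returns the byte value, or -1 if
 * nothing could be read or the consumed byte differs from the peeked one. */
static int peek_then_read(int fd)
{
    unsigned char peeked, taken;
    if (recv(fd, &peeked, 1, MSG_PEEK) != 1)
        return -1;
    if (recv(fd, &taken, 1, 0) != 1 || taken != peeked)
        return -1;
    return taken;
}
```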
· sendmsg
ssize_t sendmsg(int sockfd, const struct msghdr *msg, int flags);
The sendmsg() operation works like send() but handles multiple data buffers. It combines all iovec buffers of the msghdr structure into a single contiguous buffer, then calls send() internally to transmit the data. This allows applications to send data from multiple memory locations in a single call. The flags parameter is not supported.
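The combining step described for sendmsg() amounts to a gather copy. A minimal sketch, assuming a heap-allocated staging buffer; gather_iov() is a hypothetical helper, and the library's actual buffer management is not described here.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/* Copy all iovec buffers, in order, into one newly allocated contiguous
 * buffer; *out_len receives the total length. The caller frees the result. */
static char *gather_iov(const struct iovec *iov, size_t iovcnt, size_t *out_len)
{
    size_t total = 0;
    for (size_t i = 0; i < iovcnt; i++)
        total += iov[i].iov_len;

    char *buf = malloc(total ? total : 1);   /* avoid malloc(0) */
    if (!buf)
        return NULL;
    size_t off = 0;
    for (size_t i = 0; i < iovcnt; i++) {
        memcpy(buf + off, iov[i].iov_base, iov[i].iov_len);
        off += iov[i].iov_len;
    }
    *out_len = total;
    return buf;
}
```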
· recvmsg
ssize_t recvmsg(int sockfd, struct msghdr *msg, int flags);
The recvmsg() operation works like recv() but handles multiple data buffers. It processes the msghdr structure by allocating a temporary buffer, calling recv() internally to receive the data, and then distributing the received bytes sequentially across all iovec buffers. This allows applications to receive data into multiple memory locations in a single call. The flags parameter is not supported.
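The distribution step of recvmsg() is the inverse scatter copy, sketched below with a hypothetical scatter_iov() helper; bytes beyond the total iovec capacity are not placed.

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Distribute len bytes from a contiguous buffer sequentially across the
 * iovec array. Returns the number of bytes placed. */
static size_t scatter_iov(const char *src, size_t len,
                          const struct iovec *iov, size_t iovcnt)
{
    size_t off = 0;
    for (size_t i = 0; i < iovcnt && off < len; i++) {
        size_t chunk = iov[i].iov_len;
        if (chunk > len - off)
            chunk = len - off;          /* last buffer may be partly filled */
        memcpy(iov[i].iov_base, src + off, chunk);
        off += chunk;
    }
    return off;
}
```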
· fflush
int fflush(FILE *stream);
fflush() flushes buffered data from a FILE stream, using send() to transfer the data through the TOE hardware.
· getc and fgetc
int fgetc(FILE *stream);
int getc(FILE *stream);
fgetc() and getc() read a single character from the receive buffer; getc() operates identically to fgetc(). They use recv() internally to retrieve one byte from the TOE hardware and return EOF when no data is available.
· fgets
char *fgets(char *s, int size, FILE *stream);
fgets() reads character by character using recv() until a newline is encountered, the size limit is reached, or EOF occurs. The string is null-terminated and includes the newline character if one was read.
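The fgets() behaviour can be reproduced with byte-at-a-time recv() calls. recv_line() below is an illustrative sketch, not the library's code; one recv() per byte keeps the logic simple at the cost of one call per character.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Read up to size-1 bytes one at a time, stopping after '\n', at end of
 * stream, or on error. Null-terminates like fgets(); returns NULL if
 * nothing was read. */
static char *recv_line(int fd, char *s, int size)
{
    if (size <= 0)
        return NULL;
    int i = 0;
    while (i < size - 1) {
        char c;
        ssize_t n = recv(fd, &c, 1, 0);
        if (n <= 0)
            break;                 /* EOF or error */
        s[i++] = c;
        if (c == '\n')
            break;                 /* keep the newline, as fgets() does */
    }
    if (i == 0)
        return NULL;
    s[i] = '\0';
    return s;
}
```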
The TCP C testing program is a cross-platform throughput measurement utility designed to demonstrate the performance benefits of the SocketXpress library with TOE10GLL-IP compared with standard Linux sockets using the kernel network stack. It also serves as a practical example of how existing applications can switch from the standard socket implementation to hardware-accelerated TCP processing using TOE10GLL-IP without source code modifications.
The program supports both client and server modes with transmit (TX) and receive (RX) operations. It provides two main testing methodologies: throughput testing for performance measurement and data verification testing for integrity validation.
The program can be compiled on both platforms without additional dependencies:
Linux:
> gcc -o TCP TCP.c
Windows:
> gcc -o TCP.exe TCP.c -lws2_32
The program integrates with the SocketXpress library through LD_PRELOAD, so the same binary can switch from the standard Linux socket to hardware-accelerated TCP processing without recompiling:
Standard Socket Operation:
> ./TCP -c|-s -tx|-rx [options]
SocketXpress Library Operation:
> LD_PRELOAD=libSocketXpress.so ./TCP -c|-s -tx|-rx [options]
The program requires mode and operation selection:
Required Arguments:
-c : Client mode (initiates connections)
-s : Server mode (listens for connections)
-tx : Transmit test (send data)
-rx : Receive test (receive data)
Key Optional Arguments:
-b <IP> : Bind source IP for client mode (default: not bound)
-bp <Port> : Bind source port for client mode (default: not bound)
-p <port> : Port number (default: 60000)
-i <IP> : Target IP for client mode (default: 127.0.0.1)
-buf <size> : Buffer size in MB (default: 1, max: 1024)
-cs <size> : Send chunk size in bytes (default: 16384, max: 1073741824)
-sb <size> : Socket buffer size in KB (default: 1024, max: 1048576)
-nodelay : Enable TCP_NODELAY
-v : Enable verification (default buffer size will be set to 1GB)
For TX: sends a 32-bit incremental pattern
For RX: stops when the buffer is full, then verifies the data
The configure_socket_options() function applies socket options that improve throughput. When the -nodelay flag is specified, it enables TCP_NODELAY, which disables Nagle's algorithm so that small writes are sent immediately instead of being coalesced.
The program configures socket buffer sizes using SO_RCVBUF and SO_SNDBUF options, setting both send and receive buffers to the size specified by the -sb parameter (default 1MB). Proper socket buffer sizing is critical for achieving optimal throughput, particularly on high-bandwidth networks where insufficient buffering can create bottlenecks.
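The option setup described for configure_socket_options() can be sketched with setsockopt(). The signature below is an assumption: the -sb value in KB and a nodelay flag are passed in. The function is therefore named configure_socket_options_sketch() to mark it as hypothetical.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical sketch of the setup described for configure_socket_options().
 * sockbuf_kb is the -sb value in KB; nodelay is non-zero when -nodelay
 * is given. Returns 0 on success, -1 if any setsockopt() call fails. */
static int configure_socket_options_sketch(int sockfd, int sockbuf_kb, int nodelay)
{
    if (nodelay) {
        int one = 1;
        /* Disable Nagle's algorithm so small segments go out immediately. */
        if (setsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof one) < 0)
            return -1;
    }
    int bytes = sockbuf_kb * 1024;  /* max 1048576 KB = 2^30 bytes, fits in int */
    /* Same size for both directions, as the -sb option describes. */
    if (setsockopt(sockfd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof bytes) < 0)
        return -1;
    if (setsockopt(sockfd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof bytes) < 0)
        return -1;
    return 0;
}
```

On Linux, the kernel caps the requested SO_SNDBUF/SO_RCVBUF value at net.core.wmem_max/net.core.rmem_max and doubles it for bookkeeping overhead. The effective buffer can therefore differ from the -sb value; getsockopt() reports the value actually applied.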
In transmit mode, the do_transmit() function sends data from a pre-allocated buffer in chunks whose size is set by the -cs parameter (default 16KB). It reuses the same buffer content repeatedly, so no data has to be generated per chunk, and it tracks the total bytes sent until transmission is stopped manually with Ctrl+C.
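Below is a sketch of the transmit loop described for do_transmit(). A SIGINT handler sets a flag when Ctrl+C is pressed. The loop sends chunk-sized slices of the pre-allocated buffer and wraps to the start so that the same content is reused. The signature is an assumption, and so is the limit parameter, which is added so the loop can also stop after a fixed byte count. The offset advances by the number of bytes actually sent, so a partial send() does not break pattern continuity.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Set from the SIGINT handler so Ctrl+C ends the transmit loop. */
static volatile sig_atomic_t stop_requested = 0;

static void on_sigint(int sig)
{
    (void)sig;
    stop_requested = 1;
}

/* Hypothetical sketch of a do_transmit()-style loop. Sends buf in
 * chunk-sized pieces, wrapping to the start of the buffer, until Ctrl+C
 * or until limit bytes are sent (limit == 0 means no limit).
 * Returns the total number of bytes sent. */
static unsigned long long do_transmit_sketch(int sockfd, const char *buf, size_t buf_size,
                                             size_t chunk, unsigned long long limit)
{
    unsigned long long total = 0;
    size_t offset = 0;
    while (!stop_requested && (limit == 0 || total < limit)) {
        size_t len = chunk;
        if (len > buf_size - offset)          /* do not run past the buffer end */
            len = buf_size - offset;
        if (limit != 0 && len > limit - total)
            len = (size_t)(limit - total);
        ssize_t n = send(sockfd, buf + offset, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            break;
        }
        total += (unsigned long long)n;
        offset = (offset + (size_t)n) % buf_size;  /* wrap for reuse */
    }
    return total;
}
```

Install the handler with signal(SIGINT, on_sigint) before entering the loop.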
In receive mode, the do_receive() function receives data in fixed 1MB chunks. It counts the total bytes received without storing the data and continues until stopped manually, which measures raw network reception performance.
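The receive loop described for do_receive() can be sketched as follows. A single 1MB scratch buffer is reused for every recv() call, and only the byte count is kept. do_receive_sketch() and the rx_stop flag are hypothetical names; rx_stop would be set by a SIGINT handler (not shown) to implement the manual stop.

```c
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>

#define RX_CHUNK (1024 * 1024) /* fixed 1MB receive chunk */

/* Hypothetical stop flag, intended to be set by a SIGINT handler. */
static volatile sig_atomic_t rx_stop = 0;

/* Hypothetical sketch of a do_receive()-style loop: reuse one scratch
 * buffer, count bytes, discard the data. Stops on rx_stop, on peer
 * close (recv() returns 0), or on error. Returns total bytes received. */
static unsigned long long do_receive_sketch(int sockfd)
{
    char *chunk = malloc(RX_CHUNK);
    if (chunk == NULL)
        return 0;
    unsigned long long total = 0;
    while (!rx_stop) {
        ssize_t n = recv(sockfd, chunk, RX_CHUNK, 0);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            break;
        total += (unsigned long long)n;
    }
    free(chunk);
    return total;
}
```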
The data verification method checks that TCP offloading delivers the data without corruption. This mode is enabled with the -v flag, which sets the default buffer size to 1GB. Both the server and the client must be run with -v, so that the transmitting side sends the incremental pattern and the receiving side verifies it.
On the transmit side, when -v is enabled, the fill_buffer_incremental() function fills the entire buffer with sequential 32-bit integers, creating a pattern that the receiving end can verify. The function reports the range of generated values, from 0 to the maximum count.
The TX side then sends this pattern using the configured chunk size, tracking the buffer offset so that the pattern remains continuous across chunks.
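A minimal sketch of the pattern generation described for fill_buffer_incremental() treats the buffer as an array of uint32_t and sets word i to i. The signature and the printed message are assumptions. The values are written in host byte order, which matches on both ends only if both hosts share the same endianness (for example, two little-endian machines). The buffer must also be 4-byte aligned, which malloc() guarantees.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch of fill_buffer_incremental(): fill the whole buffer
 * with consecutive 32-bit integers 0, 1, 2, ... in host byte order.
 * Returns the number of 32-bit values written. */
static size_t fill_buffer_incremental_sketch(void *buf, size_t buf_size)
{
    uint32_t *words = buf;
    size_t count = buf_size / sizeof(uint32_t);
    for (size_t i = 0; i < count; i++)
        words[i] = (uint32_t)i;
    if (count > 0)
        printf("Filled %zu values (0 to %zu)\n", count, count - 1);
    return count;
}
```

For the 1GB verification buffer this yields 268435456 values, all of which fit in a uint32_t.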
On the receive side, when -v is enabled, the program receives data until the buffer is full (1GB by default) and then stops reception. It then calls the verify_data() function to compare the received content against the incremental pattern generated on the TX side.
The verification process checks every 32-bit value against the value expected at its sequential position. It reports the total number of errors found and displays the first 10 errors with their positions and values for debugging. Verification concludes with a SUCCESS or FAILED status based on the result.
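The check described for verify_data() can be sketched as a single pass over the buffer. The pass compares word i with the value i, prints the first 10 mismatches, and ends with a SUCCESS or FAILED line. The signature and the output format are assumptions and do not reproduce the program's actual messages.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch of verify_data(): compare each received 32-bit word
 * with its expected index, print the first 10 mismatches, then a
 * SUCCESS/FAILED summary. Returns the number of errors found. */
static size_t verify_data_sketch(const void *buf, size_t buf_size)
{
    const uint32_t *words = buf;
    size_t count = buf_size / sizeof(uint32_t);
    size_t errors = 0;
    for (size_t i = 0; i < count; i++) {
        if (words[i] != (uint32_t)i) {
            if (errors < 10)
                printf("Error at word %zu: expected %u, got %u\n",
                       i, (unsigned)i, (unsigned)words[i]);
            errors++;
        }
    }
    printf("Verification %s: %zu error(s)\n", errors ? "FAILED" : "SUCCESS", errors);
    return errors;
}
```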
This verification method confirms that hardware-accelerated TCP processing through TOE10GLL-IP maintains complete data integrity while achieving higher performance than software-based TCP stacks.
Revision | Date (D-M-Y) | Description
1.00 | 23-Jan-26 | Initial version release