TOE100G-IP Core Data Sheet

Features

Applications

General Description

Functional Description

Control Block

·       Reg

·       TCP Stack

Transmit Block

·       Tx Data Buffer

·       Tx Packet Buffer

·       Packet Builder

·       Async Buffer (Tx)

Receive Block

·       Async Buffer (Rx)

·       Packet Filtering

·       Packet Splitter

·       Rx Data Buffer

User Block

100G Ethernet (MAC) Subsystem

Core I/O Signals

Timing Diagram

IP Initialization

Register Interface

Tx FIFO Interface

Rx FIFO Interface

EMAC Interface

Example usage

Client mode (SRV[1:0] = 00b)

Server mode (SRV[1:0] = 01b)

Fixed MAC mode (SRV[1] = 1b)

PKL and TDL setting in Send command

TDL = N times of PKL

TDL = N times of PKL + Residue

Connection termination of unusual case

Verification Methods

Recommended Design Experience

Ordering Information

Revision History

 

 

 

  Core Facts

Provided with Core

Documentation | Reference design manual, Demo instruction manual
Design File Formats | Encrypted File
Instantiation Templates | VHDL
Reference Designs & Application Notes | Vivado Project, see Reference design manual
Additional Items | Demo on KCU116/ZCU111/Alveo U250/VCK190/FB2CGHH@KU15P card

Support

Support Provided by Design Gateway Co., Ltd.

 

 

Design Gateway Co.,Ltd

E-mail:    ip-sales@design-gateway.com

URL:       design-gateway.com

 

Features

·     TCP/IP stack implementation

·     Support IPv4 protocol

·     Support one session for each TOE100G IP (Multisession can be implemented by using multiple TOE100G IPs)

·     Support both Server and Client mode (Passive/Active open and close)

·     Support Jumbo frame

·     Transmitted packet size must be aligned to 512 bits (the transmit data bus width)

·     Total amount of received data must be aligned to 512 bits (the receive data bus width)

·     Simple data interface by standard FIFO interface at 512-bit data bus

·     Simple control interface by 32-bit single-port RAM interface

·     512-bit AXI4 stream interface with 100G Ethernet MAC

·     Support window scaling feature with selectable buffer size up to 1MB

·     At least 220 MHz user clock frequency recommended

·     Reference design available on KCU116/ZCU111/Alveo U250/FB2CGHH@KU15P/VCK190

·     Data fragmentation is not supported

·     Customized service for following features

·     Unaligned 512-bit data transferring

·     Network parameter assignment by other methods

 

 

Table 1: Example Implementation Statistics (UltraScale+)

Family | Example Device | Buffer size (Tx and Rx) | Fmax (MHz) | CLB Regs | CLB LUTs | CLB(1) | IOB | BRAM Tile | URAM | Design Tools
Kintex UltraScale+ | XCKU5P-FFVB676-2E | 64KB | 350 | 10852 | 10177 | 2120 | - | 53 | - | Vivado2022.1
Kintex UltraScale+ | XCKU5P-FFVB676-2E | 1MB | 330 | 10875 | 10430 | 2116 | - | 22.5 | 64 | Vivado2022.1
Zynq UltraScale+ | XCZU28DR-FFVG1517-2-E | 64KB | 350 | 10826 | 10163 | 2209 | - | 53 | - | Vivado2022.1
Zynq UltraScale+ | XCZU28DR-FFVG1517-2-E | 1MB | 330 | 10851 | 10418 | 2101 | - | 22.5 | 64 | Vivado2022.1
Alveo | U250 | 64KB | 350 | 10826 | 10161 | 2089 | - | 53 | - | Vivado2022.1
Alveo | U250 | 1MB | 330 | 10995 | 10413 | 2079 | - | 22.5 | 64 | Vivado2022.1

 

Notes:

1)     Actual logic resource usage depends on the percentage of unrelated logic.

 

 

Table 2: Example Implementation Statistics (Versal)

Family | Example Device | Buffer size (Tx and Rx) | Fmax (MHz) | CLB Regs | CLB LUTs | Slice(1) | IOB | BRAM Tile | URAM | Design Tools
Versal AI Core | XCVC1902-VSVA2197-2MP-ES | 64KB | 350 | 10829 | 11003 | 2280 |  | 51.5 | - | Vivado2021.2
Versal AI Core | XCVC1902-VSVA2197-2MP-ES | 1MB | 350 | 10912 | 11011 | 2250 |  | 22.5 | 61 | Vivado2021.2

 

 

Applications

The TOE100G IP core enables reliable, high-speed data transfer using the TCP/IP protocol over 100G Ethernet. This solution is frequently used in servers that process large amounts of data and in test systems that require high-bandwidth data logging from multiple sources. Figure 1 and Figure 2 illustrate some FPGA-based applications of TOE100G IP.

 

Figure 1: NVMe over TCP (NVMe-oF) application

 

The first application is the NVMe-oF system, which uses the NVMe over TCP (NVMe/TCP) protocol to enable network access to storage through the NVMe protocol. This allows for low-latency, high-bandwidth data transfer. NVMe-oF protocols include RDMA, InfiniBand, and NVMe/TCP, with the latter being a more cost-effective and extensible option that can be implemented using common network hardware.

Figure 1 shows an NVMe/TCP Host implemented with two TOE100G IPs, one for Admin command transfer and the other for data transfer. With the TOE100G IP, the Host controller only needs to implement the NVMe and NVMe-oF protocols, not the TCP/IP protocol. The TOE100G IP on the data port also offers high-speed data transfer, which is particularly advantageous for CPU-less systems because the host controller can be implemented without a CPU or DDR.

The other side of NVMe-oF is the NVMe/TCP Target, which connects to the SSD Rack and translates data and commands from 100G Ethernet into the NVMe over PCIe protocol for the NVMe SSD. Like the Host, the Target can be designed using either a CPU system or hardwired logic with TOE100G IP.

 

 

Figure 2: Data acquisition system

 

In high-resolution data sources, achieving data transfer rates of up to 12 Gbytes/s can be challenging due to the limited hardware systems and communication channels available. One of the most rapid storage solutions to address this issue is the NVMe SSD. Using the 4-lane PCIe Gen5, NVMe SSDs can read or write data at speeds of up to 12000 Mbytes/s. Combining multiple NVMe SSDs as a RAID0 system can further increase the transfer speeds. Interestingly, the speed of NVMe SSDs at 12000 Mbytes/s is comparable to the performance of 100G Ethernet, which can achieve transfer rates of up to 12 Gbytes/s. By integrating both the NVMe Gen5 SSD and 100G Ethernet, it is possible to design a powerful data acquisition system with remote monitoring capabilities, as illustrated in Figure 2.

 

General Description

 

Figure 3: TOE100G IP Block Diagram

 

The TOE100G IP core is a powerful hardware module that implements the TCP/IP stack and connects with a 100G Ethernet Subsystem module for the lower-layer hardware. Its user interface is composed of a Register interface for control signals and a FIFO interface for data signals. TOE100G IP operates with two clock domains: Clk for the user interface and MacClk for the EMAC interface of the 100G EMAC.

The Register interface uses a 5-bit address to access up to 32 registers, which store network parameters, commands, and system parameters. Each TOE100G IP operates one session to communicate with a single target device. Network parameters must be set before de-asserting the reset signal to execute IP initialization. After the reset operation and parameter initialization are complete, the IP is ready to transfer data with the target device. Network parameters cannot be changed without a reset process. The TOE100G IP has three initialization modes for obtaining the MAC address of the target device; further details of each mode can be found in the IP Initialization topic.

To transfer data with the user, a 512-bit FIFO interface is used. Because the FIFO interface has no byte enable, the data transmitted by the user must be aligned to 512 bits, and so must the packet length and the total amount of transmitted data. On the receive side, data on the Rx FIFO I/F can be read when at least one 512-bit word is available in the Rx data buffer. If the total amount of received data is not aligned to 512 bits, the user cannot read the last partial word and must wait until more data arrives to complete the 512-bit word.
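The alignment rules above can be illustrated with a short calculation. The following Python sketch is not part of the IP deliverables; the function names are hypothetical and only model the 64-byte (512-bit) word size described in this section.

```python
from typing import Tuple

WORD_BYTES = 64  # one 512-bit FIFO word


def is_tx_aligned(length_bytes: int) -> bool:
    """True if a Tx length is a positive multiple of 64 bytes (512 bits)."""
    return length_bytes > 0 and length_bytes % WORD_BYTES == 0


def rx_readable(total_rx_bytes: int) -> Tuple[int, int]:
    """Split a received byte count into (full 512-bit words, leftover bytes)."""
    return divmod(total_rx_bytes, WORD_BYTES)


words, leftover = rx_readable(1000)
print(words, leftover)  # 15 40
```

For 1000 received bytes, 15 words can be read, and the remaining 40 bytes stay in the buffer until 24 more bytes arrive to complete the word.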

The TOE100G IP uses a 512-bit AXI4-ST interface to connect with the 100G Ethernet subsystem. However, the 100G Ethernet Subsystem of each FPGA model has a different user interface. For instance, the 100G Ethernet Subsystem by Xilinx uses a 512-bit AXI4 Stream interface at 322.266 MHz for UltraScale+ devices. Therefore, the TOE100G IP can connect to the 100G Ethernet Subsystem on UltraScale+ devices directly. On the other hand, the Versal device integrates a 100G Ethernet MAC Subsystem that can configure the user interface to be a 384-bit AXI4 Stream interface at 390.625 MHz for Non-Segmented mode. Therefore, an adapter logic is required to convert data width from 384 bits to 512 bits and vice versa to connect with the TOE100G IP.

The TOE100G IP has two Async buffers that allow the user interface to run on an independent clock with a lower frequency than the Ethernet MAC clock. However, a user clock frequency of 220 MHz or more is recommended. If the clock is too slow, the Async buffer may become full and some packets may be lost, and transfer performance drops whenever the data recovery process is required.
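As a rough illustration of the user clock recommendation, the sketch below compares the raw bandwidth of the 512-bit user interface with the 100 Gb/s line rate. It is a back-of-the-envelope model only: it ignores packet headers, inter-frame gaps, and buffer behavior, and the function names are hypothetical.

```python
BUS_BITS = 512          # user data bus width
LINE_RATE_GBPS = 100.0  # nominal 100G Ethernet line rate


def user_bandwidth_gbps(clk_mhz: float) -> float:
    """Raw bandwidth of the 512-bit user interface at a given Clk frequency."""
    return BUS_BITS * clk_mhz / 1000.0


def min_clk_mhz(line_rate_gbps: float = LINE_RATE_GBPS) -> float:
    """Clk frequency at which the raw bus bandwidth equals the line rate."""
    return line_rate_gbps * 1000.0 / BUS_BITS


print(user_bandwidth_gbps(220))  # about 112.64 Gb/s
print(min_clk_mhz())             # 195.3125 MHz
```

Under these simplifying assumptions, a 220 MHz user clock gives roughly 112.6 Gb/s of raw bus bandwidth, which is above the line rate, while the raw break-even point is about 195.3 MHz.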

In accordance with TCP/IP standard, connection establishment is the first step before transferring data. TOE100G IP supports both active open (the IP opens the port) and passive open (the target device opens the port) modes. After a successful connection, data can be transferred via the new connection. To send TCP payload data, the user must set the total transfer size, packet size, and send command to the IP. The TCP payload data is transferred via the TxFIFO interface. Conversely, when the TCP packet is received from the target, the TCP payload data is extracted and stored in the Rx data buffer. The user logic monitors FIFO status to detect the amount of received data and then asserts read enable to read the data via the RxFIFO interface. When there is no more data to transfer, the connection can be terminated by closing the port. TOE100G IP supports both active close (the IP closes the port) and passive close (the target device closes the port) modes.

To meet the requirements of user systems that may be sensitive to memory resources or performance, the buffer size inside the TOE100G IP can be adjusted by the user to accommodate these needs. Specifically, the sizes of the Tx data buffer and Rx data buffer can be modified with a maximum size of 1 MB. Utilizing larger buffer sizes can enhance transfer performance, but 1 MB size requires the use of the window scaling feature of TCP options, which is already implemented in the TOE100G IP. This feature is particularly useful for users who require high-speed data transfers. Further details about the hardware inside the IP are described in the next topic.

 

Functional Description

As shown in Figure 3, TOE100G IP can be divided into three parts, i.e., control block, transmit block, and receive block. The details of each block are described as follows.

Control Block

·       Reg

All parameters of the IP are set via the Register interface, which consists of a 5-bit address signal and 32-bit data signals. The timing of the Register interface is similar to that of a single-port RAM interface, as shown in Figure 7. Write and read accesses share the same address signal. Table 3 describes each register.

 

Table 3: Register map Definition

RegAddr[4:0] | Reg Name | Dir | Bit | Description

00000b | RST | Wr/Rd | [0]

Reset IP. 0b: No reset, 1b: Reset. Default value is 1b.

Once the network parameters have been assigned, the user can execute system initialization by setting this register to 1b and then 0b. This action loads the parameters into the IP and executes the system initialization. If the user needs to update certain parameters, this process must be repeated by setting this register to 1b and then 0b again. The RST register controls the following network parameters: SML, SMH, DML, DMH, DIP, SIP, DPN, SPN, and SRV.

00001b | CMD | Wr | [1:0]

User command. 00b: Send data, 10b: Open connection (active), 11b: Close connection (active), 01b: Undefined. The command operation begins after the user sets CMD register.

In order to start a new operation by setting this register, the system must first be in the Idle state. To confirm that the system is not busy, the user should read bit[0] of CMD register or RegDataA1 output, which should be equal to 0b.

00001b | CMD | Rd | [0]

System busy flag. 0b: Idle, 1b: IP is busy.

00001b | CMD | Rd | [3:1]

Current IP status. 000b: Send data, 001b: Idle, 010b: Active open, 011b: Active close, 100b: Receive data, 101b: Initialization, 110b: Passive open, 111b: Passive close.

00010b | SML | Wr/Rd | [31:0]

Define 32-bit lower MAC address (bit [31:0]) for this IP.

To update this value, the IP must be reset by RST register.

00011b | SMH | Wr/Rd | [15:0]

Define 16-bit upper MAC address (bit [47:32]) for this IP.

To update this value, the IP must be reset by RST register.

00100b | DIP | Wr/Rd | [31:0]

Define 32-bit target IP address.

To update this value, the IP must be reset by RST register.

00101b | SIP | Wr/Rd | [31:0]

Define 32-bit IP address for this IP.

To update this value, the IP must be reset by RST register.

00110b | DPN | Wr/Rd | [15:0]

Define 16-bit target port number. Unused when the port is opened in passive mode.

To update this value, the IP must be reset by RST register.

00111b | SPN | Wr/Rd | [15:0]

Define 16-bit port number for this IP.

To update this value, the IP must be reset by RST register.

01000b | TDL | Wr | [31:0]

Total Tx data length in byte unit. The value must be aligned to 64-byte because bit[5:0] are not used. Valid range is 64-0xFFFFFFC0.

The user must first set this register before setting CMD register = Send data (00b). When the IP executes the ‘Send data’ command and asserts Busy to 1b, the system will read this register, allowing the user to subsequently set the TDL register for the next command. If the same TDL is used in the subsequent command, the user is not required to set TDL again.

01000b | TDL | Rd | [31:0]

Remaining transfer length, in bytes, that has not yet been transmitted.

 


01001b | TMO | Wr | [31:0]

Define timeout value for awaiting the return of Rx packet from the target. The counter runs based on the Clk signal provided by the user, with the timer unit being equal to 1/Clk. If the packet is not received within the specified time, TimerInt will be asserted to 1b. For further information of TimerInt, please refer to the Read value of TMO[7:0] register. It is recommended to set the TMO to a value greater than 0x6000.

01001b | TMO | Rd | [31:0]

The details of timeout interrupt are shown in TMO[7:0]. Other bits are read for IP monitoring.

[0]-Timeout from not receiving ARP reply packet.

After timeout, the IP resends ARP request until ARP reply is received.

[1]-Timeout from not receiving SYN and ACK flag during active open operation.

After timeout, the IP resends the SYN packet 16 times and then sends a FIN packet to close the connection.

[2]-Timeout from not receiving ACK flag during passive open operation.

After timeout, the IP resends the SYN/ACK packet 16 times and then sends a FIN packet to close the connection.

[3]-Timeout from not receiving FIN and ACK flag during active close operation.

After the 1st timeout, the IP sends RST packet to close connection.

[4]-Timeout from not receiving ACK flag during passive close operation.

After timeout, the IP resends the FIN/ACK packet 16 times and then sends an RST packet to close the connection.

[5]-Timeout from not receiving ACK flag during data transmit operation.

After timeout, the IP resends the previous data packet.

[6]-Timeout from Rx packet lost, Rx data FIFO full, or wrong sequence number.

The IP generates duplicate ACK to request data retransmission.

[7]-Timeout from too small receive window size when running Send data command and setting PSH[2] to 1b. After timeout, the IP retransmits data packet, similar to TMO[5] recovery process.

[21]-Lost flag when the sequence number of the received ACK packet is skipped. As a result, TimerInt is asserted and TMO[6] is equal to 1b.

[22]-FIN flag is detected during sending operation.

[23]-Rx packet is ignored due to Rx data buffer full (fatal error).

[27]-Rx packet lost detected.

[30]-RST flag is detected in Rx packet.

[31],[29:28],[26:24]-Internal test status

01010b | PKL | Wr/Rd | [15:0]

TCP data length of each Tx packet in byte unit. The value must be aligned to 64-byte because bit[5:0] are not used. Valid range is 64-8960. Default value is 1408 bytes, which is the largest 64-byte-aligned data length that fits in a non-jumbo frame.

While the Send data command is running (Busy=1b), the user must not set this register.

Similar to TDL register, the user does not need to set PKL register again if the next command uses the same packet length.

01011b | PSH | Wr/Rd | [2:0]

Sending mode when running Send data command.

[0]-Disable to retransmit packet.

0b: Generate the duplicate data packet for the last data packet in Send data command when TDL value is not equal to N times of PKL value to accelerate ACK packet (default).

1b: Disable the duplicate data packet.

[1]-PSH flag value in TCP header for all transmitted packet.

0b: PSH flag = 0b (default).

1b: PSH flag = 1b.

 


[2]- Enable to retransmit data packet when Send data command is paused until timeout, caused by the receive window size being smaller than the packet size. This flag is designed to resolve the system hang problem resulting from lost window update packet. Activating data retransmission prompts the target device to regenerate the lost window update packet. All following conditions must be met to initiate data retransmission.

(1) PSH[2] is set to 1b.

(2) The current command is ‘Send data’ and not all data has been sent yet.

(3) The receive window size is smaller than the packet size.

(4) The timer set by the TMO register has expired.

0b: Disable the feature (default), 1b: Enable the feature.

01100b | WIN | Wr/Rd | [9:0]

Threshold value in 1Kbyte unit to initiate window update packet transmission.

Default value is 0 (Not enable window update transmission).

The IP sends the window update packet when the free space in the Rx data buffer increases by an amount greater than the threshold value from the value in the most recently transmitted packet. For example, if the user sets WIN=000001b (1 Kbyte) and the window size of the most recently transmitted packet is 2 Kbyte, when the user reads 1 Kbyte data from the IP and the free space in the Rx data buffer is updated from 2 Kbyte to be 3 Kbyte, the IP detects that the increased window size is greater than the threshold value of 1 Kbyte (3 KB – 2 KB). As a result, the IP sends the window update packet to update the receive buffer size.

01101b | ETL | Wr | [31:0]

Extended total Tx data length in byte unit. The value must be aligned to 64-byte and bit[5:0] are not used. The user can set this register during the Send data command operation (Busy=1b) to extend the total Tx data length. This allows for continuous data transmission without having to resend a new command to the IP. However, there are some important considerations to use this feature:

1) The ETL register must be programmed when the read value of TDL is greater than the size of the Tx data buffer to ensure that Busy is not de-asserted to 0b before setting the ETL register.

2) The set value of ETL must be less than the maximum value of TDL (0xFFFFFFC0) minus the read value of TDL, to avoid overflow value.

For example, the user sets TDL to 3.5 Gbytes and then sets CMD register to Send data. After the IP completes 2 Gbytes of data (remaining size = 1.5 Gbytes), the user sets the ETL register to 1.5 Gbytes. The total transmit length is equal to 5 Gbytes (3.5 Gbytes of TDL + 1.5 Gbytes of ETL).

01110b | SRV | Wr/Rd | [1:0]

00b: Client mode (default). When the RST register changes from 1b to 0b, the IP sends an ARP request to obtain the Target MAC address from the ARP reply returned by the target device. The IP busy signal is de-asserted to 0b after receiving the ARP reply.

01b: Server mode. When RST register changes from 1b to 0b, the IP waits for an ARP request from the target to obtain Target MAC address. After receiving the ARP request, the IP generates an ARP reply and then de-asserts the IP busy signal to 0b.

1Xb: Fixed MAC Mode. When the RST register changes from 1b to 0b, the IP updates all internal parameters and then de-asserts IP busy to 0b. Target MAC address is loaded through the DML/DMH register.

Note: In Server mode, when RST register changes from 1b to 0b, the target device must resend an ARP request for the TOE100G IP to complete the IP initialization process.

01111b | VER | Rd | [31:0]

IP version

10000b | DML | Wr/Rd | [31:0]

Define 32-bit lower target MAC address (bit [31:0]) for this IP when SRV[1]=1b (Fixed MAC).

To update this value, the IP must be reset by RST register.

10001b | DMH | Wr/Rd | [15:0]

Define 16-bit upper target MAC address (bit [47:32]) for this IP when SRV[1]=1b (Fixed MAC).

To update this value, the IP must be reset by RST register.
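The register map in Table 3 and the RST sequence (set to 1b, write the network parameters, set to 0b) can be sketched as an ordered list of register writes. This is a minimal illustration, not vendor code. The register addresses come from Table 3, but the helper names, the example addresses and ports, and the byte ordering of the MAC and IP values (first octet in the most significant byte) are assumptions that should be checked against the reference design. Fixed MAC mode would additionally need DML and DMH, which are omitted here.

```python
import ipaddress

# Register addresses taken from Table 3 (5-bit RegAddr).
REG = {
    "RST": 0x00, "CMD": 0x01, "SML": 0x02, "SMH": 0x03,
    "DIP": 0x04, "SIP": 0x05, "DPN": 0x06, "SPN": 0x07,
    "TDL": 0x08, "TMO": 0x09, "PKL": 0x0A, "PSH": 0x0B,
    "WIN": 0x0C, "ETL": 0x0D, "SRV": 0x0E, "VER": 0x0F,
    "DML": 0x10, "DMH": 0x11,
}

SRV_CLIENT, SRV_SERVER, SRV_FIXED_MAC = 0b00, 0b01, 0b10


def split_mac(mac: str):
    """Split 'aa:bb:cc:dd:ee:ff' into (lower 32 bits, upper 16 bits).

    Assumption: the first octet is the most significant byte.
    """
    value = int(mac.replace(":", ""), 16)
    return value & 0xFFFFFFFF, (value >> 32) & 0xFFFF


def ip_to_word(ip: str) -> int:
    """Pack a dotted IPv4 address into 32 bits (assumed: first octet on top)."""
    return int(ipaddress.IPv4Address(ip))


def init_writes(src_mac, src_ip, dst_ip, src_port, dst_port, srv=SRV_CLIENT):
    """Return the ordered (RegAddr, value) writes for one initialization."""
    sml, smh = split_mac(src_mac)
    return [
        (REG["RST"], 1),                   # hold the IP in reset
        (REG["SML"], sml),
        (REG["SMH"], smh),
        (REG["DIP"], ip_to_word(dst_ip)),
        (REG["SIP"], ip_to_word(src_ip)),
        (REG["DPN"], dst_port),
        (REG["SPN"], src_port),
        (REG["SRV"], srv),
        (REG["RST"], 0),                   # release reset: parameters are loaded
    ]


# Example values below are placeholders chosen for illustration only.
for addr, value in init_writes("00:01:02:03:04:05", "192.168.100.42",
                               "192.168.100.25", 60000, 60001):
    print(f"RegAddr={addr:05b}b  RegWrData=0x{value:08X}")
```

Each tuple corresponds to one write cycle on the Register interface (RegAddr, RegWrData with RegWrEn=1b). After the final write, the user logic would wait for the busy flag to return to 0b before issuing commands.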

 

 

·       TCP Stack

The TCP stack is responsible for controlling the modules involved in interfacing with the user and transferring packets via EMAC. The IP operation involves two phases - IP initialization and data transfer. After the RST register transitions from 1b to 0b, the initialization phase begins. The SRV[1:0] are used to set the initialization mode, which can be Client mode, Server mode, or Fixed MAC mode. The TCP stack reads the parameters from the Reg module and sets them in the Transmit and Receive blocks for packet transfer with the target device. Once initialization is complete, the IP enters the data transfer phase.

To transfer data between the TOE100G IP and the target device, three processes are involved: port opening, data transfer, and port closing. The IP supports active open or close by sending SYN or FIN packets when the user sets the CMD register to 10b (port opening) or 11b (port closing). Alternatively, the port can be opened or closed by the target device (passive mode) when the TCP Stack receives a SYN or FIN packet. While the port is being opened or closed, the Busy flag is asserted to 1b. Once all packets are transferred, Busy is de-asserted to 0b. The ConnOn signal can be used to check whether the port is completely opened or closed. Data can be transferred when ConnOn is asserted to 1b (indicating that the port is completely opened).
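The active-open handshake between the user logic and the IP can be sketched as a small polling routine. The CMD encodings and status codes come from Table 3; everything else is a hypothetical software model: `read_cmd`, `write_cmd`, and `conn_on` stand in for the CMD register read value (or RegDataA1), a CMD register write, and the ConnOn output. Real user logic would be a hardware state machine and must also account for the delay before Busy is asserted.

```python
CMD_SEND, CMD_OPEN, CMD_CLOSE = 0b00, 0b10, 0b11

# CMD read value, bits [3:1] (Table 3).
STATUS = {
    0b000: "Send data", 0b001: "Idle", 0b010: "Active open",
    0b011: "Active close", 0b100: "Receive data", 0b101: "Initialization",
    0b110: "Passive open", 0b111: "Passive close",
}


def decode_cmd(read_value: int):
    """Return (busy, status name) from a CMD register read value."""
    busy = bool(read_value & 0x1)
    status = STATUS[(read_value >> 1) & 0x7]
    return busy, status


def wait_idle(read_cmd, max_polls: int = 1000) -> bool:
    """Poll the busy flag until it clears or the poll budget runs out."""
    for _ in range(max_polls):
        busy, _status = decode_cmd(read_cmd())
        if not busy:
            return True
    return False


def active_open(read_cmd, write_cmd, conn_on, max_polls: int = 1000) -> bool:
    """Open the port from the IP side; True if ConnOn reports an open port."""
    if not wait_idle(read_cmd, max_polls):   # a new command needs Busy = 0b
        return False
    write_cmd(CMD_OPEN)                      # CMD = 10b: open connection
    if not wait_idle(read_cmd, max_polls):   # wait for the open to finish
        return False
    return bool(conn_on())


print(decode_cmd(0b0101))  # (True, 'Active open')
```

The same pattern would apply to an active close by writing `CMD_CLOSE` and expecting ConnOn to return to 0b.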

To send the data, user data is stored in the Tx data and Tx packet buffers. Packet Builder uses the network parameters set by the user to build TCP header, and then the data of Tx data buffer is appended to the TCP packet. The Transmit block then sends the TCP packet to the target device via Ethernet MAC. If the target device receives the data correctly, an ACK packet is returned to Receive block. The TCP Stack monitors the status of the Transmit and Receive blocks to confirm that the data has been sent successfully. If the data is lost, the TCP Stack pauses the current data transmission and initiates the data retransmission process in Transmit block.

When the Receive block receives data, TCP Stack checks the order of the received data. If the data is in the correct order, a normal ACK packet is generated by the Transmit block. Otherwise, the TCP Stack starts the lost data recovery process by instructing the Transmit block to generate duplicate ACKs to the target device.

Table 4: TxBuf/RxBufBitWidth Parameter description

Value of BitWidth | Buffer Size | Implemented Memory type
9 | 32Kbyte | Block RAM
10 | 64Kbyte | Block RAM
11 | 128Kbyte | Ultra RAM
12 | 256Kbyte | Ultra RAM
13 | 512Kbyte | Ultra RAM
14 | 1Mbyte | Ultra RAM
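The values in Table 4 follow from treating BitWidth as the address width of a buffer that is 512 bits (64 bytes) wide, so the size is 2^BitWidth x 64 bytes. The short sketch below reproduces the table; the function names are hypothetical.

```python
WORD_BYTES = 64  # the buffer is 512 bits wide


def buffer_bytes(bit_width: int) -> int:
    """Buffer size in bytes for a TxBufBitWidth/RxBufBitWidth value."""
    if not 9 <= bit_width <= 14:
        raise ValueError("BitWidth must be in the range 9-14")
    return (1 << bit_width) * WORD_BYTES


def memory_type(bit_width: int) -> str:
    """Memory type listed in Table 4 for a given BitWidth."""
    buffer_bytes(bit_width)  # reuse the range check
    return "Block RAM" if bit_width <= 10 else "Ultra RAM"


for bw in range(9, 15):
    print(bw, buffer_bytes(bw) // 1024, "Kbyte", memory_type(bw))
```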

 

 

Transmit Block

Transmit block contains two buffers - Tx data buffer and Tx packet buffer – whose sizes can be adjusted through parameter assignment. A larger buffer size may improve transmit performance. Data from the Tx data buffer is split into packets based on the packet size and stored in the Tx packet buffer. TCP header is constructed using the network parameters from the Reg module and then combined with the TCP data from the Tx packet buffer to form a complete TCP packet. The data in the Tx data buffer is flushed after the target device sends an ACK packet. Once the Send data command is completed, the user can initiate the next command.
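To illustrate how a Send data command is divided into packets, the sketch below validates TDL and PKL against the ranges given in Table 3 and computes the number of full-size packets plus any residue. The function name is hypothetical, and how the IP transmits the residue is covered in the "PKL and TDL setting in Send command" topic rather than modeled here.

```python
WORD_BYTES = 64
PKL_MIN, PKL_MAX = 64, 8960
TDL_MIN, TDL_MAX = 64, 0xFFFFFFC0


def plan_send(tdl: int, pkl: int = 1408):
    """Return (full packets, residue bytes) for one Send data command."""
    if pkl % WORD_BYTES or not PKL_MIN <= pkl <= PKL_MAX:
        raise ValueError("PKL must be a multiple of 64 in the range 64-8960")
    if tdl % WORD_BYTES or not TDL_MIN <= tdl <= TDL_MAX:
        raise ValueError("TDL must be a multiple of 64 in the range 64-0xFFFFFFC0")
    return divmod(tdl, pkl)


print(plan_send(5632))  # (4, 0)
print(plan_send(5696))  # (4, 64)
```

With the default PKL of 1408 bytes, a TDL of 5632 bytes is exactly four packets, while 5696 bytes leaves a 64-byte residue after four full packets.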

·       Tx Data Buffer

The size of this buffer is determined by the “TxBufBitWidth” parameter of the IP, with valid value ranging from 9 – 14 (32KB to 1MB), which corresponds to the address size of a 512-bit buffer as shown in Table 4. This buffer stores data from the user to prepare the transmit packet sent to the target device. Data is removed from the buffer when the target device confirms that the data has been completely received. When the buffer size is large enough, the IP can send multiple data packets to the target device without waiting for an ACK packet to clear the buffer. The user can continuously store new data in the Tx data buffer without waiting for long periods. This results in the best transmit performance on a 100G Ethernet connection. However, if there is significant latency time due to the carrier, networking interface, or target system, all the data in the Tx data buffer may be transferred before an ACK packet is returned to flush the buffer. In such cases, the user must pause filling the buffer with new data, resulting in reduced transmit performance.

If the total user data is greater than the value of the TDL register, the buffer still holds the remaining data after the current Send command completes. This data can be used by the next Send command. All data in the buffer is flushed when the connection is closed or the IP is reset.

Note: The IP cannot send the packet if the data stored in the buffer is less than the transmit size. The IP must wait until the data from user is sufficient to create one packet.

·       Tx Packet Buffer

This buffer stores at least one complete packet before forwarding a packet to Async buffer (Tx).

·       Packet Builder

The TCP packet is comprised of a header and data. The Packet builder first receives network parameters from the Reg module and uses them to construct the TCP header. The TCP and IP checksum are also calculated for the header. Once the header is fully constructed, it is combined with the data from the Tx packet buffer and then transmitted to the Async buffer (Tx).
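For reference, the IP header checksum mentioned above is the standard Internet checksum (one's-complement sum of 16-bit words). The Python sketch below shows the generic algorithm only; it says nothing about how the Packet Builder hardware is actually structured, and the sample header bytes are an arbitrary illustration.

```python
def internet_checksum(data: bytes) -> int:
    """One's-complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                       # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Sample 20-byte IPv4 header with the checksum field set to zero.
header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
print(hex(internet_checksum(header)))  # 0xb861
```

A receiver can run the same function over a header that already contains its checksum; a result of zero indicates that the header checksum is correct.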

·       Async Buffer (Tx)

The Async buffer (Tx) transfers packets from the Clk domain to the MacClk domain and includes essential logic to interface with 100G EMAC. When the Clk frequency is too low, it can result in a lower data transfer rate than EMAC interface, causing decreased performance. To avoid this issue, a Clk frequency of at least 220 MHz is recommended.

 

Receive Block

The Receive block contains the Rx data buffer, which stores the data received from the target device. The received data is stored in the buffer when the header in the packet matches the expected value, set by the network parameters inside the Reg module, and when the IP and TCP checksum are correct. If any of these conditions are not met, the received packet is rejected. Increasing the size of the Rx data buffer may improve the receive performance. Additionally, the TOE100G IP can reorder packets if only one packet is out of order. For example, if the packet order is #1, #3, #2, and #4 (where packet #2 is interchanged with packet #3), the TOE100G IP can fix the order. However, if more than one packet is out of order, such as packet #1, #3, #4, and #2 (where packets #3 and #4 are received before packet #2), the TOE100G IP cannot reorder the packets. In this scenario, the IP generates duplicate ACK packets so that the target device retransmits the data.

·       Async Buffer (Rx)

The Async buffer (Rx) forwards EMAC packets from the MacClk domain to the Clk domain and includes logic for interfacing with the 100G EMAC. As with the Async buffer (Tx), a Clk frequency of at least 220 MHz is recommended, both to prevent performance drops in the transmit direction and to prevent the Async buffer (Rx) from becoming full and discarding packets received from the EMAC.

·       Packet Filtering

This module verifies the header of each Rx packet to determine its validity. A packet is valid only if all of the following conditions are met.

(1)   The network parameters must match the values set in the Reg module, such as the MAC address, IP address, and Port number.

(2)   The packet must either be an ARP packet or a TCP/IPv4 packet without a data fragment flag.

(3)   The IP header length and TCP header length must be valid, with the IP header length being equal to 20 bytes and the TCP header length being between 20 and 60 bytes.

(4)   Both the IP checksum and TCP checksum must be correct.

(5)   The data pointer, as decoded by the sequence number, must be within a valid range.

(6)   The acknowledge number must be within a valid range.
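The six conditions above can be read as a single predicate over a parsed packet header. The sketch below models only the TCP/IPv4 path and leaves ARP handling out. The data classes and field names are hypothetical, and condition (1) is simplified to a destination MAC/IP/port comparison, so this is an illustration of the rule set rather than a description of the Packet Filtering hardware.

```python
from dataclasses import dataclass


@dataclass
class LocalParams:            # values set in the Reg module (hypothetical model)
    mac: int
    ip: int
    port: int


@dataclass
class TcpRxHeader:            # already-parsed fields of one Rx packet
    dst_mac: int
    dst_ip: int
    dst_port: int
    fragmented: bool
    ip_header_len: int        # bytes
    tcp_header_len: int       # bytes
    ip_checksum_ok: bool
    tcp_checksum_ok: bool
    seq_in_range: bool
    ack_in_range: bool


def tcp_packet_valid(h: TcpRxHeader, p: LocalParams) -> bool:
    """Apply conditions (1)-(6) to a TCP/IPv4 packet header."""
    return (
        (h.dst_mac, h.dst_ip, h.dst_port) == (p.mac, p.ip, p.port)   # (1)
        and not h.fragmented                                         # (2)
        and h.ip_header_len == 20 and 20 <= h.tcp_header_len <= 60   # (3)
        and h.ip_checksum_ok and h.tcp_checksum_ok                   # (4)
        and h.seq_in_range                                           # (5)
        and h.ack_in_range                                           # (6)
    )


local = LocalParams(mac=0x000102030405, ip=0xC0A8642A, port=60000)
good = TcpRxHeader(0x000102030405, 0xC0A8642A, 60000, False, 20, 32,
                   True, True, True, True)
print(tcp_packet_valid(good, local))  # True
```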

 

·       Packet Splitter

The purpose of this module is to extract TCP payload data from incoming packets and store it in the Rx data buffer, after removing the packet header.

·       Rx Data Buffer

The size of the Rx data buffer is determined by the “RxBufBitWidth” parameter of the IP and can range from 9 – 14 (32KB to 1MB). The size of the Rx data buffer is also applied as the window size of the transmit packet with a feature of window scaling. When the Rx data buffer is sufficiently large, the target device can send multiple data packets to the TOE100G IP without having to wait for an ACK packet, which may be delayed by the networking system. Consequently, a larger Rx data buffer can improve the receive performance.

The data is stored in the buffer until it is read by the user. If the user does not read the data from the buffer for a long time, the buffer becomes full and the target device can no longer send data to the IP, resulting in reduced performance. To achieve the best receive performance, the user logic should read the data from the IP as soon as it is available. The Rx data buffer then does not become full, and receive performance is not affected by a full buffer.

 

User Block

The core engine of the user module can be designed as a state machine that sets the commands and parameters through the Register interface. The status can also be monitored to confirm that each operation completes without errors. The data path can be connected to FIFOs for sending data to or receiving data from the IP.

 

100G Ethernet (MAC) Subsystem

The 100G Ethernet (MAC) Subsystem implements the MAC layer with the low-layer protocol, but the interface and features vary depending on the FPGA model. Xilinx offers a 100G Ethernet Subsystem for the UltraScale+ device that implements both the MAC and Physical layers. This interface uses a 512-bit AXI4 stream running at 322.265625 MHz, allowing it to be directly connected with TOE100G IP. More information can be found on the following website.

https://www.xilinx.com/products/intellectual-property/cmac_usplus.html

For the Versal device, the 100G Ethernet MAC Subsystem implements the MAC layer and PCS logic but does not include the transceiver. The user interface, when using Non-Segmented mode and independent clock mode, is a 384-bit interface running at 390.625 MHz or higher. More details can be found on the following website.

https://www.xilinx.com/products/intellectual-property/mrmac.html

 

Core I/O Signals

Descriptions of all parameters and I/O signals are provided in Table 5 - Table 7. The EMAC interface is a 512-bit AXI4 stream interface.

 

Table 5: Core Parameters

Name | Value | Description
TxBufBitWidth | 9-14 | Sets the Tx data buffer size. The value is the address bus width of this buffer.
RxBufBitWidth | 9-14 | Sets the Rx data buffer size. The value is the address bus width of this buffer.

 

Table 6: User I/O Signals (Synchronous to Clk)

| Signal | Dir | Description |
| --- | --- | --- |
| Common Interface Signal | | |
| RstB | In | Reset IP core. Active Low. |
| Clk | In | User clock. The clock frequency must be equal to or greater than 220 MHz to maintain good performance. |
| User Interface | | |
| RegAddr[4:0] | In | Register address bus. Valid when RegWrEn=1b in Write access. |
| RegWrData[31:0] | In | Register write data bus. Valid when RegWrEn=1b. |
| RegWrEn | In | Register write enable. Valid at the same clock as RegAddr and RegWrData. |
| RegRdData[31:0] | Out | Register read data bus. Valid in the next clock after RegAddr is valid. |
| ConnOn | Out | Connection Status. 1b: connection is opened, 0b: connection is closed. |
| TimerInt | Out | Timer interrupt. Asserted to 1b for 1 clock cycle when timeout is detected. More details of the interrupt status can be read from the TMO[7:0] register. |
| RegDataA1[31:0] | Out | 32-bit read value of CMD register (RegAddr=00001b). Bit[0] is the TOE100G IP busy flag. |
| RegDataA8[31:0] | Out | 32-bit read value of TDL register (RegAddr=01000b). |
| RegDataA9[31:0] | Out | 32-bit read value of TMO register (RegAddr=01001b). |
| Tx Data Buffer Interface | | |
| TCPTxFfFlush | Out | Indicates that the Tx data buffer within the IP is reset. Asserted to 1b when the connection is closed or the IP is reset. |
| TCPTxFfFull | Out | Asserted to 1b when Tx data buffer is full. User needs to stop writing data within 4 clock cycles after this flag is asserted to 1b. |
| TCPTxFfWrCnt[13:0] | Out | Data counter of Tx data buffer, in 512-bit units, showing the amount of data in Tx data buffer. |
| TCPTxFfWrEn | In | Write enable to Tx data buffer. Asserted to 1b to write data to Tx data buffer. |
| TCPTxFfWrData[511:0] | In | Write data to Tx data buffer. Valid when TCPTxFfWrEn=1b. |
| Rx Data Buffer Interface | | |
| TCPRxFfFlush | Out | Indicates that the Rx data buffer within the IP is reset. Asserted to 1b when the connection is opened. |
| TCPRxFfRdCnt[13:0] | Out | Data counter of Rx data buffer, showing the amount of received data in 512-bit units. |
| TCPRxFfLastRdCnt[5:0] | Out | Remaining bytes of the last data in Rx data buffer when the total amount of received data in the buffer is not aligned to a 64-byte unit. User cannot read this data until the full 64 bytes have been received. |
| TCPRxFfRdEmpty | Out | Asserted to 1b when Rx data buffer is empty. User needs to stop reading data immediately when this signal is asserted to 1b. |
| TCPRxFfRdEn | In | Asserted to 1b to read data from Rx data buffer. |
| TCPRxFfRdData[511:0] | Out | Data output from Rx data buffer. Valid in the next clock cycle after TCPRxFfRdEn is asserted to 1b. |
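As an illustration of how the two Rx counters could be combined, the sketch below converts them into byte counts. It assumes that TCPRxFfRdCnt counts only complete 512-bit words and that TCPRxFfLastRdCnt holds the byte count of a trailing, not yet readable, partial word. The function name is hypothetical.

```python
WORD_BYTES = 64  # one 512-bit word on the Rx FIFO data bus


def rx_bytes_pending(rd_cnt: int, last_rd_cnt: int) -> tuple[int, int]:
    """Return (readable_bytes, total_bytes) for the Rx data buffer.

    rd_cnt      -- sampled TCPRxFfRdCnt (assumed: complete 512-bit words)
    last_rd_cnt -- sampled TCPRxFfLastRdCnt (assumed: bytes of a partial word)
    """
    readable = rd_cnt * WORD_BYTES
    return readable, readable + last_rd_cnt


readable, total = rx_bytes_pending(rd_cnt=5, last_rd_cnt=10)
print(readable, total)
```

With five complete words and ten residual bytes, 320 bytes can be read immediately while 330 bytes have been received in total.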

 

Table 7: EMAC I/O Signals (Synchronous to MacClk)

| Signal | Dir | Description |
| --- | --- | --- |
| MacClk | In | The user interface clock of the EMAC, which has a different frequency depending on the device. For the 100G Ethernet Subsystem on the UltraScale+ device, the clock frequency is 322.265625 MHz. For the 100G Ethernet MAC Subsystem on the Versal device, the clock frequency is 390.625 MHz. |
| tx_axis_tdata[511:0] | Out | Transmitted data. Valid when tx_axis_tvalid=1b. |
| tx_axis_tkeep[63:0] | Out | The byte enable of transmitted data. Valid when tx_axis_tvalid=1b. |
| tx_axis_tvalid | Out | Valid signal of transmitted data. |
| tx_axis_tlast | Out | Control signal to indicate the final word in the frame. Valid when tx_axis_tvalid=1b. |
| tx_axis_tuser | Out | Control signal to indicate an error condition. This signal is always 0b. |
| tx_axis_tready | In | Handshaking signal. Asserted to 1b when tx_axis_tdata has been accepted. |
| rx_axis_tdata[511:0] | In | Received data. Valid when rx_axis_tvalid=1b. |
| rx_axis_tvalid | In | Valid signal of received data. |
| rx_axis_tlast | In | Control signal to indicate the final word in the frame. Valid when rx_axis_tvalid=1b. |
| rx_axis_tuser | In | Control signal asserted at the end of a received frame (rx_axis_tvalid=1b and rx_axis_tlast=1b) to indicate that the frame has a CRC error. 0b: normal packet, 1b: error packet. |
| rx_axis_tready | Out | Handshaking signal. Asserted to 1b when rx_axis_tdata has been accepted. Typically, rx_axis_tready is always asserted to 1b. However, when the Clk frequency is too low, the available space of the Async buffer (Rx) may be insufficient to store a packet, and as a result rx_axis_tready may be de-asserted to 0b after the end of a packet is received. The signal is re-asserted to 1b when the buffer has enough free space to store a packet of maximum size. |

 

Timing Diagram

 

IP Initialization

After the RST register value is changed from 1b to 0b, the TOE100G IP starts its initialization process. One of three modes is executed: Client mode (SRV=00b), Server mode (SRV=01b), or Fixed MAC mode (SRV=1Xb). The details of each mode are presented in the timing diagrams below.

 

 

Figure 4: IP Initialization in Client mode

 

As shown in Figure 4, in Client mode, the TOE100G IP sends an ARP request packet and waits for an ARP reply packet from the target device. The Target MAC address is extracted from the ARP reply packet. Upon completion, the Busy signal (bit0 of RegDataA1) is de-asserted to 0b.

 

 

Figure 5: IP Initialization in Server mode

 

As shown in Figure 5, after the reset process in Server mode is completed, the TOE100G IP waits for an ARP request packet from the target device. Upon receipt, the TOE100G IP generates an ARP reply packet. The Target MAC address is extracted from the ARP request packet. Once the ARP reply packet has been transmitted, the Busy signal is de-asserted to 0b.

 

 

Figure 6: IP Initialization in Fixed MAC mode

 

As shown in Figure 6, after the reset process in Fixed MAC mode is completed, the TOE100G IP updates all parameters from the registers. The Target MAC address is loaded from the DML and DMH registers. Once this process is finished, the Busy signal is de-asserted to 0b.
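The mode selection described above can be summarized with a short decoding sketch. It only restates the SRV encoding given in this section. The function name and the returned strings are illustrative and are not part of the IP.

```python
def init_mode(srv: int) -> tuple[str, str]:
    """Decode SRV[1:0] into (mode, source of the Target MAC address)."""
    srv &= 0b11
    if srv & 0b10:  # SRV=1Xb: bit 0 is "don't care"
        return "Fixed MAC", "DML/DMH registers"
    if srv & 0b01:  # SRV=01b
        return "Server", "received ARP request"
    return "Client", "received ARP reply"  # SRV=00b


for value in range(4):
    print(f"SRV={value:02b}b ->", init_mode(value))
```

In all three modes, the end of initialization is signalled in the same way: the Busy flag (bit0 of RegDataA1) returns to 0b.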

 

Register Interface

The Register interface is responsible for setting and monitoring all control signals and network parameters during operation. The timing diagram of the interface is similar to that of Single-port RAM, which shares the address bus for write and read access, and has a read latency time of one clock cycle. A Register map of this interface is provided in Table 3.

As shown in Figure 7, to write to the register, the user sets RegWrEn to 1b with the valid values for RegAddr and RegWrData. Before setting RegWrEn to 1b, please confirm that RstB is de-asserted to 1b for at least 4 clock cycles. To read from the register, the user only sets RegAddr, and RegRdData becomes valid in the next clock cycle.

 

 

Figure 7: Register interface timing diagram

 

As shown in Figure 8, before setting the CMD register to initiate a new command, the Busy flag must be equal to 0b to confirm that the IP is idle. After the CMD register is set, the Busy flag is asserted to 1b and is de-asserted to 0b when the command is completed.

 

 

Figure 8: CMD register timing diagram

 

 

Tx FIFO Interface

The Tx FIFO interface provides two control signals for flow control: the full flag (TCPTxFfFull) and the write data counter (TCPTxFfWrCnt). TCPTxFfWrCnt is updated two clock cycles after TCPTxFfWrEn is asserted. TCPTxFfFull indicates that the internal buffer is almost full and is asserted before the buffer reaches its capacity. The user logic must pause sending data within four clock cycles after TCPTxFfFull is asserted. Figure 9 shows an example timing diagram for the Tx FIFO interface.

 

 

Figure 9: Tx FIFO interface timing diagram

 

(1)   Before asserting TCPTxFfWrEn to 1b to write the data to TOE100G IP, the full flag (TCPTxFfFull) must not be asserted to 1b and ConnOn must be equal to 1b. To write the data, assert TCPTxFfWrEn to 1b along with TCPTxFfWrData.

(2)   If TCPTxFfFull is asserted to 1b, TCPTxFfWrEn must be de-asserted to 0b within four clock cycles to pause sending data.

(3)   When there is no more data to transfer, the connection can be terminated by an active or passive close. After the port is closed, the following behavior is observed.

a)     ConnOn changes from 1b to 0b.

b)     TCPTxFfFlush is asserted to 1b for a period of time to flush all data inside the Tx data buffer and is then de-asserted to 0b.

c)     TCPTxFfWrCnt is reset to 0.

d)     TCPTxFfFull is asserted to 1b to block the new user data and then de-asserted to 0b, similar to TCPTxFfFlush.
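The flow-control rule in steps (1) and (2) can be illustrated with a small cycle-based model of the user-side writer. It is only a sketch: it assumes that ConnOn is 1b, that user data is always available, and that the user logic observes TCPTxFfFull through a configurable number of pipeline registers (reaction_delay, a hypothetical parameter that must not exceed four cycles).

```python
from collections import deque


def writer_wr_en(full_trace, reaction_delay=2):
    """Return TCPTxFfWrEn per cycle for a writer that sees TCPTxFfFull
    delayed by reaction_delay clock cycles (0 to 4)."""
    if not 0 <= reaction_delay <= 4:
        raise ValueError("writing must stop within four cycles of TCPTxFfFull")
    pipe = deque([0] * reaction_delay)  # pipeline registers, reset to 0
    wr_en = []
    for full in full_trace:
        pipe.append(full)
        seen_full = pipe.popleft()      # value visible to the writer this cycle
        wr_en.append(0 if seen_full else 1)
    return wr_en


# TCPTxFfFull is asserted at cycle 2 and released at cycle 6.
full = [0, 0, 1, 1, 1, 1, 0, 0]
print(writer_wr_en(full, reaction_delay=2))
```

With a two-cycle reaction delay, the writer issues two more words after TCPTxFfFull rises and then pauses, which stays inside the four-cycle allowance.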

 

 

Rx FIFO Interface

The Rx FIFO interface is used to retrieve data stored in the Rx data buffer. To determine whether data is available for reading, the user monitors the Empty flag (TCPRxFfRdEmpty) and then asserts the read enable signal (TCPRxFfRdEn) to access the data, as with a typical FIFO read interface. This is illustrated in Figure 10.

 

 

Figure 10: Rx FIFO interface timing diagram by using Empty flag

 

(1)   Check the TCPRxFfRdEmpty flag to confirm data availability. When data is ready (TCPRxFfRdEmpty=0b), set TCPRxFfRdEn to 1b to read data from the Rx data buffer.

(2)   The TCPRxFfRdData signal is valid in the next clock cycle.

(3)   Reading must be paused immediately by setting TCPRxFfRdEn=0b when TCPRxFfRdEmpty is equal to 1b.

(4)   The user must read all data from the Rx data buffer before creating a new connection. When a new connection is established, all data in the Rx data buffer is flushed, and TCPRxFfFlush is set to 1b. Once the new connection is completed, the ConnOn value changes from 0b to 1b.

(5)   After the Flush operation is finished, TCPRxFfRdEmpty is asserted to 1b.

 

 

Figure 11: Rx FIFO interface timing diagram by using read counter

 

When the user logic reads data in burst mode, the TOE100G IP provides a read data counter signal (TCPRxFfRdCnt) that indicates the total amount of data stored in the Rx data buffer in 512-bit units. For instance, in Figure 11, there are five units of data available in the Rx data buffer. Therefore, the user can set TCPRxFfRdEn to 1b for five clock cycles to read all the data from the Rx data buffer. TCPRxFfRdCnt is updated two clock cycles after TCPRxFfRdEn is set to 1b.
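A burst read based on the counter can be planned as in the following sketch. The helper names and the max_burst limit are hypothetical. The only figures taken from this section are the 512-bit unit of TCPRxFfRdCnt and its two-cycle update latency.

```python
COUNTER_LATENCY = 2  # cycles for TCPRxFfRdCnt to reflect a read


def plan_burst(rd_cnt_sampled: int, max_burst: int) -> int:
    """Number of cycles to hold TCPRxFfRdEn=1b for the next burst."""
    return min(rd_cnt_sampled, max_burst)


def next_sample_cycle(burst_start_cycle: int, burst_len: int) -> int:
    """Earliest cycle at which TCPRxFfRdCnt accounts for the whole burst."""
    last_read_cycle = burst_start_cycle + burst_len - 1
    return last_read_cycle + COUNTER_LATENCY


burst = plan_burst(rd_cnt_sampled=5, max_burst=16)
print(burst, next_sample_cycle(burst_start_cycle=0, burst_len=burst))
```

Sampling the counter earlier than this cycle could count words that have already been read, so a design that issues back-to-back bursts should either wait for the counter to settle or track the outstanding reads itself.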

 

EMAC Interface

The EMAC interface of the TOE100G IP uses a 512-bit AXI4-Stream interface to transmit packets. When sending a packet, the TOE100G IP sets tx_axis_tvalid to 1b and drives the associated signals (tx_axis_tdata, tx_axis_tkeep, and tx_axis_tlast) with valid values. During data transmission, the EMAC can temporarily pause the transfer by de-asserting tx_axis_tready to 0b when it is not ready to accept data. Figure 12 provides additional details about the EMAC interface for the Transmit direction.

 

 

Figure 12: Transmit EMAC interface timing diagram

 

(1)   Upon transmitting a packet, the TOE100G IP asserts tx_axis_tvalid to 1b along with the first data on tx_axis_tdata.

(2)   During packet transmission, if the target EMAC is not ready to receive data, tx_axis_tready is de-asserted to 0b. In such cases, the TOE100G IP holds the same value of all signals until tx_axis_tready is re-asserted to 1b.

(3)   When the final data of the packet is transmitted, both tx_axis_tlast and tx_axis_tvalid are asserted to 1b. According to the EMAC specification, tx_axis_tvalid must always remain asserted to 1b during packet transmission and cannot be de-asserted to 0b before the end of packet is transmitted.
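To illustrate how a frame maps onto the 512-bit transmit bus, the sketch below derives the tkeep and tlast values for each beat of a frame of a given length. It assumes the usual AXI4-Stream convention in which valid bytes occupy the lowest byte lanes of the final beat. The function name is hypothetical.

```python
BUS_BYTES = 64  # 512-bit tx_axis_tdata


def axis_beats(frame_len: int):
    """Yield (tkeep, tlast) for each beat of a frame_len-byte frame."""
    if frame_len <= 0:
        raise ValueError("frame length must be positive")
    beats = -(-frame_len // BUS_BYTES)  # ceiling division
    for index in range(beats):
        last = index == beats - 1
        # Only the final beat may carry fewer than 64 valid bytes.
        valid = frame_len - index * BUS_BYTES if last else BUS_BYTES
        yield (1 << valid) - 1, int(last)


for tkeep, tlast in axis_beats(70):
    print(f"tkeep={tkeep:#018x} tlast={tlast}")
```

A 70-byte frame therefore takes two beats: a full first beat, then a final beat with six valid bytes and tx_axis_tlast=1b.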

 

The Receive EMAC interface of the TOE100G IP can handle a discontinuous data stream within a packet, similar to the Transmit EMAC interface. The behavior of rx_axis_tready depends on the frequency of the Clk signal. When the Clk frequency is equal to or greater than the recommended value (220 MHz) and the data stream is not a continuous transfer of small packets (less than 1408 bytes), rx_axis_tready is always asserted to 1b for receiving data from the EMAC, as shown in Figure 13. However, if the Clk frequency is too low and packets are transferred continuously, rx_axis_tready may be de-asserted to 0b after the final data of a packet is received. This occurs when there is insufficient free space in the Async buffer (Rx) to store a packet of the maximum size, as shown in Figure 14.

 

 

Figure 13: Receive EMAC interface timing diagram (Normal)

 

(1)   The TOE100G IP detects a new received packet when rx_axis_tvalid changes from 0b to 1b. In the same clock cycle, the first data is valid on rx_axis_tdata, and the EMAC keeps rx_axis_tvalid asserted to 1b for the continuous transfer of the data packet.

(2)   During the transfer of a data packet, rx_axis_tvalid can be de-asserted to 0b to pause the data transfer. The data transfer resumes when rx_axis_tvalid is re-asserted to 1b.

(3)   The end of the packet is detected when both rx_axis_tlast and rx_axis_tvalid are asserted to 1b. In this cycle, the final data of the packet is valid on rx_axis_tdata.

(4)   In the normal case, rx_axis_tready is always asserted to 1b because the TOE100G IP can process all packets in time.

(5)   After the final data of a packet has been transferred, EMAC asserts rx_axis_tvalid to 1b for transferring the first data of the next packet.

 

 

Figure 14: Receive EMAC interface timing diagram (Data lost)

 

(1)   To ensure that all data in a packet is received from the EMAC, the TOE100G IP always asserts rx_axis_tready to 1b during packet transmission. This signal is de-asserted to 0b only after the final data of the packet is completely transferred.

(2)   If the Clk frequency is too low and packets are transferred continuously, the Async buffer (Rx) inside the TOE100G IP may not have enough free space to store the next packet. In that case, rx_axis_tready is de-asserted to 0b after the final data of the packet is received (rx_axis_tlast=1b and rx_axis_tvalid=1b).

(3)   If rx_axis_tready is de-asserted to 0b, the TOE100G IP discards any incoming packets.

(4)   After the Async buffer (Rx) has enough free space, rx_axis_tready is re-asserted to 1b to indicate that the TOE100G IP is ready to receive and process the next packet from the EMAC.
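The influence of the clock frequency on the receive path can be illustrated with a simple raw-bandwidth calculation. The datasheet does not state how the 220 MHz recommendation was derived, so the comparison with the 100 Gb/s line rate below is only one plausible reading. The function name is hypothetical.

```python
def bus_gbps(clock_mhz: float, width_bits: int = 512) -> float:
    """Raw bandwidth of a parallel bus in Gb/s (one word per clock)."""
    return clock_mhz * 1e6 * width_bits / 1e9


# User side: 512-bit data path clocked by Clk.
print("Clk 220 MHz           :", bus_gbps(220.0), "Gb/s")
# EMAC side on UltraScale+: 512-bit AXI4-Stream clocked by MacClk.
print("MacClk 322.265625 MHz :", bus_gbps(322.265625), "Gb/s")
```

At 220 MHz, the 512-bit user-side data path offers about 112.6 Gb/s of raw bandwidth, which leaves some headroom over a 100 Gb/s link. At lower Clk frequencies this headroom shrinks, which is consistent with the Async buffer (Rx) filling up during continuous transfers.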

 

 

Example usage

 

Client mode (SRV[1:0] = 00b)

The steps to set the registers for transferring data in Client mode are outlined below.

1)     Set the RST register=1b to reset the IP.

2)     Set the SML/SMH for MAC address, DIP/SIP for IP address, and DPN/SPN for port number.

Note: DPN is an optional setting when the port is opened by IP (Active open).

3)     Set RST register=0b to start the IP initialization process. The TOE100G IP will send an ARP request packet to get the Target MAC address from the ARP reply packet. The Busy signal is de-asserted to 0b after completing the initialization process.

4)     A new connection can be created in one of two ways.

a.     Active open: Write CMD register = “Open connection” to create the connection (the TOE100G IP sends the SYN packet first). After that, wait until the Busy flag is de-asserted to 0b.

b.     Passive open: Wait until the “ConnOn” signal = 1b (the target device sends the SYN packet to the TOE100G IP first).

5)     a. For sending data, set TDL register (total transmit length) and PKL register (packet size). Then, set CMD register = “Send Data” to start data transmission. The user can send the data to TOE100G IP via the TxFIFO interface before or after setting the CMD register. Once the command is finished, the Busy flag is de-asserted to 0b. The user can set a new value to the TDL/PKL register and then set CMD register = “Send Data” to start the next transmission.

b. For receiving data, the user should monitor RxFIFO status and read the data until RxFIFO is empty.

6)     As with creating the connection, the connection can be terminated in one of two ways.

a.     Active close: Set CMD register = “Close connection” to close the connection (the TOE100G IP sends the FIN packet first). After that, wait until the Busy flag is de-asserted to 0b.

b.     Passive close: Wait until the “ConnOn” signal = 0b (the target sends the FIN packet to the TOE100G IP first).
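The Client mode steps above can be sketched as host-side control code for the active open, send, and active close path. The sketch is illustrative only: FakeBus is a hypothetical stand-in for the Register interface, registers are addressed by name rather than by RegAddr value, and the command values "OPEN", "SEND", and "CLOSE" are symbolic placeholders, because the CMD encodings are defined in the register map and are not repeated here.

```python
class FakeBus:
    """Hypothetical stand-in for the Register interface: records every
    write and always reports Busy=0b."""

    def __init__(self):
        self.log = []

    def write(self, reg, value):
        self.log.append((reg, value))

    def busy(self):
        return False


def wait_idle(bus, max_polls=1000):
    """Poll the Busy flag (bit0 of RegDataA1) until it returns to 0b."""
    for _ in range(max_polls):
        if not bus.busy():
            return
    raise TimeoutError("Busy flag stayed at 1b")


def client_send(bus, params, tdl, pkl):
    bus.write("RST", 1)                          # step 1: reset the IP
    for reg in ("SML", "SMH", "DIP", "SIP", "DPN", "SPN"):
        bus.write(reg, params[reg])              # step 2: network parameters
    bus.write("RST", 0)                          # step 3: start initialization
    wait_idle(bus)                               # ARP exchange finished
    bus.write("CMD", "OPEN")                     # step 4a: active open
    wait_idle(bus)
    bus.write("TDL", tdl)                        # step 5a: total transmit length
    bus.write("PKL", pkl)                        #          packet size
    bus.write("CMD", "SEND")
    wait_idle(bus)
    bus.write("CMD", "CLOSE")                    # step 6a: active close
    wait_idle(bus)


bus = FakeBus()
params = {name: 0 for name in ("SML", "SMH", "DIP", "SIP", "DPN", "SPN")}
client_send(bus, params, tdl=20480, pkl=8960)
print([reg for reg, _ in bus.log])
```

A real design would implement the same ordering as a state machine in HDL and would write user data into the Tx data buffer around the Send command.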

 

Server mode (SRV[1:0] = 01b)

In Server mode, the MAC address is decoded from ARP request packet instead of ARP reply packet as in Client mode. However, the process for transferring data is the same as that of Client mode. The following steps illustrate an example of Server mode.

1)     Set RST register=1b to reset the IP.

2)     Set SML/SMH for MAC address, DIP/SIP for IP address, and DPN/SPN for port number.

3)     Set RST register=0b to begin the IP initialization process by waiting for an ARP request packet to get the Target MAC address. The IP then creates an ARP reply packet to return to the target device. Once the initialization process is completed, the Busy signal is de-asserted to 0b.

4)     The remaining steps are the same as step 4 – 6 of Client mode.

 

Fixed MAC mode (SRV[1] = 1b)

In Fixed MAC mode, the MAC Address of the target device is loaded from DML and DMH register. The process for transferring data is the same as that of Client and Server mode. The following steps provide an example of how to run TOE100G IP in Fixed MAC mode.

1)     Set RST register=1b to reset the IP.

2)     Set SML/SMH for MAC address of TOE100G IP, DML/DMH for MAC address of the target device, DIP/SIP for IP address, and DPN/SPN for port number.

3)     Set RST register=0b to begin the IP initialization process. Once initialization is completed, the busy signal will be de-asserted to 0b.

4)     The remaining steps are the same as step 4 – 6 of Client mode.

 

 

PKL and TDL setting in Send command

When executing the Send command, the TOE100G IP can operate in two modes based on the value of TDL compared to N times of PKL. The details for each mode are described as follows.

 

TDL = N times of PKL

 

 

Figure 15: TCP packet when TDL = N times of PKL

 

If the TDL value is equal to N times the PKL value, the user data is split into N packets and transmitted to the target device, as shown in Figure 15. If the target device responds with an ACK packet for each TCP packet, there will be N ACK packets in the network system. To improve network performance, several ACK packets can be combined into one packet using the TCP delayed ACK technique. Therefore, the number of ACK packets returned from the target device (M) may be less than the number of data packets sent by the TOE100G IP (N) when running the Send command. The PSH[0] value does not affect this condition. The last data packet (TCP Data#N) is sent only once.

 

TDL = N times of PKL + Residue

 

 

Figure 16: TCP packet when TDL = (N times of PKL) + Residue

 

If the TDL value is not equal to N times the PKL value, the data sent to the target device is split into N packets of PKL-byte data and one last packet that contains Res-byte data, as shown in Figure 16. The first step is similar to the condition where TDL is equal to N times PKL. The IP needs to receive an ACK packet from the target device to confirm that all N packets have been received completely. After that, the last packet, which contains the residue data, is sent to the target device. If the PSH[0] register is set to 0b (default value), the residue packet is sent twice. Otherwise, the last packet is sent only once. The Send command is completed when the target returns an ACK to confirm that the last packet has been received.

Note: If the target device runs an OS that enables the delayed ACK feature, the ACK#M packet, which confirms the acceptance of TCP Data#N, may in some cases be returned only after the delayed ACK timeout expires. Therefore, in systems that are sensitive to this latency, the delayed ACK feature should be disabled on the target device, or the TDL value should be aligned to the PKL value.
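The two cases can be summarized with a short calculation. The sketch below only restates the rules given in this section. The function name and the example values are illustrative.

```python
def plan_send(tdl: int, pkl: int, psh0: int = 0) -> dict:
    """Split a Send command of tdl bytes into PKL-byte packets.

    Returns the number of full packets (N), the residue size in bytes,
    and how many times the residue packet is transmitted.
    """
    if tdl <= 0 or pkl <= 0:
        raise ValueError("TDL and PKL must be positive")
    full_packets, residue = divmod(tdl, pkl)
    if residue == 0:
        residue_sends = 0            # TDL = N x PKL: no residue packet
    else:
        residue_sends = 2 if psh0 == 0 else 1
    return {"N": full_packets, "residue": residue, "residue_sends": residue_sends}


print(plan_send(tdl=17920, pkl=8960))           # TDL = 2 x PKL
print(plan_send(tdl=20480, pkl=8960))           # TDL = 2 x PKL + 2560
print(plan_send(tdl=20480, pkl=8960, psh0=1))   # residue sent once
```

Choosing TDL as an exact multiple of PKL avoids the residue packet entirely, which is the alignment recommended in the note above for latency-sensitive systems.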

 

Connection termination of unusual case

 

 

Figure 17: Terminate connection sequence

 

The process of terminating a connection in the normal case is illustrated in Figure 17, where four packets are exchanged between two devices. The first device (Device#0) initiates the connection termination by sending a FIN packet. If the second device (Device#1) agrees to terminate the connection, it responds with an ACK and FIN packet, which may be sent together in one packet or in separate packets. Finally, Device#0 confirms the termination by sending an ACK packet. The TOE100G IP can execute the close connection in two modes, Active and Passive. This section describes the operation of TOE100G IP in some unusual cases.

  1. In the Active mode, the TOE100G IP sends a FIN packet to initiate the close and expects to receive ACK and FIN packets from the target. Assuming that the FIN packet has a sequence number (SeqNum) of N and an acknowledge number (AckNum) of M, the expected ACK and FIN packet must contain SeqNum=M and AckNum=N+1. If the TOE100G IP does not receive the expected packets before the timeout (set by the TMO register) expires, it sends an RST packet to terminate the connection immediately, without performing the 16 retries. The TOE100G IP also asserts TimerInt and TMO[3] to 1b.
  2. If the TOE100G IP receives new data from the target while executing the Active close command, it rejects the data and continues to wait for the expected ACK and FIN packets. As in the first case, if the expected packets are not received before the timeout, the TOE100G IP sends an RST packet to terminate the connection.
  3. In the Passive mode, the TOE100G IP may receive a FIN packet from the target while it is still transmitting data to the target. The TOE100G IP sends an ACK and FIN packet in response, with SeqNum set to the most recently confirmed data acceptance value. After the connection is terminated, the ConnOn and Busy outputs are set to 0b. The user can check the amount of untransmitted data in the TDL register.
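The sequence number check described in the first case can be sketched as follows. The function names are hypothetical, and the modulo-2^32 wraparound reflects the 32-bit width of TCP sequence and acknowledge numbers.

```python
SEQ_MOD = 1 << 32  # TCP sequence and acknowledge numbers are 32-bit values


def expected_fin_ack(fin_seq: int, fin_ack: int) -> tuple[int, int]:
    """Return the (SeqNum, AckNum) expected in the target's ACK/FIN reply
    to a FIN sent with SeqNum=fin_seq (N) and AckNum=fin_ack (M)."""
    return fin_ack % SEQ_MOD, (fin_seq + 1) % SEQ_MOD


def is_expected_reply(pkt_seq: int, pkt_ack: int, fin_seq: int, fin_ack: int) -> bool:
    """True when a received packet matches the expected SeqNum/AckNum."""
    return (pkt_seq, pkt_ack) == expected_fin_ack(fin_seq, fin_ack)


print(expected_fin_ack(fin_seq=1000, fin_ack=5000))
print(is_expected_reply(5000, 1001, fin_seq=1000, fin_ack=5000))
```

Packets that fail this check during an Active close, such as new data from the target, do not complete the close, and the connection is then terminated by the RST packet once the timeout expires.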

 

Verification Methods

The TOE100G IP Core functionality was verified by simulation and was also validated in real hardware designs using the KCU116 board, ZCU111 board, Alveo U250 Accelerator card, Silicom FB2CGHH@KU15P board, and VCK190 board.

 

Recommended Design Experience

Users must be familiar with HDL design methodology to integrate this IP into their design.

 

Ordering Information

This product is available directly from Design Gateway Co., Ltd. For pricing and additional information about this product, please contact Design Gateway Co., Ltd. using the contact information on the front page of this datasheet.

 

Revision History

| Revision | Date | Description |
| --- | --- | --- |
| 2.0 | 11-May-2023 | Support TCP window scaling feature, add more selectable buffer size, add TCPTxFfWrCnt signal, update TCPRxFfRdCnt, and add Connection termination of unusual case section. |
| 1.2 | 25-May-2022 | Support VCK190 |
| 1.1 | 27-Apr-2021 | Add Silicom FB2CGHH@KU15P board and PKL/TDL setting topic |
| 1.0 | 24-Feb-2021 | New release |