NVMeTCP-IP Core for 25G Datasheet

Features

Applications

General Description

Functional Description

NVMe/TCP

·       Register File

·       Admin Command Handler

·       IO Command Handler

·       Read Buffer

TCP/IP

·       Admin TCP/IP Controller

·       IO TCP/IP Controller

·       EMAC IF

User Logic

25G Ethernet System (MAC + PHY)

Core I/O Signals

Timing Diagram

Reset Process

Connection Establishment

Write Command

Read Command

Mixed Write and Read Commands

Connection Termination

Error

EMAC Interface

Verification Methods

Recommended Design Experience

Ordering Information

Revision History

 

 

Core Facts

Provided with Core

Documentation: User Guide, Design Guide

Design File Formats: Encrypted File

Instantiation Templates: VHDL

Reference Designs & Application Notes: Vivado Project, see Reference design manual

Additional Items: Demo on KCU116, VCK190, FB2CGHH@KU15P, Alveo U50

Support

Support Provided by Design Gateway Co., Ltd.

 

Design Gateway Co., Ltd

E-mail:  ip-sales@design-gateway.com

URL:     design-gateway.com

Features

·    Protocol Support: Implement an NVMe over TCP host controller (Initiator) based on NVMe-oF specification rev 1.1 and NVMe specification rev 1.4

·    Target Access: Enable access to an NVMe SSD at the target (Subsystem) using a specified NVMe name (NQN)

·    Command Support: Write and Read commands

·    Performance: High performance operations with write and read speeds of approximately 1900 MB/s

·    Data Interface: Memory-mapped interface

·    Data Size per Command: Support a fixed data size of 4 KB

·    Maximum Command: Up to 256 commands, limited by Read buffer size for the Read command

·    Read Buffer Configurable: Configurable size ranging from 64 KB (16 Cmds) to 1 MB (256 Cmds)

·    Target Compatibility: Compatible with NVMe/TCP targets that meet the following criteria:

·   I/O Queue Command Capsule Supported Size (IOCCSZ): At least 260 (104h)

·   Maximum Queue Entries Supported (MQES): At least 511 (1FFh)

·   Maximum Outstanding Commands (MAXCMD): At least 256 (100h)

·   Controller PDU Data Alignment (CPDA): Support value 0, 1, 2, or 5

·   Authentication: No authentication required

·   Logical Block Addressing (LBA) unit: 512 bytes

·    Networking: 25G Ethernet speed utilizing jumbo frame packets

·    Ethernet MAC Interface: 64-bit AXI4 Stream interface operating at 390.625 MHz

·    User Clock Frequency: Must be equal to or greater than 195.3125 MHz (Ethernet MAC clock frequency/2)

·    Available Reference Designs: KCU116, VCK190, Silicom FB2CGHH@KU15P, and Alveo U50

·    Customized Option: Support Ethernet packet transfer using non-jumbo frame size
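The target compatibility criteria listed above can be expressed as a simple check. The following is a minimal sketch only: the function name and argument names are hypothetical, and the values are assumed to have been read from the target by some other means.

```python
def check_target(ioccsz, mqes, maxcmd, cpda, auth_required, lba_bytes):
    """Return the list of criteria the target fails (empty list if compatible).

    Thresholds are taken from the Features list; the function itself is
    illustrative and not part of the IP deliverables.
    """
    problems = []
    if ioccsz < 260:
        problems.append("IOCCSZ is less than 260")
    if mqes < 511:
        problems.append("MQES is less than 511")
    if maxcmd < 256:
        problems.append("MAXCMD is less than 256")
    if cpda not in (0, 1, 2, 5):
        problems.append("CPDA is not 0, 1, 2, or 5")
    if auth_required:
        problems.append("target requires authentication")
    if lba_bytes != 512:
        problems.append("LBA unit is not 512 bytes")
    return problems


# Example: a target that meets every criterion
print(check_target(260, 511, 256, 0, False, 512))
```

An empty list means the target satisfies all listed criteria. Any entry corresponds to a condition that the IP is expected to report as an error during connection establishment.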

 

 

Table 1 Example Implementation Statistics (UltraScale+)

Family | Example Device | Read BufSize | Fmax (MHz) | CLB Regs | CLB LUTs | CLB(1) | BRAM Tile | URAM | Design Tools
Kintex-UltraScale+ | XCKU5P-FFVB676-2E | 64KB (min) | 350 | 20,861 | 15,038 | 3,517 | 26 | 4 | Vivado2022.1
Kintex-UltraScale+ | XCKU5P-FFVB676-2E | 1MB (max) | 260 | 21,031 | 15,285 | 3,633 | 26 | 3 | Vivado2022.1

Notes:

(1) The actual logic resource depends on the percentage of unrelated logic.

 

Table 2 Example Implementation Statistics (Versal)

Family | Example Device | Read BufSize | Fmax (MHz) | CLB Regs | CLB LUTs | Slice(1) | BRAM Tile | URAM | Design Tools
Versal AI Core | XCVC1902-VSVA2197-2MP-ES | 64KB (min) | 350 | 20,914 | 15,532 | 4,499 | 25 | 4 | Vivado2022.1
Versal AI Core | XCVC1902-VSVA2197-2MP-ES | 1MB (max) | 350 | 21,076 | 15,673 | 4,438 | 25 | 3 | Vivado2022.1

Notes:

(1) The actual logic resource depends on the percentage of unrelated logic.

 

Applications

 

Figure 1 NVMeTCP IP for 25G Application

 

The NVMeTCP25G IP core is designed to be integrated into FPGA platforms, serving as a host controller that enhances data transfer capabilities. It is adept at facilitating rapid data movement from sensors and other data-generating devices to server-based storage systems through a 25G Ethernet network leveraging the NVMe/TCP protocol. This capability is particularly vital in environments where real-time data acquisition and storage are crucial.

The NVMeTCP25G IP core also supports simultaneous data transfer from multiple host systems to a single storage server, as illustrated in Figure 1.

The NVMeTCP25G IP uses the NVMe Qualified Name (NQN) to identify SSDs, ensuring that data is routed to the intended storage destination. Switching between target SSDs requires the current NVMeTCP25G connection to be terminated before a new connection is established.

 

General Description

 

Figure 2 NVMeTCP IP for 25G Block Diagram

 

The NVMeTCP25G IP serves as a host controller, also referred to as an NVMe/TCP initiator, enabling access to an SSD within an NVMe/TCP target through a 25G Ethernet connection. Its user interface consists of three interfaces: the Parameters interface, the Control interface, and the Memory Map interface. The Parameters interface configures network settings and timeout durations for establishing a connection. The Control interface reports the operational status and errors and controls the connection establishment and termination processes. The Memory Map interface transfers data and carries the address for Write and Read operations.

Internally, the NVMeTCP25G IP establishes two crucial TCP connections to communicate with the NVMe/TCP target. The Admin connection is responsible for the setup and termination of the connections and maintaining the link with Keep Alive commands. On the other hand, the IO connection processes Write and Read commands. Each connection is managed by separate TCP/IP Controllers and Command Handlers, ensuring dedicated and efficient control. The EMAC IF acts as a multiplexer, channeling Ethernet packets from both connections through the same EMAC.

The IP’s Register File interface is another component, receiving user inputs to initiate the connection process, accompanied by a Read buffer that reorders incoming data from Read commands to maintain the same sequence of command requests. The IO Command Handler can support up to 256 Write/Read commands, with the maximum number of commands queued being limited by the Read buffer’s size, which is a trade-off between read performance and resource consumption.

The NVMeTCP25G IP operates within two distinct clock domains - the EMAC interface clock and the user logic clock. To accommodate 25G Ethernet speeds, the EMAC interface is clocked at 390.625 MHz with 64-bit data width. The user logic, on the other hand, is required to at least match this bandwidth. This necessitates a user clock frequency of at least 195.3125 MHz, utilizing a 128-bit data bus to ensure efficient data handling and system performance.
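The bandwidth match between the two clock domains can be checked with simple arithmetic. The sketch below is illustrative only; the function name is an assumption and is not part of the IP.

```python
def bus_bandwidth_gbps(clock_mhz: float, width_bits: int) -> float:
    """Raw bus bandwidth in Gb/s for a given clock frequency and data width."""
    return clock_mhz * 1e6 * width_bits / 1e9


# EMAC interface: 64-bit data at 390.625 MHz
emac_gbps = bus_bandwidth_gbps(390.625, 64)

# User interface: 128-bit data at the minimum user clock of 195.3125 MHz
user_gbps = bus_bandwidth_gbps(195.3125, 128)

print(emac_gbps, user_gbps)
```

Both interfaces carry 25 Gb/s of raw bandwidth, which is why halving the clock frequency requires doubling the data width on the user side.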

For the 25G Ethernet System, several options are available for connecting to the 64-bit EMAC I/F of the NVMeTCP25G IP. However, the characteristics and interface requirements vary across these Ethernet System models. Some 25G Ethernet Systems therefore require adapter logic (MAC Adapter) between the NVMeTCP25G IP and the Ethernet System for seamless data transfer.

The reference designs on FPGA evaluation boards are available for evaluation before purchasing.

 

Functional Description

The operation sequence of the NVMeTCP25G IP post-reset is presented in Figure 3, illustrating two main phases: ‘No connection’ and ‘Connection ready’. Initially, users activate the connection, waiting for its successful establishment, a process that includes setting up two TCP ports. Once these ports are successfully initialized, the IP transitions from the ‘No connection’ phase to ‘Connection ready’ phase.

During the ‘Connection ready’ phase, the IP manages two types of commands: Admin and IO. The Admin command type, which includes the Keep Alive command, operates continuously in the background. In contrast, the IO command type, which covers Write and Read commands, is issued by the user. Users can initiate multiple Write and Read commands, subject to a limit of 256 commands or until the Read buffer is full.

If there is no further data to transmit, users can terminate the connection. This process moves the IP back to the ‘No connection’ phase once the termination is successfully completed.

 

 

Figure 3 NVMeTCP25G IP Operation Flow

 

1)     The IP awaits the establishment of the Ethernet link. Once established, the Ethernet System is ready for packet transfer.

2)     User sets HostConnEn to 1b, initiating the connection establishment. This transition from 0b to 1b for HostConnEn activates this process.

3)     The Admin port, forming the initial communication pathway with the target, is established. With the Admin TCP connection ready, NVMe/TCP Protocol Data Units (PDUs) are transferred between the NVMeTCP25G IP and the target. Following a successful response from the target acknowledging the connection, the IO port is set up as the connection for Write/Read command execution. Completion of this step signifies the IP’s readiness, transitioning to the ‘Connection ready’ phase.

Note: The NVMe Qualified Names (NQNs) exchanged between host and target must be unique values, set via HostNQN and TrgNQN (NVMeTCP25G IP input/parameters).

4)     In ‘Connection ready’ phase, the Keep Alive command is dispatched at user-defined intervals. If a timeout occurs without a response, an error is asserted, prompting the termination of both Admin and IO ports.

5)     To deactivate, users switch HostConnEn to 0b when the IP is idle (HostBusy is 0b), triggering the termination process for both ports and transitioning the IP to ‘No connection’ status.

6)     In the ‘Connection ready’ phase, initiating a Write command involves setting HostMMWrite to 1b, followed by data transmission. After initiating the command, proceed to step 8) to check if the IO command queue has sufficient space for additional commands.

7)     Similarly, initiating a Read command with HostMMRead set to 1b requires checking the Read buffer’s remaining capacity. If space is insufficient, the process waits until more space becomes available.

8)     The remaining transfer size is monitored. If no more data needs to be transferred, the IP switches to Idle. Otherwise, it ensures the IO command queue does not exceed 256 commands before issuing additional Write or Read commands. If the queue is full, the process pauses until enough space is freed.

The NVMeTCP IP is designed with a dual-layer hardware structure to independently manage two distinct protocols: NVMe/TCP and TCP/IP. The Command Handler is responsible for NVMe/TCP operations, while the TCP/IP Controller manages TCP/IP functions.

 

NVMe/TCP

The NVMe/TCP protocol implementation encompasses three protocol layers: NVMe, NVMe over Fabrics (NVMe-oF), and NVMe/TCP transport layer. This hardware layer acts as a bridge between the user interface and the TCP/IP Controller, ensuring that both Admin and IO commands are processed individually. Below are further details on the submodules within the NVMe/TCP layer.

 

·       Register File

This submodule is used for storing both system and network parameters set by the user, which are required for communication with the target, like the IP address. These parameters are inputted through the Parameters interface and are required for the generation and interpretation of PDUs exchanged between the Command Handlers (Admin and IO) and the target device. Additionally, the Register File holds system-specific parameters, such as the timeout duration to wait for a response from the target.

 

·       Admin Command Handler

The Admin Command Handler is a submodule, designated to manage the Admin connection, which precedes the IO connection setup. Its operations unfold in a sequential three-step process to ensure efficient connection management.

Initially, the Admin Command Handler undertakes the connection establishment process, involving an extensive exchange of packets. Following the establishment, the submodule engages in periodic execution of the Keep Alive command. This command is crucial for maintaining the connection’s active status, facilitating uninterrupted Write and Read command operations. Users can set the Keep Alive interval, ranging from 0 to 3600 seconds. It’s noted that initiating the Keep Alive command temporarily interrupts packet transfer for Write/Read commands, which may slightly impact performance. The final step involves the execution of the connection termination process, effectively deactivating the connection when necessary.

To support these functions, the Admin Command Handler is structured into three subblocks: the control engine, the transmit (Tx) module, and the receive (Rx) module. The control engine orchestrates the sequence of the PDU transmission for each process. The Tx module generates the PDU during the connection initialization, termination, command transmission, and data transmission. Concurrently, the Rx module interprets incoming PDUs to confirm the success of connection initialization, command transmission, or data reception. If the control engine encounters a timeout while waiting for a PDU, it triggers the termination of both Admin and IO connections.

 

·       IO Command Handler

The IO Command Handler is a module to manage data transfer operations within the NVMeTCP IP, specifically handling Write and Read commands to optimize data transfer performance. This module manages the flow of data, ensuring that commands are processed efficiently and in the correct order.

The module’s functionality is distributed across four main processes: establishing connection, executing Write commands, processing Read commands, and terminating connections. This submodule shares the same structural design with the Admin Command Handler, comprising the control engine, the Tx module, and the Rx module. This design facilitates an approach to manage the various stages of data transfer.

The IO Command Handler has a queue memory, which can store up to 256 Write and Read commands. This capacity is critical for buffering commands and managing the flow of data. However, it is noted that data returned from the target may not always align with the sequence of Read command requests. To address this, the Read buffer is employed to reorder incoming data, ensuring that the data sequence presented to the user matches the original order of the Read command requests. When the Read buffer reaches its capacity, the IP de-asserts the ready signal, preventing additional command requests.

Additionally, the IO Command Handler includes a timing mechanism to monitor the wait time for receiving a PDU. If a timeout occurs, indicating that a response has not been received within the expected timeframe, the system terminates both Admin and IO connections.
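The reordering behavior described above can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the IP's actual implementation: the class name, the tag scheme, and the method names are hypothetical. It only shows the principle that data arriving out of order is released in the order of the original Read requests, and that new requests are refused while the buffer is full.

```python
from collections import deque


class ReadReorderBuffer:
    """Toy model: one slot per outstanding Read command, drained in request order."""

    def __init__(self, max_cmds):
        self.max_cmds = max_cmds  # e.g. 16 to 256, depending on the buffer size
        self.order = deque()      # tags in the order the requests were issued
        self.slots = {}           # tag -> returned data (None until it arrives)
        self.next_tag = 0

    def ready(self):
        """True while another Read request can be accepted."""
        return len(self.order) < self.max_cmds

    def request(self):
        """Reserve a slot for a new Read command and return its tag."""
        if not self.ready():
            raise RuntimeError("Read buffer full")
        tag = self.next_tag
        self.next_tag += 1
        self.order.append(tag)
        self.slots[tag] = None
        return tag

    def complete(self, tag, data):
        """Store data returned by the target, possibly out of order."""
        self.slots[tag] = data

    def drain(self):
        """Release data to the user, strictly in request order."""
        out = []
        while self.order and self.slots[self.order[0]] is not None:
            tag = self.order.popleft()
            out.append(self.slots.pop(tag))
        return out


buf = ReadReorderBuffer(max_cmds=4)
t0, t1, t2 = buf.request(), buf.request(), buf.request()
buf.complete(t2, "C")   # the last request returns first
first = buf.drain()     # nothing is released yet, t0 is still pending
buf.complete(t0, "A")
second = buf.drain()    # releases "A" only
buf.complete(t1, "B")
third = buf.drain()     # releases "B" then "C"
print(first, second, third)
```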

 

·       Read Buffer

The Read buffer is designed to manage and store incoming data from Read commands. The buffer’s size is configured through the “RdBufDepth” parameter, which can be set to a value ranging from 4 to 8, as illustrated in Table 3.

 

Table 3 RdBufDepth Parameter Description

RdBufDepth | Buffer size | Maximum Read Commands | Estimated Read Performance*(1)
4 | 64 KB | 16 | 400 MB/s
5 | 128 KB | 32 | 544 MB/s
6 | 256 KB | 64 | 916 MB/s
7 | 512 KB | 128 | 1276 MB/s
8 | 1 MB | 256 | 1900 MB/s

Remark *(1): Estimated read performance is the result in a specific test environment.

 

The Read buffer size determines how many Read commands can be queued and processed. For instance, setting ‘RdBufDepth’ to 8 allocates 1 MB of buffer space, which is enough to hold 256 Read commands (each command involves 4 KB of data). Users need to balance resource utilization against read performance: a larger buffer allows more outstanding Read commands and therefore higher maximum read throughput. The buffer size affects read performance only and has no effect on write performance.
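The rows of Table 3 are consistent with a power-of-two relationship between RdBufDepth and the number of 4 KB Read commands the buffer can hold. The sketch below encodes that relationship. It is inferred from the table values rather than stated as a formula in this datasheet, and the function name is hypothetical.

```python
CMD_BYTES = 4096  # fixed data size per command (4 KB)


def read_buffer_config(rd_buf_depth):
    """Return (max_read_commands, buffer_bytes) for a given RdBufDepth.

    Assumes max commands = 2**RdBufDepth, which matches every row of Table 3.
    """
    if not 4 <= rd_buf_depth <= 8:
        raise ValueError("RdBufDepth must be in the range 4 to 8")
    max_cmds = 1 << rd_buf_depth
    return max_cmds, max_cmds * CMD_BYTES


for depth in range(4, 9):
    cmds, size = read_buffer_config(depth)
    print(depth, cmds, size // 1024, "KB")
```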

 

TCP/IP

The TCP/IP layer is a fundamental component in the NVMeTCP IP, facilitating the transmission of NVMe/TCP protocol packets over the network. Using TCP/IP achieves reliable Ethernet packet transfer. Within this layer, two dedicated modules are designed to handle two distinct TCP ports (Admin and IO) simultaneously, alongside an integrated EMAC IF for packet multiplexing and forwarding to the Ethernet MAC through a 64-bit AXI4-Stream interface.

 

·       Admin TCP/IP Controller

This controller is segmented into three main logic components: the main controller, the Tx module, and the Rx module. The main controller manages the opening and closing of ports and oversees data transmission and reception. The Tx module encapsulates the PDU into an Ethernet packet, appending TCP, IP, and Ethernet headers to ensure proper transmission. Conversely, the Rx module decodes the Ethernet packet, extracts the TCP payload, and conveys it back to the Admin Command Handler.

The TCP/IP Controller provides lossless data transmission by retransmitting data to recover lost packets. Additionally, flow control is implemented by monitoring the available space in the receiver’s buffer before dispatching further data, preventing buffer overflow.

This controller obtains the MAC address of the NVMe/TCP target using one of two methods, selected by the TrgMACMode user input. In Fixed-MAC mode, the target MAC address is supplied directly by the user, while in ARP mode, the TCP/IP Controller sends an ARP packet to resolve the MAC address of the NVMe/TCP target from its IP address.

 

·       IO TCP/IP Controller

The IO TCP/IP Controller is designed for high-speed data transfers, particularly for managing Write and Read commands integral to the operation of the NVMeTCP IP. This demand for speed translates to increased resource utilization, due to the necessity for larger buffer sizes. A sufficiently sized buffer is crucial to ensure that data transfers with the NVMe/TCP target are executed at peak performance. The controller is optimized for using jumbo frames to enhance data transmission efficiency. However, for environments where jumbo frames are not supported, contact our sales team for further information.

 

·       EMAC IF

The EMAC IF manages the shared Ethernet System utilized by both the Admin and IO TCP/IP Controllers. It functions as a switch, selecting which TCP/IP Controller is active at any given moment, ensuring efficient packet transfer management. The switching mechanism is designed to alternate the active module following the complete transfer of a packet.

 

User Logic

The user interface of the NVMeTCP25G IP employs a Memory-mapped interface for executing both Write and Read commands. The core engine of the user logic can be designed as a state machine that manages the connection processes, including establishment and termination. This state machine is also responsible for the data transfer operation, where it issues the Write and Read requests and assigns the address values.

If the parameters remain constant, users can assign them as constant values in the HDL code, simplifying the design. Additionally, the data flow interface utilizes ‘valid’ and ‘ready’ signals.
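The address assignment performed by the user logic can be sketched as follows, based on the HostMMAddr description in Table 5: the address is in 512-byte units, must be aligned to 4 KB (a multiple of 8 units), and must not exceed TrgLBASize – 8. This is a software model for illustration only; the function name is hypothetical and real user logic would be written in HDL.

```python
UNITS_PER_CMD = 8  # one 4 KB command covers eight 512-byte units


def command_addresses(start_addr, num_cmds, trg_lba_size):
    """Return the HostMMAddr value for each of num_cmds consecutive 4 KB commands."""
    if num_cmds < 1:
        raise ValueError("at least one command is required")
    if start_addr % UNITS_PER_CMD != 0:
        raise ValueError("HostMMAddr must be aligned to a 4 KB boundary")
    last_addr = start_addr + (num_cmds - 1) * UNITS_PER_CMD
    if last_addr > trg_lba_size - UNITS_PER_CMD:
        raise ValueError("transfer exceeds the target capacity")
    return [start_addr + i * UNITS_PER_CMD for i in range(num_cmds)]


# Three consecutive 4 KB commands starting at address 0 on a small example target
print(command_addresses(0, 3, trg_lba_size=1024))
```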

 

25G Ethernet System (MAC + PHY)

The 25G Ethernet System, which the NVMeTCP25G IP connects to, comprises the Media Access Control (MAC) and the Physical (PHY) layers. The EMAC layer can be implemented with either a Soft IP core or a Hard IP core. The PHY layer is typically provided at no additional cost by the FPGA vendor. The available Ethernet System solutions are listed below.

Solution 1: DG 25GEMAC/PCS+RS-FEC IP (MAC Adapter is not required)

https://dgway.com/products/IP/GEMAC-IP/dg_xxvgmacrsfecip_data_sheet_xilinx/

Solution 2: 10G/25G Ethernet Subsystem (Soft IP) (MAC Adapter is required)

https://www.xilinx.com/products/intellectual-property/ef-di-25gemac.html

Solution 3: Versal Multirate Ethernet MAC Subsystem (Hard IP) (MAC Adapter is required)

https://www.xilinx.com/products/intellectual-property/mrmac.html

 

Core I/O Signals

Descriptions of all parameters and I/O signals are provided in Table 4 - Table 6.

 

Table 4 Core Parameters

Name | Value | Description

HostNQNH[1783:128] | Unicode char | Represent the upper 207 bytes of the NVMe Qualified Name (NQN) of the host system. Note: If the HostNQN fits within 16 bytes (the size of the HostNQNL input), HostNQNH should be set to all zeros.

KeepAliveSet | 0 – 3600 | Specify the interval time for sending Keep Alive in seconds. Set this to 0 to disable the Keep Alive transmission feature.

RdBufDepth | 4 – 8 | Configure the Read buffer size. Refer to Table 3 for additional details.
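The byte ordering of the NQN values (HostNQNH here, HostNQNL and TrgNQN in Table 5) places the first character in bits [7:0]. The sketch below packs a name into a 223-byte value and splits a host NQN into its HostNQNH and HostNQNL portions. It assumes one byte per character (ASCII), and the helper names are hypothetical.

```python
NQN_BYTES = 223  # total NQN size: 207 bytes (HostNQNH) + 16 bytes (HostNQNL)


def pack_nqn(nqn: str) -> int:
    """Pack an NQN string so that character i occupies bits [8*i+7 : 8*i].

    Unused upper bytes are left as 00h.
    """
    raw = nqn.encode("ascii")
    if len(raw) > NQN_BYTES:
        raise ValueError("NQN is longer than 223 bytes")
    return int.from_bytes(raw, "little")


def split_host_nqn(nqn: str):
    """Return (HostNQNH, HostNQNL) for a host NQN string."""
    value = pack_nqn(nqn)
    low = value & ((1 << 128) - 1)  # HostNQNL[127:0], lower 16 bytes
    high = value >> 128             # HostNQNH[1783:128], upper 207 bytes
    return high, low


trg_nqn = pack_nqn("dgnvmettest")
print(hex(trg_nqn))
```

With the example name from Table 5, bits [7:0] hold 'd', bits [15:8] hold 'g', and bits [87:80] hold the final 't'. A host NQN of 16 characters or fewer yields HostNQNH equal to zero.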

 

Table 5 User I/O Signals (Synchronous to Clk)

Signal name | Dir | Description

Common Interface

RstB | In | Reset IP core. Active Low.

Clk | In | User clock for running NVMeTCP25G IP. The frequency must be more than or equal to 195.3125 MHz.

Parameters Interface

Note: All inputs must keep the same value while HostConnEn=1b and HostConnStatus=0b

HostMAC[47:0] | In | The MAC address of the host system.

HostIPAddr[31:0] | In | The IP address of the host system.

HostAdmPort[15:0] | In | The Admin port number of the host system.

HostIOPort[15:0] | In | The IO port number of the host system.

TrgMACMode | In | Target MAC Address Mode. 0b: Target MAC address by ARP, 1b: Fixed target MAC address by user (TrgMAC[47:0]).

TrgMAC[47:0] | In | MAC address of the target system when using Fixed-MAC mode (TrgMACMode=1b).

TrgIPAddr[31:0] | In | The IP address of the target system.

TCPTimeOutSet[31:0] | In | The timeout duration before initiating the retransmission process, measured in Clk periods (5.12 ns at 195.3125 MHz). It is recommended to set this to three times the round-trip time (RTT), or to 1 second if three times the RTT is less than 1 second.

NVMeTimeOutSet[31:0] | In | The timeout duration to wait for a response from the target before asserting an error, measured in Clk periods (5.12 ns at 195.3125 MHz). Setting this to 0 disables the timeout. It is recommended to set this to four times TCPTimeOutSet to accommodate TCP/IP retransmission before asserting an error.

HostNQNL[127:0] | In | The lower 16 bytes of the host system’s NVMe Qualified Name (NQN), defined as a Unicode string. The total HostNQN size is 223 bytes, with 207 bytes set by HostNQNH and 16 bytes by HostNQNL. Excess characters should be filled with 00h, similar to TrgNQN. This is used when the target system grants access permissions for the SSD to specific hosts.

TrgNQN[1783:0] | In | The NVMe Qualified Name (NQN) of the SSD within the target system, defined as a Unicode string. The maximum size of TrgNQN is 223 bytes. For example, if the NQN is “dgnvmettest”, configure TrgNQN[7:0]=’d’, TrgNQN[15:8]=’g’, …, TrgNQN[87:80]=’t’ accordingly, with remaining characters (TrgNQN[1783:88]) set to 0 (null). This NQN is utilized by the host to identify and interact with the correct SSD in the target system.

Control Interface

IPVersion[31:0] | Out | IP version number

TestPin[127:0] | Out | Internal test pin

EthLinkup | In | The status of the Ethernet link. Set to 1b when the Ethernet link is successfully established.

HostConnEn | In | Enable/Disable the connection with the target.
    Setting it from 0b to 1b initiates the connection establishment process.
    Setting it from 1b to 0b triggers the connection termination process.
    Ensure the IP is idle (HostBusy=0b) before modifying this signal. HostBusy is then set to 1b to indicate the start of the connection establishment or termination process and reverts to 0b once the operation is completed.

HostConnStatus | Out | The connection status with the target. 0b: No connection, 1b: An active connection.

HostBusy | Out | The busy status of the IP. 0b: Idle, 1b: Busy.
    Set to 1b after changing HostConnEn value or processing Write/Read commands.

HostError | Out | Set to 1b when an error is detected, indicated by HostErrorType being non-zero.
    It can be reset by setting RstB to 0b.

HostErrorType[31:0] | Out | Error types.
    [0] – An error if the target system is not found.
    [1] – An error if the target requires authentication.
    [2] – An error when certain target parameters are unsupported (refer to ‘TrgCAPStatus’ signal for details).
        - I/O Queue Command Capsule Supported Size (IOCCSZ) is less than 260.
        - Maximum Queue Entries Supported (MQES) is less than 511.
        - Maximum Outstanding Commands (MAXCMD) is less than 256.
    [3] – An error when LBA unit is not supported (LBA unit is not equal to 512 bytes).
    [7:4] – Reserved
    [8] – An error when the Admin port fails to establish a connection successfully.
    [9] – An error when the Admin port does not receive a response within specified time.
    [10] – An error when status register of the Admin completion entry is incorrect, with additional information available in the ‘TrgAdmStatus’ signal.
    [11] – Reserved.
    [12] – An error when the Admin port receives an unrecognized packet.
    [13] – An error when the Admin port receives a termination request unexpectedly.
    [14] – An error when the Admin port detects unsupported CPDA (not equal to 0, 1, 2, or 5) during connection establishment (refer to ‘TrgCAPStatus’ signal for more details).
    [15] – A critical error within the Admin TCP/IP Controller, with more details in ‘AdmTCPStatus’ signal.
    [16] – An error when the IO port is unable to establish a connection successfully.
    [17] – An error when the IO port does not receive a response within specified time.
    [18] – An error when status register of the IO completion entry is incorrect, with additional information available in the ‘TrgIOStatus’ signal.
    [19] – Reserved.
    [20] – An error when the IO port receives an unrecognized packet.
    [21] – An error when the IO port receives a termination request unexpectedly.
    [22] – An error when the IO port detects unsupported CPDA (not equal to 0, 1, 2, or 5) during connection establishment (refer to ‘TrgCAPStatus’ signal for more details).
    [23] – A critical error within the IO TCP/IP Controller, with more details in ‘IOTCPStatus’ signal.
    [31:24] – Reserved

TrgLBASize[47:0] | Out | The total capacity of the SSD in 512-byte units. Bits[2:0] are set to 000b to align with 4 KB units, as the IP supports data transfer in 4 KB units. The default value is 0, and this signal becomes valid after the connection establishment process is completed.

TrgCAPStatus[79:0] | Out | Status of the target capabilities.
    [31:0] – I/O Queue Command Capsule Supported Size (IOCCSZ).
    [47:32] – Maximum Queue Entries Supported (MQES).
    [63:48] – Maximum Outstanding Commands (MAXCMD).
    [71:64] – Controller PDU Data Alignment (CPDA) of Admin port
    [79:72] – Controller PDU Data Alignment (CPDA) of IO port

TrgAdmStatus[15:0] | Out | Status output from the Admin command.
    [0] – Reserved
    [15:1] – Status field value of the Admin Completion Entry

TrgIOStatus[15:0] | Out | Status output from the I/O command.
    [0] – Reserved
    [15:1] – Status field value of the IO Completion Entry

AdmTCPStatus[31:0] | Out | Status from the Admin TCP/IP controller.
    [0] – Set to 1b when TCP/IP Controller retries sending an ARP request packet.
    [1] – Set to 1b when TCP/IP Controller retransmits a ‘SYN’ packet.
    [2] – Reserved.
    [3] – Set to 1b when TCP/IP Controller transmits a ‘RST’ packet during TCP open operation.
    [4] – Set to 1b when TCP/IP Controller retransmits FIN and ACK packets.
    [5] – Set to 1b when TCP/IP Controller retransmits data packet.
    [6] – Set to 1b when TCP/IP Controller generates duplicate ACK packets to request the retransmission of lost data.
    [7] – Set to 1b when TCP/IP Controller retransmits data packet to prompt the target to return an ACK packet for updating the window size value.
    [20:8] – Reserved
    [21] – Set to 1b when a received ACK packet indicates a lost packet.
    [22] – Set to 1b upon receiving connection termination request during data transmission to the target (Fatal error).
    [23] – Set to 1b when the receive buffer of the TCP/IP Controller overflows (Fatal error).
    [26:24] – Internal test status
    [27] – Set to 1b when a received data packet indicates a lost packet.
    [29:28] – Internal test status
    [30] – Set to 1b when receiving a RST packet (Fatal error).
    [31] – Internal test status

IOTCPStatus[31:0] | Out | Status from the IO TCP/IP controller. The description of each bit matches the AdmTCPStatus.

Memory Mapped Interface

HostMMAddr[47:0] | In | The Memory-mapped address for Write/Read commands in 512-byte units, requiring alignment to 4 KB boundaries. The maximum value is (TrgLBASize – 8).

HostMMRead | In | Set to 1b to send a 4 KB Read command. This signal must not be asserted at the same time as HostMMWrite.
    Upon a Read command request, 4 KB of read data is returned via HostMMRdData.

HostMMWrite | In | Set to 1b to send a 4 KB Write command and 4 KB of Write data. This signal must not be asserted at the same time as HostMMRead.

HostMMWtReq | Out | The readiness of the IP to accept the Write/Read command request and Write data.
    0b: Accept the request/data, 1b: Not ready.

HostMMWrData[127:0] | In | Write data during Write command execution, valid when HostMMWrite is set to 1b.

HostMMRdData[127:0] | Out | Read data returned from Read command, valid when HostMMRdValid is set to 1b.

HostMMRdValid | Out | Set to 1b to indicate the validity of HostMMRdData. The read data transmission can be paused by setting HostMMRdPause to 1b. HostMMRdValid then changes to 0b with a latency of 2 clock cycles from the assertion of HostMMRdPause.

HostMMRdPause | In | Set to 1b to pause receiving Read data when the user is not ready to accept data.
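When debugging, it can help to translate a captured HostErrorType value into readable text. The decoder below is a minimal host-side sketch that follows the bit assignments listed for HostErrorType in Table 5. The function name and message strings are hypothetical, and how the 32-bit value is captured (for example, through a debug register) is outside the scope of this datasheet.

```python
ERROR_BITS = {
    0: "Target system not found",
    1: "Target requires authentication",
    2: "Unsupported target parameters (IOCCSZ, MQES, or MAXCMD)",
    3: "Unsupported LBA unit (not 512 bytes)",
    8: "Admin port failed to establish a connection",
    9: "Admin port response timeout",
    10: "Admin completion entry status error (see TrgAdmStatus)",
    12: "Admin port received an unrecognized packet",
    13: "Admin port received an unexpected termination request",
    14: "Admin port detected unsupported CPDA",
    15: "Admin TCP/IP Controller critical error (see AdmTCPStatus)",
    16: "IO port failed to establish a connection",
    17: "IO port response timeout",
    18: "IO completion entry status error (see TrgIOStatus)",
    20: "IO port received an unrecognized packet",
    21: "IO port received an unexpected termination request",
    22: "IO port detected unsupported CPDA",
    23: "IO TCP/IP Controller critical error (see IOTCPStatus)",
}


def decode_host_error(value: int):
    """Return the messages for every defined bit that is set, lowest bit first."""
    return [text for bit, text in sorted(ERROR_BITS.items()) if (value >> bit) & 1]


# Example: both ports timed out waiting for a response
for message in decode_host_error((1 << 9) | (1 << 17)):
    print(message)
```

Reserved bits are ignored by this sketch, so a value with only reserved bits set decodes to an empty list.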

 

Table 6 EMAC I/O Signals (Synchronous to MacClk)

Signal name | Dir | Description

MacClk | In | EMAC interface clock which is equal to 390.625 MHz for 25G Ethernet.

MacTxData[63:0] | Out | Transmitted data to the EMAC. Considered valid when MacTxValid is set to 1b.

MacTxKeep[7:0] | Out | The byte enable for MacTxData. Set to 1b to indicate the corresponding bytes are valid for transmission, with bit[0] for MacTxData[7:0], bit[1] for MacTxData[15:8], and so on.

MacTxValid | Out | Valid signal of MacTxData. Set to 1b to indicate MacTxData is ready for transmission.

MacTxLast | Out | Asserted to 1b to indicate the final data of a packet. Valid when MacTxValid is set to 1b.

MacTxReady | In | Handshaking signal. Set to 1b when the EMAC is ready to accept MacTxData. This signal must be held at 1b from the first data of a packet until the last data so that the packet is transmitted continuously.

MacRxData[63:0] | In | Received data. Valid when MacRxValid is set to 1b.

MacRxValid | In | Valid signal of MacRxData. This signal is set to 1b continuously from the reception of the first data of a packet until the last data.

MacRxLast | In | Asserted to 1b to indicate the final data of the packet. Valid when MacRxValid is set to 1b.

MacRxUser | In | Control signal asserted at the end of a received frame (when MacRxValid and MacRxLast are set to 1b) to indicate the frame status. 1b: Normal packet, 0b: A packet with CRC error.

MacRxReady | Out | Handshaking signal. Asserted to 1b when MacRxData has been accepted. After the reception of the final data of a packet, MacRxReady is set to 0b for 2 clock cycles, preparing for the next data reception.
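The MacTxKeep byte enables matter mainly on the final 64-bit transfer of a packet, where fewer than eight bytes may be valid. The sketch below computes the number of transfers and the keep value for that final transfer. It assumes the valid bytes occupy the low-order byte lanes contiguously, which matches the bit-to-byte mapping in Table 6 but is otherwise an assumption; the function names are hypothetical.

```python
BYTES_PER_BEAT = 8  # 64-bit MacTxData


def tx_beats(packet_bytes: int) -> int:
    """Number of 64-bit transfers needed to send a packet of the given length."""
    if packet_bytes < 1:
        raise ValueError("packet length must be at least 1 byte")
    return (packet_bytes + BYTES_PER_BEAT - 1) // BYTES_PER_BEAT


def tx_keep_for_last_beat(packet_bytes: int) -> int:
    """MacTxKeep value on the transfer where MacTxLast is asserted."""
    if packet_bytes < 1:
        raise ValueError("packet length must be at least 1 byte")
    valid = packet_bytes % BYTES_PER_BEAT or BYTES_PER_BEAT
    return (1 << valid) - 1  # one bit per valid byte, starting from bit[0]


# A 70-byte packet needs 9 transfers; the last one carries 6 valid bytes
print(tx_beats(70), format(tx_keep_for_last_beat(70), "08b"))
```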

 

Timing Diagram

Reset Process

 

Figure 4 Reset Process Timing Diagram

 

1)     Ensure that the Clk signal is stable. Following this, de-assert the active-low reset by setting RstB to 1b, so that the IP begins its initialization.

2)     The IP then waits for the EthLinkup to be asserted, which confirms that the Ethernet link has been successfully established.

3)     Throughout the reset operation, the HostBusy signal is maintained at 1b, indicating that the IP is in the process of resetting. Once the reset process is finalized, HostBusy is reverted to 0b, signaling that the IP is ready for further operations, specifically for establishing a connection with the target system.

4)     To commence the connection establishment, the user changes HostConnEn from 0b to 1b. This action triggers the process that enables the IP to start establishing a connection with the target system.

 

Connection Establishment

 

Figure 5 Connection Establishment Timing Diagram

 

1)     Ensure that both HostConnStatus and HostBusy are set to 0b, indicating that the IP is idle and the connection is inactive. Next, configure the necessary parameters, which include HostMAC, HostIPAddr, HostAdmPort, HostIOPort, TrgMACMode, TrgMAC, TrgIPAddr, TCPTimeOutSet, NVMeTimeOutSet, HostNQN, and TrgNQN. After setting these parameters, change HostConnEn to 1b to initiate the connection establishment. Keep the parameter values stable throughout this process.

2)     Once the process begins, HostBusy is set to 1b, indicating the IP is establishing a connection.

3)     Upon the successful establishment, HostConnStatus changes to 1b. Following this, TrgLBASize provides the storage capacity and HostBusy is set to 0b, signifying that the IP is ready for data transmission, allowing the user to send Write/Read commands.

 

 

Figure 6 Failure Connection Establishment Timing Diagram

 

Figure 6 illustrates an example of a failed connection establishment, which can occur if the parameter values are incorrect. A reset is required to recover from this condition. The steps are described below.

1)     If the connection establishment process fails, the IP sets HostError to 1b, and the error status can be identified via the HostErrorType signal. During this failure state, HostConnStatus remains at 0b (no active connection), while HostBusy is set to 1b (incomplete operation).

2)     To address the error, the system requires a reset. Before attempting to re-establish the connection, ensure that any issues leading to the error are resolved. Then, initiate the reset process by setting RstB to 0b.

3)     During the reset, the error flag (HostError) and the error status (HostErrorType) are cleared, returning them to normal status.

 

Write Command

 

Figure 7 Write Command - Single Mode Timing Diagram

 

During a Write command, the memory-mapped interface utilizes HostMMWrite to initiate a write request along with the corresponding write data to the IP. The first cycle of asserting HostMMWrite includes sending the target address and the first data of the transfer. Each write request involves transferring 4 KB of data, equivalent to 256 cycles of 128-bit data. The IP can use HostMMWtReq set to 1b to pause a write request or data transfer at any point during the operation. Below are detailed steps for executing a single Write command request.

1)     The user asserts HostMMWrite to 1b to send a Write request. The target address is specified on HostMMAddr, and the first data (D0) is placed on HostMMWrData. If HostMMWtReq is set to 0b in that clock cycle, it indicates that the request and the first data have been accepted. Subsequent data (starting with D1) can be sent by maintaining HostMMWrite at 1b. HostMMAddr is not required for the remaining data transfer, but ensure that HostMMAddr values at the first cycle are aligned to 4 KB, with bits[2:0] set to 000b.

2)     Once the write request is accepted, the IP sets HostMMWtReq to 1b for at least 6 clock cycles to pause the next write request or data transfer, allowing time for pre-processing of the Write command. Data transfer resumes once the IP is ready and HostMMWtReq returns to 0b. Meanwhile, HostBusy is set to 1b, indicating the IP is processing the command.

3)     The user has the option to pause the data transmission by setting HostMMWrite to 0b and can resume by reasserting it to 1b.

4)     Upon transferring the last data (D255) for this Write request, if there are no further commands pending in the queue, the IP de-asserts HostBusy to 0b, indicating the system has returned to an idle state.
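The framing rules above can be checked in software before the interface is driven. The following Python sketch is illustrative only: the helper names are hypothetical, and it assumes the 4 KB payload is presented as 256 words of 16 bytes in the order D0 to D255.

```python
WORD_BYTES = 16                       # one 128-bit HostMMWrData word
WORDS_PER_CMD = 256                   # transfers per Write command
BLOCK_BYTES = WORD_BYTES * WORDS_PER_CMD  # 4 KB per command

def is_aligned(host_mm_addr):
    """True when bits[2:0] of the address are 000b, as the text requires."""
    return (host_mm_addr & 0b111) == 0

def split_write_data(payload):
    """Split one 4 KB payload into the words D0..D255 (hypothetical helper)."""
    if len(payload) != BLOCK_BYTES:
        raise ValueError("a Write command carries exactly 4 KB")
    return [payload[i:i + WORD_BYTES] for i in range(0, BLOCK_BYTES, WORD_BYTES)]

payload = bytes(range(256)) * 16      # example 4 KB pattern
words = split_write_data(payload)     # words[0] is D0, words[255] is D255
```

A payload of any other length is rejected, which mirrors the fixed data size per command.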

 

 

Figure 8 Write Command - Multiple Mode Timing Diagram

 

Figure 8 shows an example when multiple Write commands are transmitted to the IP. To send the second command in the next clock cycle after the first command is completed, the user sets HostMMWrite to 1b. HostMMWtReq is asserted to 1b after receiving the first data, like the first Write command. However, if the command queue is full (256 commands), the IP will set HostMMWtReq to 1b at the end of the Write command. Once all the Write commands have been executed, HostBusy is de-asserted to 0b.

 

Read Command

 

Figure 9 Read Command - Single Mode Timing Diagram

 

When executing a Read command, each request returns a fixed 4 KB of data, or 256 cycles of 128-bit data. This data is transferred continuously, but the user has the option to pause the data transmission during any cycle using HostMMRdPause. Below is a detailed step-by-step guide on sending a single Read command request.

1)     To send a Read command, the user sets HostMMRead to 1b, while specifying the desired memory address on HostMMAddr. It is crucial to ensure alignment to 4 KB units by setting bits[2:0] to 000b.

2)     Upon receiving the Read command, the IP sets HostMMWtReq to 1b for a minimum of 6 clock cycles to handle pre-processing. During this time, HostBusy is also set to 1b to indicate that the command is currently being processed.

3)     Once the data is returned to the IP by the target, the IP begins transferring the 4 KB data to the user. This is done by setting HostMMRdValid to 1b and streaming the data via HostMMRdData. HostMMRdValid remains asserted at 1b throughout the transfer of 256 cycles of 128-bit data, ensuring continuous data flow.

4)     If the user is not ready to receive more data during a data transfer cycle, HostMMRdPause can be set to 1b to pause the data transmission.

5)     Two clock cycles after asserting HostMMRdPause to 1b, HostMMRdValid is set to 0b, pausing the data transmission. The data transmission can be resumed by de-asserting HostMMRdPause to 0b.

6)     After all commands are processed and there are no remaining commands, HostBusy is de-asserted to 0b, indicating that the IP has returned to an idle state.
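The pause latency in steps 4 and 5 can be explored with a small cycle-based model. This Python sketch is a simplification rather than a description of the IP internals: the function name is hypothetical, and it assumes that HostMMRdValid follows the inverse of HostMMRdPause with a fixed delay of 2 clock cycles for both pausing and resuming (the text only states the latency for pausing).

```python
def simulate_read_valid(pause, total_words=256, latency=2):
    """Return (valid waveform, words delivered) for a per-cycle pause waveform.

    pause[t] is the HostMMRdPause value driven by the user in cycle t.
    Assumption: HostMMRdValid in cycle t reflects pause[t - latency].
    """
    valid = []
    sent = 0
    for t in range(len(pause)):
        delayed_pause = pause[t - latency] if t >= latency else 0
        v = 1 if (sent < total_words and not delayed_pause) else 0
        valid.append(v)
        sent += v
    return valid, sent

# Pause asserted in cycles 2 and 3 while a 6-word transfer is in progress.
valid, sent = simulate_read_valid([0, 0, 1, 1, 0, 0, 0, 0], total_words=6)
```

In this model, data still arrives in the two cycles after HostMMRdPause is asserted, so the user logic should keep enough room to absorb those words.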

 

 

Figure 10 Read Command - Multiple Mode Timing Diagram

 

The IP features a command queue that enables the handling of multiple Write/Read command requests concurrently, significantly enhancing overall data transfer performance. The maximum number of Read commands that can be queued depends on two factors: the total capacity of the command queue, which is capped at 256 commands, and the configured Read buffer size. The maximum number of Read commands that can be requested is determined by dividing the Read buffer size by 4 KB.
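As a worked example of this limit, the following Python sketch computes the maximum number of outstanding Read commands for a given Read buffer size. The function name is hypothetical; the constants come from the figures stated above.

```python
KB = 1024
CMD_QUEUE_DEPTH = 256     # total capacity of the command queue
BLOCK_BYTES = 4 * KB      # fixed data size per command

def max_read_commands(read_buffer_bytes):
    """Read commands are limited by both the queue depth and the Read buffer size."""
    return min(CMD_QUEUE_DEPTH, read_buffer_bytes // BLOCK_BYTES)

smallest = max_read_commands(64 * KB)     # smallest configurable buffer
largest = max_read_commands(1024 * KB)    # largest configurable buffer (1 MB)
```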

To send multiple Read requests, the user sets HostMMRead to 1b and assigns the corresponding address on HostMMAddr for each command. This process allows the user to stack multiple commands without waiting for the completion of earlier ones. As data is received from the target, 4 KB of data corresponding to each command request is transferred back to the user. This transfer is initiated by setting HostMMRdValid to 1b. There is a gap of at least one clock cycle between consecutive 4 KB data transfers. The sequence in which the 4 KB data blocks appear on HostMMRdData corresponds directly to the sequence of Read command requests specified by HostMMAddr. Once all queued commands have been processed and no further commands remain, HostBusy is reset to 0b, indicating that the IP has returned to an idle state.

 

Mixed Write and Read Commands

 

Figure 11 Mixed Write and Read Commands Timing Diagram

 

The IP allows Write and Read commands to be issued without needing to wait for the completion of preceding commands. This flexibility helps optimize the throughput and efficiency of data operations. However, it is crucial to manage the shared signals, HostMMAddr and HostMMWtReq, to ensure accurate command processing. To send mixed Write and Read commands, the user sequentially initiates each command by setting either HostMMWrite or HostMMRead to 1b, along with the corresponding address on HostMMAddr. HostMMWrite and HostMMRead must not be set to 1b simultaneously, as this is not supported by the interface.

Upon receiving each command, HostMMWtReq is set to 1b during the pre-processing phases. For Read commands, the sequence of 4 KB data blocks received will correspond directly to the order of Read command requests made. After all command requests have been processed, including any Write or Read operations, HostBusy is reset to 0b.
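Because Read data returns in request order, user logic or a testbench can pair each incoming 4 KB block with its address using a simple FIFO. The Python sketch below is hypothetical host-side bookkeeping, not part of the IP; the class and method names are assumptions introduced for illustration.

```python
from collections import deque

class MixedCommandTracker:
    """Hypothetical bookkeeping for mixed Write and Read requests."""

    def __init__(self):
        self.read_addrs = deque()  # addresses of Read requests still awaiting data

    def issue(self, host_mm_write, host_mm_read, host_mm_addr):
        """Record one accepted request (a cycle in which HostMMWtReq is 0b)."""
        if host_mm_write and host_mm_read:
            raise ValueError("HostMMWrite and HostMMRead must not both be 1b")
        if host_mm_addr & 0b111:
            raise ValueError("HostMMAddr bits[2:0] must be 000b")
        if host_mm_read:
            self.read_addrs.append(host_mm_addr)

    def next_read_block_addr(self):
        """Address that the next 4 KB block on HostMMRdData belongs to."""
        return self.read_addrs.popleft()

tracker = MixedCommandTracker()
tracker.issue(1, 0, 0x00)   # Write
tracker.issue(0, 1, 0x08)   # Read
tracker.issue(1, 0, 0x10)   # Write
tracker.issue(0, 1, 0x18)   # Read
```

Write requests do not enter the FIFO because they return no data; only the order of Read requests matters for matching blocks on HostMMRdData.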

 

Connection Termination

The process of terminating a connection in the IP involves a few steps to ensure that the termination is handled completely.

 

 

Figure 12 Connection Termination Timing Diagram

 

1)     Before initiating termination, confirm that HostBusy is set to 0b. This indicates that the IP is currently idle, with no commands being processed. This check ensures that this request does not interrupt any active command processing. Change the value of HostConnEn from 1b to 0b. This action signals the request to terminate the current connection.

2)     Following the request, HostBusy is set to 1b. This change signifies that the IP has started processing the termination request.

3)     Once the termination process is completed, both HostConnStatus and HostBusy are reset to 0b. HostConnStatus returning to 0b indicates that the connection is no longer active, and HostBusy being set to 0b confirms that the IP has returned to an idle state.
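The handshake between HostConnEn, HostBusy, and HostConnStatus during establishment and termination can be summarized as a small state model. The Python sketch below is an abstraction for reasoning about control logic, not the IP's implementation: the class name and the `ip_done` event are hypothetical, and error cases are not modeled.

```python
class ConnectionModel:
    """Abstract view of the HostConnEn / HostBusy / HostConnStatus handshake."""

    def __init__(self):
        self.host_conn_en = 0
        self.host_busy = 0
        self.host_conn_status = 0

    def set_conn_en(self, value):
        """User changes HostConnEn; only allowed while the IP is idle."""
        if self.host_busy:
            raise RuntimeError("wait for HostBusy = 0b before changing HostConnEn")
        if value != self.host_conn_en:
            self.host_conn_en = value
            self.host_busy = 1      # IP starts establishing or terminating

    def ip_done(self):
        """Hypothetical event: the IP finishes the requested operation."""
        self.host_conn_status = self.host_conn_en
        self.host_busy = 0

conn = ConnectionModel()
conn.set_conn_en(1)   # request connection establishment
conn.ip_done()        # HostConnStatus = 1b, HostBusy = 0b
conn.set_conn_en(0)   # request termination
busy_during_termination = conn.host_busy
status_during_termination = conn.host_conn_status
conn.ip_done()        # HostConnStatus = 0b, HostBusy = 0b
```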

 

Error

When the IP encounters an error during the execution of an operation, it triggers an error notification process.

 

 

Figure 13 HostError and HostErrorType Timing Diagram

 

1)     The IP detects an issue, setting the HostError flag to 1b to indicate an error has occurred. Concurrently, the HostErrorType signal is updated to a non-zero value, specifying the details of the error. Each bit within HostErrorType corresponds to a different type of error, pointing to specific issues that can be further investigated using related status signals:

·       Bit[2]     : Check TrgCAPStatus[63:0] for unsupported features of target system.

·       Bit[10]   : Check TrgAdmStatus to review the Status register, output from the Admin command execution.

·       Bit[14]   : Check TrgCAPStatus[71:64] for CPDA value returned from Admin port of the target system.

·       Bit[15]   : Check AdmTCPStatus for the Admin TCP/IP controller’s status.

·       Bit[18]   : Check TrgIOStatus to review the Status register, output from the IO command execution.

·       Bit[22]   : Check TrgCAPStatus[79:72] for CPDA value returned from IO port of the target system.

·       Bit[23]   : Check IOTCPStatus for the IO TCP/IP controller’s status.

2)     After identifying and addressing the cause of the error, the user should reset the IP to clear the error state and restart operations. This is done by setting RstB to 0b, which initiates the IP reset process.

3)     The reset action clears the HostError and HostErrorType signals, effectively resetting all error statuses related to previous processes.
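For debugging, the bit assignments in step 1 can be turned into a lookup that tells which status signal to inspect. The Python sketch below covers only the bits listed above; the function name is hypothetical, and any other bits of HostErrorType are ignored by this helper.

```python
# Bit position in HostErrorType -> status signal to check (from the list in step 1).
ERROR_BIT_HINTS = {
    2: "TrgCAPStatus[63:0]: unsupported features of target system",
    10: "TrgAdmStatus: Status register from the Admin command execution",
    14: "TrgCAPStatus[71:64]: CPDA value returned from Admin port",
    15: "AdmTCPStatus: Admin TCP/IP controller status",
    18: "TrgIOStatus: Status register from the IO command execution",
    22: "TrgCAPStatus[79:72]: CPDA value returned from IO port",
    23: "IOTCPStatus: IO TCP/IP controller status",
}

def decode_host_error_type(value):
    """Return (bit, hint) pairs for every listed bit that is set in value."""
    return [(bit, hint) for bit, hint in sorted(ERROR_BIT_HINTS.items())
            if (value >> bit) & 1]

# Example: bits 2 and 15 set at the same time.
hits = decode_host_error_type((1 << 15) | (1 << 2))
```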

 

EMAC Interface

The EMAC (Ethernet MAC) interface of the NVMeTCP25G IP employs a 64-bit AXI4-stream interface, optimized for data handling. This interface is designed to continuously transmit packet data without the ability to pause until the final data of the packet is transmitted. This characteristic requires MacTxReady to remain asserted at 1b from the transmission of the first data of a packet until the final data of that packet, as illustrated in Figure 14.

Because of this limitation, the NVMeTCP25G IP is compatible with the DG 25GEMAC/PCS+RS-FEC IP core. It can also be adapted to work with the AMD Xilinx Ethernet Subsystems (10G/25G Ethernet Subsystem and Versal Multirate Ethernet MAC Subsystem) by adding adapter logic with a FIFO (MAC Adapter) to manage the interfacing requirements.

 

 

Figure 14 Transmit EMAC Interface Timing Diagram

 

1)     The IP asserts MacTxValid to 1b and begins transmitting the first data (D0) on MacTxData. Except for the final data, all 64-bit data are valid, thus MacTxKeep is set to FFh for each data transfer, indicating full byte validity. The first data values are held until MacTxReady is confirmed to be asserted at 1b.

2)     Upon receiving the first data, EMAC asserts MacTxReady to 1b, indicating readiness to accept this and all subsequent data in the packet. This readiness continues until the end of the packet, ensuring a continuous data transfer in each packet.

3)     As the final data of the packet (Dn-1) is sent, the IP asserts both MacTxValid and MacTxLast to 1b, signaling the end of the packet. For this final transfer, MacTxKeep may not be FFh if the upper bytes of the data are not valid; its value depends on the actual data length.

4)     Once the entire packet has been transferred, MacTxReady may be de-asserted to 0b. This action pauses the data transmission, allowing for any necessary processing before the start of the next packet.
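The relationship between packet length, MacTxKeep, and MacTxLast can be illustrated by splitting a packet into 64-bit transfers. The Python sketch below is for illustration only: the function name is hypothetical, and it assumes that the first byte of the packet is placed on MacTxData[7:0], which matches the MacTxKeep bit mapping in Table 6 but is not stated explicitly for the data ordering.

```python
def packet_to_beats(packet):
    """Return a list of (data, keep, last) tuples, one per 64-bit transfer.

    Assumption: byte 0 of the packet maps to MacTxData[7:0].
    """
    beats = []
    for off in range(0, len(packet), 8):
        chunk = packet[off:off + 8]
        data = int.from_bytes(chunk, "little")   # first byte lands in bits [7:0]
        keep = (1 << len(chunk)) - 1             # one MacTxKeep bit per valid byte
        last = 1 if off + 8 >= len(packet) else 0
        beats.append((data, keep, last))
    return beats

# A 20-byte packet needs three transfers; only 4 bytes are valid in the last one.
beats = packet_to_beats(bytes(range(20)))
```

Here the first two transfers carry MacTxKeep = FFh, and the final transfer carries MacTxKeep = 0Fh with MacTxLast = 1b.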

 

Similar to Transmit EMAC interface, the Receive EMAC interface ensures that the data within a single packet is received continuously without interruption. The valid signal, MacRxValid, must be asserted to 1b from the start to the end of the packet, as depicted in Figure 15.

 

 

Figure 15 Receive EMAC Interface Timing Diagram

 

1)     The reception process begins when MacRxValid transitions from 0b to 1b, indicating the arrival of the first data (D0) of a packet on MacRxData. Following this, both MacRxValid and MacRxReady are maintained at 1b throughout the packet transfer. This continuous assertion is necessary to facilitate the uninterrupted transfer of the remaining data in the packet.

2)     The final data within a packet (Dn-1) is identified when both MacRxValid and MacRxLast are asserted to 1b simultaneously. During this time, MacRxUser also becomes a valid signal to read. If the packet is free of errors, MacRxUser will be set to 0b; otherwise, if an error is detected, the packet will be discarded by the IP.

3)     Once the packet reception is completed, the IP de-asserts MacRxReady for 2 clock cycles to allow for packet post-processing. After this pause, the EMAC can resume the reception of subsequent packets.
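The receive steps can be mirrored in a testbench-style model that groups 64-bit transfers into packets and drops packets flagged as erroneous. The Python sketch below is a hypothetical helper, not the IP's logic. It follows step 2 in treating MacRxUser = 0b as an error-free packet, and exposes that polarity as a parameter. Each input tuple represents one cycle in which MacRxValid and MacRxReady are both 1b.

```python
def collect_packets(beats, good_user=0):
    """Group (data, last, user) transfers into packets, discarding bad ones.

    data: 64-bit MacRxData value, last: MacRxLast, user: MacRxUser.
    MacRxUser is only examined on the transfer where last is 1.
    """
    packets = []
    current = []
    for data, last, user in beats:
        current.append(data)
        if last:
            if user == good_user:
                packets.append(current)   # keep error-free packet
            current = []                  # start a new packet either way
    return packets

rx = [(1, 0, 0), (2, 1, 0),    # two-word packet, no error
      (3, 0, 0), (4, 1, 1),    # two-word packet flagged as erroneous
      (5, 1, 0)]               # single-word packet, no error
good_packets = collect_packets(rx)
```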

 

Verification Methods

The NVMeTCP25G IP core functionality was verified by simulation and also validated on real boards using the KCU116, VCK190, Silicom FB2CGHH@KU15P, and Alveo U50.

 

Recommended Design Experience

Experienced design engineers with knowledge of the Vivado tools should be able to integrate this IP into their designs easily.

 

Ordering Information

This product is available directly from Design Gateway Co., Ltd. For pricing and additional information about this product, please refer to the contact information on the front page of this datasheet.

 

Revision History

Revision

Date (D-M-Y)

Description

2.00

1-Oct-24

Update data block size per command to 4 KB, and Core I/O signals

1.02

24-Feb-23

Update Core I/O signals and resource utilization

1.01

11-May-22

Update resource utilization

1.00

25-Mar-22

Initial release