NVMeTCP IP Core for 10G Data Sheet

Features

Applications

General Description

Functional Description

NVMe/TCP

·      Register File

·      Admin Command Handler

·      IO Command Handler

·      Read buffer

TCP/IP

·      Admin TCP/IP Controller

·      IO TCP/IP Controller

User Logic

10G/25G Ethernet MAC and PHY

Core I/O Signals

Timing Diagram

Reset process

Connection Establishment

Write command

Read command

Mixed Write and Read command

Connection Termination

Error

EMAC Interface

Verification Methods

Recommended Design Experience

Ordering Information

Revision History

 

 

 

 

  Core Facts

Provided with Core

Documentation | Reference Design Manual, Demo Instruction Manual
Design File Formats | Encrypted HDL
Instantiation Templates | VHDL
Reference Designs & Application Notes | Vivado Project, See Reference Design Manual
Additional Items | Demo on KCU105/ZCU102/ZCU106

Support

Support Provided by Design Gateway Co., Ltd.

Design Gateway Co., Ltd.

E-mail:    ip-sales@design-gateway.com

URL:       design-gateway.com

Features

·     NVMe/TCP (NVMe over TCP) host controller (Initiator), based on NVMe-oF specification rev 1.1 and NVMe specification rev 1.4

·     Access one NVMe SSD on the target (Subsystem), selected by NVMe name (NQN)

·     Command: Write and Read

·     High performance: Write and Read at up to 1200 Mbyte/s each (with 1 Mbyte Read buffer)

·     Data interface: Memory-mapped interface

·     Data size per command: Fixed at 4 Kbytes

·     Maximum number of commands in the queue: up to 256, also limited by Read buffer size for Read command

·     Configurable Read buffer size: 32 Kbytes (up to 8 Read commands) to 1 Mbyte (up to 256 Read commands)

·     Supported NVMe/TCP target:

      - IOCCSZ (I/O Queue Command Capsule Supported Size): More than or equal to 260 (104h)

      - MQES (Maximum Queue Entries Supported): More than or equal to 256 (100h)

      - MAXCMD (Maximum Outstanding Commands): More than or equal to 256 (100h)

      - Authentication: Not required

·     Networking: 10Gb Ethernet with the host and the target in the same network (ARP request/reply packets are used to resolve the target MAC address) and with jumbo frame support

·     Ethernet MAC interface: 64-bit AXI4 Stream interface at 156.25 MHz

·     User clock frequency: 156.25 MHz, the same clock as EMAC interface

·     Available reference design: KCU105/ZCU102/ZCU106 board

·     Customized service for the following conditions:

      - NVMe/TCP target in a different network, where ARP packets cannot be transferred to get the target MAC address

      - Network that does not support jumbo frame packets

 

 

Table 1: Example Implementation Statistics

Family | Example Device | Read BufSize | Fmax (MHz) | CLB Regs | CLB LUTs | CLB | IOB | BRAMTile(1) | URAM | Design Tools
Kintex-UltraScale | XCKU040FFVA1156-2E | 32KB (min) | 156.25 | 12800 | 12104 | 2269 | - | 46.5 | - | Vivado2019.1
Kintex-UltraScale | XCKU040FFVA1156-2E | 1MB (max) | 156.25 | 12913 | 12577 | 2520 | - | 294.5 | - | Vivado2019.1
Zynq-Ultrascale+ | XCZU9EG-FFVB1156-2E | 32KB (min) | 156.25 | 12800 | 12096 | 2470 | - | 46.5 | - | Vivado2019.1
Zynq-Ultrascale+ | XCZU9EG-FFVB1156-2E | 1MB (max) | 156.25 | 12913 | 12568 | 2654 | - | 294.5 | - | Vivado2019.1
Zynq-Ultrascale+ | XCZU7EV-FFVC1156-2E | 32KB (min) | 156.25 | 12828 | 12094 | 2347 | - | 34.5 | 2 | Vivado2019.1
Zynq-Ultrascale+ | XCZU7EV-FFVC1156-2E | 1MB (max) | 156.25 | 12942 | 12333 | 2423 | - | 34.5 | 33 | Vivado2019.1

 

 

Applications

 

Figure 1: NVMeTCP IP for 10G Application

 

A storage server contains many SSDs to store large amounts of data from several sources. Most servers are installed in a control room, while the data is recorded on-site. Without a network, each data source system must be connected directly to the storage server to transfer its recorded data to the SSD, so time is lost connecting each data source to the storage server.

As shown in Figure 1, a recording system that integrates NVMeTCP10G IP can transfer data to the storage server via a 10Gb network at very high speed. Many NVMe/TCP host systems can also be connected to the same network to access the storage server at the same time. Transferring data to the SSD via the network is more convenient for the host, and the time needed to move the recorded data from the sensors to the storage server is much reduced.

NVMeTCP10G IP transfers data with one SSD, selected by the SSD name (NQN: NVMe Qualified Name) that is configured by the target system. As shown in Figure 1, three NVMeTCP10G IPs can be connected to transfer three data streams to three SSDs at the same time. To switch the active SSD of an NVMeTCP10G IP, the user must terminate the connection to the current SSD and then establish a new connection to the new SSD.

 

 

General Description

 

Figure 2: NVMeTCP IP for 10G Block Diagram

 

NVMeTCP10G IP implements the host controller (also called the NVMe/TCP initiator) to access one SSD inside an NVMe/TCP target (also called the NVMe/TCP subsystem) via 10Gb Ethernet. The user interface of NVMeTCP10G IP is divided into three groups: Parameters interface, Control interface, and Memory mapped interface. The Parameters interface assigns the network parameters and timeout values that NVMeTCP10G IP uses during the connection establishment process. The Control interface creates and terminates the connection with the target system, and also carries the status and error signals for monitoring the IP. The Memory mapped interface sends Write and Read commands and transfers their data.

To connect to the NVMe/TCP target, NVMeTCP10G IP establishes two TCP ports. The first port is the Admin connection, which handles connection establishment and termination. The Keep alive command is also transmitted on this port to keep the connection active. The second port is the IO connection, which handles Write and Read commands. The two TCP ports are controlled independently by two TCP/IP controllers and two Command Handlers. EMAC I/F is the port multiplexer that transfers the Ethernet packets of both TCP ports to the same EMAC.

In the NVMe/TCP protocol, NVMe/TCP Protocol Data Units (PDUs) are carried as the TCP payload of the TCP/IP packets transferred between the host system and the target system. TCP/IP is a reliable protocol for transferring data over a network. One TCP packet may contain one PDU, many PDUs, or a part of a PDU. Therefore, the NVMe/TCP processor that handles PDUs and the TCP/IP processor that handles TCP/IP packets must be designed separately. As shown in Figure 2, the Admin/IO command handlers create and decode PDUs following the NVMe/TCP protocol, while the Admin/IO TCP/IP controllers create and decode TCP/IP packets following the TCP/IP protocol.
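The relationship between TCP packets and PDUs can be illustrated with a small software model. This is only a sketch, not the IP's internal logic: it assumes the 8-byte PDU common header of the NVMe/TCP transport specification (PDU type, flags, HLEN, PDO, and a 4-byte little-endian PLEN holding the total PDU length), and the class name `PduReassembler` is hypothetical.

```python
import struct

COMMON_HDR_LEN = 8  # assumed layout: type, flags, hlen, pdo (1 byte each) + plen (4 bytes, LE)


class PduReassembler:
    """Collects TCP payload bytes and returns every complete PDU found so far."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, tcp_payload: bytes) -> list:
        self._buf += tcp_payload
        pdus = []
        while len(self._buf) >= COMMON_HDR_LEN:
            _type, _flags, _hlen, _pdo, plen = struct.unpack_from("<BBBBI", self._buf, 0)
            if plen < COMMON_HDR_LEN:
                raise ValueError("PLEN is smaller than the common header")
            if len(self._buf) < plen:
                break  # only a part of a PDU has arrived; wait for the next packet
            pdus.append(bytes(self._buf[:plen]))
            del self._buf[:plen]
        return pdus


def make_pdu(pdu_type: int, body: bytes) -> bytes:
    """Builds a test PDU: common header followed by an opaque body."""
    return struct.pack("<BBBBI", pdu_type, 0, COMMON_HDR_LEN, 0, COMMON_HDR_LEN + len(body)) + body
```

Feeding one payload that holds a whole PDU plus the first bytes of the next one returns a single PDU; the second PDU is returned only after the rest of its bytes arrive in a later call.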

Register File is the interface module that receives the user parameters to start the connection establishment. The Read buffer re-orders the data received in Read commands so that it is returned in the same sequence as the requests. IO command handler is designed to support up to 256 Write/Read commands. However, the maximum number of commands in the queue is also limited by the Read buffer size. When the Read buffer is full, the IP is not ready to receive a new Write/Read command. The Read buffer size can be adjusted to balance resource utilization against read performance. A larger buffer may increase read performance because more Read commands can be sent to the queue.

NVMeTCP10G IP runs in one clock domain that is synchronous to the EMAC interface: 156.25 MHz for 10Gb Ethernet. The EMAC interface of NVMeTCP10G IP can connect to DG 10G25GEMAC IP directly, while special logic (TenGMacIF) must be added to the Tx EMAC interface to connect with the Xilinx 10G/25G Ethernet Subsystem IP core.

Reference designs on FPGA evaluation boards are available for evaluation before purchasing.

 

Functional Description

Figure 3 shows the operation flow of NVMeTCP10G IP after IP reset is de-asserted. The flow has two phases: No connection and Connection ready. The user must enable the connection to start the NVMe/TCP host function. After that, NVMeTCP10G IP creates two TCP ports for communicating with the NVMe/TCP target. When the ports are created successfully, the IP changes from No connection phase to Connection ready phase.

Two command types run in Connection ready phase: Admin command and IO command. The Keep alive command runs automatically as a background process throughout Connection ready phase, while IO commands (Write and Read commands) are controlled by the user. The user can send multiple Write and Read commands until the command queue or the Read buffer is not ready (256 commands are in operation or the Read buffer is full). When the user disables the connection, the operation to terminate the connection runs. After the termination is finished, the IP returns to No connection phase. More details of the operation flow are described as follows.

 

 

Figure 3: NVMeTCP10G IP Operation Flow

 

 

1)     The IP waits until the Ethernet connection shows link up status. After the linkup signal is asserted, the EMAC IP core must be ready to transfer data via the EMAC interface.

2)     The IP monitors HostConnEn, which is ‘0’ by default. When the user changes HostConnEn from ‘0’ to ‘1’, the IP starts the connection establishment process.

3)     To connect to the target, the Admin port is opened first to create the communication channel. After the TCP connection is ready, the PDUs that initialize the connection and connect to the target are transferred, and the IP waits until the response accepting the connection request is received. The IO port must then be created before data can be transferred with the NVMe/TCP target by Write/Read commands. Similar to the Admin port, the IO connection is initialized by sending the PDUs, and the target must return the response accepting the IO connection. After the initialization process is finished, the IP changes to Connection ready status.

Note: During connection initialization, the host and the target exchange NVMe Qualified Names (NQN). The NQNs of the host and the target are assigned by the user (HostNQN and TrgNQN: NVMeTCP10G IP inputs/parameters). Each NQN should be set to a unique value.

4)     While the connection is active, the Keep alive command is sent periodically, at the interval assigned to the IP as a constant value. The KeepAlive timer runs while the connection is ready. When the timer expires, the IP sends the Keep alive command to the NVMe/TCP target. The operation is successful when the response is returned. If no response is returned before the IP times out, the error is asserted and the two TCP ports are closed.

5)     When there is no more data to transfer and the IP is in Idle status (HostBusy=‘0’), the user can de-assert HostConnEn to ‘0’ to terminate the connection with the NVMe/TCP target system. The IP closes the IO port and then closes the Admin port. After that, the IP changes to No connection status.

6)     To send a Write command in Connection ready phase, the user asserts HostMMWrite=‘1’ and transfers the Write data to the target. After that, the IP skips to step 8) to check the remaining space in the IO command queue.

7)     To send a Read command in Connection ready phase, the user asserts HostMMRead=‘1’ to send the Read command to the target. After that, the IP calculates the remaining Read buffer size. If the remaining size is not enough for the next Read command, the IP waits until enough buffer space is free. Next, the IP goes to step 8) to check the remaining space in the IO command queue.

8)     If fewer than 256 IO commands are in operation in the queue, the IP can receive a new Write/Read command. Otherwise, the IP must wait until the IO queue has free space. After all Write and Read commands in the IO queue are completed successfully, the IP returns to Idle status, in which the user can disable the connection.
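Steps 6) to 8) amount to an admission check on each new command. The following is a minimal behavioral sketch of that check, assuming one 4 Kbyte Read buffer slot is held per outstanding Read command; the class and method names are hypothetical and do not describe the actual hardware implementation.

```python
MAX_IO_CMDS = 256  # IO command queue limit stated in this data sheet


class IoQueueModel:
    """Behavioral model of when the IP can accept a new Write/Read command."""

    def __init__(self, read_slots: int):
        self.read_slots = read_slots   # number of 4 Kbyte slots in the Read buffer (8 to 256)
        self.outstanding = 0           # Write + Read commands still in operation
        self.reads_buffered = 0        # Read commands holding a Read buffer slot

    def can_accept(self) -> bool:
        # A full queue or a full Read buffer blocks both Write and Read commands.
        return self.outstanding < MAX_IO_CMDS and self.reads_buffered < self.read_slots

    def submit(self, is_read: bool) -> None:
        if not self.can_accept():
            raise RuntimeError("IP is not ready for a new command")
        self.outstanding += 1
        if is_read:
            self.reads_buffered += 1

    def complete(self, is_read: bool) -> None:
        self.outstanding -= 1
        if is_read:
            self.reads_buffered -= 1

    @property
    def busy(self) -> bool:
        # Mirrors HostBusy: Idle only when no command is left in the queue.
        return self.outstanding > 0
```

With `read_slots=8`, the ninth command is refused until one Read command completes, which matches the behavior described for the minimum Read buffer size.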

As shown in Figure 2, the IP implements two protocols in two hardware groups: the NVMe/TCP protocol in the Command handlers and the TCP/IP protocol in the TCP/IP controllers.

 

 

NVMe/TCP

This hardware group implements three protocol layers: NVMe (Admin command and IO command), NVMe over Fabrics, and NVMe/TCP transport. The Parameters interface, Control interface, and Memory mapped interface from the user are decoded and converted to create the PDUs passed to the TCP/IP hardware. Admin commands and IO commands are handled by different modules.

·       Register File

The parameters of the target system, such as the IP address for creating the Admin and IO connections, are set by the user via the Parameters interface. Register File loads the parameters and then forwards them to both the Admin and IO command handlers. The parameters are not used after the connections are established successfully. Register File also loads the Timeout value, which is the waiting time for a response returned by the target system.

·       Admin Command Handler

The Admin connection is created before the IO connection and is terminated after the IO connection. The operation of Admin command handler is separated into three processes: connection establishment, which is the most complex process; the Keep alive command, which keeps the connection active for running Write/Read commands; and connection termination, which is the last process. The period for sending the Keep alive command is configured by the “KeepAliveSet” parameter in HDL code. The valid value of KeepAliveSet is 0-3600 (seconds). The Keep alive command and its response may be transferred at the same time as the data of Write/Read commands. Therefore, the keep alive operation may interrupt the Write/Read operation and slightly reduce performance.

The logic inside Admin command handler consists of three parts. The first is the controller, which manages the PDU order for each process. The second is the Tx module, which creates the PDUs for initializing the connection, terminating the connection, transmitting commands, and transmitting data. The third is the Rx module, which decodes the received PDU: the response to connection initialization, the response to a command, or the returned data. The Admin connection and the IO connection are terminated if the controller waits for a received PDU until timeout.

·       IO Command Handler

IO Command handler has four processes: connection establishment, Write command, Read command, and connection termination. Similar to Admin command handler, it consists of the controller, the Tx module, and the Rx module. The main function of IO command handler is to handle Write and Read commands so that data is transferred with the best performance. The processor is designed to support up to 256 Write and Read commands in the queue. However, the data packets from the target may be returned out of order, so a Read buffer is included to re-order them. The IP cannot receive a new command if the Read buffer is full. Similar to Admin command handler, a timer checks for the timeout condition while waiting for a received PDU. If timeout is detected, the controller terminates the Admin connection and the IO connection.
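The re-ordering role of the Read buffer can be sketched in software as follows. This is an illustration under assumptions, not the IP's design: each Read command is given a sequence number when it is issued, the returned data is tagged with that number, and the class name `ReadReorderBuffer` is hypothetical.

```python
class ReadReorderBuffer:
    """Releases Read data in request order even when it arrives out of order."""

    def __init__(self, slots: int):
        self.slots = slots        # number of Read commands the buffer can hold
        self._pending = {}        # sequence number -> data, or None while not yet received
        self._next_issue = 0
        self._next_release = 0

    def issue(self) -> int:
        """Reserves a slot for a new Read command and returns its sequence number."""
        if self._next_issue - self._next_release >= self.slots:
            raise BufferError("Read buffer is full")
        seq = self._next_issue
        self._pending[seq] = None
        self._next_issue += 1
        return seq

    def deliver(self, seq: int, data: bytes) -> list:
        """Stores data for one command and returns everything now releasable in order."""
        if seq not in self._pending:
            raise KeyError(seq)
        self._pending[seq] = data
        released = []
        while self._pending.get(self._next_release) is not None:
            released.append(self._pending.pop(self._next_release))
            self._next_release += 1
        return released
```

Data for a later command stays in the buffer until every earlier command has returned its data, which is why a small buffer limits the number of Read commands that can be outstanding.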

·       Read buffer

The buffer size is configured by the “RdBufDepth” parameter in HDL code. The valid value of RdBufDepth is 3-8. More details are shown in Table 2.

Table 2: RdBufDepth parameter description

RdBufDepth | Buffer size | Maximum Read command | Estimated Read Performance*(1)
3 | 32 Kbyte | 8 commands | 600 Mbyte/s
4 | 64 Kbyte | 16 commands | 600 – 800 Mbyte/s
5 | 128 Kbyte | 32 commands | 600 – 900 Mbyte/s
6 | 256 Kbyte | 64 commands | 800 – 900 Mbyte/s
7 | 512 Kbyte | 128 commands | 1000 Mbyte/s
8 | 1 Mbyte | 256 commands | 1200 Mbyte/s

Remark *(1): Estimated Read performance is the performance measured in our test environment. The real performance depends on the resources of the target system.

The Read buffer size limits the maximum number of Read commands that can be sent to the queue. The maximum value of RdBufDepth is 8, which gives a 1 Mbyte buffer for sending up to 256 Read commands to the IO command queue. The minimum value is 3, which gives a 32 Kbyte buffer for sending up to 8 Read commands. The user must trade off resource utilization against the maximum number of Read commands, which may affect the maximum read performance. A larger buffer may give better read performance. Write performance does not depend on the Read buffer size.
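The values in Table 2 follow a power-of-two pattern: each step of RdBufDepth doubles the number of 4 Kbyte Read commands that fit in the buffer. The short sketch below reproduces the first three columns of the table; the function name is hypothetical, and the formula is inferred from the table rather than quoted from the HDL code.

```python
CMD_BYTES = 4096  # data size per command, fixed at 4 Kbytes


def read_buffer_config(rd_buf_depth: int) -> tuple:
    """Returns (buffer size in bytes, maximum Read commands) for a RdBufDepth value."""
    if not 3 <= rd_buf_depth <= 8:
        raise ValueError("RdBufDepth must be in the range 3-8")
    max_read_cmds = 1 << rd_buf_depth
    return max_read_cmds * CMD_BYTES, max_read_cmds


for depth in range(3, 9):
    size, cmds = read_buffer_config(depth)
    print(f"RdBufDepth={depth}: {size // 1024} Kbyte, {cmds} commands")
```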

 

 

TCP/IP

The hardware implements the TCP and IP protocols to build and decode the packets exchanged with the Ethernet MAC controller. Two TCP ports are active at the same time, so EMAC IF is designed as a multiplexer that transfers the packets of the two sources to one EMAC via a 64-bit AXI4-Stream interface.

·       Admin TCP/IP Controller

Admin TCP/IP controller consists of three logic parts. The first is the main controller, which controls the process of each operation such as opening the port, sending data, receiving data, and closing the port. Some processes, such as sending data and receiving data, may run at the same time. The second is the Tx module, which encapsulates the PDU into an Ethernet packet by adding the TCP header, IP header, and Ethernet header. The third is the Rx module, which decodes the Ethernet packet and extracts only the TCP payload to return to Admin command handler.

TCP/IP is a reliable protocol, so the logic must support data retransmission to recover the data when data loss is detected. TCP/IP also provides flow control: the sender reads the free size of the receiver's buffer before sending more data.

The Admin port is used for connection initialization, connection termination, and keep alive transmission. Compared to the IO port, fewer packets are transmitted on the Admin port. To optimize resource utilization, the TCP buffer inside Admin TCP/IP controller is not large.

TCP/IP controller uses ARP packets to obtain the MAC address of the NVMe/TCP target from its IP address. Therefore, the IP, which is the NVMe/TCP host, must be in the same network as the NVMe/TCP target. Please ask our sales to customize the IP to support a target in a different network.

·       IO TCP/IP Controller

Compared to Admin TCP/IP controller, IO TCP/IP controller requires very high performance to transfer the data of Write and Read commands. Therefore, the TCP buffer in IO TCP/IP controller is large, so that Ethernet packets can be transferred with the NVMe/TCP target continuously under the flow control process. To achieve the best performance, the Ethernet packets transmitted by IO TCP/IP controller are jumbo frames. Therefore, the network devices and the NVMe/TCP target must support jumbo frame packets.

Note: Please contact our sales if the target system is installed in a different network or does not support jumbo frame packets.

·       EMAC IF

The packets transmitted by Admin TCP/IP controller and IO TCP/IP controller may be sent to the EMAC at the same time. Therefore, EMAC IF is designed to receive the packets from both the Admin and IO TCP/IP controllers and then forward them to the EMAC. The packets received from the EMAC are also forwarded to the Admin and IO TCP/IP controllers.

 

 

User Logic

User logic interfaces with NVMeTCP10G IP through the Memory mapped interface for Write and Read commands. The logic assigns the address and asserts the request to the IP to start the operation, similar to a standard memory mapped bus interface. The parameters of NVMeTCP10G IP can be assigned as constant values. The Control interface is used to monitor the status when an error condition is found. Without an error condition, the user needs only one signal, HostConnEn, to enable and disable the connection with the target. Therefore, the user logic for transferring data with the NVMe/TCP target system through NVMeTCP10G IP is simple to design.

 

10G/25G Ethernet MAC and PHY

Ethernet MAC implements the MAC layer for 10/25Gb Ethernet. NVMeTCP10G IP can connect directly with DG 10G25GEMAC IP, while additional logic with a small FIFO (TenGMacIF in Figure 2) must be included when connecting with the Xilinx 10G/25G Ethernet Subsystem IP core. The 10G Ethernet PCS/PMA for 10GBASE-R is a no-charge Xilinx LogiCORE.

More details of DG 10G25GEMAC IP core are described on the following website.

https://dgway.com/products/IP/GEMAC-IP/dg_10g25gemacip_data_sheet_xilinx.pdf

More details of 10G/25G Ethernet Subsystem (Ethernet MAC and Ethernet PCS/PMA) are described on the following website.

https://www.xilinx.com/products/intellectual-property/ef-di-25gemac.html

 

Core I/O Signals

Descriptions of all parameters and I/O signals are provided in Table 3 and Table 4.

 

Table 3: Core Parameters

Name | Value | Description
HostNQNH[1783:128] | Unicode char | Upper 207 bytes of NVMe Qualified Name (NQN) of the host system. Note: If HostNQN fits within 16 bytes, which is the size of the HostNQNL input, HostNQNH can be fixed to all zeros.
KeepAliveSet | 0 – 3600 | Period for sending the Keep alive command, in seconds. Set 0 to disable keep alive transmission.
RdBufDepth | 3 – 8 | Read buffer size setting. More details are shown in Table 2.

 

Table 4: Core I/O Signals

Signal | Dir | Description

Common Interface

RstB | In | Reset IP core. Active Low.
Clk | In | Clock input that is synchronous to EMAC interface. 156.25 MHz for 10Gb Ethernet.

Parameters Interface

Note: All inputs must be valid when HostConnEn=‘1’ and HostConnStatus=‘0’.

HostMAC[47:0] | In | MAC address of the host system.
HostIPAddr[31:0] | In | IP address of the host system.
HostAdmPort[15:0] | In | Admin port number of the host system.
HostIOPort[15:0] | In | IO port number of the host system.
TrgIPAddr[31:0] | In | IP address of the target system.
TCPTimeOutSet[31:0] | In | Timeout value before starting the retransmission process. Time unit is 1/(Clk frequency), or 6.4 ns for 156.25 MHz. The recommended value is 3 times the RTT (Round-trip time), or 1 sec when 3 times the RTT is less than 1 sec.
NVMeTimeOutSet[31:0] | In | Timeout value for waiting for the response from the target before asserting an error. Time unit is 1/(Clk frequency), or 6.4 ns for 156.25 MHz. When NVMeTimeOutSet is set to 0, the timeout is disabled. The recommended value is 4 times TCPTimeOutSet, to allow TCP/IP retransmission before the error is asserted.
HostNQNL[127:0] | In | Lower 16 bytes of NVMe Qualified Name (NQN) of the host system, defined as a string of Unicode characters. Total size of HostNQN is 223 bytes: 207 bytes by parameter assignment (HostNQNH) and 16 bytes by this input (HostNQNL). When the name is shorter than the maximum size, the remaining characters are set to 0x00, similar to TrgNQN. This value is applied when the target system sets permissions that allow only specific hosts to access the SSD.
TrgNQN[1783:0] | In | NVMe Qualified Name (NQN) of the SSD inside the target system, defined as a string of Unicode characters. Maximum size of TrgNQN is 223 bytes. When the name is shorter than the maximum size, the remaining characters are set to 00h. For instance, when NQN is “dgnvmettest”, set TrgNQN[7:0]=‘d’, TrgNQN[15:8]=‘g’, …, TrgNQN[87:80]=‘t’, and set the remaining characters (TrgNQN[1783:88]) to 0 (null). The host uses this value to select the active SSD at the target system. The NQN of the SSD is configured by the NVMe/TCP target system.
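The NQN byte order and the timeout time unit described above can be checked with a short script before the values are written into user logic. This is a sketch under assumptions: each NQN character is taken to occupy one byte (ASCII), the host name "host1" used in the usage note is made up, and the function names are hypothetical.

```python
NQN_BYTES = 223          # maximum size of HostNQN and TrgNQN
CLK_HZ = 156_250_000     # Clk frequency for 10Gb Ethernet (time unit 6.4 ns)


def pack_nqn(nqn: str) -> int:
    """Packs an NQN so that the first character sits on bits [7:0]; unused bytes are 0."""
    raw = nqn.encode("ascii")  # assumption: one byte per character
    if len(raw) > NQN_BYTES:
        raise ValueError("NQN is longer than 223 bytes")
    return int.from_bytes(raw, "little")


def split_host_nqn(value: int) -> tuple:
    """Splits a packed HostNQN into (HostNQNH[1783:128], HostNQNL[127:0])."""
    return value >> 128, value & ((1 << 128) - 1)


def seconds_to_cycles(seconds: float, clk_hz: int = CLK_HZ) -> int:
    """Converts a time in seconds to a 32-bit TCPTimeOutSet/NVMeTimeOutSet value."""
    cycles = round(seconds * clk_hz)
    if not 0 <= cycles < 1 << 32:
        raise ValueError("value does not fit in 32 bits")
    return cycles
```

For example, `split_host_nqn(pack_nqn("host1"))` gives an all-zero upper part because the name fits in 16 bytes, and `seconds_to_cycles(1.0)` gives 156250000 for a 1 sec TCPTimeOutSet. A 32-bit value at 6.4 ns per count covers at most about 27.4 seconds.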

 

Signal | Dir | Description

Control Interface

IPVersion[31:0] | Out | IP version number.
TestPin[127:0] | Out | Reserved to be IP test point.
HostConnEn | In | Enable/Disable the connection with the target. Change from ‘0’ to ‘1’ to create the connection with the target. Change from ‘1’ to ‘0’ to terminate the connection with the target. Before changing the signal, please confirm that the IP is Idle (HostBusy=‘0’). After that, HostBusy is asserted to ‘1’ to start connection establishment/termination. HostBusy is de-asserted to ‘0’ after finishing the operation.
HostConnStatus | Out | Connection status. ‘0’: No connection with the target, ‘1’: The connection is active.
HostBusy | Out | IP busy status. ‘0’: Idle, ‘1’: Busy. Asserted to ‘1’ when HostConnEn changes value to enable/disable the connection or when the user sends a Write/Read command. De-asserted to ‘0’ after the IP completes the operation.
HostError | Out | Error flag. Asserted to ‘1’ when HostErrorType is not equal to 0. The flag is cleared by asserting RstB to ‘0’.
HostErrorType[31:0] | Out | [0] – Error when the target is not found
 | | [1] – Error when the target requires authentication
 | | [2] – Error when some target parameters are not supported. Please see more details in TrgCAPStatus signal.
 | |   - IOCCSZ (I/O Queue Command Capsule Supported Size) is less than 260.
 | |   - MQES (Maximum Queue Entries Supported) is less than 256.
 | |   - MAXCMD (Maximum Outstanding Commands) is less than 256.
 | | [7:3] – Reserved
 | | [8] – Error when Admin port cannot be established successfully
 | | [9] – Error when Admin port does not receive the response until timeout
 | | [10] – Error when status register of Admin completion entry is not correct. Please see more details in TrgAdmStatus signal.
 | | [11] – Error when Admin received packet is not aligned to 8 bytes(1)
 | | [12] – Error when Admin port receives unknown packet
 | | [13] – Error when Admin port receives terminate request
 | | [15:14] – Reserved
 | | [16] – Error when IO port cannot be established successfully
 | | [17] – Error when IO port does not receive the response until timeout
 | | [18] – Error when status register of IO completion entry is not correct. Please see more details in TrgIOStatus signal.
 | | [19] – Error when IO received packet is not aligned to 8 bytes(1)
 | | [20] – Error when IO port receives unknown packet
 | | [21] – Error when IO port receives terminate request
 | | [31:22] – Reserved

Remark:

(1): Please contact our sales if your target system returns the packet that is not aligned to 8 bytes (64-bit).
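When HostErrorType is read back by software (for example through a CPU register in a test system, which is an assumption about the user design), the bit list above can be decoded as in the following sketch. The message strings are shortened from the table, and the function name is hypothetical.

```python
# Bit assignments taken from the HostErrorType description above.
HOST_ERROR_BITS = {
    0: "Target not found",
    1: "Target requires authentication",
    2: "Target parameters not supported (see TrgCAPStatus)",
    8: "Admin port cannot be established",
    9: "Admin port response timeout",
    10: "Admin completion entry status not correct (see TrgAdmStatus)",
    11: "Admin received packet not aligned to 8 bytes",
    12: "Admin port received unknown packet",
    13: "Admin port received terminate request",
    16: "IO port cannot be established",
    17: "IO port response timeout",
    18: "IO completion entry status not correct (see TrgIOStatus)",
    19: "IO received packet not aligned to 8 bytes",
    20: "IO port received unknown packet",
    21: "IO port received terminate request",
}


def decode_host_error(value: int) -> list:
    """Returns one message per asserted bit of a 32-bit HostErrorType value."""
    if not 0 <= value < 1 << 32:
        raise ValueError("HostErrorType is a 32-bit value")
    messages = []
    for bit in range(32):
        if (value >> bit) & 1:
            messages.append(HOST_ERROR_BITS.get(bit, f"Reserved bit {bit}"))
    return messages
```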

 

Signal | Dir | Description

Control Interface

TrgLBASize[47:0] | Out | Total capacity of SSD in 512-byte unit. Bit[2:0] are equal to 000 to align to 4 Kbyte unit because the IP transfers data in 4 Kbyte unit. Default value is 0. This signal is valid after finishing the connection establishment process.
TrgCAPStatus[63:0] | Out | [31:0] – IOCCSZ (I/O Queue Command Capsule Supported Size)
 | | [47:32] – MQES (Maximum Queue Entries Supported)
 | | [63:48] – MAXCMD (Maximum Outstanding Commands)
TrgAdmStatus[15:0] | Out | Status output from Admin command
 | | [0] – Reserved
 | | [15:1] – Status field value of Admin Completion Entry
TrgIOStatus[15:0] | Out | Status output from I/O command
 | | [0] – Reserved
 | | [15:1] – Status field value of IO Completion Entry

Memory mapped Interface

HostMMAddr[47:0] | In | The address in 512-byte unit for Write/Read command. Valid when HostMMWrite=‘1’ in Write command or HostMMRead=‘1’ in Read command. HostMMAddr[2:0] must always be set to “000” to align to 4 Kbyte. Maximum value of HostMMAddr is equal to TrgLBASize – 8 (4 Kbyte).
HostMMRead | In | Assert to ‘1’ for sending Read command. After the IP receives one Read command (HostMMRead=‘1’ and HostMMWtReq=‘0’), 4 Kbyte read data is returned on HostMMRdData with HostMMRdValid asserted to ‘1’. Note: HostMMRead must not be asserted to ‘1’ when Write command with 4 Kbyte data is in request.
HostMMWrite | In | Assert to ‘1’ for sending Write command with 4 Kbyte write data to the IP. Note: HostMMWrite must not be asserted to ‘1’ at the same time as HostMMRead=‘1’.
HostMMWtReq | Out | The response of Write/Read request. ‘0’: Accept Write/Read command request and Write data. ‘1’: Not ready to receive Write/Read command or Write data.
HostMMWrData[63:0] | In | Write data in Write command. Valid when HostMMWrite=‘1’. The value must be latched if HostMMWtReq=‘1’.
HostMMRdData[63:0] | Out | Read data returned after receiving Read command. Valid when HostMMRdValid=‘1’.
HostMMRdValid | Out | Valid signal of HostMMRdData. Asserted to ‘1’ to return Read data in Read command.
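The HostMMAddr rules above (512-byte unit, 4 Kbyte alignment, and an upper limit of TrgLBASize – 8) can be expressed as a small address helper. This is a sketch for user-side software or testbench code; the function name and the byte-offset input are assumptions.

```python
SECTOR_BYTES = 512   # unit of HostMMAddr and TrgLBASize
CMD_SECTORS = 8      # one 4 Kbyte command covers 8 units of 512 bytes


def host_mm_addr(byte_offset: int, trg_lba_size: int) -> int:
    """Converts a byte offset on the SSD to a HostMMAddr value, checking the limits."""
    if byte_offset < 0 or byte_offset % (SECTOR_BYTES * CMD_SECTORS) != 0:
        raise ValueError("byte offset must be a non-negative multiple of 4 Kbytes")
    addr = byte_offset // SECTOR_BYTES
    if addr > trg_lba_size - CMD_SECTORS:
        raise ValueError("address is beyond the last 4 Kbyte block of the SSD")
    return addr  # bits [2:0] are always 000
```

With TrgLBASize = 16 (an artificially small capacity of two 4 Kbyte blocks), byte offsets 0 and 4096 map to addresses 0 and 8, and offset 8192 is rejected.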

 

Signal | Dir | Description

MAC Interface

EthLinkup | In | Ethernet Linkup status. Asserted to ‘1’ when the Ethernet link is established.
MacTxData[63:0] | Out | Transmitted data to EMAC. Valid when MacTxValid=‘1’.
MacTxKeep[7:0] | Out | Byte enable of MacTxData. Valid when MacTxValid=‘1’. Bit[0],[1],…,[7]=‘1’ when MacTxData[7:0],[15:8],…,[63:56] are valid.
MacTxValid | Out | Valid signal of MacTxData.
MacTxLast | Out | Asserted to ‘1’ to indicate the final data of the packet. Valid when MacTxValid=‘1’.
MacTxReady | In | Handshaking signal. Asserted to ‘1’ when MacTxData has been accepted. This signal must not be de-asserted to ‘0’ before transmitting the final data of a packet.
MacRxData | In | Received data. Valid when MacRxValid=‘1’.
MacRxValid | In | Valid signal of MacRxData. After transferring the first data of a packet, the signal must be asserted to ‘1’ until transferring the final data of the packet, to transfer one packet continuously.
MacRxLast | In | Asserted to ‘1’ to indicate the final data of the packet. Valid when MacRxValid=‘1’.
MacRxUser | In | Control signal asserted at the end of received frame (MacRxValid=‘1’ and MacRxLast=‘1’) to indicate that the frame has CRC error. ‘1’: Normal packet, ‘0’: Error packet.
MacRxReady | Out | Handshaking signal. Asserted to ‘1’ when MacRxData has been accepted. MacRxReady is de-asserted to ‘0’ for 2 clock cycles after receiving the final data of the frame.
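The relationship between MacTxData, MacTxKeep, and MacTxLast can be illustrated by splitting a frame into 64-bit words, as a testbench might do when checking the Tx output. This is a sketch under one assumption that this data sheet does not state explicitly: the first byte of the frame is placed on MacTxData[7:0], following the usual AXI4-Stream byte order. The function name is hypothetical.

```python
def to_axis_beats(frame: bytes) -> list:
    """Splits a frame into (data, keep, last) tuples for a 64-bit stream interface."""
    beats = []
    for offset in range(0, len(frame), 8):
        chunk = frame[offset:offset + 8]
        data = int.from_bytes(chunk, "little")  # assumption: first byte on bits [7:0]
        keep = (1 << len(chunk)) - 1            # one Keep bit per valid byte lane
        last = offset + 8 >= len(frame)         # Last marks the final data of the packet
        beats.append((data, keep, last))
    return beats
```

A 10-byte frame produces two words: the first with Keep=FFh and Last=‘0’, the second with Keep=03h and Last=‘1’.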

 

 

Timing Diagram

 

Reset process

 

 

Figure 4: Timing diagram of Reset process

 

1)     Clk signal must be stable before de-asserting RstB to ‘1’. After RstB is de-asserted, the IP starts the reset process.

2)     The IP confirms that the Ethernet link is established by checking EthLinkup. The IP waits until EthLinkup is asserted to ‘1’.

3)     HostBusy is asserted to ‘1’ during the reset process. After the IP reset process is finished, HostBusy is de-asserted to ‘0’ and the IP is ready to connect with the target system.

4)     The user can assert HostConnEn to ‘1’ to start the connection establishment process with the target. It is recommended to keep HostConnEn de-asserted to ‘0’ during the reset operation.

 

Connection Establishment

 

Figure 5: Timing diagram of Connection establishment

 

1)     Before asserting HostConnEn to ‘1’, HostConnStatus and HostBusy must be equal to ‘0’ to show that the IP is Idle and the connection is not active. All parameters (HostMAC, HostIPAddr, HostAdmPort, HostIOPort, TrgIPAddr, TCPTimeOutSet, NVMeTimeOutSet, HostNQNL, and TrgNQN) must be valid when asserting HostConnEn to ‘1’. Also, all parameters must hold their values until the connection establishment process is finished.

2)     HostBusy is asserted to ‘1’ while the connection establishment is not complete.

3)     After the process is finished, HostConnStatus changes to ‘1’ and TrgLBASize is valid for reading. HostBusy is de-asserted to ‘0’. The connection is now ready for sending Write/Read commands.

 

 

Figure 6: Error during running Connection establishment

 

When the connection establishment fails, the error signal is asserted, as shown in Figure 6. The error may be caused by a wrong parameter setting. After the error is asserted, the user needs to assert RstB to reset the IP.

1)     HostError is asserted to ‘1’ and HostErrorType shows the error details of the connection establishment process. Unlike the successful condition, HostConnStatus is not asserted to ‘1’ and HostBusy is not de-asserted to ‘0’ in the failure condition.

2)     The IP needs to be reset by asserting RstB to ‘0’ for error recovery.

3)     After reset is asserted, HostError is de-asserted to ‘0’ and the error status is cleared.

 

 

Write command

 

 

Figure 7: Timing diagram of Write command (Single mode)

 

The data size of one Write command is fixed at 4 Kbytes (512 x 64-bit). A Write command is requested in the same clock cycle as the first data (D0) is sent on HostMMWrData. The IP asserts HostMMWtReq to ‘1’ when it is not ready to receive the command or the data, so the user must hold the inputs until the IP is ready. After the Write command finishes, HostBusy is de-asserted to ‘0’.

1)     The user asserts HostMMWrite to ‘1’ to send a Write request. At the same time, the start write address and the first write data must be valid on HostMMAddr and HostMMWrData, respectively. If HostMMWtReq is ‘0’, the next write data (D1) is sent in the next clock cycle. Otherwise, the first data must hold its value until HostMMWtReq is de-asserted to ‘0’.

Note: HostMMAddr must be aligned to a 4-Kbyte boundary by setting bit[2:0] to 000b.

2)     After the IP accepts the Write request, HostMMWtReq is asserted to ‘1’ for pre-processing of the Write command. Therefore, the second data (D1) must hold its value until HostMMWtReq is de-asserted to ‘0’. HostBusy is also asserted to ‘1’ to prevent the user from terminating the connection before the Write command is finished.

Note: For pre-processing, HostMMWtReq is asserted to ‘1’ for at least 9 cycles after the Write command request is received.

3)     During data transfer, the user may de-assert HostMMWrite to ‘0’ when the next write data is not ready, and re-assert HostMMWrite to ‘1’ with valid write data when the data is ready.

4)     When the IP is not ready to receive write data, HostMMWtReq is asserted to ‘1’ to pause data transmission. HostMMWrData must hold its value.

5)     When the final data (D511) is received, the IP asserts HostMMWtReq to ‘1’ for post-processing of the Write command.

Note: For post-processing, HostMMWtReq is asserted to ‘1’ for at least 1 cycle.

6)     The IP de-asserts HostBusy to ‘0’ when all commands are completely processed. After that, the user can terminate the connection if there is no more data to transfer.
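The hold-until-ready rule of steps 1) to 5) can be illustrated with a cycle-level software model of the user side. This is a minimal sketch under simplifying assumptions: the user always has data available (HostMMWrite stays at ‘1’), and the HostMMWtReq behavior is supplied as a recorded trace. The function name and its arguments are hypothetical.

```python
def drive_write(words, wtreq_trace):
    """Cycle-level model of the user side of one Write command.

    words       : the data words (D0..D511) to send
    wtreq_trace : HostMMWtReq value sampled in each clock cycle (0 or 1)
    Returns (accepted_words, cycles_used).
    """
    accepted = []
    index = 0                      # word currently held on HostMMWrData
    cycles = 0
    for wtreq in wtreq_trace:
        if index == len(words):
            break                  # all data has been accepted
        cycles += 1
        # HostMMWrite = '1' and HostMMWrData = words[index] in this cycle.
        if wtreq == 0:
            accepted.append(words[index])
            index += 1             # present the next word in the next cycle
        # Otherwise hold the same word until HostMMWtReq returns to '0'.
    return accepted, cycles
```

For example, a trace of one ready cycle, nine wait cycles, and 511 ready cycles models the minimum pre-processing wait after D0: all 512 words are accepted in order and D1 is held for the nine wait cycles.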

 

 

Figure 8: Timing diagram of Write command (Multiple mode)

 

Figure 8 shows an example in which multiple Write commands are transferred to the IP. The user asserts HostMMWrite to ‘1’ to send the second command in the clock cycle after the first command is completed. However, the second command must be held for at least one cycle because HostMMWtReq is asserted to ‘1’ for post-processing of the first command. In total, HostMMWtReq is asserted to ‘1’ for at least 10 clock cycles per Write command (9 cycles for pre-processing and 1 cycle for post-processing). After all Write commands finish, HostBusy is de-asserted to ‘0’.
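The per-command overhead gives a simple upper bound on the Write rate of the user interface. The sketch below only does this arithmetic. It assumes the minimum wait counts, ignores any additional wait cycles and network limits, and takes the user clock frequency as a parameter because that value is not stated in this section.

```python
WORDS_PER_CMD = 512      # 4 Kbytes of 64-bit data per command
BYTES_PER_WORD = 8
PRE_CYCLES = 9           # minimum pre-processing wait
POST_CYCLES = 1          # minimum post-processing wait


def min_cycles_per_write():
    """Minimum clock cycles occupied by one Write command."""
    return WORDS_PER_CMD + PRE_CYCLES + POST_CYCLES


def write_bus_limit_mbyte_s(clk_hz):
    """Upper bound of user-interface Write bandwidth in Mbyte/s (10^6 bytes)."""
    bytes_per_cmd = WORDS_PER_CMD * BYTES_PER_WORD
    return bytes_per_cmd * clk_hz / min_cycles_per_write() / 1e6
```

With a hypothetical 156.25 MHz user clock, `write_bus_limit_mbyte_s(156.25e6)` evaluates to about 1226 Mbyte/s, since each 4096-byte command occupies at least 522 cycles.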

 

 

Read command

 

 

Figure 9: Timing diagram of Read command (Single mode)

 

Similar to the Write command, the data size of one Read command is fixed at 4 Kbytes (512 x 64-bit). After a Read command is requested, 4 Kbytes of data are returned continuously on HostMMRdData. HostMMWtReq is asserted to ‘1’ when the IP is not ready to receive the next command. When all Read commands are completely processed, HostBusy is de-asserted to ‘0’.

1)     The user asserts HostMMRead to ‘1’ to send a Read request. The start read address must also be valid on HostMMAddr.

Note: HostMMAddr must be aligned to a 4-Kbyte boundary by setting bit[2:0] to 000b.

2)     Similar to the Write command, after the IP accepts the Read request, HostMMWtReq is asserted to ‘1’ for pre-processing of the Read command. HostBusy is also asserted to ‘1’ to prevent the user from terminating the connection.

Note: For pre-processing, HostMMWtReq is asserted to ‘1’ for at least 9 cycles.

3)     The 4 Kbytes of Read data are returned on HostMMRdData with HostMMRdValid asserted to ‘1’ for 512 consecutive cycles.

4)     The IP de-asserts HostBusy to ‘0’ when all commands are completely processed. After that, the user can terminate the connection if there is no more data to transfer.
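On the user side, the returned data can be captured by sampling HostMMRdData whenever HostMMRdValid is ‘1’. The following sketch groups the captured words into 4-Kbyte blocks, one per Read command. It is a software model of a recorded trace, not RTL, and the function name and arguments are assumptions.

```python
def collect_read_data(rdvalid_trace, rddata_trace, words_per_cmd=512):
    """Group HostMMRdData words into per-command 4 Kbyte blocks.

    A word is captured in every cycle where HostMMRdValid is '1'.
    Returns a list of complete blocks in the order they were received.
    """
    blocks, current = [], []
    for valid, data in zip(rdvalid_trace, rddata_trace):
        if valid:
            current.append(data)
            if len(current) == words_per_cmd:
                blocks.append(current)
                current = []
    return blocks
```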

 

 

Figure 10: Timing diagram of Read command (Multiple mode)

 

Figure 10 shows an example in which multiple Read commands are transferred to the IP. The user asserts HostMMRead to ‘1’ to send the second command in the clock cycle after the first command is accepted. However, the second command and its address must be held for at least 9 cycles, the pre-processing time of the IP. The command bus and the data bus operate in parallel, so a new Read command can be requested without waiting for the returned data on HostMMRdData.

When the target system returns the data of Read commands, the IP core rearranges the data into the same order as the command requests. The 4 Kbytes of data of the first Read command are returned before the 4 Kbytes of data of the second Read command. The gap between consecutive 4-Kbyte data blocks is at least 2 clock cycles. After all received data is returned and all Read commands are completely processed, HostBusy is de-asserted to ‘0’.
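The in-order return of Read data can be pictured as a reorder buffer. The sketch below is one possible software model of that behavior and does not describe how the Read buffer inside the IP is actually implemented. The command identifiers and the function name are hypothetical.

```python
def reorder_read_data(issue_order, arrivals):
    """Yield data blocks in command-request order.

    issue_order : command identifiers in the order the user requested them
    arrivals    : iterable of (command_id, block) in network arrival order
    """
    pending = {}                   # blocks that arrived ahead of their turn
    next_idx = 0                   # position in issue_order to release next
    for cmd_id, block in arrivals:
        pending[cmd_id] = block
        # Release every block that is now contiguous with the output order.
        while next_idx < len(issue_order) and issue_order[next_idx] in pending:
            yield pending.pop(issue_order[next_idx])
            next_idx += 1
```

A block that arrives early waits in `pending` until all earlier commands have been released, which is why the buffer size bounds the number of outstanding Read commands in a model like this.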

 

 

Mixed Write and Read command

 

 

Figure 11: Timing diagram of Mixed Write and Read command

 

Figure 11 shows an example in which the user requests mixed Write and Read commands. HostMMAddr and HostMMWtReq are shared by Write and Read commands, so the user must not assert HostMMWrite and HostMMRead to ‘1’ at the same time. The order of command requests in Figure 11 is Read @ A0, Write @ A1, and Read @ A2. After the IP receives a Write/Read command, HostMMWtReq is asserted to ‘1’ for at least 9 clock cycles for pre-processing. It is also asserted to ‘1’ for at least one cycle for post-processing of a Write command.

Similar to Read commands in Multiple mode, the order of received data on HostMMRdData is the same as the order of the command requests. As shown in Figure 11, the 4 Kbytes of data of the first command are returned before the 4 Kbytes of data of the third command.

After finishing all IO command operations, HostBusy is de-asserted to ‘0’.
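The request rules for mixed commands can be written as a small checker. This is an illustrative sketch for a testbench. The function names, the message strings, and the ("R"/"W", address) tuple format are assumptions.

```python
def check_request(write, read, addr):
    """Validate one cycle of user request signals.

    write, read : HostMMWrite / HostMMRead values (0 or 1)
    addr        : HostMMAddr value presented with the request
    Returns a list of rule violations (empty when the request is legal).
    """
    problems = []
    if write and read:
        problems.append("HostMMWrite and HostMMRead asserted together")
    if (write or read) and (addr & 0x7) != 0:
        problems.append("HostMMAddr bit[2:0] is not 000b")
    return problems


def reads_in_order(commands):
    """Return the addresses whose data appears on HostMMRdData, in order."""
    return [addr for kind, addr in commands if kind == "R"]
```

For the sequence in Figure 11, `reads_in_order([("R", A0), ("W", A1), ("R", A2)])` returns the addresses A0 and A2 in that order, matching the order of the returned data.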

 

 

Connection Termination

 

 

Figure 12: Shutdown command timing diagram

 

1)     Before de-asserting HostConnEn to ‘0’, HostBusy must be ‘0’ to confirm that no incomplete command remains. After that, the user de-asserts HostConnEn to ‘0’ to request connection termination.

2)     HostBusy is asserted to ‘1’ when the IP starts the connection termination process.

3)     After the connection is terminated successfully, HostConnStatus and HostBusy are de-asserted to ‘0’. HostConnStatus is de-asserted to ‘0’ before HostBusy is de-asserted.
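The termination sequence can be checked against a recorded signal trace. The sketch below is a hypothetical testbench helper: it takes (HostConnStatus, HostBusy) samples captured after HostConnEn is de-asserted and confirms that HostConnStatus has fallen by the time HostBusy falls.

```python
def termination_done(trace):
    """Check (HostConnStatus, HostBusy) samples taken after HostConnEn
    is de-asserted.

    Returns the sample index at which termination is complete, or None
    if it never completes or if HostBusy falls while HostConnStatus is
    still '1'.
    """
    started = False
    for i, (status, busy) in enumerate(trace):
        if busy:
            started = True          # step 2: termination in progress
        elif started:
            # HostBusy has fallen: HostConnStatus must already be '0'.
            return i if status == 0 else None
    return None
```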

 

 

Error

 

 

Figure 13: Timing diagram of HostError and HostErrorType

 

When an operation cannot complete successfully, HostError is asserted to ‘1’ and the error type is latched in HostErrorType. In addition, status signals related to HostErrorType, such as TrgCAPStatus, TrgAdmStatus, and TrgIOStatus, give more details of the error type. When an error is detected, the IP must be recovered by asserting RstB to restart the IP.

1)     When an error is detected in the IP core, HostErrorType is not equal to 0. After that, HostError is asserted to ‘1’ to show the error condition to the user. The user reads HostErrorType to check the type of the error. Some error types show more details in the status signals.

HostErrorType[2]=‘1’     : Read TrgCAPStatus to check unsupported Target system

HostErrorType[10]=‘1’    : Read TrgAdmStatus to check Status register of Admin command

HostErrorType[18]=‘1’    : Read TrgIOStatus to check Status register of IO command

2)     After the user checks the error and solves the problem in the system, the user restarts the IP by asserting RstB to ‘0’. All logic in the IP is held in reset.

3)     HostError and HostErrorType are cleared, so no error status remains.
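The mapping from HostErrorType bits to the related status signals in step 1) can be expressed as a lookup. This sketch covers only the three bits listed above; the other bits of HostErrorType are not decoded here, and the function name is an assumption.

```python
# Bit position in HostErrorType -> status signal that holds more detail.
DETAIL_STATUS = {
    2: "TrgCAPStatus",    # unsupported Target system
    10: "TrgAdmStatus",   # Status register of Admin command
    18: "TrgIOStatus",    # Status register of IO command
}


def detail_signals(host_error_type):
    """List the status signals worth reading for a HostErrorType value."""
    return [name for bit, name in DETAIL_STATUS.items()
            if (host_error_type >> bit) & 1]
```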

 

EMAC Interface

The EMAC interface of the NVMeTCP10G IP is a 64-bit AXI4-Stream interface. The limitation is that the IP cannot pause data transmission before the final data of a packet is transmitted. Therefore, MacTxReady must always be asserted to ‘1’ between the first data and the final data of a packet, as shown in Figure 14.

Because of this limitation, the NVMeTCP10G IP can connect directly to the DG 10G25GEMAC IP core, while special logic with a small FIFO must be added to interface with the Xilinx 10G/25G Ethernet Subsystem.

 

 

Figure 14: Transmit EMAC interface timing diagram

 

(1)   The IP asserts MacTxValid to ‘1’ and sends the first data (D0) on MacTxData. All 64 bits are valid in every data word except the final one, in which only some bytes may be valid. Therefore, MacTxKeep is equal to FFh for every data word except the final one. These signals hold their values until MacTxReady is asserted to ‘1’.

(2)   The EMAC asserts MacTxReady to ‘1’ to accept the first data. After that, MacTxReady remains asserted to ‘1’ to accept all remaining data until the end of the packet, so the data of one packet is transferred continuously.

(3)   The IP asserts MacTxValid and MacTxLast to ‘1’ when the final data (Dn-1) of the packet is sent on MacTxData. MacTxKeep may not be equal to FFh when some upper bytes are not valid.

(4)   After the packet has been transferred, MacTxReady may be de-asserted to ‘0’ to pause data transmission of the next packet.
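Two aspects of the transmit interface can be sketched in software: the MacTxKeep value of the final word and the requirement that MacTxReady stays at ‘1’ inside a packet. The code below is an illustrative model only. It assumes that valid bytes occupy the lower byte lanes, consistent with step (3), and the function names are hypothetical.

```python
def last_word_keep(packet_len_bytes):
    """MacTxKeep value of the final 64-bit word of a packet.

    Assumes valid bytes occupy the lower byte lanes, so unused
    upper lanes have their keep bits cleared.
    """
    valid = packet_len_bytes % 8 or 8   # 1..8 valid bytes in the last word
    return (1 << valid) - 1


def ready_held_during_packet(trace):
    """Check (MacTxValid, MacTxReady, MacTxLast) samples, one per cycle.

    Returns False if MacTxReady drops after the first word of a packet
    has been accepted and before the final word is accepted.
    """
    in_packet = False
    for valid, ready, last in trace:
        if in_packet and not ready:
            return False
        if valid and ready:
            in_packet = not last
    return True
```

A checker like `ready_held_during_packet` could be run on a simulation trace to decide whether a given EMAC needs the small FIFO adapter mentioned above.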

Similar to the Transmit EMAC interface, the data of one packet must be received continuously on the Receive EMAC interface. MacRxValid must be asserted to ‘1’ from the start of the packet to the end of the packet, as shown in Figure 15.

 

Figure 15: Receive EMAC interface timing diagram

 

(1)   The IP detects the first data (D0) of a packet when MacRxValid changes from ‘0’ to ‘1’. The first data is valid on MacRxData. After that, MacRxValid and MacRxReady must be equal to ‘1’ until the end of the packet so that the remaining data of the packet is transferred continuously.

(2)   The final data (Dn-1) on MacRxData is received when MacRxValid and MacRxLast are asserted to ‘1’. At the same time, MacRxUser is valid to read. When the packet has no error, MacRxUser is equal to ‘1’. Otherwise, the packet is rejected by the IP.

(3)   After receiving the end of the packet, the IP de-asserts MacRxReady for 2 cycles to complete packet post-processing. Therefore, the EMAC must support pausing packet data transmission for 2 clock cycles.

Note: Typically, a gap of at least two cycles follows each Ethernet frame, during which the control word is transferred.
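The two receive-side requirements, no gap inside a packet and a pause of 2 cycles after each packet, can be checked on a recorded trace. The following sketch is a hypothetical testbench helper operating on (MacRxValid, MacRxLast) samples. It does not model MacRxData, MacRxUser, or MacRxReady.

```python
def rx_packet_contiguous(trace):
    """Check (MacRxValid, MacRxLast) samples, one per clock cycle.

    Returns False if MacRxValid drops between the first and the
    final word of a packet; True otherwise.
    """
    in_packet = False
    for valid, last in trace:
        if in_packet and not valid:
            return False            # gap inside a packet
        if valid:
            in_packet = not last
    return True


def min_inter_packet_gap(trace):
    """Smallest number of idle cycles between two packets, or None
    when the trace holds fewer than two packets."""
    gaps = []
    idle = None                     # None until a packet has ended
    for valid, last in trace:
        if valid:
            if idle is not None:
                gaps.append(idle)   # a new packet starts after `idle` cycles
                idle = None
            if last:
                idle = 0            # start counting after end of packet
        elif idle is not None:
            idle += 1
    return min(gaps) if gaps else None
```

If `min_inter_packet_gap` reports fewer than 2 idle cycles for a particular EMAC, that EMAC would have to buffer or pause data while MacRxReady is de-asserted.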

 

Verification Methods

The NVMeTCP10G IP Core functionality was verified by simulation and also proven on real board designs using the KCU105, ZCU102, and ZCU106 boards.

 

Recommended Design Experience

Experienced design engineers with knowledge of the Vivado tools should easily integrate this IP into their design.

 

Ordering Information

This product is available directly from Design Gateway Co., Ltd. Please contact Design Gateway Co., Ltd. for pricing and additional information about this product using the contact information on the front page of this datasheet.

 

Revision History

Revision    Date           Description

1.0         4-Nov-2021     Initial Release

1.1         24-Mar-2022    Correct IOCCSZ