LL10GEMAC IP Core Datasheet

Features

Applications

Reference design

General Description

Functional Description

Transmit Block

·       Tx Controller

·       CRC32 Cal

·       64B/66B Encoder

·       Scramble

·       Tx Gearbox

Receive Block

·       Rx Gearbox

·       Rx Controller and Synchronization

·       Descramble

·       64B/66B Decoder

·       CRC32 Cal

10GbE PMA (10GBASE-R)

Core I/O Signals

Timing Diagram

IP Initialization

Transmit interface

Receive Interface

Verification Methods

Recommended Design Experience

Ordering Information

Revision History

 

 

Core Facts

Provided with Core

Documentation: Reference Design Manual, Demo Instruction Manual

Design File Formats: Encrypted file

Instantiation Templates: VHDL

Reference Designs & Application Notes: Vivado Project, see Reference Design Manual

Additional Items: Demo on ZCU102, Alveo U50 and U250 cards

Support

Support provided by Design Gateway Co., Ltd.

Design Gateway Co., Ltd.

E-mail:    ip-sales@design-gateway.com

URL:       design-gateway.com

 

Features

·     Support for 10G Ethernet MAC and PCS

·     Direct connection with 32-bit PMA using AMD Xilinx IP wizard.

·     Low latency solution with a round-trip latency of 65.1 ns (18.6 ns for Tx path, 21.7 ns for Rx path, and 24.8 ns for PMA latency)

·     AXI4-Stream interface for easy integration with user logic

·     Minimal resource consumption

·     Minimum Tx packet size of 5 bytes

·     FCS (CRC-32) insertion and checking to ensure data integrity

·     64B/66B Encoding and Decoding in compliance with the IEEE802.3ae specification

·     Support for the 10GBASE-R standard

·     Zero padding appended on the Tx interface, while zero padding is not removed on the Rx interface

·     Separate clock domains for transmit and receive interfaces operating at 322.265625 MHz

·     Reference design available, including

-        A Loopback demo on the AMD Xilinx development board (ZCU102)

-        An Accelerated Algorithmic Trading (AAT) demo on the Alveo accelerator cards (U50 and U250)

 

Table 1: Example Implementation Statistics

| Family | Example Device | Fmax (MHz) | CLB Regs | CLB LUTs | CLB | IOB | BRAMTile | Design Tools |
|---|---|---|---|---|---|---|---|---|
| Kintex-UltraScale | XCKU040FFVA1156-2E | 322.266 | 1093 | 1366 | 263 | - | - | Vivado 2019.1 |
| Zynq-UltraScale+ | XCZU7EV-FFVC1156-2E | 322.266 | 1093 | 1361 | 269 | - | - | Vivado 2019.1 |
| Virtex-UltraScale+ | XCVU9P-FLGA2104-2L | 322.266 | 1093 | 1362 | 263 | - | - | Vivado 2019.1 |

Note: Actual logic resource usage depends on the percentage of unrelated logic in the design.

 

Applications

Low-latency network access has become crucial for real-time applications such as High-Frequency Trading (HFT), data centers, and real-time control systems in industries like automotive and industrial automation. The Low-Latency 10 Gigabit Ethernet MAC (LL10GEMAC) IP Core offers an efficient, high-performance solution tailored for low-latency 10G Ethernet networking.

 

 

Figure 1: LL10GEMAC Application

 

In High-Frequency Trading (HFT), where executing trades in microseconds is critical, the LL10GEMAC IP core ensures fast data transmission with minimal response times. By integrating both the Ethernet MAC and Physical Coding Sublayer (PCS) into the core, it enables high-speed, low-latency communication. The core’s user interface, a 32-bit AXI4-Stream interface, simplifies the development of TCP and UDP network engines through High-Level Synthesis (HLS), significantly reducing development cycles. This allows developers to build optimized, high-performance trading systems that meet HFT requirements.

 

Reference design

The LL10GEMAC IP provides two reference designs: the Loopback demo on the AMD Xilinx development board and the Accelerated Algorithmic Trading (AAT) demo on Alveo accelerator cards.

 

 

Figure 2: LL10GEMAC Latency on Loopback Demo

 

The Loopback demo is designed to test and validate the functionality of the LL10GEMAC IP core by generating small packets and verifying the return packets through loopback logic. This demo provides a measurement of round-trip latency, allowing users to evaluate the performance of the LL10GEMAC in real-time conditions. The round-trip latency, measured from the transmission (Tx) path to the reception (Rx) path via the LL10GEMAC’s AXI4-Stream interface, is 65.1 ns, equivalent to 21 clock cycles at a frequency of 322.265625 MHz, as illustrated in Figure 2.
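The relationship between the quoted nanosecond figures and clock cycles can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and the per-path values are the ones quoted in the Features list.

```python
CLK_HZ = 322_265_625  # TxClk/RxClk frequency stated in this datasheet

def cycles_to_ns(cycles: int) -> float:
    """Convert a number of clock cycles to nanoseconds."""
    return cycles * 1e9 / CLK_HZ

def ns_to_cycles(ns: float) -> int:
    """Round a latency in nanoseconds to the nearest whole clock cycle."""
    return round(ns * CLK_HZ / 1e9)

# Latency contributions quoted in the Features list, in nanoseconds.
budget_ns = {"tx_path": 18.6, "rx_path": 21.7, "pma": 24.8}

# Express each contribution in clock cycles and add them up.
budget_cycles = {name: ns_to_cycles(ns) for name, ns in budget_ns.items()}
total_cycles = sum(budget_cycles.values())

print(budget_cycles, total_cycles, round(cycles_to_ns(total_cycles), 2))
```

Under these figures the three contributions round to 6, 7, and 8 cycles, which sum to the 21 cycles stated above.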

 

Another reference design for the LL10GEMAC IP is the Accelerated Algorithmic Trading (AAT) demo, which is a modification of the AMD Xilinx AAT demo. In this design, the 10G/25G Ethernet Subsystem from AMD Xilinx is replaced with the LL10GEMAC IP core. Other modules, such as the TCP/IP engine, UDP/IP engine, market data processing, algorithm logic for generating trading orders, and the order generator, are developed using High-Level Synthesis (HLS) for simplified maintenance. The AAT reference design is available for the Alveo accelerator cards.

 

 

Figure 3: LL10GEMAC IP Core in AAT Demo

 

In comparison to the 10G/25G Ethernet Subsystem, the LL10GEMAC IP offers lower latency and reduced resource consumption. For more details, refer to the AAT demo reference design document that utilizes the LL10GEMAC IP core.

 

During the execution of the loopback demo, the round-trip latency includes both the latency of the LL10GEMAC IP core and the AMD Xilinx 10G PMA. Detailed latency values for the PMA can be found in AR#68177 at this link: https://adaptivesupport.amd.com/s/article/68177. In the loopback demo, the UltraScale+ GTH Transceiver block is configured using a low-latency strategy, as shown in Figure 4.

 

 

Figure 4: AMD Xilinx PMA for 10G Ethernet by Raw Data (Ref-UG576 UltraScale GTH Transceiver)

 

The latency value of the PMA logic can be calculated as follows.

Tx Latency

1)     Tx Fabric Interface                    = 32 UI

2)     To TX PCS/PMA boundary        = 32 UI

3)     To Serializer                              = 64 UI

4)     PMA                                         = 15 UI

The total maximum Tx Latency is 143 UI.

Rx Latency

1)     PMA                                         = 60.5 UI

2)     PMA to PCS                              = 16 UI

3)     Rx Fabric Interface                    = 32 UI

The total maximum Rx Latency is 108.5 UI.

Therefore, the total maximum PMA latency during the loopback demo is 143 UI (Tx) + 108.5 UI (Rx) = 251.5 UI. In the loopback demo using the ZCU102 (UltraScale+ GTH transceiver) with a clock frequency of 322.265625 MHz, one clock cycle spans 32 UI, so this total PMA latency corresponds to approximately 8 clock cycles or 24.8 ns.

Note: The transfer speed of 10G Ethernet is 10.3125 Gbps, meaning that 1 UI (Unit Interval) is equivalent to 0.097 ns (1/10.3125G).
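The PMA latency sum can be reproduced as follows. This is a sketch of the arithmetic only, using the UI values listed above; the variable names are hypothetical, and rounding up to a whole clock cycle is an assumption about how the 8-cycle figure was obtained.

```python
import math

LINE_RATE_BPS = 10.3125e9   # 10GBASE-R serial line rate
BUS_WIDTH_BITS = 32         # PMA parallel interface width, so 32 UI per clock

tx_ui = [32, 32, 64, 15]    # Tx fabric interface, to PCS/PMA boundary, to serializer, PMA
rx_ui = [60.5, 16, 32]      # PMA, PMA to PCS, Rx fabric interface

ui_ns = 1e9 / LINE_RATE_BPS             # duration of one unit interval in ns
total_ui = sum(tx_ui) + sum(rx_ui)      # combined Tx + Rx PMA latency in UI
total_ns = total_ui * ui_ns             # same latency in nanoseconds
total_cycles = total_ui / BUS_WIDTH_BITS

print(f"{total_ui} UI = {total_ns:.2f} ns = {total_cycles:.2f} cycles "
      f"(rounded up: {math.ceil(total_cycles)})")
```

The raw result is slightly under 8 cycles; the 24.8 ns figure quoted above corresponds to 8 whole cycles at 322.265625 MHz.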

 

General Description

 

 

Figure 5: LL10GEMAC Block Diagram

 

The LL10GEMAC IP core integrates both the MAC (Media Access Control) layer and the PCS (Physical Coding Sublayer) to provide a complete 10G Ethernet solution. Upon power-up, the Rx controller and synchronization mechanism initialize to calibrate the receive interface with the PMA (the transceiver), ensuring data block locking. The received data from the PMA is continuously monitored to determine the link status. Once a stable connection is established, the LL10GEMAC IP core asserts the ready signal to the user interface, enabling packet transmission to begin.

During packet transmission, the LL10GEMAC IP core automatically prepends a preamble and Start Frame Delimiter (SFD) to create the packet header. At the end of the packet, zero padding (for small packets), the Frame Check Sequence (FCS), and the Interframe Gap (IFG) are added to form the packet footer. The data is then processed by a 64B/66B encoder and scrambled before being sent to the Tx Gearbox module, which transmits 32-bit data to the PMA.

On the receiving end, the LL10GEMAC IP core first re-aligns the incoming packet using the Rx Gearbox. The data is then descrambled and decoded using the Descramble and 64B/66B decoder blocks, respectively. After decoding, the FCS of the packet is verified and removed along with the preamble and SFD, leaving only the Ethernet data, which is forwarded to the AXI4-Stream interface (AXI4-ST I/F). If the FCS is incorrect, the LL10GEMAC IP core asserts an error signal to the AXI4-ST I/F.

During packet transmission, it is essential that the user ensures the packet data remains available until the end of the packet, with the data valid signal consistently asserted. However, the ready signal from the LL10GEMAC IP core may de-assert for one clock cycle every 32 clock cycles due to the 64B/66B encoder’s characteristics.

Similarly, the user must be prepared to receive packets continuously, as the LL10GEMAC IP core does not have an internal receive buffer. During packet reception, the data valid signal will de-assert for one clock cycle every 32 clock cycles to pause data transmission and synchronize with the 64B/66B decoder.

 

Functional Description

The LL10GEMAC IP supports simultaneous bidirectional data transmission, as shown in Figure 5. The Transmit (Tx) and Receive (Rx) logic operate independently in separate clock domains.

 

Transmit Block

Within the LL10GEMAC IP, data packets received from the AXI4-ST interface are processed by CRC32 calculation (FCS), 64B/66B encoding, scrambling, and alignment before being transmitted to the PMA.

 

·       Tx Controller

The Tx Controller handles packet framing by adding both the header and footer. The header consists of a 7-byte preamble and a 1-byte Start of Frame Delimiter (SFD), while the footer contains a 4-byte Frame Check Sequence (FCS) based on CRC-32 and a 1-byte End of Frame Delimiter (EFD). If the packet is too short, zero-padding is appended after the Ethernet data. Once each frame is transmitted, an Idle period is inserted as the Interframe Gap (IFG) between packets. The Tx Controller monitors control signals on the AXI4-ST interface to detect new frames. The ready signal on AXI4-ST may temporarily de-assert to allow the 64B/66B encoder to insert the packet header while the Gearbox processes the data.
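The framing described above can be sketched at byte level as shown below. This is an illustrative model, not the IP's implementation: `build_frame` is a hypothetical helper, the 0x55/0xD5 values are the standard Ethernet preamble and SFD, the 60-byte padding threshold is the one given in the Transmit interface section, and the EFD and IFG are omitted because they are carried as 64B/66B control characters rather than data bytes.

```python
import zlib

PREAMBLE = bytes([0x55] * 7)   # standard Ethernet preamble
SFD = bytes([0xD5])            # standard Start of Frame Delimiter
MIN_LEN_BEFORE_FCS = 60        # shorter packets are zero-padded to this length
MIN_USER_BYTES = 5             # minimum Tx packet size stated in this datasheet

def build_frame(user_data: bytes) -> bytes:
    """Return preamble + SFD + user data + zero padding + FCS as a byte string."""
    if len(user_data) < MIN_USER_BYTES:
        raise ValueError("packet shorter than the minimum Tx packet size")
    padded = user_data.ljust(MIN_LEN_BEFORE_FCS, b"\x00")
    # The FCS covers the Ethernet data including padding and is sent LSB first.
    fcs = zlib.crc32(padded).to_bytes(4, "little")
    return PREAMBLE + SFD + padded + fcs

frame = build_frame(b"\xAA" * 5)
print(len(frame))  # 8 header bytes + 60 padded bytes + 4 FCS bytes
```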

 

·       CRC32 Cal

This module calculates the 32-bit CRC for each data packet using a 32-bit data bus from the AXI4-ST interface. The FCS (CRC-32) is appended to the packet footer following the Ethernet data, which may also include zero-padding. The polynomial used for CRC-32 follows the IEEE802.3ae standard, ensuring compliance with Ethernet transmission:

P(X) = X^32 + X^26 + X^23 + X^22 + X^16 + X^12 + X^11 + X^10 + X^8 + X^7 + X^5 + X^4 + X^2 + X + 1
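For reference, the polynomial above is the standard Ethernet CRC-32, which Python's `zlib.crc32` also implements. The bit-serial sketch below processes one byte at a time for clarity, whereas the IP is described as working on a 32-bit bus; the function names are hypothetical.

```python
import zlib

REFLECTED_POLY = 0xEDB88320  # bit-reversed form of the polynomial above

def crc32_bitwise(data: bytes) -> int:
    """Bit-serial CRC-32 with initial value and final XOR of 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out one bit; apply the polynomial when that bit was 1.
            crc = (crc >> 1) ^ (REFLECTED_POLY if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

def fcs_bytes(frame: bytes) -> bytes:
    """FCS in transmission order: CRC-32 value, least significant byte first."""
    return crc32_bitwise(frame).to_bytes(4, "little")

print(hex(crc32_bitwise(b"123456789")))  # standard check value: 0xcbf43926
```

A receiver can either recompute the CRC and compare it with the received FCS, or run the CRC over the data plus FCS and check for the constant residue.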

 

·       64B/66B Encoder

The LL10GEMAC IP core employs 64B/66B encoding for 10G Ethernet transmission, ensuring clock recovery by the Clock Data Recovery (CDR) module. This encoding technique minimizes overhead and is optimized for 10GBASE-R, as per the IEEE802.3ae specification. The encoder processes the packet, including its header and footer, before passing it to the Scramble module.

 

·       Scramble

To avoid long sequences of repetitive bits (1b or 0b), the encoded data is scrambled using the scrambling polynomial defined by IEEE802.3ae:

P(X) = X^58 + X^39 + 1

This step maintains sufficient bit transitions for clock recovery and improves the transmission characteristics of the signal.
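A bit-serial model of a self-synchronizing scrambler with this polynomial is sketched below. It is a behavioural illustration only: the function names and the list-of-bits representation are assumptions, and the IP presumably processes many bits per clock. In 10GBASE-R only the 64-bit payload of each block is scrambled, not the 2-bit sync header.

```python
MASK58 = (1 << 58) - 1  # the state holds the last 58 scrambled bits

def scramble(bits, state=0):
    """Scramble a list of 0/1 bits with 1 + x^39 + x^58; returns (bits, state)."""
    out = []
    for b in bits:
        s = b ^ ((state >> 38) & 1) ^ ((state >> 57) & 1)
        out.append(s)
        state = ((state << 1) | s) & MASK58  # shift in the scrambled bit
    return out, state

def descramble(bits, state=0):
    """Inverse operation; the state is filled from the received bits."""
    out = []
    for s in bits:
        d = s ^ ((state >> 38) & 1) ^ ((state >> 57) & 1)
        out.append(d)
        state = ((state << 1) | s) & MASK58  # shift in the received bit
    return out, state

data = [(i * i + i // 3) % 2 for i in range(200)]
tx, _ = scramble(data, state=0x2AAAAAAAAAAAAAA)
rx, _ = descramble(tx, state=0)  # receiver starts with an unrelated state
print(rx[58:] == data[58:])
```

Because the descrambler state is built from received bits, it recovers the data after 58 bits regardless of its starting state, which is why no explicit scrambler synchronization is needed.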

 

·       Tx Gearbox

To comply with the 64B/66B encoding, the 64-bit data from the user is encoded into 66-bit frames. Since the PMA interface only supports 32-bit data, the Tx Gearbox module realigns and formats the encoded and scrambled data to 32 bits. Due to the higher bandwidth of the Scramble module compared to the PMA interface, the user data transmission is paused for one clock cycle every 32 clock cycles. During this pause, the 64B/66B encoder inserts its header for transmission.
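The pause cadence follows from the 64/66 coding ratio, as the arithmetic sketch below suggests. It assumes the user bus and the PMA bus are both 32 bits wide and run at the same clock frequency, as stated in this datasheet; the helper name is hypothetical, and the exact cycle on which the IP pauses is not modelled.

```python
from fractions import Fraction

PAYLOAD_BITS = 64   # user payload bits per 64B/66B block
BLOCK_BITS = 66     # payload plus the 2-bit sync header
BUS_BITS = 32       # width of both the user bus and the PMA bus

# Fraction of clock cycles in which a user word can be accepted.
duty = Fraction(PAYLOAD_BITS, BLOCK_BITS)

def user_words_accepted(pma_cycles: int) -> int:
    """32-bit user words carried by the whole blocks sent in `pma_cycles` cycles."""
    blocks = (pma_cycles * BUS_BITS) // BLOCK_BITS
    return blocks * PAYLOAD_BITS // BUS_BITS

period = duty.denominator                        # cycles in one repeating pattern
pauses = period - user_words_accepted(period)    # cycles with no user word

print(duty, period, user_words_accepted(period), pauses)
```

On this arithmetic, 32 user words are accepted in every 33 clock cycles, so one pause follows every 32 accepted words. That appears consistent with the one-cycle pause described above, although the exact counting convention is an inference.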

 

Receive Block

The Receive Block consists of several submodules that handle the reception and processing of incoming data. It performs the reverse operations of the Transmit Block and includes a Synchronization block for additional functionality.

 

·       Rx Gearbox

The Rx Gearbox processes the 32-bit parallel data stream received from the PMA. It splits the data stream into a 2-bit header and a 64-bit data field, without assuming any data alignment. To ensure correct data ordering, the Rx Gearbox employs bit-slip logic, which realigns the data until the received data bus is properly aligned. The control logic for the bit-slip function is managed by the subsequent module.

 

·       Rx Controller and Synchronization

Following the completion of the reset sequence, the Synchronization submodule monitors the incoming data from the Rx Gearbox. Its purpose is to fine-tune data alignment by sending slip signals until the data is properly locked. Continuous monitoring of data alignment persists even after the Ethernet connection is established, and corrective measures are taken if any misalignment is detected.

Once the packet is descrambled and decoded, the Rx Controller verifies the Start of Frame Delimiter (SFD) and Frame Check Sequence (FCS). If any errors are found in the received packet, the controller asserts an error signal to the AXI4-ST interface. The header and footer are stripped from the packet before forwarding it to the AXI4-ST interface, but any zero-padding within the packet is retained to minimize latency in the Rx path.
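The alignment search can be illustrated with a software model that looks for the bit offset at which every 66-bit block begins with a valid sync header (01 or 10). This is a sketch under stated assumptions: `find_block_alignment` is a hypothetical name, the search tries all offsets over a captured bit list rather than slipping one bit at a time as hardware would, and the 64-block threshold is illustrative rather than a documented parameter of the IP.

```python
def find_block_alignment(bits, blocks_to_check=64):
    """Return the offset (0..65) where all checked blocks have a valid header, else None."""
    # Bits needed from a candidate offset to inspect the last header.
    needed = 66 * (blocks_to_check - 1) + 2
    for offset in range(66):
        if offset + needed > len(bits):
            break  # not enough captured bits for this or any later offset
        # A valid sync header is 01 or 10, i.e. its two bits differ.
        if all(bits[offset + 66 * k] != bits[offset + 66 * k + 1]
               for k in range(blocks_to_check)):
            return offset
    return None

# Test stream: 17 stray bits, then alternating data-type and control-type blocks.
block_a = [0, 1] + [1] * 64
block_b = [1, 0] + [0] * 64
stream = [0] * 17 + (block_a + block_b) * 35
print(find_block_alignment(stream))
```

In hardware the equivalent of advancing `offset` is one slip request to the Rx Gearbox, repeated until the header check passes consistently.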

 

·       Descramble

This submodule descrambles the data output from the Rx Gearbox before passing it to the 64B/66B Decoder.

 

·       64B/66B Decoder

The 64B/66B Decoder submodule decodes the descrambled data, identifying the link status, Start of Frame, and End of Frame. It provides data and control outputs that are monitored by the Rx controller to validate the packet’s data sequence.

 

·       CRC32 Cal

This submodule is identical to the CRC32 Cal module in the Transmit Block. However, in the Receive Block, the calculated CRC32 is used to verify the FCS extracted from the received packet. If the received FCS does not match the calculated CRC32, an error is asserted on the Rx AXI4-ST interface.

 

10GbE PMA (10GBASE-R)

The 10GBASE-R Physical Media Attachment (PMA) is provided by AMD Xilinx free of charge and is generated using the UltraScale FPGAs Transceivers Wizard. This wizard provides a template that helps users configure the transceiver parameters for 10GBASE-R operation. To ensure seamless integration with the LL10GEMAC IP core, the following settings must be modified from their default values in the BASE-R template.

For more information about the Transceiver Wizard and how to configure it, please refer to the following link.

https://www.xilinx.com/products/intellectual-property/ultrascale_transceivers_wizard.html

 

Core I/O Signals

Descriptions of all I/O signals are provided in Table 2.

Table 2: Core I/O Signals

| Signal | Dir | Description |
|---|---|---|
| User Interface | | |
| Linkup | Out | 1b-Link up, 0b-Link down. Asserted to 1b when the Ethernet connection is successfully established and the PMA returns an Idle code on the Receive interface. This signal is synchronous to RxClk. |
| RxPCSLock | Out | 1b-Locked, 0b-Not Locked. Asserted to 1b when the data block has been successfully locked. This signal is synchronous to RxClk. |
| TxTestPin[7:0] | Out | Reserved as an IP test point. This signal is synchronous to TxClk. |
| RxTestPin[7:0] | Out | Reserved as an IP test point. This signal is synchronous to RxClk. |
| IPVersion[31:0] | Out | IP version number. |
| Tx AXI4 stream interface (Synchronous to TxClk) | | |
| tx_axis_tdata[31:0] | In | Transmitted data of the AXI4-Stream interface. Valid when tx_axis_tvalid is set to 1b. |
| tx_axis_tkeep[3:0] | In | Byte enable of the 32-bit tx_axis_tdata. Each bit corresponds to the validity of a byte of tx_axis_tdata. Asserted to 1b when that byte is valid. Bit[0], [1], [2], and [3] correspond to tx_axis_tdata[7:0], [15:8], [23:16], and [31:24], respectively. When tx_axis_tvalid is 1b, tx_axis_tkeep is equal to Fh for sending 32-bit data in each packet except for the last data (tx_axis_tlast=1b). For the last data, tx_axis_tkeep can be 1h, 3h, 7h, or Fh to indicate that 1 to 4 bytes of data are valid, respectively. |
| tx_axis_tvalid | In | Assert to 1b to transmit data. This signal must remain continuously asserted to 1b from the start to the end of the packet. The minimum transmitted data size is 5 bytes. |
| tx_axis_tlast | In | Assert to 1b to indicate the final word in the frame. Valid only when tx_axis_tvalid=1b. |
| tx_axis_tuser | In | Assert to 1b to discard the transmit packet. Valid only when tx_axis_tvalid=1b and tx_axis_tlast=1b. |
| tx_axis_tready | Out | Handshaking signal indicating that tx_axis_tdata has been accepted. Asserted to 1b when the data is completely received. If de-asserted, the values of tx_axis_tdata, tkeep, tvalid, tlast, and tuser must remain latched until tx_axis_tready is re-asserted to 1b. |
| Rx AXI4 stream interface (Synchronous to RxClk) | | |
| rx_axis_tdata[31:0] | Out | Received data. Valid when rx_axis_tvalid=1b. |
| rx_axis_tkeep[3:0] | Out | Byte enable for the received data. Each bit indicates the validity of a corresponding byte in rx_axis_tdata (set to 1b when it is valid). Bit[0], [1], [2], and [3] correspond to rx_axis_tdata[7:0], [15:8], [23:16], and [31:24], respectively. The signal is valid when rx_axis_tvalid=1b. |
| rx_axis_tvalid | Out | Asserted to 1b when the received data is valid. During packet reception, this signal may be temporarily de-asserted to 0b to pause the data output. |
| rx_axis_tlast | Out | Asserted to 1b to indicate the final word in the frame. Valid only when rx_axis_tvalid=1b. |
| rx_axis_tuser | Out | Valid at the end of the frame (rx_axis_tlast=1b and rx_axis_tvalid=1b) to indicate whether the frame contains an error. 0b: normal packet, 1b: error packet (SFD, FCS, or EFD is incorrect). |
| Tx PMA I/F (Synchronous to TxClk) | | |
| TxRstB | In | Reset for the IP core in the TxClk domain, output from the PMA. Active Low. |
| TxClk | In | Clock output from the PMA for the Tx interface. 322.265625 MHz for the 32-bit interface. |
| TxUserData[31:0] | Out | 32-bit transmitted data to the PMA. |
| Rx PMA I/F (Synchronous to RxClk) | | |
| RxRstB | In | Reset for the IP core in the RxClk domain, output from the PMA. Active Low. |
| RxClk | In | Clock output from the PMA for the Rx interface. 322.265625 MHz for the 32-bit interface. |
| RxUserData[31:0] | In | 32-bit data received from the PMA. |

 

Timing Diagram

 

IP Initialization

 

Figure 6: Rx Tuning and Linkup Timing Diagram

 

Upon de-assertion of RxRstB to 1b, the receive module inside the IP initiates a synchronization process to align and lock the received data from the PMA. Once the data is successfully locked and the Ethernet connection is established, the Linkup signal is asserted to 1b in the RxClk domain. Simultaneously, in the transmit clock domain (TxClk), the tx_axis_tready signal is asserted to 1b. The initialization process can be further detailed as follows.

 

1)     Initially, the LL10GEMAC IP monitors the RxUserData transmitted by the PMA to check for proper alignment. If misalignment is detected, the IP initiates the data re-alignment process.

2)     The LL10GEMAC IP executes the bit-slip process to adjust and correct the data alignment. This process continues until the correct alignment is achieved. Once the data is properly aligned, the RxUserData becomes reliable for further processing.

3)     Upon successful completion of the alignment process, the RxPCSLock signal is asserted to 1b, indicating that the data alignment is now stable and correct.

4)     The aligned data is then descrambled and 64B/66B decoded. The LL10GEMAC IP then waits for the detection of the Idle code. Once the Idle code is detected, the Linkup signal is asserted to 1b, signifying that the Ethernet connection has been successfully established.

5)     After the initialization process is completed, the LL10GEMAC IP asserts tx_axis_tready to 1b, signaling that the IP is ready to accept data transmission from the user.
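The initialization steps above can be summarized as a small state machine. The sketch below is a behavioural abstraction, not the IP's logic: the state names and helper functions are hypothetical, the two clock domains are collapsed into one, and returning to the aligning state on loss of alignment is an assumption based on the continuous monitoring described in the Functional Description. The periodic tx_axis_tready pauses during normal operation are not modelled.

```python
from enum import Enum, auto

class InitState(Enum):
    ALIGNING = auto()    # steps 1-2: bit-slip until the blocks are aligned
    WAIT_IDLE = auto()   # steps 3-4: block lock achieved, waiting for Idle
    LINKED = auto()      # step 5: link established, user may transmit

def next_state(state, aligned, idle_seen):
    """Advance the model by one step from the two observed conditions."""
    if not aligned:
        return InitState.ALIGNING      # assumed behaviour on loss of alignment
    if state is InitState.ALIGNING:
        return InitState.WAIT_IDLE
    if state is InitState.WAIT_IDLE and idle_seen:
        return InitState.LINKED
    return state

def outputs(state):
    """Status signals implied by each state."""
    locked = state is not InitState.ALIGNING
    linked = state is InitState.LINKED
    return {"RxPCSLock": int(locked), "Linkup": int(linked),
            "tx_axis_tready": int(linked)}

state = InitState.ALIGNING
for aligned, idle_seen in [(False, False), (True, False), (True, True)]:
    state = next_state(state, aligned, idle_seen)
    print(state.name, outputs(state))
```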

 

Transmit interface

 

Figure 7: Transmit interface timing diagram

 

When a new frame is transmitted from the user, the IP performs several operations to prepare the packet for transmission. These operations include inserting the packet header (preamble and SFD) and footer (FCS and EFD). All data generated by the IP, including the header and footer, undergoes encoding, scrambling, and re-alignment to a 32-bit format. For short packets (less than 60 bytes), zero-padding is added before transmitting the FCS.

The packet transmission process is as follows.

1)     The IP detects a new packet when tx_axis_tvalid is asserted to 1b while the IP is in a ready state (tx_axis_tready=1b). The tkeep input should always be Fh, except for the last data in cases of unaligned 32-bit data.

Note: tx_axis_tready may be temporarily de-asserted to 0b for 1-2 clock cycles after receiving the first data. Therefore, the second data (D1) must retain its value until tx_axis_tready is re-asserted to 1b.

2)     The IP transmits a 7-byte preamble and the SFD to the PMA.

3)     The first user data is transmitted from the IP to the PMA. The latency in the Tx path, measured between the first data on tx_axis_tdata and the first data on TxUserData, is typically 6-8 clock cycles, depending on the sequence within the Gearbox.

4)     When tx_axis_tready is de-asserted to 0b, all input signals from the user (tx_axis_tdata, tx_axis_tkeep, tx_axis_tvalid, and tx_axis_tlast) must maintain their values until tx_axis_tready is re-asserted to 1b. Normally, tx_axis_tready is de-asserted to 0b for one cycle every 32 cycles to pause data transmission while the 64B/66B encoding process adds the packet header.

5)     When the last data is detected (tx_axis_tvalid, tx_axis_tlast, and tx_axis_tready are all 1b), the IP de-asserts tx_axis_tready to 0b, pausing data transmission to complete post-processing of the packet. During this cycle, tx_axis_tkeep indicates the byte enable for the last data, which can be 1111b (4 bytes valid), 0111b (3 bytes valid), 0011b (2 bytes valid), or 0001b (1 byte valid).

6)     The IP transmits the encoded and scrambled last data to the PMA, adding zero-padding if the packet length is below 60 bytes.

7)     After receiving the last data from the user, tx_axis_tready remains de-asserted for at least 4 clock cycles. The IP outputs more data than it receives from the user because it adds the packet header and footer. As a result, tx_axis_tready is de-asserted to pause further data input while the header and footer are transmitted.

8)     An Idle code is always inserted at the end of each packet, serving as the interframe gap (IFG). The IP ensures that an IFG of at least 9 bytes is inserted between packets.
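The tkeep and tlast rules in the steps above can be illustrated by splitting a packet into 32-bit beats. This is a sketch for testbench-style use: `to_axis_beats` is a hypothetical helper, and placing the first byte of the packet in tdata[7:0] is an assumption inferred from the tkeep bit mapping in Table 2.

```python
MIN_USER_BYTES = 5  # minimum Tx packet size stated in this datasheet

def to_axis_beats(packet: bytes):
    """Split a packet into (tdata, tkeep, tlast) tuples for a 32-bit bus."""
    if len(packet) < MIN_USER_BYTES:
        raise ValueError("packet shorter than the minimum Tx packet size")
    beats = []
    for i in range(0, len(packet), 4):
        chunk = packet[i:i + 4]
        tdata = int.from_bytes(chunk, "little")   # first byte lands in bits [7:0]
        tkeep = (1 << len(chunk)) - 1             # 0xF, or 0x1/0x3/0x7 on the last beat
        tlast = int(i + 4 >= len(packet))
        beats.append((tdata, tkeep, tlast))
    return beats

for tdata, tkeep, tlast in to_axis_beats(bytes(range(1, 7))):
    print(f"tdata={tdata:08X} tkeep={tkeep:X} tlast={tlast}")
```

A driver using these beats would hold each tuple on the bus until tx_axis_tready is sampled as 1b, as required by steps 1 and 4.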

 

Receive Interface

 

Figure 8: Receive interface timing diagram

 

When the IP receives a new packet from the PMA, it re-aligns the 32-bit data, then descrambles and decodes the packet to identify the start and end of the packet. The packet’s header and footer are verified and removed before forwarding the data to the user. If there are errors in the SFD, FCS, or EFD, the error signal (rx_axis_tuser) is asserted to 1b on the user interface.

 

1)     The IP begins processing after detecting the SFD code in the decoded data from the Rx PMA interface.

2)     After detecting the first data from the PMA, the IP waits 7 clock cycles before sending the first data to the user. This latency is due to the internal logic that converts PMA data to user data. On the user interface, the data is valid for all 32 bits except for the last data, which may contain fewer valid bytes. Therefore, rx_axis_tkeep is always 1111b, except for the last data, where it may be 0001b (1 byte valid), 0011b (2 bytes valid), 0111b (3 bytes valid), or 1111b (4 bytes valid).

3)     To remove the header following the 64B/66B decoding process, the valid signal on the user interface (rx_axis_tvalid) is temporarily de-asserted to 0b for 1 clock cycle every 32 clock cycles.

4)     After detecting the end of the packet (EFD code), rx_axis_tvalid is de-asserted to 0b for 2-5 clock cycles while the IP verifies the FCS of the received packet.

5)     If no error is detected, the IP asserts rx_axis_tlast and rx_axis_tvalid to 1b with the last data on rx_axis_tdata, while rx_axis_tuser is de-asserted to 0b. If an error is detected, rx_axis_tuser is asserted to 1b in this clock cycle.

The IP removes the packet’s header and footer before forwarding it to the user, but zero-padding is retained to optimize latency.
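On the user side, the received beats can be reassembled into a byte string as sketched below. The helper name and tuple layout are hypothetical, only beats with rx_axis_tvalid=1b are assumed to be passed in, and the byte order mirrors the tkeep mapping in Table 2. Because zero-padding is retained, short packets arrive padded and the user logic would need its own length information (for example from a higher-layer header) to trim them.

```python
def from_axis_beats(beats):
    """Rebuild (packet_bytes, error_flag) from (tdata, tkeep, tlast, tuser) beats."""
    out = bytearray()
    for tdata, tkeep, tlast, tuser in beats:
        nbytes = bin(tkeep).count("1")              # valid bytes in this beat
        out += tdata.to_bytes(4, "little")[:nbytes]  # bits [7:0] hold the first byte
        if tlast:
            # tuser is only meaningful on the last beat of the frame.
            return bytes(out), bool(tuser)
    raise ValueError("stream ended without rx_axis_tlast")

packet, error = from_axis_beats([(0x04030201, 0xF, 0, 0), (0x0605, 0x3, 1, 0)])
print(packet.hex(), error)
```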

 

Verification Methods

The LL10GEMAC IP Core functionality was verified by simulation and also proven on real hardware using the ZCU102, Alveo U50, and Alveo U250 boards.

 

Recommended Design Experience

Users must be familiar with HDL design methodology to integrate this IP into their design.

 

Ordering Information

This product is available directly from Design Gateway Co., Ltd. Please contact Design Gateway Co., Ltd. for pricing and additional information about this product using the contact information on the front page of this datasheet.

 

Revision History

| Revision | Date (D/M/Y) | Description |
|---|---|---|
| 1.03 | 11-Sep-24 | Add port RxPCSLock and tx_axis_tuser |
| 1.02 | 25-May-23 | Add AAT demo on Alveo card |
| 1.01 | 29-Apr-21 | Update IP to version 2 |
| 1.00 | 21-May-20 | New release |