LL10GEMAC-IP with AAT Calypte DMA Demo Instruction

 

1     Overview

2     Target System Setup

2.1   IP Address Configuration for Two 10G Ethernet Ports

2.2   Installation of “tcpreplay”

3     Host System Setup

4     Run AAT Calypte DMA Demo

4.1   Initialization

4.2   Market Data Transmission

4.3   AAT-CALYPTEDMA Demo

4.3.1    Ethernet Subsystem

4.3.2    Line Handler Submodule

4.3.3    Feed Handler Submodule

4.3.4    Order Book Submodule

4.3.5    Data Mover Submodule

4.3.6    Pricing Engine Submodule

4.3.7    Order Entry Submodule

5     Update Hardware via PCIe

5.1   Create the NFW file

5.2   Download the NFW File via PCIe

6     Revision History

 

 


 

1         Overview

This document provides instructions for configuring the Alveo accelerator card and setting up the test environment to run the Accelerated Algorithmic Trading (AAT) demo. The goal is to build a low-latency platform for high-frequency trading (HFT) applications.

This demo is a modified version of AMD’s original AAT demo, optimized to achieve lower latency by introducing two major changes.

1)     Ethernet MAC: The AMD 10G/25G Ethernet subsystem has been replaced with the LL10GEMAC-IP from Design Gateway, a low-latency Ethernet MAC optimized for HFT.

For further technical details, see the LL10GEMAC-IP datasheet from Design Gateway:

https://dgway.com/products/IP/Lowlatency-IP/dg_ll10gemacip_data_sheet_xilinx_en/

2)     DMA Engine: The AMD DMA Engine has been replaced with Calypte DMA from DYNANIC, used together with the NDK-FPGA framework developed by CESNET. Due to this modification, the development environment has been migrated from Vitis to Vivado, as required by the new PCIe DMA platform.

For further technical details: Calypte DMA datasheet from DYNANIC:

https://dyna-nic.com/wp-content/uploads/2025/05/DMA-Calypte.pdf

The demo operates on the Alveo X3522 accelerator card and demonstrates performance on a 10G Ethernet connection. This board is equipped with two DSFP28 ports, supporting up to two 10G Ethernet channels per DSFP28.

In this demo, two 10G Ethernet channels are required:

1)     Market Data Transmission: Transmits sample market data using the UDP protocol.

2)     Order Transmission: Handles order data transmission using FIX over TCP.

To set up the system, a target PC with two 10G Ethernet ports is required. The sample market data is transmitted using the “tcpreplay” tool, while order reception is monitored by opening a TCP port on the target system. The demo on the Alveo accelerator is initiated by executing the “aat_calypte_sw” application.

The following test environment was configured to produce the results presented in this document.

1)     Supported Alveo cards: X3522.

2)     Host system for the Alveo accelerator card: Turnkey Acceleration System (TKAS-D2101). Detailed specifications are available at https://dgway.com/products.html#Turnkey.

3)     Vivado Design Suite installed on the host system to program the Alveo card.

4)     10G Ethernet cable for X3522:

·       Two Ethernet channels by 2xSFP+ Active Optical Cable (AOC):

https://www.10gtek.com/10gsfp+aoc

·       Four Ethernet channels by 2x50G DSFP Breakout DAC:

https://ascentoptics.com/product/50g-dsfp-to-2x-25g-sfp28-breakout-dac-1m.html

5)     Programming cable for X3522: Alveo Debug Kit (ADK2).

6)     Target system is configured with the following specifications:

·       Operating System: Oracle Linux 8.10

·       Market Data: Sample market data file (cme_input_arb.pcap)

·       Packet Replay: “tcpreplay” package for transmitting market data

·       Ethernet Ports: Two 10G Ethernet ports provided by a 10G Ethernet network card

 

 

Figure 1 LL10GEMAC-IP with AAT-CALYPTEDMA Demo using Alveo X3522 Card

 


 

2         Target System Setup

This section provides step-by-step instructions for preparing the target system, equipped with two 10G Ethernet ports, to exchange market data and order packets with the Alveo accelerator card. The system runs Oracle Linux 8.10.

2.1       IP Address Configuration for Two 10G Ethernet Ports

First, identify the logical names of the two 10G Ethernet ports, which connect to the SFP+#1 and SFP+#2 cables. These logical names may vary based on your test environment, so it is important to configure the correct IP address for the SFP+#1 and SFP+#2 connections.

1)     Open a Linux terminal and use the following command to list the logical names of the 10G Ethernet ports:

>> sudo lshw -C network

 

 

Figure 2 Display Logical Name of 10G Ethernet Ports

 

The output will display information about the network interfaces. For example, Figure 2 shows logical names such as “enp1s0f0” for SFP+#1 and “enp1s0f1” for SFP+#2.


 

2)     Configure the IP address for each port using the “ifconfig” command.

·        Set SFP+#1 (enp1s0f0) to “192.168.10.100”.

·        Set SFP+#2 (enp1s0f1) to “192.168.20.100”.

Additionally, configure the netmask to 255.255.255.0 (i.e., /24 subnet). The command format is as follows.

 

 

Figure 3 Configure IP Address and Netmask

 

3)     After configuring the IP addresses and netmask, verify the settings using the “ifconfig <logical name>” command.

 

 

Figure 4 Verify IP Address and Netmask Setting

 

Ensure that both Ethernet ports are correctly assigned with their respective IP addresses and netmask values.
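The command format in Figure 3 is not reproduced in this text, so the following is a minimal sketch of steps 2) and 3). It assumes the usual net-tools syntax “ifconfig <interface> <address> netmask <mask>” and the example logical names from Figure 2. The helper name print_ip_config is hypothetical; it only prints the commands so that they can be reviewed before being run on the target system.

```shell
# Print (do not execute) the ifconfig commands for one 10G port.
# Assumption: net-tools syntax "ifconfig <iface> <addr> netmask <mask>".
# $1 = logical name, $2 = IP address, $3 = netmask
print_ip_config() {
    printf 'sudo ifconfig %s %s netmask %s\n' "$1" "$2" "$3"   # step 2: assign
    printf 'ifconfig %s\n' "$1"                                # step 3: verify
}

# Example logical names from Figure 2; replace them with the names
# reported by "sudo lshw -C network" on your own system.
print_ip_config enp1s0f0 192.168.10.100 255.255.255.0   # SFP+#1
print_ip_config enp1s0f1 192.168.20.100 255.255.255.0   # SFP+#2
```

Printing the commands first keeps the sketch safe to run on any machine; copy the printed lines into the terminal once the logical names have been confirmed.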


 

2.2       Installation of “tcpreplay”

To run the AAT-CALYPTEDMA demo, the target PC must have the “tcpreplay” tool installed, which is used to replay packet capture files over a network interface. Run the following command to install the “tcpreplay” package:

>> sudo yum install tcpreplay

 

 

Figure 5 “tcpreplay” Installation

 

This command will install “tcpreplay”, as illustrated in Figure 5. Once the installation is completed, “tcpreplay” will be ready for use in the demo.
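As a quick check before starting the demo, the sketch below tests whether “tcpreplay” is available on the PATH. The helper name have_tool is hypothetical, and the check only confirms that the executable can be found, not that it works with your network card.

```shell
# Return 0 if the named tool is found on PATH, non-zero otherwise.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool tcpreplay; then
    echo "tcpreplay found at: $(command -v tcpreplay)"
else
    echo "tcpreplay not found; install it with: sudo yum install tcpreplay"
fi
```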

3         Host System Setup

This section describes how to prepare the Turnkey Acceleration System (TKAS-D2101 with an Alveo X3522 card), which serves as the host platform for running the AAT-CALYPTEDMA demo.

1)     The NFB framework (the software component of the NDK) must be installed to enable packet transfer through PCIe while running this demo. The framework provides drivers and tools for managing FPGA cards and performing on-the-fly hardware updates.

Users can install it either from prebuilt packages (RPMs) available for RHEL/CentOS systems or by building from source, which is recommended for customization and access to the latest features.

Detailed installation instructions are provided in the NDK Installation Guide at the following link:

https://cesnet.github.io/ndk-sw/install.html

2)     Connect the Ethernet and programming cables between the Alveo X3522 card and the target system by following the steps below.

i)       Insert two SFP+ transceivers into the SFP+ connectors on the Alveo accelerator card.

ii)      Connect SFP+ no.1 (IP: 192.168.10.100) and SFP+ no.2 (IP: 192.168.20.100) to the 10G Ethernet ports on the target system.

iii)     For programming the card, connect the Flex cable from the Alveo accelerator card to the Alveo Debug Kit (ADK2). Ensure the Flex cable is firmly connected.

 

 

Figure 6 SFP+ and Flex Cable Connection on X3522

 


 

3)     Open the Vivado Hardware Manager and program the Alveo card with the required bit file, as illustrated in Figure 7.

 

 

Figure 7 Program Alveo by Vivado Tool


 

4)     Warm reboot the system and confirm that the hardware has been loaded onto the card using the “lspci” command. The console must display “Ethernet controller: Cesnet, z.s.p.o. Device c000” as shown in Figure 8.

 

 

Figure 8 Output of the “lspci” Command After Programming the Alveo Card

 

5)     Boot the AAT-CALYPTEDMA demo on the Alveo card by executing the “AAT_CALYPTE” application.

i)       Navigate to the “AATCALYPTEDMA_X3522” folder:

>> cd <directory>/AATCALYPTEDMA_X3522/

ii)      Execute the application:

>> sudo ./software/AAT_CALYPTE

After execution, the AAT application initializes, as shown in Figure 9.

 

Figure 9 Status Displayed After Executing the Demo Application on the Host System
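The “lspci” check in step 4) can also be scripted. The following is a minimal sketch that searches an “lspci” listing for the device string quoted in that step. The helper name has_cesnet_device is hypothetical, and the sketch assumes the string appears in the listing exactly as quoted in this document.

```shell
# Succeed if the lspci listing on stdin contains the device string from
# step 4). Reading stdin allows the check to run on saved output as well.
# -F treats the pattern as a fixed string, so the dots are matched literally.
has_cesnet_device() {
    grep -qF 'Cesnet, z.s.p.o. Device c000'
}

# Usage on the host system after the warm reboot:
#   lspci | has_cesnet_device && echo "card detected"
```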

 


 

4         Run AAT Calypte DMA Demo

To execute the demo, the user must follow three key steps: initialization, market data transmission, and test result display. The initialization process establishes the connection and configures the necessary parameters. The market data transmission process sends market data from the target system, while the results are received from the Alveo card and displayed on the TKAS-D2101 console. Detailed instructions for each process are provided below.

4.1       Initialization

To run the AAT Calypte DMA demo, both the target system and the Alveo card need to be properly configured. The target must listen on a specific port to receive the order packets that the Alveo card sends once it has finished processing the market data. Similarly, the Alveo card must be configured to process the market data and send the order packets. This is done using the “demo_setup.cfg” or “demo_setup_with_datamover.cfg” script file. Follow the steps below for system initialization.

1)     On the target system’s console, enter the following command to listen on port 12345.

>> nc -l 192.168.20.100 12345 -v

This will configure the target to listen for incoming packets on the specified IP and port.

 

 

Figure 10 Listen TCP Port on Target PC

 

2)     After entering the command, you should see a confirmation message on the console “Listening on <Target PC name> 12345” indicating that the port is listening, as shown in Figure 10.

3)     On the TKAS-D2101 console, run the script file to configure the parameters for processing market data. Two configuration options are available depending on where the Pricing Engine is implemented:

·        Pricing Engine on the Alveo Card: Use the “demo_setup.cfg” script to configure the demo. This method provides lower latency for market data processing.

>> run support/demo_setup.cfg

·        Pricing Engine on the host software: Use the “demo_setup_with_datamover.cfg” script to configure the demo. This method is better suited for more complex market data processing algorithms.

>> run support/demo_setup_with_datamover.cfg

Note: The script file can only be executed after the demo application has been started (see step (5) in section 3-Host System Setup). To rerun the script, first terminate the demo application using the “exit” command, then restart from step (5) in section 3-Host System Setup.


 

 

Figure 11 Run Demo Configuration Script (Using demo_setup_with_datamover.cfg)

 

4)     TKAS-D2101 will display messages during the setup process, as shown in Figure 11. These messages will indicate the progress and status of the configuration process.

5)     If the parameter configuration is successful, the target console will display a message indicating that the port has been opened successfully, such as “Connection received on 192.168.20.200 62303”, as shown in Figure 12.

 

 

Figure 12 Port Opened Success


 

4.2       Market Data Transmission

To transmit sample market data, use the “tcpreplay” tool on the target PC. You will need to open two terminal windows on the target PC: Target Console#1 and Target Console#2. Target Console#1 will display the details of the order packet received via TCP (through SFP+#2: enp1s0f1), while Target Console#2 will be used to send the sample market data via UDP (through SFP+#1: enp1s0f0). Follow the steps below.

1)     Send the sample market data. Use the “tcpreplay” command to send the sample market data file provided with the AMD AAT demo (cme_input_arb.pcap). Execute the following command on Target Console#2 with the four parameters.

>> sudo tcpreplay --intf1=<eth I/F> --pps=<pac/sec> --stats=<stat period> <replay file>

i)       <eth I/F>                : Ethernet interface used to send market data (SFP+#1: enp1s0f0).

ii)      <pac/sec>               : Transfer speed, defined as the number of packets per second.

iii)     <stat period>           : Time interval (in seconds) to display the transmission status on the console.

iv)    <replay file>            : File name of the market data to be transmitted (e.g., cme_input_arb.pcap).

 

 

Figure 13 Sample Market Data Transmission by “tcpreplay”

 

2)     As the data is transmitted, the console will display the status every second, showing the total number of packets transmitted.

3)     On Target Console#1, which is already connected to the listening port, the console will display the received data. The data represents the sample order packet returned by the Alveo card and serves as the result of the AAT-CALYPTEDMA demo.

 

 

Figure 14 The Sample Data of Order Packet on SFP+#2 Channel
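To make the four parameters of step 1) concrete, the sketch below assembles one possible “tcpreplay” command line. The helper name replay_cmd is hypothetical, and the rate of 100 packets per second is an illustrative value, not one taken from this document. The interface name and the one-second status period follow the examples in this section.

```shell
# Build the tcpreplay command line from the four parameters of step 1).
# $1 = <eth I/F>, $2 = <pac/sec>, $3 = <stat period>, $4 = <replay file>
replay_cmd() {
    printf 'sudo tcpreplay --intf1=%s --pps=%s --stats=%s %s\n' "$1" "$2" "$3" "$4"
}

# Illustrative values: 100 packets per second, status printed every second.
replay_cmd enp1s0f0 100 1 cme_input_arb.pcap
# prints: sudo tcpreplay --intf1=enp1s0f0 --pps=100 --stats=1 cme_input_arb.pcap
```

Run the printed command on Target Console#2 while Target Console#1 is still listening on the TCP port.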


 

4.3       AAT-CALYPTEDMA Demo

This section provides example results from market data processing on the Alveo card. The AAT-CALYPTEDMA demo design includes an Ethernet subsystem and multiple submodules responsible for processing the sample market data. This section focuses on seven key components: Ethernet subsystem, Line Handler submodule, Feed Handler submodule, Order Book submodule, Data Mover submodule, Pricing Engine submodule, and Order Entry submodule. The following sections describe the sample results obtained from these components within the demo design.

4.3.1     Ethernet Subsystem

To view the status of the Ethernet subsystem in the AAT-CALYPTEDMA demo system, follow these steps.

1)     Enter the following command in the terminal to display the current status of the Ethernet subsystem.

>> ethernet getstatus

 

 

Figure 15 Ethernet Submodule Status

 

2)     The AAT-CALYPTEDMA demo system uses four Ethernet channels: channel#0 to channel#3. In this example, channel#0 is used for receiving the sample market data, and channel#1 is used for returning the sample order packet. To verify proper operation, check the status of channel#0 and channel#1. During normal operation, the Ethernet link status for these channels should indicate “UP”, confirming that they are functioning correctly.


 

4.3.2     Line Handler Submodule

To view the status of the Line Handler submodule in the AAT-CALYPTEDMA demo system, follow these steps.

1)     Enter the following command in the terminal to display the current status of the Line Handler submodule:

>> linehandler getstatus

 

 

Figure 16 Line Handler Submodule Status

 

2)     After entering the command, the console displays the UDP session with its IP address, port number, and split ID. The AAT-CALYPTEDMA system includes two Ethernet ports, connected to two UDP kernels. In this demo, only one UDP kernel (udp0) is initialized, configured to receive data streams from two UDP ports: 14318 and 15318.

This setup emulates two separate UDP streams (as in the full system) to reduce packet loss. Therefore, input port#1 shows a “None” status instead of UDP session information.

3)     The console also displays statistical parameters from the Line Handler submodule. Before transmitting the sample market data, the processed data count is zero. Once the market data transmission begins (see the right window of Figure 16), this count increases, indicating that the Line Handler submodule is processing the market data.

In this example, Port#0 received 104 packets, of which 52 were discarded due to intentional duplication generated by the demo. These discarded packets do not indicate an error; rather, they demonstrate how the Line Handler filters duplicate data and simulates real traffic conditions involving two UDP kernels in a full deployment.

4.3.3     Feed Handler Submodule

To view the status of the Feed Handler submodule in the AAT-CALYPTEDMA demo system, follow these steps.

1)     Enter the following command in the terminal to display the current status of the Feed Handler submodule.

>> feedhandler getstatus

 

 

Figure 17 Feed Handler Submodule Status

 

2)     The console will display the count of processed data in various units, such as bytes, packets, and messages. Before transmitting the sample market data, the processed data count will be zero. After the market data transmission, this count will increase, indicating that the Feed Handler submodule has begun processing the market data.

For example, in Figure 17, the left window shows that the processed data count is initially zero. After transmitting the sample market data, the count increases, confirming the submodule is processing the data.

In Figure 13, an example is shown where 104 packets of sample market data were sent by the target PC. Out of these packets, 53 packets were transmitted from the Line Handler to the Feed Handler for further processing.

 


 

4.3.4     Order Book Submodule

To view the status of the Order Book submodule in the AAT-CALYPTEDMA demo system, follow these steps.

1)     Enter the following command to read and display the current order book output from the Order Book submodule.

>> orderbook readdata

 

 

Figure 18 Updated Order Book upon the Processing Completion

 

2)     The console will show the current values in the order book. Initially, before transmitting any sample market data, the order book will be in a clean state. However, after the transmission of all sample market data, the Order Book submodule will update the bid/ask quantities and prices in the order book to reflect the changes in market conditions.

For instance, in Figure 18, the left window displays the clean status of the order book before the market data is transmitted. The right window shows the updated order book after all the sample market data has been processed. The bid/ask quantities and prices are adjusted based on the market data, as updated by the Order Book submodule.

 


 

4.3.5     Data Mover Submodule

To view the status of the Data Mover submodule in the AAT-CALYPTEDMA demo system, follow these steps.

1)     Enter the following command in the terminal to display the current status of the Data Mover submodule.

>> datamover getstatus

 

 

Figure 19 Data Mover Submodule Status

 

2)     The console will display the current status of the Data Mover submodule. The Data Mover is active only when the Pricing Engine is implemented in host software (that is, when the “demo_setup_with_datamover.cfg” file was used in step (3) of section 4.1-Initialization). Before the sample market data is transmitted, the processed data count is zero. After transmission, the count increases, indicating that the Data Mover submodule has begun delivering the market data. Before transmitting the sample market data, ensure that the software thread and the hardware submodule (hardware kernel) are running:

·        HW Kernel Is Running            : true

·        SW Thread Is Running           : true

Note: If the Pricing Engine is configured using “demo_setup.cfg” (implemented on the Alveo card), the statuses of “HW Kernel Is Running” and “SW Thread Is Running” are both false.

In Figure 19, a total of 53 packets were moved to the Pricing Engine in the host software. After the host Pricing Engine completes processing, the Data Mover receives 17 packets representing the orders to be delivered to the Order Entry submodule. The remaining 36 packets (53 minus 17) did not result in orders from the Pricing Engine.

3)     To measure the round-trip time (RTT), execute the following command while transmitting the sample market data. RTT is defined as the time from when a packet leaves the Data Mover submodule, through processing in the host Pricing Engine, until the resulting order packet returns to the Data Mover submodule.

>> datamover timing

 

 

Figure 20 Data Mover RTT Result

 

4)     After the command is entered, the measurement runs for 10 seconds. Figure 20 shows the RTT measurement result, including the maximum, minimum, and average values, as well as the total number of sample market data packets collected during the 10-second interval.


 

4.3.6     Pricing Engine Submodule

To view the status of the Pricing Engine submodule in the AAT-CALYPTEDMA demo system, follow these steps.

1)     Enter the following command in the terminal to display the current status of the Pricing Engine submodule.

>> pricingengine getstatus

 

 

Figure 21 Pricing Engine Submodule Status

 

2)     The Pricing Engine submodule on the Alveo card is active only when the Pricing Engine is implemented on the FPGA (that is, when the “demo_setup.cfg” file was used in step (3) of section 4.1-Initialization). As with the other components, the Pricing Engine is in a clean state before any sample market data is transmitted.

As shown in Figure 21, after the market data is transmitted and processed, the Pricing Engine submodule receives 53 data sets from the updated order book. In this case, however, the number of TX operations is 0 because the demo is running in host pricing engine mode.

 


 

4.3.7     Order Entry Submodule

To check the current status of the Order Entry submodule in the AAT-CALYPTEDMA demo system, follow these steps.

1)     Enter the following command to display the current status of the Order Entry submodule.

>> orderentry getstatus

 

 

Figure 22 Order Entry Submodule Status

 

2)     Before transmitting the sample market data, check the TCP connection status in the Order Entry submodule status. Two key indicators to confirm are as follows.

·        Connection Established          : true

Note: If the TCP connection cannot be established, Connection Established shows “false”.

·        Connection Status                 : SUCCESS

Note: If the TCP connection has already been terminated (no active connection), Connection Status shows “CLOSED”.

3)     After all market data has been transmitted, the packet count in the Order Entry submodule updates from 0 to reflect the total number of messages or frames processed by the Order Entry submodule.
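The two indicators in step 2) can be checked mechanically once the status text has been saved to a file, for example by copying it from the console. The sketch below is illustrative only: it assumes each field is printed on its own line as “<label> : <value>”, which may differ from the actual layout shown in Figure 22. The helper names status_field and order_entry_ready and the file name orderentry_status.txt are hypothetical.

```shell
# Print the value of a "<label> : <value>" line from status text on stdin.
# Assumption: one field per line, label first, then a colon, then the value.
# $1 = field label (plain text, no regular-expression metacharacters)
status_field() {
    sed -n "s/^[[:space:]]*$1[[:space:]]*:[[:space:]]*//p" | sed 's/[[:space:]]*$//'
}

# Succeed only if both indicators from step 2) have the expected values.
order_entry_ready() {
    oe_text=$(cat)
    [ "$(printf '%s\n' "$oe_text" | status_field 'Connection Established')" = "true" ] &&
    [ "$(printf '%s\n' "$oe_text" | status_field 'Connection Status')" = "SUCCESS" ]
}

# Usage with a saved copy of the "orderentry getstatus" output:
#   order_entry_ready < orderentry_status.txt && echo "TCP connection ready"
```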


 

5         Update Hardware via PCIe

In certain environments, remote hardware updates are necessary. The AAT-CALYPTEDMA system supports hardware updates via PCIe, eliminating the need for a separate programming cable. This process consists of two steps: creating the NFW firmware file and downloading it.

5.1       Create the NFW file

Figure 23 NFW File Creation

1)     Navigate to the directory that contains the bitstream file, then generate the NFW archive using the tar command:

>> tar czvf <nfw_output_file> <bitstream_input_file>

Command and its options:

·        tar            : Utility to create an archive

·        c               : Create a new archive

·        z               : Compress the archive using gzip

·        v               : Verbose mode, lists files as they are added

·        f               : Specify the output filename (<nfw_output_file>)

2)     After execution, an output file with the .nfw extension is created. This file can be downloaded via PCIe using nfb-boot. The .nfw file is a gzip-compressed TAR archive used by the NFB driver to manage firmware slots.
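The archive step can be wrapped so that the result is listed immediately after creation. This is a minimal sketch under the description above (a gzip-compressed TAR archive containing the bitstream). The helper name make_nfw and the file names in the usage line are hypothetical.

```shell
# Pack a bitstream into a gzip-compressed tar archive with the .nfw
# extension, then list the archive to confirm what it contains.
# $1 = bitstream input file, $2 = NFW output file
make_nfw() {
    # -C switches to the bitstream's directory so the archive stores only
    # the file name, as when tar is run from inside that directory.
    tar czf "$2" -C "$(dirname "$1")" "$(basename "$1")" &&
    tar tzf "$2"
}

# Usage (file names are placeholders):
#   make_nfw ./build/design.bit ./design.nfw
```

Listing the archive with “tar tzf” is a cheap way to confirm that the expected bitstream file was packed before the file is written to the card.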

 


 

5.2       Download the NFW File via PCIe

The FPGA firmware can be updated over PCIe using the nfb-boot utility. For detailed information about this tool, follow the link: https://cesnet.github.io/ndk-sw/tools/nfb-boot.html.

Follow these steps to download the NFW file:

 

 

Figure 24 NFW File Programming by nfb-boot

 

1)     Check the list of available slots:

>> nfb-boot -l

2)     The console displays all available slots. In this example, a single X3522 card is installed, so only slot #0 is detected.

3)     Write the updated NFW file to the selected slot:

>> nfb-boot -w <slot_index> <nfw_file>

4)     The console displays the bitstream file size and the flash programming progress until completion.

5)     After programming, select the slot number to boot the device:

>> nfb-boot -F <slot_index>

6)     Perform a cold reboot of the system. After rebooting, the new hardware configuration will be permanently loaded onto the card.
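The command sequence above can be collected into a small dry-run helper that validates its arguments and prints the three “nfb-boot” commands in order. It uses only the options shown in this section. The helper name print_update_cmds and the file name in the usage line are hypothetical, and nothing is written to the card until the printed commands are run.

```shell
# Print the nfb-boot command sequence of this section for review.
# $1 = slot index, $2 = NFW file
print_update_cmds() {
    case "$1" in
        ''|*[!0-9]*) echo "slot index must be a non-negative integer" >&2; return 1 ;;
    esac
    case "$2" in
        *.nfw) ;;
        *) echo "expected a file with the .nfw extension" >&2; return 1 ;;
    esac
    printf 'nfb-boot -l\n'                     # list the available slots
    printf 'nfb-boot -w %s %s\n' "$1" "$2"     # write the NFW file to the slot
    printf 'nfb-boot -F %s\n' "$1"             # select the slot to boot from
}

# Usage (slot #0 as in the single-card example; file name is a placeholder):
#   print_update_cmds 0 design.nfw
```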


 

6         Revision History

Revision    Date (D-M-Y)    Description

1.01        18-Dec-25       Correct DMA name

1.00        16-Sep-25       Initial version release