{"id":922,"date":"2022-12-09T12:07:07","date_gmt":"2022-12-09T05:07:07","guid":{"rendered":"https:\/\/dgway.com\/blog_E\/?p=922"},"modified":"2022-12-09T12:48:57","modified_gmt":"2022-12-09T05:48:57","slug":"2d-perlin-noise-generator-accelerator","status":"publish","type":"post","link":"https:\/\/dgway.com\/blog_E\/2022\/12\/09\/2d-perlin-noise-generator-accelerator\/","title":{"rendered":"2D Perlin Noise Generator Accelerator"},"content":{"rendered":"<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"576\" src=\"https:\/\/dgway.com\/blog_E\/wp-content\/uploads\/2022\/12\/\u0e2a\u0e44\u0e25\u0e14\u0e4c2-1024x576.png\" alt=\"\" class=\"wp-image-933\" srcset=\"https:\/\/dgway.com\/blog_E\/wp-content\/uploads\/2022\/12\/\u0e2a\u0e44\u0e25\u0e14\u0e4c2-1024x576.png 1024w, https:\/\/dgway.com\/blog_E\/wp-content\/uploads\/2022\/12\/\u0e2a\u0e44\u0e25\u0e14\u0e4c2-300x169.png 300w, https:\/\/dgway.com\/blog_E\/wp-content\/uploads\/2022\/12\/\u0e2a\u0e44\u0e25\u0e14\u0e4c2-768x432.png 768w, https:\/\/dgway.com\/blog_E\/wp-content\/uploads\/2022\/12\/\u0e2a\u0e44\u0e25\u0e14\u0e4c2.png 1280w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">2D Perlin Noise Generator Accelerator <br><a href=\"https:\/\/github.com\/dg-hpcdev\/alveo-simple-examples\/tree\/main\/02_perlin\" data-type=\"URL\" data-id=\"https:\/\/github.com\/dg-hpcdev\/alveo-simple-examples\/tree\/main\/02_perlin\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/github.com\/dg-hpcdev\/alveo-simple-examples\/tree\/main\/02_perlin<\/a><\/figcaption><\/figure>\n<\/div>\n\n\n<p>This is a simple example that generates 2D Perlin noise on an FPGA and writes the values to a file as Python-style lists. The immediate aim of this example is to introduce the DATAFLOW pragma, which enables task-level parallelism. 
However, the broader purpose of using the DATAFLOW pragma is to show how writing HLS C++ code for hardware generally requires a different style of programming.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><code>void perlin(int nx, int ny, float* result, const float freq)\n{\n#pragma HLS INTERFACE m_axi port=result\n#pragma HLS DATAFLOW\n\n    \/\/ Streams connecting the three tasks below\n    hls::stream&lt;Coord&gt; coord_stream;\n    hls::stream&lt;float&gt; noise_stream;\n\n    \/\/ Total number of noise values to generate\n    const int N = nx*ny;\n\n    gen_coord(coord_stream, nx, ny, freq);\n    perlin_calc(coord_stream, noise_stream, N);\n    write_mem(noise_stream, result, N);\n}<\/code><\/pre>\n\n\n\n<p>In the code snippet above, it might seem that&nbsp;<code>gen_coord<\/code>&nbsp;is executed first, followed by&nbsp;<code>perlin_calc<\/code>&nbsp;and then&nbsp;<code>write_mem<\/code>. In real hardware, the three functions actually execute concurrently, with data streams (<code>coord_stream<\/code>&nbsp;and&nbsp;<code>noise_stream<\/code>) connecting them. The data &#8216;flows&#8217; between the three tasks, hence the name DATAFLOW.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>gen_coord<\/code>&nbsp;generates x and y coordinates and writes them to&nbsp;<code>coord_stream<\/code>&nbsp;one by one.<\/li>\n\n\n\n<li><code>perlin_calc<\/code>&nbsp;reads coordinates from&nbsp;<code>coord_stream<\/code>&nbsp;and writes the generated noise values to&nbsp;<code>noise_stream<\/code>.<\/li>\n\n\n\n<li><code>write_mem<\/code>&nbsp;reads noise values from&nbsp;<code>noise_stream<\/code>&nbsp;and writes them to memory.<\/li>\n<\/ul>\n\n\n\n<p>For more complex dataflows, the flow must be carefully designed to prevent potential deadlocks.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Setup<\/h2>\n\n\n\n<p>This must be done every time a new terminal is opened:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><code>source \/opt\/xilinx\/xrt\/setup.sh\n# Replace &lt;Vitis install path&gt; and &lt;version&gt;\nsource &lt;Vitis install 
path&gt;\/Vitis\/&lt;version&gt;\/settings64.sh\nexport PLATFORM_REPO_PATHS=\/opt\/xilinx\/platforms\n# Change to the appropriate platform\nexport PLATFORM=xilinx_u250_gen3x16_xdma_4_1_202210_1<\/code><\/pre>\n\n\n\n<p>For software emulation:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><code>export XCL_EMULATION_MODE=sw_emu\ncd sw_emu<\/code><\/pre>\n\n\n\n<p>For hardware emulation:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><code>export XCL_EMULATION_MODE=hw_emu\ncd hw_emu<\/code><\/pre>\n\n\n\n<p>For hardware:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><code>cd hw<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Build Host Software<\/h2>\n\n\n\n<pre class=\"wp-block-preformatted\"><code>g++ -Wall -g -std=c++11 ..\/src\/host.cpp -o perlin -I${XILINX_XRT}\/include\/ -L${XILINX_XRT}\/lib\/ -lOpenCL -pthread -lrt -lstdc++<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Build and Link Kernel<\/h2>\n\n\n\n<pre class=\"wp-block-preformatted\"><code># Replace ${TARGET} with, or set TARGET to, one of:\n#  sw_emu if targeting software emulation\n#  hw_emu if targeting hardware emulation\n#  hw if targeting hardware\nv++ -c -t ${TARGET} --platform ${PLATFORM} --config ..\/src\/perlin.cfg -k perlin -I..\/src ..\/src\/perlin.cpp -o perlin.xo\nv++ -l -t ${TARGET} --platform ${PLATFORM} --config ..\/src\/perlin.cfg .\/perlin.xo -o perlin.xclbin<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Configure Emulator<\/h2>\n\n\n\n<p>This step is required only when targeting software or hardware emulation.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><code>emconfigutil --platform ${PLATFORM} --nd 1<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Run the Host Software<\/h2>\n\n\n\n<pre class=\"wp-block-preformatted\"><code># Replace &lt;W&gt; with width, &lt;H&gt; with height, and &lt;freq&gt; with frequency\n.\/perlin perlin.xclbin &lt;W&gt; &lt;H&gt; &lt;freq&gt;<\/code><\/pre>\n\n\n\n<p>After the run, a text file&nbsp;<code>perlin_hls.txt<\/code>&nbsp;is generated and contains 
lists of generated noise values.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This is a simple example that generates 2D Perlin noise on an FPGA and writes the values to a file as Python-style lists. The immediate aim of this example is to introduce the DATAFLOW pragma, which enables task-level parallelism. However, the broader purpose of using the DATAFLOW pragma is to show how writing HLS C++ code for hardware generally requires a&#46;&#46;&#46;<\/p>\n","protected":false},"author":1,"featured_media":932,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[18],"tags":[],"class_list":["post-922","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-hls-development-series"],"_links":{"self":[{"href":"https:\/\/dgway.com\/blog_E\/wp-json\/wp\/v2\/posts\/922","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dgway.com\/blog_E\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dgway.com\/blog_E\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dgway.com\/blog_E\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/dgway.com\/blog_E\/wp-json\/wp\/v2\/comments?post=922"}],"version-history":[{"count":5,"href":"https:\/\/dgway.com\/blog_E\/wp-json\/wp\/v2\/posts\/922\/revisions"}],"predecessor-version":[{"id":947,"href":"https:\/\/dgway.com\/blog_E\/wp-json\/wp\/v2\/posts\/922\/revisions\/947"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/dgway.com\/blog_E\/wp-json\/wp\/v2\/media\/932"}],"wp:attachment":[{"href":"https:\/\/dgway.com\/blog_E\/wp-json\/wp\/v2\/media?parent=922"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dgway.com\/blog_E\/wp-json\/wp\/v2\/categories?post=922"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dgway.com\/blog_E\/wp-json\/wp\/v2\/tags?post=922"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":t
rue}]}}