Posts

Showing posts from December, 2022

SPO600 - Project Stage 3

I used the Python tool to modify existing functions for different architectures; it calls the GCC compiler and passes the flags needed to enable auto vectorization for that specific code. In the final version, I implemented solutions to all the limitations listed in my previous blogs. However, I was unable to remove the constraints that the project permits. To summarize, auto vectorization is a useful technique for improving the performance of programs on ARM CPUs that use SVE, SVE2, and ASIMD instructions. Auto vectorization of C code can be enabled by using the GCC compiler with the appropriate flags. This helps optimize programs and take advantage of the SIMD capabilities of modern ARM processors. I made changes to the tool's final version. The tool can be downloaded from https://github.com/puja1102/SPO600Project . Improvements: The first limitation of the previous implementation was that it did not support exception handling. The tool's final

SPO600 - Project Stage 2

Introduction: This tool generates three versions of the function specified as the second argument. Then, while building the main output file from these functions, the tool uses the ifunc capability to select the best version. The tool is written in Python, while the ifunc resolver function is written in C. In this blog, I will explain how the tool works, what its limitations are, and how to test it. Accessing the tool: The tool is hosted on GitHub and can be downloaded from https://github.com/puja1102/SPO600Project . Its use is governed by the GPL version 2 licence. The tool's main files are as follows: tool.py: the main tool, written in Python, which builds the functions and generates the output binary using the best SIMD implementation. template.txt: a text file containing the template for the ifunc function and its resolver function. The ifunc.c file is created from this template. This tool
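To make the idea of "select the best version at run time" concrete, here is a minimal sketch in C. It is not the tool's generated ifunc.c: it swaps in plain function-pointer dispatch instead of GCC's ifunc attribute so that it also compiles on non-AArch64 machines. The names sum, sum_asimd, sum_sve, sum_sve2 and select_variant are hypothetical, and all three variants share one loop body here; in a real build each would be the same source compiled with different target flags.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>   /* getauxval, AT_HWCAP, AT_HWCAP2 */
#include <asm/hwcap.h>  /* HWCAP_SVE, HWCAP2_SVE2 */
#endif

typedef int64_t (*sum_fn)(const int32_t *, size_t);

/* Shared loop body. Assumption: in a real build each variant below would be
 * compiled separately so the compiler can vectorize it for one extension. */
static int64_t sum_loop(const int32_t *v, size_t n)
{
    int64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += v[i];
    }
    return total;
}

/* Hypothetical stand-ins for the three auto-vectorized builds. */
static int64_t sum_asimd(const int32_t *v, size_t n) { return sum_loop(v, n); }
static int64_t sum_sve(const int32_t *v, size_t n)   { return sum_loop(v, n); }
static int64_t sum_sve2(const int32_t *v, size_t n)  { return sum_loop(v, n); }

static const sum_fn variants[] = { sum_asimd, sum_sve, sum_sve2 };

/* Returns the index of the most capable variant the CPU reports.
 * On anything other than AArch64 Linux it falls back to index 0. */
static size_t select_variant(void)
{
    size_t best = 0;
#if defined(__linux__) && defined(__aarch64__)
#ifdef HWCAP_SVE
    if (getauxval(AT_HWCAP) & HWCAP_SVE) {
        best = 1;
    }
#endif
#ifdef HWCAP2_SVE2
    if (getauxval(AT_HWCAP2) & HWCAP2_SVE2) {
        best = 2;
    }
#endif
#endif
    return best;
}

/* Public entry point: picks a variant on the first call, then reuses it.
 * This lazy initialization is not thread-safe; it is only a sketch. */
int64_t sum(const int32_t *v, size_t n)
{
    static sum_fn impl;
    if (impl == NULL) {
        impl = variants[select_variant()];
    }
    return impl(v, n);
}
```

With a real ifunc, the dynamic loader calls the resolver once while it resolves the symbol, so callers pay no per-call check and need no lazy initialization like the one in sum above.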

Auto Vectorization

I'm working on a blog post about the concept of automatic vectorization in parallel computing, which can help reduce the number of cycles spent in loops. It means that instead of a scalar implementation, code can be converted to perform vector operations, so that a single operation is performed on multiple operands. The GCC compiler is sophisticated enough to use vectorization to optimise code and improve performance. It is possible to accomplish this by using optimization flags such as -O3 and -ftree-vectorize. AArch64 supports three SIMD (Single Instruction, Multiple Data) extensions: Advanced SIMD (ASIMD), SVE, and SVE2. SVE and SVE2: Scalable Vector Extension 2 (SVE2) is an Armv9 extension that provides variable-width SIMD capability. The main difference between SVE2 and SVE is the functional coverage of the instruction set. SVE and SVE2 both allow for large amounts of data to be collected and processed. For an aarch64 system, run the following command to enable SVE: gcc -g -O3 -c -march=arm
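As a small illustration of the kind of loop GCC can auto-vectorize, here is a sketch in C. The function name add_arrays is hypothetical and does not come from the post. The restrict qualifiers tell the compiler that the arrays do not overlap, which removes one common obstacle to vectorization.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: element-wise addition of two arrays.
 * Each iteration is independent of the others, so the compiler may replace
 * several scalar additions with one vector addition. */
void add_arrays(int32_t *restrict dst,
                const int32_t *restrict a,
                const int32_t *restrict b,
                size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = a[i] + b[i];
    }
}
```

Assuming the code is saved as add_arrays.c, compiling it with gcc -O3 -fopt-info-vec -c add_arrays.c should make GCC report which loops it vectorized. On AArch64, a target option such as -march=armv8-a+sve (one valid GCC value, not necessarily the one used in this post) asks the compiler to use SVE instructions.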

SPO600 - Project Stage 1

In this blog, I'll go over the first stage of my SPO600 project. The first stage is dedicated to project planning. In this project, we must develop a proof-of-concept tool for creating functions using automatic vectorization. The primary goal of this project is to eliminate the setup that software developers would otherwise perform by hand and to automate the process with a tool. The tool lets developers build three versions of a function into a single output file, and one of them is selected at run time. More precisely, the tool will take code that meets certain criteria and automatically build it with the ifunc capability to choose between multiple, autovectorized versions of a function, allowing the code to take advantage of the best SIMD implementation available on the CPU on which it is running. The limitations of this project are: The tool is only compatible with the aarch64 system. There are only three S