Auto Vectorization

This post looks at automatic vectorization, a compiler optimization that can reduce the number of iterations and cycles a loop needs. Instead of a scalar implementation that handles one element at a time, the code is converted to use vector operations, where a single instruction operates on multiple operands at once. The GCC compiler is sophisticated enough to apply vectorization automatically to optimise code and improve performance.

This can be requested with optimization flags such as -O3 (which turns on -ftree-vectorize) or -ftree-vectorize itself. AArch64 supports three vector extensions: SIMD (Single Instruction, Multiple Data), SVE, and SVE2.
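As a minimal sketch of the kind of loop the vectorizer can work with, here is a simple element-wise addition. The function name add_arrays and its parameters are made up for illustration, and whether GCC actually vectorizes it depends on the target and the flags used.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: adds two arrays element by element.
   Each iteration is independent of the others, and "restrict" promises
   the compiler that the arrays do not overlap, which makes the loop an
   easy candidate for auto-vectorization. */
void add_arrays(int32_t *restrict dst, const int32_t *restrict a,
                const int32_t *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = a[i] + b[i];
    }
}
```

Assuming the function is saved in a hypothetical file add.c, building it with gcc -O3 -c add.c lets the vectorizer consider the loop. Adding GCC's -fopt-info-vec option asks the compiler to report which loops it vectorized.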

SVE and SVE2

Scalable Vector Extension 2 (SVE2) is an Armv9 extension that provides variable-width SIMD capability. The main difference between SVE2 and SVE is the functional coverage of the instruction set. Both SVE and SVE2 allow large amounts of data to be gathered and processed.

On an AArch64 system, use the following command to enable SVE:

gcc -g -O3 -c -march=armv8-a+sve ...

The following command enables SVE2:

gcc -g -O3 -c -march=armv8-a+sve2 ...


SIMD

Single Instruction, Multiple Data (SIMD) is a type of parallel processing. It refers to computers that have multiple processing elements performing the same operation on multiple data points at the same time.
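To make "the same operation on multiple data points" concrete, here is a small sketch that writes the vector operation by hand using the vector extensions offered by GCC (and Clang), instead of relying on auto-vectorization. The names v4si and add4 are chosen only for this illustration, and the exact instruction emitted depends on the target.

```c
#include <stdint.h>

/* GCC/Clang vector extension: four 32-bit integers packed into
   one 16-byte vector. */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Hypothetical helper: a single "+" adds all four lanes at once. */
v4si add4(v4si x, v4si y)
{
    return x + y;
}
```

Individual lanes can be read back with ordinary array indexing, for example r[0] on the result of add4.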

On an Armv8 system, we can use the following command to build the programme:

gcc -g -O3 -c -march=armv8-a ...


