Using the hardware resources of a single core efficiently is a prerequisite for efficient multi-threaded code. Hardware vendors improve the instruction-level parallelism (ILP) capabilities of their CPUs with almost every new revision. Among the most significant features for ILP efficiency are SIMD registers and instructions, which all but the smallest embedded CPUs provide. Efficient use of these computing resources is more cost- and energy-efficient than simply buying more and larger computing systems.
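As a taste of what explicit use of SIMD registers looks like in C++, here is a minimal sketch using std::experimental::simd, the interface that grew out of Vc and has shipped with libstdc++ since GCC 11. The alias floatv is illustrative, not prescribed by the library.

```cpp
#include <experimental/simd>
#include <cstdio>

namespace stdx = std::experimental;

int main() {
    // native_simd<float> holds as many floats as one SIMD register on the
    // target CPU (e.g. 8 with AVX2, 16 with AVX-512).
    using floatv = stdx::native_simd<float>;
    const floatv a = 1.5f;          // broadcast 1.5f into every lane
    const floatv b = 2.0f;
    const floatv c = a * b + 1.0f;  // one multiply and one add per lane
    const float first = c[0];
    std::printf("lanes: %zu, c[0] = %f\n", floatv::size(), first);
}
```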
Matthias Kretz began programming in primary school and got serious with C++ when he joined the development of the KDE 2 desktop in its alpha stages. He worked on different GUI applications at first, then moved on to library work across the KDE core infrastructure. For the first release of the KDE Plasma Desktop (4.0) he developed the multimedia subsystem and became the first external contributor to Qt. At the same time he finished his studies in physics in Heidelberg, Germany. For his thesis he worked on porting parts of the online-reconstruction software for the ALICE experiment at CERN to the Intel Larrabee GPU. This motivated his work on SIMD and Vc, an abstraction for expressing data-parallelism via the type system; Vc was the first solution of this kind released as free software. His PhD in computer science at the University of Frankfurt was a continuation of his SIMD work, developing higher-level abstractions and vectorizing challenging problems. Matthias has been contributing his SIMD work and his expertise in HPC and scientific computing to the C++ committee since 2013. Since 2022 he has chaired SG6 Numerics of the C++ committee. He is also a contributor to GCC and has founded and chaired C++ user groups at the Frankfurt Institute for Advanced Studies and at GSI Helmholtz Center for Heavy Ion Research.
You may have read stories about “factors of speedup” achieved over traditional implementations using the SIMD features of the CPU. In this tutorial we will develop an understanding of the hardware features that enable such significant speed-ups. More importantly, we will practice how to design data structures and algorithms that speed up our own code.
Most of the code we write is expressed as serial loops (or algorithms) over given data. While many of our problems are inherently (data-)parallel, we typically resort to control structures that imply serial execution. At this point, multithreading can provide a speedup; the situation on each individual core, however, is not improved. How can C++ help you express data-parallelism so that it can be optimized on each thread/core? A sketch of the difference follows.
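The following sketch writes the same element-wise kernel twice: once as a conventional serial loop and once as an explicitly data-parallel loop using std::experimental::simd. The function names are illustrative, and the SIMD version assumes the vector length is a multiple of the register width to keep the sketch short.

```cpp
#include <experimental/simd>
#include <cstddef>
#include <vector>

namespace stdx = std::experimental;

// Serial formulation: one element per iteration.
void scale_serial(std::vector<float>& v, float factor) {
    for (std::size_t i = 0; i < v.size(); ++i)
        v[i] *= factor;
}

// Data-parallel formulation: one register full of elements per iteration.
// Assumes v.size() is a multiple of the SIMD width.
void scale_simd(std::vector<float>& v, float factor) {
    using floatv = stdx::native_simd<float>;
    for (std::size_t i = 0; i < v.size(); i += floatv::size()) {
        floatv chunk(&v[i], stdx::element_aligned);   // load one register
        chunk *= factor;                              // multiply all lanes at once
        chunk.copy_to(&v[i], stdx::element_aligned);  // store it back
    }
}
```

Both functions compute the same result; the second merely makes the data-parallelism visible to the compiler through the type system, which is the approach this tutorial explores.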
This tutorial targets intermediate to advanced C++ users. It is helpful to know a few basics of assembly, but that is not a prerequisite for this tutorial.
Requirements:
A laptop with a web browser and access to the Internet.
Date:
April 26, 2023
Format:
Face to face
Type of course:
Workshop
Need help?
Get in touch with us to resolve any questions you may have about our training.
91 624 40 66 or
91 624 40 69
Call us!