Quick Start

SIMDy provides several built-in data types for performing computations inside a kernel. Below is a list of all supported data types, both scalar and vector.

  • int32, int64 - integer scalar types
  • float32, float64 - real scalar types
  • int32x2, int32x3, int32x4, int32x8, int32x16 - 32-bit integer vector types
  • int64x2, int64x3, int64x4, int64x8 - 64-bit integer vector types
  • float32x2, float32x3, float32x4, float32x8, float32x16 - 32-bit real vector types
  • float64x2, float64x3, float64x4, float64x8 - 64-bit real vector types

In addition, you can create arrays and user-defined types. The easiest way to start using SIMDy is through the @simdy_kernel decorator. We will start with a fairly trivial example.

from simdy import simdy_kernel, int32

@simdy_kernel
def add(a: int32, b: int32) -> int32:
    return a + b

result = add(int32(33), int32(-5))
print(result)

The important thing to notice in the example above is how we call the add function. Kernels are strongly typed and accept only the data types listed above, so calling add(33, -5) with plain Python integers causes an error. A kernel can also call another kernel.

from simdy import simdy_kernel, float64

@simdy_kernel
def sqr(x: float64) -> float64:
    return x * x


@simdy_kernel
def distance(x1: float64, y1: float64, x2: float64, y2: float64) -> float64:
    return sqrt(sqr(x2 - x1) + sqr(y2 - y1))


print(distance(float64(0.5), float64(0.4), float64(0.3), float64(0.8)))

SIMDy is mainly intended for heavy computations such as Monte Carlo simulations, rendering, fluid simulations, and machine learning. In the next example we estimate pi using a Monte Carlo technique.

from simdy import simdy_kernel, int64, float64

@simdy_kernel
def calculate_pi(n_samples: int64) -> float64:
    inside = int64(0)
    for i in range(n_samples):
        x = 2.0 * random_float64() - 1.0
        y = 2.0 * random_float64() - 1.0
        if x * x + y * y < 1.0:
            inside += 1
    result = 4.0 * float64(inside) / float64(n_samples)
    return result

print(calculate_pi(int64(100_000_000)))
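
For comparison, the same estimator can be written in plain Python. The sketch below is not the benchmark code behind any timing figures; the function name calculate_pi_python and the optional seed parameter are assumptions added here for illustration, and the standard random module stands in for random_float64.

```python
import random


def calculate_pi_python(n_samples: int, seed=None) -> float:
    """Estimate pi by sampling points uniformly in the square [-1, 1) x [-1, 1)."""
    rng = random.Random(seed)  # seed is optional, only for reproducible runs
    inside = 0
    for _ in range(n_samples):
        x = 2.0 * rng.random() - 1.0
        y = 2.0 * rng.random() - 1.0
        # Count the sample when it falls inside the unit circle.
        if x * x + y * y < 1.0:
            inside += 1
    # Circle area / square area = pi / 4, so scale the hit ratio by 4.
    return 4.0 * inside / n_samples


print(calculate_pi_python(200_000))
```

The fraction of samples that land inside the unit circle approximates pi / 4, which is why the hit ratio is multiplied by 4 in both versions.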

SIMDy has a number of functions that are always available inside a kernel, such as random_float64, min, max, abs, and dot. Note also that all type conversions must be done explicitly. The main reason to use SIMDy is to achieve high performance, so to compare execution times the same algorithm was also implemented in pure Python and in C++. The SIMDy version was about 28 times faster than the Python version and more than 2 times faster than the C++ version. One of SIMDy's main advantages is its support for vector data types, so the next example implements the same algorithm using vector types. In each iteration of the loop, four samples are calculated instead of one.

from simdy import simdy_kernel, int64, float64

@simdy_kernel
def calculate_pi2(n_samples: int64) -> float64:
    inside = int64x4(0)
    for i in range(n_samples):
        r1 = float64x4(random_float64(), random_float64(), random_float64(), random_float64())
        r2 = float64x4(random_float64(), random_float64(), random_float64(), random_float64())
        x = float64x4(2.0) * r1 - float64x4(1.0)
        y = float64x4(2.0) * r2 - float64x4(1.0)
        inside += select(int64x4(1), int64x4(0), x * x + y * y < float64x4(1.0))

    nn = inside[0] + inside[1] + inside[2] + inside[3]
    result = 4.0 * float64(nn) / float64(n_samples * 4)
    return result
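
The key step in the vector version is the select call, which replaces the scalar if statement. Judging by how it is used here, select(a, b, mask) takes the value from a in every lane where the comparison is true and from b elsewhere. The following plain-Python emulation uses lists as stand-ins for the four lanes; the select helper below is a hypothetical illustration, not SIMDy's implementation.

```python
def select(if_true, if_false, mask):
    """Lane-wise choice: take if_true[i] where mask[i] is set, else if_false[i]."""
    return [t if m else f for t, f, m in zip(if_true, if_false, mask)]


# Four (x, y) samples processed as one "vector" iteration.
x = [0.1, -0.9, 0.5, 0.8]
y = [0.2, 0.9, -0.5, 0.7]

# The comparison yields one boolean per lane.
mask = [xi * xi + yi * yi < 1.0 for xi, yi in zip(x, y)]

# Each lane contributes 1 if its point is inside the unit circle, else 0.
hits = select([1, 1, 1, 1], [0, 0, 0, 0], mask)
print(hits)  # [1, 0, 1, 0]
```

Because every lane keeps its own counter, the kernel adds the four lanes of inside together only once, after the loop.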

We can further increase performance by using multiple CPU cores.

from multiprocessing import cpu_count
from simdy import simdy_kernel, int64, float64

@simdy_kernel(nthreads=cpu_count())
def calculate_pi3(n_samples: int64) -> float64:
    inside = int64x4(0)
    for i in range(n_samples):
        r1 = float64x4(random_float64(), random_float64(), random_float64(), random_float64())
        r2 = float64x4(random_float64(), random_float64(), random_float64(), random_float64())
        x = float64x4(2.0) * r1 - float64x4(1.0)
        y = float64x4(2.0) * r2 - float64x4(1.0)
        inside += select(int64x4(1), int64x4(0), x * x + y * y < float64x4(1.0))

    nn = inside[0] + inside[1] + inside[2] + inside[3]
    result = 4.0 * float64(nn) / float64(n_samples * 4)
    return result

result = calculate_pi3(int64(25_000_000))
print(sum(result) / cpu_count())
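
The last line averages the values in result, which suggests that a kernel run with nthreads returns one estimate per thread. Assuming every thread draws the same number of samples, the plain mean of the per-thread estimates equals the estimate obtained by pooling all samples. The hit counts below are made up purely to illustrate the arithmetic.

```python
# Hypothetical per-thread hit counts; each thread is assumed to draw the
# same number of samples (the numbers are invented for illustration).
samples_per_thread = 1_000
hits_per_thread = [780, 790, 785, 795]

# One estimate per thread, as the multi-threaded call appears to return.
estimates = [4.0 * h / samples_per_thread for h in hits_per_thread]

# Mean of the per-thread estimates.
mean_estimate = sum(estimates) / len(estimates)

# Estimate from pooling all hits and all samples together.
total_samples = samples_per_thread * len(hits_per_thread)
pooled_estimate = 4.0 * sum(hits_per_thread) / total_samples

print(mean_estimate, pooled_estimate)
```

If the threads drew different numbers of samples, the mean would have to be weighted by each thread's sample count.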

SIMDy also supports arrays. The next example creates two arrays that hold image data and uses SIMDy to convert the image from the sRGB to the XYZ color space.

from simdy import simdy_kernel, float64x3, array_float64x3


@simdy_kernel
def random_pixels(arr: array_float64x3):
    for i in range(len(arr)):
        arr[i] = float64x3(random_float64(), random_float64(), random_float64())


@simdy_kernel
def convert_srgb_to_xyz(in_img: array_float64x3, out_img: array_float64x3):
    x = float64x3(0.4124, 0.3576, 0.1805)
    y = float64x3(0.2126, 0.7152, 0.0722)
    z = float64x3(0.0193, 0.1192, 0.9505)

    for i in range(len(out_img)):
        srgb = in_img[i]
        u = srgb / float64x3(12.92)
        v = exp(2.4 * log((srgb + float64x3(0.055)) / float64x3(1.055)))
        c = select(u, v, srgb <= float64x3(0.04045))
        out_img[i] = float64x3(dot(c, x), dot(c, y), dot(c, z))


width = 1200
height = 1200
input_img = array_float64x3(size=width * height)
# input image, filled with random pixels
random_pixels(input_img)

output_img = array_float64x3(size=width * height)
convert_srgb_to_xyz(input_img, output_img)
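
To sanity-check the kernel on a few pixels, a scalar reference in plain Python can help. The sketch below mirrors the kernel's arithmetic: the kernel's exp(2.4 * log(t)) is the power t ** 2.4, and the three dot products apply the rows x, y and z of the conversion matrix. The function names srgb_to_linear and srgb_to_xyz are hypothetical and not part of SIMDy.

```python
def srgb_to_linear(c: float) -> float:
    """Undo the sRGB transfer curve for one channel value in [0, 1]."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def srgb_to_xyz(rgb):
    """Convert one sRGB pixel (r, g, b) to XYZ using the kernel's matrix rows."""
    r, g, b = (srgb_to_linear(c) for c in rgb)
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    return (x, y, z)


# White maps every channel to 1.0, so each component is the sum of its row.
print(srgb_to_xyz((1.0, 1.0, 1.0)))
```

Comparing a handful of output_img entries against srgb_to_xyz applied to the matching input_img entries is a quick way to catch a swapped select argument or a mistyped coefficient.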