Quick Start
SIMDy provides several built-in data types for performing computations inside a kernel. Below is a list of all supported data types, both scalar and vector.
int32, int64 - integer scalar types
float32, float64 - real scalar types
int32x2, int32x3, int32x4, int32x8, int32x16 - 32-bit integer vector types
int64x2, int64x3, int64x4, int64x8 - 64-bit integer vector types
float32x2, float32x3, float32x4, float32x8, float32x16 - 32-bit real vector types
float64x2, float64x3, float64x4, float64x8 - 64-bit real vector types
In addition, you can create arrays and user-defined types. The easiest way to start using SIMDy is through the @simdy_kernel decorator. We will start with a fairly trivial example.
from simdy import simdy_kernel, int32

@simdy_kernel
def add(a: int32, b: int32) -> int32:
    return a + b

result = add(int32(33), int32(-5))
print(result)
The important thing to notice in the example above is how we call the add function. Kernels are strongly typed and accept only the data types listed above. Calling the function as add(33, -5) causes an error. A kernel can also call another kernel.
from simdy import simdy_kernel, float64

@simdy_kernel
def sqr(x: float64) -> float64:
    return x * x

@simdy_kernel
def distance(x1: float64, y1: float64, x2: float64, y2: float64) -> float64:
    return sqrt(sqr(x2 - x1) + sqr(y2 - y1))

print(distance(float64(0.5), float64(0.4), float64(0.3), float64(0.8)))
SIMDy is mainly intended for heavy computations such as Monte Carlo simulations, rendering, fluid simulations, and machine learning. In our next example we will estimate pi using a Monte Carlo technique.
from simdy import simdy_kernel, int64, float64

@simdy_kernel
def calculate_pi(n_samples: int64) -> float64:
    inside = int64(0)
    for i in range(n_samples):
        # random point in the square [-1, 1) x [-1, 1)
        x = 2.0 * random_float64() - 1.0
        y = 2.0 * random_float64() - 1.0
        # count the points that fall inside the unit circle
        if x * x + y * y < 1.0:
            inside += 1
    result = 4.0 * float64(inside) / float64(n_samples)
    return result

print(calculate_pi(int64(100_000_000)))
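For reference, the same estimator can be written in plain Python with only the standard library. This is a minimal sketch of a baseline, not necessarily the code the SIMDy authors benchmarked; the name calculate_pi_py and the seed parameter are hypothetical and exist only for illustration.

```python
import random


def calculate_pi_py(n_samples: int, seed: int = 0) -> float:
    """Estimate pi by sampling points uniformly in the square [-1, 1) x [-1, 1)."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    inside = 0
    for _ in range(n_samples):
        x = 2.0 * rng.random() - 1.0
        y = 2.0 * rng.random() - 1.0
        # Count the points that fall inside the unit circle.
        if x * x + y * y < 1.0:
            inside += 1
    # The circle covers pi/4 of the square, so scale the hit ratio by 4.
    return 4.0 * inside / n_samples


print(calculate_pi_py(1_000_000, seed=1))
```

The structure matches the kernel line for line; the only differences are the untyped Python numbers and the explicit random generator.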
SIMDy has a number of functions that are always available inside a kernel, such as random_float64, min, max, abs, dot, etc. We can also see that all type conversions must be done explicitly. The main reason to use SIMDy is to achieve high performance, so to compare execution times the same algorithm was also implemented in pure Python and in C++. The SIMDy version was about 28 times faster than the Python version and more than 2 times faster than the C++ version. One of the main advantages of SIMDy is its support for vector data types, so the next example implements the same algorithm using vector types. In each iteration of the loop, four samples are calculated instead of one.
from simdy import simdy_kernel, int64, int64x4, float64, float64x4

@simdy_kernel
def calculate_pi2(n_samples: int64) -> float64:
    inside = int64x4(0)
    for i in range(n_samples):
        r1 = float64x4(random_float64(), random_float64(), random_float64(), random_float64())
        r2 = float64x4(random_float64(), random_float64(), random_float64(), random_float64())
        x = float64x4(2.0) * r1 - float64x4(1.0)
        y = float64x4(2.0) * r2 - float64x4(1.0)
        # each lane adds 1 if its point is inside the unit circle, otherwise 0
        inside += select(int64x4(1), int64x4(0), x * x + y * y < float64x4(1.0))
    # sum the four per-lane counters; the total number of samples is n_samples * 4
    nn = inside[0] + inside[1] + inside[2] + inside[3]
    result = 4.0 * float64(nn) / float64(n_samples * 4)
    return result
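To make the masked update in the vector kernel easier to follow, here is a plain-Python model of a per-lane select. The argument order (value if true, value if false, mask) is inferred from how select is called in this guide; select_lanes is a hypothetical helper that operates on lists and is not part of SIMDy.

```python
def select_lanes(if_true, if_false, mask):
    """Per-lane choice: take if_true[i] where mask[i] is set, else if_false[i]."""
    return [t if m else f for t, f, m in zip(if_true, if_false, mask)]


# Four sample points, one per lane.
x = [0.1, -0.9, 0.5, 0.8]
y = [0.2, 0.9, -0.5, 0.7]

# A lane-wise comparison produces one boolean per lane.
mask = [xi * xi + yi * yi < 1.0 for xi, yi in zip(x, y)]

# Each lane contributes 1 if its point lies inside the unit circle, else 0.
hits = select_lanes([1, 1, 1, 1], [0, 0, 0, 0], mask)

print(mask)       # [True, False, True, False]
print(hits)       # [1, 0, 1, 0]
print(sum(hits))  # 2
```

Replacing the branch with a select keeps all four lanes on the same instruction path, which is what lets one loop iteration process four samples.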
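We can further increase performance by using multiple CPU cores. As the last two lines of the example show, the call then yields one result per thread, and we average them to get the final estimate.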
from multiprocessing import cpu_count
from simdy import simdy_kernel, int64, int64x4, float64, float64x4

@simdy_kernel(nthreads=cpu_count())
def calculate_pi3(n_samples: int64) -> float64:
    inside = int64x4(0)
    for i in range(n_samples):
        r1 = float64x4(random_float64(), random_float64(), random_float64(), random_float64())
        r2 = float64x4(random_float64(), random_float64(), random_float64(), random_float64())
        x = float64x4(2.0) * r1 - float64x4(1.0)
        y = float64x4(2.0) * r2 - float64x4(1.0)
        inside += select(int64x4(1), int64x4(0), x * x + y * y < float64x4(1.0))
    nn = inside[0] + inside[1] + inside[2] + inside[3]
    result = 4.0 * float64(nn) / float64(n_samples * 4)
    return result

result = calculate_pi3(int64(25_000_000))
# average the per-thread estimates
print(sum(result) / cpu_count())
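The combine step at the end (average the per-worker estimates) can be sketched in plain Python as well. This is only an illustration of the pattern under stated assumptions: it uses the standard library's ThreadPoolExecutor, the names estimate_pi and estimate_pi_parallel are hypothetical, and Python threads will not give the speedup that SIMDy's native threads are meant to provide.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def estimate_pi(n_samples: int, seed: int) -> float:
    """One worker's Monte Carlo estimate of pi."""
    rng = random.Random(seed)  # independent generator per worker
    inside = 0
    for _ in range(n_samples):
        x = 2.0 * rng.random() - 1.0
        y = 2.0 * rng.random() - 1.0
        if x * x + y * y < 1.0:
            inside += 1
    return 4.0 * inside / n_samples


def estimate_pi_parallel(n_samples_per_worker: int, n_workers: int) -> float:
    """Run one estimate per worker and average the results."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        estimates = list(
            pool.map(estimate_pi, [n_samples_per_worker] * n_workers, range(n_workers))
        )
    # Every worker draws the same number of samples, so a plain mean is valid.
    return sum(estimates) / n_workers


print(estimate_pi_parallel(100_000, 4))
```

A plain mean is correct here only because each worker processes the same number of samples; with unequal workloads the estimates would need to be weighted by sample count.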
SIMDy also supports arrays. The next example creates two arrays that hold image data and uses SIMDy to convert the pixels from the sRGB to the XYZ color space.
from simdy import simdy_kernel, float64x3, array_float64x3

@simdy_kernel
def random_pixels(arr: array_float64x3):
    for i in range(len(arr)):
        arr[i] = float64x3(random_float64(), random_float64(), random_float64())

@simdy_kernel
def convert_srgb_to_xyz(in_img: array_float64x3, out_img: array_float64x3):
    # rows of the sRGB to XYZ conversion matrix
    x = float64x3(0.4124, 0.3576, 0.1805)
    y = float64x3(0.2126, 0.7152, 0.0722)
    z = float64x3(0.0193, 0.1192, 0.9505)
    for i in range(len(out_img)):
        srgb = in_img[i]
        # linear segment for small values, power curve (exponent 2.4) otherwise
        u = srgb / float64x3(12.92)
        v = exp(2.4 * log((srgb + float64x3(0.055)) / float64x3(1.055)))
        c = select(u, v, srgb <= float64x3(0.04045))
        out_img[i] = float64x3(dot(c, x), dot(c, y), dot(c, z))

width = 1200
height = 1200
input_img = array_float64x3(size=width * height)
# fill the input image with random pixels
random_pixels(input_img)
output_img = array_float64x3(size=width * height)
convert_srgb_to_xyz(input_img, output_img)
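To check the kernel's output, it can help to have a scalar reference for a single pixel. The following plain-Python sketch applies the same piecewise transfer function and the same matrix constants as the kernel above; the helper names srgb_to_linear and srgb_to_xyz are hypothetical and not part of SIMDy.

```python
def srgb_to_linear(c: float) -> float:
    """Undo the sRGB transfer curve for one channel value in [0, 1]."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


# Rows of the sRGB to XYZ matrix, same constants as in the kernel.
SRGB_TO_XYZ = (
    (0.4124, 0.3576, 0.1805),
    (0.2126, 0.7152, 0.0722),
    (0.0193, 0.1192, 0.9505),
)


def srgb_to_xyz(rgb):
    """Convert one (r, g, b) pixel with channels in [0, 1] to (X, Y, Z)."""
    lin = [srgb_to_linear(c) for c in rgb]
    # Each output component is the dot product of a matrix row with the linear RGB.
    return tuple(sum(m * c for m, c in zip(row, lin)) for row in SRGB_TO_XYZ)


print(srgb_to_xyz((1.0, 1.0, 1.0)))  # approximately (0.9505, 1.0, 1.089)
print(srgb_to_xyz((0.0, 0.0, 0.0)))  # (0.0, 0.0, 0.0)
```

The kernel computes the power curve as exp(2.4 * log(t)), which is mathematically the same as t ** 2.4 for positive t, so the two versions should agree up to floating-point rounding.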