SIMDy¶
Blazingly fast computing in Python¶
SIMDy allow you to write high performance kernels directly from Python. A number of Python features can be used inside of a kernel. They include expressions, functions, user-defined classes, conditionals, arrays and loops. To achieve maximum possible performance SIMDy supports vector data types which allows CPU vector unit utilization.
Projects:¶
SIMDy main features are:¶
- automatic generation of AVX-512, FMA, AVX2, AVX, SSE instructions
- high performance
- on-the-fly code generation
- native code generation
- easy to write vectorized code
In example below bellow Monte Carlo simulation (MCS) is applied to Black-Scholes-Merton model. If your CPU have AVX-512 support this example is 4 times faster than C++ implementation.
import time
from multiprocessing import cpu_count
from simdy import simdy_kernel, float64x8, float64, int64
# Option Parameters
S0 = 105.00 # initial index level
K = 100.00 # strike price
T = 1. # call option maturity
r = 0.05 # constant short rate
vola = 0.25 # constant volatility factor of diffusion
NSamples = 10000000 # NSamples * 8 * cpu_count() samples is processed
@simdy_kernel
def rnd_gaus() -> float64x8:
u = random_float64x8()
r = sqrt(-2.0 * log(u))
theta = 2.0 * 3.141592653589793 * random_float64x8()
y = r * sin(theta)
return y
@simdy_kernel(nthreads=cpu_count())
def black_scholes(S0: float64, K: float64, T: float64, r: float64, vola: float64, n: int64) -> float64:
val = float64x8((r - 0.5 * vola * vola) * T)
val2 = float64x8(sqrt(T) * vola)
K8 = float64x8(K)
C0 = float64x8(0.0)
zero = float64x8(0)
for i in range(n):
ST = S0 * exp(val + val2 * rnd_gaus())
C0 += max(ST - K8, zero)
C0 = C0 * exp(float64x8(-r * T))
result = C0[0] + C0[1] + C0[2] + C0[3] + C0[4] + C0[5] + C0[6] + C0[7]
result = result / float64(n*8)
return result
start = time.clock()
val = black_scholes(float64(S0), float64(K), float64(T), float64(r), float64(vola), int64(NSamples))
val = sum(val) / cpu_count()
end = time.clock()
print("Execution time ", end - start)
print("Result ", val)