NumPy Ufunc Creation — Core Concepts

Why this topic matters

NumPy ships with more than 60 built-in ufuncs, but real projects often need custom element-wise operations such as unit conversions, domain-specific formulas, or piecewise functions. A custom function registered as a true ufunc gets broadcasting, type casting, reduction, and output arguments for free, so you do not have to reimplement those features by hand.

The three approaches

1. np.frompyfunc — quickest to write

import numpy as np

def clamp(x, lo, hi):
    return min(max(x, lo), hi)

clamp_ufunc = np.frompyfunc(clamp, 3, 1)
result = clamp_ufunc(np.array([1, 5, 10, 15]), 3, 12)
print(result)  # [3 5 10 12] (dtype=object)

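Pros: Works with any Python callable. Supports multiple inputs and outputs. Produces a genuine np.ufunc.
Cons: Always returns object arrays, so call .astype(...) on the result if you need a float or int dtype. Calls Python once per element, which is slow for large arrays.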
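As a minimal sketch of these trade-offs, the snippet below checks that np.frompyfunc returns an np.ufunc, shows the object dtype of the result, and converts it with astype. The second half wraps Python's built-in divmod to illustrate two outputs. The names clamp_ufunc and divmod_ufunc are illustrative only.

```python
import numpy as np

def clamp(x, lo, hi):
    return min(max(x, lo), hi)

clamp_ufunc = np.frompyfunc(clamp, 3, 1)  # 3 inputs, 1 output
print(isinstance(clamp_ufunc, np.ufunc))  # True

raw = clamp_ufunc(np.array([1, 5, 10, 15]), 3, 12)
print(raw.dtype)  # object

# Convert explicitly when a numeric dtype is needed downstream
clamped = raw.astype(np.int64)
print(clamped.dtype)  # int64

# Two outputs: the wrapped callable must return a 2-tuple per element
divmod_ufunc = np.frompyfunc(divmod, 2, 2)
quot, rem = divmod_ufunc(np.array([7, 8, 9]), 4)
print(quot.astype(int), rem.astype(int))
```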

2. np.vectorize — cleaner interface

@np.vectorize
def fahrenheit_to_celsius(f):
    return (f - 32) * 5 / 9

temps = np.array([32, 72, 100, 212])
print(fahrenheit_to_celsius(temps))  # approx. [0. 22.22 37.78 100.]

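Unless otypes is given, np.vectorize infers the output dtype by calling the function on the first element, then returns an array of that dtype. If the first result is an int and later results are floats, the later values are truncated, so pass otypes whenever the return type can vary. It also accepts a signature parameter for functions that operate on sub-arrays rather than scalars.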

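Pros: Returns properly typed arrays. Supports excluded parameters and signatures.
Cons: Still calls Python once per element. The docs describe it as provided primarily for convenience, not for performance. It returns an np.vectorize object rather than a true np.ufunc, so methods such as .reduce() and the out argument are not available.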
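The dtype inference and the signature parameter are easier to see in a small example. The sketch below uses a hypothetical function, half_if_positive, whose first result happens to be an int, and a hypothetical row_dot wrapper built with signature. Both names are made up for illustration.

```python
import numpy as np

def half_if_positive(x):
    # Returns a Python int for x <= 0 and a float otherwise
    return x / 2 if x > 0 else 0

values = np.array([0, 1, 3])

# dtype is inferred from the first call (an int here), so later floats are truncated
inferred = np.vectorize(half_if_positive)(values)

# Declaring otypes removes the dependence on the first element
explicit = np.vectorize(half_if_positive, otypes=[float])(values)

print(inferred)  # integer array, fractional parts lost
print(explicit)  # float array

# The wrapper is not a ufunc, so it has no .reduce or .accumulate
print(isinstance(np.vectorize(half_if_positive), np.ufunc))  # False

# signature: treat each pair of length-n vectors as one "element"
row_dot = np.vectorize(lambda a, b: float(np.dot(a, b)), signature="(n),(n)->()")
dots = row_dot(np.ones((3, 4)), np.arange(4.0))
print(dots.shape)  # (3,)
```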

3. Numba @vectorize — compiled speed

import numpy as np
from numba import vectorize, float64

# Explicit signature: two float64 inputs, one float64 output
@vectorize([float64(float64, float64)])
def fast_hypotenuse(a, b):
    return (a**2 + b**2)**0.5

result = fast_hypotenuse(np.random.randn(1_000_000), np.random.randn(1_000_000))

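Numba compiles the function to machine code. With the default CPU target and explicit signatures, the result is a genuine NumPy ufunc that supports broadcasting and, for two-input functions, reduction. Numba also offers GPU execution with target='cuda', which requires a CUDA-capable GPU and toolkit.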

Pros: Speed comparable to hand-written C loops. Full ufunc protocol support on the CPU target.
Cons: Requires Numba. The function body may use only Numba-compatible operations.

Ufunc features your custom function inherits

A function registered as a true ufunc (via np.frompyfunc or Numba; np.vectorize objects do not qualify) gains:

  • Broadcasting: Inputs of different shapes are automatically aligned.
  • Type casting: NumPy converts input dtypes to match your function’s signature.
  • out parameter: Callers can provide a pre-allocated output array to avoid allocation.
  • .reduce(): Fold a two-input, one-output function across an axis (np.add.reduce is equivalent to sum).
  • .accumulate(): Running fold for the same kind of function (np.add.accumulate is equivalent to cumsum).
  • .at(): Unbuffered operation at specific indices.
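
The features listed above can be exercised as in the sketch below. It assumes Numba is installed. So that it still runs without Numba, it falls back to the built-in np.hypot ufunc, which takes the same two inputs. The name hypot_ufunc and the fallback are illustrative choices, not part of the earlier examples.

```python
import numpy as np

def _hypot(a, b):
    return (a * a + b * b) ** 0.5

try:
    from numba import vectorize, float64
    hypot_ufunc = vectorize([float64(float64, float64)])(_hypot)
except ImportError:
    # Assumption: without Numba, use NumPy's built-in ufunc as a stand-in
    hypot_ufunc = np.hypot

# Broadcasting: a (3, 1) column against a (2,) row gives a (3, 2) result
grid = hypot_ufunc(np.array([[3.0], [6.0], [9.0]]), np.array([4.0, 8.0]))
print(grid.shape)  # (3, 2)

# Type casting: integer inputs are converted to float64 before the loop runs
from_ints = hypot_ufunc(np.array([3, 5]), np.array([4, 12]))

# out: write into a pre-allocated array instead of allocating a new one
buf = np.empty(2)
hypot_ufunc(np.array([3.0, 5.0]), np.array([4.0, 12.0]), out=buf)

# reduce / accumulate: fold the two-input function along an axis
total = hypot_ufunc.reduce(np.array([3.0, 4.0, 12.0]))
running = hypot_ufunc.accumulate(np.array([3.0, 4.0, 12.0]))
```

Here reduce computes hypot(hypot(3, 4), 12), and accumulate keeps each intermediate value of that fold.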

Common misconception

Developers often think np.vectorize makes their function fast. It does not — it is a convenience wrapper, not a compiler. The name is misleading. For actual vectorized performance, you need either pure NumPy expressions (which use pre-compiled C loops) or a JIT compiler like Numba.
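
As a rough illustration of the pure NumPy route, the earlier examples can each be written as array expressions with no per-element Python call. The np.where rule at the end is a hypothetical piecewise example, not taken from the sections above.

```python
import numpy as np

x = np.array([1, 5, 10, 15])

# clamp: np.clip replaces the per-element min/max calls
clamped = np.clip(x, 3, 12)

# Fahrenheit to Celsius: plain arithmetic already applies element-wise
temps_f = np.array([32.0, 212.0])
temps_c = (temps_f - 32) * 5 / 9

# hypotenuse: np.hypot is a built-in ufunc
hyp = np.hypot(np.array([3.0, 5.0]), np.array([4.0, 12.0]))

# piecewise logic: np.where selects per element without a Python-level branch
# (hypothetical rule: negative readings are reported as zero)
readings = np.array([-2.0, 0.5, 3.0])
cleaned = np.where(readings < 0, 0.0, readings)

print(clamped, temps_c, hyp, cleaned)
```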

When to use each

Scenario | Best choice
Quick prototype, small data | np.frompyfunc or np.vectorize
Production, large arrays | Numba @vectorize
Complex logic with branching | Numba @vectorize or @guvectorize
Need GPU acceleration | Numba @vectorize(target='cuda')
Cannot install Numba | Rewrite as pure NumPy expression

The one thing to remember: np.vectorize is convenience, not speed — for fast custom ufuncs, you need Numba or pure NumPy expressions.
