@MaartenBaert
Created March 24, 2026 13:15
NumPy PR: Fix `np.generic.astype` for parametric user-defined dtypes

Summary

scalar.astype(dst) misbehaves, usually silently, for scalars belonging to parametric new-style user-defined dtypes (introduced in NumPy 2.0). The intermediate array NumPy creates internally uses the dtype's default descriptor rather than the descriptor of the scalar instance, so setitem either rejects the scalar or stores data converted to the wrong parameters. The fix is a one-function change to PyArray_DescrFromScalar in numpy/_core/src/multiarray/scalarapi.c.


Background: the new-style dtype API

NumPy 2.0 introduced a new C-level dtype API (NPY_DT_* slots). A dtype registered with this API can be either:

  • Non-parametric — only one possible descriptor exists (e.g. np.float64). The descriptor is the singleton dtype.singleton.
  • Parametric — each descriptor instance carries parameters (e.g. QuadPrecDType(backend='sleef') vs QuadPrecDType(backend='longdouble')). There is no meaningful singleton, and default_descr must return some arbitrary stub or raise.

Two slots are relevant here:

| Slot | Purpose |
|---|---|
| NPY_DT_default_descr | Return a default descriptor when only the DTypeMeta class is available (no instance, no scalar). |
| NPY_DT_discover_descr_from_pyobject | Return the correct descriptor for a specific Python object (e.g. extract parameters from a scalar). |
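
The difference between the two slots can be illustrated with a small pure-Python model. This is only a sketch: ToyDescr, ToyScalar and ToyDType are hypothetical stand-ins invented for illustration, not NumPy types, and the real slots are C function pointers with different signatures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyDescr:
    """Hypothetical stand-in for a descriptor of a parametric dtype."""
    backend: str


class ToyScalar:
    """Hypothetical scalar that remembers which backend produced it."""

    def __init__(self, value, backend):
        self.value = value
        self.backend = backend


class ToyDType:
    """Hypothetical stand-in for a DTypeMeta class with the two slots."""

    @staticmethod
    def default_descr():
        # Only the class is known here, so a fixed parameter has to be picked.
        return ToyDescr(backend="sleef")

    @staticmethod
    def discover_descr_from_pyobject(obj):
        # The object itself is available, so its parameters can be read off.
        return ToyDescr(backend=obj.backend)


scalar = ToyScalar(1.5, backend="longdouble")
from_class = ToyDType.default_descr()
from_object = ToyDType.discover_descr_from_pyobject(scalar)
print(from_class, from_object)
```

In this model the two lookups disagree as soon as the scalar carries a non-default parameter, which is the situation the rest of this description is about.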

The bug

Call chain

When scalar.astype(dst) is called on a np.generic subclass, execution flows through:

scalar.astype(dst)
  → gentype_astype()                          scalartypes.c.src
  → gentype_generic_method(self, ..., "astype")
      arr = PyArray_FromScalar(self, NULL)    scalarapi.c:217
        typecode = PyArray_DescrFromScalar(self)
          descr = PyArray_DescrFromTypeObject(type(self))
            DType = PyArray_DiscoverDTypeFromScalarType(type(self))
            return PyArray_GetDefaultDescr(DType)   ← calls default_descr
      meth = arr.astype                       ← arr has the wrong dtype
      return meth(dst, ...)

gentype_generic_method (the shared backend for all scalar methods that delegate to the array equivalent) converts the scalar to a 0-d array with PyArray_FromScalar(self, NULL). That calls PyArray_DescrFromScalar, which for new-style user dtypes ends up in PyArray_DescrFromTypeObject, which in turn calls PyArray_GetDefaultDescr, i.e. the default_descr slot, without ever seeing the scalar instance. For non-parametric dtypes the default descriptor is always correct. For parametric dtypes it is an arbitrary stub.
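
The essence of the call chain is that only type(self) is forwarded, so the instance's parameters cannot influence the result. A minimal pure-Python sketch of that information loss follows; the names ToyDType, ToyScalar and SCALAR_TYPE_TO_DTYPE are hypothetical, and the functions only mirror the shape of the C functions named above, not their actual implementation.

```python
class ToyDType:
    """Hypothetical parametric dtype class; 'sleef' is its default backend."""

    @classmethod
    def default_descr(cls):
        return {"dtype": cls.__name__, "backend": "sleef"}


class ToyScalar:
    """Hypothetical scalar carrying a per-instance backend parameter."""

    def __init__(self, value, backend):
        self.value = value
        self.backend = backend


# Hypothetical registry mapping scalar types to dtype classes.
SCALAR_TYPE_TO_DTYPE = {ToyScalar: ToyDType}


def descr_from_type_object(scalar_type):
    # Mirrors the role of PyArray_DescrFromTypeObject: class in, default out.
    return SCALAR_TYPE_TO_DTYPE[scalar_type].default_descr()


def descr_from_scalar(scalar):
    # Mirrors the pre-fix fallthrough: the instance is dropped at this point.
    return descr_from_type_object(type(scalar))


sleef_scalar = ToyScalar(1.0, backend="sleef")
longdouble_scalar = ToyScalar(1.0, backend="longdouble")
print(descr_from_scalar(sleef_scalar))
print(descr_from_scalar(longdouble_scalar))
```

Both calls return the same descriptor in this model, even though the two scalars carry different parameters.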

Concrete example

numpy-quaddtype implements QuadPrecDType, a parametric user dtype with two backends: 'sleef' (128-bit quad via the SLEEF library) and 'longdouble' (the platform long double, 80-bit on x86). Its default_descr always returns a SLEEF-backend descriptor.

When a longdouble-backend scalar calls the inherited .astype(np.float64):

  1. PyArray_FromScalar calls PyArray_DescrFromScalar, which calls default_descr and gets a SLEEF-backend descriptor.
  2. A 0-d array with SLEEF backend is created.
  3. quadprec_setitem stores the scalar into the SLEEF array, converting the longdouble value to SLEEF 128-bit format in the process.
  4. .astype(np.float64) then casts from the SLEEF-converted value, not from the original longdouble value.

The bug is silent: setitem does not enforce backend equality, so no exception is raised, but the scalar's backend parameter has been silently discarded. On x86 where longdouble is 80-bit and SLEEF is 128-bit, the conversion happens to be lossless (extension, not truncation), masking the bug in practice — but the code path is still wrong, and on platforms where longdouble is a native 128-bit type the result could differ.
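
The "extension, not truncation" argument can be checked with exact rational arithmetic. The sketch below assumes the usual significand widths (64 bits for x87 extended precision, 113 bits for IEEE binary128, 53 bits for float64) and ignores exponent range; round_to_significand is a helper written for this illustration, not a NumPy or SLEEF function.

```python
from fractions import Fraction


def round_to_significand(x: Fraction, bits: int) -> Fraction:
    """Round a positive value to the nearest one with a `bits`-bit significand.

    Exponent range is treated as unbounded; ties round to even.
    """
    if x <= 0:
        raise ValueError("this sketch only handles positive values")
    two = Fraction(2)
    e = 0
    # Find e such that 2**(bits - 1) <= x / 2**e < 2**bits.
    while x / two**e >= 2**bits:
        e += 1
    while x / two**e < 2**(bits - 1):
        e -= 1
    return round(x / two**e) * two**e


# A value that needs the full 64-bit significand of x87 extended precision.
value = Fraction(1) + Fraction(1, 2**63)

as_extended = round_to_significand(value, 64)   # 80-bit long double on x86
as_quad = round_to_significand(value, 113)      # 128-bit quad
as_double = round_to_significand(value, 53)     # float64

print(as_extended == value, as_quad == value, as_double == value)
```

Widening from 64 to 113 significand bits reproduces the value exactly, which is why the detour through the SLEEF format goes unnoticed on x86; only the final cast to float64 rounds.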

Why the existing special cases don't help

PyArray_DescrFromScalar already has hand-written special cases for void, datetime64, and timedelta64 (all parametric legacy dtypes). New-style parametric dtypes have no such case and fall through to the broken path.

NumPy already knows this is wrong

In convert_datatype.c, the object→parametric-DType cast path explicitly raises TypeError before reaching default_descr:

// convert_datatype.c ~3365
if (NPY_DT_is_parametric(dtypes[1]) && dtypes[1] != &PyArray_StringDType) {
    PyErr_Format(PyExc_TypeError,
        "casting from object to the parametric DType %S requires "
        "the specified output dtype instance...", dtypes[1]);
    return -1;
}
loop_descrs[1] = NPY_DT_CALL_default_descr(dtypes[1]);

A comment on the adjacent void→X path says: "Possibly this should simply raise for all parametric DTypes."


The fix

Location

numpy/_core/src/multiarray/scalarapi.c, function PyArray_DescrFromScalar.

What was changed

After the existing datetime/timedelta special cases and before the fallthrough to PyArray_DescrFromTypeObject, the following block was added:

/*
 * For new-style user-defined dtypes, PyArray_DescrFromTypeObject would call
 * PyArray_GetDefaultDescr (i.e. the NPY_DT_default_descr slot), which for
 * parametric dtypes returns an arbitrary stub descriptor that does not reflect
 * the scalar's actual parameters.  Use discover_descr_from_pyobject instead,
 * which is given the scalar instance and can extract the correct descriptor
 * from it.  For non-parametric user dtypes, discover_descr_from_pyobject
 * falls back to default_descr anyway, so behaviour is unchanged.
 */
{
    PyObject *DType_obj = PyArray_DiscoverDTypeFromScalarType(Py_TYPE(sc));
    if (DType_obj != NULL) {
        PyArray_DTypeMeta *DType = (PyArray_DTypeMeta *)DType_obj;
        if (!NPY_DT_is_legacy(DType)) {
            PyArray_Descr *result =
                NPY_DT_CALL_discover_descr_from_pyobject(DType, sc);
            Py_DECREF(DType_obj);
            return result;
        }
        Py_DECREF(DType_obj);
    }
}

This block sits immediately before the existing line:

descr = PyArray_DescrFromTypeObject((PyObject *)Py_TYPE(sc));

Why this is correct

  • PyArray_DiscoverDTypeFromScalarType is a registry lookup; it returns NULL for types not registered as user dtypes, so the legacy path is unchanged.
  • The !NPY_DT_is_legacy guard restricts the new path to new-style dtypes only, leaving the datetime64/timedelta64 special cases above untouched.
  • NPY_DT_CALL_discover_descr_from_pyobject is the correct API for "given this object, what dtype does it have?". For non-parametric user dtypes it calls dtypemeta_discover_as_default which calls default_descr, so behaviour is unchanged. For parametric user dtypes it extracts the actual parameters from the scalar instance.
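
The three points above can be traced in a small pure-Python model of the patched control flow. All class names, the REGISTRY dict and the string "descriptors" are hypothetical placeholders chosen for illustration; the model only reproduces the branching of the C block, not NumPy's data structures.

```python
class ToyLegacyDType:
    is_legacy = True


class ToyNonParametricDType:
    is_legacy = False

    @staticmethod
    def default_descr():
        return "nonparam()"

    @classmethod
    def discover_descr_from_pyobject(cls, obj):
        # Non-parametric case: discovery simply falls back to the default.
        return cls.default_descr()


class ToyParametricDType:
    is_legacy = False

    @staticmethod
    def default_descr():
        return "param(backend=sleef)"

    @staticmethod
    def discover_descr_from_pyobject(obj):
        # Parametric case: read the parameter from the instance.
        return f"param(backend={obj.backend})"


class LegacyScalar:
    pass


class NonParamScalar:
    pass


class UnregisteredScalar:
    pass


class ParamScalar:
    def __init__(self, backend):
        self.backend = backend


# Hypothetical registry mapping scalar types to dtype classes.
REGISTRY = {
    LegacyScalar: ToyLegacyDType,
    NonParamScalar: ToyNonParametricDType,
    ParamScalar: ToyParametricDType,
}


def legacy_path(scalar):
    # Placeholder for the unchanged fallthrough to PyArray_DescrFromTypeObject.
    return f"legacy-path({type(scalar).__name__})"


def descr_from_scalar_fixed(scalar):
    dtype_cls = REGISTRY.get(type(scalar))
    if dtype_cls is not None and not dtype_cls.is_legacy:
        return dtype_cls.discover_descr_from_pyobject(scalar)
    return legacy_path(scalar)


for s in (UnregisteredScalar(), LegacyScalar(), NonParamScalar(),
          ParamScalar("longdouble")):
    print(type(s).__name__, "->", descr_from_scalar_fixed(s))
```

Only the last case takes a different route than before the fix in this model; the first two reach the old fallthrough and the third yields the same descriptor default_descr would have returned.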

Downstream effect on third-party parametric dtypes

Once this NumPy fix is released, parametric user dtypes that currently override astype on their scalar class to work around this bug (passing the scalar's own descriptor explicitly to np.array()) can remove the override and rely on the inherited np.generic.astype. The override should be kept until the minimum supported NumPy version includes the fix, with a comment explaining why.
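
One way a third-party package might decide whether its override is still needed is a version gate. The sketch below is an assumption-laden illustration: the first NumPy release containing the fix is not stated here, so it is left as a parameter, and the version numbers in the usage lines are placeholders rather than real release information. parse_release and needs_astype_override are hypothetical helper names.

```python
import re


def parse_release(version: str) -> tuple:
    """Return the leading numeric release as a 3-tuple.

    For example '2.3.1.dev0+git' gives (2, 3, 1). Pre-release and dev
    suffixes are ignored, so a dev build counts as its release number.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    parts = tuple(int(p) for p in match.group().split("."))
    # Pad or trim to (major, minor, patch) so tuples compare consistently.
    return (parts + (0, 0, 0))[:3]


def needs_astype_override(numpy_version: str, first_fixed_release: tuple) -> bool:
    """True while the running NumPy predates the release containing the fix."""
    return parse_release(numpy_version) < first_fixed_release


# Placeholder values only: (2, 4, 0) is NOT a claim about where the fix lands.
PLACEHOLDER_FIXED = (2, 4, 0)
print(needs_astype_override("2.3.1", PLACEHOLDER_FIXED))
print(needs_astype_override("2.4.0", PLACEHOLDER_FIXED))
```

In a real package the first argument would typically be np.__version__, and the scalar class would define its astype override only when the function returns True.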

For dtypes like numpy-quaddtype that do not override astype but are affected silently (wrong backend used during the intermediate array creation), the fix corrects the behaviour automatically with no code changes required on their side.


Tests added

Tests were added to numpy/_core/tests/test_custom_dtypes.py as class TestDescrFromScalarNewStyleDType.

Limitation: _ScaledFloatTestDType cannot exercise the fix directly

The built-in _ScaledFloatTestDType (SF) is a parametric new-style dtype, but its scalar_type field is NULL. Its getitem function (sfloat_getitem) returns a plain Python float, not a np.generic subclass. Consequently:

  • arr[0] returns a Python float, which has no .astype() method.
  • PyArray_DiscoverDTypeFromScalarType(float) returns float64, a legacy dtype, so the !NPY_DT_is_legacy guard skips the new code block.
  • gentype_astype / PyArray_FromScalar are never called for SF scalars.

The tests that were written are non-regression tests (verifying that SF array astype to float64 and between different scalings continues to give correct results). They do not constitute a regression test for the actual bug.

What a proper regression test requires

A proper regression test needs a parametric new-style dtype that:

  1. Registers a custom np.generic subclass as its scalar type (scalar_type field in the dtype proto).
  2. Implements discover_descr_from_pyobject to return a descriptor whose parameters depend on the specific scalar instance.

numpy-quaddtype satisfies both conditions and is the natural home for such a test. Until it (or equivalent infrastructure) is available as a test dependency, the regression is only verifiable by manual testing with numpy-quaddtype.

Tests that were written

class TestDescrFromScalarNewStyleDType:
    def test_0d_sfloat_astype_to_float64(self): ...
    def test_0d_sfloat_astype_to_sfloat_preserves_value(self): ...
    def test_sfloat_array_astype_to_float64_various_scalings(self): ...

Checklist

  • Fix in numpy/_core/src/multiarray/scalarapi.c (PyArray_DescrFromScalar)
  • [~] Test covering parametric user dtype scalar → astype round-trip (non-regression tests added; true regression test blocked by SF lacking a registered scalar type — see limitation note above)
  • [~] Test covering that the correct descriptor is used (not the stub default) (same limitation)
  • Changelog / release note entry
  • Verify that datetime64 and timedelta64 scalar astype behaviour is unchanged (existing tests cover this; the !NPY_DT_is_legacy guard ensures the new block is not entered for these dtypes)
  • Verify that non-parametric user dtypes are unaffected (for such dtypes discover_descr_from_pyobject falls back to default_descr, identical behaviour to before)