scalar.astype(dst) silently fails for scalars belonging to parametric
new-style user-defined dtypes (introduced in NumPy 2.0). The intermediate
array NumPy creates internally uses the dtype's default descriptor rather
than the descriptor of the scalar instance, so setitem either rejects it or
stores wrong data. The fix is a one-function change to PyArray_DescrFromScalar
in numpy/_core/src/multiarray/scalarapi.c.
NumPy 2.0 introduced a new C-level dtype API (NPY_DT_* slots). A dtype
registered with this API can be either:
- Non-parametric: only one possible descriptor exists (e.g. np.float64). The descriptor is the singleton dtype.singleton.
- Parametric: each descriptor instance carries parameters (e.g. QuadPrecDType(backend='sleef') vs QuadPrecDType(backend='longdouble')). There is no meaningful singleton, and default_descr must return some arbitrary stub or raise.

Two slots are relevant here:

| Slot | Purpose |
|---|---|
| NPY_DT_default_descr | Return a default descriptor when only the DTypeMeta class is available (no instance, no scalar). |
| NPY_DT_discover_descr_from_pyobject | Return the correct descriptor for a specific Python object (e.g. extract parameters from a scalar). |
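The difference between the two slots can be sketched with a pure-Python toy model. This is not NumPy's API: ToyDescr, ToyScalar and ToyParametricDType are hypothetical stand-ins that only borrow the slot names, and the 'sleef' default is taken from the numpy-quaddtype case described in this note.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyDescr:
    """Hypothetical stand-in for a descriptor instance with one parameter."""
    backend: str


@dataclass(frozen=True)
class ToyScalar:
    """Hypothetical scalar that remembers which backend produced it."""
    value: float
    backend: str


class ToyParametricDType:
    """Toy model of a parametric DType class (not a real NumPy object)."""

    @staticmethod
    def default_descr() -> ToyDescr:
        # Only the class is known here, so the parameter has to be guessed.
        return ToyDescr(backend="sleef")

    @staticmethod
    def discover_descr_from_pyobject(obj: ToyScalar) -> ToyDescr:
        # The instance is available, so its parameter can be read off.
        return ToyDescr(backend=obj.backend)


scalar = ToyScalar(1.5, backend="longdouble")
guessed = ToyParametricDType.default_descr()
discovered = ToyParametricDType.discover_descr_from_pyobject(scalar)
print(guessed, discovered)
```

In this model the two results agree only when the scalar happens to use the default backend, which is the situation a non-parametric dtype is always in.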
When scalar.astype(dst) is called on a np.generic subclass, execution
flows through:

    scalar.astype(dst)
      → gentype_astype()                                   scalartypes.c.src
        → gentype_generic_method(self, ..., "astype")
            arr = PyArray_FromScalar(self, NULL)           scalarapi.c:217
              typecode = PyArray_DescrFromScalar(self)
                descr = PyArray_DescrFromTypeObject(type(self))
                  DType = PyArray_DiscoverDTypeFromScalarType(type(self))
                  return PyArray_GetDefaultDescr(DType)    ← calls default_descr
            meth = arr.astype                              ← arr has the wrong dtype
            return meth(dst, ...)

gentype_generic_method (the shared backend for all scalar methods that
delegate to the array equivalent) converts the scalar to a 0-d array with
PyArray_FromScalar(self, NULL). That calls PyArray_DescrFromScalar, which
for new-style user dtypes ends up in PyArray_DescrFromTypeObject, which
calls PyArray_GetDefaultDescr — i.e. default_descr — without the scalar
instance. For non-parametric dtypes the default descriptor is always
correct. For parametric dtypes it is an arbitrary stub.
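The point at which the instance is lost can be made concrete with a small toy model. Everything here is hypothetical (the registry, the function names, the dict standing in for a 0-d array); it only mirrors the shape of the call chain described above, not NumPy's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyDescr:
    backend: str


@dataclass(frozen=True)
class ToyScalar:
    value: float
    backend: str


# Hypothetical registry: scalar type -> default descriptor. It stands in for
# the DType lookup followed by the default_descr call.
DEFAULT_DESCR_BY_TYPE = {ToyScalar: ToyDescr(backend="sleef")}


def descr_from_type_object(scalar_type: type) -> ToyDescr:
    # Only the type is passed in, so instance parameters are unreachable.
    return DEFAULT_DESCR_BY_TYPE[scalar_type]


def descr_from_scalar_unfixed(scalar: ToyScalar) -> ToyDescr:
    # Models the pre-fix fallthrough: the instance is dropped at this call.
    return descr_from_type_object(type(scalar))


def from_scalar(scalar: ToyScalar) -> dict:
    # Toy 0-d "array": a descriptor plus the stored value.
    return {"descr": descr_from_scalar_unfixed(scalar), "value": scalar.value}


arr = from_scalar(ToyScalar(2.0, backend="longdouble"))
print(arr["descr"].backend)  # the scalar's own backend is gone
```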
numpy-quaddtype implements QuadPrecDType, a parametric user dtype with two
backends: 'sleef' (128-bit quad via the SLEEF library) and 'longdouble'
(the platform long double, 80-bit on x86). Its default_descr always
returns a SLEEF-backend descriptor.
When a longdouble-backend scalar calls the inherited .astype(np.float64):

- PyArray_FromScalar calls PyArray_DescrFromScalar, which calls default_descr and gets a SLEEF-backend descriptor.
- A 0-d array with SLEEF backend is created.
- quadprec_setitem stores the scalar into the SLEEF array, converting the longdouble value to SLEEF 128-bit format in the process.
- .astype(np.float64) then casts from the SLEEF-converted value, not from the original longdouble value.

The bug is silent: setitem does not enforce backend equality, so no
exception is raised, but the scalar's backend parameter is discarded. On
x86, where longdouble is 80-bit and SLEEF is 128-bit, the conversion happens
to be lossless (extension, not truncation), which masks the bug in practice.
The code path is still wrong, and on platforms where longdouble is a native
128-bit type the result could differ.
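Why a detour through a wider format goes unnoticed can be illustrated by analogy. The standard library offers neither an 80-bit nor a 128-bit float, so this sketch substitutes binary32 and binary64 for them; the helper to_binary32 is a hypothetical name and the example says nothing about SLEEF itself.

```python
import struct


def to_binary32(x: float) -> float:
    """Round a Python float (binary64) to binary32, then widen it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# A value that is exactly representable in the narrow format.
narrow = to_binary32(0.1)

# Extension: narrow -> wide -> narrow reproduces the value exactly, so a
# detour through the wider format is invisible in the final result.
extension_is_lossless = to_binary32(float(narrow)) == narrow

# Truncation: wide -> narrow -> wide does not reproduce 0.1, so a detour
# through a narrower format would change the result.
truncation_is_lossless = to_binary32(0.1) == 0.1

print(extension_is_lossless)   # True
print(truncation_is_lossless)  # False
```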
PyArray_DescrFromScalar already has hand-written special cases for void,
datetime64, and timedelta64 (all parametric legacy dtypes). New-style
parametric dtypes have no such case and fall through to the broken path.
In convert_datatype.c, the object→parametric-DType cast path explicitly
raises TypeError before reaching default_descr:

    // convert_datatype.c ~3365
    if (NPY_DT_is_parametric(dtypes[1]) && dtypes[1] != &PyArray_StringDType) {
        PyErr_Format(PyExc_TypeError,
                "casting from object to the parametric DType %S requires "
                "the specified output dtype instance...", dtypes[1]);
        return -1;
    }
    loop_descrs[1] = NPY_DT_CALL_default_descr(dtypes[1]);

A comment on the adjacent void→X path says: "Possibly this should simply
raise for all parametric DTypes."
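The "refuse rather than guess" policy of that cast path can be sketched in Python. ToyDType and resolve_output_descr are hypothetical names, the error text is invented for the sketch, and the StringDType exception in the C snippet is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyDType:
    """Hypothetical DType record with a parametric flag."""
    name: str
    parametric: bool

    def default_descr(self) -> str:
        return f"{self.name}(default)"


def resolve_output_descr(dtype: ToyDType, instance: Optional[str] = None) -> str:
    # An explicit descriptor instance always wins.
    if instance is not None:
        return instance
    # For a parametric DType there is no safe default, so refuse to guess.
    if dtype.parametric:
        raise TypeError(
            f"parametric DType {dtype.name} needs an explicit descriptor instance"
        )
    # Non-parametric: the default descriptor is the only possible one.
    return dtype.default_descr()


print(resolve_output_descr(ToyDType("Plain", parametric=False)))
```

The scalar path described in this note had no equivalent check, which is why it produced a stub descriptor instead of an error.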
numpy/_core/src/multiarray/scalarapi.c, function PyArray_DescrFromScalar.
After the existing datetime/timedelta special cases and before the
fallthrough to PyArray_DescrFromTypeObject, the following block was added:

    /*
     * For new-style user-defined dtypes, PyArray_DescrFromTypeObject would call
     * PyArray_GetDefaultDescr (i.e. the NPY_DT_default_descr slot), which for
     * parametric dtypes returns an arbitrary stub descriptor that does not reflect
     * the scalar's actual parameters. Use discover_descr_from_pyobject instead,
     * which is given the scalar instance and can extract the correct descriptor
     * from it. For non-parametric user dtypes, discover_descr_from_pyobject
     * falls back to default_descr anyway, so behaviour is unchanged.
     */
    {
        PyObject *DType_obj = PyArray_DiscoverDTypeFromScalarType(Py_TYPE(sc));
        if (DType_obj != NULL) {
            PyArray_DTypeMeta *DType = (PyArray_DTypeMeta *)DType_obj;
            if (!NPY_DT_is_legacy(DType)) {
                PyArray_Descr *result =
                        NPY_DT_CALL_discover_descr_from_pyobject(DType, sc);
                Py_DECREF(DType_obj);
                return result;
            }
            Py_DECREF(DType_obj);
        }
    }

This block sits immediately before the existing line:

    descr = PyArray_DescrFromTypeObject((PyObject *)Py_TYPE(sc));

- PyArray_DiscoverDTypeFromScalarType is a registry lookup; it returns NULL for types not registered as user dtypes, so the legacy path is unchanged.
- The !NPY_DT_is_legacy guard restricts the new path to new-style dtypes only, leaving the datetime64/timedelta64 special cases above untouched.
- NPY_DT_CALL_discover_descr_from_pyobject is the correct API for "given this object, what dtype does it have?". For non-parametric user dtypes it calls dtypemeta_discover_as_default which calls default_descr, so behaviour is unchanged. For parametric user dtypes it extracts the actual parameters from the scalar instance.
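The control flow of the added block, including both guards, can be modelled in a few lines of Python. This is a toy under stated assumptions: REGISTRY, ToyDType, the scalar classes and the "unregistered" marker are all hypothetical, and descriptors are plain strings.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToyDType:
    """Hypothetical DType record; legacy plays the role of NPY_DT_is_legacy."""
    name: str
    legacy: bool
    default_descr: Callable[[], str]
    discover_descr_from_pyobject: Callable[[object], str]


class QuadScalar:
    def __init__(self, backend: str) -> None:
        self.backend = backend


class PlainScalar:
    pass


class LegacyScalar:
    pass


def plain_default() -> str:
    return "Plain"


def legacy_discover(sc: object) -> str:
    raise AssertionError("the legacy guard should skip this slot")


REGISTRY = {
    QuadScalar: ToyDType("Quad", False, lambda: "Quad[sleef]",
                         lambda sc: f"Quad[{sc.backend}]"),
    # Non-parametric: discovery simply falls back to the default.
    PlainScalar: ToyDType("Plain", False, plain_default,
                          lambda sc: plain_default()),
    LegacyScalar: ToyDType("Legacy", True, lambda: "Legacy", legacy_discover),
}


def descr_from_type_object(scalar_type: type) -> str:
    # Old fallthrough: type only, default descriptor only.
    dtype = REGISTRY.get(scalar_type)
    return dtype.default_descr() if dtype is not None else "unregistered"


def descr_from_scalar_fixed(sc: object) -> str:
    dtype = REGISTRY.get(type(sc))  # None models the NULL lookup result
    if dtype is not None and not dtype.legacy:
        return dtype.discover_descr_from_pyobject(sc)
    return descr_from_type_object(type(sc))


results = [descr_from_scalar_fixed(s)
           for s in (QuadScalar("longdouble"), PlainScalar(), LegacyScalar(), object())]
print(results)
```

Only the first case changes relative to the old fallthrough; the other three reach the same result as before, which is the behaviour-preservation argument made in the bullets above.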
Once this NumPy fix is released, parametric user dtypes that currently override
astype on their scalar class to work around this bug (passing the scalar's
own descriptor explicitly to np.array()) can remove the override and rely on
the inherited np.generic.astype. The override should be kept until the
minimum supported NumPy version includes the fix, with a comment explaining
why.
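One way to decide when the override can go is a version gate. The sketch below is an assumption-laden illustration: release_tuple and needs_astype_override are hypothetical helpers, and because the release that will carry the fix is not stated here, fixed_in must be supplied by the caller. The value 9.9.0 in the usage line is a dummy, not a real NumPy release.

```python
import re


def release_tuple(version: str) -> tuple:
    """Leading numeric components of a version string.

    For example '2.1.0rc1' gives (2, 1, 0); pre-release suffixes are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def needs_astype_override(numpy_version: str, fixed_in: str) -> bool:
    """True while the given NumPy version predates the release with the fix."""
    return release_tuple(numpy_version) < release_tuple(fixed_in)


# Dummy placeholder for the (unknown) release that will contain the fix.
print(needs_astype_override("2.0.1", fixed_in="9.9.0"))
```

A dtype package could evaluate such a check once at import time against its minimum supported NumPy version and keep the override, with its explanatory comment, for as long as the result is true.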
For dtypes like numpy-quaddtype that do not override astype but are
affected silently (wrong backend used during the intermediate array creation),
the fix corrects the behaviour automatically with no code changes required on
their side.
Tests were added to numpy/_core/tests/test_custom_dtypes.py as class
TestDescrFromScalarNewStyleDType.
The built-in _ScaledFloatTestDType (SF) is a parametric new-style dtype, but
its scalar_type field is NULL. Its getitem function (sfloat_getitem)
returns a plain Python float, not a np.generic subclass. Consequently:

- arr[0] returns a Python float, which has no .astype() method.
- PyArray_DiscoverDTypeFromScalarType(float) returns float64, a legacy dtype, so the !NPY_DT_is_legacy guard skips the new code block.
- gentype_astype / PyArray_FromScalar are never called for SF scalars.

The tests that were written are non-regression tests (verifying that SF
array astype to float64 and between different scalings continues to give
correct results). They do not constitute a regression test for the actual bug.
A proper regression test needs a parametric new-style dtype that:

- Registers a custom np.generic subclass as its scalar type (scalar_type field in the dtype proto).
- Implements discover_descr_from_pyobject to return a descriptor whose parameters depend on the specific scalar instance.

numpy-quaddtype satisfies both conditions and is the natural home for such a
test. Until it (or equivalent infrastructure) is available as a test
dependency, the regression is only verifiable by manual testing with
numpy-quaddtype.

The non-regression tests added are:

    class TestDescrFromScalarNewStyleDType:
        def test_0d_sfloat_astype_to_float64(self): ...
        def test_0d_sfloat_astype_to_sfloat_preserves_value(self): ...
        def test_sfloat_array_astype_to_float64_various_scalings(self): ...

Checklist:

- Fix in numpy/_core/src/multiarray/scalarapi.c (PyArray_DescrFromScalar)
- [~] Test covering parametric user dtype scalar → astype round-trip (non-regression tests added; true regression test blocked by SF lacking a registered scalar type; see limitation note above)
- [~] Test covering that the correct descriptor is used (not the stub default) (same limitation)
- Changelog / release note entry
- Verify that datetime64 and timedelta64 scalar astype behaviour is unchanged (existing tests cover this; the !NPY_DT_is_legacy guard ensures the new block is not entered for these dtypes)
- Verify that non-parametric user dtypes are unaffected (for such dtypes discover_descr_from_pyobject falls back to default_descr, identical behaviour to before)