The maintainer (Sebastian Berg) suggests replacing most of PyArray_DescrFromScalar
with a two-step pattern:
- Get the
DTypeclass viaPyArray_DescrFromTypeObject(robust, handles subclasses) - Call
NPY_DT_CALL_discover_descr_from_pyobject(DType, sc)unconditionally
And "delete everything else" — meaning the special cases for Void, unsized string/unicode, and the new-style dtype block we added.
He also explains why PyArray_DescrFromTypeObject is preferable to
PyArray_DiscoverDTypeFromScalarType: the latter fails for subclasses of scalar
types (e.g., someone subclassing np.float64), while DescrFromTypeObject
handles these via MRO walking.
void_discover_descr_from_pyobject (dtypemeta.c:542) handles void scalars
correctly:
if (PyArray_IsScalar(obj, Void)) {
Py_INCREF(void_obj->descr);
return (PyArray_Descr *)void_obj->descr;
}Same behavior as the current special case. Safe to remove.
string_discover_descr_from_pyobject (dtypemeta.c:510) checks PyBytes_Check
and PyUnicode_Check and computes itemsize correctly. Since np.bytes_
subclasses bytes and np.str_ subclasses str, these isinstance checks pass.
Same effective behavior. Safe to remove.
Subsumed by the unconditional discover_descr_from_pyobject call. This was
our fix; the maintainer's approach generalizes it to all dtypes. Safe to remove.
Circular recursion: discover_datetime_and_timedelta_from_pyobject
(dtypemeta.c:572) calls back into PyArray_DescrFromScalar(obj):
// dtypemeta.c:577
PyArray_Descr *descr = PyArray_DescrFromScalar(obj);If we removed the datetime/timedelta special case from PyArray_DescrFromScalar,
the new general path would call discover_descr_from_pyobject, which calls
PyArray_DescrFromScalar, which calls discover_descr_from_pyobject → infinite
recursion, stack overflow.
The special case acts as a base case that breaks this recursion. It must stay.
Note: The maintainer may have been speaking aspirationally when they said "delete everything else." They acknowledged "I have to think about how that is best." The datetime recursion is a concrete blocker that prevents full deletion.
| Scalar type | Before | After | Changed? |
|---|---|---|---|
| Builtin non-parametric (float64, int32, ...) | DescrFromTypeObject → singleton |
DescrFromTypeObject → DType → discover → singleton |
No (same result) |
| Void | Special case → void_obj->descr |
DescrFromTypeObject → DType → void_discover → void_obj->descr |
No (same result) |
| Datetime/Timedelta | Special case (kept) | Special case (kept) | No |
| String/Unicode (unsized) | DescrFromTypeObject + manual elsize |
DescrFromTypeObject → DType → string_discover → sized descr |
No (same result) |
| New-style non-parametric user dtype | Our block → discover → default_descr |
DescrFromTypeObject → DType → discover → default_descr |
No |
| New-style parametric user dtype | Our block → discover → correct descr |
DescrFromTypeObject → DType → discover → correct descr |
No (same fix) |
| Scalar subclass (e.g., subclass of float64) | DescrFromTypeObject → _descr_from_subtype |
DescrFromTypeObject → DType → discover → singleton |
No (same base dtype) |
Performance: For builtin dtypes, the new path does one extra function call
(discover_descr_from_pyobject → nonparametric_discover_descr_from_pyobject
which just increfs and returns the singleton). Negligible. The maintainer
explicitly said "I wouldn't worry about it."
/*NUMPY_API
* Return descr object from array scalar.
*
* New reference
*/
NPY_NO_EXPORT PyArray_Descr *
PyArray_DescrFromScalar(PyObject *sc)
{
/*
* Datetime and timedelta must remain here because
* discover_datetime_and_timedelta_from_pyobject calls back into
* PyArray_DescrFromScalar — removing this would create infinite recursion.
*/
if (PyArray_IsScalar(sc, Datetime) || PyArray_IsScalar(sc, Timedelta)) {
PyArray_DatetimeMetaData *dt_data;
PyArray_Descr *descr;
if (PyArray_IsScalar(sc, Datetime)) {
descr = PyArray_DescrNewFromType(NPY_DATETIME);
}
else {
/* Timedelta */
descr = PyArray_DescrNewFromType(NPY_TIMEDELTA);
}
if (descr == NULL) {
return NULL;
}
dt_data = &(((PyArray_DatetimeDTypeMetaData *)((_PyArray_LegacyDescr *)descr)->c_metadata)->meta);
memcpy(dt_data, &((PyDatetimeScalarObject *)sc)->obmeta,
sizeof(PyArray_DatetimeMetaData));
return descr;
}
/*
* Use PyArray_DescrFromTypeObject to get the DType class — this is more
* robust than PyArray_DiscoverDTypeFromScalarType because it handles
* subclasses via MRO walking. Then call discover_descr_from_pyobject
* unconditionally, which extracts the correct descriptor from the scalar
* instance (important for parametric dtypes like Void, String, and
* new-style user dtypes).
*/
PyArray_Descr *descr = PyArray_DescrFromTypeObject((PyObject *)Py_TYPE(sc));
if (descr == NULL) {
return NULL;
}
PyArray_DTypeMeta *DType = NPY_DTYPE(descr);
Py_INCREF(DType);
Py_DECREF(descr);
PyArray_Descr *result = NPY_DT_CALL_discover_descr_from_pyobject(DType, sc);
Py_DECREF(DType);
return result;
}- Void special case (6 lines) —
void_discover_descr_from_pyobjecthandles it - Our new-style dtype block (14 lines) — subsumed by the unconditional pattern
- Unsized string/unicode/other handling (27 lines) —
string_discover_descr_from_pyobjecthandles it
Total: ~47 lines removed, ~12 lines added. Net simplification of ~35 lines.
- Datetime/Timedelta special case — required to prevent infinite recursion
-
Low risk: Some edge case in void subtype handling where
void_discover_descr_from_pyobjectbehaves slightly differently from the old hand-rolled code. Mitigated by: existing tests cover void scalars, andvoid_discover_descr_from_pyobjectis already the canonical path used by array coercion. -
Low risk: Minor performance regression for builtin scalars (one extra function call through
discover_descr_from_pyobject). Maintainer explicitly said not to worry.
The existing tests in test_custom_dtypes.py (our TestDescrFromScalarNewStyleDType
class) should still pass. We should also verify:
np.voidscalar.astype()still worksnp.bytes_andnp.str_scalar descriptor extraction still gives correct sizesnp.datetime64andnp.timedelta64scalar descriptor extraction is unchanged- All existing scalar tests pass (
test_scalar_methods,test_multiarray, etc.)