Skip to content

Instantly share code, notes, and snippets.

@MaartenBaert
Created March 24, 2026 15:27
Show Gist options
  • Select an option

  • Save MaartenBaert/b77e664b1b86a4fb585ef1686151705d to your computer and use it in GitHub Desktop.

Select an option

Save MaartenBaert/b77e664b1b86a4fb585ef1686151705d to your computer and use it in GitHub Desktop.

Revised PR: Simplify PyArray_DescrFromScalar per maintainer feedback

Maintainer's suggestion (decoded)

The maintainer (Sebastian Berg) suggests replacing most of PyArray_DescrFromScalar with a two-step pattern:

  1. Get the DType class via PyArray_DescrFromTypeObject (robust, handles subclasses)
  2. Call NPY_DT_CALL_discover_descr_from_pyobject(DType, sc) unconditionally

And "delete everything else" — meaning the special cases for Void, unsized string/unicode, and the new-style dtype block we added.

He also explains why PyArray_DescrFromTypeObject is preferable to PyArray_DiscoverDTypeFromScalarType: the latter fails for subclasses of scalar types (e.g., someone subclassing np.float64), while DescrFromTypeObject handles these via MRO walking.

Analysis: what can we actually delete?

Void special case (lines 408-412) — CAN DELETE

void_discover_descr_from_pyobject (dtypemeta.c:542) handles void scalars correctly:

if (PyArray_IsScalar(obj, Void)) {
    Py_INCREF(void_obj->descr);
    return (PyArray_Descr *)void_obj->descr;
}

Same behavior as the current special case. Safe to remove.

Unsized string/unicode/other handling (lines 461-487) — CAN DELETE

string_discover_descr_from_pyobject (dtypemeta.c:510) checks PyBytes_Check and PyUnicode_Check and computes itemsize correctly. Since np.bytes_ subclasses bytes and np.str_ subclasses str, these isinstance checks pass. Same effective behavior. Safe to remove.

Our new-style dtype block (lines 434-455) — CAN DELETE

Subsumed by the unconditional discover_descr_from_pyobject call. This was our fix; the maintainer's approach generalizes it to all dtypes. Safe to remove.

Datetime/Timedelta special case (lines 414-432) — MUST KEEP

Circular recursion: discover_datetime_and_timedelta_from_pyobject (dtypemeta.c:572) calls back into PyArray_DescrFromScalar(obj):

// dtypemeta.c:577
PyArray_Descr *descr = PyArray_DescrFromScalar(obj);

If we removed the datetime/timedelta special case from PyArray_DescrFromScalar, the new general path would call discover_descr_from_pyobject, which calls PyArray_DescrFromScalar, which calls discover_descr_from_pyobject → infinite recursion, stack overflow.

The special case acts as a base case that breaks this recursion. It must stay.

Note: The maintainer may have been speaking aspirationally when they said "delete everything else." They acknowledged "I have to think about how that is best." The datetime recursion is a concrete blocker that prevents full deletion.

Impact on behavior

Scalar type Before After Changed?
Builtin non-parametric (float64, int32, ...) DescrFromTypeObject → singleton DescrFromTypeObject → DType → discover → singleton No (same result)
Void Special case → void_obj->descr DescrFromTypeObject → DType → void_discovervoid_obj->descr No (same result)
Datetime/Timedelta Special case (kept) Special case (kept) No
String/Unicode (unsized) DescrFromTypeObject + manual elsize DescrFromTypeObject → DType → string_discover → sized descr No (same result)
New-style non-parametric user dtype Our block → discoverdefault_descr DescrFromTypeObject → DType → discoverdefault_descr No
New-style parametric user dtype Our block → discover → correct descr DescrFromTypeObject → DType → discover → correct descr No (same fix)
Scalar subclass (e.g., subclass of float64) DescrFromTypeObject_descr_from_subtype DescrFromTypeObject → DType → discover → singleton No (same base dtype)

Performance: For builtin dtypes, the new path does one extra function call (discover_descr_from_pyobjectnonparametric_discover_descr_from_pyobject which just increfs and returns the singleton). Negligible. The maintainer explicitly said "I wouldn't worry about it."

Proposed new code

/*NUMPY_API
 * Return descr object from array scalar.
 *
 * New reference
 */
NPY_NO_EXPORT PyArray_Descr *
PyArray_DescrFromScalar(PyObject *sc)
{
    /*
     * Datetime and timedelta must remain here because
     * discover_datetime_and_timedelta_from_pyobject calls back into
     * PyArray_DescrFromScalar — removing this would create infinite recursion.
     */
    if (PyArray_IsScalar(sc, Datetime) || PyArray_IsScalar(sc, Timedelta)) {
        PyArray_DatetimeMetaData *dt_data;
        PyArray_Descr *descr;

        if (PyArray_IsScalar(sc, Datetime)) {
            descr = PyArray_DescrNewFromType(NPY_DATETIME);
        }
        else {
            /* Timedelta */
            descr = PyArray_DescrNewFromType(NPY_TIMEDELTA);
        }
        if (descr == NULL) {
            return NULL;
        }
        dt_data = &(((PyArray_DatetimeDTypeMetaData *)((_PyArray_LegacyDescr *)descr)->c_metadata)->meta);
        memcpy(dt_data, &((PyDatetimeScalarObject *)sc)->obmeta,
               sizeof(PyArray_DatetimeMetaData));

        return descr;
    }

    /*
     * Use PyArray_DescrFromTypeObject to get the DType class — this is more
     * robust than PyArray_DiscoverDTypeFromScalarType because it handles
     * subclasses via MRO walking.  Then call discover_descr_from_pyobject
     * unconditionally, which extracts the correct descriptor from the scalar
     * instance (important for parametric dtypes like Void, String, and
     * new-style user dtypes).
     */
    PyArray_Descr *descr = PyArray_DescrFromTypeObject((PyObject *)Py_TYPE(sc));
    if (descr == NULL) {
        return NULL;
    }
    PyArray_DTypeMeta *DType = NPY_DTYPE(descr);
    Py_INCREF(DType);
    Py_DECREF(descr);

    PyArray_Descr *result = NPY_DT_CALL_discover_descr_from_pyobject(DType, sc);
    Py_DECREF(DType);
    return result;
}

What this removes

  1. Void special case (6 lines) — void_discover_descr_from_pyobject handles it
  2. Our new-style dtype block (14 lines) — subsumed by the unconditional pattern
  3. Unsized string/unicode/other handling (27 lines) — string_discover_descr_from_pyobject handles it

Total: ~47 lines removed, ~12 lines added. Net simplification of ~35 lines.

What this keeps

  • Datetime/Timedelta special case — required to prevent infinite recursion

Risks

  1. Low risk: Some edge case in void subtype handling where void_discover_descr_from_pyobject behaves slightly differently from the old hand-rolled code. Mitigated by: existing tests cover void scalars, and void_discover_descr_from_pyobject is already the canonical path used by array coercion.

  2. Low risk: Minor performance regression for builtin scalars (one extra function call through discover_descr_from_pyobject). Maintainer explicitly said not to worry.

Tests

The existing tests in test_custom_dtypes.py (our TestDescrFromScalarNewStyleDType class) should still pass. We should also verify:

  • np.void scalar .astype() still works
  • np.bytes_ and np.str_ scalar descriptor extraction still gives correct sizes
  • np.datetime64 and np.timedelta64 scalar descriptor extraction is unchanged
  • All existing scalar tests pass (test_scalar_methods, test_multiarray, etc.)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment