@pbackus
Created September 12, 2024 02:38
Enumerated Unions - Draft DIP

Enumerated Unions

Field Value
DIP: (number/id -- assigned by DIP Manager)
Author: Paul Backus ([email protected])
Implementation: (links to implementation PR if any)
Status: Draft

Abstract

This DIP proposes a conservative design for sum types that aims to be consistent with existing D syntax and semantics. It does not discuss pattern matching.

Rationale

Sum types have proven to be a useful and popular feature in many languages. In D, several library implementations are available, including Phobos's std.variant and std.sumtype, and vibe.d's taggedalgebraic.

Making sum types a built-in language feature, rather than a library feature, would allow nicer syntax, better error messages, and better compile-time performance.

Prior Work

In D

In other languages

Other languages have taken a variety of different approaches to implementing sum types. This list includes representative examples of several approaches:

Description

Enumerated unions are a specialized kind of union. Except when otherwise specified, enumerated unions behave the same way as unions.

Syntax

An enumerated union is declared by using the keywords enum union instead of union in a union declaration.

Example:

enum union WebAddress
{
    ubyte[4] ipv4;
    ubyte[16] ipv6;
    string url;
}

Fields

Anonymous struct and union fields are not allowed in an enumerated union. This ensures that there is always exactly one active field in any enum union object.

__tag property

The __tag property is used to determine at runtime which field of an enumerated union is active.

For any enum union expression e, the expression e.__tag is an rvalue of type size_t which evaluates to the index of the active field in e.tupleof.
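The index values that __tag could take can be previewed in current D with an ordinary union. This is a minimal sketch: PlainWebAddress is a hypothetical stand-in with the same fields as WebAddress, since enum union itself is not implemented. Assuming the proposal is adopted as written, the position of each field in .tupleof is the value e.__tag would report while that field is active.

```d
// Hypothetical stand-in: a plain union with the same fields as WebAddress.
union PlainWebAddress
{
    ubyte[4] ipv4;
    ubyte[16] ipv6;
    string url;
}

// .tupleof lists the declared fields in declaration order, so these
// positions are the values __tag is specified to evaluate to.
static assert(PlainWebAddress.tupleof.length == 3);
static assert(__traits(identifier, PlainWebAddress.tupleof[0]) == "ipv4");
static assert(__traits(identifier, PlainWebAddress.tupleof[1]) == "ipv6");
static assert(__traits(identifier, PlainWebAddress.tupleof[2]) == "url");
```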

Example: Using the __tag property to check if a field is active.

bool has(string target)(ref WebAddress addr)
if (target == "ipv4" || target == "ipv6" || target == "url")
{
    switch (addr.__tag)
    {
        static foreach (i, field; WebAddress.tupleof)
        {
            case i:
            {
                enum isTarget = __traits(identifier, field) == target;
                return isTarget;
            }
        }
        default:
            assert(0);
    }
}

unittest
{
    WebAddress a = { url: "https://dlang.org/" };
    assert( a.has!"url");
    assert(!a.has!"ipv4");
}

Memory layout

In addition to its declared fields, an enumerated union may contain an additional hidden field called the tag field.

The tag field is used to store any additional data necessary to keep track of the enum union's active field at runtime. It may be omitted if the compiler determines that no additional data is needed (for example, if the enum union has only one declared field).

The tag field's storage does not overlap with any of the declared fields.

The type of the tag field must be a POD type, but is otherwise unspecified.

The size, offset, and alignment of the tag field are unspecified.

If two enum union values are of the same type, and both have the same active field, then the values stored in their tag fields must have identical binary representations.

Aside from the restriction above, the values stored in an enum union's tag field are unspecified.

It is undefined behavior to store any value in the tag field of an enum union object that was not read from the tag field of an object of the same type.

The tag field is not included in .tupleof.

Unless otherwise specified, any reference to the "fields" of an enum union in this document refers only to the declared fields, and does not include the tag field.
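One layout that would satisfy the rules above can be written by hand in current D. This is only a sketch under stated assumptions: LoweredWebAddress and its ubyte tag are hypothetical, and a real compiler would be free to choose a different tag type, size, offset, or alignment, or to omit the tag entirely.

```d
// Hypothetical hand-written equivalent of `enum union WebAddress`.
struct LoweredWebAddress
{
    // The declared fields share storage, as in an ordinary union.
    union
    {
        ubyte[4] ipv4;
        ubyte[16] ipv6;
        string url;
    }

    // Assumed tag representation. The DIP only requires a POD type and
    // leaves everything else about the tag unspecified.
    ubyte tag;
}

// All declared fields start at the same offset.
static assert(LoweredWebAddress.ipv4.offsetof == LoweredWebAddress.ipv6.offsetof);
static assert(LoweredWebAddress.ipv6.offsetof == LoweredWebAddress.url.offsetof);

// The tag starts after ipv6, which is at least as large as the other
// declared fields, so its storage overlaps none of them.
static assert(LoweredWebAddress.tag.offsetof
    >= LoweredWebAddress.ipv6.offsetof + LoweredWebAddress.ipv6.sizeof);
```

In this sketch the tag is an ordinary visible member, so it would appear in .tupleof. Under the proposal the tag field is hidden and excluded from .tupleof.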

Special member functions

Unlike traditional unions, enumerated unions may have copy constructors, postblits, destructors, and invariants.

If an enum union does not have a copy constructor or a postblit, but one or more of its fields has elaborate copy semantics, a copy constructor is generated which performs the following steps:

  1. Copy-initializes the active field from the active field of the original object. If the active field has a copy constructor or postblit, it is called during this step.
  2. Copy-initializes the tag field (if any) from the tag field of the original object.

A type has elaborate copy semantics if it has a postblit or copy constructor, or if it directly embeds a type with elaborate copy semantics. This is the same definition used by std.traits.hasElaborateCopyConstructor.

If necessary, the compiler should generate multiple copy constructor overloads to handle different combinations of type qualifiers on the new and original objects.

If an enum union does not have a destructor, but one or more of its fields has elaborate destruction semantics, a destructor is generated which performs the following steps:

  1. If the active field has elaborate destruction semantics, destroys the active field.

A type has elaborate destruction semantics if

  1. it has a destructor or directly embeds a type with elaborate destruction semantics; and,
  2. it is not a class type or a non-enumerated union type.

This is the same definition used by std.traits.hasElaborateDestructor.

Equality

Enumerated union values of the same type can be compared for equality.

Two enum union values of the same type are equal if they have the same active field, and the values of their active fields are equal.
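The rule above differs from a bitwise comparison, because bytes that belong only to inactive fields are ignored. A minimal sketch in current D, using a hand-written tagged struct: ComparableWebAddress, its size_t tag, and the helpers makeIpv4 and makeUrl are all hypothetical names introduced for illustration, not part of the proposal.

```d
// Hypothetical hand-written stand-in for `enum union WebAddress`.
struct ComparableWebAddress
{
    union
    {
        ubyte[4] ipv4;
        ubyte[16] ipv6;
        string url;
    }

    size_t tag; // assumed encoding: 0 = ipv4, 1 = ipv6, 2 = url

    // Equal only if the same field is active and its values are equal.
    bool opEquals(const ComparableWebAddress rhs) const
    {
        if (tag != rhs.tag)
            return false;
        switch (tag)
        {
            case 0: return ipv4 == rhs.ipv4;
            case 1: return ipv6 == rhs.ipv6;
            case 2: return url == rhs.url;
            default: assert(0, "corrupt tag");
        }
    }
}

// Hypothetical helper: builds a value whose active field is ipv4.
ComparableWebAddress makeIpv4(ubyte[4] value)
{
    ComparableWebAddress result;
    result.ipv4 = value;
    result.tag = 0;
    return result;
}

// Hypothetical helper: builds a value whose active field is url.
ComparableWebAddress makeUrl(string value)
{
    ComparableWebAddress result;
    result.url = value;
    result.tag = 2;
    return result;
}

unittest
{
    auto a = makeIpv4([127, 0, 0, 1]);

    // Same active field and value, but different leftover bytes in the
    // part of the storage that only ipv6 uses.
    ComparableWebAddress b;
    b.ipv6[] = 0xFF;
    b.ipv4 = [127, 0, 0, 1];
    b.tag = 0;
    assert(a == b);

    // A different active field is never equal.
    assert(a != makeUrl("127.0.0.1"));
}
```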

Safety

Direct access to fields of an enumerated union is subject to the same safety restrictions as access to fields of a traditional union.

A value of an enumerated union type is a safe value if

  1. its __tag property evaluates to the index of the active field, and
  2. the value of its active field is safe.

@trusted code may assume that the field indicated by the __tag property is the active field, and may rely on that assumption to allow access to the active field in @safe code.

Example:

@trusted ref get(string target)(ref WebAddress addr)
if (target == "ipv4" || target == "ipv6" || target == "url")
{
    switch (addr.__tag)
    {
        static foreach (i, field; WebAddress.tupleof)
        {
            case i:
            {
                enum isTarget = __traits(identifier, field) == target;
                static if (!isTarget)
                    assert(0, "Active field is " ~ __traits(identifier, field) ~ ", not " ~ target);
                else
                    return addr.tupleof[i];
            }
        }
        default:
            assert(0);
    }
}

@safe unittest
{
    WebAddress a1 = { url: "https://www.rust-lang.org/" };
    WebAddress a2 = { ipv4: [127, 0, 0, 1] };

    assert(a1.get!"url" == "https://www.rust-lang.org/");
    assert(a2.get!"ipv4" == [127, 0, 0, 1]);

    a1.get!"url" = "https://dlang.org/";
}

Writing to an enum union object is @system if the enum union has fields whose types have unsafe values, since doing so could invalidate existing pointers or references to the active field.

Access to the tag field of an enum union, if it exists, is always @system.

Reflection

A new TypeSpecialization, enum union, is added to the syntax for the is() expression.

is(T == enum union) evaluates to true if T is an enumerated union type.

is(T : enum union) evaluates to true if T is an enumerated union type, or implicitly converts to an enumerated union type.

Breaking Changes and Deprecations

Currently, the syntax enum union { /* ... */ } is parsed by the D compiler as a union declaration with the enum storage class applied to it.

Since the enum storage class has no effect in this context, it is unlikely that existing D projects will be affected if this syntax is given a new meaning. However, it is not impossible.

Reference

Copyright & License

Copyright (c) 2024 by the D Language Foundation

Licensed under Creative Commons Zero 1.0

History

The DIP Manager will supplement this section with links to forum discussions and a summary of the formal assessment.
