Field | Value |
---|---|
DIP: | (number/id -- assigned by DIP Manager) |
Author: | Paul Backus ([email protected]) |
Implementation: | (links to implementation PR if any) |
Status: | Draft |
This DIP proposes a conservative design for sum types that aims to be consistent with existing D syntax and semantics. It does not discuss pattern matching.
- Rationale
- Prior Work
- Description
- Breaking Changes and Deprecations
- Reference
- Copyright & License
- History
## Rationale
Sum types have proven to be a useful and popular feature in many languages. In D, several library implementations are available, including Phobos's `std.variant` and `std.sumtype`, and vibe.d's `taggedalgebraic`.
Benefits to having sum types as a built-in language feature (rather than a library feature) would include nicer syntax, better error messages, and better compile-time performance.
## Prior Work
Other languages have taken a variety of different approaches to implementing sum types. This list includes representative examples of several approaches:
- C++'s std::variant
- TypeScript's union types
- Rust's enumerations
- Standard ML's datatypes
- Scala's sealed traits and case classes
## Description
Enumerated unions are a specialized kind of union. Except when otherwise specified, enumerated unions behave the same way as unions.
An enumerated union is declared by using the keywords `enum union` instead of `union` in a union declaration.
Example:

    enum union WebAddress
    {
        ubyte[4] ipv4;
        ubyte[16] ipv6;
        string url;
    }
Anonymous struct and union fields are not allowed in an enumerated union. This ensures that there is always exactly one active field in any `enum union` object.
The `__tag` property is used to determine at runtime which field of an enumerated union is active.
For any `enum union` expression `e`, the expression `e.__tag` is an rvalue of type `size_t` which evaluates to the index of the active field in `e.tupleof`.
Example: Using the `__tag` property to check if a field is active.

    bool has(string target)(ref WebAddress addr)
        if (target == "ipv4" || target == "ipv6" || target == "url")
    {
        switch (addr.__tag)
        {
            // Generate one case per declared field, keyed by its index.
            static foreach (i, field; WebAddress.tupleof)
            {
                case i:
                {
                    // True only in the case whose field name matches `target`.
                    enum isTarget = __traits(identifier, field) == target;
                    return isTarget;
                }
            }

            default:
                assert(0);
        }
    }

    unittest
    {
        WebAddress a = { url: "https://dlang.org/" };
        assert( a.has!"url");
        assert(!a.has!"ipv4");
    }
In addition to its declared fields, an enumerated union may contain an additional hidden field called the tag field.
The tag field is used to store any additional data necessary to keep track of the `enum union`'s active field at runtime. It may be omitted if the compiler determines that no additional data is needed (for example, if the `enum union` has only one declared field).
The tag field's storage does not overlap with any of the declared fields.
The type of the tag field must be a POD type, but is otherwise unspecified.
The size, offset, and alignment of the tag field are unspecified.
If two `enum union` values are of the same type, and both have the same active field, then the values stored in their tag fields must have identical binary representations.
Aside from the restriction above, the values stored in an `enum union`'s tag field are unspecified.
It is undefined behavior to store any value in the tag field of an `enum union` object that was not read from the tag field of an object of the same type.
The tag field is not included in `.tupleof`.
Unless otherwise specified, any reference to the "fields" of an `enum union` in this document refers only to the declared fields, and does not include the tag field.
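As a rough illustration of the layout rules above, the following sketch emulates an enumerated union in today's D with an ordinary struct. The names `WebAddressEmulated`, `Payload`, `tagField`, `tag`, and `fromUrl` are hypothetical and not part of this proposal, and the `ubyte` tag is one arbitrary choice, since the proposal leaves the tag field's type, size, offset, and alignment unspecified.

```d
// Hypothetical hand-written stand-in for `enum union WebAddress`.
struct WebAddressEmulated
{
    // The declared fields share storage, as in a traditional union.
    union Payload
    {
        ubyte[4] ipv4;
        ubyte[16] ipv6;
        string url;
    }

    Payload payload;

    // Stand-in for the hidden tag field. It lives outside the union,
    // so its storage does not overlap with any declared field.
    private ubyte tagField;

    // Stand-in for the proposed `__tag` property: returns an rvalue index.
    size_t tag() const { return tagField; }

    // Hypothetical helper that makes `url` the active field.
    static WebAddressEmulated fromUrl(string url)
    {
        WebAddressEmulated result;
        result.payload.url = url;
        result.tagField = 2; // index of `url` among the declared fields
        return result;
    }
}

unittest
{
    auto a = WebAddressEmulated.fromUrl("https://dlang.org/");
    assert(a.tag == 2);

    // The tag occupies space in addition to the overlapping fields.
    static assert(WebAddressEmulated.sizeof > WebAddressEmulated.Payload.sizeof);

    // Only the declared fields appear in the union's .tupleof.
    static assert(WebAddressEmulated.Payload.tupleof.length == 3);
}
```

A compiler implementing the proposal would be free to choose a different representation, for example by omitting the tag entirely when there is only one declared field.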
Unlike traditional unions, enumerated unions may have copy constructors, postblits, destructors, and invariants.
If an `enum union` does not have a copy constructor or a postblit, but one or more of its fields has elaborate copy semantics, a copy constructor is generated which performs the following steps:
- Copy-initializes the active field from the active field of the original object. If the active field has a copy constructor or postblit, it is called during this step.
- Copy-initializes the tag field (if any) from the tag field of the original object.
A type has elaborate copy semantics if it has a postblit or copy constructor, or if it directly embeds a type with elaborate copy semantics. This is the same definition used by `std.traits.hasElaborateCopyConstructor`.
If necessary, the compiler should generate multiple copy constructor overloads to handle different combinations of type qualifiers on the new and original objects.
If an `enum union` does not have a destructor, but one or more of its fields has elaborate destruction semantics, a destructor is generated which performs the following steps:
- If the active field has elaborate destruction semantics, destroys the active field.
A type has elaborate destruction semantics if
- it has a destructor or directly embeds a type with elaborate destruction semantics; and,
- it is not a class type or a non-enumerated union type.
This is the same definition used by `std.traits.hasElaborateDestructor`.
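The destruction rule above can be approximated by hand in today's D. The sketch below is only an emulation under stated assumptions: `Resource`, `NumberOrResource`, `Payload`, `tagField`, `ofNumber`, and `ofResource` are hypothetical names, the `ubyte` tag is an arbitrary choice, and copying is disabled to keep the example short rather than because the proposal requires it.

```d
// Hypothetical field type with elaborate destruction semantics.
struct Resource
{
    int* destroyedCount;

    ~this()
    {
        // Record each destruction so the behaviour can be observed.
        if (destroyedCount !is null)
            ++*destroyedCount;
    }
}

// Hypothetical emulation of an enum union with fields `number` and `res`.
struct NumberOrResource
{
    private union Payload
    {
        int number;
        Resource res;
    }

    private Payload payload;
    private ubyte tagField; // 0: `number` is active, 1: `res` is active

    // Copying is disabled only to keep this sketch short.
    @disable this(this);

    static NumberOrResource ofNumber(int value)
    {
        NumberOrResource result;
        result.payload.number = value;
        result.tagField = 0;
        return result;
    }

    static NumberOrResource ofResource(int* destroyedCount)
    {
        NumberOrResource result;
        result.payload.res.destroyedCount = destroyedCount;
        result.tagField = 1;
        return result;
    }

    ~this()
    {
        // Destroy only the active field, and only if it needs destruction.
        if (tagField == 1)
            destroy(payload.res);
    }
}

unittest
{
    int destroyed;
    {
        auto r = NumberOrResource.ofResource(&destroyed);
        assert(destroyed == 0);
    }
    assert(destroyed == 1); // the active `res` field was destroyed

    {
        auto n = NumberOrResource.ofNumber(42);
    }
    assert(destroyed == 1); // `number` was active, so `res` was left alone
}
```

Checking the tag before calling `destroy` matters because the bytes of an inactive field hold another field's data and must not be interpreted as a `Resource`.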
Enumerated union values of the same type can be compared for equality.
Two `enum union` values of the same type are equal if they have the same active field, and the values of their active fields are equal.
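The equality rule above can be sketched with a hand-written emulation in today's D. All names here (`IntOrString`, `Payload`, `tagField`, `ofNumber`, `ofText`) are hypothetical, and the explicit `opEquals` only approximates what a compiler implementing the proposal might generate.

```d
// Hypothetical emulation of an enum union with fields `number` and `text`.
struct IntOrString
{
    private union Payload
    {
        int number;
        string text;
    }

    private Payload payload;
    private ubyte tagField; // 0: `number` is active, 1: `text` is active

    static IntOrString ofNumber(int value)
    {
        IntOrString result;
        result.payload.number = value;
        result.tagField = 0;
        return result;
    }

    static IntOrString ofText(string value)
    {
        IntOrString result;
        result.payload.text = value;
        result.tagField = 1;
        return result;
    }

    bool opEquals(const IntOrString rhs) const
    {
        // Values with different active fields are never equal.
        if (tagField != rhs.tagField)
            return false;

        // Compare only the active field; inactive storage is ignored.
        switch (tagField)
        {
            case 0: return payload.number == rhs.payload.number;
            case 1: return payload.text == rhs.payload.text;
            default: assert(0);
        }
    }
}

unittest
{
    assert(IntOrString.ofNumber(1) == IntOrString.ofNumber(1));
    assert(IntOrString.ofNumber(1) != IntOrString.ofNumber(2));
    assert(IntOrString.ofText("a") == IntOrString.ofText("a"));
    assert(IntOrString.ofNumber(0) != IntOrString.ofText(""));
}
```

Comparing the tags first means that two values are never considered equal merely because their underlying storage happens to hold the same bytes.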
Direct access to fields of an enumerated union is subject to the same safety restrictions as access to fields of a traditional union.
A value of an enumerated union type is a safe value if
- its `__tag` property evaluates to the index of the active field, and
- the value of its active field is safe.
`@trusted` code may assume that the field indicated by the `__tag` property is the active field, and may rely on that assumption to allow access to the active field in `@safe` code.
Example:

    @trusted ref get(string target)(ref WebAddress addr)
        if (target == "ipv4" || target == "ipv6" || target == "url")
    {
        switch (addr.__tag)
        {
            static foreach (i, field; WebAddress.tupleof)
            {
                case i:
                {
                    // Name of the field that this case corresponds to.
                    enum active = __traits(identifier, field);
                    enum isTarget = active == target;
                    static if (!isTarget)
                        assert(0, "Active field is " ~ active ~ ", not " ~ target);
                    else
                        return addr.tupleof[i];
                }
            }

            default:
                assert(0);
        }
    }

    @safe unittest
    {
        WebAddress a1 = { url: "https://www.rust-lang.org/" };
        WebAddress a2 = { ipv4: [127, 0, 0, 1] };
        assert(a1.get!"url" == "https://www.rust-lang.org/");
        assert(a2.get!"ipv4" == [127, 0, 0, 1]);
        a1.get!"url" = "https://dlang.org/";
    }
Writing to an `enum union` object is `@system` if the `enum union` has fields whose types have unsafe values, since doing so could invalidate existing pointers or references to the active field.
Access to the tag field of an `enum union`, if it exists, is always `@system`.
A new TypeSpecialization, `enum union`, is added to the syntax for the `is()` expression.
`is(T == enum union)` evaluates to `true` if `T` is an enumerated union type.
`is(T : enum union)` evaluates to `true` if `T` is an enumerated union type, or implicitly converts to an enumerated union type.
## Breaking Changes and Deprecations
Currently, the syntax `enum union { /* ... */ }` is parsed by the D compiler as a union declaration with the `enum` storage class applied to it.
Since the `enum` storage class has no effect in this context, it is unlikely that existing D projects will be affected if this syntax is given a new meaning. However, it is not impossible.
## Reference
- Sum Types - first draft by Walter Bright.
## Copyright & License
Copyright (c) 2024 by the D Language Foundation
Licensed under Creative Commons Zero 1.0
## History
The DIP Manager will supplement this section with links to forum discussions and a summary of the formal assessment.