Field | Value |
---|---|
DIP: | (number/id -- assigned by DIP Manager) |
Author: | Richard (Rikki) Andrew Cattermole [email protected] |
Implementation: | (links to implementation PR if any) |
Status: | Draft |
This proposal models a sumtype based upon a struct, with a focus towards acceptance of current library implementations in D and with the reflection of the past 50 years of programming language design.
- Rationale
- Prior Work
- Description
- Breaking Changes and Deprecations
- Reference
- Copyright & License
- History
Adding sum types into the language offers three key advantages for usability.
- Rapid prototyping of business logic data. This can be used in conjunction with tuples to construct the data that represents web forms, in full with only one declaration required.
- Enabling an exception mechanism that is stack-based, rather than using a runtime unwinding library. This has the potential to be as cheap as a single move instruction to throw. Whilst guaranteeing every exception thrown is caught close to where it was thrown. Without any heap allocation.
- Safety, coupled with matching guarantees that a
@safe
cannot misuse it at compile time and in doing so corrupt a program.
Previously implemented library types in D, these can be split into two categories.
- Does not support element names:
- Optionally supports element names for non-unique types in constraint set:
This divide placing a high emphasis on a simpler element type only for a library implementation can be seen in other languages too:
Language support can be divided into three categories, rather than two:
- Does not support element names:
- Optional element names:
- Optional element types:
A sumtype is a container that can only store one value, whose access and mutation are constrained by a union set. Protected access is provided by a tag that gives tagged union behaviour.
sumtype Identifier = int | :None | string text;
Constraints come in three forms.
- A type
int
. - A member of operator
:None
that gets rewritten to be a typetypeof(:None)
. - A type with a name
string text
.
There must be at least one constraint.
If there is only one constraint, it is treated as though there was an alias this
to the element. Giving implicit safe access and mutation to it.
sumtype Int = int;
Int i = 2;
i *= 5;
Assigning one sumtype to another is done like how structs work.
You destroy the target, and then copy the source to it, after copying update the tag, copy constructor, and destructor.
sumtype S = int | bool;
S source, target;
//target.destroy;
target = source;
// target.__tag = source.__tag;
// target.__copyctor = source.__copyctor;
// target.__dtor = source.__dtor;
Assigning an element that has not been specifically requested by the user, is a little more involved. It is not enough to check against the constraint set, you must consider the sumtype variable itself.
sumtype S1 = int;
sumtype S2 = int | bool;
S1 source;
S2 target;
target = source;
The sumtype type is preferred over the elements in the constraint set. If you need to guarantee that it is setting the element rather than the sumtype variable, use a cast.
To access a value to the element of a sumtype that is a type, use a cast.
sumtype S = int | bool;
S s;
s = 5;
assert(cast(int)s == 5);
If you want to access an element by name provide it with the member of operator.
sumtype S = int i | bool;
S s;
cast(:i)s = 2;
assert(cast(:i)s == 2);
This has the same rewrite seen in the declaration of the sumtype. Where the member of operator gets rewritten to typeof(:Identifier)
. Giving a unique element in the set.
Storing a value into an element is always safe. It will destroy the container prior to the copy.
Getting a value from an element is not safe without matching. It is @system
to access an element value.
A sumtype has the members:
__tag
An integral tag value to indicate which element has been set.__tagTypes
To acquire the types in the constraint set.__tagNames
To acquire the name for each element in the constraint set, if not present a string in the sequence will benull
.__tagValues
Maps an offset for an element description in the constraint set, to its tag value.
The tag value is defined to be a hash value of the tag type combined with the tag name. Of type size_t
. This enables both growth and shrinking of the constraint set without requiring a full match and reassignment of the tag value. This is important for some use cases like value type exceptions.
The default tag that is selected, is the first element. In the following example :None
is the default element that would be used for .init
.
sumtype S = :None | int;
When a type qualifier is applied to a sumtype, it is also applied to the constraint set.
sumtype S = int* | bool;
const(S) s;
pragma(msg, S.__tagTypes[0]); // const(int*)
A sumtype is based upon a struct, it has its copy constructor/destructor called automatically just like a struct would. It has the following variable-sized layout, which depends upon the constraint set:
- Tag value
- † Copy constructor function pointer
- † Destructor function pointer
- ♥ Tag value
† May be elided if no elements in the constraint set require it. ♥ May be elided if all elements in the constraint set have a size of zero.
If required the compiler should generate a patch function that takes the pointer to the tag value, combines it with the copy constructor/destructor and makes it into a delegate. It should however be able to reconstruct the delegate without a new function.
A copy constructor will function in place of a postblit. However, it may need a wrapper function to be generated.
The layout must be variable in size, to allow it to be the size of a single machine word that can fit in a register. This allows for other language elements like value type exceptions to hold their name-sake zero-cost capability.
If a copy constructor or destructor is present in the layout, it must be non-null, regardless of the element selected.
Unused memory that would be used for a different element is to be left in an undefined state and is not safe to access.
An optimization strategy that Walter Bright remembered during DConf 2024 is to utilize the carry flag to indicate an error occured. This allows for the tag to be elided when returning a sumtype that is non-void as long as the sumtype elements only contain member-of-operators and the return type fits inside a register.
Matching as provided by DIP1048, is wired up to understand a sumtype providing of the properties it uses. Notably one is missing __tagValue
, the compiler understands where and how to acquire this reference. Therefore it should not exist.
sumtype S = int | string text | :None;
S s;
s.match {
(:text someTextVar) {
};
(int) {
};
(:None) {
};
};
The match patterns, now support the member of operator in the place of the type. It acts as a name lookup in the constraint set, for the given element name, or is transformed into typeof(:Identifier)
and does a type match. This capability is extended to library sumtypes by finding the index in __tagNames
which is a compile-time accessible sequence of string
s.
Accessing (by-ref too) an element value is a @safe
operation. It has no restrictions that accessing by cast does.
Lastly, a match expression supports returning a value. The returned value will be a language sumtype. Each return may have a different type, these will be included in the resulting type.
sumtype S = :None | int;
sumtype S2 = int | bool;
S1 s1;
S2 s2 = s1.match {
(:None) {
if (random() > 0.5)
return true;
else
return 2;
};
(int v) {
return v;
};
};
The syntax of a sumtype is based upon the predominately used description for one in a large portion of the literature.
Of note is it may templated, and when the member of operator is used to provide an element, it is rewritten into its type which unifies the syntax into one semantically represented concept.
+ SumTypeDeclaration:
+ sumtype Identifier SumTypeConstraints
+ sumtype Identifier TemplateParameters Constraint|opt SumTypeConstraints
+ SumTypeConstraints:
+ SumTypeConstraint , SumTypeConstraints
+ SumTypeConstraint
+ SumTypeConstraint:
+ MemberOfOperator
+ Type Identifier|opt
MatchPattern:
+ MemberOfOperator FunctionLiteralBody
ParameterDeclaration:
+ ParameterAttributes|opt MemberOfOperator Identifier
TypeSpecialization:
+ sumtype
Any symbols called sumtype
may break. The likely solution is to require the keyword to only be recognized in a set edition and above.
- Sum Types DIP by Walter Bright
- Enumerated Unions DIP by Paul Backus
- std.variant Algebraic
- std.sumtype
- taggedalgebraic
- C++ variant
- TypeScript union types
- Rust enumerations
- SML datatype
- OCaml enumerate data types
Copyright (c) 2024 by the D Language Foundation
Licensed under Creative Commons Zero 1.0
The DIP Manager will supplement this section with links to forum discussions and a summary of the formal assessment.