@balt-dev
Last active November 30, 2024 18:05
uint and distinct type aliases

In Rust today, every integer width has its own distinct primitive type: u8, u16, i64, usize, and so on.

This works fine, and serves well as a way of handling numbers. Most statically typed systems languages do the same, and it works.

But we could do better.

With the advent of const generics in Rust 1.51, we could add a unifying integer type that handles all of these at once: uint (and int for signed types).


In this theoretical version of the language, u8 would simply be a type alias for uint::<8>, and likewise for the other fixed-width types.
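The aliasing idea can be approximated in today's Rust with a const-generic wrapper. The sketch below is a stand-in under stated assumptions, not the proposed built-in: `Uint`, its `u64` backing store, and the `U8`/`U4` aliases are hypothetical names, and only widths up to 64 bits are handled.

```rust
/// Hypothetical stand-in for the proposed `uint::<N>`: an N-bit unsigned
/// integer stored in a u64 (so only widths up to 64 are supported here).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Uint<const BITS: u32>(u64);

impl<const BITS: u32> Uint<BITS> {
    /// Bit mask covering the low BITS bits.
    const MASK: u64 = if BITS >= 64 { u64::MAX } else { (1u64 << BITS) - 1 };

    /// Largest representable value for this width.
    pub const MAX: Self = Self(Self::MASK);

    /// Builds a value, truncating anything above BITS bits (like an `as` cast).
    pub const fn new(value: u64) -> Self {
        Self(value & Self::MASK)
    }

    /// Returns the stored value widened to u64.
    pub const fn get(self) -> u64 {
        self.0
    }

    /// Written once, available for every width.
    pub const fn wrapping_add(self, rhs: Self) -> Self {
        Self::new(self.0.wrapping_add(rhs.0))
    }
}

// Plain (non-distinct) aliases: `U8` and `Uint<8>` are the same type.
pub type U8 = Uint<8>;
pub type U4 = Uint<4>;
```

Because these are ordinary aliases, a `U8` can be passed anywhere a `Uint<8>` is expected and vice versa.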

usize would alias to uint::<mem::POINTER_SIZE> (a hypothetical constant; nothing like it exists in std::mem today), but there's a problem with this. Depending on your system, usize would be the very same type as uint::<64>, so code that mixes usize and u64 freely would compile on a 64-bit target and break on a 32-bit one. That adds a very prominent portability footgun to the language.

Because of this, I propose new syntax - a "distinct" type alias. People familiar with C3 may know about this.[1]

Distinct type aliases are, as the name suggests, distinct from the type they alias. They inherit all of its methods and functionality, but they are not the same type. This also allows you to write impl blocks on a distinct type alias.

I can imagine the syntax for this going something like this:

distinct type usize = uint::<mem::POINTER_SIZE>;

You would also be able to as cast between a distinct type alias and its underlying type if need be - this doesn't break anything with integers, as you can already as cast between them at will.
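For comparison, the closest thing in today's Rust is the newtype pattern. The sketch below only approximates the proposal: `PointerSized` is a hypothetical name, the 64-bit width is hard-coded for illustration, and unlike a distinct alias the wrapper inherits nothing, so every method has to be forwarded by hand. Conversions go through `From` because `as` does not work on user-defined structs.

```rust
/// A regular alias: interchangeable with u64 everywhere.
pub type Alias = u64;

/// A newtype: same representation as u64, but a different type.
/// (Hypothetical name; the 64-bit width is assumed for illustration.)
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
#[repr(transparent)]
pub struct PointerSized(u64);

// Explicit conversions stand in for the proposed `as` casts.
impl From<u64> for PointerSized {
    fn from(value: u64) -> Self {
        PointerSized(value)
    }
}

impl From<PointerSized> for u64 {
    fn from(value: PointerSized) -> Self {
        value.0
    }
}

impl PointerSized {
    /// Nothing is inherited, so methods must be forwarded one by one.
    pub fn checked_add(self, rhs: Self) -> Option<Self> {
        self.0.checked_add(rhs.0).map(PointerSized)
    }
}

/// Accepts u64 and `Alias` alike, but not `PointerSized`.
pub fn takes_u64(x: u64) -> u64 {
    x
}
```

Calling `takes_u64` with a `PointerSized` is a type error until the value is converted back explicitly, which is the portability guard a distinct alias is meant to provide.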

Along with this, since you can implement methods for specific values of a const generic parameter, things like from_le_bytes don't need to go away. They could in principle be implemented for every uint::<N> where N % 8 == 0, but it would likely be easiest to write separate impls for uint::<32> and the other byte-sized widths.
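Implementing a method for one specific value of a const parameter already works in today's Rust. A minimal sketch, assuming a hypothetical `Uint` wrapper backed by `u64` in place of the proposed built-in type:

```rust
/// Hypothetical stand-in for `uint::<N>`, backed by a u64.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Uint<const BITS: u32>(u64);

// Available for every width.
impl<const BITS: u32> Uint<BITS> {
    pub fn get(self) -> u64 {
        self.0
    }
}

// Only the 32-bit instantiation gets these methods.
impl Uint<32> {
    pub fn from_le_bytes(bytes: [u8; 4]) -> Self {
        Self(u32::from_le_bytes(bytes) as u64)
    }

    pub fn to_le_bytes(self) -> [u8; 4] {
        (self.0 as u32).to_le_bytes()
    }
}

// A separate impl block covers another width, with its own array length.
impl Uint<16> {
    pub fn from_le_bytes(bytes: [u8; 2]) -> Self {
        Self(u16::from_le_bytes(bytes) as u64)
    }
}
```

The fully generic version (taking a `[u8; N / 8]` for every N divisible by 8) would need const expressions in types, which stable Rust does not support at the time of writing, so per-width impls are the pragmatic route.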


Let's take a step back. All this seems pretty neat, but what's the benefit here? It's a big jump in complexity in the language, and might not be intuitive for some.

The payoff is twofold.

On the one hand, we can now implement methods for every integer type at the same time. This cuts down on repeated code, which today usually has to be stamped out per type with macros.

On the other hand, a generic width allows non-power-of-two integer types.
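For reference, this is roughly what sharing a method across integer types looks like today: a trait plus a macro that generates one impl per type. `IsEven` and `impl_is_even` are made-up names for illustration.

```rust
/// Hypothetical trait with one method we want on every unsigned integer.
pub trait IsEven {
    fn is_even(self) -> bool;
}

// One impl per type, generated by a macro because the primitive
// integer types share no common generic definition.
macro_rules! impl_is_even {
    ($($t:ty),* $(,)?) => {
        $(
            impl IsEven for $t {
                fn is_even(self) -> bool {
                    self % 2 == 0
                }
            }
        )*
    };
}

impl_is_even!(u8, u16, u32, u64, u128, usize);
```

Under the proposal, the macro and the per-type list would presumably collapse into a single impl block that is generic over the width.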

These types are supported by LLVM, which means there wouldn't need to be too much work to get them in on the lower level of Rust - but there would have to be some careful handling on higher levels.

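Imagine a struct like this, where packed(8) is read as "pack into 8 bits" (unlike today's #[repr(packed(N))], where N is an alignment in bytes)[2]: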

#[repr(packed(8))]
pub struct LightInfo {
  pub is_light_info: bool,
  pub is_lamp_color: bool,
  _padding: uint::<2>,
  pub brightness: uint::<4>
}

This is something that isn't possible natively in Rust. There's an equivalent form to this in C++:

typedef struct LightInfo {
  bool isLightInfo: 1;
  bool isLampColor: 1;
  unsigned: 2; // padding
  unsigned int brightness: 4;
} LightInfo;
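In today's Rust, the same one-byte layout has to be packed by hand with shifts and masks. The sketch below assumes one particular bit order (flags in the two lowest bits, brightness in the top four); the struct above does not specify one, and the order of C++ bit-fields is implementation-defined anyway.

```rust
/// One-byte LightInfo packed manually.
/// Assumed layout (bit 0 = least significant):
///   bit 0     is_light_info
///   bit 1     is_lamp_color
///   bits 2-3  padding
///   bits 4-7  brightness
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct LightInfo(u8);

impl LightInfo {
    /// `brightness` is truncated to its low 4 bits.
    pub const fn new(is_light_info: bool, is_lamp_color: bool, brightness: u8) -> Self {
        LightInfo(
            (is_light_info as u8) | ((is_lamp_color as u8) << 1) | ((brightness & 0x0F) << 4),
        )
    }

    pub const fn is_light_info(self) -> bool {
        self.0 & 0b0000_0001 != 0
    }

    pub const fn is_lamp_color(self) -> bool {
        self.0 & 0b0000_0010 != 0
    }

    pub const fn brightness(self) -> u8 {
        self.0 >> 4
    }

    /// The raw packed byte.
    pub const fn to_byte(self) -> u8 {
        self.0
    }
}
```

Every accessor, mask, and truncation rule here is boilerplate that sub-byte integer fields would let the compiler generate and check.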

Sub-byte integer types would also open up new avenues, e.g. bitstruct enums.

Below is an example of an enum representing an instruction from the Overture ISA, from the game Turing Complete[3]:

#[repr(uint(3))]
pub enum Register { R0, R1, R2, R3, R4, R5, IO }
#[repr(uint(3))]
pub enum AluInstruction { OR, NAND, NOR, AND, ADD, SUB }
#[repr(uint(3))]
pub enum Condition { Never, EqZero, LtZero, LeqZero, Always, NeqZero, GeqZero, GtZero }

#[repr(packed(8))]
pub enum Instruction {
	// Each discriminant is a bit prefix of the byte, so no two may overlap.
	Immediate { value: uint::<6> } = 0b00u2,
	Calculate { alu_code: AluInstruction } = 0b01000u5,
	Copy { source: Register, destination: Register } = 0b10u2,
	Branch { condition: Condition } = 0b11000u5
}

As far as I'm aware, no other language has done this yet.
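Without sub-byte integers, pulling 3-bit fields out of an instruction byte today means masking, shifting, and range-checking by hand. A minimal sketch for the register operands of a copy instruction; the field positions (source in bits 5-3, destination in bits 2-0) and the helper name `decode_copy_operands` are assumptions for illustration.

```rust
use std::convert::TryFrom;

/// Register operand; only values 0..=6 are valid in the 3-bit field.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
pub enum Register { R0, R1, R2, R3, R4, R5, IO }

impl TryFrom<u8> for Register {
    /// The rejected raw value.
    type Error = u8;

    /// Today the range check has to be written out by hand.
    fn try_from(bits: u8) -> Result<Self, u8> {
        Ok(match bits {
            0 => Register::R0,
            1 => Register::R1,
            2 => Register::R2,
            3 => Register::R3,
            4 => Register::R4,
            5 => Register::R5,
            6 => Register::IO,
            other => return Err(other),
        })
    }
}

/// Splits the low six bits of a copy instruction into (source, destination).
/// Assumption: source sits in bits 5-3 and destination in bits 2-0.
pub fn decode_copy_operands(byte: u8) -> Result<(Register, Register), u8> {
    let source = Register::try_from((byte >> 3) & 0b111)?;
    let destination = Register::try_from(byte & 0b111)?;
    Ok((source, destination))
}
```

With a `#[repr(uint(3))]` enum as proposed, the compiler would know the field is exactly three bits wide and could do the extraction itself.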

There's danger in this, though. Imagine something like this:

#[repr(packed(65))]
pub struct Adversarial {
  pub offset: bool,
  pub misaligned: u64
}

Here misaligned would start at bit offset 1, so it would straddle every byte boundary it touches, and the struct as a whole would be 65 bits wide, which is not even a whole number of bytes. There's a real danger in having types cross byte boundaries, so I can imagine this would have to be explicitly disallowed by the compiler.

Such a restriction makes generic structs with integers kind of annoying, as this:
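To make the cost concrete, here is a sketch of what reading such a field would involve if it were allowed. It assumes a little-endian bit order (bit 0 of byte 0 comes first) and a 9-byte buffer holding the 65-bit struct; `read_u64_at_bit_offset` is a hypothetical helper, not an existing API.

```rust
/// Reads a u64 that starts `bit_offset` bits (0..=7) into a 9-byte buffer,
/// assuming little-endian bit order (bit 0 of byte 0 comes first).
pub fn read_u64_at_bit_offset(bytes: &[u8; 9], bit_offset: u32) -> u64 {
    assert!(bit_offset < 8, "offset must stay within the first byte");

    // The first eight bytes hold the low part of the value.
    let mut low = [0u8; 8];
    low.copy_from_slice(&bytes[..8]);
    let low = u64::from_le_bytes(low);
    if bit_offset == 0 {
        return low; // byte aligned: a plain load is enough
    }

    // Otherwise the top `bit_offset` bits spill into the ninth byte.
    let high = bytes[8] as u64;
    (low >> bit_offset) | (high << (64 - bit_offset))
}
```

A single field access turns into two loads, two shifts, and an OR, and no ordinary reference to the field can exist because it has no byte address of its own.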

pub struct Vector2I<const N: usize> { pub x: int::<N>, pub y: int::<N> }

cannot be #[repr(packed)], as there's a possibility of the values crossing a byte boundary. Each field would have to be rounded up to a whole byte, so something like Vector2I<4> would take up at minimum 2 bytes.

Footnotes

  1. https://www.learn-c3.org/More/28/

  2. https://wiki.vg/Classic_Protocol_Extension#LightingMode_packet

  3. https://steamcommunity.com/sharedfiles/filedetails/?id=2782647016

balt-dev commented Nov 30, 2024

Realizing now that the way I said it, distinct type aliases are equivalent to a subset of OOP inheritance :L