Speculations about memory allocation in PHP

Premise

Whenever I write code, I habitually try to minimize the amount of memory allocations required for that code to fulfill its purpose.

Memory allocations are very slow:

They are complicated. So the CPU has to execute a bunch of instructions inside malloc.
They are complicated. The mass of instructions occupies bunch of memory, which then must occupy the CPU's instruction cache (icache) whenever it is executed. This will (probably) be a cache miss to load the allocator, and then it will evict a bunch of "hot" code that will then cache miss again once the allocator is done. Cache misses are SLOW.
If the allocator can't retrieve memory from its own pool, it will need to use a system call and execute kernel code to retrieve more memory. Suddenly we've gone from idyllic single-threaded concerns, to "every thread on the host machine wants this thing that you're accessing" kind of bigmess.
It has poor locality of reference. If there's any memory fragmentation, then the allocator may need to walk lists or traverse disparate parts of memory to find what its looking for. This will then cause a bunch of memory cache misses. Cache misses are SLOW.

In my past, I have converted many slow programs into fast programs by removing unnecessary memory allocations (among other things... stares daggers at unnecessary disk I/O). The experience reflects the theory.

I find myself solving a simple problem: how to store a timestamp-and-random-integer pair in PHP in the most efficient way possible.

In my head, I'm comparing some possibilities:

public function recordid_from(string $ctime, int $crand) : mixed
{
    // Maybe/probably allocates?
    return ctime . strval(crand);
}

class RecordIdRepresentation
{
    public function __construct(
        public string $ctime,
        public int    $crand
    ) {}
}

// ...

public function recordid_from(string $ctime, int $crand) : mixed
{
    // Definitely allocates
    return new RecordIdRepresentation($ctime, $crand);
}

public function recordid_from(string $ctime, int $crand) : mixed
{
    // Allocates A LOT (hashtable allocations are hueg liek eksbawks)
    return [$ctime, $crand];
}

Communing With The Machine

I'm going to drop into C code (something like ISO C99) a bit, simply because it's a mostly-readable way to represent what the machine is doing (at a conceptual level).

Related: In the C code, I'll be using size_t as the integer type a lot. It is typically a 64-bit integer (uint64_t) for 64-bit machines, or a 32-bit integer (int32_t) for 32-bit machines, so it pretty closely represents what PHP's int type means.

(Notable obscura: there are exceptions, like on Itanium VMS, where the machine can do 64-bit addressing (sizeof(void*) == sizeof(uint64_t)) but malloc never returns contiguous chunks of memory longer than 2^32 bytes. On that platform, sizeof(size_t) == sizeof(uint32_t) and sizeof(uintptr_t) == sizeof(uint64_t)! But platforms like this are very rare... I can't think of any other examples.)

C's int type is weird and I can't think of any advantage to it besides "some other dummy wrote an API in C and used int as a function parameter or return type, so now I have to use an int to call that function". PLEASE never use C's int type.

(PHP's int type is NOT C's int type. PHP's int type, being more like size_t, is actually useful in a lot of common cases, like array indices. I just wish it weren't the only integer type we had access too 🙄)

Option A:

public function recordid_from(string $ctime, int $crand) : mixed
{
    // Maybe/probably allocates?
    return ctime . strval(crand);
}

If this allocates, it's because it gets interpreted/compiled into something that would be like this C code:

char *recordid_from(const char  *ctime,  size_t  crand)
{
    // 21 is the number of characters required to represent the longest
    // possible 64-bit integer, INCLUDING 1 extra character for
    // C's typical "null terminating" 0 byte.
    size_t crand_capacity = 21;

    // Execute `strval(crand)`
    char *crand_str = malloc(crand_capacity);
    size_t crand_len = snprintf(crand_str, crand_capacity, "%ld", crand);
    assert(0 < crand_len);
    assert((crand_len+1) <= crand_capacity);

    // Concatenate `ctime` with result of `strval(crand)`.
    size_t ctime_len = strlen(ctime);
    size_t newstr_capacity = ctime_len + crand_len + 1;
    char *newstr = malloc(newstr_capacity);
    size_t newstr_len = snprintf(newstr, newstr_capacity, "%s%s", ctime, crand_str);
    assert(0 < newstr_len);
    assert((newstr_len+1) <= newstr_capacity);

    return newstr;
}

Notice that malloc is called twice.

If we're lucky, the interpreter notices the opportunity and executes something analogous to this slightly more optimal code:

char *recordid_from(const char  *ctime,  size_t  crand)
{
    // 20 is the number of characters required to represent the longest
    // possible 64-bit integer, NOT including the 1 extra character for
    // C's typical "null terminating" 0 byte.
    size_t crand_len = 20;

    // Execute `strval(crand)` AND concatenate `ctime` with result of `strval(crand)`.
    size_t ctime_len = strlen(ctime);
    size_t newstr_capacity = ctime_len + crand_len + 1;
    char *newstr = malloc(newstr_capacity);
    int newstr_len = snprintf(newstr, newstr_capacity, "%s%ld", ctime, crand);
    assert(0 < newstr_len);
    assert((newstr_len+1) <= newstr_capacity);

    return newstr;
}

Now malloc is called once.

But if we're REALLY lucky, it's possible to do even better:

typedef struct better_str
{
    char    *str;
    size_t  len;
    size_t  capacity;

} better_str;

char *recordid_from(better_str  ctime, size_t  crand)
{
    // Miscellaneous setup.
    better_str  crandstr;
    better_str  newstr;

    // Attempt to reuse any overallocation capacity in the ctime string to avoid allocation.
    // Just append the `crand` integer into the existing space.
    crandstr.ptr = ctime.ptr + ctime.len;
    crandstr.len = 0; // Don't know yet.
    crandstr.capacity = ctime.capacity - ctime.len;

    crandstr.len = snprintf(crandstr.ptr, crandstr.capacity, "%ld", crand);
    assert(0 < crandstr.len);
	newstr.ptr = ctime.ptr;

    // Fall back to allocation if there wasn't enough capacity.
    // (Occasional `+1` is to include trailing `\0` byte.)
    if (crandstr.capacity < (crandstr.len+1))
    {
        // Restore ctime's trailing null byte,
        // so that `ctime.ptr` points to a valid string.
        ctime.ptr[ctime.len] = '\0';

        // Allocating concatenation.
        newstr.capacity = ctime.len + crandstr.len + 1;
        newstr.ptr = malloc(newstr.capacity);
        newstr.len = snprintf(newstr.ptr, newstr.capacity, "%s%ld", ctime.ptr, crand);
        assert(0 < newstr.len);
        assert((newstr.len+1) <= newstr.capacity);
    }

    return newstr.ptr;
}

If the code before this function call left enough "wiggle room" in the crime string, then this code can do ZERO allocations. That'd be great!

There is nothing about the PHP code in scenario A that prevents this from happening. It may or may not happen, but we have given PHP the opportunity to optimize its code, and to do so without complicated flow analysis or complicated range-value-propagation analysis on string/class/whatever types.

Quick ProTip when working with (ISO) C code:
Whenever you're looking up a function (for example, snprintf above), append "opengroup" to your search query. In that case, you'd end up here. The opengroup specifications are the legit stuff and probably the most accurate info you can get without buying the ISO specifications or something. Other sites can be OK, but eh. Only problem with Opengroup specs are that sometimes they can be, uh, verbose and litigious. It's often nice to also look at examples and explanations from other sites in addition to looking at the Opengroup spec.

Option B:

class RecordIdRepresentation
{
    public function __construct(
        public string $ctime,
        public int    $crand
    ) {}
}

// ...

public function recordid_from(string $ctime, int $crand) : mixed
{
    // Definitely allocates
    return new RecordIdRepresentation($ctime, $crand);
}

If we assume PHP classes to be somewhat like C#/Java/D/etc classes, then the corresponding C code might look like this:

typedef struct class_RecordIdRepresentation
{
    // Ignore the vtable for now.
    // Just know that polymorphism requires classes to allocate _at least_
    // enough memory to store a pointer to this thingie that allows
    // function inheritance to work. NBD though.
    // It's mostly just here for completeness+accuracy; it doesn't affect
    // our program's execution speed or memory usage significantly in this case.
    void *vtable;

    // Now our proper fields.
    const char    *ctime;
    size_t        crand;

} class_RecordIdRepresentation;


class_RecordIdRepresentation *recordid_from(const char  *ctime,  size_t  size_t  crand)
{
    class_RecordIdRepresentation *result = malloc(sizeof(class_RecordIdRepresentation));
    result->vtable = &class_RecordIdRepresentation_vtable;
    result->ctime = ctime;
    result->crand = crand;
    return result;
}

So far so good.

This could arguably be better than Option A. It's a trade-off, really: this always allocates ONCE. This is better than Option A's worst-case of 2 malloc calls, but worst than Option A's best-case of 0 malloc calls.

Only bummer is that accessing recordid_rep->ctime or recordid_rep->crand will require two indirect memory accesses, which is more likely to thrash cache, so it might be slow, BUT, we should never be doing those operations more than once or twice per record, so who cares. (Later on, in the encapsulation section, I will touch more on the "don't do this more than once or twice". I assert that ANY accessing ctime or crand from any PHP code besides "just before or after the SQL query" would be Doing It Wrong, and not just a little bit, but profoundly so. Want to do it anyways? Carefully rethink your life decisions, envision the future, then try again. 😉)

So, at first glance, classes might be competatively efficient for this problem.

だが！ (However!)

PHP classes are not C#/Java/D/etc classes.

More likely, they will be more like Perl/Python classes: they are HASH TABLES.

So, this:

class RecordIdRepresentation
{
    public function __construct(
        public string $ctime,
        public int    $crand
    ) {}
}

...

public function recordid_from(string $ctime, int $crand) : mixed
{
    // Definitely allocates
    return new RecordIdRepresentation($ctime, $crand);
}

May actually be syntax sugar for THIS:

public function recordid_from(string $ctime, int $crand) : mixed
{
    // Definitely allocates
    return array("__vtable" => __something, "ctime" => $ctime, "crand" => $crand);
}

This means that we're allocating a hash table, and not only that, but every time we access, say "$instance->ctime", it's not going to compile into C's instance->ctime;, it's going to compile into a hash-table lookup. (Which is still (mostly) O(1), but is a lot more complicated than just dereferencing one or two pointers!)

Indeed, if that suspicion is correct, then Option B is essentially the same thing as Option C, albeit with slightly better type safety that will only appear in a couple places in the entire codebase.

So let's talk about Option C.

Option C:

public function recordid_from(string $ctime, int $crand) : mixed
{
    // Allocates A LOT (hashtable allocations are hueg liek eksbawks)
    return [$ctime, $crand];
}

PHP's built-in arrays ARE hashtables. In this case, we just have integers for keys.

Try writing a hashtable sometime (like, in ISO C, where all of the memory management is transparent). Let me know if your implementation can be efficient at lookup while allocating less than 1 kB of memory. Good luck!

The above is rhetorical, so here's what I know about hashtables: When a hashtable looks up some value V using key K, it does these things:

Creates a hash H of K. (Easy. No memory allocation. For strings it's just logical ops on each character. For ints, it may as well just call rand(K) and be done. In most cases it doesn't need to be a slow cryptographic randomizer either, it just needs uniformity and speed.)
Creates an index using the modulus of H, ex: size_t index = H % 37;
- The choice of modulus is determined by two things:

It should be a prime number, and
(ideally) it's close to the count of key-value pairs in the hash table.

Retrieves a bin from a plain contiguous array, ex in C-like notation: KVPair bin[] = this->bins[index];
Finds V in that bin, ex:

for (size_t j = 0; j < bin_length; j++)
{
    if ( bin[j].key == K ) {
        V = bin[j].value;
        break;
    }
}

Returns the V it found. (If not found, V might have been defaulted to null or something. Behaviors vary.)

Things to note from the above:

The this.bins array must be the size of the modulo, ex: 37 in this case.
Step 4 could potentially get nasty. For modulo 1, it is an O(n) lookup!
If we want O(1) behavior, the modulus M must be chosen as such: n/k <= M, where k is some constant that multiplies the time required for lookup.
The scan in step 4 is O(k). If we don't mind checking 2 array elements per lookup (on average), then we choose k=2, so n/2 <= M.
Using that k=2, the example hashtable with modulus 37 might be suitable for a hashtable with up to 74 elements.

We might be wondering... why worry? If we want to construct [$ctime, $crand], it's just 2 elements, so the hashtable might just turn into a 1 element binlist with a 2 element bin. NBD?

The problem is, it's a dynamic hashtable. We're not giving PHP any hints on how large that hashtable might grow (or not grow). For all it knows, this is just the beginning, and we're about to toss a bunch of stuff in there!

If you're a hashtable implementer, you don't tend to assume that the hashtable would contain 2 elements. Who would ever do such a thing?!? I mean, why use a hashtable to store a measly 2 elements!? Just using a 2-element C array would still be faster. Even a 74 element table is "small", and lookups a linear scan on it might be faster than a hash lookup on a 74 element hash table.

So hashtable implementations are more likely to start off tuned for larger numbers of elements. Ergo, if I were writing a hashtable implementation, I'd have it start with something like a 1013-element binlist, so index = H % 1013. Maybe if it detects more than, say, 2048 elements inserted, it'll do an expensive growth operation and pick a larger modulus. (Growth is expensive because it would require allocating a larger array for the binlist (which is SLOW, remember), and then it would have to shuffle all of the elements out of the old binlist and into the new binlist, calculating new indices for each one to "re-bin" each element, an O(n) operation.)

The number of allocations is likely to be larger, too.

The hash table will need to allocate:

The hash table object/struct itself.
The bin list.
MAYBE one allocation PER BIN 😱, but hopefully it has optimizations to reduce this.

This is all to say: Option B and Option C have a significant chance of doing a lot more memory allocations (and allocating more memory) than Option A's worst-case scenario of "allocate for integer stringization" then "allocate for string concatenation".

Of course, we don't truly know what's going on in PHP. The language promises to remove us from the nitty-gritty realities of machine limits like memory allocation and cache locality. But in reality, those realities still exist, they've just been hidden, and now we can't do nearly as much about them. It's like putting a rug over the pitfall trap: now we don't have an ugly pitfall trap visible, but also, you won't know when you're about to fall into a pitfall trap, and you can't, because it's invisible now (but still present and 100% dangerous).

Bonus Option B2:

use \DateTime;

class RecordIdRepresentation
{
    public function __construct(
        public DateTime $ctime,
        public int      $crand
    ) {}
}

...

public function recordid_from(string $ctime, int $crand) : mixed
{
    // Definitely allocates
    return new RecordIdRepresentation(new DateTime($ctime), $crand);
}

Want to store the $ctime as a DateTime object? Well, we still have to bind it to $crand somehow, so it doesn't solve that problem. Rather, now we have TWO hash tables! One for the {$ctime, $crand} tuple, and ANOTHER for the DateTime object! We might even have more, depending on how needy the internals of DateTime really are.

The REALLY REALLY important message that has nothing to do with memory allocation!

The best thing to do is to ensure that PHP's memory allocation schemes don't matter. This is done with proper encapsulation, by hiding this internal representation from the rest of our code.

(SURPRISE! This article isn't actually about memory allocation in PHP 😉)

As soon as we receive one of these IDs, we should immediately convert it into the internal representation, and then never peer into that representation in the rest of the codebase.

Example:

// ... some SQL query code ...
$row = mysqli_fetch_assoc($query_result);

// This line is tightly coupled to our database fields and names,
// BUT, it is not coupled to the record ID's internal representation!
$id = RecordId::from($row["ctime"], $row["crand"]);

// From now on, we can use references like `$id` or `$store_record`;
// never should we need to actually _decode_ the ID!
$store_record = new StoreRecord($id, $row);

// Need to access the record's creation time for some reason?
// Well, just do it like this:
$creation_time = $row["ctime"]; // GOOD!
OR
$creation_time = $store_record->ctime(); // GOOD!

// But DO NOT do it like this:
// $creation_time = substr($id, 0, $whatever_date_length); // BAD -> Assumption of internal rep being string. <- BAD
// $creation_time = $id->ctime; // BAD -> Assumption of internal rep being class/object. <- BAD
// $creation_time = $id["$ctime"]; // BAD -> Assumption of internal rep being hash table. <- BAD

// This is kinda OK, but unnecessary. Use one of the earlier recommended forms instead.
$creation_time = RecordId::ctime($id); // meh, it's medio-OKish I guess, kinda

When we need to UPDATE or INSERT a record, well, ctime and crand might already be populated on that record object (because we got it from somewhere), but if it doesn't, we encode or generate the ID in dedicated functions that are called just before we do any SQL.

Example:

$sql_query = 'UPDATE store SET [...] WHERE store.ctime = ? AND store.crand = ?';
$ctime = RecordId::ctime($store->id);
$crand = RecordId::crand($store->id);
mysqli_stmt_bind_param($stmt, 'si', $ctime, $crand);
// ... code to execute $sql_query ...

If we ever did care to optimize the way $id is stored, then we would TRY different implementations and benchmark/profile them!

The problem is, IF it's difficult to switch implementations, then such benchmarks become major undertakings that are a multiple of the time it takes to rewrite a bunch of the codebase. "Oh hell no" is the usual approach, which is why we can't have nice things, and why corporate reps for ISVs tend to spin like jet turbines, telling us that we WANT their code to be slow! No, it's not because their code being slow is beneficial to us. It's because they can't fix it. It's because someone didn't properly encapsulate their program state, and now they are stuck.

We can do better.

We do better by making it so that changing our internal representation for $id only requires us to modify our code in a few places: RecordId::from, RecordId::ctime, and RecordId::crand. And maybe the logic for generating IDs, too. But that's it.

Then, if we ever did need to optimize this, it becomes possible to swap out $id representations over and over, running some controlled test scenario on each one. That would tell us which one is best/fastest, and (again) we could easily switch over to that optimal one.

This is just one small instance of these notions of "encapsulation" and "internal representation". It's a good idea to use these concepts in many many places throughout the codebase, not just for this one thing.

Concerns besides speed and encapsulation

Comparison:

Option A and Option C might make it easier to implement comparison and equality operations, because those representations are builtin PHP types that already have those operations (ex: ===).
As for Option B, PHP classes use the equality (== and ===) operators as identity operators, which is different from equality. So we'd have to write a few lines of code to compare the members. No big deal really, just not as trivially convenient.
I don't expect to be comparing these $id values very much, though I could see it happening if some function is handed two records of the same type and wants to know if they're the same database record (even if they are different PHP objects).
Don't forget to make and use a RecordId::equal(a,b) static method if we ever need such things. Just raw-dogging === on two $id instances might "work", but it breaks encapsulation and then we're back to being unable to (easily) change our own code.

Robustness:

Option B might have an advantage here: because PHP classes don't have as many basic operations defined as other builtin types, it becomes more apparent when we mess up and try to do a raw operation on an $id instance.
Though it's not perfect, because, as above, it IS "valid" PHP to test two class objects for "equality", but it will do something different than what we probably intend in this case.
- That's GOOD because our code might fail early during testing and make us fix it.
- But it's BAD because it's an unintuitive and frustrating way to get feedback about our code being incorrect.
PHP doesn't give us a lot of ways to really make our abstractions leak-proof and our internal representation safe from accidental tampering.
- As much as I hate this mentality when trying to write reliable code, we'll just have to "git gud" and be disciplined, like surgeons practicing sterile technique.

Encapsulation is still the ultimate concern, at least in this one scenario, and in this one train of thought. Even the "robustness" concerns are just about providing stronger encapsulation. And the ease of implementing comparison operations is almost not even worth mentioning (but it's nice to be thorough).

chadjoan/speculations-about-memory-allocation-in-php.md