Current fastest TAS run (19.96 sec)
Hey everyone, thanks for watching the Sprint TAS! I'm mat1jaczyyy, and as we're smashing the limits of Puyo Puyo Tetris (PPT), it might not be entirely intuitive if these TAS runs are legit or not. How does one achieve all these Perfect Clears (PCs) so fast? How do you get the pieces you need? Why do the PCs instead of doing normal Tetrises? I'll try to explain as much as possible here so you can understand what happens in the video.
In Sprint, the goal is to clear 40 lines as fast as possible. PPT employs a lot of delays - line clear delay, piece entry delay (around 8 frames), etc. Most of the logic that goes into constructing the ideal Sprint is minimizing these delays to minimize time.
The major timesave not only in TAS runs, but also RTA runs is the Perfect Clear. The line clear animation for a Single takes 40f, a Double or Triple 45f, and a Tetris 50f. However, doing a PC doesn't require us to wait for the animation to resolve, rather we can start placing immediately (0f). So, stacking a PC instead can save up to 50f (0.83s) if we clear it with a Tetris. Clearing with lower lines still saves time, because the lower clears take less time to resolve than a Tetris that would be done for the respective 4 lines. Softdropping is also not viable because it is very slow in Sprint.
A special timesave is the final PC. If the last 4 lines are cleared with a PC, then the run ends having used exactly 100 pieces, which is the lowest amount required to finish. Using more is a waste of time, because we have to stack more pieces than required to finish the run. A big annoyance is that the first Hold has a bit of an oversight in the game code. Since the Hold is empty, the game has to spawn in a piece from the queue, which behaves like a normal piece spawning and as such experiences entry delay.
To get all those PCs, we need an extremely specific set of a 100 pieces. This set should also require the least amount of holds, ideally 0. Tetris uses a 7-bag randomizer, which means you always get a different permuted bag of 7 different pieces for your Next queue. This slightly limits the freedom we have in selecting which pieces we get, but is not a major issue for routing.
The bigger issue is that the Random Number Generator (RNG) simply does not work in a way is impossible to manipulate mid-game. RNG is a 32-bit value (unsigned int) whose roll function is RNG(x) = x * 5D588B65 + 269EC3. The game's RNG function loops consistently and all 2^32 seeds are uniformly distributed in one loop. The global RNG is rolled every game frame (which ticks only when the game itself is active, and not any of the menus we get beforehand - due to this you will always start the first game in your session with the same pieces), and a starting RNG value is taken as the game starts.
This value (piece RNG) is stored for our player outside of the global RNG, and used for piece generation. The game uses the Fisher-Yates method for generating a random bag permutation, which rolls our starting RNG 7 times when a new bag is required. The game stores exactly 2 future bags in memory for displaying in the Next queue preview. No other factors than requiring a new bag roll our piece RNG value, so there is no way to manipulate it outside of selecting an initial RNG value to work with. This also means that we can consistently predict as many pieces as we want, because the RNG remains unaffected.
Since our RNG value is 32-bit and not manipulatable mid-game, this means we can choose from 2^32 initial RNG seeds and thus that many sets of 100 pieces to work with. 100 pieces contain 14 full bags and 2 pieces from an incomplete 15th bag. As a full bag has 5040 different permutations, and the incomplete part of the bag 42 different permutations, this means the total number of different 100 piece sets is 2.866*10^53, which is much larger than the 4.295*10^9 sets we have available with the game's RNG seeds.
Routing an imaginary, made up ideal set of pieces is hopeless, because the rough chance of a specific set existing in the game is 1.5*10^-44. Instead, we have to create a sort of "seed finder" that would run through the entirety of the game's RNG seeds in hopes of finding a seed that meets the ideal requirements.
Ideally, we want our run to have as many Tetris PCs as possible, which save 50f compared to a normal Tetris. Right below it on the priority list are Single to Triple PCs, which save 10f. After that we have Double to Double and Triple to Single PCs which save 5f, and then finally normal Tetrises. The main goal is to find an RNG seed that fulfills as many of these requirements as possible.
To facilitate this I arranged for the help of knewjade, who specializes in writing programs that find PCs. If he could list all possible Tetris PC solutions for a given set of 10 (11 with hold) pieces, we could test all existing RNG seeds and 100 piece sets in the game against it, and have the program return all seeds that made it to the end. We can then use the found seed in game, and execute the run by looking back at the solutions and relevant stacks.
While I had no trouble reverse engineering the game's RNG, I am not too involved into the science behind Perfect Clears, so knewjade was the one doing all the work behind finding PC solutions. The goal was to find the following information for all 11-piece permutations with 7-bag (196862400 permutations):
- Is a PC with Tetris end possible to do
- What pieces can we end up holding after the PC
- The fastest movement (minimal count of move/rotate/hold)
However, it's not realistic to calculate individually for each permutation because, assuming one permutation takes 100ms to calculate, it would take 228 days in total to do so. Fortunately, he already had all the 4x10 PC solutions (explanation in Japanese). Out of all the solutions, there are only 117988 solutions with a Tetris ending. Therefore, we just had to check if a given permutation can be found in those solutions, and ends with a Tetris.
First, he calculated the count of move/rotate inputs for all 10-piece permutations that can be stacked without hold. There were 7^9(=40353607) combinations total that this had to be calculated for (the 10th piece is always an I). These were not hard to determine for each PC solution because only Hard Drop is used, so it could be calculated in advance. After that, we order all the possible solutions for a given permutation by the least number of inputs required, while separating solutions that end with different pieces on Hold (it can affect the outcome of the next PC).
Finally, he calculated all 11-piece permutations (where the first piece represents the piece on Hold) from the PC-able 10-piece permutations without hold. We can count how many times inputting a Hold is needed, and which piece will be held last, from the 10-piece permutations.
For example, if JSZLTOI JTI
is PC-able with tetris end (*
is any piece):
XXXXXXXXXXX → JSZLTOI JTI
[]JSZLTOIJTI* (Hold 0)
[]*JSZLTOIJTI (Hold 1)
[]SJ*ZLTOIJTI (Hold 2)
...
[*]JSZLTOIJTI (Hold 0)
[J]*SZLTOIJTI (Hold 1)
[S]J*ZLTOIJTI (Hold 1)
...
This calculation is extremely low cost. In addition, there are only 2206800 PC-able 10-piece permutations (5.5% of all), so it does not take much to calculate the results. After that, we count the inputs into a table while applying *
to one piece at a time, then check if it's a valid permutation with 7-bag and pick the best movement. As a result, the best movement of all the 11-piece permutations with 7-bag is recorded and saved.
The format of the recorded solutions was binary to take the least time to parse. One solution was formatted as PMMMMMMM RRRRHHHH
(2 bytes) where P is set to 1 if the PC is possible, M is the number of movement inputs required, R is the number of rotation inputs required, and H is the number of hold inputs required. For one permutation there are up to 8 different possible solutions (one for each piece we could end up holding after the PC, as well as no piece in Hold). So, one permutation takes 16 bytes of space in the binary. The full binary with all permutations ends up taking 3.15GB of space.
The source code of the program that verifies these solutions and outputs fumens can be found here.
The RNG seed finder program I wrote reads the solutions from our binary file, and then iterates through all of the seeds in the game and checks it against the solutions, making sure to take all paths in cases of multiple pieces ending on Hold while tracking the sum of inputs for each PC. There are 4294967296 RNG seeds in total, but it only takes 15 minutes to reach the end of the search because the calculations are extremely low cost, and most seeds don't make it past the first PC. After the search is completed we order the found seeds by least inputs required, and pick the best one to use in our run. The source code of the program can be found here. You can also find all working seeds here.
Now that we've found the ideal RNG (whose value is 0CC39230), it's time to tell the game to make use of it. To set the initial RNG, we would directly modify it to a desired value in the register ecx right before executing the instruction at address 0x141574208 (imul ecx,ecx,5D588B65) using Cheat Engine's debugger when the game rolls it for the first time while generating the first bag. The same effect would be achieved without hacks by letting the game run until we reached our seed (since the RNG loops uniformly), but this is not too practical since it could take anywhere between 0 and 4294967296 frames of waiting (instant and 54 years).
Now that the game has started with the RNG seed and 100 piece set we found, all that's left is to stack it. Unlike previous TAS runs which used Zetris' input engine to save time in executing the run, this time we use Cheat Engine and have a breakpoint on the instruction which reads the input bitmask from hardware into memory. It is called once per frame so it allows us to easily frame advance, and by modifying the register at that time we can select exactly what inputs we want on any frame. knewjade's solution finder has already sorted the solutions by fastest movement, so I do not need to care much about optimizations, I just have to execute the run. An optimization that cuts the time down though involves hard dropping on the same frame as rotating, as the game processes inputs in the order: Rotation, Hard Drop, Movement. All that's left is to do the run, and save the replay.
Our TAS is useless if it is not able to be easily replayed. Due to our hacky RNG injection, the game still saves the replay with the RNG as it was initially before the injection. This results in the replay playing incorrectly, and topping out within just seconds of playback.
The game stores the replays inside the binary save file (data.bin), where all the other information about your game is stored (such as rating, stats, etc). With some reverse engineering about how the game overwrites RNG from a replay, I was able to find the offset in the replay where the RNG value is stored. However, there are precisely 1973 RNG calls that happen before the piece generation, presumably to shuffle the RNG around a bit without a real purpose. So we must roll our RNG back exactly 1973 times and replace the RNG value in the replay with that one.
Inverting the RNG function is not easy because it relies on major overflows to generate randomness, so I wrote a quick script that iterated through all of the game's RNG and remembered 1973 previous values, printing the earliest one remembered when we hit our injected value.
Opening the file up in a hex editor and modifying this offset to that desired value allows the replay to play correctly on any unmodded game. Note that elsewhere in the save file, a checksum is employed for making sure no unauthorized modifications are done, but the check doesn't seem to exist for replay sections, which made doing this a lot easier.