-
-
Save mewmew/18188e49df6926cf001b381a7927032d to your computer and use it in GitHub Desktop.
The CASC (Content Addressable Storage Container) Filesystem
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
--------------------------------------------------------------------- | |
| The CASC (Content Addressable Storage Container) Filesystem | | |
| Warlords of Draenor Alpha, Build 6.0.1.18125 | | |
| Written April 14th, 2014 by Caali | | |
| Version 1.2 | | |
--------------------------------------------------------------------- | |
Distribution and reproduction of this specification are allowed without | |
limitation, as long as it is not altered. Quotation in other works is | |
freely allowed, as long as the source and author of the quote are stated. | |
0) Introduction | |
CASC is the new file format currently used by Blizzard for the | |
Heroes of the Storm and WoW: Warlords of Draenor Alphas. | |
It is - according to them - more versatile and less bloated than | |
the previously used MPQ format. Also, as of now, it entirely | |
lacks encryption features and supports only one file compression | |
algorithm. File integrity of important files that must not be | |
modified (such as map files) is guaranteed by a file in the root | |
folder called 'signaturefile'. It seems that there is no longer | |
a way to determine the name of all files contained - unlike MPQs, | |
CASC files do not seem to contain a '(listfile)' containing all | |
file names in plain text. | |
There are still several unknown values in my analysis. Using what | |
I've found, one should be able to locate and extract any fully | |
downloaded file from the archives. | |
My research does not cover on-demand downloading or partially | |
downloaded files in any way - these might bring sense into some of | |
the unknowns. | |
Also, only files in 'data/data/' are covered. I have not yet figured | |
out what the files in 'data/config/' and 'data/indices/' are used | |
for - extracting game files is totally possible without them. | |
I appreciate any feedback on my analysis and hope that the remaining | |
unknowns can be figured out by someone based on what I have found. | |
0.1) Terms used | |
- [LE]: Given data is stored using little endianess | |
- [BE]: Given data is stored using big endianess | |
0.2) BlizzHash | |
Besides MD5, a version of Bob Jenkins' hash (from now on called BlizzHash) | |
is used (confer http://burtleburtle.net/bob/c/lookup3.c). | |
The hash value consists of two 4 byte parts A and B; | |
sometimes both, sometimes only one of them is used. Data is hashed | |
in 12 byte long blocks - but remaining bytes are considered as well. | |
The actual implementation can be seen in 'BlizzHash.cpp'. | |
!! Please note that HashA and HashB need to be initialized to 0 !! | |
!! for each hash you want to generate !! | |
---------------------------------------------------------------------- | |
1) data.XXX Files | |
1.1) Summary | |
All data.XXX files (from now on called 'data files') | |
are simple file containers. They contain bare files | |
only, no indexing information. All contained files are | |
stored as a whole sequential binary block with a short | |
header (30 bytes). Due to technical limitations (confer | |
the IDX file addressing scheme), no data file can be larger | |
than 2^30 = 1,073,741,824 bytes. However, there can be | |
up to 2^10 = 1,024 data files resulting in a total addressable | |
amount of 1TiB of compressed data (including overhead). | |
1.2) File format | |
1.2.1) Global format | |
[Block] [Block] [Block] .... [Block] | |
1.2.2) [Block] format | |
16 Byte - MD5 Checksum (not sure of what, probably the header?) | |
4 Byte - Size of entire block (including above checksum) in bytes [LE] | |
10 Byte - Unknown | |
[Data] - Length: Size of entire block minus 30 bytes | |
---------------------------------------------------------------------- | |
2) BLTE files | |
2.1) Summary | |
BLTE files are (as of now) the only files contained within data files. | |
They contain the actual game files - optionally split to chunks, each | |
of which can be compressed (and probably downloaded) individually. | |
2.2) File format | |
2.2.1) Global format | |
4 Byte - 'BLTE' magic | |
4 Byte - CompressedDataOffset in bytes [BE] | |
if(CompressedDataOffset > 0) | |
{ | |
2 Byte - Unknown, [BE] probably | |
2 Byte - Number of chunks [BE] | |
for(Number of chunks) | |
{ | |
4 Byte - Compressed chunk size in bytes [BE] | |
4 Byte - Decompressed chunk size in bytes [BE] | |
16 Byte - Chunk MD5 Checksum (of compressed chunk data I think) | |
} | |
for(Number of chunks) | |
{ | |
[CompressedDataBlock] | |
} | |
} | |
else | |
{ | |
[CompressedDataBlock] | |
} | |
2.2.2) [CompressedDataBlock] format | |
1 Byte - Compression type of [Data] (Currently using - 'N': No compression, 'Z': Deflate (ZLib)) | |
[Data] - Length: Compressed size minus 1 byte | |
---------------------------------------------------------------------- | |
3) *.idx Files | |
3.1) Summary | |
IDX files associate the first 9 bytes of a MD5 'File Key' (more on that later) | |
with the respective data file number (data.XXX) and the offset in | |
bytes within said data file pointing to the beginning of the [Block] | |
containing the file. The filename appears to consist of the hexadecimal | |
representation of a byte and 4 bytes [LE] glued together: | |
1 Byte - File number (there are always files 0x00 .. 0x0F) | |
4 Byte - Version number | |
It appears that old versions are kept during an update to be able to | |
recover data in case of data corruption. It is sufficient to load only | |
the IDX file with the highest version number of a given file number, | |
however one version of all file numbers must be loaded. | |
3.2) File format | |
3.2.1) Global format | |
Header 1: | |
4 Byte - Header2 Length (in bytes) [LE] | |
4 Byte - Header2 BlizzHash checksum (only A) [LE] | |
[PADDING] - 0x00 up to position (8 + Header2 Length + 0x0F) & 0xFFFFFFF0 | |
Header 2: | |
4 Byte - Data Length (in bytes) [LE] | |
4 Byte - Data BlizzHash checksum (only A) [LE] | |
(Data is hashed in 18 byte blocks!) | |
Data: | |
for(Data Length / 18) | |
{ | |
18 Byte - [Block] | |
} | |
[PADDING] - 0x00 up to position (Data Length + 0x0FFF) & 0xFFFFF000 | |
[Unknown] | |
3.2.2) [Block] format | |
9 Byte - First 9 bytes of File Key | |
1 Byte - high byte of indexing information | |
4 Byte - low bytes of indexing information [BE] | |
4 Byte - File size in bytes [LE] | |
The indexing information is calculated as follows: | |
Data file number = (high byte << 2) | ((low bytes & 0xC0000000) >> 30) | |
Data file offset = (low bytes & 0x3FFFFFFF) | |
---------------------------------------------------------------------- | |
4) Special Files contained in the data files | |
4.1) Summary | |
Within the data files, there are 4 special files that are used for filesystem | |
administration and are no game files. Subsequently, they are not referenced | |
by the IDX files, but instead (through their data file MD5 checksum) by the | |
build config files (more on that later). | |
4.2) The "encoding" file | |
4.2.1) Summary | |
Given the MD5 hash of the _content_ (hence the name CASC) of a file, it is | |
used to determine the File Key, which can then be used to find the actual | |
file in the data files using the IDX files. While the File Key seems to be | |
a MD5 hash as well, I have not yet found out of what data. For the sole | |
purpose of extracting data, this is irrelevant, though. | |
Once extracted from the data file and the BLTE container, its format | |
is as follows: | |
4.2.2) File format | |
Header: | |
2 Byte - Locale (?) 'EN' | |
1 Byte - Unknown | |
1 Byte - Unknown | |
1 Byte - Unknown | |
2 Byte - Unknown, [BE] probably | |
2 Byte - Unknown, [BE] probably | |
4 Byte - Hash Table Size [BE] | |
4 Byte - Unknown, [BE] probably | |
1 Byte - Unknown | |
4 Byte - Hash Table Offset in bytes [BE] | |
Unknown: | |
Set of plain text, zero-terminated ASCII strings up until | |
[Hash Table Offset] bytes from end of header | |
Hash Table Checksum Block: | |
for(Hash Table Size) | |
{ | |
16 Byte - File content MD5 of the first entry in the hash block below | |
16 Byte - MD5 Checksum (probably of the hash block below?) | |
} | |
Hash Table Blocks: | |
for(Hash Table Size) | |
{ | |
while( 2 Byte != 0 ) | |
{ | |
4 Byte - File size in bytes [BE] | |
16 Byte - File content MD5 | |
16 Byte - File key | |
} | |
28 Byte - 0x00 Padding after last entry | |
} | |
4.3) The "root" file | |
4.3.1) Summary | |
It associates a BlizzHash of a game file's full name with the MD5 checksum | |
of its content - which can then be used to obtain the file key using the | |
encoding file, and so on. It solely consists of multiple Data Blocks until | |
its end. Once extracted from the data file and the BLTE container, its format | |
is as follows: | |
4.3.2) File format | |
4.3.2.1) Global format | |
[Data Block] [Data Block] [Data Block] .... [Data Block] | |
4.3.2.2) [Data Block] format | |
Header: | |
4 Byte - Number of Root Entries [LE] | |
4 Byte - Unknown, [LE] probably | |
4 Byte - Unknown, [LE] probably | |
Unknown: | |
for(Number of Root Entries) | |
{ | |
4 Byte - Unknown, [LE] probably | |
} | |
Data: | |
for(Number of Root Entries) | |
{ | |
16 Byte - File content MD5 | |
4 Byte - File full name's BlizzHash (B) [LE] | |
4 Byte - File full name's BlizzHash (A) [LE] | |
} | |
4.4) The "download" file | |
4.4.1) Summary | |
This is one of the four special files in the data files. | |
4.5) The "install" file | |
4.5.1) Summary | |
This is one of the four special files in the data files. | |
---------------------------------------------------------------------- | |
5) The build configuration files | |
5.1) Summary | |
Those files reside in 'data/config/'. Their filename is just a MD5 | |
Hash, though I don't know yet of what data. | |
They are managed a folder structure following the first 2 bytes of | |
their MD5 Hash, for example a build configuration file called | |
'806f4fd265de05a9b328310fcc42eed0' can be found in: | |
data/config/ | |
| 80/ | |
| | 6f/ | |
| | | 806f4fd265de05a9b328310fcc42eed0 | |
Those files are plain-text files with a format similar to ini files | |
containing information about a client build. They follow a scheme | |
like 'varname = varvalue'; '#' is used for comments. Currently, | |
the following variables are specified: | |
5.2) Specified variables | |
Name Information Example | |
build-name The build's name 'WOW-18125patch6.0.1' | |
build-playbuild-installer The installer to use 'ngdptool_casc2' | |
build-product The product name 'WoW' | |
build-uid The build's UID 'wow_beta' | |
download The MD5 hash of the 'download' file | |
encoding The MD5 hash of the 'encoding' file | |
install The MD5 hash of the 'install' file | |
root The MD5 hash of the 'root' file | |
Please note that there can be more than one MD5 hash given for the 4 | |
special files. In that case, the last one seems to specify the one to use. | |
---------------------------------------------------------------------- | |
6) The CDN (Content Distribution Network) configuration file | |
6.1) Summary | |
This file resides in 'data/config/' and follows the same filename and | |
folder structure scheme of build configuration files described in 5). | |
They also share the same file format. It contains references to all | |
available build configuration files, archive groups, archives and patch | |
archives located in 'data/indices/'. | |
6.2) Specified variables | |
Name Information | |
archive-group The archive group file in 'data/indices/' | |
archives All archive files in 'data/indices/', separated by a space | |
builds All build configuration files in 'data/config/', separated by a space | |
patch-archives All patch archive files in 'data/indices/', separated by a space | |
---------------------------------------------------------------------- | |
7) Procedure when reading from the CASC filesystem | |
7.1) Initialization | |
- Load CDN configuration file and find build configuration file to use | |
- Load build configuration file | |
- Load IDX files and store the data in an associative container: | |
(First 9 bytes of File Key -> Indexing information) | |
- Locate and extract encoding file from the data files using the MD5 found in build configuration file | |
- Load encoding file and store the data in an associative container: | |
(File content MD5 -> File Key) | |
- Locate and extract root file from the data files using the MD5 found in build configuration file | |
- Load root file and store the data in an associative container: | |
(BlizzHash of full file name -> File content MD5) | |
!! Note that there can be multiple root entries !! | |
!! for a given BlizzHash, but only one is valid !! | |
!! (not sure how to determine this yet) !! | |
7.2) Locating and reading a file | |
- Make filename all-uppercase and convert '/' to '\' | |
- Generate BlizzHash of filename | |
- Find correct (or just try all you've found..) root entry for BlizzHash | |
- Using the file content's MD5 from the root entry, find File Key from encoding file | |
- Using the File Key, find Indexing information from IDX files | |
- Using the Indexing information, locate and extract the BLTE container of the file | |
- Extract file from above mentioned BLTE container | |
---------------------------------------------------------------------- | |
8) Acknowledgements | |
- TOM_RUS and justMaku for their research on the data.XXX and BLTE file formats | |
(https://github.com/tomrus88/BLTEExtractor) | |
(https://github.com/justMaku/blte) | |
---------------------------------------------------------------------- | |
9) Changelog | |
v1.0 | |
- Initial release | |
v1.1 | |
(thanks to TOM_RUS for his feedback) | |
- FileTable renamed to "encoding" file | |
- Manifest File renamed to "root" file | |
- Noted that BlizzHash is actually Bob Jenkins' hash | |
- Added information about the "download" and "install" files | |
v1.2 | |
- Analysis of the 'data/config/' folder |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment