@apple1417
Last active August 30, 2022 08:49
Universal Borderlands Mod File Format

Background

The existing Borderlands mod file formats kind of suck, so let's come up with a new one.

Issues with the BLCMM file format

  • They look like xml, but have completely custom escape logic (see here)
  • Due to the above, very easy to "corrupt" files if you start getting into edge cases
  • Not the most human readable
  • Overly restrictive about what is and isn't allowed to be in a command (to try to avoid the above edge cases, though it fails)
  • Does not allow commands other than set
  • Treats exec and say as commands despite storing them in comments
  • Forces commands stored in comments to always be active
  • Duplicates all enabled commands - once in the mod structure, once at the bottom
  • Always saves files in system encoding - i.e. does not use utf8 (on Windows)
  • No built in support for AoDK, BL3, or WL
  • Very little file metadata

Issues with the BL3Hotfix file format

  • Very little structure
  • No built in supported game
  • No built in file metadata
  • No categories
  • No encoding standardization

BLIMP Tags were created to address the lack of mod file metadata, and work with both formats. They could be considered an extension of these file formats, but they're not an inherent part of them.

Program Types

There are three types of programs which interact with mod files.

Consumers

Consumers are the programs that take a mod file and apply it to the game. These include:

  • BL2/TPS/AoDK themselves, via exec commands
  • CommandExtensions
  • OpenHotfixLoader
  • BL3HM

All existing consumers parse through a mod file line by line, looking for commands. Commands are usually considered to be a line that starts with particular English words, optionally followed by some separator character(s) and arguments. The ordering of commands may be important.

Parsers

Parsers are programs which look through mod files for some more in depth data than simply the commands, but do not write back to them. These include:

  • CommandExtensions
  • TextModLoader
  • ModCabinet

Parsers must understand the underlying mod file format, but do not need to parse it perfectly: they can discard data which is irrelevant or too difficult to parse. Parsers also need to be able to tell the difference between the existing file formats, so that they can apply the appropriate logic. All existing parsers do this by looking at the initial bytes of the file.

Editors

Editors are the programs used to create mod files. These include:

  • BLCMM
  • bl3hotfixmod

Strictly speaking, editors do not also need to be parsers; they simply need to be able to output a mod file. Editors must have the most complete understanding of their mod file format, so that they write files in the correct format.

Encoding

Dealing with text encoding properly is complex. Unreal Engine uses utf16-le internally.

exec commands in BL2/TPS/AoDK treat file encodings as follows. This was tested on a machine with default system encoding cp1252, so it's possible that every reference to cp1252 below really means the system default encoding.

  • FName fields are ascii only, but accept utf16 as long as it's within that range.
  • FString fields accept ascii, utf16 and cp1252.
  • UTF8 gets interpreted as cp1252 where possible (80 -> 20AC), otherwise it's zero-padded (9D -> 009D).
  • UTF16 files must contain a BOM, but can be either endianness.
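The UTF8 behaviour in the list above can be modelled roughly as follows. This is only a sketch of the observed mapping, not the game's actual implementation, and `decode_like_exec` is a made-up name. It leans on the fact that Python's cp1252 codec leaves a handful of bytes (including 9D) undefined.

```python
def decode_like_exec(data: bytes) -> str:
    """Decode each byte as cp1252, zero-padding bytes cp1252 leaves undefined."""
    chars = []
    for byte in data:
        try:
            chars.append(bytes([byte]).decode("cp1252"))
        except UnicodeDecodeError:
            # Undefined in cp1252 (e.g. 9D): keep the byte value as the code point.
            chars.append(chr(byte))
    return "".join(chars)


print(hex(ord(decode_like_exec(b"\x80"))))  # 0x20ac
print(hex(ord(decode_like_exec(b"\x9d"))))  # 0x9d
```

Feeding it the utf8 bytes of a non-ascii character gives several unrelated characters back, which is the mojibake you'd expect to see in game.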

CommandExtensions and TextModLoader use the system default encoding.

OpenHotfixLoader treats all files as utf8, and converts them to utf16-le in memory.

Proposed Requirements

Given the above background, here are a number of proposed requirements for a new universal mod file format. These are not ordered by importance, nor are they all expected to be implemented.

  • Has magic bytes at the start to identify it
  • Contains explicit file encoding
  • Supports UTF-16
  • Based closely/entirely on existing structured data formats, such that a parser (but not necessarily an editor) can always use an existing library
  • Does not get ruined by a formatter for that format
  • Human readable/writeable
  • Minimal formatting required for structuring - minimize extra filesize
  • Expandable for future improvements
  • Has BLCMM's full category system
  • Contains explicit metadata section, which may contain arbitrary values
  • Explicit comment/command separation, so you can turn a comment into a command and vice versa
  • Allows for arbitrary commands
  • Consumers can parse commands inline
  • Individual commands have an enabled state
  • Supports commands which the editor converts into more complex ones rather than enabling inline (e.g. willow hotfixes, exec importing files)

Proposed YAML based format

Commands

The main incentive behind this format is that YAML allows multiline strings with only leading whitespace. This gives us a very easy way to enable commands for consumers inline.

- cmd: |-
    set a b c
- cmd: set x y z

This makes three assumptions:

  • Consumers accept leading whitespace - only BL3HM does not.
  • None of the YAML elements will get parsed as commands - since keys can be quoted, this narrows down to assuming commands never start with a - or a '/".
  • Editors are able to specify whether to output a single or multiline string.

It also very likely gets ruined by any formatters, striking that requirement out - though they will not corrupt a file, only mess with what's enabled.
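To see why this works, here's a rough sketch of a naive line-by-line consumer run over the snippet above. `find_commands` and the `COMMAND_WORDS` list are hypothetical, and real consumers each have their own matching rules, but the idea is the same: only the multiline string's content starts with a command word once leading whitespace is stripped.

```python
# Hypothetical command words; real consumers each have their own list.
COMMAND_WORDS = ("set", "exec", "say")


def find_commands(text):
    """Return the lines a naive line-by-line consumer would treat as commands."""
    commands = []
    for line in text.splitlines():
        stripped = line.lstrip()  # assumes the consumer accepts leading whitespace
        words = stripped.split(None, 1)
        if words and words[0].lower() in COMMAND_WORDS:
            commands.append(stripped)
    return commands


MOD = """\
- cmd: |-
    set a b c
- cmd: set x y z
"""

print(find_commands(MOD))  # ['set a b c']
```

The single line form is skipped because the line starts with `-`, which is exactly the second assumption in the list.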

Now the above format is not sufficient for parsers. Existing libraries will simply parse both elements into strings. We must therefore add an enabled tag.

- cmd: |-
    set a b c
  'enabled': true
- cmd: set x y z
  'enabled': false

Now while this isn't too much compared to BLCMM, it does add a decent amount of required structuring. We can do better. We only need to specify enabled when true - its absence implies disabled. In most use cases, only entire categories are enabled/disabled at once, which we can store very efficiently if we allow appending multiple enabled commands side by side.

- cmd: |-
    set a b c
    set d e f
    set g h i
  'enabled': true
- cmd: set x y z

We cannot allow appending disabled commands however: they need something to escape them so that consumers do not pick them up. YAML does not allow appending adjacent strings, so we can't simply wrap them in quotes. One option is to convert them into a list of (single line) strings. To avoid using the same key for two different data types, we might rename the keys, and end up with the final format.

- enabled: |-
    set a b c
    set d e f
- disabled:
  - set u v w
  - set x y z

While these are now two different data types, they require minimal extra processing by editors and parsers, just a join/split with newlines.
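As a sketch of that join/split, assuming a YAML library has already turned each element into a plain dict (the function names here are made up for illustration):

```python
def to_disabled(item):
    """Turn an enabled block into a list of single-line disabled commands."""
    return {"disabled": item["enabled"].split("\n")}


def to_enabled(item):
    """Turn a list of disabled commands back into one multiline enabled block."""
    return {"enabled": "\n".join(item["disabled"])}


block = {"enabled": "set a b c\nset d e f"}
print(to_disabled(block))  # {'disabled': ['set a b c', 'set d e f']}
```

The two functions round trip, so toggling a block on and off doesn't lose anything.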

It goes without saying that this format is also very human readable/writable.

File structure

A good number of the proposed requirements can be addressed by any structured data format, after selecting one. For completeness, here's how they'd be handled.

Magic Bytes/Encoding

We can simply require a root level element in ascii to use as the magic bytes.

'blmod':
  - ...

By forcing this to be ascii, we can trivially specify some extra magic bytes to detect wide encodings.

Starting Bytes                                       Encoding
27 62 6c 6d 6f 64 27                                 Single byte
[FF FE] 27 00 62 00 6c 00 6d 00 6f 00 64 00 27 00    2-byte little endian
[FE FF] 00 27 00 62 00 6c 00 6d 00 6f 00 64 00 27    2-byte big endian
[FF FE 00 00] 27 00 00 00 62 00 00 00 ...            4-byte little endian
[00 00 FE FF] 00 00 00 27 00 00 00 62 ...            4-byte big endian
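A parser could check for these starting bytes along the following lines. This is a minimal sketch which assumes the bracketed bytes are an optional BOM; `detect_width` and its labels are made up for illustration.

```python
import codecs

MAGIC = "'blmod'"

# (label, codec, optional BOM). The 4-byte rows come first, because the
# 4-byte little endian BOM starts with the 2-byte little endian one.
CANDIDATES = [
    ("4-byte little endian", "utf-32-le", codecs.BOM_UTF32_LE),
    ("4-byte big endian", "utf-32-be", codecs.BOM_UTF32_BE),
    ("2-byte little endian", "utf-16-le", codecs.BOM_UTF16_LE),
    ("2-byte big endian", "utf-16-be", codecs.BOM_UTF16_BE),
    ("Single byte", "ascii", b""),
]


def detect_width(data: bytes):
    """Return the table's encoding label for a file's bytes, or None."""
    for label, codec, bom in CANDIDATES:
        magic = MAGIC.encode(codec)  # these codecs never add a BOM themselves
        if data.startswith(bom + magic) or data.startswith(magic):
            return label
    return None


print(detect_width(b"'blmod':\n  - ...\n"))  # Single byte
```

Returning None lets the caller fall back to whatever detection it already uses for the existing formats.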

To tell between different encodings of the same width, we can define an encoding element, and require the document use ascii until it's defined.

'blmod':
  encoding: cp1252
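For the single byte case, a parser could handle this in two passes: find the encoding element while treating the file as raw bytes, then decode the whole file with it. The following is only a sketch. The regex is a hypothetical stand-in for real YAML parsing, and it assumes the encoding element sits on its own line.

```python
import re

# Hypothetical pattern: an optionally single-quoted `encoding` key on its own line.
ENCODING_RE = re.compile(rb"^[ \t]*'?encoding'?:[ \t]*([A-Za-z0-9_-]+)", re.MULTILINE)


def decode_single_byte(data: bytes, default: str = "ascii") -> str:
    """Decode a single-byte file using its own encoding element, if it has one."""
    match = ENCODING_RE.search(data)
    # Everything up to the element is required to be ascii, so the name is too.
    codec = match.group(1).decode("ascii") if match else default
    return data.decode(codec)


raw = b"'blmod':\n  encoding: cp1252\n  title: 5\x80 mod\n"
text = decode_single_byte(raw)
```

Here the 80 byte in the title only makes sense once cp1252 has been picked up from the element above it.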

Comments

While YAML supports comments, it's likely that some YAML libraries provide no way of keeping them, let alone accessing their value. There are still some reasonable situations where a parser might want to access them (e.g. coming up with a description if there was no metadata tag), so it's going to be better to require an explicit comment element. We can reuse the same format as disabled commands.

- comment:
  - This is a comment
  - "  This comment has leading whitespace"

Metadata

Once again we can use the same disabled command format to provide an arbitrary text block to fill with BLIMP tags.

'blmod':
  'metadata':
    - "@title My Mod"
    - "@author apple1417"
    - "@game BL2, TPS, WL"

Alternatively, we could use a mapping. YAML accepts keys with no value (which become null), so this even works for value-less tags.

'blmod':
  'metadata':
    '@title': "My Mod"
    '@author': "apple1417"
    '@tml-ignore-me':
    '@game': "BL2, TPS, WL"
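The two layouts carry the same information, so converting the list of strings into the mapping is straightforward. A rough sketch, assuming each tag appears at most once and is separated from its value by a space (`tags_to_mapping` is a made-up helper, not part of any existing tool):

```python
def tags_to_mapping(lines):
    """Turn BLIMP-style tag strings into a {tag: value} mapping."""
    mapping = {}
    for line in lines:
        tag, _, value = line.strip().partition(" ")
        mapping[tag] = value.strip() or None  # value-less tags become None
    return mapping


print(tags_to_mapping(["@title My Mod", "@tml-ignore-me", "@game BL2, TPS, WL"]))
# {'@title': 'My Mod', '@tml-ignore-me': None, '@game': 'BL2, TPS, WL'}
```

A mapping can't hold the same key twice, so if tags are allowed to repeat, the list of strings is the safer layout.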

The @ character was chosen to mark BLIMP tags to make them distinct from surrounding comments. As this is a dedicated metadata section, it's not strictly necessary. YAML does not allow an unquoted scalar to start with @, so we may want to allow omitting it, especially if using the earlier array of strings format.

The game a mod file is for is quite important, so it may make more sense to turn it into a dedicated element. This should once again allow arbitrary strings to be future compatible.

'blmod':
  'metadata':
    'title': My Mod
    'author': apple1417
  'games':
    - BL2
    - WL
    - BL53

Categories

YAML allows arbitrary keys, as long as you quote/escape them, so it would be really tempting to define categories as follows.

- enabled: |-
    set a b c
- "My cool category name: but with a colon in it":
  - enabled: |-
      set x y z

However, this restricts future expansion of the file format. Even if we prevent naming categories after the existing keys, if we had to add a new key there's no guarantee a mod file doesn't already use it somewhere.

Instead, it's going to be better to use an explicit category key. This also makes dealing with locked and/or mutually exclusive categories a lot easier.

- enabled: |-
    set a b c
- category: "My cool category name: but with a colon in it"
  locked: true
  mut: true
  contains:
    - enabled: |-
        set x y z
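With explicit keys, walking the structure is a simple recursion. Here's a sketch of collecting every enabled command, where `MOD` is the nested data a YAML library would be assumed to hand back for the example above (`iter_enabled` is a hypothetical helper):

```python
def iter_enabled(items):
    """Yield every enabled command, descending into categories in order."""
    for item in items:
        if "enabled" in item:
            yield from item["enabled"].split("\n")
        elif "category" in item:
            yield from iter_enabled(item.get("contains", []))
        # Anything else (disabled, comment, future keys) is skipped.


MOD = [
    {"enabled": "set a b c"},
    {
        "category": "My cool category name: but with a colon in it",
        "locked": True,
        "mut": True,
        "contains": [{"enabled": "set x y z"}],
    },
]

print(list(iter_enabled(MOD)))  # ['set a b c', 'set x y z']
```

Because the category name is a value rather than a key, a category called "enabled" or "contains" can't confuse the walk.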

Not Covered

  • Willow hotfixes (Oak ones are commands)
  • Exact file structure