Scream Tracker 1/2 module/song format documentation

version 10.3 / 2025-05-30

this is my attempt at documenting as much information as I can about the music format of Scream Tracker (from now on referred to as ST) versions 2 and 1.

note that the ST version numbers do not conform to SemVer, but are instead treated as decimal numbers with a maximum of 2 decimal places. in other words, 2.1 is the same as 2.10, but not 2.01.

some info is not yet complete.

the references I used were:

ST 2.3 source code (private)
- file.c
- st.h
OpenMPT source code
- Load_stm.cpp
- Snd_fx.cpp
the music file in Future Crew's Slideshow I

ST1 was never released or preserved, and the only currently known publicly available v1 song is the soundtrack of Future Crew's Slideshow I. most programs that support STM/STS only support v2 (including OpenMPT!), with the exception of libxmp, which seems to support v1.10, though I don't know how accurate it is.

everything about ST1 in this document is based entirely on:

the load function in ST2.3, which can read v1 files.
the seemingly unused usave function in ST2.3, which writes v1.10 files.
the music file in Slideshow I, which is a v1.10 file.

so, take the v1 stuff with a grain of salt.

if you have any suggestions or spot any mistakes, please let me know.

-cs127

https://cs127.github.io

file structure

module/song

main header

offset	type	content
0x0000	`char[20]`	song name (not necessarily null-terminated)
0x0014	`char[8]`	program ID (not necessarily null-terminated)
0x001C	`char`	end of ID (usually 0x1A, rarely 0x02)
0x001D	`u8`	file type (1 = "song" (external samples), 2 = "module" (internal samples)) (v2.xx only, see below)
0x001E	`u8`	major version number (1 or 2)
0x001F	`u8`	minor version number (in decimal, not BCD/HCD)
0x0020		rest of the header, different between v1 and v2

v1 files are always "songs" (external samples), not modules.
modules (type 2) typically have an .stm file extension, while songs (type 1) have .sts.
the possible version numbers (obtained from combining the two bytes at 0x001E and 0x001F) are:
- 2.21 (also used for 2.24, 2.3, and later revisions of 2.2)
- 2.20
- 2.10
- 2.00
- 1.10
- 1.00
the program ID varies depending on what program was used to last save the file. common IDs are:
- !Scream!: Scream Tracker 2 (or possibly a different program that pretends to be ST2)
- PCSTV1.0: Scream Tracker 1 (including 1.10, despite the name)
- BMOD2STM: BMOD2STM (MOD to STM converter)
- WUZAMOD!: WUZAMOD (MOD to STM converter)
- SWavePro: SoundWave Pro (???)
ST1 songs seem to always have their song name set to Noname, followed by 14 spaces.
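
as a rough illustration of the layout above, here is a minimal C sketch that parses the first 32 bytes of a file. the struct and function names (stm_main_header, stm_parse_main_header, stm_version) are made up for this example and do not come from ST2 or any other program. forcing the type to 1 for v1 files is a choice made here, based on v1 files always being songs.

```c
#include <stdint.h>
#include <string.h>

/* hypothetical in-memory form of the main header (all names are illustrative) */
typedef struct
{
	char    name[21]; /* song name, with a terminator added by us */
	char    id[9];    /* program ID, with a terminator added by us */
	uint8_t id_end;   /* usually 0x1A, rarely 0x02 */
	uint8_t type;     /* 1 = song, 2 = module */
	uint8_t major;    /* 1 or 2 */
	uint8_t minor;    /* plain decimal, e.g. 21 for v2.21 */
} stm_main_header;

/* parses the first 32 bytes of a file.
   returns 0 on success, or -1 if the major version is not 1 or 2. */
int stm_parse_main_header(const uint8_t *buf, stm_main_header *h)
{
	/* neither string is guaranteed to be null-terminated in the file */
	memcpy(h->name, buf + 0x00, 20);
	h->name[20] = '\0';
	memcpy(h->id, buf + 0x14, 8);
	h->id[8] = '\0';

	h->id_end = buf[0x1C];
	h->type   = buf[0x1D];
	h->major  = buf[0x1E];
	h->minor  = buf[0x1F];

	if (h->major != 1 && h->major != 2) return -1;

	/* the type byte is v2.xx only; v1 files are always songs */
	if (h->major == 1) h->type = 1;

	return 0;
}

/* combines both version bytes into one comparable number (2.21 -> 221) */
int stm_version(const stm_main_header *h)
{
	return h->major * 100 + h->minor;
}
```

with this representation, 2.10 becomes 210 and 2.01 would become 201, so versions can be compared with plain integer comparisons.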

v2 module/song

offset	type	content
0x0020	`u8`	initial "tempo" (compound, hex in v2.21, decimal in older, see tempo)
0x0021	`u8`	number of patterns (ST2 limit is 64, practical limit is 98)
0x0022	`u8`	global volume (max 64) (unused before v2.20)
0x0023	`u8[13]`	(unused)
0x0030	`smphdr[31]`	sample headers for samples 1..31
0x0410	`u8[]`	order list (99 or 98 = end of song / empty, see below) (64 bytes in v2.00, 128 in newer)
^+n	`pattern[]`	patterns

if the file is a "module" (type 2), the data for the samples is stored after the last pattern, but they may not necessarily be contiguous, or immediately after the last pattern ends. the actual positions are determined by the position field in the corresponding sample headers.
the value 99 is used in the order list for empty orders, while either 99 or 98 can be used to mark the end of a song.
- 99 causes a jump to the beginning of the song (regular loop).
- 98 causes a jump to the order that the playback was started from (according to viiri's st2play at least).
ST2 writes the letter X (value 0x58) for the unused bytes (including the global volume byte before v2.20).
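
to make the order list rules a bit more concrete, here is a small C sketch. the function names are hypothetical, and treating the first 99 or 98 as the end of the song is an assumption about how a loader might scan the list, not something taken from ST2's code.

```c
#include <stddef.h>
#include <stdint.h>

#define STM_V2_ORDERS_OFFSET 0x0410

/* size of the order list in bytes: 64 in v2.00, 128 in newer v2 files.
   minor is the minor version byte at 0x001F. */
size_t stm_v2_order_list_size(uint8_t minor)
{
	return (minor == 0) ? 64 : 128;
}

/* file offset of the first pattern, right after the order list */
size_t stm_v2_patterns_offset(uint8_t minor)
{
	return STM_V2_ORDERS_OFFSET + stm_v2_order_list_size(minor);
}

/* number of orders before the first end marker (99 or 98).
   returns size if there is no marker at all. */
size_t stm_v2_song_length(const uint8_t *orders, size_t size)
{
	size_t i;

	for (i = 0; i < size; i++)
	{
		if (orders[i] == 99 || orders[i] == 98) break;
	}

	return i;
}
```

the 0x0410 offset follows from the table: 31 sample headers of 32 bytes each, starting at 0x0030.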

v1 song

offset	type	content
0x0020	`u16`	number of samples (last sample is ignored) (practically limited to 32 (31 + 1 ignored))
0x0022	`u16`	number of orders (ST2 limit is 128)
0x0024	`u16`	number of patterns (ST2 limit is 64, practical limit is 254)
0x0026	`u16`	audio output sample rate (ignored in ST2, probably unintentionally)
0x0028	`u8`	initial "tempo" (plain ticks per row (tpr) in v1.00, compound decimal in newer, see tempo)
0x0029	`u8`	number of channels (ST1/2 only supports 4)
0x002A	`u16`	number of rows per pattern (ST1/2 only supports 64)
0x002C	`u8[2]`	(unused)
0x002E	`u16`	size of custom data (in bytes)
0x0030	`[]`	custom data
^+n	`smphdr[]`	sample headers starting from sample 1 (last sample is ignored)
^+n	`u8[]`	orders (255 = end of song / empty, 254 = unknown, see below) (length may be limited, see below)
^+n	`pattern[]`	patterns

ST2 assumes there are no more than 128 orders, even if the "number of orders" field is larger than 128.
- additionally, a commented-out line of code suggests that in 1.00, the limit is 257 rather than 128. but since the line is commented out, you probably should not worry about this, as ST2 doesn't either.
ST2 converts order value 255 to 99, which makes sense, but it also converts 254 to 100, which is meaningless (ST2 just plays it as pattern 36, since pattern indices are effectively reduced modulo 64 due to memory segmentation).
ST1 writes the letter X (value 0x58) for both unused bytes at 0x002C.
ST1 does not write custom data, i.e. the "size of custom data" is 0.
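
the order value conversion described above is simple enough to sketch in a few lines of C. the function name is hypothetical, and this only mirrors the behaviour as described in this document.

```c
#include <stdint.h>

/* converts a v1 order value to its v2 equivalent, as ST2 is described to do */
uint8_t stm_v1_order_to_v2(uint8_t order)
{
	if (order == 255) return 99;  /* end of song / empty */
	if (order == 254) return 100; /* unknown meaning; plays as pattern 100 % 64 = 36 in ST2 */

	return order;
}
```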

sample header

offset	type	content
s+0x0000	`char[13]`	sample filename (null-terminated) (often used as song comments)
s+0x000D	`u8`	disk number (seemingly unused in v1)
s+0x000E	`u16`	data position (in paragraphs) (see below)
s+0x0010	`u16`	sample length (in sampling points / bytes)
s+0x0012	`u16`	loop start (in sampling points / bytes)
s+0x0014	`u16`	loop end (in sampling points / bytes)
s+0x0016	`u8`	default note volume (max 64)
s+0x0017	`u8`	(unused)
s+0x0018	`u16`	middle C sample rate
s+0x001A	`u8[4]`	(unused)
s+0x001E	`u16`	(unknown)

the sample is only looped if the loop end is greater than loop start, and not 0xFFFF.
for each of the unused bytes (1 byte at 0x0017 and 4 bytes at 0x001A):
- ST2 writes 0x00.
- ST1 writes the letter X (value 0x58).
in some documents, the u16 at the end is said to be the length of the sample in paragraphs. I'm not sure how true that is.

to get the sample data:

if the file is a "module" (type 2), the sample data is stored in the module file, as signed 8-bit PCM, and the data position field (at 0x000E) stores the in-file position of the data.
if the file is a "song" (type 1), ST1/2 attempts to read the data for each sample from disk, as unsigned 8-bit PCM, using the sample's filename (at 0x0000).
- in this case, the data position field does not hold a meaningful value.
- the way ST1/2 loads external samples is outside the scope of this document for now.
when a song/module is loaded in ST1/2, the sample data is stored in memory, as signed 8-bit PCM, and the data position field for each sample header in memory holds the in-memory position of the data.

in any case, the data position is in terms of paragraphs. to convert it to bytes, it should be multiplied by 16.
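
here is a minimal C sketch of reading one 32-byte sample header, checking whether the sample loops, and converting the data position to a byte offset. all names are made up for this example. the u16 fields are assumed to be little-endian, which is what you would expect from a DOS program but is not stated elsewhere in this document.

```c
#include <stdint.h>
#include <string.h>

/* hypothetical in-memory form of a sample header (unused/unknown fields skipped) */
typedef struct
{
	char     filename[14]; /* 13 bytes from the file + a terminator added by us */
	uint8_t  disk;
	uint16_t data_pos;     /* in paragraphs */
	uint16_t length;
	uint16_t loop_start;
	uint16_t loop_end;
	uint8_t  volume;       /* default note volume, max 64 */
	uint16_t c_rate;       /* middle C sample rate */
} stm_sample_header;

/* reads a little-endian u16 (the byte order is an assumption) */
uint16_t stm_read_u16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* s points to the start of a 32-byte sample header */
void stm_parse_sample_header(const uint8_t *s, stm_sample_header *h)
{
	memcpy(h->filename, s + 0x00, 13);
	h->filename[13] = '\0';

	h->disk       = s[0x0D];
	h->data_pos   = stm_read_u16(s + 0x0E);
	h->length     = stm_read_u16(s + 0x10);
	h->loop_start = stm_read_u16(s + 0x12);
	h->loop_end   = stm_read_u16(s + 0x14);
	h->volume     = s[0x16];
	h->c_rate     = stm_read_u16(s + 0x18);
}

/* looped only if loop end is greater than loop start, and not 0xFFFF */
int stm_sample_is_looped(const stm_sample_header *h)
{
	return h->loop_end > h->loop_start && h->loop_end != 0xFFFF;
}

/* paragraphs to bytes; only meaningful for modules (type 2) */
uint32_t stm_sample_data_offset(const stm_sample_header *h)
{
	return (uint32_t)h->data_pos * 16;
}
```

for example, a data position of 0x0123 paragraphs corresponds to byte offset 0x1230 in the file.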

pattern

each pattern is an array of r rows, and every row is an array of c events, with:

r being the number of rows per pattern (u16 at 0x002A in v1, always 64 in v2),
c being the number of channels (u8 at 0x0029 in v1, always 4 in v2).

pattern data supports a very simple form of compression (explained later).

general structure

when uncompressed, a pattern event consists of 4 bytes.

the general structure of a pattern event in a v2 song/module is as follows:

--------------------------------------------
BYTE        0        1        2        3
--------------------------------------------
BIT      76543210 76543210 76543210 76543210
--------------------------------------------
         oooonnnn iiiiivvv VVVVeeee pppppppp

with:

oooo being the note octave number (0..4, middle C is C2) (see below for special notes)
nnnn being the note pitch (0..11 = C..B) (see below for special notes)
iiiii being the instrument number (0 = blank cell, 1..31 = samples 1..31)
VVVVvvv being the volume cell (0..64 = 00..64, 65 = empty)
eeee being the effect type (0 = empty, 1..15 = A..O)
pppppppp being the effect parameter (0..255 = 00..FF)

the entire first byte (oooonnnn) can also have the values 254 or 255, in which case:

255 means blank cell (...)
254 means note cut (^^^, or -0- as displayed in ST2)

see effect list for more information regarding the effects.
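
as a worked example of the v2 bit layout above, here is a small C sketch that splits the 4 bytes of an uncompressed event into its fields. the struct and function names are hypothetical, and the special first-byte values (254 and 255) are deliberately not handled here.

```c
#include <stdint.h>

/* hypothetical unpacked form of a v2 pattern event */
typedef struct
{
	uint8_t octave; /* oooo */
	uint8_t pitch;  /* nnnn */
	uint8_t ins;    /* iiiii */
	uint8_t vol;    /* VVVVvvv */
	uint8_t eff;    /* eeee */
	uint8_t param;  /* pppppppp */
} stm_event_fields;

/* b points to the 4 bytes of an uncompressed v2 event */
void stm_unpack_event(const uint8_t *b, stm_event_fields *e)
{
	e->octave = b[0] >> 4;
	e->pitch  = b[0] & 0x0F;
	e->ins    = b[1] >> 3;
	/* the low 3 volume bits live in byte 1, the high 4 bits in byte 2 */
	e->vol    = (uint8_t)((b[1] & 0x07) | ((b[2] >> 4) << 3));
	e->eff    = b[2] & 0x0F;
	e->param  = b[3];
}
```

for example, the bytes 0x24 0x28 0x54 0x02 unpack to octave 2, pitch 4 (E), instrument 5, volume 40, effect 4 (D), and parameter 0x02.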

as for v1, there are the following differences:

v1.10:

the volume values 0 and 65 have swapped meaning (0 = empty, 65 = 00).

v1.00:

the octave range is 3 octaves, 0..2, equivalent to 1..3 for newer versions.
there is no volume column. effect D is "set note volume" instead of "slide note volume". the structure of the last three bytes is as follows:

-----------------------------------
BYTE        1        2        3    
-----------------------------------
BIT      76543210 76543210 76543210
-----------------------------------
         000iiiii 0000eeee pppppppp

compression

when stored in a song/module, events are generally uncompressed, with their 4 bytes written as described above. however, the following three specific types of events can be encoded using one byte in the range 251..253:

252 means an event where all cells are empty.
253 means an event where the note is note cut and the other cells are empty.
251 means an event which, when uncompressed, yields zeroes for all 4 bytes.
- this would result in:
  - note C0 (C1 if converting from v1.00 to v1.10+)
  - blank instrument number
  - volume 00, or empty volume in v1
  - empty effect

in practice, the way you read an event from a file is as follows:

read one byte.
if the byte is 255, 254, or a valid octave-pitch pair, the event is stored uncompressed.
- in this case, the byte you just read is the first byte of the event, and you can proceed to read the remaining three bytes.
if the byte is 251, 252, or 253, this byte alone is the event, compressed in the format described above.

a more detailed pseudocode example for reading events is provided further below.

effect list

the effect list is somewhat similar to S3M, but only up to (and including) effect J.

note that not all of these effects may have existed in v1. there is no way to know for sure, since ST2 does not do any effect conversions except for D in v1.00 songs.

the major difference is that there is no effect memory. i.e., if an effect seems like it will do nothing with a parameter of zero, it will in fact do nothing, rather than recalling the previous nonzero parameter.

the minor differences in comparison to S3M are clarified in the table.

value	letter	description	notes
0	`.`	empty	-
1	`A`	set tempo	see tempo.
2	`B`	override next order	unlike S3M, this alone does not cause a jump.
3	`C`	break to next order	unlike S3M, the parameter is ignored.
4	`D`	v1.10+: slide note volume, v1.00: set note volume	unlike S3M, there are no fine volume slides.
5	`E`	slide note pitch down	unlike S3M, there are no fine pitch slides.
6	`F`	slide note pitch up	unlike S3M, there are no fine pitch slides.
7	`G`	portamento	-
8	`H`	vibrato	the depth is double that of S3M.
9	`I`	tremor	0 for both digits will cause a very fast tremor.
10	`J`	arpeggio	(TODO)
11	`K`	(no-op)	-
12	`L`	(no-op)	-
13	`M`	(no-op)	-
14	`N`	(no-op)	-
15	`O`	(no-op)	-

pseudocode examples

here are C-like pseudocode examples for reading and writing pattern events.

the examples are meant to be simple, so they don't have any range checking for values.

both examples assume the program's features are on par with v2.21. so, when reading older versions, values are converted to behave correctly in v2.21, and when writing older versions, values are converted to behave correctly in those versions.

read:

#define NOTE_EMPTY 255
#define NOTE_CUT   254
#define VOL_EMPTY  65
#define EFF_EMPTY  0

byte0 = read();

// check for compression

if (byte0 == 251)
{
	// all zero bytes
	
	byte0 = byte1 = byte2 = byte3 = 0x00;
	// we won't read any more bytes,
	// but we'll process the event as if we did read them
	// and they were all 0.
}
else if (byte0 == 252)
{
	// completely empty

	event->note = NOTE_EMPTY;
	event->ins = 0;
	event->vol = VOL_EMPTY;
	event->eff = EFF_EMPTY;
	event->param = 0x00;
	// event done, move on to next event.
	continue;
}
else if (byte0 == 253)
{
	// empty with note cut

	event->note = NOTE_CUT;
	event->ins = 0;
	event->vol = VOL_EMPTY;
	event->eff = EFF_EMPTY;
	event->param = 0x00;
	// event done, move on to next event.
	continue;
}

// we're still here if the first byte was not 252 or 253.
// proceed with reading the remaining bytes,
// unless the first byte was 251, because in that case,
// we're acting as if we already read all 4 bytes and they were all 0.

if (byte0 != 251)
{
	byte1 = read();
	byte2 = read();
	byte3 = read();
}

// get note

if (byte0 != NOTE_EMPTY && byte0 != NOTE_CUT)
{
	octave = byte0 >> 4;
	pitch = byte0 & 0x0F;
	
	// compensate for smaller octave range in v1.00
	if (version == V1_00) octave++; 

	event->note = octave * 12 + pitch;
}
else
{
	event->note = byte0;
}

// get everything else

if (version > V1_00)
{
	// v1.10+

	event->ins = byte1 >> 3;
	event->vol = (byte1 & 0x07) | ((byte2 >> 4) << 3);
	event->eff = byte2 & 0x0F;
	event->param = byte3;
	
	if (version <= V1_10)
	{
		// v1.10: volume 65 (empty) and 0 are swapped
		if (event->vol == 65) event->vol = 0;
		else if (event->vol == 0) event->vol = 65;
	}
}
else
{
	// v1.00: no volume column, effect D is set note volume.

	event->ins = byte1 & 0x1F;
	event->vol = VOL_EMPTY;
	event->eff = byte2 & 0x0F;
	event->param = byte3;
	
	if (event->eff == 4) // D
	{
		// convert to volume column
		event->vol = event->param;
		event->eff = EFF_EMPTY;
		event->param = 0x00;
	}
}

// convert tempo command depending on version
// see tempo for more information

if (event->eff == 1) // A
{
	if (version >= V2_21)
	{
		// v2.21+: compound hex
	}
	else if (version >= V1_10)
	{
		// v1.10..v2.20: compound decimal
		tpr = event->param / 10;
		factor = event->param % 10;
		event->param = (tpr << 4) | factor;
	}
	else
	{
		// v1.00: simple tpr
		event->param <<= 4;
	}
}

// event done, move on to next event.

write:

// copy the event into temporary variables
// (to not modify the actual event itself during conversion)

note = event->note;
ins = event->ins;
vol = event->vol;
eff = event->eff;
param = event->param;

// check if compression to 252 or 253 is possible.

if (ins == 0 && vol == VOL_EMPTY && eff == EFF_EMPTY && param == 0x00)
{
	if (note == NOTE_EMPTY)
	{
		// completely empty
	
		write(252);
		// event done, move on to next event.
		continue;
	}
	else if (note == NOTE_CUT)
	{
		// empty with note cut
		
		write(253);
		// event done, move on to next event.
		continue;
	}
}

// we're still here if the event couldn't be compressed into 252 or 253.
// proceed with setting up the byte values.

// set up note

if (note == NOTE_EMPTY || note == NOTE_CUT)
{
	byte0 = note;
}
else
{
	octave = note / 12;
	pitch = note % 12;
	
	// compensate for smaller octave range in v1.00
	if (version == V1_00) octave--;
	
	byte0 = (octave << 4) | pitch;
}

// convert tempo command depending on version
// see tempo for more information

if (eff == 1) // A
{
	if (version >= V2_21)
	{
		// v2.21+: compound hex
	}
	else if (version >= V1_10)
	{
		// v1.10..v2.20: compound decimal
		tpr = param >> 4;
		factor = param & 0x0F;
		
		if (tpr > 9)
		{
			tpr = 9;
			warn(TEMPO_TPR_CLAMPED);
		}
		if (factor > 9)
		{
			factor = 9;
			warn(TEMPO_FACTOR_CLAMPED);
		}
		
		param = tpr * 10 + factor;
	}
	else
	{
		// v1.00: simple tpr
		if (param & 0x0F) warn(TEMPO_FACTOR_DISCARDED);
		param >>= 4;
	}
}

// set up everything else

if (version > V1_00)
{
	// v1.10+
	
	if (version <= V1_10)
	{
		// v1.10: volume 65 (empty) and 0 are swapped
		if (vol == 65) vol = 0;
		else if (vol == 0) vol = 65;
	}

	byte1 = (vol & 0x07) | (ins << 3);
	byte2 = ((vol >> 3) << 4) | eff;
	byte3 = param;
}
else
{
	// v1.00: no volume column, effect D is set note volume.
	
	byte1 = ins;
	
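	if (eff == 4) // D
	{
		// volume slide command.
		// we can't really do much other than discard it,
		// since D means *set* note volume in v1.00.
		
		warn(VOLSLIDE_DISCARDED);
		eff = EFF_EMPTY;
		// also clear the parameter, so no stale value is written
		// alongside an empty effect.
		param = 0x00;
	}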
	
	if (vol != VOL_EMPTY)
	{
		// there's a volume command, we have to either:
		// - remove it if it's unnecessary
		// - move it to the effect column if the effect column is empty
		// - remove it if we have no other option
	
		if (ins != 0 && vol == samples[ins - 1]->notevol)
		{
			// volume is the same as sample's default note volume.
			// we can safely discard it.
		}
		else if (eff == EFF_EMPTY)
		{
			// effect column is empty
			// (not checking for no-op commands or zero params,
			//  you probably should though).
			// we can put command D.
			
			eff = 4; // D
			param = vol;
		}
		else
		{
			// we have to sacrifice either the volume or effect.
			// we'll keep the effect in this example.
			
			warn(VOL_DISCARDED);
		}
	}
	
	byte2 = eff;
	byte3 = param;
}

// write the event

// if all bytes are zero, we can compress the event into a 251.

if (byte0 == 0 && byte1 == 0 && byte2 == 0 && byte3 == 0)
{
	write(251);
	// event done, move on to next event.
	continue;
}

// we're still here if we couldn't compress the event into a 251 either.
// in this case, write the 4 bytes normally.

write(byte0);
write(byte1);
write(byte2);
write(byte3);

// event done, move on to next event.

playback/behaviour

panning

ST1/2 does not support any stereo soundcards. some trackers/players may set up default panning for the channels like in MOD, but ST1/2 can only play in mono.

tempo

the following applies to both the initial tempo value in the header (0x0020 in v2, 0x0028 in v1), and the parameter of effect A.

in ST1/2, there is only one byte to control how fast the playback is.

in v1.00, the byte is simply the number of ticks per row (same as ST3's "speed").

in v1.10 and higher, the byte is divided into two digits: decimal digits before v2.21, and hex digits since v2.21 (i.e. 60 (0x3C) in v1.10..v2.20 is equivalent to 0x60 (96) in v2.21).

the upper digit is the number of ticks per row (same as ST3's "speed"),
the lower digit is merely a speed factor, roughly analogous to ST3's "tempo", but somewhat complicated. it's explained below.

from now on, we will refer to the byte (both digits as a whole) in v1.10 and higher as "v1.10+ tempo".

v1.10+ tempo calculation

to convert ST1.10+ tempo into ST3-style speed+tempo:

the ST3 speed is the same as the upper digit of the ST1.10+ tempo.
the ST3 tempo is calculated from both digits of the ST1.10+ tempo, demonstrated using pseudocode below.

the resulting ST3 tempo may not necessarily be an integer. the way this is handled varies between different programs. for example, OpenMPT rounds it to 4 decimal places, while Schism Tracker rounds it to an integer.

st1_tpr; // upper digit (same as ST3 "speed" / ticks per row)
st1_fac; // lower digit

factor_constants [] = {140, 50, 25, 15, 10, 7, 6, 4, 3, 3, 2, 2, 2, 2, 1, 1};

samples_per_tick = floor(st_mixing_rate / (50 - floor((factor_constants[st1_tpr] * st1_fac) / 16)));

if (samples_per_tick <= 0) samples_per_tick += 65536;

st3_tempo = st_mixing_rate * 5 / (samples_per_tick * 2);

note that the same tempo factor can cause different results depending on the selected mixing rate (st_mixing_rate in the pseudocode above).

OpenMPT's implementation uses 23863 as the mixing rate, which is the highest possible option in ST2 (unless ST2 was compiled with 486 support, in which case it would be 44100).
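
for reference, here is a C sketch of the tempo pseudocode above. the function names are made up, the tempo byte is assumed to already be in the compound hex form (as in v2.21), and the floor division for a negative divisor just follows the pseudocode literally (a negative divisor only happens when the upper digit is 0 and the lower digit is 6 or more).

```c
#include <stdint.h>

/* the constants from the pseudocode above, indexed by ticks per row */
static const int stm_factor_constants[16] =
	{140, 50, 25, 15, 10, 7, 6, 4, 3, 3, 2, 2, 2, 2, 1, 1};

/* integer division rounding toward negative infinity, i.e. floor(a / b) */
long stm_floor_div(long a, long b)
{
	long q = a / b;

	if (a % b != 0 && ((a < 0) != (b < 0))) q--;

	return q;
}

/* tempo is the compound hex byte (upper digit = tpr, lower digit = factor) */
long stm_samples_per_tick(uint8_t tempo, long mixing_rate)
{
	int tpr = tempo >> 4;
	int fac = tempo & 0x0F;

	/* with these constants and fac in 0..15, the divisor is never 0 */
	long divisor = 50 - (stm_factor_constants[tpr] * fac) / 16;
	long spt = stm_floor_div(mixing_rate, divisor);

	if (spt <= 0) spt += 65536;

	return spt;
}

/* the (possibly non-integer) ST3-style tempo; rounding is left to the caller */
double stm_st3_tempo(uint8_t tempo, long mixing_rate)
{
	return mixing_rate * 5.0 / (stm_samples_per_tick(tempo, mixing_rate) * 2.0);
}
```

with a mixing rate of 23863 and a tempo byte of 0x60, this gives 477 samples per tick, and an ST3 tempo of roughly 125.07.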

misc

(TBD)
