This is the technical specification for the MMH format, which is not yet programmed. You can also see a text-only version, which may be less up-to-date than this HTML version.

MMH - MIDI-MOD Hybrid music format


Both MIDI and MOD have their strengths and weaknesses.

Probably MIDI's main strength is that it is very "raw"... primarily just a series of commands and time intervals. This makes it sound great if you have really high-end equipment to accept those commands, and if the MIDI you have was designed for that hardware. If not, well, you can settle for General MIDI. You might still get something that sounds good, but it will not have any "special effects" or exotic-sounding instruments. At least the file size tends to be very small. In any case, making good MIDIs is no walk in the park and so far I haven't seen a free editor (but then, I haven't looked either.)

As for MODs, S3Ms, XMs, and the like, they are a totally different concept. You store the instrument samples in the file along with a number of "patterns" which can be represented on the screen as a grid: the rows are the channels and the columns are the different spots in time where you can start a note (mind you, the rows and columns may be reversed). It's somewhat limiting, but it lures you with the miscellaneous special effects and what-not. I imagine it would seem pretty bizarre to someone who's done nothing but sheet music.

My entry to the fray will probably seem a bit premature, since I don't have the time or resources to put all the features that you find in some formats, like IT. But it is a good compromise between simplicity and power. Plus, the player is free, as is the editor... for now, anyway. Perhaps the biggest draw for programmers and such is that the documentation on the format is exceptional--not to boast, or anything. I have always been frustrated at how badly other formats were documented, and that is part of the reason I made my own format.

As in a MOD file, the MMH file contains a number of "patterns". But MMH patterns are variable-length, and can be started at any position in the file. Furthermore, the tempo of a pattern can be (but doesn't have to be) set independently of other patterns in the file. There are no "channels". You can have as many notes play at the same time as you want (of course, in the interest of keeping processor usage down, you should restrain yourself.) Also, there are no "divisions" per se like in a MOD file, but there IS a minimum note length: the length and position of a note are specified in terms of 1/64 notes*. A beat, which is displayed in the music editor as a round black circle with a line sticking up on the right side, is defined as a 16/64 note--in other words, a quarter note.

* Actually, the length and position of a note are specified in terms of "number of units of time". I have arbitrarily decided to call the basic time unit a sixty-fourth note. Likewise, I have arbitrarily decided to make a quarter note equal to 16 units of time. Throughout all my documents I have called the basic unit of time a 1/64 note. There is one case where you can use half-units of time: when specifying boundary offsets. In this case I have used the term "1/128 note". The default length of this basic unit of time, measured in units of 10 microseconds (1/100 millisecond), is specified in the header for the MMH file.

What about instruments? Like a MIDI, instrument samples are normally stored outside of the file (in a library). However, like a MOD, you may also put custom samples in the file, giving them an ID number of your choosing.
Effects? I came up with a few basic ones, such as frequency and volume sliding, and several amplitude modifiers (mainly for vibrato effects, but also fading out an instrument.)

There are a few things in the MMH format that actually do not affect how it is played: firstly, you can have lyrics. Secondly, each pattern may have a time signature (only the top number, since the bottom is fixed at four)--but the editor does not force you to keep notes within measure boundaries, so the measures are for cosmetic benefit only. Thirdly, each pattern may have a key signature, but it doesn't affect how the music is played--only how it is displayed. Individual notes are not stored in the file as a letter and a step value (sharp, flat or natural), nor are they stored as a position on a staff; rather, they are stored as a simple number representing the number of half-steps up from A in the lowest octave (27.5 Hz). For example, the number 2 represents A-sharp or B-flat--because they are really the same tone, no distinction is made between them in the file. The editor uses the key signature to help decide how a piece should be displayed--whether a note should be shown as sharp or flat, and where to use accidental symbols.

Here is a conceptual layout for a hypothetical MMH music file:

Patterns---Name-----------------Tempo*---Length-----------------------------
Pattern 0 Basic Percussion      144 bpm   12 beats
Pattern 1 Verse 1               144 bpm   72 beats
Pattern 2 Chorus                144 bpm   36 beats
Pattern 3 Verse 2               144 bpm   72 beats
Pattern 4 End-verse 1 fill      144 bpm   12 beats
Pattern 5 Chorus start drums    144 bpm   24 beats
Pattern 6 End-verse 2 fill      144 bpm   12 beats
Pattern 7 Lyrics                 72 bpm  108 beats 

* Actually, the tempo is stored on the timeline so that individual instances of the pattern can be played at different rates.

-------------------------------Timeline-------------------------------------
[......Pattern 1.......][Pattern 2.][......Pattern 3.......][Pattern 2.]
    [P0][P0][P0][P0][P4][P5]        [P0][P0][P0][P0][P0][P6][P5]
[..............................Pattern 7...............................]
0:00----0:10----0:20----0:30----0:40----0:50----1:00----1:10----1:20----1:30

Please note that the vertical arrangement of the patterns on the timeline is also cosmetic: it would sound the same if the percussion was above the tune.

And for good measure, here's what the hypothetical pattern 0 might look like:

-------------------------------Timeline---------------------------------------
      [..132...]     [..131...]     [..132...]     [..131..]      [128]    [131]
[128..]     [128.][129.]      [128..]     [128.][129.]         [131] [128]
   [129..]     [130.]            [129..]     [130.]         [131]       [128]
0-----1-----2-----3-----4-----5-----6-----7-----8-----9-----10----11----12----

Where the numbers on the timeline represent percussion instruments stored elsewhere in the file. Of course, there can be more information in a note than just an instrument number, but there wasn't enough room to fit all that on the time line. :)

So, are you interested in the specifics now?

The MMH file consists of five main parts:

  1. The header
  2. The pattern list
  3. The timeline for the piece
  4. The patterns, in order by index (most important part of the file)
  5. The instrument information and instrument samples (either the largest or smallest part of the file, depending on whether you have custom instruments).

Except for the header, which must come first, the parts do not have to be in order. In fact, the MMH Classes I've written put the patterns before the pattern list, since it happens to be more convenient.

Now for the details. Before I go into the five main parts, I must describe the format of a "note". A lot of people will think that a note is a single tone with a pitch, instrument number, volume and length. In the MMH format, a note could be that simple, but it might be even simpler or much more complicated. It's obvious how it could get more complicated: by adding effects or panning. But how could it be simpler, you ask? After all, every note has a pitch, volume, instrument and length, and some have more properties that make them distinct, such as panning and special effects.

That's because, in addition to ordinary notes, you can also put in "null notes" which are not played but can set defaults--a default instrument and volume, for instance--which are used for all notes from that point forward in the pattern. Both kinds of notes are described below. To really take advantage of this feature, you generally need to run several patterns at once, so that you can tweak the sound of the sections of your piece independently--rather like sections of an orchestra, where each section goes into its own pattern.

In order to reduce CPU load, the MMH format has special support for samples of an instrument that contain chords. You can store chord samples of an instrument AND single pitch samples under the same instrument number. Then, you can specify that a certain note is a chord, and give all the pitches in that chord. The MMH player will count the semitones between each pitch in the chord, and look in the instrument table for an instrument with the same semitone spacing.
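The spacing lookup described above can be sketched roughly in Python. This is only an illustration: the function names and the dictionary standing in for an instrument's chord table are hypothetical, and pitches are treated as plain semitone numbers.

```python
def semitone_spacing(pitches):
    """Return the gaps, in semitones, between adjacent pitches of a chord."""
    ordered = sorted(pitches)  # tones are arranged lowest to highest
    return tuple(b - a for a, b in zip(ordered, ordered[1:]))


def find_chord_sample(chord_samples, pitches):
    """Look up a chord sample by its spacing; None means nothing matched.

    chord_samples is a hypothetical stand-in for one instrument's table,
    mapping a spacing tuple to whatever identifies the stored sample.
    """
    return chord_samples.get(semitone_spacing(pitches))
```

Because only the spacing is compared, one stored chord sample can serve the same chord shape at any root pitch.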

What if there is no null note in the pattern, yet an audible note is given that lacks vital information? There is a single null note stored in the header that gives global defaults for the entire file.

In order to increase music quality and realism, MMH also supports having multiple samples for each instrument: up to 6, in fact. There are three purposes for this. Firstly, you can store variations that are designed for higher or lower pitches. The player can automatically select the best variation for the piece. Secondly, in the case that multiple variations have the same original pitch, you can set notes to use a random variation. This could be useful in an acoustic guitar sample, for instance, where each strum sounds a bit different. Thirdly, some instruments make more than one distinct sound, or have different lengths of their sound. You can store each one as a variation, and pick the specific one you want within each note. By the way, you can also make variations of chord samples.


The note

The definition of a note is quite broad in the context of MMH: it's any event that happens at a specific time in a pattern. There are audible notes, null notes, "lyric" notes, and "reserved" notes which are designed to allow future extension of the file format.

The first byte of a note is its length, not including the first two bytes.

This is followed by one to four flag bytes. The first one is: (MSB)76543210(LSB)

Lyrics

Lyrics are the simplest type of note, so I will document them first.

Lyrics use the other bits in the first flag byte as follows:

If there is a second flags byte, it has the following fields:

If there is a third flags byte, it has the following fields:

The fourth flag byte is unused.

Just so you understand, you typically have many, many lyric "notes" in a composition if you have any at all. You typically create a separate lyric note for each word that starts on a certain beat so that the words always line up with the audible notes.

After the flag byte(s), for both lyric notes and reserved notes, is a byte that specifies the length of the data that follows. For a lyric note, this data is simply a string. For example, "Hello", would probably have a length of 5. Optionally, the string can be null-terminated. Why would you want a terminator character when a length is already specified? This allows for the possibility of extending the format: extra data could be added after the actual text. If the length byte is 0, there is no data.
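As a rough illustration of the length-prefixed data just described, a reader might be sketched like this in Python. The function name is hypothetical, and the Latin-1 text encoding is an assumption, since the format does not name a character set here.

```python
def parse_lyric_payload(data, offset=0):
    """Read the length byte and the data that follows it.

    Returns (text, extra, next_offset). Anything after an optional null
    terminator is handed back untouched as `extra`, since that space is
    left open for future extensions of the format.
    """
    length = data[offset]
    payload = data[offset + 1 : offset + 1 + length]
    text, _, extra = payload.partition(b"\0")
    # Latin-1 is an assumption; the format does not state an encoding.
    return text.decode("latin-1"), extra, offset + 1 + length
```

For instance, the bytes 05 48 65 6C 6C 6F yield the text "Hello" with no extra data, and a length byte of 0 yields an empty string.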

Audible and Null notes

Normal notes and null notes use the other bits in the first flag byte as follows:

The second flag byte contains these fields:

The third flag byte contains these fields:

The fourth flag byte is unused.

What comes after this depends on the flags. If all the flags are set, then all of the following things will be found in the note:

  1. Frequency/chord (2 to 16 bytes)
  2. Length (1 byte)
  3. Volume (1 byte)
  4. Instrument (1 byte)
  5. Amplitude vibrato/slide effect (4 bytes)
  6. Panning or panning slide (1 byte)
  7. Boundary offsets (1 byte)
  8. Frequency slide (2 bytes)

Frequency/chord (2 to 16 bytes)
This specifies all the pitches in the note. Each set of two bytes is treated as an unsigned word with the format:
(MSB)5432109876543210(LSB)

Note 1: If these bits are zero, then the MMH player plays the instrument at its original sampling rate. Typically, these bits are set to zero for percussion instruments.
Note 2: The MMH player arranges tones in a chord in order of lowest pitch to highest pitch.
Note 3: The highest octave number allowed is 8, and the highest note is G#-8, which is note number 107 at 13290 Hz.
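The figure in Note 3 can be checked with a short Python sketch. It assumes equal temperament and that note number 0 is the A at 27.5 Hz, which is the reading under which note 107 comes out at about 13290 Hz; the function name is hypothetical, and the bit layout of the pitch word is not covered here.

```python
BASE_HZ = 27.5  # A in the lowest octave


def note_to_hz(note_number):
    """Frequency of a note given as half-steps above the lowest A.

    Assumes equal temperament, with note number 0 at 27.5 Hz.
    """
    return BASE_HZ * 2.0 ** (note_number / 12.0)
```

Under these assumptions, note_to_hz(107) is roughly 13289.75 Hz, and each step of 12 doubles the frequency.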

Length (1 byte)
The whole byte represents the length. If the length is specified as zero, then the length is the exact length of the instrument sample. In other words, the sample is played once through. Otherwise, the length is specified in 1/64 notes. Thus, the maximum length is just lower than four whole notes. Hopefully there ain't many people who want longer notes than that.

Volume (1 byte)
Specifies the volume to use for the note, where 255 is the full original volume of the sample and 0 is muted. This brings up an important point: you should usually sample your instruments with maximum possible loudness, because from there you can only make it quieter.

Instrument (1 byte)
The whole byte represents the instrument number. Numbers below 128 are reserved for instruments in the standard libraries, while you can have custom instruments at or above 128.

Volume slide (2 bytes)
The two bytes are divided into .....

Volume Vibrato (3 bytes)
The first 16 bits specify the magnitude of vibrato to use at each "section" of the note: You can start with a low vibrato and move up, or go up at first and then down again. The effect is linearly interpolated.

If a null note enables vibrato, it can be disabled later in another null note by setting the third byte to zero.

Panning or panning slide (1 byte)
Bits 0-3 specify the initial note panning, while bits 4-7 specify the final panning. If you don't want the panning to slide, make these two values the same. These bits act as signed; -7 sends all sound to the left, 0 is equal volume in both channels; 7 sends all sound data to the right. -8 is an invalid value that is reset to -7 upon loading the file.

If a null note enables panning or a panning slide, it can be disabled later in another null note by setting both pannings to zero.
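The panning byte could be unpacked along these lines in Python. This sketch assumes the two 4-bit fields are two's-complement values, which fits the statement that -8 exists but is invalid; the function names are hypothetical.

```python
def _signed_nibble(value):
    """Interpret a 4-bit field as two's complement, resetting -8 to -7."""
    signed = value - 16 if value >= 8 else value
    return -7 if signed == -8 else signed


def decode_panning(byte):
    """Return (initial, final) panning, each from -7 (left) to 7 (right)."""
    initial = _signed_nibble(byte & 0x0F)        # bits 0-3
    final = _signed_nibble((byte >> 4) & 0x0F)   # bits 4-7
    return initial, final
```

A byte of 0x79 would then describe a slide from hard left (-7) to hard right (7), and 0x00 means centered with no slide.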

Boundary offsets (1 byte)
This modifies the position at which the note starts or ends in 1/128 note increments.

Frequency slide (2 bytes)
This is a difficult effect for the player to do, but I figured someone might need it. The slide occurs at the beginning of the note, and stops when it has reached the pitch specified in the frequency section.
(MSB)5432109876543210(LSB)

When playing a chord, all the pitches of the chord are slid. As if the effect wasn't tricky enough already. :) In this case, the pitch specified here corresponds to the lowest pitch of the set. For instance, if the chord was C and E together, and the start pitch was set to B, then the sliding that would occur is B to C, and E flat to E.
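The chord case can be illustrated with a small Python sketch. Pitches are plain semitone numbers (only their differences matter here), and the function name is hypothetical.

```python
def slide_start_pitches(chord, start_pitch):
    """Return the pitch each chord tone slides from.

    The start pitch applies to the lowest tone of the chord; every other
    tone is shifted by the same number of semitones.
    """
    ordered = sorted(chord)
    shift = start_pitch - ordered[0]
    return [pitch + shift for pitch in ordered]
```

Counting half-steps up from A (A=0, B=2, C=3, E flat=6, E=7), the chord [3, 7] with a start pitch of 2 gives [2, 6]: B slides to C, and E flat slides to E.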

A null note cannot use a frequency slide. Any attempt to do so is ignored.

Linked notes

Now we've got the basic format covered. If you'll remember, way, way back in this section there was a bit that talked about "linked notes". I'm going to define and describe them here.

A linked note allows you to play several samples, exactly one after another. This allows you to time things exactly. For example, you could have a basic guitar plucking sample, then follow it up with a sample of the note being stopped in an audible way (you know what I'm talking about, right?). Anyway, I think some of you creative types will be able to think of a use for this.

When a note is linked, it is stored right after the original note in the file. A linked note must be an audible note. Also, you can link another note at the end of the linked note. You can string as many samples as you want together like this.

Please note that the boundary offsets effect does not affect where linked notes start. If you have delayed the end of the original note, the linked note will still start at the same time as if you hadn't put in the delay. The result is that, for a brief moment, both the original and linked notes will be playing at the same time. Conversely, if you have used the effect to make the original note stop early, there will be a small gap between the end of the original note and beginning of the linked note.


1. The header

First Four Bytes: "MMH\0" or the numbers 0x4D, 0x4D, 0x48, 0x00 or 0x00484D4D. This brings up an important point about the numbers in this format: all numbers use the PC format (little endian), which puts the least significant byte first. A Mac player, which we might never see :( would have to swap all the appropriate bytes while loading the file. Note: the code in the MMH classes does this automatically.

Next four bytes: specifies the offset in the file of the pattern list.
Next four bytes: specifies the offset in the file of the main timeline.
Next four bytes: specifies the offset in the file of the instruments.

Next comes the "default note". The two flag bytes are not included because they are (conceptually) fixed at the following: first=0x3F second=0x40. In other words, the following settings are found, in order:

  1. Frequency/chord (2 bytes)
  2. Length (1 byte)
  3. Volume (1 byte)
  4. Instrument (1 byte)
  5. Boundary offsets (1 byte)

You may not use a chord for the default note.

Next, the default tempo is specified (2 bytes). This tempo is actually stored as a reciprocal (time per beat instead of beats per time), as described in the main timeline section of this document.
Next, the default number of beats in a measure (the time signature) is given (1 byte).

Finally, there are four null-terminated strings (maximum length: 256 characters). In order, these are:

  1. The song name
  2. The artist name
  3. Copyright notice/terms of use summary
  4. Comment (e.g. Date composed etc.)
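Putting the header fields together, a reader might be sketched in Python as below. The field and function names are hypothetical, the pitch word of the default note is kept as a raw number rather than decoded, and Latin-1 is an assumed text encoding. The "<" prefix tells struct to read least-significant-byte-first values with no padding.

```python
import struct

# Signature, three offsets, default note (pitch word, length, volume,
# instrument, boundary offsets), default tempo, beats per measure.
FIXED_PART = struct.Struct("<4sIIIHBBBBHB")


def read_cstring(data, pos):
    """Read a null-terminated string; return (text, position after it)."""
    end = data.index(b"\0", pos)
    return data[pos:end].decode("latin-1"), end + 1  # encoding is assumed


def parse_header(data):
    (magic, pattern_list_off, timeline_off, instruments_off,
     pitch_raw, length, volume, instrument, boundary,
     tempo, beats_per_measure) = FIXED_PART.unpack_from(data, 0)
    if magic != b"MMH\0":
        raise ValueError("not an MMH file")
    pos = FIXED_PART.size
    strings = []
    for _ in range(4):  # song name, artist, copyright, comment
        text, pos = read_cstring(data, pos)
        strings.append(text)
    return {
        "pattern_list_offset": pattern_list_off,
        "timeline_offset": timeline_off,
        "instruments_offset": instruments_off,
        "default_note": (pitch_raw, length, volume, instrument, boundary),
        "default_tempo": tempo,
        "beats_per_measure": beats_per_measure,
        "strings": strings,
    }
```

The fixed part comes to 25 bytes, so the song name starts at offset 25 in this reading.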

2. The pattern list.

This list contains information about all the patterns in the file. The first 16-bit word specifies the number of patterns. For each pattern, the following information is given.

  1. Offset of the pattern data in the file (4 bytes)
  2. Length of the pattern, in beats (2 bytes) (where a beat is a 16/64 note).
  3. Key signature (two bytes). This is stored like so: (MSB)54 32 10 98 76 54 32 10(LSB)
    Each set of two bits (1 and 0, 3 and 2 etc.) represent a tone on the staff: Bits 0,1=A; 3,2=B; 5,4=C; 7,6=D; 9,8=E; 11,10=F, 13,12=G; bits 15,14 are unused and should be zero.
    Each set of two bits specifies a default type for that position on the staff:

    Remember, as described in the introduction, the key signature is cosmetic only; it will not affect how the file is played.

  4. Number of beats in a measure (i.e. the time signature.) (1 Byte.) If this is zero, then the default, stored in the header, is used.
  5. Name of the pattern (fixed size of 33 bytes, NULL-terminated)
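A reader for the pattern list might look roughly like this in Python. The names are hypothetical, Latin-1 is an assumed encoding for the pattern name, and the two-bit key signature codes are returned as raw numbers without interpreting them.

```python
import struct

# Offset (4), length in beats (2), key signature (2),
# beats per measure (1), name (33): 42 bytes per pattern.
PATTERN_ENTRY = struct.Struct("<IHHB33s")
TONES = "ABCDEFG"  # A is in bits 0-1, B in bits 2-3, ... G in bits 12-13


def parse_pattern_list(data, offset):
    (count,) = struct.unpack_from("<H", data, offset)
    pos = offset + 2
    patterns = []
    for _ in range(count):
        data_off, beats, key_sig, measure, name = PATTERN_ENTRY.unpack_from(data, pos)
        patterns.append({
            "offset": data_off,
            "length_beats": beats,
            # Raw 2-bit code per staff tone; meaning not interpreted here.
            "key_codes": {t: (key_sig >> (2 * i)) & 3 for i, t in enumerate(TONES)},
            "beats_per_measure": measure,  # 0 means "use the header default"
            "name": name.split(b"\0", 1)[0].decode("latin-1"),
        })
        pos += PATTERN_ENTRY.size
    return patterns
```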

3. The main timeline.

This specifies when to play which patterns. The first thing in this section is the number of entries in the timeline (2 bytes). The MMH player can calculate the length of the song based simply on when all the patterns on this list will have finished playing. A "pattern library", by the way, is an MMH file that has patterns but no entries on the timeline.

The entries on the timeline are stored one after another, and do not have to be in chronological order. Here is their format:

  1. Which pattern number to play (two bytes)
  2. The time to start the pattern (four bytes). This is given in 1/64 notes, based on the length of a 1/64 note as stored in the file header. For example, if the header specified 20ms per 1/64 note, and the number here was 1000, then the pattern would start 20 seconds into the piece.
  3. The reciprocal of the tempo, given in 1/100 milliseconds per 1/64 note (two bytes). The number must be at least 400 (or zero), which is 4ms per 1/64 note, or 256ms per whole note, which is very fast and therefore very processor-intensive. On the other hand, you can play the file as slow as you want (up to the biggest number that fits in a word, which works out to about 10.5 seconds per beat--I'll be damned if anyone wants to play a song that slow.) In order to use the default tempo from the header, simply set this to zero.
  4. An extra number (4 bytes). I added this so that the mmh file editor could store the vertical position of the timeline in the file.
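The timeline entry layout and the tempo arithmetic above can be sketched in Python as follows. All names are hypothetical; the only facts used are the field sizes, the 1/100 ms unit, and the 16 time units per beat.

```python
import struct

# Pattern number (2), start time (4), tempo reciprocal (2), extra (4).
TIMELINE_ENTRY = struct.Struct("<HIHI")
UNITS_PER_BEAT = 16  # a beat is a 16/64 note


def parse_timeline(data, offset):
    """Return a list of (pattern, start, tempo, extra) tuples."""
    (count,) = struct.unpack_from("<H", data, offset)
    return [TIMELINE_ENTRY.unpack_from(data, offset + 2 + i * TIMELINE_ENTRY.size)
            for i in range(count)]


def unit_length_ms(tempo_field, header_default):
    """Length of one 1/64 note in ms; a field of 0 selects the default."""
    value = tempo_field if tempo_field != 0 else header_default
    return value / 100.0


def beats_per_minute(tempo_value):
    """Convert a stored tempo reciprocal into beats per minute."""
    return 60000.0 / (UNITS_PER_BEAT * tempo_value / 100.0)


def start_seconds(start_units, header_default):
    """Start time of an entry, using the header's 1/64 note length."""
    return start_units * header_default / 100000.0
```

The minimum value of 400 works out to 937.5 beats per minute, the maximum of 65535 to about 10.49 seconds per beat, and a start of 1000 units at 20 ms per unit to 20 seconds, matching the figures in the list above.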

4. The patterns

Each pattern has a number of notes in it. So, the first thing that goes in each pattern is a 2-byte note count (this includes all the types of notes.) The note count does not include linked notes; in other words, several notes linked together only count as one. The next thing is two bytes that are reserved for future use, and should be zero.

After that, the pattern simply consists of a list of notes. Before each note is two bytes specifying the delay (in 1/64 notes, of course) between the beginning of the last note and the beginning of the current one. For example, if there was a quarter note followed immediately by another note, the time stored here would be 16. Even the very first note has a delay before it, so you can have some empty space at the beginning of the pattern. Linked notes do not store this delay; the two bytes are simply missing.
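Since each delay is measured from the beginning of the previous note, a note's position in the pattern is just the running total of the delays so far. A minimal Python sketch, with hypothetical function names and with the parsing of the notes themselves left out:

```python
import struct
from itertools import accumulate


def parse_pattern_header(data, offset):
    """Return (note_count, reserved) from the start of a pattern."""
    return struct.unpack_from("<HH", data, offset)


def start_positions(delays):
    """Turn per-note delays (in 1/64 notes) into positions in the pattern."""
    return list(accumulate(delays))
```

For example, delays of 0, 16, 16 and 8 place the notes at positions 0, 16, 32 and 40.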


5. The instruments.

This section begins with a variable-size table describing each of the instruments. The first byte contains the number of instruments that are in the file. Then, the instruments are listed in order from lowest ID number to highest ID number. Each instrument contains this information:

  1. Instrument number (1 byte)

    Note: Instrument number zero is special. It specifies the default instrument--the instrument that is used when an instrument requested in a note is nonexistent. If there is no default instrument, the player should still generate some kind of default sound. The MMH classes, for instance, generate a pure sine wave.

  2. Flags (1 byte):
  3. Instrument name (null terminated; maximum size is 256.)
  4. Comment (null terminated; maximum size is 256.)
  5. When an instrument is acting as an alias, one byte here specifies what it is being an alias for. However, if the instrument is not an alias, this byte is ignored.

    Note: If an instrument acting as an alias points to another instrument acting as an alias, the player holds up its hands, says "screw it", and plays the default instrument instead.

  6. This is a count of the total number of samples recorded for this instrument (1 byte). Even aliases may contain samples; however, they are not used.
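The alias rule can be sketched in Python like so. The dictionary layout and the "is_alias" and "alias_of" keys are hypothetical stand-ins for the instrument table, and sending an alias with a missing target to the default instrument is an assumption based on how nonexistent instruments are handled.

```python
DEFAULT_INSTRUMENT = 0


def resolve_instrument(instruments, number):
    """Return the instrument number whose samples should be played.

    instruments maps an instrument number to a dict with the
    hypothetical keys "is_alias" and "alias_of".
    """
    inst = instruments.get(number)
    if inst is None:
        return DEFAULT_INSTRUMENT      # requested instrument does not exist
    if not inst["is_alias"]:
        return number
    target = instruments.get(inst["alias_of"])
    if target is None or target["is_alias"]:
        return DEFAULT_INSTRUMENT      # alias of an alias: give up
    return inst["alias_of"]
```

Only one level of aliasing is followed, so the player never has to worry about alias loops.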

Finally, a set of information about each sample is given. The info is stored as laid out in the next section:

Sample information format:

  1. The number of values in the sample (4 bytes).
  2. The loop start position (4 bytes)
  3. The loop length (4 bytes)
  4. The original pitch or chord of the sample. (2 to 16 bytes). This is stored in exactly the same format as a pitch/chord in a note. The MMH player arranges tones in a chord in order of lowest pitch to highest pitch. A pitch code must be stored here even if the instrument has no pitch (i.e. bit 1 of the instrument flags byte is set); however, the pitch is unused in playing the instrument.
  5. The original sampling rate of the sample, in Hertz (2 bytes).
  6. Flags (1 byte):
  7. Compression settings (1 byte):

Sound Data

After all the samples are listed, and after all the instruments are listed, the raw sample data is listed, in order from the first sample in the first instrument to the last sample in the last instrument. Each sample is stored like this:

  1. Size of sample, in bytes (4 bytes), NOT including these four bytes. Since the size can easily be calculated using the information in the sample table, this field works like a checksum: If the calculated size doesn't match the actual size listed here, the file is corrupt. When compression type 1 is used, this size must be zero.
  2. The sample data (variable size)

Compression Formats

There are three types of compression currently supported by the MMH format. The first two, though, aren't actually compression methods.

Type 0: PCM

The CTSI for this "compression" type has two possible values:

PCM is raw, signed data.

Type 1: Player-generated data

This is only valid for chords. When this compression type is specified, there is no sound data whatsoever; the 4 bytes that specify the size of the sound data should be zero. The CTSI specifies a variation number of a one-pitch sound from which to generate the chord. If the number stored in the file is invalid, a default of zero is chosen. If the instrument has no single-pitch sounds, the default instrument is used to generate the chord.

When the MMH classes generate the chord, it is equal to or just slightly larger than the single-pitch sound from which it was made. For instance, if you are generating a chord based on a sound that was one second of mono 22kHz data, the output chord will contain either 22050 or 22050+256 point samples. The player copies the sound and uses it as the base pitch. The single-pitch sample is then sped up to reach the next note in the chord and is mixed in again. This process is repeated until the chord is complete.

Perhaps I lost you when I stuck "+256" in the above paragraph. The generated chord is created slightly larger only when the original sample is looping. When the higher pitches are added to the generated sample, phase errors can occur at the point of looping. For instance, the waveform may be at a trough at the looping start point, and at a crest at the looping end point. This causes an annoying popping noise. By adding extra data which fades between the loop end point and the loop start point, this jump in the waveform is eliminated. The size of this intermediate data is 256 point samples; hence, the +256. By the way, this added data causes the loop start point to be advanced by 256 samples.

Type 2: Adaptive PCM

My version of adaptive PCM is comparable to the one used by the SNES (Super Nintendo Entertainment System), for which I wrote an emulator once upon a time. This sound data stores, on average, 4.5 bits per sample.

Sound is stored in blocks of 16 point samples. Each block is 9 bytes: a 1-byte header, which indicates the 16-bit range of the samples, followed by 8 bytes that hold the sixteen 4-bit samples. Bits 0-3 of the header (the least-significant nibble) are a number from 0 to 15, indicating the "jump size":

#  Delta (hex)
0  0001
1  0002
2  0004
3  0008
4  0010
5  0020
6  0040
7  0080
8  0100
9  0200
A  0400
B  0800
C  1000
D  2000
E  4000
F  silence

Bits 4-7 are also a number from 0 to 15, which is a multiplier indicating the lower range boundary:

#  Range multiplier
0  -15 to 0
1  -14 to 1
2  -13 to 2
3  -12 to 3
4  -11 to 4
5  -10 to 5
6  -9 to 6
7  -8 to 7
8  -7 to 8
9  -6 to 9
A  -5 to 10
B  -4 to 11
C  -3 to 12
D  -2 to 13
E  -1 to 14
F  0 to 15

So what does all this mean? Well, let's say you have this waveform that has 16 samples that graph like this:

Value
+2800¯¯--__                          
+2000      --__                      
+1800          __                    
+1000            __                  
+0800                                
 0000..............¯¯__..............
-0800                  __            
-1000                    --          
-1800                      --        
-2000                        ¯¯----¯¯
-2800              Time              

Clearly, this waveform doesn't use the whole 16-bit spectrum: its range is within about -0x1000 to +0x3000. And even if it did, it wouldn't be important that the samples be stored with 16 bits of precision. And notice what the difference between two adjacent point samples is: never greater than about 0xA00. My algorithm uses the range in the differences between the point samples. Notice that the waveform is mainly on a downward trend, only going up once and then only about 0x400. My algorithm picks the best fit for this case by choosing a Range Multiplier of around -10 to 5 (#5), with a delta of around 0x100 (#8). Thus, any delta between -0xA00 and 0x500 can be represented. There will, of course, be rounding errors while converting to this 4-bit format, but the key is that there will hardly be enough to hear.
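To make the arithmetic in this example concrete, here is a rough Python sketch of how a 9-byte block might be decoded. Only the header fields and the delta-times-multiplier idea come from the text above. The nibble order within each data byte, the mapping of a 4-bit code to a multiplier (code plus the lower range boundary), the clamping to 16 bits, and the treatment of jump size F as sixteen zero samples are all assumptions made for illustration, and the function names are hypothetical.

```python
def header_fields(header):
    """Split a block header into (delta, lowest multiplier, highest multiplier)."""
    jump = header & 0x0F                   # bits 0-3: jump size
    mult = (header >> 4) & 0x0F            # bits 4-7: range multiplier
    delta = 0 if jump == 0xF else 1 << jump
    return delta, mult - 15, mult


def delta_range(header):
    """Smallest and largest sample-to-sample change a block can store."""
    delta, low, high = header_fields(header)
    return low * delta, high * delta


def decode_block(block, previous=0):
    """Decode one 9-byte block into 16 sample values (a sketch only)."""
    if len(block) != 9:
        raise ValueError("a block is 9 bytes")
    if block[0] & 0x0F == 0xF:
        return [0] * 16                    # assumed meaning of "silence"
    delta, low, _ = header_fields(block[0])
    samples = []
    for byte in block[1:]:
        for code in (byte & 0x0F, byte >> 4):   # assumed nibble order
            previous += (code + low) * delta     # assumed code mapping
            previous = max(-32768, min(32767, previous))
            samples.append(previous)
    return samples
```

With a header byte of 0x58 (range multiplier #5, jump size #8), delta_range returns (-0xA00, 0x500), which is the range quoted in the example.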

For a stereo sample, the samples are interleaved: first, 16 samples from the left side are given, then 16 samples from the right side are given.


Finally, the part everyone has been waiting for: EOF (End Of File)
And that just about wraps up our little document.


By the way, I was not a music composer or sound programmer before I came up with this format, although I love music. Do you think there was something I should have done differently? No? You want to offer me a job? Well anyway, my e-mail address is QwertMan@hotmail.com.

If you want to extend the file format in a way that can fit into the "room for expansion" this format already provides, please consult me so we can discuss the best way to do it and so that I can make the change official. Plus, we might get to "do lunch".

Copyright © 1999 by David Piepgrass. This document may not be modified except by the author, and this notice may not be removed. This document may be distributed freely.
