Analysis Files

When rekordbox analyzes tracks there is some data that is too big to fit in the database itself. We have already seen some of that (the album art images, and of course the track audio is left in the filesystem as well). The other analysis data is organized into “anlz” files, whose path can be found in the DeviceSQL string pointed to by index 14 in the string offsets found at the end of the corresponding track row. These files have names like ANLZ0001.DAT and their structure is described in this section.

The files are “tagged type” files, where there is an overall file header section, and then each entry in the file has its own header which identifies the type and length of that section.

Later player hardware added support for things like colored and more-detailed waveforms. Apparently these were deemed too large to fit in the .DAT files (probably due to memory limitations of the older players downloading those files), so another file was introduced, which shares the same base filename as the .DAT file, but uses an extension of .EXT instead. Both kinds of file share the same structure, but different sets of tags can be found in each.

Analysis File Header

For some reason the analysis files store their numbers in big-endian byte order, the opposite of the export.pdb database file. Field names used in the byte field diagrams match the IDs assigned to them in the Kaitai Struct specification, unless that is too long to fit, in which case a subscripted abbreviation is used, and the text will mention the actual struct field name.

The file itself starts with the four-character code PMAI that identifies its format. This file format identifier is followed by a four-byte value, len_header (at bytes 04-07), that specifies the length of the file header in bytes. This is followed by another four-byte value, len_file, at bytes 08-0b, that specifies the length of the whole file in bytes:

[Byte field diagram: PMAI at bytes 00-03, len_header at 04-07, len_file at 08-0b, then the rest of the file header, followed by the tagged sections.]
Analysis file structure.

The header seems to usually be 1c bytes long, though we do not yet know the purpose of any of the header values that come after len_file. After the header, the file consists of a series of tagged sections, each with their own four-character code identifying the section type, followed by a header and the section content. This overall structure is illustrated in the above diagram, and the structure of the known tag types is described next.
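As a minimal sketch of reading the three known header fields, assuming the whole file has already been read into memory, something like the following could be used. The function name parse_anlz_header and the synthetic sample bytes are illustrative only and are not part of any existing library.

```python
import struct


def parse_anlz_header(data: bytes) -> dict:
    """Read the three known fields at the start of an analysis file."""
    if len(data) < 12:
        raise ValueError("too short to hold an analysis file header")
    # ">" selects big-endian; "4s" is the four-character code and each
    # "I" is an unsigned four-byte integer.
    fourcc, len_header, len_file = struct.unpack_from(">4sII", data, 0)
    if fourcc != b"PMAI":
        raise ValueError(f"not an analysis file: {fourcc!r}")
    return {"len_header": len_header, "len_file": len_file}


# Synthetic example: a file that consists only of a 1c-byte header.
example = b"PMAI" + struct.pack(">II", 0x1C, 0x1C) + bytes(0x1C - 12)
header = parse_anlz_header(example)
print(header)
```

The header bytes after len_file are skipped rather than interpreted, since their purpose is unknown.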

Analysis File Sections

The structure of each tagged section has an “envelope” that can be understood even if the internal structure of the section is unknown, making it easy to navigate through the file looking for the section you need. This structure is very similar to the file itself, and is illustrated below.

[Byte field diagram: fourcc at bytes 00-03, len_header at 04-07, len_tag at 08-0b, followed by the tag-specific content.]
Tagged section structure.

Every section begins with a four-character code, fourcc, identifying its specific structure and content, as described in the sections below. This is followed by a four-byte value, len_header, which specifies how many bytes there are in the section header, and another four-byte value, len_tag, which specifies the length of the entire tagged section (including the header), in bytes. This value can be added to the address of the start of the tag to find the start of the next tag.

There is not much value to len_header. If you study the structure of each type of tagged section, you can see some sense of where the “header-like stuff ” ends, and “content-like stuff” begins, and this seems to line up with the value of len_header. But because there are important values in each tag’s header, and those always start immediately after len_tag, it is simply easier to ignore the value of len_header, and model the tag body as beginning at byte 0c of the tag. To show where the boundary occurs, in the diagrams that follow, values that fall inside the byte range of the header are colored yellow.
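The envelope makes it possible to walk through a file without understanding any tag bodies. The sketch below illustrates this under the assumption that the file is fully loaded into memory; iter_tags is a hypothetical helper name, and the sample file is synthetic (its tag bodies and header lengths are placeholders, not realistic content).

```python
import struct
from typing import Iterator, Tuple


def iter_tags(data: bytes) -> Iterator[Tuple[str, int, bytes]]:
    """Yield (fourcc, offset, tag_bytes) for each tagged section."""
    magic, len_header, len_file = struct.unpack_from(">4sII", data, 0)
    if magic != b"PMAI":
        raise ValueError("not an analysis file")
    end = min(len_file, len(data))
    offset = len_header  # the first tag follows the file header
    while offset + 12 <= end:
        fourcc, _tag_len_header, len_tag = struct.unpack_from(">4sII", data, offset)
        if len_tag < 12 or offset + len_tag > end:
            break  # malformed or truncated tag; stop rather than guess
        yield fourcc.decode("ascii", errors="replace"), offset, data[offset:offset + len_tag]
        offset += len_tag  # len_tag covers the whole tag, header included


# Synthetic file: a 1c-byte file header followed by two tiny tags.
tag_a = b"PPTH" + struct.pack(">II", 12, 16) + bytes(4)
tag_b = b"PVBR" + struct.pack(">II", 12, 12)
file_header = b"PMAI" + struct.pack(">II", 0x1C, 0x1C + len(tag_a) + len(tag_b)) + bytes(16)
sample = file_header + tag_a + tag_b

found = [(name, offset) for name, offset, _ in iter_tags(sample)]
print(found)
```

Note that the tag's own len_header is read but deliberately ignored, in line with the advice above.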

Beat Grid Tag

This kind of section holds a list of all beats found within the track, recording their bar position, the time at which they occur, and the tempo at that point. It is identified by the four-character code PQTZ, which may stand for “Pioneer Quantization”. It has the structure shown below. len_header is 18. The tag-specific content starts with two unknown values, although Mr. Flesniak says that unknown2 seems to always have the value 00800000.

[Byte field diagram: PQTZ at bytes 00-03, len_header at 04-07, len_tag at 08-0b, unknown1 at 0c-0f, unknown2 at 10-13, len_beats at 14-17, followed by the beat entries.]
Beat Grid tag.

len_beats at bytes 14-17 specifies the number of beats that were found in the track, and thus the number of beat entries that will be present in this section. The beat entries come next, and each has the following structure:

[Byte field diagram: beat_number (bnum) at bytes 00-01, tempo at 02-03, time at 04-07.]
Beat Grid beat.

Each beat entry is eight bytes long. It starts with beat_number, a two-byte number (abbreviated bnum in the byte field diagram above) which specifies where the beat falls within its measure, so the value is always 1, 2, 3, or 4. This is followed by a two-byte tempo value, which records the track tempo at the point where this beat occurs, in beats per minute multiplied by 100 (to allow a precision of 0.01 BPM). Finally, there is a four-byte time value, which specifies the time at which this beat would occur, in milliseconds, when playing the track at its normal speed.

As noted above, there will be as many beat entries as len_beats specifies. They continue to the end of the tag.
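The beat grid layout described above could be decoded along the following lines. This is only a sketch: the names Beat and parse_beat_grid are invented for illustration, and the sample tag is synthetic (two beats at 128.00 BPM).

```python
import struct
from typing import List, NamedTuple


class Beat(NamedTuple):
    beat_number: int  # position within the bar, 1 through 4
    bpm: float        # tempo at this beat
    time_ms: int      # time of the beat at normal playback speed


def parse_beat_grid(tag: bytes) -> List[Beat]:
    """Decode every beat entry of a PQTZ tag."""
    fourcc, _len_header, _len_tag = struct.unpack_from(">4sII", tag, 0)
    if fourcc != b"PQTZ":
        raise ValueError("not a beat grid tag")
    (len_beats,) = struct.unpack_from(">I", tag, 0x14)
    beats = []
    for i in range(len_beats):
        # Entries are eight bytes each and start at byte 18 of the tag.
        beat_number, tempo, time_ms = struct.unpack_from(">HHI", tag, 0x18 + 8 * i)
        beats.append(Beat(beat_number, tempo / 100.0, time_ms))
    return beats


# Synthetic tag holding two beats.
sample = (
    b"PQTZ"
    + struct.pack(">II", 0x18, 0x18 + 16)
    + struct.pack(">III", 0, 0x00800000, 2)
    + struct.pack(">HHI", 1, 12800, 100)
    + struct.pack(">HHI", 2, 12800, 569)
)
beats = parse_beat_grid(sample)
print(beats)
```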

Cue List Tag

This kind of section holds either a list of ordinary memory points and loops, or a list of hot cues and hot loops. It is identified by the four-character code PCOB, and has the structure shown below. len_header is 18.

Since the release of the Nexus 2 series of players, there is a newer tag available that contains more information and supports more hot cues, so you should check for that before loading this tag. See Extended (nxs2) Cue List Tag for details.

[Byte field diagram: PCOB at bytes 00-03, len_header at 04-07, len_tag at 08-0b, type at 0c-0f, unk at 10-11, lencues at 12-13, memory_count at 14-17, followed by the cue entries.]
Cue List tag.

The type value at bytes 0c-0f determines whether this section holds memory points (if type is 0) or hot cues (if type is 1). The number of cue entries present in the section is reported in lencues at bytes 12-13, and we don’t yet know the meaning of unk at bytes 10-11 or memory_count at bytes 14-17. The remainder of the section, from byte 18 through len_tag holds the cue entries themselves, with the following structure:

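[Byte field diagram: PCPT at bytes 00-03, len_header at 04-07, len_entry at 08-0b, hot_cue at 0c-0f, status at 10-13, unknown1 at 14-17, order_first (ofirst) at 18-19, order_last (olast) at 1a-1b, type (t) at 1c, unknown2 at 1d-1f, time at 20-23, loop_time at 24-27, with the rest of the entry unlabeled.]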
Cue List entry.

Each cue entry is 38 bytes long (hexadecimal, so 56 in decimal). It is structured as its own miniature tag for unknown reasons, starting with the four-character code PCPT (Pioneer Cue Point?), and its own internal four-byte len_header and len_entry values (1c and 38 respectively).

If the cue is an ordinary memory point, hot_cue at bytes 0c-0f will be zero, otherwise it identifies the number of the hot cue that this entry represents (Hot Cue A is number 1, B is 2, and so on). The status value at bytes 10-13 is an indicator of active loops; if it is zero, the entry is a regular cue point or loop. Active loops have the value 4 here.

The next four bytes have an unknown purpose, but seem to always have the value 00100000. They are followed by two two-byte values, which seem to be for sorting the cues in the proper order in some strange way. order_first at bytes 18-19 (labeled ofirst in the diagram) has the value ffff for the first cue, 0000 for the second, then 2, 3 and on. order_last at bytes 1a-1b (labeled olast) has the value 1 for the first cue, 2 for the second, and so on, but ffff for the last. It would seem that the cues could be perfectly well sorted by just one of these fields, or indeed, by their time values.

The first “non-header” field is type at byte 1c (labeled t in the diagram), and it specifies whether the entry records a simple position (if it has the value 1) or a loop (if it has the value 2). The next three bytes have an unknown purpose, but seem to always have the value 0003e8, or decimal 1000.

The value time at bytes 20-23 records the position of the cue within the track, as a number of milliseconds (representing when the cue would occur if the track is being played at normal speed). If type is 2, meaning that this cue stores a loop, then loop_time at bytes 24-27 stores the track time in milliseconds at which the player should loop back to time.

We do not know what, if anything, is stored in the remaining bytes of the cue entry.
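To make the entry layout concrete, here is a hedged sketch that extracts the fields whose meaning is known. The names Cue and parse_cue_entry are hypothetical, the unknown and sort-order fields are skipped, and the sample entry is synthetic.

```python
import struct
from typing import NamedTuple, Optional


class Cue(NamedTuple):
    hot_cue: int                 # 0 for a memory point, 1 for Hot Cue A, ...
    active_loop: bool            # True when status has the value 4
    is_loop: bool                # True when type has the value 2
    time_ms: int
    loop_time_ms: Optional[int]  # only meaningful for loops


def parse_cue_entry(entry: bytes) -> Cue:
    """Decode the known fields of one PCPT cue entry."""
    if len(entry) < 0x28:
        raise ValueError("entry too short")
    fourcc, _len_header, _len_entry = struct.unpack_from(">4sII", entry, 0)
    if fourcc != b"PCPT":
        raise ValueError("not a cue entry")
    hot_cue, status = struct.unpack_from(">II", entry, 0x0C)
    is_loop = entry[0x1C] == 2
    time_ms, loop_time_ms = struct.unpack_from(">II", entry, 0x20)
    return Cue(hot_cue, status == 4, is_loop, time_ms,
               loop_time_ms if is_loop else None)


# Synthetic entry: Hot Cue B stored as a loop from 30.0 s to 34.0 s.
sample = bytearray(0x38)
struct.pack_into(">4sII", sample, 0, b"PCPT", 0x1C, 0x38)
struct.pack_into(">I", sample, 0x0C, 2)
sample[0x1C] = 2
struct.pack_into(">II", sample, 0x20, 30000, 34000)
cue = parse_cue_entry(bytes(sample))
print(cue)
```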

Extended (nxs2) Cue List Tag

This is a variation of the Cue List Tag just described that was introduced with the Nexus 2 players to add support for more than three hot cues with custom color assignments, as well as DJ-assigned comment text for each hot cue and memory point. It also contains the information present in the standard Cue List Tag, so you only need to read one set or the other. Beat Link tries to use the extended tags if they are available, and falls back to using the older ones if they are not.

Just like the older tag, this kind of section holds either a list of ordinary memory points and loops, or a list of hot cues and hot loops. It is identified by the four-character code PCO2, and has the structure shown below. len_header is 14.

[Byte field diagram: PCO2 at bytes 00-03, len_header at 04-07, len_tag at 08-0b, type at 0c-0f, lencues at 10-11, two unknown bytes at 12-13, followed by the cue entries.]
Extended (nxs2) Cue List tag.

The type value at bytes 0c-0f determines whether this section holds memory points (if type is 0) or hot cues (if type is 1). The number of cue entries present in the section is reported in lencues at bytes 10-11, and we don’t yet know the meaning of the remaining two header bytes. The remainder of the section, from byte 14 through len_tag holds the cue entries themselves, with the following structure:

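[Byte field diagram: PCP2 at bytes 00-03, len_header at 04-07, len_entry at 08-0b, hot_cue at 0c-0f, type (t) at 10, unknown1 at 11-13, time at 14-17, loop_time at 18-1b, color_id (cid) at 1c, unknown bytes at 1d-23, loop_numerator (lnumerator) at 24-25, loop_denominator (ldenominator) at 26-27, len_comment at 28-2b, comment starting at 2c, followed by color_code (c), color_red (r), color_green (g), and color_blue (b).]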
Extended (nxs2) Cue List entry.

Each extended cue entry has a variable length. It is structured as its own miniature tag, starting with the four-character code PCP2, and its own internal four-byte len_header and len_entry values. While len_header has the fixed value 10, len_entry is needed to determine the length of the entry, so the beginning of the next one can be located.

If the cue is an ordinary memory point, hot_cue at bytes 0c-0f will be zero, otherwise it identifies the number of the hot cue that this entry represents (Hot Cue A is number 1, B is 2, and so on).

The status flag and mysterious sort order values present in the older cue list entry header are simply absent here.

The first “non-header” field is type at byte 10 (labeled t in the diagram), and it specifies whether the entry records a simple position (if it has the value 1) or a loop (if it has the value 2). The next three bytes have an unknown purpose, but seem to always have the value 0003e8, or decimal 1000.

The value time at bytes 14-17 records the position of the cue within the track, as a number of milliseconds (representing when the cue would occur if the track is being played at normal speed). If type is 2, meaning that this cue stores a loop, then loop_time at bytes 18-1b stores the track time in milliseconds at which the player should loop back to time.

Immediately after the loop time, at byte 1c is the single byte value color_id (labeled cid). This holds the color, if any, assigned to memory points and loops. If it is not zero, it is the ID of a row in the color table. Hot cues do not use this value, and have their own color information later in the entry.

The next seven bytes have an unknown purpose, but seem to have the value 00, except for the first byte which seems to have the value 01.

For entries that represent quantized automatic loops, information about the quantized loop size is found in the values loop_numerator (labeled lnumerator) at bytes 24-25 and loop_denominator (labeled ldenominator) at bytes 26-27. The numerator and denominator represent the size of the loop as a fraction of beats. So a four-beat loop would have a numerator of 4 and a denominator of 1, while a half-beat loop would have a numerator of 1 and a denominator of 2. Entries that are not loops, or that are non-quantized, manually-positioned loops, will have zeroes here. For cases where these are non-zero values, they are always positive and powers of two. If the numerator is greater than 1, the denominator will always be 1, and if the denominator is greater than 1 the numerator will always be 1.

The quantized loop information is followed by len_comment at bytes 28-2b, which contains the length, in bytes, of the comment field which immediately follows it starting at byte 2c. If len_comment has a non-zero value, comment will hold the text of the comment, encoded as a UTF-16 Big Endian string with a trailing NUL (0000) character. So the length will always be even, and (when non-zero) always at least 4 (a one-character comment followed by the trailing NUL).

Some extended cue entries are incomplete, and their len_entry indicates they end before the comment, or include the comment but end before the hot cue color information. Code that processes them needs to be prepared to handle this, and treat such partial cues as having no comment and/or hot cue color.

Immediately after comment (in other words, starting len_comment + 2c past the start of the entry, since comment begins at byte 2c) there are four one-byte values containing hot cue color information. color_code (labeled c in the diagram) appears to be a code identifying the color in which rekordbox displays the cue, by looking it up in a table. The value zero means to use the default green color which was the only color supported by older CDJs, while the values 01 through 3e identify specific colors from the various 4x4 hot cue palette grids available in rekordbox; their corresponding RGB colors can be found by looking at the findRecordboxColor static method in the Beat Link library’s CueList class. The next three bytes, color_red (labeled r), color_green (labeled g), and color_blue (labeled b), make up an RGB color specification which is similar, but not identical, to the color that rekordbox displays. We believe these are the values used to illuminate the RGB LEDs in a player that has loaded the cue. When no color is associated with the hot cue, all four of these bytes have the value 00.

We do not know what, if anything, is stored in the remaining bytes of the tag.
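Because extended cue entries can be cut short, a parser has to check len_entry before reading the comment or the hot cue color. The sketch below shows one way that might be handled; ExtendedCue and parse_extended_cue are hypothetical names, the color offset is computed from the end of the comment (which begins at byte 2c), and the sample bytes, including the color values, are made up.

```python
import struct
from typing import NamedTuple, Optional, Tuple


class ExtendedCue(NamedTuple):
    hot_cue: int
    is_loop: bool
    time_ms: int
    loop_time_ms: Optional[int]
    color_id: int
    comment: str
    rgb: Optional[Tuple[int, int, int]]


def parse_extended_cue(entry: bytes) -> ExtendedCue:
    """Decode one PCP2 entry, tolerating entries that end early."""
    fourcc, _len_header, len_entry = struct.unpack_from(">4sII", entry, 0)
    if fourcc != b"PCP2":
        raise ValueError("not an extended cue entry")
    entry = entry[:len_entry]  # never read past this entry
    if len(entry) < 0x1D:
        raise ValueError("entry too short for its fixed fields")
    (hot_cue,) = struct.unpack_from(">I", entry, 0x0C)
    is_loop = entry[0x10] == 2
    time_ms, loop_time_ms = struct.unpack_from(">II", entry, 0x14)
    color_id = entry[0x1C]
    comment, rgb = "", None
    if len(entry) >= 0x2C:  # len_comment is present
        (len_comment,) = struct.unpack_from(">I", entry, 0x28)
        comment_end = 0x2C + len_comment
        if comment_end <= len(entry):  # the comment is complete
            raw = entry[0x2C:comment_end]
            comment = raw.decode("utf-16-be", errors="replace").rstrip("\x00")
            if comment_end + 4 <= len(entry):  # color bytes are present
                code, red, green, blue = entry[comment_end:comment_end + 4]
                if code or red or green or blue:
                    rgb = (red, green, blue)
    return ExtendedCue(hot_cue, is_loop, time_ms,
                       loop_time_ms if is_loop else None,
                       color_id, comment, rgb)


# Synthetic entry: Hot Cue A at 12.345 s with the comment "Drop".
text = "Drop".encode("utf-16-be") + b"\x00\x00"
sample = bytearray(0x2C) + text + bytes([0x22, 0xE0, 0x30, 0x10])
struct.pack_into(">4sII", sample, 0, b"PCP2", 0x10, len(sample))
struct.pack_into(">I", sample, 0x0C, 1)
sample[0x10] = 1
struct.pack_into(">I", sample, 0x14, 12345)
struct.pack_into(">I", sample, 0x28, len(text))
cue = parse_extended_cue(bytes(sample))
print(cue)
```

An entry whose len_entry stops before the comment simply comes back with an empty comment and no RGB value.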

Path Tag

This kind of section holds the file path of the audio file for which the track analysis was performed. It is identified by the four-character code PPTH and has the structure shown below. len_header is 10.

[Byte field diagram: PPTH at bytes 00-03, len_header at 04-07, len_tag at 08-0b, len_path at 0c-0f, followed by path starting at byte 10.]
Path tag.

len_path at bytes 0c-0f holds the length of the file path value, which makes up the entire tag body. path, which starts at byte 10, is a UTF-16 Big Endian string with a trailing NUL (0000) character.
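Reading the path is mostly a matter of decoding UTF-16 and dropping the trailing NUL, as this small sketch suggests. parse_path_tag is an illustrative name, and the sample path is invented.

```python
import struct


def parse_path_tag(tag: bytes) -> str:
    """Return the audio file path stored in a PPTH tag."""
    fourcc, _len_header, _len_tag, len_path = struct.unpack_from(">4sIII", tag, 0)
    if fourcc != b"PPTH":
        raise ValueError("not a path tag")
    raw = tag[0x10:0x10 + len_path]
    # The stored string is UTF-16 Big Endian with a trailing NUL character.
    return raw.decode("utf-16-be").rstrip("\x00")


# Synthetic tag with a made-up path.
encoded = "/Contents/Artist/Track.mp3".encode("utf-16-be") + b"\x00\x00"
sample = b"PPTH" + struct.pack(">III", 0x10, 0x10 + len(encoded), len(encoded)) + encoded
path = parse_path_tag(sample)
print(path)
```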

VBR Tag

This kind of section has not yet been explained, but it is believed to hold an index allowing rapid seeking to particular times within variable-bit-rate tracks. (Without such a structure, it would be necessary to scan the entire file from the beginning to find a frame starting at a particular time, which would be too slow for jumping to memory points or hot cues deep within the track.) What is known of the structure is shown below. The four-character code that identifies this type of section is PVBR and len_header is 10.

[Byte field diagram: PVBR at bytes 00-03, len_header at 04-07, len_tag at 08-0b, unknown1 at 0c-0f, followed by unknown2 through the end of the tag.]
VBR tag.

Waveform Preview Tag

This kind of section holds a fixed-width monochrome preview of the track waveform, displayed above the touch strip on original nexus players, providing a bird's-eye view of the current playback position, and supporting direct needle jump to specific track sections. It is identified by the four-character code PWAV and has the structure shown below. len_header is 14.

[Byte field diagram: PWAV at bytes 00-03, len_header at 04-07, len_tag at 08-0b, len_preview at 0c-0f, unknown at 10-13, followed by data starting at byte 14.]
Waveform Preview tag.

The purpose of the header bytes 10-13 is unknown; they always seem to have the value 00100000. The waveform preview data begins at byte 14 and is 400 (decimal) bytes long. Each byte encodes one vertical pixel-wide column of the waveform preview. The height of the column is represented by the five low-order bits of the byte (so it can range from 0 to 31 pixels high), and the whiteness of the segment is represented by the three high-order bits. Segments with higher values in these three bits are drawn in a less saturated (whiter) shade of blue.
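Splitting a preview byte into its two components takes only a mask and a shift. The helper name below is hypothetical.

```python
from typing import Tuple


def decode_preview_column(value: int) -> Tuple[int, int]:
    """Split one waveform preview byte into (height, whiteness)."""
    height = value & 0x1F             # five low-order bits, 0 to 31
    whiteness = (value >> 5) & 0x07   # three high-order bits, 0 to 7
    return height, whiteness


print(decode_preview_column(0x25))  # binary 001 00101
```

A renderer would call this once for each of the 400 data bytes, drawing one column per byte.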

Tiny Waveform Preview Tag

This kind of section holds an even smaller fixed-width monochrome preview of the track waveform, which seems to be displayed on the CDJ-900. It is identified by the four-character code PWV2 but otherwise has the same structure as the larger waveform preview tags shown above. len_header is still 14, and header bytes 10-13 also seem to have the value 00100000. The waveform preview data begins at byte 14 and is 100 (decimal) bytes long. Each byte encodes one vertical pixel-wide column of the waveform preview. The height of the column is represented by the four low-order bits of the byte (so it can only range from 0 to 15 pixels high), and no other bits are used.

Waveform Detail Tag

This kind of section holds a variable-width and much larger monochrome rendition of the track waveform, which scrolls along while the track plays, giving a detailed glimpse of the neighborhood of the current playback position. Since this is potentially much larger than other analysis elements, and is not supported by older players, it is stored in the extended analysis file (with extension .EXT). It is identified by the four-character code PWV3 and has the structure shown below. len_header is 18.

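[Byte field diagram: PWV3 at bytes 00-03, len_header at 04-07, len_tag at 08-0b, len_entry_bytes at 0c-0f, len_entries at 10-13, unknown at 14-17, followed by the entries starting at byte 18.]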
Waveform Detail tag.

len_entry_bytes identifies how many bytes each waveform detail entry takes up; for this kind of tag it always has the value 1. len_entries specifies how many entries are present in the tag. Each entry represents one half-frame of audio data, and there are 75 frames per second, so for each second of track audio there are 150 waveform detail entries. The purpose of the header bytes 14-17 is unknown; they always seem to have the value 00960000. The waveform detail entries begin at byte 18. The interpretation of each byte is the same as for the Waveform Preview data.
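Since there are 150 entries per second, finding the entry that corresponds to a playback position is simple arithmetic. The following sketch assumes the position is given in milliseconds at normal playback speed; the function name is made up for illustration.

```python
ENTRIES_PER_SECOND = 150  # two half-frames for each of 75 frames per second


def detail_index_for_time(time_ms: int, len_entries: int) -> int:
    """Return the index of the waveform detail entry nearest a track time."""
    if len_entries <= 0:
        raise ValueError("tag holds no entries")
    index = time_ms * ENTRIES_PER_SECOND // 1000
    # Clamp so positions outside the analyzed audio still map to a valid entry.
    return min(max(index, 0), len_entries - 1)


print(detail_index_for_time(2000, 1000))
```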

Waveform Color Preview Tag

This kind of section holds a fixed-width color preview of the track waveform, displayed above the touch strip on nexus 2 players, providing a bird's-eye view of the current playback position, and supporting direct needle jump to specific track sections. It is also used in rekordbox itself. This is stored in the extended analysis file (with extension .EXT). It is identified by the four-character code PWV4 and has the structure shown below. len_header is 18.

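[Byte field diagram: PWV4 at bytes 00-03, len_header at 04-07, len_tag at 08-0b, len_entry_bytes at 0c-0f, len_entries at 10-13, unknown at 14-17, followed by the entries starting at byte 18.]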
Waveform Color Preview tag.

len_entry_bytes identifies how many bytes each waveform preview entry takes up; for this kind of tag it always has the value 6. len_entries specifies how many entries are present in the tag. The purpose of the header bytes 14-17 is unknown. The waveform color preview data begins at byte 18 and is 7,200 (decimal) bytes long, representing 1,200 columns of waveform preview information.

The color waveform preview entries are the most complex of any of the waveform tags. See the protocol analysis document for the details.

Waveform Color Detail Tag

This kind of section holds a variable-width and much larger color rendition of the track waveform, introduced with the nexus 2 line (and also used in rekordbox), which scrolls along while the track plays, giving a detailed glimpse of the neighborhood of the current playback position. This is stored in the extended analysis file (with extension .EXT). It is identified by the four-character code PWV5 and has the structure shown below. len_header is 18.

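[Byte field diagram: PWV5 at bytes 00-03, len_header at 04-07, len_tag at 08-0b, len_entry_bytes at 0c-0f, len_entries at 10-13, unknown at 14-17, followed by the entries starting at byte 18.]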
Waveform Color Detail tag.

len_entry_bytes identifies how many bytes each waveform detail entry takes up; for this kind of tag it always has the value 2. len_entries specifies how many entries are present in the tag. Each entry represents one half-frame of audio data, and there are 75 frames per second, so for each second of track audio there are 150 waveform detail entries. The purpose of the header bytes 14-17 is unknown; they may always have the value 00960305. The color waveform detail entries begin at byte 18.

Color detail entries are much simpler than color preview entries. They consist of three-bit red, green, and blue components and a five-bit height component packed into the sixteen bits of the two entry bytes. Considering each entry as a two-byte big-endian integer, the red component is the three high-order bits. The next three bits are the green component, followed by the three bits of blue intensity, and finally five bits of height. The two low-order bits do not seem to be used. This is shown below:

[Bit field diagram: red in bits f-d, green in bits c-a, blue in bits 9-7, height in bits 6-2, and bits 1-0 unused (zero).]
Waveform Color Detail segment bits.
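The bit packing just described can be unpacked with a few shifts and masks. This is a sketch with an invented function name; the sample value was built by hand to hold red 5, green 3, blue 1, and height 20.

```python
from typing import Tuple


def decode_color_detail(entry: bytes) -> Tuple[int, int, int, int]:
    """Split a two-byte color detail entry into (red, green, blue, height)."""
    value = int.from_bytes(entry[:2], "big")
    red = (value >> 13) & 0x07     # three high-order bits
    green = (value >> 10) & 0x07
    blue = (value >> 7) & 0x07
    height = (value >> 2) & 0x1F   # five bits; the two lowest bits are unused
    return red, green, blue, height


print(decode_color_detail(bytes.fromhex("acd0")))
```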

Song Structure Tag

This kind of section was originally used only in rekordbox Performance Mode, but starting with rekordbox version 6 it also gets exported to external media so CDJ-3000 players can use it to control lighting looks. The section is identified by the four-character code PSSI and has the structure shown below. len_header is 20, and as always len_tag is the length of the entire tag including the header. Many thanks to Michael Ganss for contributing this analysis.

The version that rekordbox 6 exports is garbled with an XOR mask to make it more difficult to access the data. All bytes after lene (bytes 10-11) are XOR-masked with a pattern that is generated by adding the value of lene to each byte of the following base pattern:

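CB E1 EE FA E5 EE AD EE E9 D2 E9 EB E1 E9 F3 E8 E9 F4 E1

[Byte field diagram: PSSI at bytes 00-03, len_header at 04-07, len_tag at 08-0b, len_entry_bytes at 0c-0f, len_entries (lene) at 10-11, mood at 12-13, unknown at 14-19, end_beat (end) at 1a-1b, unknown (unk2) at 1c-1d, bank at 1e, unknown (u3) at 1f, followed by the entries starting at byte 20.]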
Song Structure tag.
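The XOR masking could be undone roughly as follows. Several details here are assumptions not spelled out in the text above: that the addition wraps around modulo 256, that the 19-byte pattern repeats cyclically, that it applies from byte 12 through the end of the tag, and that the caller already knows the tag is masked. The function name is hypothetical.

```python
BASE_MASK = bytes.fromhex(
    "CB E1 EE FA E5 EE AD EE E9 D2 E9 EB E1 E9 F3 E8 E9 F4 E1"
)


def xor_song_structure(tag: bytes) -> bytes:
    """Apply (or remove) the XOR mask on everything after len_entries."""
    len_entries = int.from_bytes(tag[0x10:0x12], "big")
    # Assumption: the sum is truncated to a single byte.
    mask = bytes((b + len_entries) & 0xFF for b in BASE_MASK)
    # Assumption: the mask repeats for as long as the tag continues.
    body = bytes(v ^ mask[i % len(mask)] for i, v in enumerate(tag[0x12:]))
    return tag[:0x12] + body


# Synthetic tag: header fields only, two all-zero entries of 18 bytes each.
sample = bytearray(0x20 + 2 * 0x18)
sample[0:4] = b"PSSI"
sample[4:8] = (0x20).to_bytes(4, "big")
sample[8:12] = len(sample).to_bytes(4, "big")
sample[0x0C:0x10] = (0x18).to_bytes(4, "big")
sample[0x10:0x12] = (2).to_bytes(2, "big")

masked = xor_song_structure(bytes(sample))
restored = xor_song_structure(masked)
print(restored == bytes(sample))
```

Because XOR is its own inverse, the same function both masks and unmasks, which is why applying it twice returns the original bytes.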

len_entry_bytes identifies how many bytes each phrase entry takes up; so far it always has the value 18, so each entry takes twenty-four (decimal) bytes. len_entries at bytes 10-11 (labeled lene in the diagram) specifies how many entries are present in the tag. Each entry represents one recognized phrase.

The value mood at bytes 12-13 specifies the overall type of phrase structure that rekordbox chose to represent the song, based on its analysis of the audio.

The value 1 is a “high” mood where the phrase types consist of “Intro”, “Up”, “Down”, “Chorus”, and “Outro”. Other values in each phrase entry cause the intro, chorus, and outro phrases to have their labels subdivided into styles “1” or “2” (for example, “Intro 1”), and “up” is subdivided into style “Up 1”, “Up 2”, or “Up 3”. See the table below for an expanded version of this description.

The value 2 is a “mid” mood where the phrase types are labeled “Intro”, “Verse 1” through “Verse 6”, “Chorus”, “Bridge”, and “Outro”.

And value 3 is a “low” mood where the phrase types are labeled “Intro”, “Verse 1”, “Verse 2”, “Chorus”, “Bridge”, and “Outro”. There are three different phrase type values for each of “Verse 1” and “Verse 2”, but rekordbox makes no distinction between them.

The purpose of the header bytes 14-19 is unknown. end_beat at bytes 1a-1b (labeled end in the diagram) holds the beat number at which the last recognized phrase ends. The track may continue beyond this, but will mostly be silence from then on.

The purpose of the header bytes 1c-1d is also unknown.

bank at byte 1e identifies the stylistic bank which has been assigned to the track by the user in Lighting mode. The value zero means the user has made no assignment, and this is treated the same as if “Cool” has been chosen. The values and their meanings are listed in the table below.

The final byte of the header also has an unknown purpose.

The phrase entries begin at byte 20, and each has the structure shown below:

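[Byte field diagram: index at bytes 00-01, beat at 02-03, kind at 04-05, k1 at 07, k2 at 09, b at 0b, beat2 at 0c-0d, beat3 at 0e-0f, beat4 at 10-11, k3 at 13, fill at 15, beatfill at 16-17; bytes 06, 08, 0a, 12, and 14 are unlabeled.]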
Song Structure entry.

The first two bytes of each song structure entry hold index, which numbers each phrase, starting at one and incrementing with each entry. That is followed by beat, a two-byte value that specifies the beat at which this phrase begins in the track. It continues until either the beat number of the next phrase, or the beat identified by end in the tag header if this is the last entry.

kind at bytes 04-05 specifies what kind of phrase rekordbox has identified here. The interpretation depends on the value of mood in the tag header, as is detailed in the table below. In the case of the “high” mood, there are numbered variations for some of the phrases displayed in rekordbox that are not reflected in kind, but depend on the values of three flag bytes k1 through k3 at bytes 07, 09, and 13 in a complicated way shown in its own table. Our best guess as to the reasons behind this is that the design of the lighting feature changed after the first release, and they struggled to maintain backwards compatibility.

We also noticed that when mood, kind and the k flags indicate a phrase of type “Up 3”, additional beat numbers (which all fall within the phrase) are present in the entry. These may indicate points within the phrase at which lighting changes would look good; more investigation is required to make sense of them. The number of beats that will be listed seems to depend on the value of the flag b at byte 0b: if this has the value 0, there will be a single beat found in beat2 at bytes 0c-0d, and if b has the value 1 there will be three different beat numbers present, with increasing values, in beat2, beat3 at bytes 0e-0f, and beat4 at bytes 10-11.

fill at byte 15 is a flag that indicates whether there are fill (non-phrase) beats at the end of the phrase. If it is non-zero, then beatfill at bytes 16-17 holds the beat number at which the fill begins. When fill-in is present, it is indicated in rekordbox by little dots on the full waveform. The manual says:

[Fill in] is a section that provides improvisational changes at the end of phrase. [Fill in] is detected at the end of Intro, Up, and Chorus (up to 4 beats).
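Pulling the described fields out of a single phrase entry might look like the sketch below. It assumes the entry bytes have already been unmasked if necessary; Phrase and parse_phrase are illustrative names, and the sample entry is synthetic.

```python
import struct
from typing import NamedTuple, Optional


class Phrase(NamedTuple):
    index: int
    beat: int
    kind: int
    k1: int
    k2: int
    k3: int
    b: int
    fill_beat: Optional[int]  # None when the fill flag is zero


def parse_phrase(entry: bytes) -> Phrase:
    """Decode the known fields of one song structure entry."""
    if len(entry) < 0x18:
        raise ValueError("entry too short")
    index, beat, kind = struct.unpack_from(">HHH", entry, 0)
    k1, k2, b = entry[0x07], entry[0x09], entry[0x0B]
    k3, fill = entry[0x13], entry[0x15]
    (beat_fill,) = struct.unpack_from(">H", entry, 0x16)
    return Phrase(index, beat, kind, k1, k2, k3, b,
                  beat_fill if fill else None)


# Synthetic entry: phrase 3 starts at beat 65, with a fill starting at beat 93.
sample = bytearray(0x18)
struct.pack_into(">HHH", sample, 0, 3, 65, 2)
sample[0x09] = 1   # k2
sample[0x15] = 1   # fill flag
struct.pack_into(">H", sample, 0x16, 93)
phrase = parse_phrase(bytes(sample))
print(phrase)
```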

Table 1. Phrase labels in each mood.
Phrase ID   Low Label   Mid Label   High Label
1           Intro       Intro       Intro n
2           Verse 1     Verse 1     Up n
3           Verse 1     Verse 2     Down
4           Verse 1     Verse 3     -
5           Verse 2     Verse 4     Chorus n
6           Verse 2     Verse 5     Outro n
7           Verse 2     Verse 6     -
8           Bridge      Bridge      -
9           Chorus      Chorus      -
10          Outro       Outro       -

Table 2. High mood phrase variants.
Phrase ID   k1   k2   k3   Expanded Label
1           1    -    -    Intro 1
1           0    -    -    Intro 2
2           -    0    0    Up 1
2           -    0    1    Up 2
2           -    1    0    Up 3
3           -    -    -    Down
5           1    -    -    Chorus 1
5           0    -    -    Chorus 2
6           1    -    -    Outro 1
6           0    -    -    Outro 2
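The two tables above can be combined into a single label lookup, sketched here. One assumption deserves emphasis: the code takes k1 to be the flag that distinguishes the Intro, Chorus, and Outro variants, and k2 and k3 together to select the Up variant. The function and table names are hypothetical, and combinations not listed in the tables yield None.

```python
from typing import Optional

LOW_LABELS = {1: "Intro", 2: "Verse 1", 3: "Verse 1", 4: "Verse 1",
              5: "Verse 2", 6: "Verse 2", 7: "Verse 2",
              8: "Bridge", 9: "Chorus", 10: "Outro"}
MID_LABELS = {1: "Intro", 2: "Verse 1", 3: "Verse 2", 4: "Verse 3",
              5: "Verse 4", 6: "Verse 5", 7: "Verse 6",
              8: "Bridge", 9: "Chorus", 10: "Outro"}
# High mood: phrases whose variant is chosen by k1 (assumed column).
HIGH_K1_NAMES = {1: "Intro", 5: "Chorus", 6: "Outro"}
# High mood "Up" variants, keyed by (k2, k3) (assumed columns).
HIGH_UP_LABELS = {(0, 0): "Up 1", (0, 1): "Up 2", (1, 0): "Up 3"}


def phrase_label(mood: int, kind: int,
                 k1: int = 0, k2: int = 0, k3: int = 0) -> Optional[str]:
    """Return the label for a phrase, or None if it is not in the tables."""
    if mood == 3:  # low
        return LOW_LABELS.get(kind)
    if mood == 2:  # mid
        return MID_LABELS.get(kind)
    if mood == 1:  # high
        if kind in HIGH_K1_NAMES and k1 in (0, 1):
            return f"{HIGH_K1_NAMES[kind]} {1 if k1 == 1 else 2}"
        if kind == 2:
            return HIGH_UP_LABELS.get((k2, k3))
        if kind == 3:
            return "Down"
    return None


print(phrase_label(1, 2, k2=1, k3=0))
```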

Table 3. Track banks.
Bank ID   Label
0         Default (treated as Cool)
1         Cool
2         Natural
3         Hot
4         Subtle
5         Warm
6         Vivid
7         Club 1
8         Club 2