Analysis Files
When rekordbox analyzes tracks there is some data that is too big to
fit in the database itself. We have already seen some of that (the
album art images, and of course the track audio is left in the
filesystem as well). The other analysis data is organized into “anlz”
files, whose path can be found in the DeviceSQL string pointed to by
index 14 in the string offsets found at the end of the corresponding
track row. These files have names like
ANLZ0001.DAT
and their structure is described in this section.
The files are “tagged type” files, where there is an overall file header section, and then each entry in the file has its own header which identifies the type and length of that section.
Later player hardware added support for things like colored and
more-detailed waveforms. Apparently these were deemed too large to fit
in the .DAT
files (probably due to memory limitations of the older
players downloading those files), so another file was introduced,
which shares the same base filename as the .DAT
file, but uses an
extension of .EXT
instead. Both kinds of file share the same
structure, but different sets of tags can be found in each.
Analysis File Header
For some reason the analysis files store their numbers in big-endian
byte order, the opposite of the export.pdb
database file. Field
names used in the byte field diagrams match the IDs assigned to them
in the
Kaitai
Struct specification, unless that is too long to fit, in which case a
subscripted abbreviation is used, and the text will mention the actual
struct field name.
The file itself starts with the four-character code PMAI
that
identifies its format. This file format identifier is followed a
four-byte value, len_header (at bytes 04
-07
) that
specifies the length of the file header in bytes. This is followed by
another four-byte value, len_file, at bytes 08
-0b
that
specifies the length of the whole file in bytes:
The header seems to usually be 1c
bytes long, though we do not yet
know the purpose of any of the header values that come after
len_file. After the header, the file consists of a series of
tagged sections, each with their own four-character code identifying
the section type, followed by a header and the section content. This
overall structure is illustrated in the above diagram, and the
structure of the known tag types is described next.
Analysis File Sections
The structure of each tagged section has an “envelope” that can be understood even if the internal structure of the section is unknown, making it easy to navigate through the file looking for the section you need. This structure is very similar to the file itself, and is illustrated below.
Every section begins with a four-character code, fourcc, identifying its specific structure and content, as described in the sections below. This is followed by a four-byte value, len_header, which specifies how many bytes there are in the section header, and another four-byte value, len_tag, which specifies the length of the entire tagged section (including the header), in bytes. This value can be added to the address of the start of the tag to find the start of the next tag.
There is not much value to len_header. If you study the structure
of each type of tagged section, you can see some sense of where the
“header-like stuff ” ends, and “content-like stuff” begins, and this
seems to line up with the value of len_header. But because there
are important values in each tag’s header, and those always start
immediately after len_tag, it is simply easier to ignore the value
of len_header, and model the tag body as beginning at byte 0c
of
the tag. To show where the boundary occurs, in the diagrams that
follow, values that fall inside the byte range of the header are
colored yellow.
Beat Grid Tag
This kind of section holds a list of all beats found within the track,
recording their bar position, the time at which they occur, and the
tempo at that point. It is identified by the four-character code
PQTZ
, which may stand for “Pioneer Quantization”. It has the
structure shown below. len_header is 18
. The tag-specific
content starts with two unknown values, although Mr. Flesniak says
that unknown2 seems to always have the value 00800000
.
len_beats at bytes 14
-17
specifies the number of beats
were found in the track, and thus the number of beat entries that will
be present in this section. The beat entries come next, and each has
the following structure:
Each beat entry is eight bytes long. It starts with beat_number, a two-byte number (abbreviated bnum in the byte field diagram above) which specifies where the beat falls within its measure. So the value is always 1, 2, 3, or 4. This is followed by a two-byte tempo value, which records the track tempo at the point where this beat occurs, in beats per minute multiplied by 100 (to allow a precision of BPM). Finally, there is a four-byte time value, which specifies the time at which this beat would occur, in milliseconds, when playing the track at its normal speed.
As noted above, there will be as many beat entries as len_beats specifies. They continue to the end of the tag.
Cue List Tag
This kind of section holds either a list of ordinary memory points and
loops, or a list of hot cues and hot loops. It is identified by the
four-character code PCOB
, and has the structure shown below.
len_header is 18
.
Since the release of the Nexus 2 series of players, there is a newer tag available that contains more information and supports more hot cues, so you should check for that before loading this tag. See Extended (nxs2) Cue List Tag for details. |
The type value at bytes 0c
-0f
determines whether this
section holds memory points (if type is 0) or hot cues (if type is
1). The number of cue entries present in the section is reported in
lencues at bytes 12
-13
, and we don’t yet know the
meaning of unk at bytes 10
-11
or memory_count at
bytes 14
-17
. The remainder of the section, from
byte 18
through len_tag holds the cue entries themselves,
with the following structure:
Each cue entry is 38
bytes long. It is structured as its own
miniature tag for unknown reasons, starting with the four-character
code PCPT
(Pioneer Cue Point?), and its own internal four-byte
len_header and len_entry values (1c
and 38
respectively).
If the cue is an ordinary memory point, hot_cue at
bytes 0c
-0f
will be zero, otherwise it identifies the
number of the hot cue that this entry represents (Hot Cue A is number
1, B is 2, and so on). The status value at bytes 10
-13
is an indicator of active loops; if it is zero, the entry is a regular
cue point or loop. Active loops have the value 4 here.
The next four bytes have an unknown purpose, but seem to always have
the value 00100000
. They are followed by two two-byte values, which
seem to be for sorting the cues in the proper order in some strange
way. order_first at bytes 1a
-1b
(labeled ofirst in
the diagram) has the value ffff
for the first cue, 0000
for the
second, then 2, 3 and on. order_last at bytes 1a
-1b
(labeled olast) has the value 1 for the first cue, 2 for the
second, and so on, but ffff
for the last. It would seem that the
cues could be perfectly well sorted by just one of these fields, or
indeed, by their time values.
The first “non-header” field is type at byte 1c
(labeled
t in the diagram), and it specifies whether the entry records a
simple position (if it has the value 1) or a loop (if it has the value
2). The next three bytes have an unknown purpose, but seem to always
have the value 0003e8
, or decimal 1000.
The value time at bytes 20
-23
records the position of the
cue within the track, as a number of milliseconds (representing when
the cue would occur if the track is being played at normal speed). If
type is 2, meaning that this cue stores a loop, then loop_time
at bytes 24
-27
stores the track time in milliseconds at
which the player should loop back to time.
We do not know what, if anything, is stored in the remaining bytes of the cue entry.
Extended (nxs2) Cue List Tag
This is a variation of the Cue List Tag just described that was introduced with the Nexus 2 players to add support for more than three hot cues with custom color assignments, as well as DJ-assigned comment text for each hot cue and memory point. It also contains the information present in the standard Cue List Tag, so you only need to read one set or the other. Beat Link tries to use the extended tags if they are available, and falls back to using the older ones if they are not.
Just like the older tag, this kind of section holds either a list of
ordinary memory points and loops, or a list of hot cues and hot loops.
It is identified by the four-character code PCO2
, and has the
structure shown below. len_header is 14
.
The type value at bytes 0c
-0f
determines whether this
section holds memory points (if type is 0) or hot cues (if type is
1). The number of cue entries present in the section is reported in
lencues at bytes 10
-11
, and we don’t yet know the
meaning of the remaining two header bytes. The remainder of the
section, from byte 14
through len_tag holds the cue
entries themselves, with the following structure:
Each extended cue entry has a variable length. It is structured as its
own miniature tag, starting with the four-character code PCP2
, and
its own internal four-byte len_header and len_entry values.
While len_header has the fixed value 10
, len_entry is needed to
determine the length of the entry, so the beginning of the next one
can be located.
If the cue is an ordinary memory point, hot_cue at
bytes 0c
-0f
will be zero, otherwise it identifies the
number of the hot cue that this entry represents (Hot Cue A is number
1, B is 2, and so on).
The status flag and mysterious sort order values present in the older cue list entry header are simply absent here.
The first “non-header” field is type at byte 10
(labeled
t in the diagram), and it specifies whether the entry records a
simple position (if it has the value 1) or a loop (if it has the value
2). The next three bytes have an unknown purpose, but seem to always
have the values 0003e8
, or decimal 1000.
The value time at bytes 14
-17
records the position of the
cue within the track, as a number of milliseconds (representing when
the cue would occur if the track is being played at normal speed). If
type is 2, meaning that this cue stores a loop, then loop_time
at bytes 18
-1b
stores the track time in milliseconds at
which the player should loop back to time.
Immediately after the loop time, at byte 1c
is the single
byte value color_id (labeled cid). This holds the color,
if any, assigned to memory points and loops. If it is not zero,
it is the ID of a row in the color table.
Hot cues do not use this value, and have their own color information
later in the entry.
The next seven bytes have an unknown purpose, but seem to have the
value 00
, except for the first byte which seems to have the value
01
.
For entries that represent quantized automatic loops, information
about the quantized loop size is found in the values
loop_numerator (labeled lnumerator) at bytes 24
-25
and loop_denominator (labeled ldenominator) at bytes
26
-27
. The numerator and denominator represent the size of
the loop as a fraction of beats. So a four-beat loop would have a
numerator of 4 and a denominator of 1, while a half-beat loop would
have a numerator of 1 and a denominator of 2. Entries that are not
loops, or that are non-quantized, manually-positioned loops, will have
zeroes here. For cases where these are non-zero values, they are
always positive and powers of two. If the numerator is greater than 1,
the denominator will always be 1, and if the denominator is greater
than 1 the numerator will always be 1.
The quantized loop information is followed by len_comment at bytes
28
-2b
, which contains the length, in bytes, of the
comment field which immediately follows it starting at
byte 2c
. If len_comment has a non-zero value, comment
will hold the text of the comment, encoded as a UTF-16 Big Endian
string with a trailing NUL
(0000
) character. So the length will
always be even, and (when non-zero) always at least 4 (a one character
comment followed by the trailing NUL
).
Some extended cue entries are incomplete, and their len_entry indicates they end before the comment, or include the comment but end before the hot cue color information. Code that processes them needs to be prepared to handle this, and treat such partial cues as having no comment and/or hot cue color. |
Immediately after comment (in other words, starting len_comment
+ 1c
past the start of the entry) there are four one-byte values
containing hot cue color information. color_code (labeled c in
the diagram) appears to be a code identifying the color in which
rekordbox displays the cue, by looking it up in a table. The value
zero means to use the default green color which was the only color
supported by older CDJs, while the values 01
through 3e
identify
specific colors from the various 4x4 hot cue palette grids available
in rekordbox; their corresponding RGB colors can be found by looking
at the
findRecordboxColor
static method in the Beat Link library’s CueList
class. The next
three bytes, color_red (labeled r), color_green (labeled
g), and color_blue (labeled b), make up an RGB color
specification which is similar, but not identical, to the color that
rekordbox displays. We believe these are the values used to illuminate
the RGB LEDs in a player that has loaded the cue. When no color is
associated with the hot cue, all four of these bytes have the value
00
.
We do not know what, if anything, is stored in the remaining bytes of the tag.
Path Tag
This kind of section holds the file path of the audio file for which
the track analysis was performed. It is identified by the
four-character code PPTH
and has the structure shown below.
len_header is 10
.
len_path at bytes 0c
-0f
holds the length of the file
path value, which makes up the entire tag body. path, which starts
at byte 10
, is a UTF-16 Big Endian string with a trailing NUL
(0000
) character.
VBR Tag
This kind of section has not yet been explained, but it is believed to
hold an index allowing rapid seeking to particular times within
variable-bit-rate tracks. (Without such a structure, it would be
necessary to scan the entire file from the beginning to find a frame
starting at a particular time, which would be too slow for jumping to
memory points or hot cues deep within the track.) What is known of the
structure is shown below. The four-character code that identifies this
type of section is PVBR
and len_header is 10
.
Waveform Preview Tag
This kind of section holds a fixed-width monochrome preview of the
track waveform, displayed above the touch strip on original nexus
players, providing a birds-eye view of the current playback position,
and supporting direct needle jump to specific track sections. It is
identified by the four-character code PWAV
and has the structure
shown below. len_header is 14
.
The purpose of the header bytes 10
-13
is unknown; they
always seem to have the value 00100000
. The waveform preview data
begins at byte 14
and is 400 (decimal) bytes long. Each byte
encodes one vertical pixel-wide column of the waveform preview. The
height of the column is represented by the five low-order bits of the
byte (so it can range from 0 to 31 pixels high), and the whiteness of
the segment is represented by the three high-order bits. Segments with
higher values in these three bits are drawn in a less saturated
(whiter) shade of blue.
Tiny Waveform Preview Tag
This kind of section holds an even smaller fixed-width monochrome
preview of the track waveform, which seems to be displayed on the
CDJ-900. It is identified by the four-character code PWV2
but
otherwise has the same structure as the larger waveform preview tags
shown above. len_header is still
14
, and header bytes 10
-13
also seem to have the value
00100000
. The waveform preview data begins at byte 14
and is 100
(decimal) bytes long. Each byte encodes one vertical pixel-wide column
of the waveform preview. The height of the column is represented by
the four low-order bits of the byte (so it can only range from 0 to 15
pixels high), and no other bits are used.
Waveform Detail Tag
This kind of section holds a variable-width and much larger monochrome
rendition of the track waveform, which scrolls along while the track
plays, giving a detailed glimpse of the neighborhood of the current
playback position. Since this is potentially much larger than other
analysis elements, and is not supported by older players, it is stored
in the extended analyis file (with extension .EXT
). It is identified
by the four-character code PWV3
and has the structure shown below.
len_header is 18
.
len_entry_bytes identifies how many bytes each waveform detail
entry takes up; for this kind of tag it always has the value 1.
len_entries specifies how many entries are present in the tag.
Each entry represents one
half-frame of audio
data, and there are 75 frames per second, so for each second of track
audio there are 150 waveform detail entries. The purpose of the header
bytes 14
-17
is unknown; they always seem to have the value
00960000
. The waveform detail entries begin at byte 18
. The
interpretation of each byte is the same as for the
Waveform Preview data.
Waveform Color Preview Tag
This kind of section holds a fixed-width color preview of the track
waveform, displayed above the touch strip on nexus 2 players,
providing a birds-eye view of the current playback position, and
supporting direct needle jump to specific track sections. It is also
used in rekordbox itself. This is stored in the extended analyis file
(with extension .EXT
). It is identified by the four-character code
PWV4
and has the structure shown below. len_header is 18
.
len_entry_bytes identifies how many bytes each waveform preview
entry takes up; for this kind of tag it always has the value 6.
len_entries specifies how many entries are present in the tag. The
purpose of the header bytes 14
-17
is unknown. The waveform
color preview data begins at byte 18
and is 7,200 (decimal)
bytes long, representing 1,200 columns of waveform preview
information.
The color waveform preview entries are the most complex of the waveform tags. See the protocol analysis document for the details.
Waveform Color Detail Tag
This kind of section holds a variable-width and much larger color
rendition of the track waveform, introduced with the nexus 2 line (and
also used in rekordbox), which scrolls along while the track plays,
giving a detailed glimpse of the neighborhood of the current playback
position. This is stored in the extended analyis file (with extension
.EXT
). It is identified by the four-character code PWV5
and has
the structure shown below. len_header is 18
.
len_entry_bytes identifies how many bytes each waveform detail
entry takes up; for this kind of tag it always has the value 2.
len_entries specifies how many entries are present in the tag.
Each entry represents one
half-frame of audio data,
and there are 75 frames per second, so for each second of track audio
there are 150 waveform detail entries. The purpose of the header
bytes 14
-17
is unknown; they may always have the value
00960305
. The color waveform detail entries begin at
byte 18
.
Color detail entries are much simpler than color preview entries. They consist of three-bit red, green, and blue components and a five-bit height component packed into the sixteen bits of the two entry bytes. Considering each entry as a two-byte big-endian integer, the red component is the three high-order bits. The next three bits are the green component, followed by the three bits of blue intensity, and finally five bits of height. The two low-order bits do not seem to be used. This is shown below:
Song Structure Tag
This kind of section was originally used only in rekordbox Performance
Mode, but starting with rekordbox version 6 it also gets exported to
external media so CDJ-3000 players can use it to control lighting
looks. The section is identified by the four-character code PSSI
and
has the structure shown below. len_header is 20
, and as always
len_tag is the length of the entire tag including the header. Many
thanks to Michael Ganss for contributing
this analysis.
The version that rekordbox 6 exports is garbled with an XOR mask to
make it more difficult to access the data. All bytes after lene
(bytes CB E1 EE FA E5 EE AD EE E9 D2 E9 EB E1 E9 F3 E8 E9 F4 E1 |
len_entry_bytes identifies how many bytes each phrase entry takes
up; so far it always has the value 18
, so each entry takes twenty-four bytes. len_entries at bytes 10
-11
(labeled
lene in the diagram) specifies how many entries are present in the
tag. Each entry represents one recognized phrase.
The value mood at bytes 12
-13
specifies the overall type
of phrase structure that rekordbox chose to represent the song, based
on its analysis of the audio.
The value 1 is a “high” mood where the phrase types consist of “Intro”, “Up”, “Down”, “Chorus”, and “Outro”. Other values in each phrase entry cause the intro, chorus, and outro phrases to have their labels subdivided into styes “1” or “2” (for example, “Intro 1”), and “up” is subdivided into style “Up 1”, “Up 2”, or “Up 3”. See the table below for an expanded version of this description.
The value 2 is a “mid” mood where the phrase types are labeled “Intro”, “Verse 1” through “Verse 6”, “Chorus”, “Bridge”, and “Outro”.
And value 3 is a “low” mood where the phrase types are labeled “Intro”, “Verse 1”, “Verse 2”, “Chorus”, “Bridge”, and “Outro”. There are three different phrase type values for each of “Verse 1” and “Verse 2”, but rekordbox makes no distinction between them.
The purpose of the header bytes 14
-19
is unknown.
end_beat at bytes 1a
-1b
(labeled end in the diagram)
holds the beat number at which the last recognized phrase ends. The
track may continue beyond this, but will mostly be silence from then
on.
The purpose of the header bytes 1c
-1d
is also unknown.
bank at byte 1e
identifies the stylistic bank which has
been assigned to the track by the user in Lighting mode. The value
zero means the user has made no assignment, and this is treated the
same as if “Cool” has been chosen. The values and their meanings are
listed in the table below.
The final byte of the header also has an unknown purpose.
The phrase entries begin at byte 20
, and each has the
structure shown below:
The first two bytes of each song structure entry hold index, which numbers each phrase, starting at one and incrementing with each entry. That is followed by beat, a two-byte value that specifies the beat at which this phrase begins in the track. It continues until either the beat number of the next phrase, or the beat identified by end in the tag header if this is the last entry.
kind at bytes 04
-05
specifies what kind of phrase
rekordbox has identified here. The interpretation depends on the value
of mood in the tag header, as is detailed the table below. In the
case of the “high” mood, there are numbered variations for some of the
phrases displayed in rekordbox that are not reflected in kind, but
depend on the values of three flag bytes k1 through k3 at
bytes 07
, 09
, and 13
in a complicated way shown in its
own table. Our best guess as to the reasons
behind this is that the design of the lighting feature changed after
the first release, and they struggled to maintain backwards
compatibility.
We also noticed that when mood, kind and the k flags indicate a
phrase of type “Up 3”, additional beat numbers (which all fall within
the phrase) are present in the entry. These may indicate points within
the phrase at which lighting changes would look good; more
investigation is required to make sense of them. The number of beats
that will be listed seems to depend on the value of the flag b at
byte 0b
: if this has the value 0, there will be a single beat
found in beat2 at bytes 0c
-0d
, and if b has the value
1 there will be three different beat numbers present, with increasing
values, in beat2, beat3 at bytes 0e
-0f
, and
beat4 at bytes 10
-11
.
fill at byte 15
is a flag that indicates whether there are
fill (non-phrase) beats at the end of the phrase. If it is non-zero,
then beatfill at bytes 16
-17
holds the beat number at
which the fill begins. When fill-in is present, it is indicated in
rekordbox by little dots on the full waveform. The manual says:
[Fill in] is a section that provides improvisational changes at the end of phrase. [Fill in] is detected at the end of Intro, Up, and Chorus (up to 4 beats).
Phrase ID | Low Label | Mid Label | High Label |
---|---|---|---|
1 |
Intro |
Intro |
Intro n |
2 |
Verse 1 |
Verse 1 |
Up n |
3 |
Verse 1 |
Verse 2 |
Down |
4 |
Verse 1 |
Verse 3 |
|
5 |
Verse 2 |
Verse 4 |
Chorus n |
6 |
Verse 2 |
Verse 5 |
Outro n |
7 |
Verse 2 |
Verse 6 |
|
8 |
Bridge |
Bridge |
|
9 |
Chorus |
Chorus |
|
10 |
Outro |
Outro |
Phrase ID | k1 | k2 | k3 | Expanded Label |
---|---|---|---|---|
1 |
1 |
Intro 1 |
||
1 |
0 |
Intro 2 |
||
2 |
0 |
0 |
Up 1 |
|
2 |
0 |
1 |
Up 2 |
|
2 |
1 |
0 |
Up 3 |
|
3 |
Down |
|||
5 |
1 |
Chorus 1 |
||
5 |
0 |
Chorus 2 |
||
6 |
1 |
Outro 1 |
||
6 |
0 |
Outro 2 |
Bank ID | Label |
---|---|
0 |
Default (treated as Cool) |
1 |
Cool |
2 |
Natural |
3 |
Hot |
4 |
Subtle |
5 |
Warm |
6 |
Vivid |
7 |
Club 1 |
8 |
Club 2 |