Database Exports
Background
The files written to external media by rekordbox for use in player hardware contain a wealth of information that can be used in place of queries to the remotedb server on the players. This is important because the files can be obtained from the players’ NFS servers, even if there are four players in use sharing the same media. Under those circumstances, remotedb queries are impossible. This document shares what has been learned so far about the files, and how to interpret them.
Database Exports
The starting point for finding track metadata from a player is the database export file, which can be found within rekordbox media at the following path:
/PIONEER/rekordbox/export.pdb
(If you are using the Crate Digger FileFetcher to request this file, use that path as the filePath argument, and use a mountPath value of /B/ if you want to read it from the SD slot, or /C/ to obtain it from the USB slot.)
Newer players also support an additional database with the filename exportExt.pdb in the same location, which holds a different and smaller set of table types in it.
The file is a relational database format designed to be efficiently used by very low power devices (there were deployments on 16 bit devices with 32K of RAM). Today you are most likely to encounter it within the Pioneer Professional DJ ecosystem, because it is the format that their rekordbox software uses to write USB and SD media which can be mounted in DJ controllers and used to play and mix music.
The file consists of a series of fixed size pages. The first page contains a file header which defines the page size and the locations of database tables of different types, by the index of their first page. The rest of the pages consist of the data pages for all the tables identified in the header.
Each table is made up of a series of rows which may be spread across any number of pages. The pages start with a header describing the page and linking to the next page. The rest of the page is used as a heap: rows are scattered around it, and located using an index structure that builds backwards from the end of the page. Each row of a given type has a fixed size structure which links to any variable-sized strings by their offsets within the page.
As changes are made to the table, some records may become unused, and there may be gaps within the heap that are too small to be used by other data. There is a bit map in the row index that identifies which rows are actually present. Rows that are not present must be ignored: they do not contain valid (or even necessarily well-formed) data.
The majority of the work in reverse-engineering this format was performed by Henry Betts and Fabian Lesniak, to whom I am hugely grateful.
More recently, Dominik Stolz (@voidc) figured out what was in exportExt.pdb files.
File Header
Unless otherwise stated, all multibyte numbers in the file are stored in little-endian byte order. Field names used in the byte field diagrams match the IDs assigned to them in the Kaitai Struct specification,[1] unless that is too long to fit, in which case a subscripted abbreviation is used, and the text will mention the actual struct field name.
The first page begins with the file header, shown below. The header starts with four zero bytes, followed by a four-byte integer, len_page at byte 04, that establishes the size of each page (including this first one), in bytes. This is followed by another four-byte integer, num_tables at byte 08, which reports the number of different tables that are present in the file. Each table will have a table pointer entry in the “Table pointers” section of the file header, described below, that identifies and locates the table.
The four-byte integer nextu at byte 0c has an unknown purpose, but Mr. Lesniak named it next_unused_page and said “Not used as any empty_candidate, points past the end of the file.” The four-byte integer sequence, at byte 14, was described as “Always incremented by at least one, sometimes by two or three.” I assume this means it reflects a version number that rekordbox updates when synchronizing to the exported media.
Finally, there is another series of four zero bytes, and then the header ends with the list of table pointers, which begins at byte 1c. There are as many of these as specified by num_tables, and each has the following structure:
Each Table Pointer is a series of four four-byte integers. The first, type, identifies the type of table being defined. The known table types are shown in the tables below. The second value, at byte 04 of the table pointer, was called empty_candidate by Mr. Lesniak. It may link to a chain of empty pages if the database is ever garbage collected, but this is speculation on my part.
Type | Name | Meaning |
---|---|---|
 | tracks | Track metadata: title, artist, genre, artwork ID, playing time, etc. |
 | genres | Musical genres, for reference by tracks and searching. |
 | artists | Artists, for reference by tracks and searching. |
 | albums | Albums, for reference by tracks and searching. |
 | labels | Music labels, for reference by tracks and searching. |
 | keys | Musical keys, for reference by tracks, searching, and key matching. |
 | colors | Color labels, for reference by tracks and searching. |
 | playlist_tree | Holds the hierarchical tree structure of playlists and folders grouping them. |
 | playlist_entries | Links tracks to playlists, in the right order. |
 | artwork | File paths of album artwork images. |
 | columns | Details not yet confirmed. |
 | history_playlists | Holds the list of history playlists in the History menu. |
 | history_entries | Links tracks to history playlist entries, in the right order. |
 | history | Data used by rekordbox to synchronize history playlists (not yet studied). |
The following table types are found in exportExt.pdb files:
Type | Name | Meaning |
---|---|---|
 | tags | Tags: can be assigned to tracks for the purpose of categorization. |
 | tag_tracks | Tag Tracks: holds the associations between tag ids and track ids. |
Other than the type, the two important values are first_page at byte 08 and last_page at byte 0c. These tell us how to find the table. They are page indices, where the page containing the file header has index 0, the page with index 1 begins at byte len_page, and so on. In other words, the first page of the table identified by the current table pointer can be found within the file starting at the byte len_page × first_page.
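The header layout described so far can be sketched in Python as follows. This is a minimal illustration, not Crate Digger's implementation: the function name parse_file_header and the dictionary keys are hypothetical, and the four bytes at byte 10, which the text does not describe, are read and discarded.

```python
import struct

def parse_file_header(data: bytes) -> dict:
    """Parse the file header and table pointers at the start of a .pdb file."""
    # Seven little-endian four-byte integers: zero, len_page, num_tables,
    # next_unused_page, (undescribed), sequence, zero.
    (_, len_page, num_tables, next_unused_page, _undescribed,
     sequence, _) = struct.unpack_from("<7I", data, 0x00)
    tables = []
    for i in range(num_tables):
        # Each table pointer is four four-byte integers, starting at byte 1c.
        table_type, empty_candidate, first_page, last_page = struct.unpack_from(
            "<4I", data, 0x1C + 16 * i)
        tables.append({
            "type": table_type,
            "empty_candidate": empty_candidate,
            "first_page": first_page,
            "last_page": last_page,
            # Byte offset of the table's first page within the file.
            "first_page_offset": len_page * first_page,
        })
    return {"len_page": len_page, "num_tables": num_tables,
            "next_unused_page": next_unused_page, "sequence": sequence,
            "tables": tables}
```

Passing the first page of an export file (or the whole file) to this function yields the page size and, for each table, the file offset at which its first page starts.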
The table is a linked list of pages: each page contains the index of the next page after it. However, you need to keep track of the last_page value for the table, because it tells you not to try to follow the next page link once you reach the page with that index. (If you do keep going, you will start reading pages of some different table.) The structure of the table pages themselves is described in the next section.
As far as we know, the remainder of the first page after the table pointers is unused.
Table Pages
The file header is followed by the table pages themselves. These each have the size specified by len_page in the above diagram, and the following structure:
Data pages all seem to have the header structure described here, but not all of them actually store data. Some of them are “strange” and we have not yet figured out why. The discussion below describes how to recognize a strange page, and avoid trying to read it as a data page.
The first four bytes of a table page always seem to be zero. This is followed by a four-byte value page_index which identifies the index of this page within the list of table pages (the header has index 0, the first actual data page the index 1, and so on). This value seems to be redundant, because it can be calculated by dividing the offset of the start of the page by len_page, but perhaps it serves as a sanity check.
This is followed by another four-byte value, type, which identifies the type of the page, using the values shown in the preceding table. This again seems redundant because the table header which was followed to reach this page also identified the table type, but perhaps it is another sanity check, or an alternate way to tell, when following page links, that you have reached the end of the table you are interested in. Speaking of which, the next four-byte value, next_page, is that link: it identifies the index at which the next page of this table can be found, as long as we have not already reached the final page of the table, as described in File Header.
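Following the page links while respecting last_page might look like the sketch below. The generator name table_pages is hypothetical, and the set of visited indices is an added safeguard against looping on damaged files rather than anything the format requires.

```python
import struct

def table_pages(data: bytes, len_page: int, first_page: int, last_page: int):
    """Yield (index, page_bytes) for each page in one table's linked list."""
    index = first_page
    seen = set()  # defensive: stop if a corrupt file makes the links loop
    while index not in seen:
        seen.add(index)
        start = len_page * index
        page = data[start:start + len_page]
        # A page begins with: zero, page_index, type, next_page (4 bytes each).
        _zero, _page_index, _page_type, next_page = struct.unpack_from("<4I", page, 0)
        yield index, page
        if index == last_page:
            break  # following next_page further would enter another table
        index = next_page
```

The first_page and last_page arguments come from the table pointer in the file header, and len_page from the header itself.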
The exact meaning of unknown1 is unclear. Mr. Lesniak said “sequence number (0→1: 8→13, 1→2: 22, 2→3: 27)” but I don’t know how to interpret that. Even less is known about unknown2. But num_rows_small at byte 18 within the page (abbreviated nrs in the byte field diagram above) holds the number of rows that are present in the page, unless num_rows_large (below) holds a value that is larger than it (but not equal to 1fff). This seems like a strange mechanism for dealing with the fact that some tables (like playlist entries) have a lot of very small rows, too many to count with a single byte. But then why not just always use num_rows_large?
The row counter entries represent the number of rows that have ever been allocated in the page, but some will no longer be valid due to deletion or updates. To find the actual rows, you need to scan all 16 entries of each of the row groups present in the page, ignoring any whose row presence bit is zero.
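The rule for choosing between the two counters can be written down directly. In this sketch the function names are hypothetical, and the group count simply rounds the row count up to a multiple of sixteen, matching the sixteen-entry row groups described in this section.

```python
def effective_row_count(num_rows_small: int, num_rows_large: int) -> int:
    """Number of row index entries ever allocated in the page."""
    # num_rows_large wins only when it is larger and is not the value 1fff.
    if num_rows_large > num_rows_small and num_rows_large != 0x1FFF:
        return num_rows_large
    return num_rows_small

def row_group_count(num_rows: int) -> int:
    """Number of sixteen-entry row groups needed to index num_rows rows."""
    return (num_rows + 15) // 16
```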
The purpose of the next two bytes is also unclear. Of u3 Mr. Lesniak said “a bitmask (first track: 32)”, and he described u4 as “often 0, sometimes larger, especially for pages with a high number of rows (e.g. 12 for 101 rows)”.
Byte 1b is called page_flags (abbreviated pf in the diagram). According to Mr. Lesniak, “strange” (non-data) pages will have the value 44 or 64, and other pages have had the values 24 or 34. Crate Digger considers a page to be a data page if page_flags & 40 equals 0 (the mask, like the other values here, is hexadecimal).
Bytes 1c-1d are called free_size (abbreviated frees in the diagram), and store the amount of unused space in the page heap (excluding the row index which is built backwards from the end of the page); used_size at bytes 1e-1f (abbreviated useds) stores the number of bytes that are in use in the page heap.
Bytes 20-21, u5, are of unclear purpose. Mr. Lesniak labeled them “(0→1: 2).”
Bytes 22-23, num_rows_large (abbreviated numrl in the diagram), hold the number of entries in the row index at the end of the page when that value is too large to fit into num_rows_small (as mentioned above), and that situation seems to be indicated when this value is larger than num_rows_small, but not equal to 1fff.
u6 at bytes 24-25 seems to have the value 1004 for strange pages, and 0000 for data pages. And Mr. Lesniak describes u7 at bytes 26-27 as “always 0 except 1 for history pages, num entries for strange pages?”
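Putting the header fields together, the fixed part of each page is 28 (hexadecimal) bytes long. The sketch below unpacks it in one call; the class and function names are hypothetical, the field names follow the text, and used_size is read from bytes 1e-1f, the two bytes between free_size and u5.

```python
import struct
from typing import NamedTuple

HEAP_START = 0x28  # the page heap begins right after the header

class PageHeader(NamedTuple):
    page_index: int
    type: int
    next_page: int
    unknown1: int
    unknown2: int
    num_rows_small: int
    u3: int
    u4: int
    page_flags: int
    free_size: int
    used_size: int
    u5: int
    num_rows_large: int
    u6: int
    u7: int

    @property
    def is_data_page(self) -> bool:
        # "Strange" pages (flags 44 or 64) have the 40 bit set.
        return (self.page_flags & 0x40) == 0

def parse_page_header(page: bytes) -> PageHeader:
    # Six 4-byte values (the first is always zero), four single bytes,
    # then six 2-byte values, all little-endian.
    fields = struct.unpack_from("<6I4B6H", page, 0)
    return PageHeader(*fields[1:])
```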
After these header fields comes the page heap. Rows are allocated within this heap starting at byte 28. Since rows can be different sizes, there needs to be a way to locate them. This takes the form of a row index, which is built from the end of the page backwards, in groups of up to sixteen row pointers along with a bitmask saying which of those rows are still part of the table (they might have been deleted). The number of row index entries is determined, as described above, by the value of either num_rows_small or num_rows_large.
The bit mask for the first group of up to sixteen rows, labeled rowpf0 in the diagram (meaning “row presence flags group 0”), is found near the end of the page. The last two bytes after each row bitmask (for example pad0 after rowpf0) have an unknown purpose and may always be zero, and the rowpf0 bitmask takes up the two bytes that precede them. The low-order bit of this value will be set if row 0 is really present, the next bit if row 1 is really present, and so on. The two bytes before these flags, labeled ofs0, store the offset of the first row in the page. This offset is the number of bytes past the end of the page header at which the row itself can be found. So if row 0 begins at the very beginning of the heap, at byte 28 in the page, ofs0 would have the value 0000.
As more rows are added to the page, space is allocated for them in the heap, and additional index entries are added at the end of the heap, growing backwards. Once there have been sixteen rows added, all the bits in rowpf0 are accounted for, and when another row is added, before its offset entry ofs16 can be added, another row bit-mask entry rowpf1 needs to be allocated, followed by its corresponding pad1. And so the row index grows backwards towards the rows that are being added forwards, and once they are too close for a new row to fit, the page is full, and another page gets allocated to the table.
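As a rough illustration of walking the row index, the sketch below yields the position of every row whose presence bit is set. The names are hypothetical, and the group stride of 24 (hexadecimal) bytes is derived from the description above (sixteen two-byte offsets plus the two-byte flags and two-byte pad) rather than stated by it.

```python
import struct

HEAP_START = 0x28  # first byte after the page header
GROUP_SIZE = 0x24  # 16 two-byte offsets + 2-byte presence flags + 2-byte pad

def present_row_positions(page: bytes, num_rows: int):
    """Yield (row_index, position_in_page) for rows that are really present."""
    num_groups = (num_rows + 15) // 16
    for group in range(num_groups):
        # Each group ends GROUP_SIZE bytes before the end of the previous one.
        base = len(page) - group * GROUP_SIZE
        (flags,) = struct.unpack_from("<H", page, base - 4)
        for i in range(16):
            if flags & (1 << i):  # low-order bit corresponds to the first row
                (ofs,) = struct.unpack_from("<H", page, base - 6 - 2 * i)
                yield group * 16 + i, HEAP_START + ofs
```

Here num_rows is the count chosen between num_rows_small and num_rows_large; all sixteen bits of each group are examined and rows with a zero bit are skipped.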
Table Rows
The structure of the rows themselves is determined by the type of the table, using the values shown in Table types.
Unless otherwise noted, these table types and rows appear in export.pdb files.
Album Rows
Album rows hold an album name and ID along with an artist association, with the structure shown below. The unknown value at bytes 00-01 seems to usually have the values 80 00. It is followed by a two-byte value Mr. Lesniak called index_shift, although I don’t know what that means, and another four bytes of unknown purpose. But at bytes 08-0b we finally find a value we have a use for: artist_id holds the ID of an artist row associated with this album row. This is followed by id, the ID of this album row itself, at bytes 0c-0f. We assume that there are index tables somewhere that would let us locate the page and row index of a record given its table type and ID, but we have not yet found and figured them out.
This is followed by five more bytes with unknown meaning, and the final byte in the row, ofs_name, is a pointer to the album name (labeled on in the byte field diagram). To find the location of the name, add ofs_name bytes to the address of the start of the album row itself. The name itself is encoded in a surprisingly baroque way, explained in DeviceSQL Strings.
Artist Rows
Artist rows hold an Artist name and ID, with the structure shown in Artist row with nearby name or Artist row with far name. The subtype value at bytes 00-01 determines which variant is used. If the artist name was allocated close enough to the row to be reached by a single byte offset, subtype has the value 0060, and the row has the structure in Artist row with nearby name. If the name is too far away for that, subtype has the value 0064 and the row has the structure in Artist row with far name.
In either case, subtype is followed by the unexplained two-byte value found in many row types that Mr. Lesniak called index_shift, and then by id, the ID of this artist row itself, at bytes 04-07, an unknown value at byte 08, and ofs_name_near at byte 09 (labeled on), the one-byte name offset used only in the first variant.
If subtype is 0064, the value of ofs_name_near is ignored, and instead the two-byte value ofs_name_far (labeled ofar) is used.
Whichever name offset is used, it is a pointer to the artist name. To find the location of the name, add the value of the offset to the address of the start of the artist row itself. This gives the address of a DeviceSQL string holding the name, with the structure explained in DeviceSQL Strings.
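Choosing between the two offsets could be sketched as below. The function name is hypothetical, and the position of ofs_name_far is an assumption: the text above does not say where it is stored, so this sketch guesses that it occupies bytes 0a-0b, immediately after ofs_name_near.

```python
import struct

OFS_NAME_FAR_POS = 0x0A  # assumed location; not stated in the text

def artist_id_and_name_offset(row: bytes):
    """Return (id, name_offset) for an artist row of either subtype."""
    subtype, _index_shift, artist_id, _unknown, ofs_name_near = struct.unpack_from(
        "<HHIBB", row, 0)
    if subtype == 0x0064:
        # Far-name variant: ignore the one-byte offset, use the two-byte one.
        (ofs_name_far,) = struct.unpack_from("<H", row, OFS_NAME_FAR_POS)
        return artist_id, ofs_name_far
    return artist_id, ofs_name_near
```

Adding the returned offset to the position of the row within the page gives the position of the DeviceSQL string holding the artist name.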
Artwork Rows
Artwork rows hold an id (which tracks refer to) and the path at which the corresponding album art image file can be found, with the structure shown below. Note that in this case, the DeviceSQL string path is embedded directly into the row itself, rather than being located elsewhere in the heap through an offset. The structure of the string itself is still as described in DeviceSQL Strings.
The art file pointed to by this path will be the original resolution 80x80 pixel image. Recent versions of rekordbox will also add a higher resolution image, at 240x240 pixels. Its path can be found by adding the string _m right before the file extension. So for example if the original resolution path is /a/b/foo.jpg, the high resolution file can be found at /a/b/foo_m.jpg.
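Deriving the high resolution path is a small string manipulation. A minimal sketch, with a hypothetical function name, using the standard library's POSIX path helper to split off the extension:

```python
import posixpath

def high_res_artwork_path(path: str) -> str:
    """Insert "_m" right before the file extension of an artwork path."""
    stem, ext = posixpath.splitext(path)
    return stem + "_m" + ext
```

For the example above, high_res_artwork_path("/a/b/foo.jpg") returns "/a/b/foo_m.jpg".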
Color Rows
Color rows hold a numeric color id (which controls the actual color displayed on the player interface) at bytes 05-06 and a text label or name starting at byte 08 which is a DeviceSQL string shown in the information panel for tracks that are assigned the color. The rows have the structure shown below. There are several bytes in the row that are not yet known to have any meaning.
Regardless of the names assigned to the colors by the user, the row id values map to the following colors in the user interface of rekordbox and on CDJs:
ID | Meaning |
---|---|
 | No color |
 | Pink |
 | Red |
 | Orange |
 | Yellow |
 | Green |
 | Aqua |
 | Blue |
 | Purple |
Genre Rows
Genre rows hold a numeric genre id (which tracks can be assigned) at bytes 00-03 and a text name starting at byte 04 which is a DeviceSQL string. The rows have the structure shown below:
History Playlist Rows
The History menu automatically records playlists of the tracks performed off a particular USB or SD card in a new, numbered playlist each time the media is mounted in a player. These playlists have names like "HISTORY 001". This table lists all the history playlists which have been created for the current database, tying their name to an ID which is used to match the History Entry Rows that make up the playlist for the corresponding performance.
The rows are much simpler than the general-purpose hierarchical playlists described below. They hold only a numeric id at bytes 00-03 and a text name starting at byte 04 which is a DeviceSQL string.
History Entry Rows
History entry rows list the tracks that belong to a particular history playlist, and also establish the order in which they were played. They have a very simple structure, shown below, containing only three values. The track_id at bytes 00-03 identifies the track that was played at this position in the playlist, by corresponding to the id of a row in the Track table. The playlist_id at bytes 04-07 identifies the history playlist to which it belongs, by corresponding to the id of a row in the History Playlist list. The entry_index at bytes 08-0b specifies the position within the playlist at which this entry belongs.
Key Rows
Key rows represent musical keys. They hold a numeric id (which tracks can be assigned) at bytes 00-03 and a text name starting at byte 08 which is a DeviceSQL string. (There seems to be a second copy of the ID at bytes 04-07.) The rows have the structure shown below:
Label Rows
Label rows represent record labels. They hold a numeric label id (which tracks can be assigned) at bytes 00-03 and a text name starting at byte 04 which is a DeviceSQL string. The rows have the structure shown in Genre or Label row, above.
Playlist Tree Rows
Playlist tree rows are used to organize the hierarchical structure of the playlist menu. There is probably an index somewhere that makes it possible to find the right rows directly when loading a playlist, but we have not yet figured out how indices work in DeviceSQL databases, so Crate Digger simply reads all the rows and builds its own in-memory index of the tree.
Playlist tree rows can either represent a playlist “folder” which contains other folders and playlists, or a regular playlist which holds only tracks. The rows are identified by an id at bytes 0c-0f, and also contain a parent_id at bytes 00-03 which is how the hierarchical structure is represented: the contents of a folder are the other rows in this table whose parent_id is equal to the id of the folder.
Similarly, the tracks that make up a regular playlist are the Playlist Entry Rows whose playlist_id is equal to this row’s id.
Each playlist tree row also has a text name starting at byte 14 which is a DeviceSQL string displayed when navigating the hierarchy, a sort_order indicator at bytes 08-0b (this may be the same value used to select sort orders when requesting menus using the dbserver protocol, shown in the packet analysis, but this has not yet been confirmed), and a value that specifies whether the row defines a folder or a playlist. In the Kaitai Struct, this value is called raw_is_folder, is found at bytes 10-13, and has a non-zero value for folders. For convenience, the struct also defines a derived value, is_folder, which is a boolean.
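Since no on-disk index is understood yet, a reader has to build its own lookup from parent to children, as Crate Digger does. The sketch below is one hypothetical way to do that, not Crate Digger's code; the four bytes at 04-07 are not described in the text and are skipped.

```python
import struct
from collections import defaultdict

def parse_playlist_tree_row(row: bytes) -> dict:
    """Extract the fixed fields of a playlist tree row (name starts at byte 14)."""
    parent_id, _undescribed, sort_order, row_id, raw_is_folder = struct.unpack_from(
        "<5I", row, 0)
    return {"id": row_id, "parent_id": parent_id, "sort_order": sort_order,
            "is_folder": raw_is_folder != 0}

def children_by_parent(rows) -> dict:
    """Build an in-memory index mapping each parent_id to its child rows."""
    index = defaultdict(list)
    for row in rows:
        index[row["parent_id"]].append(row)
    return index
```

The contents of a folder with a given id are then index[id]; entries that are not folders are playlists whose tracks come from the Playlist Entry Rows.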
The rows have the following structure:
Playlist Entry Rows
Playlist entry rows list the tracks that belong to a particular playlist, and also establish the order in which they should be played. They have a very simple structure, shown below, containing only three values. The entry_index at bytes 00-03 specifies the position within the playlist at which this entry belongs. The track_id at bytes 04-07 identifies the track to be played at this position in the playlist, by corresponding to the id of a row in the Track table, and the playlist_id at bytes 08-0b identifies the playlist to which it belongs, by corresponding to the id of a row in the Playlist Tree.
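Reassembling playlists from these rows amounts to grouping by playlist_id and sorting by entry_index. A minimal sketch with a hypothetical function name, taking the raw bytes of each present row:

```python
import struct
from collections import defaultdict

def playlist_track_ids(entry_rows) -> dict:
    """Map each playlist_id to its track ids, ordered by entry_index."""
    playlists = defaultdict(list)
    for row in entry_rows:
        # Three little-endian four-byte values, in this order.
        entry_index, track_id, playlist_id = struct.unpack_from("<3I", row, 0)
        playlists[playlist_id].append((entry_index, track_id))
    return {pid: [track for _, track in sorted(entries)]
            for pid, entries in playlists.items()}
```

Rows may be scattered across pages in any order, which is why the sort on entry_index is needed.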
Tag Rows
These are present in exportExt.pdb files, not the main export.pdb database.
Tags provide a flexible way for DJs to categorize tracks, supported by the “My Tags” tab within rekordbox. Tags have names, and can be assigned to any number of tracks. Tags themselves can be grouped into categories, which are stored in the same table.
The rows have the following structure:
The first two bytes serve an unknown purpose but their values always seem to be the same.
They are followed by tag_index at bytes 02-03 which seems to increment by 20 for each row.
This is followed by another eight bytes of unknown purpose that always seem to be zero.
The category at bytes 0c-0f holds the ID of the category to which the tag belongs; if this row is itself a category, this field has the value 0.
The category_pos at bytes 10-13 specifies the zero-based position at which this tag should be displayed within its category.
If the row represents a category rather than a tag, then this is the zero-based position of the category itself within the category list.
The id at bytes 14-17 is how the tag or category is referenced in Tag Track Rows or the category field.
Tags seem to have very large id values, while the four categories have fixed id values in the range 1 to 4.
The value of raw_is_category at bytes 18-1b is non-zero when this row stores a tag category instead of a tag.
It is followed by two bytes of unknown purpose that always seem to have the same values, and a byte that may hold some sort of flags that have not yet been understood.
A variable number of bytes starting at byte 1f is a DeviceSQL string holding the name of the tag or category. Finally, this is followed by a byte with an unknown purpose, whose value always seems to be 3.
Tag Track Rows
These are present in exportExt.pdb files, not the main export.pdb database.
The rows have the following structure:
The first four bytes have no known purpose and always seem to be zero.
The track_id at bytes 04-07 identifies the Track to which a tag has been assigned.
The tag_id at bytes 08-0b identifies a Tag the DJ has assigned to this track.
The final four bytes in the row seem to always have the values shown, and have no known purpose.
Track Rows
Track rows describe audio tracks that can be played from the media export, and provide many details about the music including links to other tables like artists, albums, keys, and others. They have the structure shown below:
The first two bytes, labeled u1, have an unknown purpose; they usually are 24 followed by 00. They are followed by the unexplained two-byte value found in many row types that Mr. Lesniak called index_shift, and a four-byte value he called bitmask, although we do not know what the bits mean. The value at bytes 08-0b, sample_rate, is the first one we have a solid understanding of: it holds the playback sample rate of the audio file, in samples per second (this will be 0 if it is unknown or variable).
Bytes 0c-0f hold the value composer_id, which identifies the composer of the track, if known, as a non-zero id value of an Artist row. The size of the audio file, in bytes, is found in file_size at bytes 10-13. This is followed by an unknown four-byte value, u2, which may be another ID, and two unknown two-byte values, u3 (about which Mr. Lesniak says “always 19048?”) and u4 (“always 30967?”).
If there is cover art for the track, there will be a non-zero value in artwork_id (bytes 1c-1f), identifying the id of an Artwork row.
If a dominant musical key was identified for the track there will be a non-zero value in key_id (bytes 20-23), which represents the id of a Key row. If the track is known to be a remake, the non-zero Artist row id of the original performer will be found at bytes 24-27 in original_artist_id. If there is a known record label for the track, the non-zero value in label_id (bytes 28-2b) will link to the id of a Label row. Similarly, if there is a known remixer, there will be a non-zero value in remixer_id (bytes 2c-2f) linking to the id of an Artist row.
The field bitrate at bytes 30-33 stores the playback bit rate of the track, and track_number at bytes 34-37 holds the position of the track within its album. tempo at bytes 38-3b holds the playback tempo of the start of the track in beats per minute, multiplied by 100 (in order to support a precision of 0.01 BPM). If there is a known genre for the track, there will be a non-zero value in genre_id at bytes 3c-3f, representing the id of a Genre row.
If the track is part of an album, there will be a non-zero value in album_id at bytes 40-43, and this will be the id of an Album row. The Artist row id of the primary performer associated with the track is found in artist_id at bytes 44-47. And the id of the track itself is found in id at bytes 48-4b. If the album is known to consist of multiple discs, the disc number on which this track is found will be in disc_number at bytes 4c-4d. And the number of times the track has been played is found in play_count (bytes 4e-4f).
The year in which the track was recorded, if known, is in year at bytes 50-51. The sample depth of the track audio file (bits per sample) is in sample_depth at bytes 52-53. The playback time of the track (in seconds, at normal speed) is in duration at bytes 54-55. The purpose of the next two bytes, labeled u5, is unknown; they seem to always hold the value 29.
Byte 58, color_id (labeled cid in the diagram), holds the color assigned to the track in rekordbox, as the id of a Color row, or zero if no color has been assigned. Byte 59, rating (labeled r in the diagram), holds the rating (0 to 5 stars) assigned to the track. The next two bytes, labeled u6, have an unknown purpose, and seem to always have the value 1. The two bytes after them, labeled u7, are also unknown; Mr. Lesniak said “alternating 2 and 3”.
The rest of the track row is an array of 21 two-byte offsets that point to DeviceSQL strings. To find the start of the string, add the address of the start of the track row to the offset. The purpose of each string is described in the following table. For convenience, the strings can be accessed as Kaitai Struct instance values with the names shown in the table:
Index | Name | Content |
---|---|---|
0 | isrc | International Standard Recording Code, if known, in mangled format.[2] |
1 | texter | Unknown, named by @flesniak. |
2 | unknown_string_2 | Unknown, “thought track number, wrong”. |
3 | unknown_string_3 | Unknown, “strange things”.[3] |
4 | unknown_string_4 | Unknown, “strange things” (as above). |
5 | message | Unknown, named by @flesniak. |
6 | kuvo_public | Empty or |
7 | autoload_hotcues | Empty or |
8 | unknown_string_5 | Unknown. |
9 | unknown_string_6 | Unknown, usually empty. |
10 | date_added | When the track was added to the rekordbox collection. |
11 | release_date | When the track was released. |
12 | mix_name | Name of the track remix, if any. |
13 | unknown_string_7 | Unknown, usually empty. |
14 | analyze_path | File path of the track analysis. |
15 | analyze_date | When track analysis was performed. |
16 | comment | Track comment assigned by the DJ. |
17 | title | Track title. |
18 | unknown_string_8 | Unknown, usually empty. |
19 | filename | Name of track audio file. |
20 | file_path | File path of track audio. |
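The fixed part of a track row and its string offsets can be unpacked together, as in the sketch below. The names TRACK_FIXED, FIELD_NAMES, STRING_NAMES and parse_track_row are hypothetical; the field order and sizes follow the byte positions given above, and the derived bpm value divides tempo by 100.

```python
import struct

# Bytes 00-5d of the row: everything before the 21 string offsets.
TRACK_FIXED = struct.Struct("<HHIIIIIHH12I6HBBHH")
FIELD_NAMES = (
    "u1 index_shift bitmask sample_rate composer_id file_size u2 u3 u4 "
    "artwork_id key_id original_artist_id label_id remixer_id bitrate "
    "track_number tempo genre_id album_id artist_id id "
    "disc_number play_count year sample_depth duration u5 "
    "color_id rating u6 u7").split()
STRING_NAMES = (
    "isrc texter unknown_string_2 unknown_string_3 unknown_string_4 message "
    "kuvo_public autoload_hotcues unknown_string_5 unknown_string_6 "
    "date_added release_date mix_name unknown_string_7 analyze_path "
    "analyze_date comment title unknown_string_8 filename file_path").split()

def parse_track_row(page: bytes, row_pos: int) -> dict:
    """Decode the numeric fields of a track row and locate its strings."""
    track = dict(zip(FIELD_NAMES, TRACK_FIXED.unpack_from(page, row_pos)))
    track["bpm"] = track["tempo"] / 100  # tempo is stored multiplied by 100
    offsets = struct.unpack_from("<21H", page, row_pos + TRACK_FIXED.size)
    # Each offset is relative to the start of the row itself.
    track["string_positions"] = {
        name: row_pos + ofs for name, ofs in zip(STRING_NAMES, offsets)}
    return track
```

The positions in string_positions point at DeviceSQL strings within the same page, which still need to be decoded as described in the next section.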
DeviceSQL Strings
Many row types store string values, sometimes by directly embedding them, but more often by storing an offset to a location elsewhere in the heap. In either case the string itself uses the structure described in this section. Strings can be stored in a variety of formats. The first byte of the structure seems to be a bunch of flags from which the format can be determined. We are not certain of the details because not all formats are present in the export files we have seen, so this represents our best guess so far.[6]
Our best guess as to the interpretation of these bits follows:
bit | label | purpose |
---|---|---|
7 | E (endianness) | If set, the string seems to be little-endian |
6 | A (ASCII) | The string following is encoded in ASCII, endianness does not apply |
5 | N (narrow) | The contained string is encoded in UTF-8, endianness does not apply (not yet seen in practice though DeviceSQL claims to support encoding in ASCII, UTF-8, and UTF-16) |
4 | W (wide) | The contained string is encoded in UTF-16, endianness is determined by E |
0 | S (short) | If this bit is set, then the string is a Short ASCII string, and the other "flag" bits in this byte actually store its length (see below) |
The details of this analysis are somewhat speculative because the only bit patterns we have seen in practice when S is zero are 0b01000000 for long-form ASCII strings and 0b10010000 for long-form UTF-16LE strings (rekordbox probably just does not use the other supported string formats).
As described above, when S is 1, we are dealing with a short ASCII string, and the other flags are replaced by the seven-bit length field for the string field (including this type-and-length byte, so the string itself can be up to 126 characters long).
Short ASCII Strings
The flag byte described above is labeled lk (lengthAndKind) below. If S (the low-order bit of lk) is set, it means the string field holds a short ASCII string. The length of such a field can be extracted by right-shifting lk once (or, equivalently, dividing it by two). This length is for the entire string field, including lk itself, so the maximum length of actual string data is 126 bytes.
DeviceSQL strings do not have terminator bytes, so attempting to read more bytes than present can lead to garbage characters being present or crashing the parser for the more complex Unicode strings. ISRC Strings are the only exception.
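Reading a short ASCII string therefore needs only the lk byte. A minimal sketch, with a hypothetical function name:

```python
def read_short_ascii(buf: bytes, pos: int) -> str:
    """Decode a short ASCII DeviceSQL string whose lk byte is at buf[pos]."""
    lk = buf[pos]
    if not lk & 0x01:
        raise ValueError("S bit is clear: not a short ASCII string")
    length = lk >> 1  # length of the whole field, including lk itself
    # There is no terminator, so read exactly length - 1 bytes of text.
    return buf[pos + 1:pos + length].decode("ascii")
```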
Long Strings
Again the flag byte described above is labeled lk (lengthAndKind) below. If S (the low-order bit of lk) is zero, it means the string field holds a long or wide string, whose format is specified by the other flag bits described above, and whose length is determined by a two-byte length field which follows the flag byte:
As always, length represents the length of the entire field including the header bytes, so the length of the actual string data is length - 4 (the flag byte, the two-byte length, and the pad byte make up the four header bytes). We have only ever seen zero values for the pad byte. The encoding of the string data is determined by the flag bits in lk as described above.
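A decoder covering the short form and the two long forms seen in practice might look like the sketch below. The function name is hypothetical, only the flag values 40 (long ASCII) and 90 (long UTF-16LE) are handled because those are the only ones reported above, and the ISRC special case is not handled here.

```python
import struct

def read_device_sql_string(buf: bytes, pos: int) -> str:
    """Decode the DeviceSQL string whose lk byte is at buf[pos]."""
    lk = buf[pos]
    if lk & 0x01:
        # Short ASCII: the upper seven bits hold the total field length.
        return buf[pos + 1:pos + (lk >> 1)].decode("ascii")
    # Long form: lk, two-byte little-endian length, pad byte, then the data.
    (length,) = struct.unpack_from("<H", buf, pos + 1)
    data = buf[pos + 4:pos + length]
    if lk == 0x40:
        return data.decode("ascii")
    if lk == 0x90:
        return data.decode("utf-16-le")
    raise ValueError(f"unsupported DeviceSQL string flags: {lk:#04x}")
```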
ISRC Strings
When an International Standard Recording Code is present as the first string pointer in a track row, it is marked with kind 90 but does not actually hold a UTF-16-LE string. Instead, the first byte after the pad value following the length is the value 03, and then there are bytes of ASCII, followed by a null byte. Crate Digger does not yet attempt to cope with this.
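One possible way to cope with this layout is sketched below. It is not part of Crate Digger; the function name is hypothetical, and instead of relying on the length field it simply reads ASCII up to the null byte. It should only be applied to the first string pointer of a track row, since an ordinary UTF-16LE string could also happen to begin with a 03 byte.

```python
from typing import Optional

def read_isrc(buf: bytes, pos: int) -> Optional[str]:
    """Decode the mangled ISRC string at buf[pos], or return None."""
    # Layout: kind byte 90, two-byte length, pad byte, marker 03, ASCII, null.
    if buf[pos] != 0x90 or buf[pos + 4] != 0x03:
        return None  # not in the mangled ISRC form
    end = buf.index(0, pos + 5)  # position of the terminating null byte
    return buf[pos + 5:end].decode("ascii")
```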
00
byte in DeviceSQL UTF strings; we previously believed they were big-endian.