Decoding the nVault Format


A short while ago, I hit a dead end with an issue I had been working on with a game server I was maintaining. The game server was HLDS running AMXModX, with the “Nintendo Mod” plugin. The details of the server aren’t really important, except to note that the person who originally configured it had set it up to use MySQL as its datastore.

Because of the data-intensive nature of the server, any latency between the HLDS instance and the MySQL database could corrupt data. This had to be fixed.

In order to fix this, I decided to move the data storage from MySQL to AMXX’s native datastore, nVault.

The Goal

In order to migrate the MySQL database to the nVault format, I would need to dump the database into a tab-delimited format, determine the format of the data in a working nVault vault for the plugin, and then process the tab-delimited data into the vault format.

MySQL

Although this post isn’t really about the migration itself, others might find the method I used useful for their own purposes.

The MySQL table in which Nintendo Mod stored its data had two important columns. These columns essentially mapped to the keys and values for the nVault version of the datastore. The key column was in the format “{SteamID}_{CharacterName}”, such as “STEAM_0:0:2007001_Bowser”. The value column was a space-separated string of integers, in the format “{Level} {Experience} {Skill1} {Skill2} {Skill3} {Powerup}”, such as “1 0 0 0 0 0”.
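To make the two columns concrete, here is a minimal Python sketch (the original script was PHP) that splits one row into its fields. The field names are hypothetical, taken from the placeholders above, and the sketch assumes the character name is everything after the last underscore, since the SteamID itself contains one.

```python
# Hypothetical helper: decode one Nintendo Mod key/value pair into named fields.
# Assumption: the character name follows the LAST underscore, because the
# SteamID ("STEAM_0:...") already contains an underscore of its own.

VALUE_FIELDS = ("level", "experience", "skill1", "skill2", "skill3", "powerup")


def parse_row(key: str, value: str) -> dict:
    """Split "{SteamID}_{CharacterName}" and the six space-separated integers."""
    steam_id, character = key.rsplit("_", 1)
    numbers = [int(n) for n in value.split()]
    if len(numbers) != len(VALUE_FIELDS):
        raise ValueError(f"expected {len(VALUE_FIELDS)} integers, got {len(numbers)}")
    record = dict(zip(VALUE_FIELDS, numbers))
    record["steam_id"] = steam_id
    record["character"] = character
    return record


row = parse_row("STEAM_0:0:2007001_Bowser", "1 0 0 0 0 0")
print(row["steam_id"], row["character"], row["level"])  # STEAM_0:0:2007001 Bowser 1
```

For the migration itself the key and value can be copied verbatim; parsing them like this is only useful for validating the dump.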

To dump this table into a tab-delimited file, I simply used the mysqldump utility with the --tab switch. This writes each table’s data to a tab-delimited .txt file (alongside a .sql file containing the table definition) in the directory you pass to --tab.

Once I had the data, all that was required was a quick PHP script to process the file into the nVault format.

The nVault Format

Unfortunately, before the tools I mention later, the only editors for nVault were GUI editors that ran only on Windows. None of them is of any use for creating new vault files, unless you wish to spend a great deal of time entering entries by hand.

Thus, I had to write my own handler for the format. To do this, I had to decode the format using a hex editor and the source code for the module that AMXX itself uses to work with the format.

The Chase – Cut to it.

If you’re uninterested in the details of how I decoded the format, and simply want to know how to work with it, you can read the wiki article I wrote for AlliedMods.net on the subject.

AMXModX’s Way

AMXX’s module for working with nVault can be found in its Mercurial repository; the two files of interest are NVault.cpp and NVault.h. The rest is just supplementary code.

The important part of NVault.h can be found from line 10 to line 23. The #defines specify two important constants, and the remainder is a vague explanation of how the format comes together.

NVault.cpp is where the “magic” happens. It is here that we can see how the files are written and read by AMXX, so that we can translate that to a higher-level language (in this case, I rewrote it in PHP).

To begin to understand how the format comes together, one can simply read the NVault::_ReadFromFile() method starting at line 62.

The variables to keep in mind are initialized from line 77 onward. They matter because nVault is essentially a crude dump of the program’s memory into a file: each integer is written in the byte order of the machine that wrote it, so the endianness of each variable must be taken into account.

The method starts by opening a file handle to the vault, and initializing its BinaryReader object. The BinaryReader class simply handles the reading in of the different variable types.

Next, it initializes a few variables in which it will store each entry of the vault as it reads.

The “magic” of the file is then checked, to ensure that what is being read in is indeed an nVault file. This is a common trick in binary file formats, where the file begins with a fixed sequence of bytes identifying the format. In the case of nVault, the magic is the constant 0x6E564C54, whose bytes spell “nVLT” in ASCII, stored as a UInt32 (4 bytes). Keep this in mind for when we go to read the file in using another language, as the magic actually provides us a way to determine the endianness of the system which wrote the vault.

Next, the version of nVault used to write the file is checked, which is stored as a UInt16 (2 bytes). Because the format has no specification beyond the source files we’re reading, it’s best to assume that no past or future version is compatible with any other.

Lastly, before the main loop, is a count of the entries in the vault, stored as an Int32. This number isn’t strictly necessary: there is no footer, so a reader could simply loop until it hits EOF. The module, however, uses the count to control its loop.
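If you would rather not trust the stored count, a reader can walk the records until EOF instead. A minimal Python sketch, assuming the entry layout described below (signed 32-bit stamp, 8-bit key length, 16-bit value length) after a 10-byte header; the function name and its `order` parameter (a struct byte-order prefix) are hypothetical.

```python
import struct


def count_entries_to_eof(data: bytes, order: str = "<", header_size: int = 10) -> int:
    """Count entry records by walking to EOF rather than trusting the header count."""
    entry_header = struct.Struct(order + "iBH")  # stamp, key length, value length
    offset = header_size  # skip magic (4) + version (2) + entry count (4)
    count = 0
    while offset < len(data):
        if offset + entry_header.size > len(data):
            raise ValueError("truncated entry header")
        _stamp, key_len, value_len = entry_header.unpack_from(data, offset)
        offset += entry_header.size + key_len + value_len
        if offset > len(data):
            raise ValueError("truncated entry data")
        count += 1
    return count
```

Comparing the result with the stored count, `struct.unpack_from(order + "i", data, 6)[0]`, gives a cheap corruption check.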

Now begins the main loop of the read. Each iteration begins by reading in an Int32 indicating the time when the entry was added, in Unix Timestamp format.

The length of the key, a UInt8 (1 byte), and the length of the value, a UInt16 (2 bytes), are read in. This limits each key to at most 255 bytes and each value to at most 65,535 bytes.
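Expressed with Python’s struct module, the per-entry header comes to seven bytes, and the field widths account for those maxima. This sketch assumes a little-endian vault; a big-endian one would use the ">" prefix instead.

```python
import struct

# Entry header: int32 stamp, uint8 key length, uint16 value length.
# "<" means little-endian with standard sizes and no padding between fields.
ENTRY_HEADER = struct.Struct("<iBH")

MAX_KEY_LEN = 2**8 - 1     # largest value a uint8 can hold
MAX_VALUE_LEN = 2**16 - 1  # largest value a uint16 can hold

print(ENTRY_HEADER.size, MAX_KEY_LEN, MAX_VALUE_LEN)  # 7 255 65535

# A key one byte too long cannot be represented in the uint8 length field.
try:
    ENTRY_HEADER.pack(0, MAX_KEY_LEN + 1, 0)
except struct.error as exc:
    print("rejected:", exc)
```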

After a quick check that its buffers are large enough (to avoid buffer overflows), the loop reads in the number of bytes given by the key length and the value length.

The data is then added to a hashmap, so that AMXX can look it up quickly.
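For the migration described at the start, the layout has to be written rather than read. A minimal Python sketch of a writer, assuming UTF-8 text, one shared timestamp for every entry, and that VAULT_VERSION in NVault.h is 0x0200 (check your copy of the source). It only produces the bytes; it does nothing about nVault’s journalling, which the conclusion flags as a separate hurdle.

```python
import struct
import time
from typing import Dict, Optional

VAULT_MAGIC = 0x6E564C54  # "nVLT"
VAULT_VERSION = 0x0200    # assumption: the VAULT_VERSION value in NVault.h


def write_vault(entries: Dict[str, str], little_endian: bool = True,
                stamp: Optional[int] = None) -> bytes:
    """Serialize {key: value} pairs into the header-plus-entries layout above."""
    order = "<" if little_endian else ">"  # AMXX on x86 writes little-endian
    if stamp is None:
        stamp = int(time.time())  # every entry gets the same "added" time
    out = bytearray(struct.pack(order + "IHi", VAULT_MAGIC, VAULT_VERSION, len(entries)))
    for key, value in entries.items():
        key_bytes = key.encode("utf-8")  # encoding is an assumption
        value_bytes = value.encode("utf-8")
        # struct.error is raised if a key exceeds 255 bytes or a value 65,535 bytes.
        out += struct.pack(order + "iBH", stamp, len(key_bytes), len(value_bytes))
        out += key_bytes + value_bytes
    return bytes(out)


vault = write_vault({"STEAM_0:0:2007001_Bowser": "1 0 0 0 0 0"}, stamp=0)
print(len(vault))  # 52
```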

Making It Work in PHP

Because the format relies on fixed-width binary integer types that PHP’s scalar variables don’t model directly, reading and writing nVault files takes a bit of doing. However, all of the functionality needed (chiefly pack() and unpack()) is available in the core of PHP.

The Chase – Cut to it, again.

If you don’t care how one can read and write nVault files in PHP, but rather just want the code for doing so, my nVault class for PHP can be found here.

Working With Binary Data in PHP

This almost deserves another post entirely, as it’s very useful to know, but I’ll detail my specific process for nVault here.

As mentioned above, every nVault file starts with four bytes of magic, two bytes indicating the version, and then a four-byte count of entries.

To begin reading an nVault file with PHP, we must first determine the endianness of the file. I’ve mentioned this word a couple of times already, but it’s now that it’s actually important to know what it means.

The quick and dirty explanation is that different architectures use different “endiannesses,” which is the order in which the bytes of a multi-byte value are stored. A big-endian machine stores the most significant byte first, so the 32-bit value 0x6E564C54 is laid out as the bytes 6E 56 4C 54. A little-endian machine, such as x86, stores the least significant byte first, so the same value becomes 54 4C 56 6E. The bits within each byte are unaffected; only the order of the bytes changes. That’s it.
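Packing the nVault magic both ways with Python’s struct module shows the difference directly; this is only an illustration of byte order, not part of any nVault tooling.

```python
import struct
import sys

VAULT_MAGIC = 0x6E564C54  # "nVLT"

# The same 32-bit value, laid out in each byte order.
little = struct.pack("<I", VAULT_MAGIC)
big = struct.pack(">I", VAULT_MAGIC)

print(little.hex(" "), little)  # 54 4c 56 6e b'TLVn'
print(big.hex(" "), big)        # 6e 56 4c 54 b'nVLT'
print("this machine is", sys.byteorder + "-endian")
```

So a vault written on an x86 server begins with the ASCII bytes “TLVn”, not “nVLT”.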

In order to determine what the endianness is, we need only determine the endianness of the vault magic. Because we know what the value should be, we can “unpack” the binary data in both endiannesses, and check which one matches. Whichever produces “nVLT” (0x6E564C54) is the correct endianness.

Once we’ve determined that, we can read the vault version and check whether it’s one we support, and finally read the number of entries.

Let’s walk through how my PHP class does this.


Here we are testing the endianness of the file. To do this, we read four bytes from the vault, and then use the unpack function with the format of “V” (Unsigned 32-bit Integer, little-endian) and “N” (Unsigned 32-bit Integer, big-endian).

Whichever produces the magic is the correct endianness.
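A Python version of that check, as a sketch: struct’s "<I" and ">I" play the roles of PHP’s "V" and "N". Unlike the PHP class’s boolean, this hypothetical helper returns the struct byte-order prefix itself, so later reads can reuse it directly.

```python
import struct

VAULT_MAGIC = 0x6E564C54  # "nVLT"


def detect_endianness(magic_bytes: bytes) -> str:
    """Return "<" (little-endian) or ">" (big-endian); raise if not an nVault file."""
    if len(magic_bytes) != 4:
        raise ValueError("need exactly 4 bytes of magic")
    if struct.unpack("<I", magic_bytes)[0] == VAULT_MAGIC:  # like PHP unpack("V", ...)
        return "<"
    if struct.unpack(">I", magic_bytes)[0] == VAULT_MAGIC:  # like PHP unpack("N", ...)
        return ">"
    raise ValueError("not an nVault file (bad magic)")
```

For example, `detect_endianness(b"TLVn")` returns `"<"` and `detect_endianness(b"nVLT")` returns `">"`.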


Now that we know what the endianness of the file is, we can unpack the version. For clarity’s sake, I use the ternary operator, where $endian is true for little-endian and false for big-endian.

If the version from the file matches the version we expect (version 2, in this case), we proceed.
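A Python sketch of the version check, assuming the magic check has already produced a struct byte-order prefix. The default of 0x0200 is an assumption about how NVault.h encodes “version 2”; if your copy defines VAULT_VERSION differently, pass that value instead.

```python
import struct


def read_version(data: bytes, order: str, expected: int = 0x0200) -> int:
    """Unpack the uint16 version stored right after the 4-byte magic and validate it.

    `order` is "<" or ">" from the magic check; `expected` is an assumed VAULT_VERSION.
    """
    (version,) = struct.unpack_from(order + "H", data, 4)
    if version != expected:
        raise ValueError(f"unsupported nVault version 0x{version:04x}")
    return version
```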


Lastly, before the main loop, we read the number of entries. As mentioned earlier, this isn’t strictly necessary, but I iterate over the count to conform as closely as possible to the AMXX implementation.
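In Python the count is one more unpack at a fixed offset; a minimal sketch, with a hypothetical helper name, that treats a negative count as corruption (the field is signed, so it can hold one).

```python
import struct


def read_entry_count(data: bytes, order: str) -> int:
    """Unpack the signed 32-bit entry count after magic (4 bytes) and version (2 bytes)."""
    (count,) = struct.unpack_from(order + "i", data, 6)
    if count < 0:
        raise ValueError("corrupt header: negative entry count")
    return count
```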

Now onto the main loop.


This part’s pretty self-explanatory, but I’ll just reiterate that we’re iterating for the number of entries that the vault file specifies in its header.


Beginning each entry is the four-byte Unix Timestamp indicating when the entry was added.
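A Python sketch of reading that stamp and turning it into a readable UTC time; the helper name and its `offset`/`order` parameters are hypothetical.

```python
import struct
from datetime import datetime, timezone


def read_stamp(data: bytes, offset: int, order: str) -> datetime:
    """Unpack the signed 32-bit Unix timestamp that opens each entry."""
    # A signed 32-bit stamp tops out at 2038-01-19 03:14:07 UTC.
    (stamp,) = struct.unpack_from(order + "i", data, offset)
    return datetime.fromtimestamp(stamp, tz=timezone.utc)


print(read_stamp(struct.pack("<i", 0), 0, "<"))  # 1970-01-01 00:00:00+00:00
```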


Next comes the 1-byte Key Length, followed by the 2-byte Value Length. These give the number of bytes to read for the key and the value.
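In Python both lengths come out of one unpack. An explicit "<" or ">" prefix matters here: in struct’s native mode the uint16 would typically be aligned, adding a padding byte that the file does not contain. The helper name is hypothetical.

```python
import struct


def read_lengths(data: bytes, offset: int, order: str) -> tuple:
    """Unpack the uint8 key length and uint16 value length that follow an entry's stamp."""
    # "<BH" / ">BH" use standard sizes with no padding: 1 + 2 = 3 bytes.
    key_len, value_len = struct.unpack_from(order + "BH", data, offset)
    return key_len, value_len


print(struct.calcsize("<BH"))  # 3
print(read_lengths(bytes([24, 11, 0]), 0, "<"))  # (24, 11)
```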


We now read the key and value in, and then store the entry in an array. The key and value are both strings, which don’t have an endianness, and therefore need not be unpacked.
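Putting the steps together, here is a minimal Python reader, a sketch rather than a port of AMXX’s code. It assumes VAULT_VERSION is 0x0200 and that keys and values are UTF-8 text; as in the PHP approach, the magic decides the byte order and the stored count drives the loop.

```python
import struct

VAULT_MAGIC = 0x6E564C54  # "nVLT"
HEADER_SIZE = 10          # magic (4) + version (2) + entry count (4)
ENTRY_HEADER_SIZE = 7     # stamp (4) + key length (1) + value length (2)


def read_vault(data: bytes, expected_version: int = 0x0200) -> dict:
    """Parse a whole vault image into {key: (stamp, value)}."""
    if len(data) < HEADER_SIZE:
        raise ValueError("too short for an nVault header")
    if struct.unpack_from("<I", data, 0)[0] == VAULT_MAGIC:
        order = "<"
    elif struct.unpack_from(">I", data, 0)[0] == VAULT_MAGIC:
        order = ">"
    else:
        raise ValueError("not an nVault file (bad magic)")

    version, count = struct.unpack_from(order + "Hi", data, 4)
    if version != expected_version:
        raise ValueError(f"unsupported nVault version 0x{version:04x}")

    entries = {}
    offset = HEADER_SIZE
    for _ in range(count):
        if offset + ENTRY_HEADER_SIZE > len(data):
            raise ValueError("truncated entry header")
        stamp, key_len, value_len = struct.unpack_from(order + "iBH", data, offset)
        offset += ENTRY_HEADER_SIZE
        end = offset + key_len + value_len
        if end > len(data):
            raise ValueError("truncated entry data")
        # Strings need no byte swapping; decoding as UTF-8 is an assumption.
        key = data[offset:offset + key_len].decode("utf-8")
        value = data[offset + key_len:end].decode("utf-8")
        entries[key] = (stamp, value)  # in this sketch a later duplicate key wins
        offset = end
    return entries
```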

Conclusion

So, in the end, the format’s pretty easy to understand, once you account for its use of raw, fixed-width primitives. The next hurdle I’ve hit is getting around nVault’s journalling so that it will accept the generated file. Once I’ve figured that out, I’ll update this post.

I’ve created an online editor for nVault files, which is still in the works. I’ll be releasing the source for it shortly, but for now you can find it at http://nvault.thefrozenfire.com.

Additionally, if I’ve made any glaring mistakes in my analysis of the format, or my resulting code, please feel free to mention it in the comments section. When I first approached this issue, I had only the understanding of C++ that comes from working with PHP, a C-style language. I’m probably wrong somewhere.

  1. #1 by aparadekto on October 26, 2010 - 1:13 pm

    Hey, I can’t view your site properly within Opera, I actually hope you look into fixing this.

    • #2 by Justin Martin on November 5, 2010 - 7:00 am

      I suspect the issue was that I was using the Google Fonts plugin to provide nicer fonts for the site. I ran the site through browsershots and found that some browsers were displaying broken characters.

      I’ve reverted the site to web-safe fonts.
