hivex - Windows Registry "hive" extraction library
#include <hivex.h>
hive_h *hivex_open (const char *filename, int flags);
int hivex_close (hive_h *h);
hive_node_h hivex_root (hive_h *h);
int64_t hivex_last_modified (hive_h *h);
char *hivex_node_name (hive_h *h, hive_node_h node);
size_t hivex_node_name_len (hive_h *h, hive_node_h node);
int64_t hivex_node_timestamp (hive_h *h, hive_node_h node);
hive_node_h *hivex_node_children (hive_h *h, hive_node_h node);
hive_node_h hivex_node_get_child (hive_h *h, hive_node_h node, const char *name);
size_t hivex_node_nr_children (hive_h *h, hive_node_h node);
hive_node_h hivex_node_parent (hive_h *h, hive_node_h node);
hive_value_h *hivex_node_values (hive_h *h, hive_node_h node);
hive_value_h hivex_node_get_value (hive_h *h, hive_node_h node, const char *key);
size_t hivex_node_nr_values (hive_h *h, hive_node_h node);
size_t hivex_value_key_len (hive_h *h, hive_value_h val);
char *hivex_value_key (hive_h *h, hive_value_h val);
int hivex_value_type (hive_h *h, hive_value_h val, hive_type *t, size_t *len);
size_t hivex_node_struct_length (hive_h *h, hive_node_h node);
size_t hivex_value_struct_length (hive_h *h, hive_value_h val);
hive_value_h hivex_value_data_cell_offset (hive_h *h, hive_value_h val, size_t *len);
char *hivex_value_value (hive_h *h, hive_value_h val, hive_type *t, size_t *len);
char *hivex_value_string (hive_h *h, hive_value_h val);
char **hivex_value_multiple_strings (hive_h *h, hive_value_h val);
int32_t hivex_value_dword (hive_h *h, hive_value_h val);
int64_t hivex_value_qword (hive_h *h, hive_value_h val);
int hivex_commit (hive_h *h, const char *filename, int flags);
hive_node_h hivex_node_add_child (hive_h *h, hive_node_h parent, const char *name);
int hivex_node_delete_child (hive_h *h, hive_node_h node);
int hivex_node_set_values (hive_h *h, hive_node_h node, size_t nr_values, const hive_set_value *values, int flags);
int hivex_node_set_value (hive_h *h, hive_node_h node, const hive_set_value *val, int flags);
Link with -lhivex.
Hivex is a library for extracting the contents of Windows Registry "hive" files. It is designed to be secure against buggy or malicious registry files.
Unlike other tools in this area, it doesn't use the textual .REG format, because parsing that is as much trouble as parsing the original binary format. Instead it makes the file available through a C API, and then wraps this API in higher level scripting and GUI tools.
There is a separate program to export the hive as XML (see hivexml(1)), or to navigate the file (see hivexsh(1)). There is also a Perl script to export and merge the file as a textual .REG (regedit) file, see hivexregedit(1).
If you just want to export or modify the Registry of a Windows virtual machine, you should look at virt-win-reg(1).
Hivex is also comes with language bindings for OCaml, Perl, Python and Ruby.
hive_h *
This handle describes an open hive file.
hive_node_h
This is a node handle, an integer but opaque outside the library. Valid node handles cannot be 0. The library returns 0 in some situations to indicate an error.
hive_type
The enum below describes the possible types for the value(s) stored at each node. Note that you should not trust the type field in a Windows Registry, as it very often has no relationship to reality. Some applications use their own types. The encoding of strings is not specified. Some programs store everything (including strings) in binary blobs.
enum hive_type {
/* Just a key without a value */
hive_t_REG_NONE = 0,
/* A Windows string (encoding is unknown, but often UTF16-LE) */
hive_t_REG_SZ = 1,
/* A Windows string that contains %env% (environment variable expansion) */
hive_t_REG_EXPAND_SZ = 2,
/* A blob of binary */
hive_t_REG_BINARY = 3,
/* DWORD (32 bit integer), little endian */
hive_t_REG_DWORD = 4,
/* DWORD (32 bit integer), big endian */
hive_t_REG_DWORD_BIG_ENDIAN = 5,
/* Symbolic link to another part of the registry tree */
hive_t_REG_LINK = 6,
/* Multiple Windows strings. See http://blogs.msdn.com/oldnewthing/archive/2009/10/08/9904646.aspx */
hive_t_REG_MULTI_SZ = 7,
/* Resource list */
hive_t_REG_RESOURCE_LIST = 8,
/* Resource descriptor */
hive_t_REG_FULL_RESOURCE_DESCRIPTOR = 9,
/* Resouce requirements list */
hive_t_REG_RESOURCE_REQUIREMENTS_LIST = 10,
/* QWORD (64 bit integer), unspecified endianness but usually little endian */
hive_t_REG_QWORD = 11,
};
hive_value_h
This is a value handle, an integer but opaque outside the library. Valid value handles cannot be 0. The library returns 0 in some situations to indicate an error.
hive_set_value
The typedef hive_set_value
is used in conjunction with the hivex_node_set_values
call described below.
struct hive_set_value {
char *key; /* key - a UTF-8 encoded ASCIIZ string */
hive_type t; /* type of value field */
size_t len; /* length of value field in bytes */
char *value; /* value field */
};
typedef struct hive_set_value hive_set_value;
To set the default value for a node, you have to pass key = ""
.
Note that the value
field is just treated as a list of bytes, and is stored directly in the hive. The caller has to ensure correct encoding and endianness, for example converting dwords to little endian.
The correct type and encoding for values depends on the node and key in the registry, the version of Windows, and sometimes even changes between versions of Windows for the same key. We don't document it here. Often it's not documented at all.
hive_h *hivex_open (const char *filename, int flags);
Opens the hive named filename
for reading.
Flags is an ORed list of the open flags (or 0
if you don't want to pass any flags). These flags are defined:
Verbose messages.
Very verbose messages, suitable for debugging problems in the library itself.
This is also selected if the HIVEX_DEBUG
environment variable is set to 1.
Open the hive for writing. If omitted, the hive is read-only.
Open the hive in unsafe mode that enables heuristics to handle corrupted hives.
This may allow to read or write registry keys/values that appear intact in an otherwise corrupted hive. Use at your own risk.
Returns a new hive handle. On error this returns NULL and sets errno.
int hivex_close (hive_h *h);
Close a hive handle and free all associated resources.
Note that any uncommitted writes are not committed by this call, but instead are lost. See "WRITING TO HIVE FILES" in hivex(3).
Returns 0 on success. On error this returns -1 and sets errno.
This function frees the hive handle (even if it returns an error). The hive handle must not be used again after calling this function.
hive_node_h hivex_root (hive_h *h);
Return root node of the hive. All valid hives must contain a root node.
Returns a node handle. On error this returns 0 and sets errno.
int64_t hivex_last_modified (hive_h *h);
Return the modification time from the header of the hive.
The returned value is a Windows filetime. To convert this to a Unix time_t
see: http://stackoverflow.com/questions/6161776/convert-windows-filetime-to-second-in-unix-linux/6161842#6161842
char *hivex_node_name (hive_h *h, hive_node_h node);
Return the name of the node.
Note that the name of the root node is a dummy, such as $$$PROTO.HIV
(other names are possible: it seems to depend on the tool or program that created the hive in the first place). You can only know the "real" name of the root node by knowing which registry file this hive originally comes from, which is knowledge that is outside the scope of this library.
The name is recoded to UTF-8 and may contain embedded NUL characters.
Returns a string. The string must be freed by the caller when it is no longer needed. On error this returns NULL and sets errno.
size_t hivex_node_name_len (hive_h *h, hive_node_h node);
Return the length of the node name as produced by hivex_node_name
.
Returns a size. On error this returns 0 and sets errno.
int64_t hivex_node_timestamp (hive_h *h, hive_node_h node);
Return the modification time of the node.
The returned value is a Windows filetime. To convert this to a Unix time_t
see: http://stackoverflow.com/questions/6161776/convert-windows-filetime-to-second-in-unix-linux/6161842#6161842
hive_node_h *hivex_node_children (hive_h *h, hive_node_h node);
Return an array of nodes which are the subkeys (children) of node
.
Returns a 0-terminated array of nodes. The array must be freed by the caller when it is no longer needed. On error this returns NULL and sets errno.
hive_node_h hivex_node_get_child (hive_h *h, hive_node_h node, const char *name);
Return the child of node with the name name
, if it exists.
The name is matched case insensitively.
Returns a node handle. If the node was not found, this returns 0 without setting errno. On error this returns 0 and sets errno.
size_t hivex_node_nr_children (hive_h *h, hive_node_h node);
Return the number of nodes as produced by hivex_node_children
.
Returns a size. On error this returns 0 and sets errno.
hive_node_h hivex_node_parent (hive_h *h, hive_node_h node);
Return the parent of node
.
The parent pointer of the root node in registry files that we have examined seems to be invalid, and so this function will return an error if called on the root node.
Returns a node handle. On error this returns 0 and sets errno.
hive_value_h *hivex_node_values (hive_h *h, hive_node_h node);
Return the array of (key, value) pairs attached to this node.
Returns a 0-terminated array of values. The array must be freed by the caller when it is no longer needed. On error this returns NULL and sets errno.
hive_value_h hivex_node_get_value (hive_h *h, hive_node_h node, const char *key);
Return the value attached to this node which has the name key
, if it exists.
The key name is matched case insensitively.
Note that to get the default key, you should pass the empty string ""
here. The default key is often written "@"
, but inside hives that has no meaning and won't give you the default key.
Returns a value handle. On error this returns 0 and sets errno.
size_t hivex_node_nr_values (hive_h *h, hive_node_h node);
Return the number of (key, value) pairs attached to this node as produced by hivex_node_values
.
Returns a size. On error this returns 0 and sets errno.
size_t hivex_value_key_len (hive_h *h, hive_value_h val);
Return the length of the key (name) of a (key, value) pair as produced by hivex_value_key
. The length can legitimately be 0, so errno is the necessary mechanism to check for errors.
In the context of Windows Registries, a zero-length name means that this value is the default key for this node in the tree. This is usually written as "@"
.
The key is recoded to UTF-8 and may contain embedded NUL characters.
Returns a size. On error this returns 0 and sets errno.
char *hivex_value_key (hive_h *h, hive_value_h val);
Return the key (name) of a (key, value) pair. The name is reencoded as UTF-8 and returned as a string.
The string should be freed by the caller when it is no longer needed.
Note that this function can return a zero-length string. In the context of Windows Registries, this means that this value is the default key for this node in the tree. This is usually written as "@"
.
Returns a string. The string must be freed by the caller when it is no longer needed. On error this returns NULL and sets errno.
int hivex_value_type (hive_h *h, hive_value_h val, hive_type *t, size_t *len);
Return the data length and data type of the value in this (key, value) pair. See also hivex_value_value
which returns all this information, and the value itself. Also, hivex_value_*
functions below which can be used to return the value in a more useful form when you know the type in advance.
Returns 0 on success. On error this returns -1 and sets errno.
size_t hivex_node_struct_length (hive_h *h, hive_node_h node);
Return the length of the node data structure.
Returns a size. On error this returns 0 and sets errno.
size_t hivex_value_struct_length (hive_h *h, hive_value_h val);
Return the length of the value data structure.
Returns a size. On error this returns 0 and sets errno.
hive_value_h hivex_value_data_cell_offset (hive_h *h, hive_value_h val, size_t *len);
Return the offset and length of the value's data cell.
The data cell is a registry structure that contains the length (a 4 byte, little endian integer) followed by the data.
If the length of the value is less than or equal to 4 bytes then the offset and length returned by this function is zero as the data is inlined in the value.
Returns 0 and sets errno on error.
Returns a value handle. On error this returns 0 and sets errno.
char *hivex_value_value (hive_h *h, hive_value_h val, hive_type *t, size_t *len);
Return the value of this (key, value) pair. The value should be interpreted according to its type (see hive_type
).
The value is returned as an array of bytes (of length len
). The value must be freed by the caller when it is no longer needed. On error this returns NULL and sets errno.
char *hivex_value_string (hive_h *h, hive_value_h val);
If this value is a string, return the string reencoded as UTF-8 (as a C string). This only works for values which have type hive_t_string
, hive_t_expand_string
or hive_t_link
.
Returns a string. The string must be freed by the caller when it is no longer needed. On error this returns NULL and sets errno.
char **hivex_value_multiple_strings (hive_h *h, hive_value_h val);
If this value is a multiple-string, return the strings reencoded as UTF-8 (in C, as a NULL-terminated array of C strings, in other language bindings, as a list of strings). This only works for values which have type hive_t_multiple_strings
.
Returns a NULL-terminated array of C strings. The strings and the array must all be freed by the caller when they are no longer needed. On error this returns NULL and sets errno.
int32_t hivex_value_dword (hive_h *h, hive_value_h val);
If this value is a DWORD (Windows int32), return it. This only works for values which have type hive_t_dword
or hive_t_dword_be
.
int64_t hivex_value_qword (hive_h *h, hive_value_h val);
If this value is a QWORD (Windows int64), return it. This only works for values which have type hive_t_qword
.
int hivex_commit (hive_h *h, const char *filename, int flags);
Commit (write) any changes which have been made.
filename
is the new file to write. If filename
is null/undefined then we overwrite the original file (ie. the file name that was passed to hivex_open
).
Note this does not close the hive handle. You can perform further operations on the hive after committing, including making more modifications. If you no longer wish to use the hive, then you should close the handle after committing.
The flags parameter is unused. Always pass 0.
Returns 0 on success. On error this returns -1 and sets errno.
hive_node_h hivex_node_add_child (hive_h *h, hive_node_h parent, const char *name);
Add a new child node named name
to the existing node parent
. The new child initially has no subnodes and contains no keys or values. The sk-record (security descriptor) is inherited from the parent.
The parent must not have an existing child called name
, so if you want to overwrite an existing child, call hivex_node_delete_child
first.
Returns a node handle. On error this returns 0 and sets errno.
int hivex_node_delete_child (hive_h *h, hive_node_h node);
Delete the node node
. All values at the node and all subnodes are deleted (recursively). The node
handle and the handles of all subnodes become invalid. You cannot delete the root node.
Returns 0 on success. On error this returns -1 and sets errno.
int hivex_node_set_values (hive_h *h, hive_node_h node, size_t nr_values, const hive_set_value *values, int flags);
This call can be used to set all the (key, value) pairs stored in node
.
node
is the node to modify.
The flags parameter is unused. Always pass 0.
values
is an array of (key, value) pairs. There should be nr_values
elements in this array.
Any existing values stored at the node are discarded, and their hive_value_h
handles become invalid. Thus you can remove all values stored at node
by passing nr_values = 0
.
Returns 0 on success. On error this returns -1 and sets errno.
int hivex_node_set_value (hive_h *h, hive_node_h node, const hive_set_value *val, int flags);
This call can be used to replace a single (key, value)
pair stored in node
. If the key does not already exist, then a new key is added. Key matching is case insensitive.
node
is the node to modify.
The flags parameter is unused. Always pass 0.
value
is a single (key, value) pair.
Existing hive_value_h
handles become invalid.
Returns 0 on success. On error this returns -1 and sets errno.
The hivex library supports making limited modifications to hive files. We have tried to implement this very conservatively in order to reduce the chance of corrupting your registry. However you should be careful and take back-ups, since Microsoft has never documented the hive format, and so it is possible there are nuances in the reverse-engineered format that we do not understand.
To be able to modify a hive, you must pass the HIVEX_OPEN_WRITE
flag to hivex_open
, otherwise any write operation will return with errno EROFS
.
The write operations shown below do not modify the on-disk file immediately. You must call hivex_commit
in order to write the changes to disk. If you call hivex_close
without committing then any writes are discarded.
Hive files internally consist of a "memory dump" of binary blocks (like the C heap), and some of these blocks can be unused. The hivex library never reuses these unused blocks. Instead, to ensure robustness in the face of the partially understood on-disk format, hivex only allocates new blocks after the end of the file, and makes minimal modifications to existing structures in the file to point to these new blocks. This makes hivex slightly less disk-efficient than it could be, but disk is cheap, and registry modifications tend to be very small.
When deleting nodes, it is possible that this library may leave unreachable live blocks in the hive. This is because certain parts of the hive disk format such as security (sk) records and big data (db) records and classname fields are not well understood (and not documented at all) and we play it safe by not attempting to modify them. Apart from wasting a little bit of disk space, it is not thought that unreachable blocks are a problem.
Changing the root node.
Creating a new hive file from scratch. This is impossible at present because not all fields in the header are understood. In the hivex source tree is a file called images/minimal
which could be used as the basis for a new hive (but caveat emptor).
Modifying or deleting single values at a node.
Modifying security key (sk) records or classnames. Previously we did not understand these records. However now they are well-understood and we could add support if it was required (but nothing much really uses them).
The visitor pattern is useful if you want to visit all nodes in the tree or all nodes below a certain point in the tree.
First you set up your own struct hivex_visitor
with your callback functions.
Each of these callback functions should return 0 on success or -1 on error. If any callback returns -1, then the entire visit terminates immediately. If you don't need a callback function at all, set the function pointer to NULL.
struct hivex_visitor {
int (*node_start) (hive_h *, void *opaque, hive_node_h, const char *name);
int (*node_end) (hive_h *, void *opaque, hive_node_h, const char *name);
int (*value_string) (hive_h *, void *opaque, hive_node_h, hive_value_h,
hive_type t, size_t len, const char *key, const char *str);
int (*value_multiple_strings) (hive_h *, void *opaque, hive_node_h,
hive_value_h, hive_type t, size_t len, const char *key, char **argv);
int (*value_string_invalid_utf16) (hive_h *, void *opaque, hive_node_h,
hive_value_h, hive_type t, size_t len, const char *key,
const char *str);
int (*value_dword) (hive_h *, void *opaque, hive_node_h, hive_value_h,
hive_type t, size_t len, const char *key, int32_t);
int (*value_qword) (hive_h *, void *opaque, hive_node_h, hive_value_h,
hive_type t, size_t len, const char *key, int64_t);
int (*value_binary) (hive_h *, void *opaque, hive_node_h, hive_value_h,
hive_type t, size_t len, const char *key, const char *value);
int (*value_none) (hive_h *, void *opaque, hive_node_h, hive_value_h,
hive_type t, size_t len, const char *key, const char *value);
int (*value_other) (hive_h *, void *opaque, hive_node_h, hive_value_h,
hive_type t, size_t len, const char *key, const char *value);
/* If value_any callback is not NULL, then the other value_*
* callbacks are not used, and value_any is called on all values.
*/
int (*value_any) (hive_h *, void *opaque, hive_node_h, hive_value_h,
hive_type t, size_t len, const char *key, const char *value);
};
int hivex_visit (hive_h *h, const struct hivex_visitor *visitor, size_t len, void *opaque, int flags);
Visit all the nodes recursively in the hive h
.
visitor
should be a hivex_visitor
structure with callback fields filled in as required (unwanted callbacks can be set to NULL). len
must be the length of the 'visitor' struct (you should pass sizeof (struct hivex_visitor)
for this).
This returns 0 if the whole recursive visit was completed successfully. On error this returns -1. If one of the callback functions returned an error than we don't touch errno. If the error was generated internally then we set errno.
You can skip bad registry entries by setting flag
to HIVEX_VISIT_SKIP_BAD
. If this flag is not set, then a bad registry causes the function to return an error immediately.
This function is robust if the registry contains cycles or pointers which are invalid or outside the registry. It detects these cases and returns an error.
int hivex_visit_node (hive_h *h, hive_node_h node, const struct hivex_visitor *visitor, size_t len, void *opaque);
Same as hivex_visit
but instead of starting out at the root, this starts at node
.
Note: To understand the relationship between hives and the common Windows Registry keys (like HKEY_LOCAL_MACHINE
) please see the Wikipedia page on the Windows Registry.
The Windows Registry is split across various binary files, each file being known as a "hive". This library only handles a single hive file at a time.
Hives are n-ary trees with a single root. Each node in the tree has a name.
Each node in the tree (including non-leaf nodes) may have an arbitrary list of (key, value) pairs attached to it. It may be the case that one of these pairs has an empty key. This is referred to as the default key for the node.
The (key, value) pairs are the place where the useful data is stored in the registry. The key is always a string (possibly the empty string for the default key). The value is a typed object (eg. string, int32, binary, etc.).
The hivex C library does not care about or deal with Windows .REG files. Instead we push this complexity up to the Perl Win::Hivex(3) library and the Perl programs hivexregedit(1) and virt-win-reg(1). Nevertheless it is useful to look at the relationship between the Registry and .REG files because they are so common.
A .REG file is a textual representation of the registry, or part of the registry. The actual registry hives that Windows uses are binary files. There are a number of Windows and Linux tools that let you generate .REG files, or merge .REG files back into the registry hives. Notable amongst them is Microsoft's REGEDIT program (formerly known as REGEDT32).
A typical .REG file will contain many sections looking like this:
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\Stack]
"@"="Generic Stack"
"TileInfo"="prop:System.FileCount"
"TilePath"=str(2):"%systemroot%\\system32"
"ThumbnailCutoff"=dword:00000000
"FriendlyTypeName"=hex(2):40,00,25,00,53,00,79,00,73,00,74,00,65,00,6d,00,52,00,6f,00,\
6f,00,74,00,25,00,5c,00,53,00,79,00,73,00,74,00,65,00,6d,00,\
33,00,32,00,5c,00,73,00,65,00,61,00,72,00,63,00,68,00,66,00,\
6f,00,6c,00,64,00,65,00,72,00,2e,00,64,00,6c,00,6c,00,2c,00,\
2d,00,39,00,30,00,32,00,38,00,00,00,d8
Taking this one piece at a time:
[HKEY_LOCAL_MACHINE\SOFTWARE\Classes\Stack]
This is the path to this node in the registry tree. The first part, HKEY_LOCAL_MACHINE\SOFTWARE
means that this comes from a hive file called C:\WINDOWS\SYSTEM32\CONFIG\SOFTWARE
. \Classes\Stack
is the real path part, starting at the root node of the SOFTWARE
hive.
Below the node name is a list of zero or more key-value pairs. Any interior or leaf node in the registry may have key-value pairs attached.
"@"="Generic Stack"
This is the "default key". In reality (ie. inside the binary hive) the key string is the empty string. In .REG files this is written as @
but this has no meaning either in the hives themselves or in this library. The value is a string (type 1 - see enum hive_type
above).
"TileInfo"="prop:System.FileCount"
This is a regular (key, value) pair, with the value being a type 1 string. Note that inside the binary file the string is likely to be UTF-16LE encoded. This library converts to and from UTF-8 strings transparently in some cases.
"TilePath"=str(2):"%systemroot%\\system32"
The value in this case has type 2 (expanded string) meaning that some %...% variables get expanded by Windows. (This library doesn't know or care about variable expansion).
"ThumbnailCutoff"=dword:00000000
The value in this case is a dword (type 4).
"FriendlyTypeName"=hex(2):40,00,....
This value is an expanded string (type 2) represented in the .REG file as a series of hex bytes. In this case the string appears to be a UTF-16LE string.
Many functions in this library set errno to indicate errors. These are the values of errno you may encounter (this list is not exhaustive):
Corrupt or unsupported Registry file format.
Missing root key.
Passed an invalid argument to the function.
Followed a Registry pointer which goes outside the registry or outside a registry block.
Registry contains cycles.
Field in the registry out of range.
Registry key already exists.
Tried to write to a registry which is not opened for writing.
Setting HIVEX_DEBUG=1 will enable very verbose messages. This is useful for debugging problems with the library itself.
hivexget(1), hivexml(1), hivexsh(1), hivexregedit(1), virt-win-reg(1), Win::Hivex(3), guestfs(3), http://libguestfs.org/, virt-cat(1), virt-edit(1), http://en.wikipedia.org/wiki/Windows_Registry.
Richard W.M. Jones (rjones at redhat dot com
)
Copyright (C) 2009-2010 Red Hat Inc.
Derived from code by Petter Nordahl-Hagen under a compatible license: Copyright (C) 1997-2007 Petter Nordahl-Hagen.
Derived from code by Markus Stephany under a compatible license: Copyright (C) 2000-2004 Markus Stephany.
This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; version 2.1 of the License only.
This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.