Message Bus Messages
The core of Truxton is the message bus.
Message Structure
Here's the C/C++ structure:
struct TRUXTON_MESSAGE
{
    GUID file_id;
    GUID parent_id;
    GUID media_id;
    GUID depot_id;
    uint64_t offset;
    uint64_t length;
    uint8_t * file_contents;
    uint32_t queue_is_empty;
    uint32_t route_id;
    uint32_t priority;
    uint32_t signature;
    uint32_t dont_route;
    uint16_t file_type;
    uint16_t types[16];
    char depot_filename[MAX_PATH];
    char md5[33];
};
The TRUXTON_MESSAGE
structure is 440 bytes long.
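The 440-byte figure can be checked at compile time. The sketch below assumes a 64-bit build with default alignment. The `GUID` and `MAX_PATH` definitions are stand-ins (not part of Truxton) so that it compiles without the Windows headers; on Windows you would include `<windows.h>` and drop them.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-ins for the Windows definitions so the sketch builds anywhere
// (assumption: same layout as GUID in <guiddef.h>, MAX_PATH is 260).
struct GUID
{
    uint32_t Data1;
    uint16_t Data2;
    uint16_t Data3;
    uint8_t  Data4[8];
};

constexpr std::size_t MAX_PATH = 260;

struct TRUXTON_MESSAGE
{
    GUID file_id;
    GUID parent_id;
    GUID media_id;
    GUID depot_id;
    uint64_t offset;
    uint64_t length;
    uint8_t * file_contents;
    uint32_t queue_is_empty;
    uint32_t route_id;
    uint32_t priority;
    uint32_t signature;
    uint32_t dont_route;
    uint16_t file_type;
    uint16_t types[16];
    char depot_filename[MAX_PATH];
    char md5[33];
};

// 4 GUIDs (64) + 2 uint64_t (16) + pointer (8) + 5 uint32_t (20)
// + file_type (2) + types (32) + depot_filename (260) + md5 (33) = 435 bytes,
// which the compiler rounds up to a multiple of 8, giving 440.
static_assert(sizeof(GUID) == 16, "GUID must be 16 bytes");
static_assert(sizeof(void *) != 8 || sizeof(TRUXTON_MESSAGE) == 440,
              "expected 440 bytes on a 64-bit build");
```

On a 32-bit build the pointer shrinks to 4 bytes, so the size would differ; the second assertion is skipped in that case.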
Members
When this structure is used to pass a file object around the ETL layer, the members are:
file_id
The GUID of the file.
This corresponds to the [ID]
column of the [File]
table.
parent_id
The GUID of the parent of this file.
This corresponds to the [ParentFileID]
column of the [File]
table.
media_id
The GUID of the media this file came from.
This corresponds to the [MediaID]
column of the [File]
table.
depot_id
The GUID of the depot that holds this file's contents.
This corresponds to the [DepotID]
column of the [Content]
table
and the [ID]
column of the [Depot]
table.
offset
This is the offset into the depot file where this file's contents begin. If the file has no contents, this value will be zero.
length
The number of bytes of content in the depot file for this file's contents. If the file has no contents, this value will be zero.
file_contents
An optional pointer to the file's contents. A small file can often be included in the message itself.
queue_is_empty
This is non-zero when there are no more files in the message queue. All other members will be zeroed out.
route_id
This is used for routing this file downstream from you.
It corresponds to the [LoadConfigurationID]
column of the [ETLRoute]
table.
You will also see it in the [ID]
column of the [LoadConfiguration]
table as well as the [LoadConfigurationID]
column of the [Media]
table.
priority
The priority of the message. A higher numeric value means the message will be processed before a lower numeric value. 99 is a higher priority than 1.
signature
This contains the first four bytes of the file.
This corresponds to the [Signature]
column of the [File]
table.
dont_route
Contains a non-zero value if this message is for your ETL only and should not be routed to the other ETLs on the message bus.
file_type
The most accurate type for the file's contents.
It can be a defined constant but it must exist in the [ID]
column of the [FileType]
table.
It will be stored in the [FileTypeID]
column of the [File]
table.
types
While the file_type
member contains the most accurate file type, a file may have several types associated with it.
The file types in the types
array should be listed from most accurate to most generic.
Take, for example, a Word 2007 document, which is a specialization of the open package file format, which is based on Zip. Its types would be listed in that order: Word 2007, open package, Zip.
depot_filename
The name of the depot file you need to open to get to the file's contents.
If the file_contents
member doesn't contain the contents, you should open this file and seek to the offset
position to read up to length
bytes.
If the file has no contents or the contents were eliminated, this string will be empty.
md5
This is an ASCII representation of the MD5 hash of the file's contents.
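The rules for file_contents, depot_filename, offset and length can be sketched as one helper. This is a minimal illustration, not Truxton code: `read_contents` is a hypothetical name, it takes the relevant members as parameters instead of the whole structure, and it assumes that length also gives the size of the buffer when file_contents is set.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <vector>

// Hypothetical helper: return the bytes of a file described by a message.
std::vector<uint8_t> read_contents(const uint8_t * file_contents,
                                   const char * depot_filename,
                                   uint64_t offset,
                                   uint64_t length)
{
    assert(depot_filename != nullptr);

    // Small files may travel inside the message itself
    // (assumption: length is the size of that buffer).
    if (file_contents != nullptr)
    {
        return std::vector<uint8_t>(file_contents, file_contents + length);
    }

    // An empty depot_filename means no contents, or contents were eliminated.
    if (depot_filename[0] == '\0' || length == 0)
    {
        return {};
    }

    std::ifstream depot(depot_filename, std::ios::binary);
    if (!depot)
    {
        return {};
    }

    // Seek to the offset position and read up to length bytes.
    depot.seekg(static_cast<std::streamoff>(offset));
    std::vector<uint8_t> bytes(static_cast<std::size_t>(length));
    depot.read(reinterpret_cast<char *>(bytes.data()),
               static_cast<std::streamsize>(bytes.size()));

    // Keep only what was actually read (the depot may be shorter).
    bytes.resize(static_cast<std::size_t>(depot.gcount()));
    return bytes;
}
```

A caller would pass `message.file_contents`, `message.depot_filename`, `message.offset` and `message.length` straight through.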
Message Database Record
Truxton normally uses PostgreSQL as the message queue server.
The template for a message queue is defined in the C:\Program Files\Truxton\Database\create-tmb-tables.sql
file.
All message queues have the following schema:
CREATE TABLE "queue template"
(
    "QueueItemID" uuid DEFAULT (md5(((random())::text || (clock_timestamp())::text)))::uuid PRIMARY KEY,
    "QueueItemAvailable" boolean NOT NULL,
    "Priority" integer DEFAULT 1000 NOT NULL,
    "RouteID" integer NOT NULL,
    "FileID" uuid NOT NULL,
    "ParentID" uuid NOT NULL,
    "MediaID" uuid NOT NULL,
    "DepotID" uuid NOT NULL,
    "Offset" bigint NOT NULL,
    "Length" bigint NOT NULL,
    "MD5" character(32) NOT NULL,
    "Signature" integer NOT NULL,
    "Type" smallint NOT NULL,
    "Types" smallint[],
    "DepotFilename" text NOT NULL,
    "Don't Route" boolean NOT NULL,
    "File Contents" varchar(64000)
);
Mapping Other Messages to this Structure
By mapping structures that have nothing to do with files to the Truxton Message structure, we can leverage all of the message queue code. We map several other types of message to this message structure.
Carve
File carving in Truxton is a parallel process.
The space to be carved is divided into blocks.
Each block is packaged in a message and sent to ETLs that registered for the Type_Truxton_Control_Message_Carve
file type.
Options
There are several options that are set for carving.
Logical File ID
Media ID
The GUID of the media this block came from.
This corresponds to the [ID]
column of the [Media]
table.
Block Size
The number of bytes in a logical block. Scanning will take place on block boundaries. When scanning hard drives, this block size should be equal to the cluster size of the filesystem in the media. For a logical file, the default is 256, but it can be 1 or, if the format has a section alignment value, that value (like 128 for tar files). Otherwise, it should equal the sector size of the hard drive. The default is 512.
Exclusive-Or Value
The value to exclusive-or with the contents before scanning.
Source
This is a file type to let the carver know where the bytes came from. Typically, this is Type_Freespace or Type_Slack.
Origin
This is where the data came from. It is one of the Origin values.
Options
The flags that can be set to control the carve algorithm.
Truxton Message Mapping
The structure members map to the Truxton Message structure as follows:
TRUXTON_MESSAGE | Carve Parameters |
---|---|
file_id | 43617276-654D-7367-3141-592653589793 |
parent_id | Logical File ID? |
media_id | Media ID |
file_type | Type_Truxton_Control_Message_Carve |
signature | Block Size |
types[11] | Exclusive-or Value |
types[12] | Source |
types[13] | Origin |
types[14] - types[15] | Options as bit flags |
depot_filename | depot_filename |
depot_id | depot_id |
offset | offset |
length | Block Size |
Camera Information (EXIF)
The EXIF_DATA
structure is as follows:
struct EXIF_OFFSETS
{
    uint64_t exif;
    uint64_t make;
    uint64_t model;
    uint64_t body_serial_number;
    uint64_t lens_serial_number;
    uint64_t latitude;
    uint64_t longitude;
    uint64_t altitude;
    uint64_t heading;
    uint64_t focal_length;
    uint64_t shutter_count;
    uint64_t gps_time;
    uint64_t device_time;
    uint64_t thumbnail_offset;
    uint64_t thumbnail_length;
};
struct EXIF_DATA
{
    EXIF_OFFSETS offsets;
    uint64_t gps_time;
    uint64_t device_time;
    double latitude;
    double longitude;
    double altitude;
    double heading;
    double focal_length;
    int32_t shutter_count;
    uint32_t thumbnail_offset;
    uint32_t thumbnail_length;
    wchar_t make[128];
    wchar_t model[128];
    wchar_t body_serial_number[64];
    wchar_t lens_serial_number[64];
};
Truxton Message Mapping
The structure members map to the Truxton Message structure as follows:
TRUXTON_MESSAGE | EXIF_DATA |
---|---|
depot_id (high 64) | gps_time |
depot_id (low 64) | device_time |
offset | latitude |
length | longitude |
types[0] - types[3] | altitude |
types[4] - types[7] | heading |
types[8] - types[11] | focal_length |
signature | shutter_count |
depot_filename | make, model, body_serial_number, lens_serial_number |
types[12] - types[15] | offsets.exif |
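Several rows of this mapping place an 8-byte double into members of a different type: a uint64_t (offset, length) or four consecutive uint16_t slots of the types array. One way to do that is a raw bit copy with `memcpy`. The sketch below assumes that is the encoding, which this page does not confirm, and all four function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(uint64_t), "double must be 8 bytes");

// Copy the 8 bytes of a double into four consecutive uint16_t slots,
// e.g. altitude into types[0] - types[3] (assumed raw bit copy).
void pack_double(double value, uint16_t * types, int first_slot)
{
    assert(types != nullptr && first_slot >= 0 && first_slot + 4 <= 16);
    std::memcpy(&types[first_slot], &value, sizeof(value));
}

// Reverse of pack_double: rebuild the double from four slots.
double unpack_double(const uint16_t * types, int first_slot)
{
    assert(types != nullptr && first_slot >= 0 && first_slot + 4 <= 16);
    double value = 0.0;
    std::memcpy(&value, &types[first_slot], sizeof(value));
    return value;
}

// Same idea for the uint64_t members, e.g. latitude into offset.
uint64_t double_to_bits(double value)
{
    uint64_t bits = 0;
    std::memcpy(&bits, &value, sizeof(bits));
    return bits;
}

double bits_to_double(uint64_t bits)
{
    double value = 0.0;
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}
```

Using `memcpy` instead of a pointer cast avoids alignment and aliasing problems, since the types array starts at a 2-byte boundary inside the structure.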
ETL Status
The TRUXTON_STATUS_MESSAGE
structure is as follows:
struct TRUXTON_STATUS_MESSAGE
{
    GUID ETL_ID;
    GUID Media;
    uint64_t Sent;
    uint64_t Received;
    uint32_t Stage;
    uint32_t SentFromProcessID;
    char FriendlyName[64];
};
Truxton Message Mapping
The structure members map to the Truxton Message structure as follows:
TRUXTON_MESSAGE | TRUXTON_STATUS_MESSAGE |
---|---|
parent_id | ETL_ID |
media_id | Media |
length | Sent |
depot_filename | Received |
offset | Stage |
signature | SentFromProcessID |
md5 | FriendlyName |
The remaining data items are:
TRUXTON_MESSAGE | Data Item |
---|---|
file_id | A GUID with the first two bytes set to 0xFA 0xCE (FACE). |
How it Works
Each ETL will report what it is doing via status messages. Here is what each message means:
Dump
When this message is sent from an ETL, it tells the Load Status Monitor (Les) to generate a debugging dump.
TRUXTON_STATUS_MESSAGE | Value |
---|---|
Stage | 253 |
ETL_ID | DDDDDDDD-DDDD-DDDD-DDDD-DDDDDDDDDDDD |
Media | DDDDDDDD-DDDD-DDDD-DDDD-DDDDDDDDDDDD |
FriendlyName | Dump |
Ping
This tells Les that the ETL process is still processing a file. Les has no way to know if an ETL process dies since Les may be running on a different machine than the ETL process. For this reason, Les will give an ETL a certain amount of time to process a file before it assumes that ETL is either stuck in an endless loop or died. Sending a ping message to Les will prevent him from timing the ETL out.
TRUXTON_STATUS_MESSAGE | Value |
---|---|
FriendlyName | Ping |
New
This tells Les that the ETL process is starting to process new media.
TRUXTON_STATUS_MESSAGE | Value |
---|---|
FriendlyName | New |
Idle
This tells Les that the ETL process isn't processing anything; it is waiting for something to do.
TRUXTON_STATUS_MESSAGE | Value |
---|---|
FriendlyName | Idle |
Verbose On
This tells Les to turn verbose logging on.
TRUXTON_STATUS_MESSAGE | Value |
---|---|
Stage | 200 |
ETL_ID | 56565656-5656-5656-5656-565656565656 |
Media | 4E4E4E4E-4E4E-4E4E-4E4E-4E4E4E4E4E4E |
FriendlyName | Set Verbose On |
Verbose Off
This tells Les to turn verbose logging off.
TRUXTON_STATUS_MESSAGE | Value |
---|---|
Stage | 200 |
ETL_ID | 56565656-5656-5656-5656-565656565656 |
Media | 66666666-6666-6666-6666-666666666666 |
FriendlyName | Set Verbose Off |
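The Dump and Verbose messages above use GUIDs made of a single repeated byte (0xDD, 0x56, 0x4E, 0x66), so they can be built with `memset` regardless of how the GUID fields are laid out. The following is a sketch only: the `GUID` definition is a stand-in for the Windows one, the function names are hypothetical, and leaving Sent, Received and SentFromProcessID at zero is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Stand-in for the Windows GUID (assumption: same 16-byte layout).
struct GUID
{
    uint32_t Data1;
    uint16_t Data2;
    uint16_t Data3;
    uint8_t  Data4[8];
};

struct TRUXTON_STATUS_MESSAGE
{
    GUID ETL_ID;
    GUID Media;
    uint64_t Sent;
    uint64_t Received;
    uint32_t Stage;
    uint32_t SentFromProcessID;
    char FriendlyName[64];
};

// A GUID whose 16 bytes all hold the same value, e.g. 0x56 gives
// 56565656-5656-5656-5656-565656565656.
GUID repeated_byte_guid(uint8_t value)
{
    GUID guid;
    std::memset(&guid, value, sizeof(guid));
    return guid;
}

// Hypothetical builder shared by the control messages.
TRUXTON_STATUS_MESSAGE make_control_message(uint32_t stage, uint8_t etl_byte,
                                            uint8_t media_byte, const char * name)
{
    assert(name != nullptr && std::strlen(name) < sizeof(TRUXTON_STATUS_MESSAGE::FriendlyName));
    TRUXTON_STATUS_MESSAGE message{}; // zero everything, including FriendlyName
    message.Stage = stage;
    message.ETL_ID = repeated_byte_guid(etl_byte);
    message.Media = repeated_byte_guid(media_byte);
    std::strncpy(message.FriendlyName, name, sizeof(message.FriendlyName) - 1);
    return message;
}

TRUXTON_STATUS_MESSAGE make_verbose_message(bool on)
{
    return on ? make_control_message(200, 0x56, 0x4E, "Set Verbose On")
              : make_control_message(200, 0x56, 0x66, "Set Verbose Off");
}

TRUXTON_STATUS_MESSAGE make_dump_message()
{
    return make_control_message(253, 0xDD, 0xDD, "Dump");
}
```

The resulting structure would still have to be mapped onto a TRUXTON_MESSAGE and queued by whatever API the ETL uses, which is outside this sketch.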
Event
The CALENDAR_EVENT
structure is as follows:
struct CALENDAR_EVENT
{
    uint32_t event_type_id;
    uint64_t start;
    uint64_t end;
    std::u16string title;
    std::u16string description;
};
Truxton Message Mapping
The structure members map to the Truxton Message structure as follows:
TRUXTON_MESSAGE | CALENDAR_EVENT |
---|---|
signature | event_type_id |
offset | start |
length | end |
depot_filename | title |
depot_filename | description |
The remaining data items are:
TRUXTON_MESSAGE | Data Item |
---|---|
file_id | A GUID that corresponds to the [ID] column of the [File] table in the database. It holds the identifier of the file this event came from. |
media_id | A GUID that corresponds to the [ID] column of the [Media] table in the database. It holds the identifier of the media this event came from. |
parent_id | A GUID that corresponds to the [ID] column of the [Event] table in the database. |
Tag
This message is generated when something is tagged in Truxton. Here's a sample program that uses this message.
Truxton Message Mapping
The tag message members map to the Truxton Message structure as follows:
TRUXTON_MESSAGE | Data Item |
---|---|
file_type | Type_Tag |
file_id | Tagged Item. A GUID that corresponds to the identifier of what was tagged. |
media_id | Media. A GUID that corresponds to the [ID] column of the [Media] table in the database. It holds the identifier of the media the item that was tagged came from. |
parent_id | Tag ID. A GUID that corresponds to the [ID] column of the [Tag] table in the database. |
offset | Tagged Item Type. The type of the item that was tagged. |
signature | Source. The source of the tag. 1 means it is an automatic tag (algorithm generated), 2 means a human tagged the item. |
depot_filename | Reason. Why the item was tagged. |
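A receiver of tag messages will typically branch on the source carried in signature. The sketch below turns the two documented values into an enumeration; `TagSource` and `parse_tag_source` are hypothetical names, and any other value is reported as unrecognized instead of being guessed at.

```cpp
#include <cassert>
#include <cstdint>

// Values of the signature member in a tag message, as listed in the table above.
enum class TagSource : uint32_t
{
    Automatic = 1, // algorithm generated
    Human     = 2  // a person tagged the item
};

// Returns true and sets *source when signature holds a documented value.
bool parse_tag_source(uint32_t signature, TagSource * source)
{
    assert(source != nullptr);
    switch (signature)
    {
    case 1:
        *source = TagSource::Automatic;
        return true;
    case 2:
        *source = TagSource::Human;
        return true;
    default:
        return false; // undocumented value; *source is left untouched
    }
}
```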
Website Visit
The WEB_SITE_VISIT
structure is as follows:
struct WEB_SITE_VISIT
{
    char const * account;
    char const * url;
    uint64_t when;
    uint64_t offset_of_url;
    uint64_t offset_of_account;
    uint16_t type;
    uint16_t method;
    char local_filename[256];
};
Truxton Message Mapping
The structure members map to the Truxton Message structure as follows:
TRUXTON_MESSAGE | WEB_SITE_VISIT |
---|---|
types[0] | type |
signature | method |
offset | offset_of_url |
length | when |
depot_filename | url |
The remaining data items are:
TRUXTON_MESSAGE | Data Item |
---|---|
file_type | Type_Website_Visit |
file_id | The GUID of the file the website visit came from |
media_id | The GUID of the media the website visit came from |
parent_id | The GUID of the website visit. This corresponds to the [ID] column of the [WebsiteVisit] table in the database. |
Note: Long URLs
If TRUXTON_MESSAGE.types[1] is set to 1, the entire contents of WEB_SITE_VISIT.url could not be stored in TRUXTON_MESSAGE.depot_filename. When this happens, the receiver of the message should retrieve the full record from the [WebsiteVisit] table where the [ID] column is equal to the GUID in parent_id.
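The long-URL rule amounts to a single check on types[1] before trusting depot_filename. In the sketch below, `resolve_url` and `UrlLookup` are hypothetical names; the callback stands in for whatever database query fetches the row from the [WebsiteVisit] table by the GUID in parent_id.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical callback that returns the full URL from the [WebsiteVisit]
// table for the visit identified by parent_id.
using UrlLookup = std::function<std::string()>;

// Return the URL of a website visit message.
// types points at TRUXTON_MESSAGE.types, depot_filename at the member of the same name.
std::string resolve_url(const uint16_t * types,
                        const char * depot_filename,
                        const UrlLookup & lookup_full_url)
{
    assert(types != nullptr && depot_filename != nullptr);

    // types[1] == 1 means the URL did not fit in depot_filename.
    if (types[1] == 1)
    {
        return lookup_full_url();
    }

    return depot_filename;
}
```

Passing the lookup as a callback means the database is only queried for the visits that need it.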
Maintenance Messages
The Maintenance ETL performs long-running tasks. These tasks are started when particular messages are received.
Clean Database
This message will cause maintenance to clean the database.
TRUXTON_MESSAGE | Data Item |
---|---|
file_type | Type_Immediate_Maintenance |
file_id | 4D61696E-7461-696E-436C-65616E557021 |
Consolidate Depots
This message will cause maintenance to combine smaller depots into larger ones.
TRUXTON_MESSAGE | Data Item |
---|---|
file_type | Type_Immediate_Maintenance |
file_id | 4D61696E-7461-696E-436F-6E734465706F |
Delete Depots
This message will cause maintenance to go through the datadir
and depotdir
and delete any files that have an extension of .ToBeDeleted.
TRUXTON_MESSAGE | Data Item |
---|---|
file_type | Type_Immediate_Maintenance |
file_id | 4D61696E-7461-696E-4465-6C4465706F74 |
Delete Media
This message will cause maintenance to delete the metadata for the given media.
TRUXTON_MESSAGE | Data Item |
---|---|
file_type | Type_Immediate_Maintenance |
file_id | 4D61696E-7461-696E-4465-6C4D65646961 |
media_id | The GUID of the media to delete |
Import Content from TPIF
This message will cause maintenance to import file content data from the given TPIF.
TRUXTON_MESSAGE | Data Item |
---|---|
file_type | Type_Immediate_Maintenance |
file_id | 4D61696E-7461-696E-496D-70436E746E74 |
depot_filename | The path to the TPIF file |
parent_id | An investigation id when length is 1, a media id when length is 2 |
length | 1 means parent_id is an investigation id, 2 means parent_id is a media id |
offset | 0 means do not reindex the media upon completion, 1 means reindex the media |
Optimize Database
This message will cause maintenance to optimize the database.
TRUXTON_MESSAGE | Data Item |
---|---|
file_type | Type_Immediate_Maintenance |
file_id | 4D61696E-7461-696E-4F70-74696D697A65 |
Reindex Media
This message will cause maintenance to content-index the given media.
TRUXTON_MESSAGE | Data Item |
---|---|
file_type | Type_Immediate_Maintenance |
file_id | 4D61696E-7461-696E-5265-696E64657821 |
media_id | The GUID of the media to index |
Set Media Primary Photo
This message will cause maintenance to add a file to Truxton and make it the primary photograph of the given media.
TRUXTON_MESSAGE | Data Item |
---|---|
file_type | Type_Immediate_Maintenance |
file_id | 4D61696E-7461-696E-4D65-6450686F746F |
media_id | The GUID of the media to gain the photo |
depot_filename | The path to the image file |
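The file_id values in the maintenance tables read as ASCII text when their hex digits are decoded two at a time, which can help when spotting them in a queue table or a log. The helper below is a sketch with a hypothetical name, `guid_text_to_ascii`; it works on the textual form of the GUID as printed in this section, not on the in-memory GUID structure.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Convert GUID text such as "4D61696E-7461-696E-436C-65616E557021" into the
// characters its hex digits spell, ignoring the hyphens.
std::string guid_text_to_ascii(const std::string & guid_text)
{
    std::string hex;
    for (char c : guid_text)
    {
        if (c == '-')
        {
            continue;
        }
        assert(std::isxdigit(static_cast<unsigned char>(c)));
        hex.push_back(c);
    }
    assert(hex.size() % 2 == 0);

    std::string ascii;
    for (std::string::size_type i = 0; i + 1 < hex.size(); i += 2)
    {
        // Each pair of hex digits is one byte.
        ascii.push_back(static_cast<char>(std::stoi(hex.substr(i, 2), nullptr, 16)));
    }
    return ascii;
}
```

For example, `guid_text_to_ascii("4D61696E-7461-696E-436C-65616E557021")` returns `MaintainCleanUp!`, and the Optimize Database identifier decodes to `MaintainOptimize`.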