Message Bus Messages

From truxwiki.com

The core of Truxton is the message bus.

Message Structure

Here's the C/C++ structure:

struct TRUXTON_MESSAGE
{
    GUID      file_id;
    GUID      parent_id;
    GUID      media_id;
    GUID      depot_id;
    uint64_t  offset;
    uint64_t  length;
    uint8_t * file_contents;
    uint32_t  queue_is_empty;
    uint32_t  route_id;
    uint32_t  priority;
    uint32_t  signature;
    uint32_t  dont_route;
    uint16_t  file_type;
    uint16_t  types[16];
    char      depot_filename[MAX_PATH];
    char      md5[33];
};

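The TRUXTON_MESSAGE structure is 440 bytes long in a 64-bit build (8-byte file_contents pointer, MAX_PATH of 260, and trailing padding to an 8-byte boundary).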
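The size can be checked at compile time. The following is a minimal sketch, not Truxton's own header: GUID and MAX_PATH are stand-ins for the definitions that normally come from the Windows headers, and a 64-bit target is assumed.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-ins so the sketch compiles anywhere (assumption: same layout as the
// Windows definitions, where MAX_PATH is 260 and a GUID is 16 bytes).
struct GUID { uint32_t Data1; uint16_t Data2; uint16_t Data3; uint8_t Data4[8]; };
constexpr std::size_t MAX_PATH = 260;

struct TRUXTON_MESSAGE
{
    GUID      file_id;
    GUID      parent_id;
    GUID      media_id;
    GUID      depot_id;
    uint64_t  offset;
    uint64_t  length;
    uint8_t * file_contents;
    uint32_t  queue_is_empty;
    uint32_t  route_id;
    uint32_t  priority;
    uint32_t  signature;
    uint32_t  dont_route;
    uint16_t  file_type;
    uint16_t  types[16];
    char      depot_filename[MAX_PATH];
    char      md5[33];
};

// The members end at byte 435; the compiler pads the structure to 440.
static_assert(sizeof(void *) == 8, "this sketch assumes a 64-bit build");
static_assert(sizeof(GUID) == 16, "stand-in GUID must be 16 bytes");
static_assert(sizeof(TRUXTON_MESSAGE) == 440, "unexpected structure size");
```

If any of the static_assert lines fails, the build is using a different pointer size, packing, or MAX_PATH than assumed here.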

Members

When this structure is used to pass a file object around the ETL layer, the members are:

file_id

The GUID of the file. This corresponds to the [ID] column of the [File] table.

parent_id

The GUID of the parent of this file. This corresponds to the [ParentFileID] column of the [File] table.

media_id

The GUID of the media this file came from. This corresponds to the [MediaID] column of the [File] table.

depot_id

The GUID of the depot that holds this file's contents. This corresponds to the [DepotID] column of the [Content] table and the [ID] column of the [Depot] table.

offset

This is the offset into the depot file where this file's contents begin. If the file has no contents, this value will be zero.

length

The number of bytes of content in the depot file for this file's contents. If the file has no contents, this value will be zero.

file_contents

An optional pointer to the file's contents. Often, a small file can be included in the message itself.

queue_is_empty

This is non-zero when there are no more files in the message queue. All other members will be zeroed out.

route_id

This is used for routing this file downstream from you. It corresponds to the [LoadConfigurationID] column of the [ETLRoute] table. You will also see it in the [ID] column of the [LoadConfiguration] table as well as the [LoadConfigurationID] column of the [Media] table.

priority

The priority of the message. A higher numeric value means the message will be processed before a lower numeric value. 99 is a higher priority than 1.
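The ordering rule can be illustrated with a standard priority queue. This is only a sketch of the rule stated above; PendingMessage is a hypothetical reduced view of the message, and the real queue is a database table, not an in-memory container.

```cpp
#include <cstdint>
#include <queue>
#include <vector>

// Hypothetical reduced view of a queued message (illustration only).
struct PendingMessage
{
    uint32_t priority;
    uint32_t route_id;
};

// Orders messages so that the highest numeric priority is taken first.
struct LowerPriority
{
    bool operator()(const PendingMessage & a, const PendingMessage & b) const
    {
        return a.priority < b.priority;
    }
};

using PriorityOrder =
    std::priority_queue<PendingMessage, std::vector<PendingMessage>, LowerPriority>;
```

With messages of priority 1, 99, and 50 pushed in that order, top() yields the priority 99 message first, then 50, then 1.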

signature

This contains the first four bytes of the file. This corresponds to the [Signature] column of the [File] table.

dont_route

Contains a non-zero value if this message is for your ETL only and should not be routed to the other ETLs on the message bus.

file_type

The most accurate type for the file's contents. It can be a defined constant, but it must exist in the [ID] column of the [FileType] table. It will be stored in the [FileTypeID] column of the [File] table.

types

While the file_type member contains the most accurate file type, a file may have several types associated with it. The file types in the types array should be listed in order from most accurate to most generic. Take, for example, a Word 2007 document: it is a specialization of the open package file format, which in turn is based on Zip, so its types would be listed in that order.
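A minimal sketch of filling the array in that order follows. The three type constants are hypothetical placeholders (real values must exist in the [FileType] table), and zero-filling the unused slots is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <iterator>

// Hypothetical identifiers, for illustration only.
constexpr uint16_t Type_Example_Word2007    = 3;
constexpr uint16_t Type_Example_OpenPackage = 2;
constexpr uint16_t Type_Example_Zip         = 1;

// Copies up to 16 type identifiers, most accurate first, into types and
// zero-fills the remaining slots. Returns the number of identifiers stored.
inline std::size_t set_types(uint16_t (&types)[16],
                             std::initializer_list<uint16_t> ordered)
{
    std::fill(std::begin(types), std::end(types), uint16_t{0});
    const std::size_t count = std::min<std::size_t>(ordered.size(), 16);
    std::copy_n(ordered.begin(), count, types);
    return count;
}
```

For the Word 2007 example, the call would be set_types(message.types, {Type_Example_Word2007, Type_Example_OpenPackage, Type_Example_Zip}).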

depot_filename

The name of the depot file you need to open to get to the file's contents. If the file_contents member doesn't contain the contents, you should open this file and seek to the offset position to read up to length bytes. If the file has no contents or the contents were eliminated, this string will be empty.
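The lookup described above might be sketched as follows. ContentLocation is a hypothetical reduced view of the relevant TRUXTON_MESSAGE members, and the sketch assumes that inline file_contents, when present, holds length bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical reduced view of the members that locate a file's contents.
struct ContentLocation
{
    const uint8_t * file_contents;  // may be null
    uint64_t        offset;
    uint64_t        length;
    std::string     depot_filename; // empty when there are no contents
};

// Returns the file's contents: the inline copy when present, otherwise up to
// length bytes read from the depot file starting at offset.
inline std::vector<uint8_t> read_contents(const ContentLocation & where)
{
    if (where.file_contents != nullptr)
    {
        return std::vector<uint8_t>(where.file_contents,
                                    where.file_contents + where.length);
    }

    std::vector<uint8_t> bytes;
    if (where.depot_filename.empty() || where.length == 0)
    {
        return bytes; // no contents, or contents were eliminated
    }

    std::ifstream depot(where.depot_filename, std::ios::binary);
    if (!depot)
    {
        return bytes;
    }

    depot.seekg(static_cast<std::streamoff>(where.offset));
    bytes.resize(static_cast<std::size_t>(where.length));
    depot.read(reinterpret_cast<char *>(bytes.data()),
               static_cast<std::streamsize>(bytes.size()));
    bytes.resize(static_cast<std::size_t>(depot.gcount())); // may be short
    return bytes;
}
```

An empty result means there was nothing to read; a result shorter than length means the depot file ended early.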

md5

This is an ASCII representation of the MD5 hash of the file's contents.
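The 33-byte md5 member has room for 32 hexadecimal digits plus a terminating NUL. A minimal sketch of producing such a string from a raw 16-byte digest is shown below; the function name is hypothetical and the use of lowercase digits is an assumption.

```cpp
#include <cstddef>
#include <cstdint>

// Writes the digest as 32 hexadecimal digits followed by a NUL terminator.
// Lowercase digits are an assumption; the source does not state the case.
inline void format_md5(const uint8_t (&digest)[16], char (&out)[33])
{
    static const char digits[] = "0123456789abcdef";
    for (std::size_t i = 0; i < 16; ++i)
    {
        out[2 * i]     = digits[digest[i] >> 4];
        out[2 * i + 1] = digits[digest[i] & 0x0F];
    }
    out[32] = '\0';
}
```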

Message Database Record

Truxton normally uses PostgreSQL as the message queue server. The template for a message queue is defined in the C:\Program Files\Truxton\Database\create-tmb-tables.sql file. All message queues have the following schema:

CREATE TABLE "queue template"
(
  "QueueItemID" uuid DEFAULT (md5(((random())::text || (clock_timestamp())::text)))::uuid PRIMARY KEY,
  "QueueItemAvailable" boolean NOT NULL,
  "Priority" integer DEFAULT 1000 NOT NULL,
  "RouteID" integer NOT NULL,
  "FileID" uuid NOT NULL,
  "ParentID" uuid NOT NULL,
  "MediaID" uuid NOT NULL,
  "DepotID" uuid NOT NULL,
  "Offset" bigint NOT NULL,
  "Length" bigint NOT NULL,
  "MD5" character(32) NOT NULL,
  "Signature" integer NOT NULL,
  "Type" smallint NOT NULL,
  "Types" smallint[],
  "DepotFilename" text NOT NULL,
  "Don't Route" boolean NOT NULL,
  "File Contents" varchar(64000)
);

Mapping Other Messages to this Structure

By mapping structures that have nothing to do with files to the Truxton Message structure, we can leverage all of the message queue code. We map several other types of message to this message structure.

Carve

File carving in Truxton is a parallel process. The space to be carved is divided into blocks. Each block is packaged in a message and sent to ETLs that registered for the Type_Truxton_Control_Message_Carve file type.

Options

There are several options that are set for carving.

Logical File ID

Media ID

The GUID of the media this block came from. This corresponds to the [ID] column of the [Media] table.

Block Size

The number of bytes in a logical block. Scanning will take place on block boundaries. When scanning hard drives, this block size should be equal to the cluster size of the filesystem on the media. For a logical file, the default is 256, but it can be 1 or, if the format has a section alignment value, that value (like 128 for tar files). Otherwise, it should equal the sector size of the hard drive. The default is 512.

Exclusive-Or Value

The value to exclusive-or with the contents before scanning.
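As an illustration, the step could look like the sketch below. Treating the value as a single byte applied to every byte of the block is an assumption; the function name is hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Returns a copy of block with every byte exclusive-ored with value.
// A one-byte value is an assumption made for this sketch.
inline std::vector<uint8_t> apply_xor(const std::vector<uint8_t> & block,
                                      uint8_t value)
{
    std::vector<uint8_t> decoded(block);
    for (uint8_t & byte : decoded)
    {
        byte = static_cast<uint8_t>(byte ^ value);
    }
    return decoded;
}
```

Because exclusive-or is its own inverse, applying the same value twice restores the original bytes, and a value of zero leaves the block unchanged.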

Source

This is a file type to let the carver know where the bytes came from. Typically, this is Type_Freespace or Type_Slack.

Origin

This is where the data came from. It is one of the Origin values.

Options

The flags that can be set to control the carve algorithm.

Truxton Message Mapping

The structure members map to the Truxton Message structure as follows:

TRUXTON_MESSAGE Carve Parameters
file_id 43617276-654D-7367-3141-592653589793
parent_id Logical File ID?
media_id Media ID
file_type Type_Truxton_Control_Message_Carve
signature Block Size
types[11] Exclusive-or Value
types[12] Source
types[13] Origin
types[14] - types[15] Options as bit flags
depot_filename depot_filename
depot_id depot_id
offset depot_filename
length Block Size
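Two 16-bit slots give the option flags 32 bits of room. The sketch below shows one way to pack and unpack them; which slot holds the low word is not stated in the table, so putting the low 16 bits in types[14] is an assumption, and both function names are hypothetical.

```cpp
#include <cstdint>

// Stores 32 bits of carve option flags in types[14] and types[15].
// Assumption: types[14] holds the low word, types[15] the high word.
inline void pack_carve_options(uint16_t (&types)[16], uint32_t options)
{
    types[14] = static_cast<uint16_t>(options & 0xFFFFu);
    types[15] = static_cast<uint16_t>((options >> 16) & 0xFFFFu);
}

// Reassembles the flags written by pack_carve_options.
inline uint32_t unpack_carve_options(const uint16_t (&types)[16])
{
    return static_cast<uint32_t>(types[14]) |
           (static_cast<uint32_t>(types[15]) << 16);
}
```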

Camera Information (EXIF)

The EXIF_DATA structure is as follows:

struct EXIF_OFFSETS
{
  uint64_t exif;
  uint64_t make;
  uint64_t model;
  uint64_t body_serial_number;
  uint64_t lens_serial_number;
  uint64_t latitude;
  uint64_t longitude;
  uint64_t altitude;
  uint64_t heading;
  uint64_t focal_length;
  uint64_t shutter_count;
  uint64_t gps_time;
  uint64_t device_time;
  uint64_t thumbnail_offset;
  uint64_t thumbnail_length;
};

struct EXIF_DATA
{
  EXIF_OFFSETS offsets;
  uint64_t     gps_time;
  uint64_t     device_time;
  double       latitude;
  double       longitude;
  double       altitude;
  double       heading;
  double       focal_length;
  int32_t      shutter_count;
  uint32_t     thumbnail_offset;
  uint32_t     thumbnail_length;
  wchar_t      make[128];
  wchar_t      model[128];
  wchar_t      body_serial_number[64];
  wchar_t      lens_serial_number[64];
};

Truxton Message Mapping

The structure members map to the Truxton Message structure as follows:

TRUXTON_MESSAGE EXIF_DATA
depot_id (high 64) gps_time
depot_id (low 64) device_time
offset latitude
length longitude
types[0] - types[3] altitude
types[4] - types[7] heading
types[8] - types[11] focal_length
signature shutter_count
depot_filename make, model, body_serial_number, lens_serial_number
types[12] - types[15] offsets.exif
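Four consecutive uint16_t slots occupy eight bytes, which is exactly the size of a double, and offset and length are each eight bytes as well. The following sketch shows one way such values could be copied in and out without aliasing problems. The helper names are hypothetical, and using the machine's native byte order is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == 4 * sizeof(uint16_t), "a double must fill four slots");
static_assert(sizeof(double) == sizeof(uint64_t), "a double must fit offset/length");

// Stores value in types[first_slot] .. types[first_slot + 3].
inline void store_double(uint16_t (&types)[16], unsigned first_slot, double value)
{
    assert(first_slot <= 12);
    std::memcpy(&types[first_slot], &value, sizeof value);
}

// Reads back a double written by store_double.
inline double load_double(const uint16_t (&types)[16], unsigned first_slot)
{
    assert(first_slot <= 12);
    double value;
    std::memcpy(&value, &types[first_slot], sizeof value);
    return value;
}

// Bit-for-bit conversions for members such as offset (latitude) and
// length (longitude).
inline uint64_t double_to_bits(double value)
{
    uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

inline double bits_to_double(uint64_t bits)
{
    double value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

For example, store_double(message.types, 0, altitude) fills types[0] through types[3], and message.offset = double_to_bits(latitude) carries the latitude.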

ETL Status

The TRUXTON_STATUS_MESSAGE structure is as follows:

struct TRUXTON_STATUS_MESSAGE
{
  GUID     ETL_ID;
  GUID     Media;
  uint64_t Sent;
  uint64_t Received;
  uint32_t Stage;
  uint32_t SentFromProcessID;
  char     FriendlyName[64];
};

Truxton Message Mapping

The structure members map to the Truxton Message structure as follows:

TRUXTON_MESSAGE TRUXTON_STATUS_MESSAGE
parent_id ETL_ID
media_id Media
length Sent
depot_filename Received
offset Stage
signature SentFromProcessID
md5 FriendlyName

The remaining data items are:

TRUXTON_MESSAGE Data Item
file_id A GUID with the first two bytes set to 0xFA 0xCE (FACE).

How it Works

Each ETL will report what it is doing via status messages. Here is what each message means:

Dump

When this message is sent from an ETL, it tells the Load Status Monitor (Les) to generate a debugging dump.

TRUXTON_STATUS_MESSAGE Value
Stage 253
ETL_ID DDDDDDDD-DDDD-DDDD-DDDD-DDDDDDDDDDDD
Media DDDDDDDD-DDDD-DDDD-DDDD-DDDDDDDDDDDD
FriendlyName Dump

Ping

This tells Les that the ETL process is still processing a file. Les has no way to know if an ETL process dies, since Les may be running on a different machine than the ETL process. For this reason, Les will give an ETL a certain amount of time to process a file before it assumes that the ETL is either stuck in an endless loop or has died. Sending a ping message to Les will prevent him from timing the ETL out.

TRUXTON_STATUS_MESSAGE Value
FriendlyName Ping

New

This tells Les that the ETL process is starting to process new media.

TRUXTON_STATUS_MESSAGE Value
FriendlyName New

Idle

This tells Les that the ETL process isn't processing anything; it is waiting for something to do.

TRUXTON_STATUS_MESSAGE Value
FriendlyName Idle

Verbose On

This tells Les to turn verbose logging on.

TRUXTON_STATUS_MESSAGE Value
Stage 200
ETL_ID 56565656-5656-5656-5656-565656565656
Media 4E4E4E4E-4E4E-4E4E-4E4E-4E4E4E4E4E4E
FriendlyName Set Verbose On

Verbose Off

This tells Les to turn verbose logging off.

TRUXTON_STATUS_MESSAGE Value
Stage 200
ETL_ID 56565656-5656-5656-5656-565656565656
Media 66666666-6666-6666-6666-666666666666
FriendlyName Set Verbose Off
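A receiver could recognize the two verbose requests from the values in the tables above. The sketch below is one possible check, not Les's actual code. GUID is a stand-in for the Windows definition, and the helper names are hypothetical. Because every byte of these GUIDs is the same, the comparison does not depend on the GUID's in-memory byte order.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <iterator>

// Stand-in for the Windows GUID type (assumption: same 16-byte layout).
struct GUID { uint32_t Data1; uint16_t Data2; uint16_t Data3; uint8_t Data4[8]; };

struct TRUXTON_STATUS_MESSAGE
{
    GUID     ETL_ID;
    GUID     Media;
    uint64_t Sent;
    uint64_t Received;
    uint32_t Stage;
    uint32_t SentFromProcessID;
    char     FriendlyName[64];
};

// True when all 16 bytes of the GUID equal value.
inline bool guid_is_filled_with(const GUID & id, uint8_t value)
{
    uint8_t raw[sizeof(GUID)];
    std::memcpy(raw, &id, sizeof raw);
    return std::all_of(std::begin(raw), std::end(raw),
                       [value](uint8_t b) { return b == value; });
}

enum class VerboseRequest { None, On, Off };

// Matches Stage 200 with ETL_ID 5656... and Media 4E4E... (on) or 6666... (off).
inline VerboseRequest classify_verbose(const TRUXTON_STATUS_MESSAGE & message)
{
    if (message.Stage != 200 || !guid_is_filled_with(message.ETL_ID, 0x56))
    {
        return VerboseRequest::None;
    }
    if (guid_is_filled_with(message.Media, 0x4E)) { return VerboseRequest::On; }
    if (guid_is_filled_with(message.Media, 0x66)) { return VerboseRequest::Off; }
    return VerboseRequest::None;
}
```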

Event

The CALENDAR_EVENT structure is as follows:

struct CALENDAR_EVENT
{
  uint32_t       event_type_id;
  uint64_t       start;
  uint64_t       end;
  std::u16string title;
  std::u16string description;
};

Truxton Message Mapping

The structure members map to the Truxton Message structure as follows:

TRUXTON_MESSAGE CALENDAR_EVENT
signature event_type_id
offset start
length end
depot_filename title
depot_filename description

The remaining data items are:

TRUXTON_MESSAGE Data Item
file_id A GUID that corresponds to the [ID] column of the [File] table in the database. It holds the identifier of the file this event came from.
media_id A GUID that corresponds to the [ID] column of the [Media] table in the database. It holds the identifier of the media this event came from.
parent_id A GUID that corresponds to the [ID] column of the [Event] table in the database.

Tag

This message is generated when something is tagged in Truxton. Here's a sample program that uses this message.

Truxton Message Mapping

The tag message members map to the Truxton Message structure as follows:

TRUXTON_MESSAGE Data Item
file_type Type_Tag
file_id Tagged Item. A GUID that corresponds to the identifier of what was tagged.
media_id Media. A GUID that corresponds to the [ID] column of the [Media] table in the database. It holds the identifier of the media the item that was tagged came from.
parent_id Tag ID. A GUID that corresponds to the [ID] column of the [Tag] table in the database.
offset Tagged Item Type. The type of the item that was tagged.
signature Source. The source of the tag. 1 means it is an automatic tag (algorithm generated), 2 means a human tagged the item.
depot_filename Reason. Why the item was tagged.

Website Visit

The WEB_SITE_VISIT structure is as follows:

struct WEB_SITE_VISIT
{
  char const * account;
  char const * url;
  uint64_t     when;
  uint64_t     offset_of_url;
  uint64_t     offset_of_account;
  uint16_t     type;
  uint16_t     method;
  char         local_filename[256];
};

Truxton Message Mapping

The structure members map to the Truxton Message structure as follows:

TRUXTON_MESSAGE WEB_SITE_VISIT
types[0] type
signature method
offset offset_of_url
length when
depot_filename url

The remaining data items are:

TRUXTON_MESSAGE Data Item
file_type Type_Website_Visit
file_id The GUID of the file the website visit came from
media_id The GUID of the media the website visit came from
parent_id The GUID of the website visit. This corresponds to the [ID] column of the [WebsiteVisit] table in the database.

Note: Long URLs

If TRUXTON_MESSAGE.types[1] is set to 1, it means that the entire contents of WEB_SITE_VISIT.url could not be stored in TRUXTON_MESSAGE.depot_filename. When this happens, the receiver of the message should retrieve the full record from the [WebsiteVisit] table where the [ID] column is equal to the GUID in parent_id.
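The sender and receiver sides of this rule might be sketched as follows. MAX_PATH is a stand-in for the Windows constant, the function names are hypothetical, and reserving one byte for a terminating NUL is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

constexpr std::size_t MAX_PATH = 260; // stand-in for the Windows constant

// Sender side: copies url into depot_filename, truncating when it does not
// fit, and records the outcome in types[1]. Returns true when the whole URL
// was stored.
inline bool store_url(const std::string & url,
                      char (&depot_filename)[MAX_PATH],
                      uint16_t (&types)[16])
{
    const std::size_t capacity = MAX_PATH - 1; // room for the NUL (assumption)
    const std::size_t count = url.size() < capacity ? url.size() : capacity;
    std::memcpy(depot_filename, url.data(), count);
    depot_filename[count] = '\0';

    const bool complete = (count == url.size());
    types[1] = complete ? 0 : 1;
    return complete;
}

// Receiver side: true when the full URL must be read from the [WebsiteVisit]
// table using the GUID in parent_id.
inline bool must_fetch_full_url(const uint16_t (&types)[16])
{
    return types[1] == 1;
}
```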

Maintenance Messages

The Maintenance ETL performs long-running tasks. These tasks are started when particular messages are received.

Clean Database

This message will cause maintenance to clean the database.

TRUXTON_MESSAGE Data Item
file_type Type_Immediate_Maintenance
file_id 4D61696E-7461-696E-436C-65616E557021

Consolidate Depots

This message will cause maintenance to combine smaller depots into larger ones.

TRUXTON_MESSAGE Data Item
file_type Type_Immediate_Maintenance
file_id 4D61696E-7461-696E-436F-6E734465706F

Delete Depots

This message will cause maintenance to go through the datadir and depotdir and delete any files that have an extension of .ToBeDeleted.

TRUXTON_MESSAGE Data Item
file_type Type_Immediate_Maintenance
file_id 4D61696E-7461-696E-4465-6C4465706F74

Delete Media

This message will cause maintenance to delete the metadata for the given media.

TRUXTON_MESSAGE Data Item
file_type Type_Immediate_Maintenance
file_id 4D61696E-7461-696E-4465-6C4D65646961
media_id The GUID of the media to delete

Import Content from TPIF

This message will cause maintenance to import file content data from the given TPIF.

TRUXTON_MESSAGE Data Item
file_type Type_Immediate_Maintenance
file_id 4D61696E-7461-696E-496D-70436E746E74
depot_filename The path to the TPIF file
parent_id When length is 1, this is an investigation id, a length of 2 means this is a media id
length 1 - parent_id is an investigation id, 2 - parent_id is a media id
offset 0 means do not reindex the media upon completion, 1 means reindex the media

Optimize Database

This message will cause maintenance to optimize the database.

TRUXTON_MESSAGE Data Item
file_type Type_Immediate_Maintenance
file_id 4D61696E-7461-696E-4F70-74696D697A65

Reindex Media

This message will cause maintenance to content-index the given media.

TRUXTON_MESSAGE Data Item
file_type Type_Immediate_Maintenance
file_id 4D61696E-7461-696E-5265-696E64657821
media_id The GUID of the media to index

Set Media Primary Photo

This message will cause maintenance to add a file to Truxton and make it the primary photograph of the given media.

TRUXTON_MESSAGE Data Item
file_type Type_Immediate_Maintenance
file_id 4D61696E-7461-696E-4D65-6450686F746F
media_id The GUID of the media to gain the photo
depot_filename The path to the image file
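The file_id values used by the maintenance messages above appear to be ASCII text written in hexadecimal. The small sketch below decodes the textual form of such a GUID, which can help when reading a queue table by eye. The function name is hypothetical and the helper is not part of Truxton.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Strips the hyphens from a GUID's text form and converts each pair of
// hexadecimal digits to one character.
inline std::string guid_text_to_ascii(const std::string & guid_text)
{
    std::string hex;
    for (char c : guid_text)
    {
        if (std::isxdigit(static_cast<unsigned char>(c)))
        {
            hex.push_back(c);
        }
    }

    std::string ascii;
    for (std::size_t i = 0; i + 1 < hex.size(); i += 2)
    {
        ascii.push_back(static_cast<char>(std::stoi(hex.substr(i, 2), nullptr, 16)));
    }
    return ascii;
}
```

For example, 4D61696E-7461-696E-436C-65616E557021 decodes to "MaintainCleanUp!" and 4D61696E-7461-696E-4F70-74696D697A65 decodes to "MaintainOptimize".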