C Sample Identification ETL
This sample shows the steps needed to implement a byte identifier ETL in C. You can see this same sample implemented in Python. After identifying the file type, you will next need to exploit it. Add your ETL to the Truxton Service for it to automatically start with the other ETL processes.
Sample File Format
This sample will identify a fake file format we call Acme. Acme Corporation is a known supplier of nefarious devices and explosives. Their file format begins with a five byte magic value followed by eleven bytes in a data structure.
0000h: 88 77 66 55 00 11 22 33 44 55 66 77 88 99 AA BB
0010h: CC
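The layout described above can be expressed as a small in-memory check. This is only a sketch: the names ACME_MAGIC, ACME_MIN_LENGTH and acme_looks_valid() are hypothetical and are not part of the Truxton SDK. The 16 byte minimum is an assumption taken from the five byte magic value plus the eleven byte data structure.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical constants for this sample; not part of the Truxton SDK. */
static const uint8_t ACME_MAGIC[5] = { 0x88, 0x77, 0x66, 0x55, 0x00 };
#define ACME_MIN_LENGTH 16 /* assumed: 5 byte magic + 11 byte data structure */

/* Returns 1 when the buffer is long enough and starts with the magic value. */
static int acme_looks_valid(const uint8_t *data, size_t length)
{
    if (data == NULL || length < ACME_MIN_LENGTH)
    {
        return 0;
    }

    return memcmp(data, ACME_MAGIC, sizeof(ACME_MAGIC)) == 0;
}
```

Passing the seventeen bytes from the dump above should make acme_looks_valid() return 1, while a buffer whose fifth byte is not 00 should return 0.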
Visual Studio Configuration
To create a file content identification ETL program, follow these steps:
- Start Visual Studio
- File->New->Project
- Empty Project - C++
- Project name: IdentifyFile
- Press "Create" button
- Remove the x86 configuration
- Right-click the IdentifyFile project in the Solution Explorer window
- Add->New Item...->C++ File->Add button
- Right-click the IdentifyFile project in the Solution Explorer window
- Select Properties
- C/C++->Additional Include Directories: add "C:\Program Files\Truxton\SDK"
- Linker->Additional Library Directories: add "C:\Truxton" (or wherever you generated the TruxtonCAPI.lib file)
Source Code
1 #include <stdio.h>
2 #include <memory.h>
3 #include <inttypes.h>
4 #include <TruxtonCAPI.h>
5 #include <TruxtonFileTypes.h>
6 #include <TruxtonDefines.h>
7 #pragma comment (lib, "TruxtonCAPI.lib")
8
9 int main(void)
10 {
11     uint8_t buffer[10];
12
13     uint64_t message = 0;
14
15     uint64_t etl_application = truxton_etl_create();
16
17     truxton_etl_set_application_name(etl_application, "Acme Identifier");
18     truxton_etl_set_description(etl_application, "This ETL identifies files using the Acme method");
19     truxton_etl_set_queue_name(etl_application, "acme");
20     truxton_etl_set_stage_number(etl_application, 2);
21
22     truxton_etl_add_desired_file_type(etl_application, Type_Unknown);
23
24     message = truxton_etl_get_message(etl_application);
25
26     while (message != 0)
27     {
28         if (truxton_message_get_depot_length(message) >= 16 &&
29             truxton_message_get_signature(message) == 0x88776655)
30         {
31             uint64_t file_in_truxton = truxton_message_get_file(message);
32
33             if (file_in_truxton != 0)
34             {
35                 truxton_file_seek(file_in_truxton, 4, SEEK_SET);
36
37                 if (truxton_file_read(file_in_truxton, buffer, 1) == 1)
38                 {
39                     if (buffer[0] == 0x00)
40                     {
41                         truxton_file_change_type(file_in_truxton, 11111);
42                         truxton_message_set_file_type(message, 11111);
43                         truxton_route_message(truxton_file_get_truxton(file_in_truxton), message);
44                     }
45                 }
46
47                 truxton_file_free(file_in_truxton);
48             }
49         }
50
51         truxton_message_destroy(message);
52
53         message = truxton_etl_get_message(etl_application);
54     }
55
56     truxton_etl_destroy(etl_application);
57     return(0);
58 }
Code Walkthrough
The above code shows how to create an ETL process, register for a particular type of file, receive messages from a message queue, read the contents of a file in Truxton, and send a message to other ETL processes.
Lines 15-22 set up the ETL.
The message queue name will be "acme". We run at an early stage and want to receive Type_Unknown files.
Line 24 starts the ETL logic and waits until a message arrives on the "acme" queue.
Line 28 looks at the data in the message to see if it is even possible for the file we have received to be an Acme file.
The signature member contains the first four bytes of the file.
Acme's format has a five byte signature which means we will have to read bytes from the file in order to perform a valid check.
Opening a file is rather expensive and we want identification to be as fast as possible.
By using signature to check the first four bytes, we can avoid unnecessarily incurring a performance hit by reading from a file we know can't possibly be Acme.
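The cheap pre-check can be illustrated without the Truxton SDK. The sample compares the signature against 0x88776655 for a file that starts with 88 77 66 55, which suggests that the first byte of the file ends up as the most significant byte. The sketch below assumes exactly that; signature_from_bytes() and acme_is_possible() are hypothetical helpers, not SDK functions.

```c
#include <stdint.h>

/* Hypothetical helper: packs four bytes with the first byte most significant.
   This ordering is an assumption inferred from the 0x88776655 comparison. */
static uint32_t signature_from_bytes(const uint8_t bytes[4])
{
    return ((uint32_t)bytes[0] << 24) |
           ((uint32_t)bytes[1] << 16) |
           ((uint32_t)bytes[2] << 8)  |
            (uint32_t)bytes[3];
}

/* Hypothetical helper: mirrors the test on lines 28-29 of the sample.
   It uses only values carried in the message, so no file is opened. */
static int acme_is_possible(uint64_t depot_length, uint32_t signature)
{
    return depot_length >= 16 && signature == 0x88776655u;
}
```

Only files that pass this check go on to the more expensive step of opening the file and reading the fifth byte.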
Line 31 gives you a read-only file object so you can read from it.
Lines 35-39 seek to the fifth byte in the file, read it, and check it for validity.
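The same seek-then-read pattern can be tried outside Truxton with standard C file I/O. This is a sketch only: it uses FILE, fseek() and fread() as stand-ins for the Truxton file calls, and fifth_byte_is_zero() is a hypothetical name.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: standard C I/O stands in for the Truxton file calls.
   Returns 1 when the byte at offset 4 exists and equals 0x00. */
static int fifth_byte_is_zero(FILE *file)
{
    uint8_t byte = 0xFF;

    /* Offset 4 is the fifth byte because offsets start at zero. */
    if (file == NULL || fseek(file, 4L, SEEK_SET) != 0)
    {
        return 0;
    }

    /* A short read means the file is too small to be Acme. */
    if (fread(&byte, 1, 1, file) != 1)
    {
        return 0;
    }

    return byte == 0x00;
}
```

As in the sample, a failed or short read is treated as "not Acme" rather than as an error.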
Line 41 changes the file type to our identifier for Acme. We talked about file type identifiers in a previous article.
Lines 42-43 begin the process of sending this newly identified file to any ETL process that has registered for it.
The first step is to overwrite the filetype, which should contain Type_Unknown, with the identifier for our file type.
The last step is to call truxton_route_message(), which tells Truxton to send this message to any ETLs that want it.
Line 53 completes the loop by getting another message from the acme message queue.