ETL Stages
Explanation
ETL stages are key to solving the problem of process coordination. The Truxton exploitation process is broken down into a series of steps, and each step is performed in a stage. When all of the stages are complete, Truxton has finished processing that media.
By associating a stage number (1-255) with each exploitation process, Truxton can manage the transition from chaotic to ordered processing.
ETL Communications
The ETL processes communicate using a message bus. The messages contain information about a file needing processing and how to get the file contents.
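The exact message format is not described here. As a minimal sketch, assuming a JSON body and hypothetical field names (`media_id`, `file_id`, `depot_path`, `offset`, `length`), a file message might carry the file's identity plus enough location data to read its contents back out of a depot file:

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class FileMessage:
    """Hypothetical message body describing one file that needs processing."""
    media_id: str    # the media (load) the file belongs to
    file_id: str     # the file's identifier in the database
    depot_path: str  # depot file holding the contents (assumed field)
    offset: int      # byte offset of the contents inside the depot file
    length: int      # number of bytes to read

    def to_json(self) -> str:
        """Serialize the message for the bus."""
        return json.dumps(asdict(self))

    @staticmethod
    def from_json(text: str) -> "FileMessage":
        """Rebuild a message received from the bus."""
        return FileMessage(**json.loads(text))
```

A receiving ETL would call `FileMessage.from_json` on the bus payload and use `depot_path`, `offset` and `length` to fetch the bytes.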
Simple Exploitation Walkthrough
The first stage is Load. It performs the following tasks:
- Navigates the source media (disk image, folder or file)
- Puts file contents into depot files
- Puts metadata into the database
- Identifies the file's contents
- Routes each file, based on its type, to an exploitation process by putting a message into that ETL's message queue
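The routing step can be sketched as a lookup from file type to queue name. The queue names below come from the ETL table on this page; the file-type labels, the `ROUTES` table and the in-memory `queues` stand-in for the message bus are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical file-type labels mapped to queue names from the ETL table.
ROUTES: Dict[str, str] = {
    "zip": "archives",
    "pst": "pst",
    "registry-hive": "registry",
    "jpeg": "thumbnail",
}

# In-memory stand-in for the message bus: queue name -> pending messages.
queues: Dict[str, List[dict]] = defaultdict(list)


def route(file_type: str, message: dict) -> Optional[str]:
    """Put the message on the queue for this file type; return the queue used."""
    queue_name = ROUTES.get(file_type)
    if queue_name is None:
        return None  # no exploitation ETL is registered for this type
    queues[queue_name].append(message)
    return queue_name
```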
Each ETL process will:
- Wait until a message arrives for it to process
- Retrieve the contents of the file referenced in the message
- Report status to the Load Status Monitor (Les)
- Exploit those contents to produce more files, entities, chat messages, etc.
- Send any files it produces on to other ETLs as needed
When all ETLs have finished, the load is complete.
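The per-ETL loop described above might look like the following sketch. Everything here is an assumption for illustration: a `queue.Queue` stands in for the message bus, the callables stand in for depot access, exploitation, status reporting to Les and forwarding, and a `None` message is used as a shutdown signal.

```python
import queue
from typing import Callable, List, Optional


def run_etl(
    inbox: "queue.Queue[Optional[dict]]",
    read_contents: Callable[[dict], bytes],
    exploit: Callable[[bytes], List[dict]],
    report_status: Callable[[str, str], None],
    forward: Callable[[dict], None],
) -> None:
    """Process file messages until a None sentinel arrives."""
    while True:
        message = inbox.get()  # blocks until a message arrives
        if message is None:
            break  # hypothetical shutdown signal
        report_status(message["file_id"], "started")  # tell Les work has begun
        contents = read_contents(message)  # fetch the referenced file's bytes
        for child in exploit(contents):
            forward(child)  # produced files may be sent on to another ETL
        report_status(message["file_id"], "finished")
```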
Stages and Status Monitoring
To process all of the data, we must begin with utter chaos and transition to ordered steps that must be completed linearly. Some ETLs thrive in the chaos, some cannot, and some live in both worlds. Stage values are grouped into ranges, and each ETL is assigned a value within one of them. The rule is: if two ETL processes have the same stage value, they can operate in parallel; if one ETL has a higher stage value than another, it executes after that other ETL. This rule is ignored in the Chaos region, becomes relevant in the Semi-Chaotic region, and becomes law in the Linear region.
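Read literally, the stage rule groups ETLs into waves: each wave holds the ETLs that share a stage value, and waves run in ascending order. This sketch applies the rule as it holds in the Linear region (in the Chaos region the order is not enforced). The `execution_waves` helper is hypothetical; the example stage values come from the ETL table on this page, and stage 255 is skipped because it means "do not track."

```python
from itertools import groupby
from typing import Dict, List

DO_NOT_TRACK = 255


def execution_waves(stages: Dict[str, int]) -> List[List[str]]:
    """Group tracked ETLs by stage: same stage runs in parallel, higher runs later."""
    tracked = sorted(
        (stage, name) for name, stage in stages.items() if stage != DO_NOT_TRACK
    )
    # groupby needs sorted input; each group is one parallel wave.
    return [[name for _, name in group] for _, group in groupby(tracked, key=lambda p: p[0])]
```

For example, LangID (18) and PST (18) would share a wave, followed by Poly (32), Stitch (40), Alerts (128) and Reports (160), with SOLRFile (255) left out.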
It is the job of the Load Status Monitor (Les) to watch all of the ETLs and advance the media through the stages of exploitation.
Chaos
The chaotic stages are where files are produced and/or atomically exploited. If a file is stand-alone, not requiring any other files to exploit it, it is considered to be "atomic." An ETL for atomic files is the easiest kind to write.
One example of a chaotic file is a zip file. An ETL process that unzips the file to produce child files is atomic because all it needs to do its job is the contents of that one zip file. ETL processes run in parallel and operate on different media simultaneously (participating in different "loads"). Chaotic stages produce files in random order.
These stages can be thought of as executing in non-linear time. Child files can be processed before their parents. Files are processed in random order.
In the above illustration, loading and expanding are the chaos stages. They can produce any number of files in any order.
Semi-Chaotic
When the chaotic ETLs have finished producing files for a piece of media, the next stage of exploitation can begin. Semi-chaotic ETLs also produce files, but in a more orderly fashion and only after the chaos is finished. When they do produce files, processing falls back into utter chaos.
The exploitation process loops between the Chaotic and Semi-Chaotic stages until no Semi-Chaotic ETL produces any files; then linear processing can take place.
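The loop between the Chaotic and Semi-Chaotic stages can be sketched as follows. It assumes the semi-chaotic ETLs are passed in stage order, that each returns the new files it produced, and that the first one to produce files sends processing back into chaos before any later ones run; `run_until_settled` and its parameters are hypothetical.

```python
from typing import Callable, List


def run_until_settled(
    run_chaos: Callable[[List[str]], List[str]],
    semi_chaotic: List[Callable[[], List[str]]],
    initial_files: List[str],
) -> int:
    """Alternate Chaos and Semi-Chaotic passes until no new files appear.

    Returns how many times the chaos was allowed to settle.
    """
    pending = list(initial_files)
    settled = 0
    while True:
        # Chaos: keep expanding until no child files remain.
        while pending:
            pending = run_chaos(pending)
        settled += 1
        # Semi-chaotic ETLs run in stage order, after the chaos has subsided.
        for etl in semi_chaotic:
            pending = etl()
            if pending:
                break  # new files: fall back into chaos
        else:
            return settled  # no new files: linear processing may begin
```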
One example of a semi-chaotic file is a spanned zip file, a zip archive that spans several files. It cannot reliably be exploited in the Chaotic stages because some of the files in the span might not yet exist in Truxton. By waiting until the chaos has subsided, we know that all of the files in the span will be present. Truxton notifies the Poly File Expander when the chaos is complete. Poly will then:
- Search the current media for any files of a type that requires more than one file to exploit
- Exploit each such file, querying for the other files it needs along the way
- Produce child files, which cause the chaos stage to reignite
For the spanned zip, Poly will find all pieces of the span, combine them, and then expand the archive. Because Poly restarts the chaos, it keeps track of which files it has processed; when it is told again to process a piece of media, it can ignore files it has already processed.
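Poly's bookkeeping amounts to a per-media set of processed files. A minimal sketch, with a hypothetical `PolyTracker` class and file names used as identifiers:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


class PolyTracker:
    """Remembers which files have been processed for each media (hypothetical)."""

    def __init__(self) -> None:
        self._done: Dict[str, Set[str]] = defaultdict(set)

    def select_new(self, media_id: str, candidates: Iterable[str]) -> List[str]:
        """Return candidates not yet processed for this media and mark them processed."""
        done = self._done[media_id]
        new: List[str] = []
        for file_id in candidates:
            if file_id not in done:
                done.add(file_id)  # also filters duplicates within this call
                new.append(file_id)
        return new
```

When the chaos restarts and Poly is notified again, only the span pieces it has not seen before are returned.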
Semi-chaotic ETLs operate in non-linear time but after chaos. Once they are complete, processing enters the linear stage.
Note: Stage Value Behavior
Stitching (fragmented file carving) is another semi-chaotic ETL. Poly is stage 32 while Stitch is stage 40, so Stitch will not start until Poly is completely finished. If Stitch were stage 32, it would be told to execute at the same time as Poly and would miss any data produced by poly file expansion.
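The Poly/Stitch behavior follows from a simple gate: an ETL may start on a piece of media only after every tracked ETL with a lower stage value has finished it, while ETLs with the same value do not block each other. A sketch, assuming Les keeps a per-media `finished` set (the `may_start` name is hypothetical):

```python
from typing import Dict, Set

DO_NOT_TRACK = 255


def may_start(etl: str, stages: Dict[str, int], finished: Set[str]) -> bool:
    """True when every tracked ETL with a lower stage has finished this media."""
    my_stage = stages[etl]
    return all(
        name in finished
        for name, stage in stages.items()
        if stage < my_stage and stage != DO_NOT_TRACK  # same stage never blocks
    )
```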
Linear
Linear stages are where some sort of sanity is brought to the process. ETLs in this stage execute one after another. For example, Alerts (stage 128) is executed before Reports (stage 160) because any alerts that were generated need to be included in reports. If two different linear ETLs have the same stage number, they will execute in parallel.
While earlier stages are mainly concerned with producing files and artifacts, Linear and later stages are concerned with media processing. Early-stage ETLs receive messages routed by file type; from this stage forward, every ETL is notified of media to process instead of a file to process.
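The change in notification granularity can be sketched as below. The cut-over value of 64 is an assumption, taken as the first value after the Semi-Chaotic range (32-63) in the Stage Ranges table; the message fields and the `build_messages` helper are hypothetical.

```python
from typing import Dict, List

LINEAR_START = 64  # assumption: first stage value after the Semi-Chaotic range


def build_messages(stage: int, media_id: str, file_ids: List[str]) -> List[Dict[str, str]]:
    """Early stages get one message per file; Linear and later get one per media."""
    if stage < LINEAR_START:
        return [{"media_id": media_id, "file_id": f} for f in file_ids]
    return [{"media_id": media_id}]
```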
Untracked
Some ETLs operate in a way that does not affect the status of the media being processed. Examples include:
- SOLRFile - Indexes contents and metadata into a Solr cluster for full-text searching.
- Forensic Logging - Log messages are sent to various destinations, such as Azure Log Analytics.
- SendGrid Notifier - When a load completes, the Media Summary report is emailed to a distribution list.
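Untracked ETLs are excluded when deciding whether a load is complete. A minimal sketch of that check, assuming Les knows each ETL's stage value and which ETLs have finished the media (the `load_complete` helper is hypothetical):

```python
from typing import Dict, Set

DO_NOT_TRACK = 255


def load_complete(stages: Dict[str, int], finished: Set[str]) -> bool:
    """A load is complete when every tracked ETL has finished; stage 255 is ignored."""
    return all(
        name in finished
        for name, stage in stages.items()
        if stage != DO_NOT_TRACK
    )
```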
Stage Ranges
Here are the values for all of the stages:
Name | Stage Values (Inclusive) | Meaning |
---|---|---|
Load/Expand | 1-31 | Chaos. ETLs in this stage range can produce files and artifacts in random order. |
Poly File Expansion | 32-63 | Semi-Chaotic. ETLs in this stage range can produce files and artifacts in random order but only after previous stage ranges have completed. |
Summarizing | 64-96 | All files and artifacts (entities) have been produced. ETLs in this range query the data to produce summaries such as unique lists of artifacts. |
Alerting | 128-159 | ETLs in this range query the data to generate any alerts analysts have requested. |
Reporting | 160-191 | ETLs in this range query the data produced by any previous stage and create reports from it. |
Feeding | 192-223 | ETLs in this range query the data produced by any previous stage and feed it to external systems. |
Finished | 240-254 | ETLs perform any final tasks needed to make the media ready for the analyst. No further processing will take place. |
DoNotTrack | 255 | This ETL should not be considered when determining the status of media |
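The table can be turned into a lookup from a stage value to its range name. This sketch copies the bounds exactly as listed; because the table lists no range for 97-127 or 224-239, `stage_range` (a hypothetical helper) returns `None` for those values.

```python
from typing import List, Optional, Tuple

# (low, high, name) copied from the Stage Ranges table; bounds are inclusive.
STAGE_RANGES: List[Tuple[int, int, str]] = [
    (1, 31, "Load/Expand"),
    (32, 63, "Poly File Expansion"),
    (64, 96, "Summarizing"),
    (128, 159, "Alerting"),
    (160, 191, "Reporting"),
    (192, 223, "Feeding"),
    (240, 254, "Finished"),
    (255, 255, "DoNotTrack"),
]


def stage_range(stage: int) -> Optional[str]:
    """Return the range name for a stage value, or None if no range covers it."""
    for low, high, name in STAGE_RANGES:
        if low <= stage <= high:
            return name
    return None
```

For example, Stitch (stage 40) falls in "Poly File Expansion" and Finished (stage 240) in "Finished".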
ETLs and Their Stages
Here's a list of ETLs, their stages and message bus queue names. Remember, stage 255 means "do not track."
ETL | Executable | Stage | Percent Complete | Queue Name |
---|---|---|---|---|
Load | Load.exe | 1 | 48% | loadq |
Truxton Alert Generator | Alert.exe | 128 | 75% | alert |
Truxton Archive Expander | Archives.exe | 6 | 48% | archives |
Truxton Azure Image Analyzer | Azure.AnalyzeImage.exe | 9 | 48% | azureanalyzeimage |
Truxton Azure OCR | Azure.OCR.exe | 9 | 48% | azureocr |
Truxton Carve | Carve.exe | 4 | 48% | carve |
Truxton Contact Sheet Creator | ContactSheet.exe | 6 | 48% | contactsheet |
Load as an ETL | Load.exe | 11 | 48% | loadq |
Truxton Email | EMail.exe | 12 | 48% | |
Truxton Expand | Expand.exe | 3 | 48% | expand |
Truxton Finished Loads Monitor | Finished.exe | 240 | 94-100% | finished |
Truxton Forensic Finding Logger | ForensicLogger.exe | 255 | N/A | flogger |
Truxton Identify | Identify.exe | 2 | 48% | identify |
Truxton Language Identifier | LangID.exe | 18 | 48% | langid |
Maintenance | Maintenance.exe | 255 | 100% | Maintenance |
Truxton Notifier | Notify.exe | 255 | 100% | Notify |
Truxton Poly File Coordinator | Poly.exe | 32 | 60% | poly |
Truxton PST Processor | PST.exe | 18 | 48% | pst |
Truxton Registry Expander | Registry.exe | 9 | 48% | registry |
RegRipper | RegRipper.exe | 8 | 48% | regripper |
Truxton Remote Expand | RemoteFileExpander.exe | 3 | 48% | remoteexpand |
Report | Report.exe | 160 | 80% | report |
Truxton SOLR Contents Indexer | SOLR.exe | 192 | 90% | solrcontentstage |
Truxton SOLR File Indexer | SOLRFile.exe | 255 | N/A | solrfile |
Truxton File Stitcher | Stitch.exe | 40 | 62% | stitch |
Truxton Text Extractor | TextExtract.exe | 15 | 48% | tqueue |
Truxton Thumbnail Generator | Thumbnail.exe | 8 | 48% | thumbnail |
Truxton Yara Scanner | Yara.exe | 7 | 48% | yara |
ETLs sorted by Stage:
ETL | Executable | Stage | Percent Complete | Queue Name |
---|---|---|---|---|
Load | Load.exe | 1 | 48% | loadq |
Truxton Identify | Identify.exe | 2 | 48% | identify |
Truxton Expand | Expand.exe | 3 | 48% | expand |
Truxton Carve | Carve.exe | 4 | 48% | carve |
Truxton Archive Expander | Archives.exe | 6 | 48% | archives |
Truxton Yara Scanner | Yara.exe | 7 | 48% | yara |
Truxton Thumbnail Generator | Thumbnail.exe | 8 | 48% | thumbnail |
Truxton Registry Expander | Registry.exe | 9 | 48% | registry |
Truxton Remote Expand | RemoteFileExpander.exe | 10 | 48% | remoteexpand |
Load as an ETL | Load.exe | 11 | 48% | loadq |
Truxton Email | EMail.exe | 12 | 48% | |
RegRipper | RegRipper.exe | 13 | 48% | regripper |
Truxton Contact Sheet Creator | ContactSheet.exe | 14 | 48% | contactsheet |
Truxton Text Extractor | TextExtract.exe | 15 | 48% | tqueue |
Truxton Azure OCR | Azure.OCR.exe | 16 | 48% | azureocr |
Truxton Azure Image Analyzer | Azure.AnalyzeImage.exe | 17 | 48% | azureanalyzeimage |
Truxton Language Identifier | LangID.exe | 18 | 48% | langid |
Truxton PST Processor | PST.exe | 18 | 48% | pst |
Truxton Poly File Coordinator | Poly.exe | 32 | 60% | poly |
Truxton File Stitcher | Stitch.exe | 40 | 62% | stitch |
Truxton Alert Generator | Alert.exe | 128 | 75% | alert |
Report | Report.exe | 160 | 80% | report |
Truxton SOLR Contents Indexer | SOLR.exe | 192 | 90% | solrcontentstage |
Truxton Finished Loads Monitor | Finished.exe | 240 | 94-100% | finished |
Truxton Notifier | Notify.exe | 255 | 100% | Notify |
Truxton Forensic Finding Logger | ForensicLogger.exe | 255 | N/A | flogger |
Truxton SOLR File Indexer | SOLRFile.exe | 255 | N/A | solrfile |
Maintenance | Maintenance.exe | 255 | 100% | Maintenance |