What is Truxton?
TL;DR: Truxton is an extensible forensics system I can modify to do what I need, with the amount of data I have, using all of the computers at my disposal.
Description
Truxton is a digital forensics processing system designed to handle the variety, volume and velocity of data in modern investigations. Applications built on this framework include an analyst desktop GUI, data exploitation programs, report generators, data exporters and whatever you wish to add. You can extend this framework to recognize and exploit proprietary data that cannot be released to the forensics community.
Truxton is an open-ended system. It is a collection of exploitation pieces that you can arrange to solve whatever problem you have. Let's take a look at some of those problems.
Principles
Truxton was created with a core set of values.
Start Analysis with Information not Data
There's too much data, too much for a human to go through. Instead of making a system that presents files to a user, we created a data exploitation framework where the important artifacts are gathered into one place and used to generate reports. Let investigations begin with these reports.
Be Open
Nothing in Truxton should be hidden from the user. The way information is stored and retrieved should be easy to understand. Customers should be able to get to their data in Truxton without requiring Truxton. Let customers extend Truxton to handle their own proprietary data types, artifact types, and event types, and use their own proprietary exploitation techniques.
Do as Much as Possible
Automate the extraction of information. Perform this automation by default. Do everything we know how to do to help the investigator. Have the user turn off processing they don't need rather than have them remember to turn on processing they want.
Data
Variety
There are many different types of files, many types of forensic artifacts, many ways to correlate data, etc. Truxton handles this variety of raw data and exploitation techniques by separating the identification of the data from the processing of the data.
Truxton enumerates the files in the given media and identifies them. Information about each file is stored in a database. Exploitation of the file is based upon its type. The file can be exploited by different programs for different purposes. When a ZIP file is found, one process can decompress it to produce more files while another scans it for malware. Exploiting a photograph may produce serial numbers and geographic coordinates. These artifacts are stored in the database so they can be presented to the user and form the basis for cross-case correlation.
Because no single company, no matter how good we are, can keep up with the ever-changing world of forensics, we have made Truxton extensible. You can add your own file types, your own exploitation algorithms, and output your own reports.
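To make the idea concrete, here is a minimal Python sketch of what separating identification from exploitation can look like. The registry, the decorator, and the FileRecord fields are hypothetical illustrations, not Truxton's actual extension API.

```python
# A minimal sketch of type-based dispatch; not Truxton's real extension API.
from dataclasses import dataclass
from typing import Callable

@dataclass
class FileRecord:
    file_id: str
    file_type: str      # e.g. "zip", "jpeg" (assigned during identification)
    depot_path: str     # where the content lives (hypothetical field)

# Multiple handlers may register for the same type; each serves a different purpose.
HANDLERS: dict[str, list[Callable[[FileRecord], None]]] = {}

def register(file_type: str):
    def wrap(func):
        HANDLERS.setdefault(file_type, []).append(func)
        return func
    return wrap

@register("zip")
def decompress(record: FileRecord) -> None:
    print(f"decompressing {record.file_id}; children become new FileRecords")

@register("zip")
def scan_for_malware(record: FileRecord) -> None:
    print(f"scanning {record.file_id} for malware")

def exploit(record: FileRecord) -> None:
    # Identification already happened; here we only dispatch on the stored type.
    for handler in HANDLERS.get(record.file_type, []):
        handler(record)

exploit(FileRecord("0001", "zip", "/depots/case42.depot"))
```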
Volume
The amount of data to be processed is ever increasing, and so is the number of things that must be done to it. Truxton was designed to scale across the three problem areas of "big data": processing, database (lots of things), and storage (lots of data).
Processing
There are two choke points when it comes to processing large amounts of data: CPU and IO. CPU-bound problems require lots of calculations to complete. IO-bound problems occur when you reach the limit of how much data can move through the computer. Truxton uses a message queue architecture to balance these two problems. An exploitation process reads a unique message from the queue and operates on it. Scalable processing is achieved by running more copies of that process. If one machine gets overwhelmed, add another machine running more processes. In general, if the problem is CPU-bound, add more processes until you reach 100% CPU usage. If the problem is IO-bound, add more machines.
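As an illustration of that worker pattern, here is a self-contained Python sketch. A real deployment would read from a network message broker shared by many machines; a multiprocessing queue stands in here so the example runs on its own, and the message contents are made up.

```python
# A sketch of "read a unique message, exploit it, repeat"; scaling is simply
# running more copies of the same process. Not Truxton's actual loader code.
import multiprocessing as mp

def worker(queue) -> None:
    while True:
        message = queue.get()          # each message is consumed exactly once
        if message is None:            # sentinel: no more work
            break
        print(f"exploiting {message}")

if __name__ == "__main__":
    queue = mp.Queue()
    for item in ["image_001.dd", "phone_dump.zip", "laptop.e01"]:
        queue.put(item)

    workers = [mp.Process(target=worker, args=(queue,)) for _ in range(4)]
    for _ in workers:
        queue.put(None)                # one sentinel per worker
    for p in workers:
        p.start()
    for p in workers:
        p.join()
```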
Database
Digital investigations produce a host of information: files to manage, data provenance, investigator actions, artifacts, reports, and more. Exploiting data produces a tidal wave of things that must be stored and easily found. Databases are great for this type of problem. Truxton uses an SQL database to store everything produced in an investigation. We expect advanced users, whose needs we cannot predict, to query this database directly, so we have made it as human-readable as possible. Sample queries are provided to assist them.
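As an example of the kind of direct query an advanced user might run, here is a sketch using an in-memory SQLite database. The artifact table and its columns are hypothetical, chosen only for illustration; they are not Truxton's real schema.

```python
# A hedged illustration of querying the investigation database directly.
# The schema below is made up; SQLite stands in for the production SQL server.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE artifact (
        artifact_id   INTEGER PRIMARY KEY,
        media_name    TEXT,
        artifact_type TEXT,
        value         TEXT
    )""")
con.executemany(
    "INSERT INTO artifact (media_name, artifact_type, value) VALUES (?, ?, ?)",
    [("laptop.e01", "email_address", "alice@example.com"),
     ("phone.zip",  "email_address", "bob@example.com"),
     ("phone.zip",  "gps_coordinate", "38.8895,-77.0353")])

# The kind of ad hoc question an analyst might ask without any Truxton tooling.
for row in con.execute(
        "SELECT media_name, value FROM artifact WHERE artifact_type = ?",
        ("email_address",)):
    print(row)
```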
When it comes time to share with other organizations, send them your data, not the original media. Truxton stores information in the database in a way that allows it to be safely shared with other installations. If you have media that contains information of interest to another organization, you can export it from your Truxton and import it into their Truxton without fear of overwriting any of their data. This design also helps when too many users are querying the database at the same time. An overloaded database server can be replicated to another server and the queries split between the two.
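The paragraph above does not spell out the mechanism. One common way to make exported rows safe to merge, sketched here purely as an assumption, is to key every row with a globally unique identifier so data from one installation can never collide with another's.

```python
# Assumption for illustration only: rows keyed by GUIDs can be exported from
# one installation and imported into another without overwriting anything.
import uuid

def new_artifact_row(media_name: str, artifact_type: str, value: str) -> dict:
    return {
        "artifact_id": str(uuid.uuid4()),   # unique across every installation
        "media_name": media_name,
        "artifact_type": artifact_type,
        "value": value,
    }

row = new_artifact_row("phone.zip", "email_address", "alice@example.com")
print(row["artifact_id"])   # e.g. '3f0c2b1a-...', safe to import elsewhere
```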
Storage
Truxton stores the files extracted from the source media in large files called data depots. All of the files from a hard drive will be stored in a single depot file. This reduces the burden on the operating system of managing millions of files: it manages a few large files instead of lots of small ones. When you store millions of files in a directory structure, the operating system begins to suffer from having to perform database-like tasks. Given 10 million filenames, databases are good at searching for the right one while file systems are not. The database has a table that tells you which depot file contains the desired file contents.
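Here is a minimal sketch of the depot idea in Python. The record layout (depot path, byte offset, length) is an assumption made for illustration; the real table may differ.

```python
# Many small files packed into one large container file, with the database
# recording where each one lives. The fields below are hypothetical.
from dataclasses import dataclass

@dataclass
class DepotEntry:
    depot_path: str   # the large container file on disk
    offset: int       # where this file's bytes begin
    length: int       # how many bytes belong to it

def read_from_depot(entry: DepotEntry) -> bytes:
    # The operating system manages one big file; the bookkeeping is ours.
    with open(entry.depot_path, "rb") as depot:
        depot.seek(entry.offset)
        return depot.read(entry.length)

# Usage: the database row for a file supplies the entry, e.g.
# contents = read_from_depot(DepotEntry("/depots/case42.depot", 1048576, 4096))
```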
Things that a user wants to find are stored in the database. Things the user wants to see are stored in a depot.
Velocity
Truxton was designed to utilize multiple machines exploiting multiple pieces of media simultaneously. There is never a need for an investigator to wait until processing is complete before beginning their analysis. The exploitation process runs in the background without human intervention, 24 hours a day, on weekends; it never tires. You can dynamically tune the process to get the most performance for your type of data.
Truxton will accelerate to the speed of your machines, from a laptop to a rack of servers. You choose which is best for you.
Roles
Lightly Trained Users on a Laptop
Organizations can't have fully trained staff for everything, but they can teach people to perform tasks. This typically happens in small offices where you have people who just want to see forensic reports and not be confused by a bunch of digital forensics techno-speak. Teach them how to image seized phones, thumb drives, and hard drives. Put those images into a folder on an external hard drive. The user then right-clicks on that folder, hits the Easy Button, and waits for a browser to appear with a list of all artifacts found in all of the seized media.
Subject Matter Expert on a Network
Just show me the AutoCAD files. Glance-able cross-case correlations.
Large Organization in a Data Center
This is industrial-scale forensics working 24 hours a day. Raw data comes in through a system such as NiFi, where it is logged and queued for exploitation. Truxton is installed on dedicated loader machines, or VMs, and pulls things to load from a queue. As media exploitation completes, status messages are sent to a chat room. Quality control checks are performed and the data is pushed to the team's instance of Truxton. Summary reports are generated and delivered to the owners of the data, letting them know there's new media to investigate. Different types of artifacts are extracted and sent to internal systems for analysis.