Thursday, September 18, 2014

Back to basics - Code Analysis

This post was triggered by the great blog post explaining code analysis in gdb.

http://erenyagdiran.github.io/I-was-just-asked-to-crack-a-program-Part-1/

Code analysis is not part of a forensic technician's required skill set, and even some digital forensic analysts will never need to trace code in a debugger.  In some cases, though, an investigator might be lucky enough to have a case with code simple enough that it is worth "messing" with interpreting it, quickly spotting a pattern that might help the investigation.

The following simple code might be worthwhile to examine to quickly see what the password is, so we could use that information somewhere else in the case.  We know this code uses XOR ( ^ ) to check the password and even has a section commented out that would decode the password for us.  So, we either need to know the characteristics of XOR and decode it by hand, or uncomment that section, compile the code ourselves, and let it show us the solution.

In XOR, clear text XORed with a key results in ciphertext, but if we have the ciphertext and the key, we can XOR them to arrive at the clear text password.
Clear text   10010101          Ciphertext   10110000
Key          00100101          Key          00100101
Ciphertext   10110000          Clear text   10010101


Also, with XOR, if we do not know the key but are able to monitor the ciphertext, we can enter clear text guesses; if a guess results in a ciphertext of all zeroes, then the clear text we entered is the key itself.

Guessed password   10010101011
Unknown key        10010101011
Ciphertext         00000000000

Thus, the guessed password is the key we are looking for.  So, every ciphertext that we'll find on this system can be easily decrypted using the discovered key.
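
As a minimal sketch of the same property ( this snippet is mine and not part of the challenge code; the plaintext/ciphertext pair below is made up for illustration ), XORing a known clear text with its observed ciphertext returns the key, since plain ^ ( plain ^ key ) == key:

#include <iostream>
#include <string>

using namespace std;

int main(){
    // hypothetical known plaintext and its observed XOR ciphertext
    string known_plain  = "AAAAAAAA";
    string known_cipher = "\x20\x23\x22\x25\x24\x27\x26\x29";

    // XOR them byte by byte to recover the key
    string recovered_key = known_plain;
    for (size_t index = 0; index < known_plain.length(); index++)
        recovered_key[index] = known_plain[index] ^ known_cipher[index];

    cout << "recovered key: " << recovered_key << endl;
    return 0;
}

With these example bytes the program prints abcdefgh, which happens to match the key used in the challenge code below.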

I've written a simple program to practice this process; see if you can decode my password by hand, or see if you know how to compile C++ code and let the program do it for you.  A basic understanding of encryption and basic knowledge of compiling code might be required in this field and should be part of degree plans for those interested in digital forensics.  Of course, you might like this type of investigation and want to learn much more about programming; in that case, you might need to pursue computer science at a higher institution in order to take your skills to the next level.

#include <iostream>
#include <string>
#include <cstring>   // needed for strcmp

using namespace std;

int main(){
    string password;
    string key  = "abcdefgh";
    // XOR-encoded copy of the expected password
    string pass = "\x1b\r\xf\x10\x4\bV\\";

    cout << "Please enter your password: ";
    cin >> password;

    // encode the entered password in place by XORing each byte with the key
    // ( note: the key is 8 characters, so this assumes the password is no longer than the key )
    for (int index = 0; index < password.length(); index++)
        password[index] = password[index] ^ key[index];

    cout << "encoded: " << password << endl;

    // decode password ( uncomment to reverse the XOR and print the clear text )
    //for (int index = 0; index < password.length(); index++)
    //    password[index] = password[index] ^ key[index];
    //cout << "decoded: " << password << endl;

    // compare the encoded input against the stored encoded password
    if (strcmp(&password[0], &pass[0]) == 0)
        cout << "You got the password" << endl;
    else
        cout << "Incorrect password was entered!!!" << endl;

    return 0;
}

Can you write a flow chart for this code and a methodology for the decoding approach?

Sunday, September 14, 2014

Back to basics - Operator Precedence

Why do we need to test forensic tools when the programmers compiled the code without any errors?  Logical errors and flawed algorithm implementations cannot be detected by compiling code; they can only be found by continuous testing with the right input, while the output is monitored for the correct values.  We need to avoid garbage-in, garbage-out conditions for reliable tool testing.  One implementation issue that can be detected by testing is operator precedence.

In this presentation, I wanted to talk about the order of operations, which is ignored in many cases.  The order of operations is used by systems to evaluate the value of an expression by parsing the expression according to the operator precedence defined for the given system.

Analyzing code requires not just recognizing patterns in specific code, but also recognizing logical errors that might have been exploited.

In this chart, I give an example of the flow of operator evaluation, but the accompanying video will give a more in-depth explanation.  http://youtu.be/7EQ5YZOU7tw

You can practice operator precedence on the command line by setting variables with arithmetic operations.

C:\>set /a test=(9*9)*4/(9*(5*5*5)-(14-6))
0
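
As a hedged illustration of why this matters in code ( my own example, not from the presentation; the attribute values are made up ), a missed precedence rule can silently flip a test in C++, because == binds tighter than &:

#include <iostream>

int main(){
    // hypothetical attribute byte with the "hidden" bit ( 0x02 ) set
    unsigned char attributes = 0x22;
    const unsigned char HIDDEN = 0x02;

    // parses as attributes & ( HIDDEN == 0x02 ), i.e. attributes & 1, which is 0 here
    if (attributes & HIDDEN == 0x02)
        std::cout << "wrong grouping: hidden" << std::endl;
    else
        std::cout << "wrong grouping: not hidden" << std::endl;

    // parenthesized version tests the intended bit
    if ((attributes & HIDDEN) == 0x02)
        std::cout << "correct grouping: hidden" << std::endl;
    else
        std::cout << "correct grouping: not hidden" << std::endl;

    // integer division is another common trap: this prints 0, matching the
    // command-line example above, because 324/1117 truncates to zero
    std::cout << (9 * 9) * 4 / (9 * (5 * 5 * 5) - (14 - 6)) << std::endl;
    return 0;
}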


Saturday, September 6, 2014

Back to Basics - FAT File/Folder Structure

Have you ever wondered how the File Allocation Table ( FAT ) maintains the file system structure?  Many forensic books and certification exams discuss the structure of the file system, but I have yet to see a discussion of how the file system links the directory structure together.  In this post, I wanted to examine and model the links between files and folders.

Many books discuss the concept that we can navigate the file system by running cd . or cd .. to change directory to the current directory or to the parent of the current directory.  The . and .. entries turn out to be very important for understanding how FAT maintains the directory structure.

Each directory maintains its own Directory Entry ( DE ) in a unique cluster, where the root DE is considered cluster 0.  Cluster 1 is never referenced.  Referring to the FAT table, we know that the FAT signature in FAT16 is F8FF, followed by another FFFF that refers to the DE.  Thus, F8FF occupies cluster reference 0 while the FFFF following F8FF should be the reference to cluster 1.  Thus, the first usable cluster for files is cluster 2.
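
As a quick sketch of the arithmetic behind that numbering ( the geometry values below are made up; real values come from the volume's boot sector ), cluster 2 maps to the first sector of the data area, which follows the reserved sectors, the FATs, and the fixed-size root DE:

#include <cstdint>
#include <iostream>

int main(){
    // hypothetical FAT16 geometry, normally read from the boot sector
    uint32_t bytes_per_sector    = 512;
    uint32_t sectors_per_cluster = 64;
    uint32_t reserved_sectors    = 4;
    uint32_t number_of_fats      = 2;
    uint32_t sectors_per_fat     = 246;
    uint32_t root_entry_count    = 512;   // FAT12/16 root DE has a fixed size

    // root DE sits right after the FATs; the data area ( cluster 2 ) follows it
    uint32_t root_dir_sectors  = (root_entry_count * 32 + bytes_per_sector - 1) / bytes_per_sector;
    uint32_t first_data_sector = reserved_sectors + number_of_fats * sectors_per_fat + root_dir_sectors;

    // first sector of a given cluster's content ( file data or a folder's DE )
    uint32_t cluster = 3;
    uint32_t first_sector_of_cluster = (cluster - 2) * sectors_per_cluster + first_data_sector;

    std::cout << "first data sector: " << first_data_sector << std::endl;
    std::cout << "cluster " << cluster << " starts at sector " << first_sector_of_cluster << std::endl;
    return 0;
}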

I have created test case on a thumb drive using the following structure:

D:\file1.txt
D:\folder1
         ->file2.txt
         ->folder1-1
                 ->file3.txt

I have traced the file system structures to their starting and ending sector numbers to find a pattern that led me to understand how the files are stored.


The chart of sector numbers was used to develop a model of the file structure on the storage device.


The model can be verified by examining the actual structure of the DEs to establish the links between the DE entries.


A simplified view of the relevant cluster number designations shows the repeating pattern: each folder points to itself with a . entry that refers to the cluster number where its own DE resides ( the DE holding the entries for its files ), while the .. entry refers to the parent's DE cluster.
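
For reference, here is a minimal sketch of the 32-byte directory entry record that carries these links ( the struct and field names are mine, simplified for illustration ); the starting cluster field is what a folder's . and .. entries use to point at a DE cluster:

#include <cstdint>

// one 32-byte FAT short directory entry ( simplified, field names illustrative )
#pragma pack(push, 1)
struct FatDirEntry {
    char     name[11];            // 8.3 short name; "." and ".." appear here in subdirectories
    uint8_t  attributes;          // 0x10 = directory, 0x20 = archive, ...
    uint8_t  nt_reserved;
    uint8_t  create_time_tenths;
    uint16_t create_time;
    uint16_t create_date;
    uint16_t last_access_date;
    uint16_t first_cluster_high;  // unused on FAT16, used on FAT32
    uint16_t write_time;
    uint16_t write_date;
    uint16_t first_cluster_low;   // cluster where the file data or the subdirectory's DE begins
    uint32_t file_size;           // 0 for directories
};
#pragma pack(pop)

static_assert(sizeof(FatDirEntry) == 32, "a FAT directory entry is 32 bytes");

Reading first_cluster_low from the . and .. entries of folder1-1, for example, should reproduce the pattern above: . points back at folder1-1's own DE cluster and .. points at folder1's DE cluster.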


In some cases, we can examine the actual data structures on disk to reveal patterns that can be used to understand how a technology works.  The steps, documentation, and methodology are all crucial skills for any beginning forensic examiner or analyst, while forensic technicians would not have to know the technology at this level.  Only education and hard work can develop a forensic analyst's higher level of understanding of data structures; training alone will never develop professionals capable of this type of skill.  I hope this type of document will help even technicians understand that there is more to learn about technology than pushing buttons and reading output from unvalidated tools.


Tuesday, September 2, 2014

The True Scientific Model of Digital Forensic Analysis

The formal model of digital forensic analysis can be summarized in a single application methodology, since users interact with applications ( an operating system is a special purpose application that manages basic resources ( I/O, interrupts ), processes, memory, rights, and file systems ).

Many people talk about and write books on what digital forensics is, but most cover forensic technician skills.  Forensic technicians are trained Personal Computer ( PC ) technicians with skills for the most recent technology, mainly in order to acquire and retrieve digital data.  In many cases, the technology is so new and the techniques for retrieving data are so unknown to a sector of technicians that it is considered "voodoo forensics" ( e.g., chip-off ).

In the end, digital forensic analysis is the true detective work, where acquisition becomes the sub-process that supports the actual investigation.  Forensic technicians can be trained to focus on risk management in order to maintain evidence integrity, but getting the data off new devices violates many forensic science rules ( e.g., uploading client software to phones in order to acquire physical data ).

Digital forensic analysis should result in showing human involvement in using application(s) to commit an act that is unlawful or against policy, in such a way that the resulting relevant evidence can be presented in court proceedings.

Scientists rely on facts, numbers, and logic.  Technicians rely on tools, methodologies, and skills.  Courts require relevant, scientific, and admissible evidence.  Digital forensics could grow into a science if scientists focus on analysis in a scientific manner, rather than technicians trying to pass off their cutting-edge skills as science.  The ultimate goal is to find a human connection to the digital data, not to treat digital data extraction as the "Holy Grail".  A phlebotomist is not a doctor; he/she is a trained technician with the tools, methodologies, and skills to draw blood, but in the end the doctor will use that acquired specimen to draw conclusions and to gather numbers that reveal a larger problem than just an individual being sick.  The phlebotomist just draws blood from discrete individuals.  Thus, data acquisition has nothing to do with the analysis of data, nor does the technician need to be scientifically educated, only trained on how to extract data with various methodologies.

So, the scientific analysis result shows data states at the time in question ( stored, transactional, transmission ).  The data itself can be generated by the user, an application, or the operating system, where user generated data is considered hearsay and thus the weakest evidence.  User generated data must be supported and validated by business records ( application and/or operating system generated data ).  The data content needs to be considered as well, since unregulated encoding or encryption of data located on storage device(s) can lead the investigator toward the truth about intent.  Many user activities result in data being deleted or hidden from view by "normal means".  All these activities and modifications of data can be traced back to the ability and motivation of the human involved, which can run as deep as cultural influence.  Since activities do not necessarily mean illegal activities, the scope of the investigation needs to be considered in order to answer the who, what, when, and how questions or to determine the need to extend the scope of the investigation ( scope creep ).  Since science does not guarantee undisputed evidence, but merely offers scientifically proven facts based on knowledge at the time in question, it is the investigator's duty to find relevant evidence that is unbiased in nature ( inculpatory vs. exculpatory ).

Digital forensics is not a business process driven by monetary gain, but the pursuit of the truth.  Those who believe that looking only for inculpatory evidence is what digital forensics is about should not be considered forensic analysts, but merely businessmen.  Digital forensic analysis is also not merely the location of digital data; locating digital data is done by technicians.  Courts require the evidence to be scientifically produced, and the scientific method does not exist for partial methodologies, but for the location of the truth.


Tuesday, August 5, 2014

Preparing for a Career in Digital Forensics



Understanding an emerging industry like digital forensics is a challenge since there are not many sources of reliable data to refer to. Emerging, by definition, means that it is coming into view, thus it was not seen before. That means a lack of statistical data specific to the emerging field. Cybersecurity is different in that people are aware of the growing problem and the subsequent need for skilled professionals. People are aware of the problem and might even have Googled how to recover data. Because there is no relevant scientific community of true researchers who want to focus on defining and clarifying terms for people, a mere feeling of due diligence is seen as all the industry needs. People see the job market need secondhand and think they understand the job functions from TV shows like CSI.

There are schools advertising single courses as digital forensics, and computer science course titles are changed to secure coding while the contents stay the same as before. There is a real, unethical trend of feeding on people's lack of understanding of the industry. Awareness and a 3-day boot camp are not what the job market is looking for. The digital forensic analyst job title in most cases covers nothing more than an acquisition specialist. What digital forensics requires is a lot of education combined with training that needs to be continuous, since technology is changing rapidly. The job market can drive the development of new programs, and it should drive educational institutions' focus, but not the fundamental concept of education. Labs and assignments can mirror job market needs, but educational aspects like scientific-process-driven problem solving and critical thinking have nothing to do with what the job market is advertising. The job market focuses on business aspects driven by profits. If there is a demand for data recovery specialists, then they will advertise data recovery specialist positions. When the demand is for forensic analysts, they will advertise for forensic analysts.

When the demand is in eDiscovery, then they will advertise for that specific "keyword". In the grand scheme of things, they are really looking for problem solvers who are trained in a specific methodology and specific tools, but in an environment requiring more and more education in order to gain those skills. Education should not change to meet the changing "keyword soup" of the job market, but it has to have the agility to quickly adapt to specific needs. That is what junior colleges are good at. They are not technical schools providing training, but institutions preserving the educational needs of their students. They are not continuing education, not boot camps competing with corporate training institutions that charge outrageous prices for the latest training that is not fit for those new to the industry. "Sucking it through a fire hose" might fit other fields, but this field should have a higher ethical standard. Junior colleges are there to provide the latest skills and the education needed for people to reach a long-term goal and not just a short-term hype.

I see many people looking at the junior college as an opportunity to save money over corporate training prices and then being disappointed that the coverage is not just about the tools and the skills needed to properly use a specific tool. That is not the goal and never will be, but it does not mean the program is not covering what the institution is aiming for. It means the student did not understand the industry or the goals of the different institutions. There is a fit for everyone, but the research and the decision are placed on the student to know what he/she needs in order to reach a long-term goal. It is not enough to know about this field; you need to follow up with due care in order to fill a personal gap, and to fit not just the current job market, but the long-term job market need for a highly educated, problem-solving workforce.


Saturday, July 5, 2014

Just for fun on 4th of July



How might Francis Scott Key have changed his words in today's cyberwarfare environment?
Happy birthday America! - I'm not a citizen by birth; I'm a citizen by choice!

Oh, say can you see by the POST's early stage
What so scared we hailed at the boot screen’s last gleaming?
Whose broad stripes and bright stars thru the infected drive,
O'er the keyboard we watched were so gallantly bleeping?
And the LED's red glare, the logic bombs bursting in pair,
Gave logs through the night that our data was still there.
Oh, say does that personally identifiable information yet safe
O'er the land of the digital age and the home of the malware?

On the command line, dimly seen through the pixels of the screen,
Where the hacker's haughty host in dread stealth scan reposes,
What is that which the bits, o'er the towering baseline,
As it fitfully fragments, half conceals, half discloses?
Now it caches the keyword search of the morning's first SMS,
In firewall glory inspected now shines in the null device:
'Tis the ftp site’s welcome banner! Oh legal may it be
O'er the land of the digital age and the home of the malware!

And where is that SNORT who by signatures swore
That the havoc of cyber war and the data breach's confusion,
A personal data and national secrets should leave us no more!
Their friendly electron has washed out their malware pollution.
No boot sector could save the malicious and corrupted data
From the terror of delete, or the gloom of the wipe:
And the personally identifiable information yet safe
O'er the land of the digital age and the home of the malware!

Oh! thus be it little-endian or big, when RAM slack shall be cleared
Between their beloved personal data and the cyber war's desolation!
Blest with logs and honeypots, may the heav'n rescued data
Praise the Power surge that SCADA made and woke us up as a nation.
Then conquer the security education we must, when our virtual cause is just,
And this be our motto: "In cybersecurity education for all we trust."
And the personally identifiable information yet safe
O'er the land of the digital age and the home of the malware!

Friday, June 27, 2014

PowerShell logging

If you have ever wondered how to capture everything you type and run in the Windows Command Line Interface ( CLI ), like you do in Linux with the script command, then you will like this post.

Just type Start-Transcript
Transcript started, output file is C:\Users\<UID>\Documents\PowerShell_transcript.20140627204603.txt

Then, you can just type commands as you would normally do without worrying about taking notes on the commands or output of utilities.

$PSVersionTable
Name                        Value
----                        -----
CLRVersion                  2.0.50727.5477
BuildVersion                6.1.7601.17514
PSVersion                   2.0
WSManStackVersion           2.0
PSCompatibleVersions        {1.0, 2.0}
SerializationVersion        1.1.0.1
PSRemotingProtocolVersion   2.1

get-help out*
Name              Category  Synopsis
----              --------  --------
Out-Null          Cmdlet    Deletes output instead of sending it to the console.
Out-Default       Cmdlet    Sends the output to the default formatter and to the default output cmdlet.
Out-Host          Cmdlet    Sends output to the command line.
Out-File          Cmdlet    Sends output to a file.
Out-Printer       Cmdlet    Sends output to a printer.
Out-String        Cmdlet    Sends objects to the host as a series of strings.
Out-GridView      Cmdlet    Sends output to an interactive table in a separate window.

Get-ChildItem|out-file c:\temp.txt

When you are done, just type Stop-Transcript
Transcript stopped, output file is C:\Users\<UID>\Documents\PowerShell_transcript.20140627204603.txt

You can just type the file name with its path, without specifying any application to open the file; it will open in the default application associated with the .txt file extension.

C:\Users\<UID>\Documents\PowerShell_transcript.20140627204603.txt

The output will be all the same text that you typed and all the output of each command.

**********************
Windows PowerShell Transcript Start
Start time: 20140604110027
Username  : FVTC\<UID>
Machine  : APPA105B08 (Microsoft Windows NT 6.1.7601 Service Pack 1)
**********************
Transcript started, output file is C:\Users\<UID>\Documents\PowerShell_transcript.20140627204603.txt
PS C:\WINDOWS\system32> $PSVersionTable

Name                        Value
----                        -----
CLRVersion                  2.0.50727.5477
BuildVersion                6.1.7601.17514
PSVersion                   2.0
WSManStackVersion           2.0
PSCompatibleVersions        {1.0, 2.0}
SerializationVersion        1.1.0.1
PSRemotingProtocolVersion   2.1

PS C:\WINDOWS\system32> get-help out*

Name              Category  Synopsis
----              --------  --------
Out-Null          Cmdlet    Deletes output instead of sending it to the console.
Out-Default       Cmdlet    Sends the output to the default formatter and to the default output cmdlet.
Out-Host          Cmdlet    Sends output to the command line.
Out-File          Cmdlet    Sends output to a file.
Out-Printer       Cmdlet    Sends output to a printer.
Out-String        Cmdlet    Sends objects to the host as a series of strings.
Out-GridView      Cmdlet    Sends output to an interactive table in a separate window.

PS C:\WINDOWS\system32> Get-ChildItem|out-file c:\temp.txt
PS C:\WINDOWS\system32> Stop-Transcript
**********************
Windows PowerShell Transcript End
End time: 20140627204603
**********************

This method can help you in incident response or in live forensic collection when you have to document your interaction with the suspect's system.  I hope you will find this useful.  Let me know if you do.