Reading about the latest vulnerabilities in Rails got me thinking about a similar issue we have in Python.
It is well known that using pickle on untrusted data is insecure to the point of allowing arbitrary code execution. Or at least it should be.
If we head to the official documentation for pickle, we’ll find this warning:
Warning: The pickle module is not intended to be secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.
Now, how many of us check the official pickle documentation (or the official docs for any other module) every time we’re going to use it? Even if we’ve read that warning before, it’s easy to forget it and treat pickle like any other serialization format (especially when pickle is just a tool for solving some other problem). We might even be using someone else’s code that unpickles untrusted data.
Pickle as a vulnerability
As a sanity check, to see if I was being paranoid, I went over to GitHub and ran a search for “pickle.loads”, which led to 36,375 results. After that, it took about 10 minutes to find a vulnerable project and exploit the vulnerability (actual code used by companies, not just some learn_pickle_test.py). I also found many “potentially vulnerable” projects, but didn’t take the time to try to demonstrate the issue on each of them.
Exploiting the vulnerability
The vulnerable code looks something like this:
and at some point makes the server listen by doing:
Exploiting this is as easy as opening netcat with nc -l -p 9000 on an accessible server and running:
This will give the attacker a shell as the user running the vulnerable code, which means a great deal of control over the server and a chance to escalate privileges (if that user isn’t already root!).
This is a very simple exploit, and it will leave a trace of us being there (though we could eliminate it once we have shell access), but it is incredibly fast and easy to write. It’s possible to write much more sophisticated exploits with a bit of knowledge of the internals of pickle.
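To see the core problem without setting up a reverse shell, here is a minimal sketch of the same pattern. It assumes a service that base64-decodes a token sent by the client and hands it straight to pickle.loads; the function name load_token is hypothetical, and the payload calls the harmless os.getpid where a real exploit would call something like os.system.

```python
import base64
import os
import pickle


def load_token(encoded: bytes):
    """Hypothetical request handler: decode a client-supplied token.

    This is the insecure pattern: `encoded` arrives from the network and
    goes straight into pickle.loads.
    """
    return pickle.loads(base64.b64decode(encoded))


# A well-behaved client sends an ordinary pickled dict...
honest = base64.b64encode(pickle.dumps({"user": "alice"}))
print(load_token(honest))  # {'user': 'alice'}

# ...but a malicious client can send hand-written pickle opcodes instead.
# Protocol 0 reads as: GLOBAL os.getpid, MARK, TUPLE (no arguments),
# REDUCE (call the function), STOP (return the result).
malicious = base64.b64encode(b"cos\ngetpid\n(tR.")
print(load_token(malicious) == os.getpid())  # True: unpickling called os.getpid()
```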
The shellcode we’re using in the example, or any other shellcode, can be generated with code like this:
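A minimal sketch of such a generator, assuming Python 3. The class name Payload is hypothetical, and its __reduce__ returns eval with a harmless arithmetic expression; a real exploit would return something like (os.system, ("<shell command>",)) instead.

```python
import pickle


class Payload:
    """Hypothetical attacker-side class; only its __reduce__ matters."""

    def __reduce__(self):
        # Ask pickle to rebuild this object by calling eval("6 * 7").
        # A real exploit would return something like (os.system, ("<command>",)).
        return (eval, ("6 * 7",))


payload = pickle.dumps(Payload())

# The victim never needs the Payload class: the bytes only reference
# builtins.eval and its argument, so deleting the class changes nothing.
del Payload
result = pickle.loads(payload)
print(result)  # 42
```

The receiving process only has to be able to import the callable named in the pickle, and every Python process can import builtins.eval or os.system.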
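This works because pickle is a very simple stack language that lets arbitrary objects declare how they should be pickled by defining a __reduce__ method. Whatever callable and arguments __reduce__ returns are exactly what the unpickler will call when the data is loaded.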
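To make the “stack language” concrete, the standard library’s pickletools.dis can disassemble a pickle into its opcodes. This sketch uses a hand-written protocol 0 pickle (chosen because its opcodes are readable ASCII) that calls the harmless builtins.len on 'abc'; an exploit would name a function such as os.system instead.

```python
import io
import pickle
import pickletools

# Hand-written protocol 0 pickle that calls builtins.len('abc').
program = b"cbuiltins\nlen\n(S'abc'\ntR."

# Disassemble it into one line per opcode.
listing = io.StringIO()
pickletools.dis(program, out=listing)
print(listing.getvalue())

# Actually loading it performs the call and returns its result.
print(pickle.loads(program))  # 3
```

Reading the listing top to bottom: GLOBAL pushes the function found by module and name, MARK and TUPLE build the argument tuple, REDUCE pops both and calls the function, and STOP returns whatever is left on the stack. An ordinary object’s __reduce__ method is how it ends up emitting this same GLOBAL/REDUCE pattern.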
Protecting ourselves
Only unpickle stuff you pickled yourself
This is harder than it sounds. At first we might trust anything that comes from our own infrastructure (e.g. if it’s in the DB, Redis, the cache, etc., then I can trust it), but if someone else can manipulate those stores, then they can also exploit this vulnerability. For example, making the server above accept connections only from localhost doesn’t help if it runs on shared hosting, where other users on the same machine can connect to localhost too.
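One common way to make stored pickles trustworthy again (a mitigation we add ourselves, not something pickle provides) is to sign them with an HMAC when writing and verify the signature before unpickling. A minimal sketch, assuming the secret key lives somewhere the people who can write to the DB or cache cannot read; dumps_signed, loads_signed and SECRET_KEY are hypothetical names. It only detects tampering: anyone who obtains the key can still sign a malicious pickle.

```python
import hashlib
import hmac
import pickle

# Hypothetical key: in practice, load it from configuration the attacker
# cannot read, and never store it next to the pickled data.
SECRET_KEY = b"replace-with-a-long-random-secret"
TAG_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def dumps_signed(obj) -> bytes:
    """Pickle obj and prepend an HMAC-SHA256 tag computed over the bytes."""
    data = pickle.dumps(obj)
    tag = hmac.new(SECRET_KEY, data, hashlib.sha256).digest()
    return tag + data


def loads_signed(blob: bytes):
    """Verify the tag first; unpickle only bytes signed with our key."""
    tag, data = blob[:TAG_SIZE], blob[TAG_SIZE:]
    expected = hmac.new(SECRET_KEY, data, hashlib.sha256).digest()
    # compare_digest is designed to resist timing attacks on the comparison.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("bad signature: refusing to unpickle")
    return pickle.loads(data)


blob = dumps_signed({"cart": [1, 2, 3]})
print(loads_signed(blob))  # {'cart': [1, 2, 3]}

# Flipping a single bit (as someone editing the cache might) is detected.
tampered = blob[:-1] + bytes([blob[-1] ^ 1])
try:
    loads_signed(tampered)
except ValueError as exc:
    print(exc)  # bad signature: refusing to unpickle
```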
Just avoid pickle
The easiest way to protect ourselves is not to use pickle unless it’s actually necessary, and to check our dependencies to see whether they are vulnerable to these (and other) attacks. If all we need is a serialization format for data, JSON is probably enough.
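A minimal sketch of the JSON alternative using the standard library json module. With the default settings, json.loads builds nothing but dicts, lists, strings, numbers, booleans and None, so the input has no way to name a function to call. The trade-off is fidelity: tuples come back as lists, and types such as set or datetime need explicit conversion.

```python
import json

record = {"user": "alice", "roles": ["admin", "dev"], "active": True}

# Round trip: only plain data goes in, only plain data comes out.
encoded = json.dumps(record)
decoded = json.loads(encoded)
print(decoded == record)  # True

# Fidelity trade-off: tuples become lists...
print(json.loads(json.dumps({"point": (1, 2)})))  # {'point': [1, 2]}

# ...and arbitrary objects are rejected instead of being serialized.
try:
    json.dumps({"tags": {"a", "b"}})
except TypeError as exc:
    print("not JSON serializable:", exc)
```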
Don’t trust anyone
This is just another case of attackers using unvalidated data to break into our system. As with any user-submitted data, we need to make sure that it isn’t being spoofed and/or that our code can handle malicious data gracefully (also harder than it sounds).
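When pickle can’t be avoided, the “Restricting Globals” section of the pickle documentation describes overriding Unpickler.find_class so that only an allowlist of globals can be loaded. A minimal sketch along those lines follows; RestrictedUnpickler, restricted_loads and the contents of SAFE_GLOBALS are assumptions to adapt to the types our data really contains. This narrows what a malicious pickle can reach, but it is an extra layer of defense, not a reason to start trusting untrusted input.

```python
import io
import pickle

# Assumption: the only globals our data legitimately needs. Plain dicts,
# lists, strings and numbers need no globals at all.
SAFE_GLOBALS = {("builtins", name) for name in ("complex", "range", "set", "frozenset")}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses every global that is not on the allowlist."""

    def find_class(self, module, name):
        if (module, name) in SAFE_GLOBALS:
            return super().find_class(module, name)
        # Reject before anything is imported or called.
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Restricted counterpart of pickle.loads using the allowlist above."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# Ordinary data still loads.
print(restricted_loads(pickle.dumps({"tags": {"a", "b"}, "z": complex(1, 2)})))

# A payload naming os.getpid is refused instead of executed.
try:
    restricted_loads(b"cos\ngetpid\n(tR.")
except pickle.UnpicklingError as exc:
    print(exc)  # global 'os.getpid' is forbidden
```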
Better interfaces
The best way to avoid seeing these vulnerabilities in the wild is to educate ourselves to recognize when a piece of code is vulnerable. As API developers, one way we can help is to make sure that unsafe actions raise the necessary warnings or are self-documenting. In the case of pickle I’d like to see: