Question

I'm thinking of implementing a configuration file written in Python syntax, not unlike what Django does.

While I've seen one or two SO questions about the merits of using executable code in configuration files, I'm curious whether there is a way to execute the config file code in a "sandbox" to prevent mistakes in the code from locking up the host application.

Because the host application is a programmer's tool, I'm not concerned about teaching Python syntax or introducing security holes as mentioned in at least one other SO question. But I am worried about the configuration code branching to Fishkill and wedging the host app. I'd much rather that the host app trap those problems and display diagnostic error information.

Has anyone tried this sort of sandboxing for a Python configuration file? And, if so, what techniques proved useful, and what pitfalls cropped up that I should be aware of?

Answer 1

We do this for some of our internal tools.

What we do protects us from exception issues and discourages any attempts by the users to get overly creative in the config scripts. However, it doesn't protect us from infinite loops or actively malicious third parties.

The core of the approach here is to run the script in a locked-down exec.

First we go through the __builtin__ module (builtins in Python 3) and del everything we don't want them to be able to touch, especially __import__. We actually do this in a context manager that backs up the original values and dels them on the way in, then restores the original values on the way back out.
Next we create an empty dictionary to be the config script's namespace.
Then we exec the config with that namespace.
The exec is of course wrapped in a try/except that will catch anything.
And finally we inspect the namespace to extract the variables we are interested in.
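
The steps above might look roughly like the sketch below. This is not our actual code, just a minimal Python 3 illustration of the same idea. The names BLOCKED, stripped_builtins and load_config are made up for the example, and the list of removed builtins is an assumption (the only one named above is __import__). Note that the builtins are removed process-wide while the config runs, so this is only reasonable if nothing else (other threads, for instance) needs them at the same moment.

```python
import builtins
from contextlib import contextmanager

# Hypothetical blocklist; only __import__ is named in the answer above.
BLOCKED = ("__import__", "open", "eval", "exec", "compile")


@contextmanager
def stripped_builtins(names=BLOCKED):
    """Delete the named builtins on entry and restore them on exit."""
    saved = {}
    try:
        for name in names:
            if hasattr(builtins, name):
                saved[name] = getattr(builtins, name)
                delattr(builtins, name)
        yield
    finally:
        # Always put the originals back, even if the config raised.
        for name, value in saved.items():
            setattr(builtins, name, value)


def load_config(source, filename="<config>"):
    """Return (settings, error); exactly one of the two is None."""
    run = exec  # keep a reference, since exec itself is about to be removed
    namespace = {}
    try:
        code = compile(source, filename, "exec")
        with stripped_builtins():
            run(code, namespace)
    except (Exception, SystemExit) as exc:
        # Report the failure instead of letting it take down the host.
        return None, exc
    # Keep only public names; this also drops the injected __builtins__ entry.
    settings = {k: v for k, v in namespace.items() if not k.startswith("_")}
    return settings, None
```

A call such as load_config("PORT = 8000") hands back the extracted variables, while a config containing "import os" comes back as an error rather than a loaded module. As said above, an infinite loop in the config will still hang the host.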

Points to note here:

It might be tempting to prepopulate the namespace with stuff that might be useful to the config script, but you want to be very careful doing that; you quickly open up hooks back into the host program.
The config scripts can still create functions and classes, so you might get back something that looks like a string, for example, but is actually an arbitrary blob of executable code.

Because of these issues, we impose the restriction that our config scripts are expected to produce pure primitive data structures (generally just ints, strings, lists, tuples and None) that we then separately verify.
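
As an illustration of that verification step, here is a small sketch of a recursive check. The function name is_plain_data and the depth limit are hypothetical, and allowing bool and float on top of the types listed above is an assumption made for the example. It compares exact types rather than using isinstance, so a str subclass with overridden methods is rejected.

```python
# Exact types accepted as scalars. bool and float are additions for this
# example; the answer above only lists ints, strings and None.
ALLOWED_SCALARS = (int, float, bool, str, type(None))


def is_plain_data(value, max_depth=20):
    """True if value is built only from allowed scalars, lists and tuples."""
    kind = type(value)  # exact type, so subclasses do not pass
    if kind in ALLOWED_SCALARS:
        return True
    if kind in (list, tuple):
        if max_depth <= 0:
            # Too deeply nested (or self-referential): refuse it.
            return False
        return all(is_plain_data(item, max_depth - 1) for item in value)
    return False
```

Anything that fails the check (functions, classes, instances, subclasses of the primitive types) can then be reported as a configuration error instead of being passed on to the rest of the host program.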

Answer 2

Unfortunately, there isn't a lot you can do about this issue with standard Python. When the Python interpreter is running the "configuration code", that code can do whatever it likes, including accessing the host program or never returning control. Running the configuration code in a separate process might help, but it also limits the interaction between the host and the config code.

Your best bet would be to check out the PyPy project's sandbox feature. This might be what you need, but it may also involve quite a bit of work on your part to integrate.

The SO question "Is there an alternative to rexec for Python sandboxing?" also discusses this topic.

You should probably also ask yourself how important this problem actually is to you. I guess that depends on your use case and who's going to be writing the configuration code.
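
To make the separate-process idea a bit more concrete, here is a rough sketch using only the standard library. Everything in it is an assumption for the sake of illustration: the function name load_config_in_subprocess, the choice of JSON on stdout as the channel back to the host, and the five-second default timeout. It also shows the limitation mentioned above, since only JSON-serializable values can cross the process boundary (tuples, for example, come back as lists).

```python
import json
import subprocess
import sys

# Small program run in the child interpreter: exec the config file and
# write its public variables to stdout as JSON.
_RUNNER = r"""
import json, sys
namespace = {}
with open(sys.argv[1]) as handle:
    exec(compile(handle.read(), sys.argv[1], "exec"), namespace)
public = {k: v for k, v in namespace.items() if not k.startswith("_")}
json.dump(public, sys.stdout)
"""


def load_config_in_subprocess(path, timeout=5.0):
    """Return (settings, error); exactly one of the two is None."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", _RUNNER, path],
            capture_output=True,
            text=True,
            timeout=timeout,  # the child is killed if it runs too long
        )
    except subprocess.TimeoutExpired:
        return None, "config did not finish within %s seconds" % timeout
    if proc.returncode != 0:
        # The child's traceback doubles as the diagnostic message.
        return None, proc.stderr.strip()
    try:
        return json.loads(proc.stdout), None
    except ValueError as exc:
        return None, "config produced unreadable output: %s" % exc
```

With something like this, a config file containing an endless loop results in an error message after the timeout rather than a wedged host application.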
