mnfy — minify/obfuscate Python 3 source code

What the heck is mnfy for?

The mnfy project was created for two reasons:

  • To show that shipping bytecode files without source, as a form of obfuscation, is not the best option available
  • To provide a minification of Python source code when the total byte size of the source code is paramount

When people ship Python code as only bytecode files (i.e. only .pyo files and no .py files), there are a couple of drawbacks. First and foremost, it prevents users from running your code on all available Python interpreters, such as Jython and IronPython. Second, it is a poor form of obfuscation: projects like Meta can take bytecode and reverse-engineer the original source code, because bytecode keeps enough detail that the only things missing are single-line comments.

When the total number of bytes used to ship Python code is paramount, then you want to minify the source code. Bytecode files actually contain so much detail that the space savings can be minuscule (e.g. the decimal module from Python’s standard library, which is the largest single file in the stdlib, has a bytecode file that is only 5% smaller than its original source code).
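To check this kind of claim against your own code, one rough approach is to compare the size of a module's source with the size of its marshalled code object. This is a minimal sketch and not part of mnfy: the name size_comparison is made up for illustration, and a real bytecode file also carries a small header that is not counted here.

```python
import marshal

def size_comparison(source, filename="<string>"):
    """Return (source size, marshalled code object size) in bytes."""
    code = compile(source, filename, "exec")
    return len(source.encode("utf-8")), len(marshal.dumps(code))

# Hypothetical input; substitute the text of a real module to measure it.
source = "def add(a, b):\n    return a + b\n"
source_size, bytecode_size = size_comparison(source)
print(source_size, bytecode_size)
```

The numbers depend on the interpreter version, so treat them as a measurement to take rather than a fixed ratio.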

Usage

A note about version numbers and Python version compatibility

The version number for mnfy is PEP 386 compliant, taking the form of PPP.FFF.BBB. The FFF.BBB represents the feature and bugfix version numbers of mnfy itself. The PPP portion of the version number represents the Python version that mnfy is compatible with: '{}{}'.format(*sys.version_info[:2]).
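As a small illustration of the PPP portion, the expression above can be wrapped in a helper. The function name python_prefix is hypothetical and only serves this example.

```python
import sys

def python_prefix(version_info=sys.version_info):
    """Build the PPP part of the version from a (major, minor, ...) tuple."""
    return '{}{}'.format(*version_info[:2])

print(python_prefix((3, 3, 0)))  # 33
print(python_prefix())           # value for the running interpreter
```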

The Python version that mnfy is compatible with is embedded directly in the version number because Python’s AST is not guaranteed to be backwards-compatible. This means that you should use each version of mnfy with a specific version of Python. Since mnfy works with source code and not bytecode, you can safely use mnfy on code that must be backwards-compatible with older versions of Python; just make sure the interpreter you run mnfy with is the correct one and can parse the source code. For example, the fact that the latest version of mnfy only works with Python 3.3 does not mean you cannot use that release on source code that must work with Python 3.2. Just run mnfy under a Python 3.3 interpreter and make sure that interpreter can read the source code.

Command-line Usage

TL;DR: pass the file you want to minify as an argument to mnfy and it will print the minified source code to stdout, such that the AST is exactly the same as that of the original source code. To get transformations that change the AST to varying degrees you will need to specify various flags.

See the help message for the project for full instructions on usage:

python3 -m mnfy -h
python3 mnfy.py -h

If you happen to define the MNFY_RICHARD_JONES environment variable then not only will mnfy be installed, but so will nfy, which just calls mnfy for you. This is so that you can use python -mnfy to invoke mnfy (i.e. minifying the “mnfy” name). The environment variable name is in honour of Richard Jones, who first came up with the minified name idea.

Transformations

Source emission

If you want no change to the AST compared to the original source code then you want mnfy’s default behaviour of only emitting source code with no AST changes. Any tricks with source code formatting have been verified by passing Python’s standard library through mnfy with only source emission used and comparing the resulting AST for no changes.

As an example of what source emission does, this code (32 characters):

if True:
  x = 5 + 2
  y = 9 - 1

becomes (19 characters):

if True:x=5+2;y=9-1
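The "no AST change" property can be checked by hand with the standard ast module. The following is a minimal sketch of such a check, not mnfy's own test code; the helper name same_ast is an assumption made for this example.

```python
import ast

def same_ast(first, second):
    """True if both pieces of source parse to structurally identical ASTs."""
    # ast.dump() leaves out line and column numbers by default, so only
    # the structure of the two trees is compared.
    return ast.dump(ast.parse(first)) == ast.dump(ast.parse(second))

original = "if True:\n  x = 5 + 2\n  y = 9 - 1\n"
minified = "if True:x=5+2;y=9-1"
print(same_ast(original, minified))
```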

Safe transformations

For a transformation to be considered safe it must be semantically equivalent to running the code as python3 -OO, but it can lead to a change in the AST. As the changes are semantically safe there is only a single option to turn on these transformations.

Combine imports

Take imports that are sequentially next to each other and put them on the same line without changing the import order.

From:

import X  # 8 characters
import Y  # 8 characters; 16 total

to:

import X,Y  # 10 characters

From:

from X import y  # 15 characters
from X import z  # 15 characters; 30 total

to:

from X import y,z  # 17 characters
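One way to picture this transformation is as a pass over a list of AST statements that folds each import into the previous one when the two are compatible. This is a sketch under stated assumptions, not mnfy's implementation: combine_imports and _mergeable are hypothetical names, star imports are deliberately left alone, and ast.unparse needs Python 3.9 or later.

```python
import ast

def _mergeable(prev, node):
    """True if node can be folded into the statement directly before it."""
    if isinstance(prev, ast.Import) and isinstance(node, ast.Import):
        return True
    if isinstance(prev, ast.ImportFrom) and isinstance(node, ast.ImportFrom):
        same_source = prev.module == node.module and prev.level == node.level
        # "from X import *" cannot share a line with other names.
        has_star = any(alias.name == "*" for alias in prev.names + node.names)
        return same_source and not has_star
    return False

def combine_imports(body):
    """Merge runs of adjacent imports without changing their order."""
    combined = []
    for node in body:
        if combined and _mergeable(combined[-1], node):
            combined[-1].names.extend(node.names)
        else:
            combined.append(node)
    return combined

tree = ast.parse("import X\nimport Y\nfrom X import y\nfrom X import z\n")
tree.body = combine_imports(tree.body)
print(ast.unparse(tree))
```

Because names are only appended to the preceding statement, the relative import order stays the same as in the original source.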

Combine with statements

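Since Python 3.1 the with statement accepts multiple context managers, giving syntactic support for what contextlib.nested() used to provide. Directly nested with statements can therefore be combined into one.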

From:

with A:
 with B:pass

to:

with A,B:pass
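The rewrite can be sketched as an ast.NodeTransformer that merges a with statement into its parent whenever it is the only statement in the parent's body. The class name CombineWith is hypothetical, and this is an illustration rather than the code mnfy uses.

```python
import ast

class CombineWith(ast.NodeTransformer):
    def visit_With(self, node):
        # Handle inner statements first so deeper nesting collapses too.
        self.generic_visit(node)
        if len(node.body) == 1 and isinstance(node.body[0], ast.With):
            inner = node.body[0]
            node.items.extend(inner.items)  # A, then B: entry order is kept
            node.body = inner.body
        return node

tree = CombineWith().visit(ast.parse("with A:\n with B:pass\n"))
with_stmt = tree.body[0]
print(len(with_stmt.items))
```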

Eliminate unused constants

If a constant isn’t used then there is no need to keep it around. This primarily eliminates docstrings. If any block becomes completely empty then a pass statement is inserted.

From:

def bacon():
 """Docstring"""

to:

def bacon():pass

From:

if X:pass
else:4+2

to:

if X:pass
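A simplified version of this idea can be written with ast.NodeTransformer. The sketch below is an assumption-laden illustration, not mnfy's code: the class name DropUnusedConstants is made up, it relies on ast.Constant (Python 3.8 or later), it only removes statements that are a bare literal (an expression such as 4+2 would first need constant folding, which is not attempted here), and it ignores edge cases such as an emptied finally block.

```python
import ast

class DropUnusedConstants(ast.NodeTransformer):
    def visit_Expr(self, node):
        # An expression statement that is only a literal has no effect.
        if isinstance(node.value, ast.Constant):
            return None  # returning None removes the statement
        return self.generic_visit(node)

    def generic_visit(self, node):
        super().generic_visit(node)
        body = getattr(node, "body", None)
        # A compound statement may not have an empty body, so add "pass".
        if isinstance(body, list) and not body and not isinstance(node, ast.Module):
            body.append(ast.Pass())
        return node

source = 'def bacon():\n    """Docstring"""\n\nif X:\n    pass\nelse:\n    42\n'
tree = DropUnusedConstants().visit(ast.parse(source))
ast.fix_missing_locations(tree)
```

After the visit, the function body holds a single pass statement and the else branch is empty, which matches the two examples above in spirit.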

Integer constants to power

For sufficiently large integer constants, it saves space to use the power operator (**). Only bases 2 and 10 are used, as those are what the math module supports.

From:

4294967296

to:

2**32
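To illustrate what "sufficiently large" means, the sketch below picks the shortest spelling of an exact power of 2 or 10. It is not mnfy's implementation: shortest_int_literal is a hypothetical name, and it uses plain integer arithmetic instead of the math module to avoid floating-point rounding on very large values.

```python
def shortest_int_literal(n):
    """Return the shortest of str(n), '2**k' or '10**k' for an int n."""
    digits = str(n)
    candidates = [digits]  # listed first so ties keep the plain literal
    if n > 0 and (n & (n - 1)) == 0:  # exact power of two
        candidates.append("2**{}".format(n.bit_length() - 1))
    if n > 0 and digits[0] == "1" and set(digits[1:]) <= {"0"}:  # power of ten
        candidates.append("10**{}".format(len(digits) - 1))
    return min(candidates, key=len)

print(shortest_int_literal(4294967296))  # 2**32
print(shortest_int_literal(1000))        # 1000
```

Small values such as 1000 or 64 are left alone because the power form is not shorter than the literal itself.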

Sane transformations

For typical code, sane transformations should be fine (e.g. you are not introspecting local variables). Since these transformations are typically safe you can turn them all on with a single option, but they can also be switched on individually as desired.

Note

Currently there are no sane transformations defined. See the issue tracker for some proposed transformations.

Unsafe transformations

For the more adventurous who know what features of Python their code relies on, unsafe transformations can be used. Just be very aware of what your code depends on before using any specific transformation. For this reason each unsafe transformation must be switched on individually.

Function to lambda

This is unsafe as lambda functions are not exactly like a function defined with def (e.g. the __name__ attribute of a lambda function is '<lambda>' rather than the original function name).

From:

def identity(x):return x  # 24 characters

to:

identity=lambda x:x  # 19 characters
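The difference is observable at run time, which is why code that introspects function names (logging, registries keyed by name, and so on) can break. A short demonstration, using hypothetical variable names as_def and as_lambda:

```python
def identity(x): return x
as_def = identity.__name__

identity = lambda x: x
as_lambda = identity.__name__

print(as_def, as_lambda)  # identity <lambda>
```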