Less Sequence: 四月 2015

2015年4月16日星期四

Why Are There So Many Pythons? A Python Implementation Comparison

source from here:

http://www.toptal.com/python/why-are-there-so-many-pythons

Python is amazing.

Surprisingly, that’s a fairly ambiguous statement. What do I mean by ‘Python’? Do I mean Python the abstractinterface? Do I mean CPython, the common Python implementation (and not to be confused with the similarly named Cython)? Or do I mean something else entirely? Maybe I’m obliquely referring to Jython, or IronPython, or PyPy. Or maybe I’ve really gone off the deep end and I’m talking about RPython or RubyPython (which are very, very different things).

While the technologies mentioned above are commonly-named and commonly-referenced, some of them serve completely different purposes (or, at least, operate in completely different ways).

Throughout my time working with the Python interfaces, I’ve run across tons of these .*ython tools. But not until recently did I take the time to understand what they are, how they work, and why they’re necessary (in their own ways).

In this tutorial, I’ll start from scratch and move through the various Python implementations, concluding with a thorough introduction to PyPy, which I believe is the future of the language.

It all starts with an understanding of what ‘Python’ actually is.

If you have a good understanding for machine code, virtual machines, and the like, feel free to skip ahead.

“Is Python interpreted or compiled?”

This is a common point of confusion for Python beginners.

The first thing to realize when making a comparison is that ‘Python’ is an interface. There’s a specification of what Python should do and how it should behave (as with any interface). And there are multipleimplementations (as with any interface).

The second thing to realize is that ‘interpreted’ and ‘compiled’ are properties of an implementation, not aninterface.

So the question itself isn’t really well-formed.

Is Python interpreted or compiled? The question isn't really well-formed.

That said, for the most common Python implementation (CPython: written in C, often referred to as simply ‘Python’, and surely what you’re using if you have no idea what I’m talking about), the answer is: interpreted, with some compilation. CPython compiles* Python source code to bytecode, and then interprets this bytecode, executing it as it goes.

* Note: this isn’t ‘compilation’ in the traditional sense of the word. Typically, we’d say that ‘compilation’ is taking a high-level language and converting it to machine code. But it is a ‘compilation’ of sorts.

Let’s look at that answer more closely, as it will help us understand some of the concepts that come up later in the post.

Bytecode vs. Machine Code

It’s very important to understand the difference between bytecode vs. machine code (aka native code), perhaps best illustrated by example:

C compiles to machine code, which is then run directly on your processor. Each instruction instructs your CPU to move stuff around.
Java compiles to bytecode, which is then run on the Java Virtual Machine (JVM), an abstraction of a computer that executes programs. Each instruction is then handled by the JVM, which interacts with your computer.

In very brief terms: machine code is much faster, but bytecode is more portable and secure.

Machine code looks different depending on your machine, but bytecode looks the same on all machines. One might say that machine code is optimized to your setup.

Returning to CPython implementation, the toolchain process is as follows:

CPython compiles your Python source code into bytecode.
That bytecode is then executed on the CPython Virtual Machine.

Beginners often assume Python is compiled because of .pyc files. There's some truth to that: the .pyc file is the compiled bytecode, which is then interpreted. So if you've run your Python code before and have the .pyc file handy, it will run faster the second time, as it doesn't have to re-compile the bytecode.

Alternative VMs: Jython, IronPython, and More

As I mentioned earlier, Python has several implementations. Again, as mentioned earlier, the most common is CPython, but there are others that should be mentioned for the sake of this comparison guide. This a Python implementation written in C and considered the ‘default’ implementation.

But what about the alternative Python implementations? One of the more prominent is Jython, a Python implementation written Java that utilizes the JVM. While CPython produces bytecode to run on the CPython VM, Jython produces Java bytecode to run on the JVM (this is the same stuff that’s produced when you compile a Java program).

Jython's use of Java bytecode is depicted in this Python implementation diagram.

“Why would you ever use an alternative implementation?”, you might ask. Well, for one, these different Python implementations play nicely with different technology stacks.

CPython makes it very easy to write C-extensions for your Python code because in the end it is executed by a C interpreter. Jython, on the other hand, makes it very easy to work with other Java programs: you can importany Java classes with no additional effort, summoning up and utilizing your Java classes from within your Jython programs. (Aside: if you haven’t thought about it closely, this is actually nuts. We’re at the point where you can mix and mash different languages and compile them all down to the same substance. (As mentioned by Rostin, programs that mix Fortran and C code have been around for a while. So, of course, this isn’t necessarily new. But it’s still cool.))

As an example, this is valid Jython code:

[Java HotSpot(TM) 64-Bit Server VM (Apple Inc.)] on java1.6.0_51
>>> from java.util import HashSet
>>> s = HashSet(5)
>>> s.add("Foo")
>>> s.add("Bar")
>>> s
[Foo, Bar]

IronPython is another popular Python implementation, written entirely in C# and targeting the .NET stack. In particular, it runs on what you might call the .NET Virtual Machine, Microsoft’s Common Language Runtime (CLR), comparable to the JVM.

You might say that Jython : Java :: IronPython : C#. They run on the same respective VMs, you can import C# classes from your IronPython code and Java classes from your Jython code, etc.

It’s totally possible to survive without ever touching a non-CPython Python implementation. But there are advantages to be had from switching, most of which are dependent on your technology stack. Using a lot of JVM-based languages? Jython might be for you. All about the .NET stack? Maybe you should try IronPython (and maybe you already have).

This Python comparison chart demonstrates the differences between Python implementations.

By the way: while this wouldn’t be a reason to use a different implementation, note that these implementations do actually differ in behavior beyond how they treat your Python source code. However, these differences are typically minor, and dissolve or emerge over time as these implementations are under active development. For example, IronPython uses Unicode strings by default; CPython, however, defaults to ASCII for versions 2.x (failing with a UnicodeEncodeError for non-ASCII characters), but does support Unicode strings by default for 3.x.

Just-in-Time Compilation: PyPy, and the Future

So we have a Python implementation written in C, one in Java, and one in C#. The next logical step: a Python implementation written in… Python. (The educated reader will note that this is slightly misleading.)

Here’s where things might get confusing. First, lets discuss just-in-time (JIT) compilation.

JIT: The Why and How

Recall that native machine code is much faster than bytecode. Well, what if we could compile some of our bytecode and then run it as native code? We’d have to pay some price to compile the bytecode (i.e., time), but if the end result was faster, that’d be great! This is the motivation of JIT compilation, a hybrid technique that mixes the benefits of interpreters and compilers. In basic terms, JIT wants to utilize compilation to speed up an interpreted system.

For example, a common approach taken by JITs:

Identify bytecode that is executed frequently.
Compile it down to native machine code.
Cache the result.
Whenever the same bytecode is set to be run, instead grab the pre-compiled machine code and reap the benefits (i.e., speed boosts).

This is what PyPy implementation is all about: bringing JIT to Python (see the Appendix for previous efforts). There are, of course, other goals: PyPy aims to be cross-platform, memory-light, and stackless-supportive. But JIT is really its selling point. As an average over a bunch of time tests, it’s said to improve performance by a factor of 6.27. For a breakdown, see this chart from the PyPy Speed Center:

Bringing JIT to Python interface using PyPy implementation pays off in performance improvements.

PyPy is Hard to Understand

PyPy has huge potential, and at this point it’s highly compatible with CPython (so it can run Flask, Django, etc.).

But there’s a lot of confusion around PyPy (see, for example, this nonsensical proposal to create a PyPyPy…). In my opinion, that’s primarily because PyPy is actually two things:

A Python interpreter written in RPython (not Python (I lied before)). RPython is a subset of Python with static typing. In Python, it’s “mostly impossible” to reason rigorously about types (Why is it so hard? Well consider the fact that:
```
 x = random.choice([1, "foo"])
```
would be valid Python code (credit to Ademan). What is the type of x? How can we reason about types of variables when the types aren’t even strictly enforced?). With RPython, you sacrifice some flexibility, but instead make it much, much easier to reason about memory management and whatnot, which allows for optimizations.
A compiler that compiles RPython code for various targets and adds in JIT. The default platform is C, i.e., an RPython-to-C compiler, but you could also target the JVM and others.

Solely for clarity in this Python comparison guide, I’ll refer to these as PyPy (1) and PyPy (2).

Why would you need these two things, and why under the same roof? Think of it this way: PyPy (1) is an interpreter written in RPython. So it takes in the user’s Python code and compiles it down to bytecode. But the interpreter itself (written in RPython) must be interpreted by another Python implementation in order to run, right?

Well, we could just use CPython to run the interpreter. But that wouldn’t be very fast.

Instead, the idea is that we use PyPy (2) (referred to as the RPython Toolchain) to compile PyPy’s interpreter down to code for another platform (e.g., C, JVM, or CLI) to run on our machine, adding in JIT as well. It’s magical: PyPy dynamically adds JIT to an interpreter, generating its own compiler! (Again, this is nuts: we’re compiling an interpreter, adding in another separate, standalone compiler.)

In the end, the result is a standalone executable that interprets Python source code and exploits JIT optimizations. Which is just what we wanted! It’s a mouthful, but maybe this diagram will help:

This diagram illustrates the beauty of the PyPy implementation, including an interpreter, compiler, and an executable with JIT.

To reiterate, the real beauty of PyPy is that we could write ourselves a bunch of different Python interpreters in RPython without worrying about JIT (barring a few hints). PyPy would then implement JIT for us using the RPython Toolchain/PyPy (2).

In fact, if we get even more abstract, you could theoretically write an interpreter for any language, feed it to PyPy, and get a JIT for that language. This is because PyPy focuses on optimizing the actual interpreter, rather than the details of the language it’s interpreting.

You could theoretically write an interpreter for any language, feed it to PyPy, and get a JIT for that language.

As a brief digression, I’d like to mention that the JIT itself is absolutely fascinating. It uses a technique called tracing, which executes as follows:

Run the interpreter and interpret everything (adding in no JIT).
Do some light profiling of the interpreted code.
Identify operations you’ve performed before.
Compile these bits of code down to machine code.

For more, this paper is highly accessible and very interesting.

To wrap up: we use PyPy’s RPython-to-C (or other target platform) compiler to compile PyPy’s RPython-implemented interpreter.

Wrapping Up

After a lengthy comparison of Python implementations, I have to ask myself: Why is this so great? Why is this crazy idea worth pursuing? I think Alex Gaynor put it well on his blog: “[PyPy is the future] because [it] offers better speed, more flexibility, and is a better platform for Python’s growth.”

In short:

It’s fast because it compiles source code to native code (using JIT).
It’s flexible because it adds the JIT to your interpreter with very little additional work.
It’s flexible (again) because you can write your interpreters in RPython, which is easier to extend than, say, C (in fact, it’s so easy that there’s a tutorial for writing your own interpreters).

Appendix: Other Python Names You May Have Heard

Python 3000 (Py3k): an alternative naming for Python 3.0, a major, backwards-incompatible Python release that hit the stage in 2008. The Py3k team predicted that it would take about five years for this new version to be fully adopted. And while most (warning: anecdotal claim) Python developers continue to use Python 2.x, people are increasingly conscious of Py3k.
Cython: a superset of Python that includes bindings to call C functions.
- Goal: allow you to write C extensions for your Python code.
- Also lets you add static typing to your existing Python code, allowing it to be compiled and reach C-like performance.
- This is similar to PyPy, but not the same. In this case, you’re enforcing typing in the user’s code before passing it to a compiler. With PyPy, you write plain old Python, and the compiler handles any optimizations.
Numba: a “just-in-time specializing compiler” that adds JIT to annotated Python code. In the most basic terms, you give it some hints, and it speeds up portions of your code. Numba comes as part of theAnaconda distribution, a set of packages for data analysis and management.
IPython: very different from anything else discussed. A computing environment for Python. Interactive with support for GUI toolkits and browser experience, etc.
Psyco: a Python extension module, and one of the early Python JIT efforts. However, it’s since been marked as “unmaintained and dead”. In fact, the lead developer of Psyco, Armin Rigo, now works on PyPy.

Python Language Bindings

RubyPython: a bridge between the Ruby and Python VMs. Allows you to embed Python code into your Ruby code. You define where the Python starts and stops, and RubyPython marshals the data between the VMs.
PyObjc: language-bindings between Python and Objective-C, acting as a bridge between them. Practically, that means you can utilize Objective-C libraries (including everything you need to create OS X applications) from your Python code, and Python modules from your Objective-C code. In this case, it’s convenient that CPython is written in C, which is a subset of Objective-C.
PyQt: while PyObjc gives you binding for the OS X GUI components, PyQt does the same for the Qt application framework, letting you create rich graphic interfaces, access SQL databases, etc. Another tool aimed at bringing Python’s simplicity to other frameworks.

JavaScript Frameworks

pyjs (Pyjamas): a framework for creating web and desktop applications in Python. Includes a Python-to-JavaScript compiler, a widget set, and some more tools.
Brython: a Python VM written in JavaScript to allow for Py3k code to be executed in the browser.

Become More Advanced: Master the 10 Most Common Python Programming Problems

The source from here:
http://www.toptal.com/python/top-10-mistakes-that-python-programmers-make

About Python

Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Its high-level built in data structures, combined with dynamic typing and dynamic binding, make it very attractive forRapid Application Development, as well as for use as a scripting or glue language to connect existing components or services. Python supports modules and packages, thereby encouraging program modularity and code reuse.

About this article

Python’s simple, easy-to-learn syntax can mislead Python developers – especially those who are newer to the language – into missing some of its subtleties and underestimating the power of the diverse Python language.

With that in mind, this article presents a “top 10” list of somewhat subtle, harder-to-catch mistakes that can bite even some more advanced Python developers in the rear.

This Python found himself caught in an advanced Python programming problem.

(Note: This article is intended for a more advanced audience than Common Mistakes of Python Programmers, which is geared more toward those who are newer to the language.)

Common Mistake #1: Misusing expressions as defaults for function arguments

Python allows you to specify that a function argument is optional by providing a default value for it. While this is a great feature of the language, it can lead to some confusion when the default value is mutable. For example, consider this Python function definition:

>>> def foo(bar=[]):        # bar is optional and defaults to [] if not specified...
    bar.append("baz")    # but this line could be problematic, as we'll see....
..    return bar

A common mistake is to think that the optional argument will be set to the specified default expression each time the function is called without supplying a value for the optional argument. In the above code, for example, one might expect that calling foo() repeatedly (i.e., without specifying a bar argument) would always return 'baz', since the assumption would be that each time foo() is called (without a barargument specified) bar is set to [] (i.e., a new empty list).

But let’s look at what actually happens when you do this:

>>> foo()
["baz"]
>>> foo()
["baz", "baz"]
>>> foo()
["baz", "baz", "baz"]

Huh? Why did it keep appending the default value of "baz" to an existing list each time foo() was called, rather than creating a new list each time?

The more advanced Python programming answer is that the default value for a function argument is only evaluated once, at the time that the function is defined. Thus, the bar argument is initialized to its default (i.e., an empty list) only when foo() is first defined, but then calls to foo() (i.e., without a bar argument specified) will continue to use the same list to which bar was originally initialized.

FYI, a common workaround for this is as follows:

>>> def foo(bar=None):...
    if bar is None:  # or if not bar:...
        bar = []...
    bar.append("baz")...
    return bar...

>>> foo()
["baz"]
>>> foo()
["baz"]
>>> foo()
["baz"]

Common Mistake #2: Using class variables incorrectly

Consider the following example:

>>> class A(object):...
     x = 1...

>>> class B(A):...
     pass...

>>> class C(A):...
     pass...

>>> print A.x, B.x, C.x
1 1 1

Makes sense.

>>> B.x = 2
>>> print A.x, B.x, C.x
1 2 1

Yup, again as expected.

>>> A.x = 3
>>> print A.x, B.x, C.x
3 2 3

What the $%#!&?? We only changed A.x. Why did C.x change too?

In Python, class variables are internally handled as dictionaries and follow what is often referred to as Method Resolution Order (MRO). So in the above code, since the attribute x is not found in class C, it will be looked up in its base classes (only A in the above example, although Python supports multiple inheritance). In other words, C doesn’t have its own x property, independent of A. Thus, references to C.x are in fact references to A.x. This causes a Python problem unless it’s handled properly. Learn more aout class attributes in Python.

Common Mistake #3: Specifying parameters incorrectly for an exception block

Suppose you have the following code:

>>> try:...
     l = ["a", "b"]...
     int(l[2])...
 except ValueError, IndexError:  # To catch both exceptions, right?
...     pass...

Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
IndexError: list index out of range

The problem here is that the except statement does not take a list of exceptions specified in this manner. Rather, In Python 2.x, the syntax except Exception, e is used to bind the exception to the optional second parameter specified (in this case e), in order to make it available for further inspection. As a result, in the above code, the IndexError exception is not being caught by the except statement; rather, the exception instead ends up being bound to a parameter named IndexError.

The proper way to catch multiple exceptions in an except statement is to specify the first parameter as atuple containing all exceptions to be caught. Also, for maximum portability, use the as keyword, since that syntax is supported by both Python 2 and Python 3:

>>> try:...
     l = ["a", "b"]...
     int(l[2])...
 except (ValueError, IndexError) as e:...  
     pass...

>>>

Common Mistake #4: Misunderstanding Python scope rules

Python scope resolution is based on what is known as the LEGB rule, which is shorthand for Local, Enclosing,Global, Built-in. Seems straightforward enough, right? Well, actually, there are some subtleties to the way this works in Python, which brings us to the common more advanced Python programming problem below. Consider the following:

>>> x = 10
>>> def foo():...
     x += 1...
     print x...

>>> foo()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in foo
UnboundLocalError: local variable 'x' referenced before assignment

What’s the problem?

The above error occurs because, when you make an assignment to a variable in a scope, that variable is automatically considered by Python to be local to that scope and shadows any similarly named variable in any outer scope.

Many are thereby surprised to get an UnboundLocalError in previously working code when it is modified by adding an assignment statement somewhere in the body of a function. (You can read more about this here.)

It is particularly common for this to trip up developers when using lists. Consider the following example:

>>> lst = [1, 2, 3]
>>> def foo1():
...     lst.append(5)   # This works ok...
...
>>> foo1()
>>> lst
[1, 2, 3, 5]

>>> lst = [1, 2, 3]
>>> def foo2():
...     lst += [5]      # ... but this bombs!
...
>>> foo2()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in foo
UnboundLocalError: local variable 'lst' referenced before assignment

Huh? Why did foo2 bomb while foo1 ran fine?

The answer is the same as in the prior example problem, but is admittedly more subtle. foo1 is not making an assignment to lst, whereas foo2 is. Remembering that lst += [5] is really just shorthand for lst = lst + [5], we see that we are attempting to assign a value to lst (therefore presumed by Python to be in the local scope). However, the value we are looking to assign to lst is based on lst itself (again, now presumed to be in the local scope), which has not yet been defined. Boom.

Common Mistake #5: Modifying a list while iterating over it

The problem with the following code should be fairly obvious:

>>> odd = lambda x : bool(x % 2)
>>> numbers = [n for n in range(10)]
>>> for i in range(len(numbers)):...
     if odd(numbers[i]):...
         del numbers[i]  # BAD: Deleting item from a list while iterating over it...

Traceback (most recent call last):
     File "<stdin>", line 2, in <module>
IndexError: list index out of range

Deleting an item from a list or array while iterating over it is a Python problem that is well known to any experienced software developer. But while the example above may be fairly obvious, even advanced developers can be unintentionally bitten by this in code that is much more complex.

Fortunately, Python incorporates a number of elegant programming paradigms which, when used properly, can result in significantly simplified and streamlined code. A side benefit of this is that simpler code is less likely to be bitten by the accidental-deletion-of-a-list-item-while-iterating-over-it bug. One such paradigm is that of list comprehensions. Moreover, list comprehensions are particularly useful for avoiding this specific problem, as shown by this alternate implementation of the above code which works perfectly:

>>> odd = lambda x : bool(x % 2)
>>> numbers = [n for n in range(10)]
>>> numbers[:] = [n for n in numbers if not odd(n)]  # ahh, the beauty of it all
>>> numbers
[0, 2, 4, 6, 8]

Common Mistake #6: Confusing how Python binds variables in closures

Considering the following example:

>>> def create_multipliers():...
     return [lambda x : i * x for i in range(5)]
>>> for multiplier in create_multipliers():...
     print multiplier(2)...

You might expect the following output:

But you actually get:

Surprise!

This happens due to Python’s late binding behavior which says that the values of variables used in closures are looked up at the time the inner function is called. So in the above code, whenever any of the returned functions are called, the value of i is looked up in the surrounding scope at the time it is called (and by then, the loop has completed, so i has already been assigned its final value of 4).

The solution to this common Python problem is a bit of a hack:

>>> def create_multipliers():...
     return [lambda x, i=i : i * x for i in range(5)]...

>>> for multiplier in create_multipliers():...
     print multiplier(2)...

0
2
4
6
8

Voilà! We are taking advantage of default arguments here to generate anonymous functions in order to achieve the desired behavior. Some would call this elegant. Some would call it subtle. Some hate it. But if you’re a Python developer, it’s important to understand in any case.

Common Mistake #7: Creating circular module dependencies

Let’s say you have two files, a.py and b.py, each of which imports the other, as follows:

In a.py:

import b

def f():
    return b.x
 
print f()

And in b.py:

import a

x = 1

def g():
    print a.f()

First, let’s try importing a.py:

>>> import a
1

Worked just fine. Perhaps that surprises you. After all, we do have a circular import here which presumably should be a problem, shouldn’t it?

The answer is that the mere presence of a circular import is not in and of itself a problem in Python. If a module has already been imported, Python is smart enough not to try to re-import it. However, depending on the point at which each module is attempting to access functions or variables defined in the other, you may indeed run into problems.

So returning to our example, when we imported a.py, it had no problem importing b.py, since b.py does not require anything from a.py to be defined at the time it is imported. The only reference in b.py to a is the call to a.f(). But that call is in g() and nothing in a.py or b.py invokes g(). So life is good.

But what happens if we attempt to import b.py (without having previously imported a.py, that is):

>>> import b
Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "b.py", line 1, in <module>
    import a
     File "a.py", line 6, in <module>
 print f()
     File "a.py", line 4, in f
 return b.x
AttributeError: 'module' object has no attribute 'x'

Uh-oh. That’s not good! The problem here is that, in the process of importing b.py, it attempts to import a.py, which in turn calls f(), which attempts to access b.x. But b.x has not yet been defined. Hence the AttributeError exception.

At least one solution to this is quite trivial. Simply modify b.py to import a.py within g():

x = 1

def g():
    import a # This will be evaluated only when g() is called
    print a.f()

No when we import it, everything is fine:

>>> import b
>>> b.g()
1 # Printed a first time since module 'a' calls 'print f()' at the end
1 # Printed a second time, this one is our call to 'g'

Common Mistake #8: Name clashing with Python Standard Library modules

One of the beauties of Python is the wealth of library modules that it comes with “out of the box”. But as a result, if you’re not consciously avoiding it, it’s not that difficult to run into a name clash between the name of one of your modules and a module with the same name in the standard library that ships with Python (for example, you might have a module named email.py in your code, which would be in conflict with the standard library module of the same name).

This can lead to gnarly problems, such as importing another library which in turns tries to import the Python Standard Library version of a module but, since you have a module with the same name, the other package mistakenly imports your version instead of the one within the Python Standard Library. This is where bad Python errors happen.

Care should therefore be exercised to avoid using the same names as those in the Python Standard Library modules. It’s way easier for you to change the name of a module within your package than it is to file a Python Enhancement Proposal (PEP) to request a name change upstream and to try and get that approved.

Common Mistake #9: Failing to address differences between Python 2 and Python 3

Consider the following file foo.py:

import sys

def bar(i):
    if i == 1:
        raise KeyError(1)
    if i == 2:
        raise ValueError(2)

def bad():
    e = None
    try:
        bar(int(sys.argv[1]))
    except KeyError as e:
        print('key error')
    except ValueError as e:
        print('value error')
    print(e)

bad()

On Python 2, this runs fine:

$ python foo.py 1
key error
1
$ python foo.py 2
value error
2

But now let’s give it a whirl on Python 3:

$ python3 foo.py 1
key error
Traceback (most recent call last):
  File "foo.py", line 19, in <module>
    bad()
  File "foo.py", line 17, in bad
    print(e)
UnboundLocalError: local variable 'e' referenced before assignment

What has just happened here? The “problem” is that, in Python 3, the exception object is not accessible beyond the scope of the except block. (The reason for this is that, otherwise, it would keep a reference cycle with the stack frame in memory until the garbage collector runs and purges the references from memory. More technical detail about this is available here).

One way to avoid this issue is to maintain a reference to the exception object outside the scope of the except block so that it remains accessible. Here’s a version of the previous example that uses this technique, thereby yielding code that is both Python 2 and Python 3 friendly:

import sys

def bar(i):
    if i == 1:
        raise KeyError(1)
    if i == 2:
        raise ValueError(2)

def good():
    exception = None
    try:
        bar(int(sys.argv[1]))
    except KeyError as e:
        exception = e
        print('key error')
    except ValueError as e:
        exception = e
        print('value error')
    print(exception)

good()

Running this on Py3k:

$ python3 foo.py 1
key error
1
$ python3 foo.py 2
value error
2

Yippee!

(Incidentally, our Python Hiring Guide discusses a number of other important differences to be aware of when migrating code from Python 2 to Python 3.)

Common Mistake #10: Misusing the `del` method

Let’s say you had this in a file called mod.py:

import foo

class Bar(object):
        ...
    def __del__(self):
        foo.cleanup(self.myhandle)

And you then tried to do this from another_mod.py:

import mod
mybar = mod.Bar()

You’d get an ugly AttributeError exception.

Why? Because, as reported here, when the interpreter shuts down, the module’s global variables are all set toNone. As a result, in the above example, at the point that __del__ is invoked, the name foo has already been set to None.

A solution to this somewhat more advanced Python programming problem would be to use atexit.register() instead. That way, when your program is finished executing (when exiting normally, that is), your registered handlers are kicked off before the interpreter is shut down.

With that understanding, a fix for the above mod.py code might then look something like this:

import foo
import atexit

def cleanup(handle):
    foo.cleanup(handle)


class Bar(object):
    def __init__(self):
        ...
        atexit.register(cleanup, self.myhandle)

This implementation provides a clean and reliable way of calling any needed cleanup functionality upon normal program termination. Obviously, it’s up to foo.cleanup to decide what to do with the object bound to the name self.myhandle, but you get the idea.

Wrap-up

Python is a powerful and flexible language with many mechanisms and paradigms that can greatly improve productivity. As with any software tool or language, though, having a limited understanding or appreciation of its capabilities can sometimes be more of an impediment than a benefit, leaving one in the proverbial state of “knowing enough to be dangerous”.

Familiarizing oneself with the key nuances of Python, such as (but by no means limited to) the moderately advanced programming problems raised in this article, will help optimize use of the language while avoiding some of its more common errors.

You might also want to check out our Insider’s Guide to Python Interviewing for suggestions on interview questions that can help identify Python experts.

We hope you’ve found the pointers in this article helpful and welcome your feedback.

订阅：博文 (Atom)