Wednesday, February 4, 2009

When will NumPy (and SciPy) migrate to Python 3?

Python 3.0 was released on December 3rd, 2008. This release is a major redesign of the language, which intentionally breaks compatibility with the 2.x series of releases. Now that it has been released, many projects have to decide how and when to migrate to Python 3.

The Python developers have attempted to make this transition as painless as possible. For instance, Python 2.6 helps simplify the migration path from the 2.x to the 3.x release series. Python 2.6 incorporates everything from 3.0 that doesn't introduce incompatibilities with the 2.x series. It also can be run with a -3 switch to warn about what will no longer work in Python 3. The developers also provide a Python program, called 2to3, to automatically convert Python 2.x source code to valid 3.x code.
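
As a minimal sketch of what forward-compatible 2.6 code can look like, the snippet below uses only constructs that Python 2.6 accepts and that Python 3 keeps: the __future__ imports for print_function and division, the "except ... as" syntax, and byte literals. The function name mean_of and its inputs are hypothetical, chosen only for illustration.

```python
# Hypothetical example: runs unchanged on Python 2.6+ and Python 3.x.
from __future__ import division, print_function


def mean_of(values):
    """Return the arithmetic mean, or None for an empty sequence."""
    try:
        # With "division" imported, / is true division on 2.6 as well.
        return sum(values) / len(values)
    except ZeroDivisionError as exc:  # the "as" form is valid on 2.6 and 3.x
        print("cannot average an empty sequence:", exc)
        return None


header = b"NUMPY"  # bytes literal: accepted by 2.6, a distinct type in 3.x
print(mean_of([1, 2]))
```

Code written this way should give 2to3 and the -3 switch less to flag, although it does nothing for C extension code.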

The suggested strategy for migrating to Python 3 is essentially:
  1. port to 2.6
  2. fix all the warnings raised by the -3 switch
  3. run 2to3
  4. fix any remaining issues

A major prerequisite for this transition is excellent test coverage. Both NumPy and SciPy currently lack comprehensive test coverage, but we are making major improvements in this area. Over the last year, we have implemented a new testing framework based on nose and have doubled the number of tests for both projects. Over the next year, we will need to continue this trend and expand our test coverage even more.

While the above procedure of using the 2to3 tool works relatively well for pure Python code, there is no automatic conversion tool for extension code. NumPy is mostly written in C and makes extensive use of the Python C-API, so converting NumPy will require much more than running the 2to3 tool. Once NumPy has been successfully ported, we will port SciPy to Python 3. Porting SciPy should be considerably easier. Regardless, before porting either project to Python 3, we will need to ensure that both projects fully support Python 2.6.

Porting NumPy/SciPy to Python 2.6

Porting to Python 2.6 is a very pressing issue, as at least one Linux distribution (openSUSE 11.1) has already moved to Python 2.6. Fedora 11 (scheduled to be released on 5/26/2009) will also be based on Python 2.6.

The main issue with 2.6 support is NumPy. Over the last month or so, there has been a significant focus on making both NumPy and SciPy compatible with Python 2.6, largely thanks to the efforts of David Cournapeau. For example, the upcoming SciPy 0.7.0 release has replaced md5 and popen4 with hashlib and subprocess, respectively (unfortunately, it appears that subprocess has a potential race condition). On UNIX (including Mac OS X), NumPy 1.2.1 mostly works under Python 2.6. On Windows, however, 1.2.1 has a number of problems related to the compilation process. Fixing these compilation issues required some fairly extensive changes and, thus, the fixes will not be included in a 1.2.x bug-fix release. However, these issues have mostly been addressed on the development trunk, and the fixes will be included in the upcoming NumPy 1.3 release.
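
The md5 and popen4 replacements mentioned above follow a pattern along these lines. This is an illustrative sketch, not the actual SciPy change: the helper names digest_of and run_and_capture are hypothetical. os.popen4 returned a combined stdout/stderr stream, which subprocess can approximate by redirecting stderr to stdout.

```python
import hashlib
import subprocess
import sys


def digest_of(data):
    """Hex MD5 digest of a bytes object (replaces md5.new(data).hexdigest())."""
    return hashlib.md5(data).hexdigest()


def run_and_capture(args):
    """Run a command and return its combined stdout/stderr as bytes.

    Roughly what os.popen4 offered: stderr is merged into stdout.
    """
    proc = subprocess.Popen(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    output, _ = proc.communicate()  # waits for the process to exit
    return output


print(digest_of(b"numpy"))
print(run_and_capture([sys.executable, "-c", "print('ok')"]))
```

Both hashlib and subprocess are available in Python 2.6 and in Python 3, so code converted this way does not need to change again for the later port.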

Hopefully, we will have a beta release of NumPy 1.3 out in a few weeks, with a release candidate to follow shortly after. If all goes well, both NumPy and SciPy will be Python 2.6 compatible on all platforms by March.

Porting NumPy/SciPy to Python 3

Once we finish porting to Python 2.6 and remove all the warnings raised by the -3 switch, we will be ready to start seriously planning to port NumPy to Python 3. Supporting Python 3 will require significant effort, since a lot of C code has to be ported. We have taken a preliminary look at what this port will entail and have identified at least the following issues to be addressed:
  • PyNumberMethods has changed: nb_divide, nb_coerce, nb_oct, nb_hex, and nb_inplace_divide have been removed
  • PyObject_VAR_HEAD has changed to conform to standard C
  • PyString_* is gone; all occurrences will need to be replaced with PyUnicode_* or PyBytes_*
  • PyInt_* is gone; all occurrences will need to be replaced with PyLong_*
  • Buffer interface has changed; this is a fairly big change and will require the most work
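
Several of the C-level changes listed above have Python-level counterparts that can be observed directly. The sketch below is written for Python 3 and only illustrates the new semantics; it is not NumPy code. Division is split into true and floor division (the reason nb_divide is gone), text and bytes are distinct types (PyUnicode_* versus PyBytes_*), there is a single arbitrary-precision integer type (PyLong_*), and the revised buffer interface is exposed through memoryview.

```python
# Python 3 only: observable effects of the C-API changes listed above.

# nb_divide removed: "/" is always true division, "//" is floor division.
assert 7 / 2 == 3.5
assert 7 // 2 == 3

# PyString_* split into PyUnicode_* (text) and PyBytes_* (binary data).
text = "array"
raw = text.encode("ascii")
assert isinstance(text, str) and isinstance(raw, bytes)
assert text != raw  # str and bytes never compare equal in Python 3

# PyInt_* folded into PyLong_*: one integer type of arbitrary precision.
big = 2 ** 64
assert type(big) is int

# New buffer interface: memoryview shares memory without copying.
buf = bytearray(b"\x00\x01\x02\x03")
view = memoryview(buf)
view[0] = 255  # writes through to the underlying bytearray
assert buf[0] == 255
print(view.format, view.itemsize, view.nbytes)
```

Each of these behaviors has to be handled by hand in extension code, which is why the C portion of the port cannot be automated the way 2to3 automates pure Python.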

Given the amount of developer effort we currently have, it is difficult to imagine how we would be able to reasonably support two development branches (i.e., one for Python 2 and another for Python 3) for any significant amount of time. Obviously things may change; but, at this point, it looks like once we port NumPy to Python 3 we will only make bug-fix releases supporting Python 2.

Before porting to Python 3, we will be paying close attention to how the major Linux distributions will be handling the transition to Python 3. Fedora developers have started a lively discussion about how they will handle the transition to Python 3. The discussion indicates that there is a reluctance on their part to support both Python 2 and 3 in the same release.

With the last release of NumPy (1.2) and the upcoming SciPy release (0.7) we dropped support for Python 2.3. Since many scientists may not be able to quickly upgrade to Python 3, we will be reluctant to drop support for Python 2 for some time. Given our desire to provide the newest releases of NumPy and SciPy to as many users as possible, we will be closely listening to them to determine how quickly we can move to Python 3 and drop support for Python 2. And we will use this time to continue adding features, fixing bugs, improving documentation, and perhaps most importantly extending our test coverage.

It is likely that NumPy 1.4 and SciPy 0.8 (I am hoping that we will be able to release both by the end of 2009 or early 2010) will be based on Python 2. The following releases, NumPy 1.5 and SciPy 0.9, would be the earliest point that I can see us switching to Python 3.

Over the last year there has been a lot of discussion about the transition to Python 3 on the mailing lists, at our annual conference, during coding sprints and planning meetings, and in private conversations. One topic that has come up is whether this transition would be a good opportunity to simultaneously do a major redesign of NumPy.

While there is a temptation to take advantage of this opportunity for a major release, we quickly realized that doing so would be a huge mistake. First, it would make it difficult for scientists and researchers to isolate the root cause of errors in their code when porting to the new release. Is the problem with the Python 3 port or the port to the new NumPy release? Second, it would be extremely poor community behavior. If other major packages succumb to this temptation as well, switching to Python 3 will become an increasingly daunting task for all the code out there that uses these packages. So when we port NumPy to Python 3, we will do so in a release that includes no API or ABI changes not strictly related to Python 3.

12 comments:

cartman said...

Thanks for the heads up, numpy 2.6 port is greatly appreciated.

Noel O'Boyle said...

Python 3 support in late 2010 then. I'm sorry to hear that it'll take so long. This will hold up the whole ecosystem of Python packages (and their users) that depends on numpy. But it seems it's unavoidable :-/

Anonymous said...

As I understand it, Cython is already working to generate Python 3 compatible .c files from .pyx and also supports the buffer interface -- we might be able to leverage their work to get something working sooner than anticipated, and perhaps even to have the same source (.pyx and .c) work for both 2.x and 3.

Thanks for the post.

René Dudfield said...

hi,

with pygame, we have made our 'pgreloaded' branch work with py3k and python 2.x. Well Marcus did it all himself in a day or so part time.

So it is possible to do.

Also Lenard managed to partially compile numpy for python 2.6 on windows... so if you need any help from him with this please ask him - he's a windows ninja.

http://www3.telus.net/len_l/pygame/


cu.

Unknown said...

I have the feeling that there is something wrong with NumPy. Python's goal is to make life and programs simple and easy; if the developers of NumPy have created a dinosaur that cannot be maintained, this is a big problem. I can see two solutions (because, for me, waiting more than one year for a package to move is the death of the package):
1. ask Google and the whole Python community for help (don't be afraid to admit that you need external help; do you have shares in Matlab?)
2. decide within the Python community that we need to build an alternative, Python 3 oriented package that is able to support Python version changes more quickly.

Jarrod Millman said...

NumPy is quite far from being a dinosaur. It is an active, vibrant community project that has undergone at least two major rewrites in its 10+ year history.

My post wasn't intended to suggest in any way that the project is dying--far from it. I agree that for a pure Python project and even some extension code porting to Python 3 should be relatively easy and may happen quickly. However, for the reasons that I listed I don't believe NumPy will support Python 3 until at least 2010.

Of course we would welcome additional developers. And as Andrew suggested, it is possible that converting some or most of the C extension code in NumPy to Cython may be a useful approach. If anyone is interested in looking into this, it would be very useful (perhaps a Google Summer of Code project). Last summer, Dag Sverre Seljebotn completed a related GSoC project to provide enhanced NumPy support in Cython, which is already being used in SciPy (for instance, see the kd-tree class for efficient nearest-neighbor queries).

Barry Wardell said...

I'm a student planning on applying for the Summer of Code this year. I think a project porting NumPy/SciPy to Python 3 sounds interesting. Is there somewhere I could find more information on this project? I've looked at the SummerofCodeIdeas wiki page, but there isn't much information there. Is there somewhere I could find more information (potential mentors, experience/skills required, what would be a realistic goal for a summer project, etc.)?

Jarrod Millman said...

Hello Barry,

Thanks for the interest in working on the Python 3 port as part of the 2009 Google Summer of Code. The best way to find additional information is to ask on the NumPy and SciPy developers' lists. There is already at least one other student who has started looking into this. I am not sure that porting to Python 3 is a particularly good project, but there are a number of related sub-projects that could be of great benefit. In particular, increasing test coverage in preparation for the port would be a very reasonable place to focus.

Barry Wardell said...

Thanks, I've joined the mailing list now and am catching up on the Python 3 discussions there.

Anonymous said...

Any word on the status of the 3.0 port?

Neil said...

I believe the current development version of numpy is compatible with Python 3.

Unknown said...

As far as I understand it from the NumPy mailing list, we are not too far away from a release which will deliver support for both Python 2.7 and 3.1, see also: http://mail.scipy.org/pipermail/numpy-discussion/2010-July/051436.html