Wednesday, February 11, 2009

SciPy 0.7.0 released

I'm pleased to announce SciPy 0.7.0. SciPy is a package of tools for science and engineering for Python. It includes modules for statistics, optimization, integration, linear algebra, Fourier transforms, signal and image processing, ODE solvers, and more.

This release comes sixteen months after the 0.6.0 release and contains many new features, numerous bug-fixes, improved test coverage, and better documentation. Please note that SciPy 0.7.0 requires Python 2.4 or greater (but not Python 3) and NumPy 1.2.0 or greater.
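
As a rough illustration of the requirements above, here is a sketch that checks a Python version tuple and a NumPy version string against them. The helper names `parse_version` and `scipy_070_requirements_met` are hypothetical and are not part of SciPy or NumPy. It assumes version strings begin with dotted integers such as "1.2.1".

```python
import re
import sys


def parse_version(text):
    """Hypothetical helper: '1.2.1' -> (1, 2, 1); '1.3.0rc1' -> (1, 3, 0).

    Only the leading dotted-integer part is used; missing components are
    padded with zeros so that '1.2' compares equal to '1.2.0'.
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError("unrecognized version string: %r" % text)
    parts = tuple(int(p) for p in match.group().split("."))
    return parts + (0,) * (3 - len(parts))


def scipy_070_requirements_met(python_version, numpy_version):
    """Python 2.4 or greater (but not Python 3) and NumPy 1.2.0 or greater."""
    py = tuple(python_version[:2])
    return (2, 4) <= py < (3, 0) and parse_version(numpy_version) >= (1, 2, 0)


if __name__ == "__main__":
    print(scipy_070_requirements_met(sys.version_info, "1.2.1"))
```

On any Python 3 interpreter this prints False, because SciPy 0.7.0 does not support Python 3.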

For more information, please see the release notes or my previous post. You can download the release from here. Thanks to everybody who contributed to this release.

Wednesday, February 4, 2009

When will NumPy (and SciPy) migrate to Python 3?

Python 3.0 was released on December 3rd, 2008. This release is a major redesign of the language, which intentionally breaks compatibility with the 2.x series of releases. Now that it has been released, many projects have to decide how and when to migrate to Python 3.

The Python developers have attempted to make this transition as painless as possible. For instance, Python 2.6 is designed to ease the migration from the 2.x to the 3.x release series. Python 2.6 incorporates everything from 3.0 that doesn't introduce incompatibilities with the 2.x series. It can also be run with the -3 switch, which warns about code that will no longer work in Python 3. The developers also provide a tool called 2to3 that automatically converts Python 2.x source code to valid 3.x code.
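
Python 3 features that would break existing 2.x code are available in Python 2.6 only as opt-in `__future__` imports. Here is a minimal sketch of a module meant to run unchanged on both 2.6 and 3.x; the functions `mean` and `halves` are illustrative only.

```python
# These __future__ imports exist in Python 2.6 and are no-ops in Python 3,
# so the module behaves the same under both interpreters.
from __future__ import absolute_import, division, print_function


def mean(values):
    # With true division, 7 / 2 is 3.5 on both 2.6 and 3.x; without the
    # import, Python 2 would truncate this integer division to 3.
    return sum(values) / len(values)


def halves(n):
    # Use // when floor (integer) division is actually intended.
    return n // 2


print("mean:", mean([3, 4]))  # print is a function under print_function
print("halves:", halves(7))
```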

The suggested strategy for migrating to Python 3 is essentially:
  1. port to 2.6
  2. fix all the warnings raised by the -3 switch
  3. run 2to3
  4. fix any remaining issues
A major prerequisite for this transition is excellent test coverage. Both NumPy and SciPy currently lack comprehensive test coverage, but we are making major improvements in this area. Over the last year, we have implemented a new testing framework based on nose and have doubled the number of tests for both projects. Over the next year, we will need to continue this trend and expand our test coverage even more.
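
With nose, tests can be ordinary functions named `test_*` that contain plain `assert` statements. The sketch below also relates to step 2 of the strategy. Warnings from the -3 switch are issued as DeprecationWarning, so escalating that category to an error makes a test fail until the offending code is fixed. `legacy_sum` is a hypothetical function, not part of NumPy or SciPy.

```python
import warnings


def legacy_sum(values):
    # Hypothetical API that warns, much as code run under "python2.6 -3" would.
    warnings.warn("legacy_sum() is deprecated", DeprecationWarning,
                  stacklevel=2)
    return sum(values)


def test_legacy_sum_value():
    # nose collects module-level functions whose names start with "test".
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        assert legacy_sum([1, 2, 3]) == 6


def test_legacy_sum_warns():
    # Escalate the warning to an error, like "-W error::DeprecationWarning"
    # on the command line, so it cannot scroll past unnoticed.
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        try:
            legacy_sum([1, 2, 3])
        except DeprecationWarning:
            pass
        else:
            raise AssertionError("expected a DeprecationWarning")


if __name__ == "__main__":
    test_legacy_sum_value()
    test_legacy_sum_warns()
    print("all tests passed")
```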

While the above procedure of using the 2to3 tool works relatively well for pure Python code, there is no automatic conversion tool for extension code. NumPy is mostly written in C and makes extensive use of the Python C-API, so converting NumPy will require much more than running the 2to3 tool. Once NumPy has been successfully ported, we will port SciPy to Python 3, which should be considerably easier. Regardless, before porting either project to Python 3, we will need to ensure that both projects fully support Python 2.6.
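
For pure Python code, 2to3 applies a set of mechanical rewrites. The sketch below is written in Python 3. The comment above each line shows the Python 2.x spelling that converts to it, covering the kind of changes handled by the has_key, except, dict, and print fixers. It is an illustration, not output captured from the tool.

```python
# Python 3 code; each comment shows the Python 2.x spelling that the
# corresponding 2to3 fixer would rewrite into the line below it.


def count_words(words):
    counts = {}
    for word in words:
        # 2.x: if counts.has_key(word):
        if word in counts:
            counts[word] += 1
        else:
            counts[word] = 1
    return counts


def parse_int(text):
    try:
        return int(text)
    # 2.x: except ValueError, err:
    except ValueError as err:
        # 2.x: print "could not parse %r: %s" % (text, err)
        print("could not parse %r: %s" % (text, err))
        return None


def report(counts):
    # 2.x: for word, n in counts.iteritems():
    for word, n in counts.items():
        # 2.x: print "%s: %d" % (word, n)
        print("%s: %d" % (word, n))


if __name__ == "__main__":
    report(count_words("the cat and the hat".split()))
    parse_int("forty-two")
```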

Porting NumPy/SciPy to Python 2.6

Porting to Python 2.6 is a pressing issue, as at least one Linux distribution (openSUSE 11.1) has already moved to Python 2.6, and Fedora 11 (scheduled for release on May 26, 2009) will be based on it.

The main issue with 2.6 support is NumPy. Over the last month or so, there has been a significant focus on making both NumPy and SciPy compatible with Python 2.6, largely thanks to the efforts of David Cournapeau. For example, the upcoming SciPy 0.7.0 release replaces md5 and popen4 with hashlib and subprocess, respectively (unfortunately, subprocess appears to have a potential race condition). On UNIX (including Mac OS X), NumPy 1.2.1 mostly works under Python 2.6. On Windows, however, 1.2.1 has a number of problems related to the compilation process. Fixing these compilation issues required fairly extensive changes, so the fixes will not be included in a 1.2.x bug-fix release. These issues have, however, mostly been addressed on the development trunk and will be included in the upcoming NumPy 1.3 release.
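
The two replacements mentioned above look roughly like this in code. This is a sketch, not SciPy's actual code. It assumes the `os.popen4` variant, which returned the child's stdin plus a single pipe carrying both stdout and stderr. `md5_hexdigest` and `run_merged` are hypothetical names.

```python
import hashlib
import subprocess
import sys


def md5_hexdigest(data):
    # Old style (md5 module, deprecated in favor of hashlib):
    #     md5.new(data).hexdigest()
    # New style: hashlib, available in Python 2.5+ and 3.x.
    return hashlib.md5(data).hexdigest()


def run_merged(args):
    # Old style: child_stdin, child_stdout_and_stderr = os.popen4(cmd)
    # New style: subprocess, sending stderr into the same pipe as stdout
    # to reproduce popen4's merged output.
    proc = subprocess.Popen(args, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    output, _ = proc.communicate()  # waits for exit; avoids pipe deadlocks
    return proc.returncode, output


if __name__ == "__main__":
    print(md5_hexdigest(b"abc"))  # 900150983cd24fb0d6963f7d28e17f72
    code, out = run_merged([sys.executable, "-c", "print('hello')"])
    print(code, out)
```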

Hopefully, we will have a beta release of NumPy 1.3 out in a few weeks, with a release candidate shortly after. If all goes well, both NumPy and SciPy will be compatible with Python 2.6 on all platforms by March.

Porting NumPy/SciPy to Python 3

Once we finish porting to Python 2.6 and remove all the warnings raised by the -3 switch, we will be ready to start seriously planning the port of NumPy to Python 3. Supporting Python 3 will require significant effort, since a large amount of C code has to be ported. We have taken a preliminary look at what this port will entail and have identified at least the following issues to be addressed:
  • PyNumberMethods has changed: nb_divide, nb_coerce, nb_oct, nb_hex, and nb_inplace_divide have been removed
  • PyObject_VAR_HEAD has changed to conform to standard C
  • PyString_* is gone, all occurrences will need to be replaced with PyUnicode_* or PyBytes_*
  • PyInt_* is gone, all occurrences will need to be replaced with PyLong_*
  • The buffer interface has changed; this is a fairly big change and will require the most work
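
The C-level changes above have Python-level counterparts that any Python 3 interpreter shows. This sketch illustrates them. `Meters` is a made-up toy type, and the comments map each behavior back to the C-API item it reflects.

```python
import array


class Meters(object):
    """Toy numeric type (hypothetical) showing the division slots that remain."""

    def __init__(self, value):
        self.value = value

    # nb_divide is gone: "/" always uses nb_true_divide (__truediv__)
    # and "//" uses nb_floor_divide (__floordiv__); __div__ is never called.
    def __truediv__(self, other):
        return Meters(self.value / other)

    def __floordiv__(self, other):
        return Meters(self.value // other)


# PyInt_* is gone: a single arbitrary-precision int type (PyLong) remains.
big = 2 ** 100
assert type(big) is int and type(1) is int

# PyString_* is gone: text is str (PyUnicode), raw data is bytes (PyBytes).
text = "numpy"
data = text.encode("ascii")
assert isinstance(text, str) and isinstance(data, bytes)
assert data != text  # bytes and str never compare equal in Python 3

# New buffer interface: a memoryview describes the memory it exposes
# (item format, item size, shape, strides), much as an ndarray does.
buf = memoryview(array.array("d", [1.0, 2.0, 3.0]))
print(buf.format, buf.itemsize, buf.shape, buf.strides)
```
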
Given our current developer resources, it is difficult to see how we could reasonably support two development branches (i.e., one for Python 2 and another for Python 3) for any significant amount of time. Things may change, but at this point it looks like once we port NumPy to Python 3, we will only make bug-fix releases supporting Python 2.

Before porting to Python 3, we will pay close attention to how the major Linux distributions handle the transition. Fedora developers have started a lively discussion about how they will manage it, and the discussion indicates a reluctance on their part to support both Python 2 and 3 in the same release.

With the last release of NumPy (1.2) and the upcoming SciPy release (0.7), we have dropped support for Python 2.3. Since many scientists may not be able to upgrade to Python 3 quickly, we will be reluctant to drop support for Python 2 for some time. Given our desire to provide the newest releases of NumPy and SciPy to as many users as possible, we will listen closely to them to determine how quickly we can move to Python 3 and drop support for Python 2. We will use this time to continue adding features, fixing bugs, improving documentation, and, perhaps most importantly, extending our test coverage.

It is likely that NumPy 1.4 and SciPy 0.8 (I am hoping that we will be able to release both by the end of 2009 or early 2010) will be based on Python 2. The following releases, NumPy 1.5 and SciPy 0.9, would be the earliest point that I can see us switching to Python 3.

Over the last year, there has been a lot of discussion about the transition to Python 3 on the mailing lists, at our annual conference, during coding sprints and planning meetings, as well as in private conversations. One topic that has come up is whether this transition would be a good opportunity to simultaneously do a major redesign of NumPy.

While there is a temptation to take advantage of this opportunity for a major release, we quickly realized that doing so would be a huge mistake. First, it would make it difficult for scientists and researchers to isolate the root cause of errors in their code when porting to the new release: is the problem with the Python 3 port or with the port to the new NumPy release? Second, it would be extremely poor community behavior. If other major packages succumb to this temptation as well, switching to Python 3 will become an increasingly daunting task for all the code out there that uses these packages. So when we port NumPy to Python 3, we will do so in a release that includes no API or ABI changes not strictly related to Python 3.