Friday, January 23, 2009

SciPy 0.7.0 release candidate

I just released the second release candidate for SciPy 0.7.0. Due to an issue with the Windows build scripts, the first release candidate wasn't announced. Unless a major regression or release blocker is discovered, this will become the 0.7.0 final release in a few weeks.

For more information, please see the release notes and my previous blog post. You can download the release from here.

Sunday, January 18, 2009

Matrix SIG archive (1995-2000) back online

I have recently been interested in the early history of multidimensional array support in Python. Most of the early development was discussed on the Matrix SIG. In January 2000, the Matrix SIG was retired and further discussion moved to the numpy-discussion mailing list hosted at SourceForge. More recently, the mailing list moved to the SciPy server, which is hosted by Enthought. The archive from 2000 to the present is here.

Unfortunately, the Matrix SIG archive (1995-2000) was missing, and had been since at least 2005, when Robert Kern noticed that it had disappeared. Thanks to Skip Montanaro, Brad Knowles, and Barry Warsaw, the archive has been restored.

Monday, January 12, 2009

SciPy 0.7 coming soon . . .

After 16 months of hard work, the next stable release of scipy is almost ready to be tagged. This is the most significant scipy release in several years. It contains many new features, numerous bug-fixes, improved test coverage, and better documentation.

The scipy developer community started porting scipy to numpy at the end of 2005. Most of the work during 2006-2007 was focused on porting and bug-fixes and culminated in scipy 0.6. However, since the 0.6 release, the development effort has shifted from porting and maintenance to a much greater focus on infrastructure, architecture, and functionality.

One of the most important developments during the last year has been the extensive work on the testing and documentation framework. The improvements to testing and documentation first appeared in the numpy 1.2 release.

Our new testing infrastructure is based on the nose testing framework, which makes writing tests much easier than before. The simplified testing framework, together with an increasing recognition of the importance of unit testing among the scipy developers, has led to a doubling of the number of tests since the last stable release.
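To give a flavor of why this style is easier, here is a minimal sketch of a nose-style test. The function `trapezoid` is a hypothetical stand-in written for this illustration, not code from scipy. The point is only that a test is a plain function whose name starts with `test_` and that uses bare `assert` statements, with no `TestCase` subclass or registration boilerplate.

```python
# Hypothetical function under test (not part of scipy): trapezoidal rule
# for evenly spaced samples.
def trapezoid(y, dx=1.0):
    if len(y) < 2:
        return 0.0
    return dx * (sum(y) - 0.5 * (y[0] + y[-1]))


# Nose-style tests: plain functions named test_*, using bare asserts.
def test_trapezoid_linear():
    # The trapezoidal rule is exact for a straight line:
    # the integral of x over [0, 4] is 8.
    assert abs(trapezoid([0, 1, 2, 3, 4]) - 8.0) < 1e-12


def test_trapezoid_too_few_points():
    # With fewer than two samples there is no interval to integrate over.
    assert trapezoid([5.0]) == 0.0
```

A test runner that follows the `test_*` naming convention can collect and run functions like these without any further setup.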

This release also brings immense documentation improvements. You can now view the scipy reference manual online or download it as a PDF file. The new reference guide was built using the popular Sphinx tool. We have also updated the scipy tutorial, which hadn't been touched for several years. Both the reference manual and the tutorial are easily editable using our web-based documentation editor. If you want to improve the documentation, please register a user name in our web-based editor and correct any issues you find.

In addition to huge improvements in the testing and documentation infrastructure, this release cleans up a number of annoyances and removes old cruft. There have been a number of deprecations and well-documented API changes in this release. This is also the first stable release since the sandbox was removed.

We also did a fairly extensive review of the code to make sure that all of it is correctly licensed. About a year ago, I noticed that there was code not licensed under the revised BSD license. I was trying to "correct" the license information for the scipy Fedora package, but was told that the Fedora packaging policy required including all licenses found in a package. After doing some grepping and looking at svn commit logs, it was easy to figure out what code needed to be relicensed and who had committed the code. Fortunately, I was able to track everyone down and everyone kindly agreed to relicense their code. There was also some code included in scipy derived from "Numerical Recipes in C" code. Unfortunately, the "Numerical Recipes in C" license doesn't permit redistribution. The scipy developers quickly reimplemented the offending code from scratch. With the 0.7 release, scipy only includes code licensed under the revised BSD license.

The 0.7 release should be out by the end of the month and we have already started working on the next feature release. During the development of the 0.7 release, there has been a rapid increase in community involvement and numerous infrastructure improvements to lower the barrier to contributions (e.g., more explicit coding standards, improved testing infrastructure, better documentation tools). I look forward to seeing this trend continue and invite everyone to become more involved. To learn more about the changes in the 0.7 release, please see the release notes.

Monday, July 7, 2008

SciPy 2008 Conference Program posted

The SciPy 2008 Program Committee has just finalized the schedule for this year's conference, so I wanted to take the opportunity to mention some of the many things that have been going on leading up to this 7th annual meeting.

Since 2002, the conference has been driven almost entirely by Enthought (Austin, TX) with on-site co-ordination and assistance by the Center for Advanced Computing Research (Caltech, Pasadena, CA). This year the community took a much larger role in conference planning. I am co-chairing the conference with Travis Vaught of Enthought. Gaël Varoquaux and Stéfan van der Walt have invested a huge amount of time in developing a TurboGears conference website. We have also created a much larger program committee and tutorials committee with members from Europe, Africa, and North America. You can see the entire list of organizers here. During past conferences, Enthought has sponsored a small number of students. This year we are also very excited that the Python Software Foundation (PSF) has agreed to help Enthought fund more student sponsorships, bringing the number of sponsored students to ten for the first time.

The tutorial committee has decided to offer two tutorial tracks this year, rather than one. The first is a two-day, in-depth introductory course on scientific computing with Python. The advanced track consists of eight two-hour sessions covering a variety of topics from building extensions to graphical user interfaces.

The program committee has done an excellent job putting together a very interesting schedule. We are very fortunate to have Alex Martelli for our keynote address this year. Although the conference will last the same number of days, there will be a larger number of shorter talks this year (16 talks last year to 23 talks this year). This will be the first year that we'll be publishing a proceedings book for selected talks. Travis Vaught and I will be giving the first annual "State of SciPy" talk. We will also have an expert panel discussion at the end of the conference.

Finally, we are extending the post-conference code sprint from one to two days. Last year the coding sprint was very successful, but there was a feeling that one day was too short. We are hoping to get a large number of participants this year. We already have commitments from several core NumPy, SciPy, IPython, SymPy, Mayavi, Numscons, and ETS developers.

Early registration ends on Friday, July 11, 2008.

Saturday, December 29, 2007

NumPy/SciPy blog aggregator

As part of yesterday's documentation day, I set up the NumPy/SciPy blog aggregator. Gaël Varoquaux did all the work customizing the templates and style sheets. I will post a longer blog tomorrow detailing what was accomplished during the first NumPy/SciPy documentation day.

Sunday, December 23, 2007

The end of the SciPy sandbox

On October 2, 2005, Travis created the newscipy branch to port SciPy to what is now called NumPy. Three days later scipy.sandbox was created. The sandbox was originally intended to be a staging ground for packages that were undergoing rapid development and whose APIs were in flux. It was also a place where broken code could live.

The sandbox is currently creating more problems than it solves:
  1. Sandbox code limits group development, since it is often viewed as a place where a specific developer (or maybe a small group of developers) is experimenting. In fact, several of the packages are simply named after the developer. And branching would be a more appropriate way for experimental work done by a small group of developers.
  2. The ambiguous nature of the sandbox (i.e., in the SciPy trunk, but not in the release) plus a greater tolerance for broken code allows loose coding and documentation standards, which creates a barrier to inclusion in the core.
  3. Having packages included in the trunk implies that the code will eventually move into official releases; but several of the packages (e.g., old graphics packages) will not be included in future releases.
  4. Finally and most importantly, the sandbox leads to confusion and installation headaches. Users expect to have access to sandbox packages when they install SciPy binaries. But if they want to use a sandbox package, they are encouraged to download the source code, edit configuration files, and build SciPy themselves.
At the recent meeting in Berkeley, it was unanimously agreed that we should get rid of the sandbox for the SciPy 0.7.0 release, which is planned for late March. By the 0.7.0 release, the existing sandbox packages will either need to be officially moved into scipy, made into a scikit, moved into a branch, or simply deleted.

Eventually, we would like to see all of the following code/packages/functionality moved into scipy: arpack, buildgrid, constants, delaunay, ga, image, lowbpcg, montecarlo, netcdf, newoptimize, rbf, rkern, and spline. Most of this code is unlikely to be ready by the 0.7.0 release, so it will probably just be moved into a branch for now.

We would like to see all of the following code/packages/functionality moved into a scikit: ann, exmplpackage, fdfpack, multigrid, pyem, pyloess, svm, and timeseries. My next blog entry will be focused on Scikits.

The packages belonging to specific developers should probably be moved into a branch: cdavid, duard, oliphant, and rkern. And some of the developers are suggesting that numexpr be moved into a separate, stand-alone package. Finally, several packages can just be deleted: arraysetops, cow, gplt, maskedarray, plt, stats, wavelets, and xplt.

Monday, December 17, 2007

NumPy/SciPy strategic planning

This weekend I hosted a 3-day NumPy/SciPy strategic planning meeting at UC Berkeley. Travis Oliphant, Robert Kern, Fernando Perez, Chris Burns, Tom Waite, Brian Hawthorne, Benjamin Ragan-Kelley, and Matthew Brett joined me in the Computational Infrastructure for Research Laboratory (CIRL) space in Giannini Hall. We were joined remotely by Stéfan van der Walt from South Africa. We also had a conference call with William Stein, Brian Granger, and David Cournapeau, and several developers joined us via IRC.

Over the next few days I will be blogging about some of the things we discussed and started implementing.