Saturday, December 29, 2007

NumPy/SciPy blog aggregator

As part of yesterday's documentation day, I set up the NumPy/SciPy blog aggregator. Gaël Varoquaux did all the work customizing the templates and style sheets. I will post a longer blog entry tomorrow detailing what was accomplished during the first NumPy/SciPy documentation day.

Sunday, December 23, 2007

The end of the SciPy sandbox

On October 2, 2005, Travis created the newscipy branch to port SciPy to what is now called NumPy. Three days later scipy.sandbox was created. The sandbox was originally intended to be a staging ground for packages that were undergoing rapid development and whose APIs were in flux. It was also a place where broken code could live.

The sandbox is currently creating more problems than it solves:
  1. Sandbox code limits group development, since it is often viewed as a place where a specific developer (or maybe a small group of developers) is experimenting. In fact, several of the packages are simply named after the developer. A branch would be a more appropriate place for experimental work done by a small group of developers.
  2. The ambiguous nature of the sandbox (i.e., in the SciPy trunk, but not in the release) plus a greater tolerance for broken code allows loose coding and documentation standards, which creates a barrier to inclusion in the core.
  3. Having packages included in the trunk implies that the code will eventually move into official releases; but several of the packages (e.g., old graphics packages) will not be included in future releases.
  4. Finally and most importantly, the sandbox leads to confusion and installation headaches. Users expect to have access to sandbox packages when they install SciPy binaries. But if they want to use a sandbox package, they are encouraged to download the source code, edit configuration files, and build SciPy themselves.
At the recent meeting in Berkeley, it was unanimously agreed that we should get rid of the sandbox for the SciPy 0.7.0 release, which is planned for late March. By the 0.7.0 release, each existing sandbox package will need to be officially moved into scipy, made into a scikit, moved into a branch, or simply deleted.

Eventually, we would like to see all of the following code/packages/functionality moved into scipy: arpack, buildgrid, constants, delaunay, ga, image, lobpcg, montecarlo, netcdf, newoptimize, rbf, rkern, and spline. Most of this code is unlikely to be ready by the 0.7.0 release, so it will probably just be moved into a branch for now.

We would like to see all of the following code/packages/functionality moved into a scikit: ann, exmplpackage, fdfpack, multigrid, pyem, pyloess, svm, and timeseries. My next blog entry will be focused on Scikits.

The packages belonging to specific developers should probably be moved into a branch: cdavid, duard, oliphant, and rkern. And some of the developers are suggesting that numexpr be moved into a separate, stand-alone package. Finally, several packages can just be deleted: arraysetops, cow, gplt, maskedarray, plt, stats, wavelets, and xplt.

Monday, December 17, 2007

NumPy/SciPy strategic planning

This weekend I hosted a 3-day NumPy/SciPy strategic planning meeting at UC Berkeley. Travis Oliphant, Robert Kern, Fernando Perez, Chris Burns, Tom Waite, Brian Hawthorne, Benjamin Ragan-Kelley, and Matthew Brett joined me in the Computational Infrastructure for Research Laboratory (CIRL) space in Giannini Hall. We were joined remotely by Stefan van der Walt from South Africa. We also had a conference call with William Stein, Brian Granger, and David Cournapeau, and several developers joined us via IRC.

Over the next few days I will be blogging about some of the things we discussed and started implementing.