Saturday, October 9, 2010

What's the best way to interleave two Python lists?

[NOTE: I wrote this in January 2009, but didn't publish it. Originally, I planned to provide a short discussion of each of the potential solutions listed below, which I never got around to doing. Anyway, I just noticed my draft and decided to go ahead and publish it without adding any more discussion. The code snippets seem fairly self-explanatory. If anyone has any comments on the various solutions, I would be very interested in hearing them.]


Until early 2009, I had to add the following site.cfg file to build numpy or scipy on my 64-bit Fedora Linux box:
[DEFAULT]
library_dirs = /usr/lib64
Making numpy aware of this location by default required adding /usr/lib64 to default_lib_dirs (which I will refer to as lib_dirs for brevity) in numpy/distutils/system_info.py.

Where do 64-bit libraries belong?

The lib64 directory is the default location for 64-bit libraries on Red Hat-based systems. Unfortunately, not all Linux distributions follow this convention. Fortunately, most distributions that don't use lib64 as the default location for 64-bit libraries at least create a lib64 symlink pointing to whatever their default location happens to be. So it appears safe to assume that, on a 64-bit machine, looking in lib64 before lib should work in most cases.
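
To see which of these situations applies on a given machine, a small check along the following lines may help. This is only a sketch: describe_lib_dir is a hypothetical helper name, not something from numpy, and it only looks at the one path it is given.

```python
import os


def describe_lib_dir(path):
    """Report whether path is a symlink, a real directory, or missing.

    Hypothetical helper for illustration only.
    """
    if os.path.islink(path):
        # Resolve the link to show where the libraries actually live.
        return "symlink to " + os.path.realpath(path)
    if os.path.isdir(path):
        return "directory"
    return "missing"


if __name__ == "__main__":
    for candidate in ("/usr/lib64", "/usr/lib"):
        print(candidate, "->", describe_lib_dir(candidate))
```

The symlink test comes first because os.path.isdir follows links, so a lib64 symlink would otherwise be reported as a plain directory.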

Since I only wanted to add the lib64 path on 64-bit machines, I changed the assignment to:
lib_dirs = libpaths(['/usr/lib'], platform_bits)
where libpaths returns ['/usr/lib'] when platform_bits is 32 and ['/usr/lib64', '/usr/lib'] when it is 64. I used the platform module to set platform_bits:
# Determine number of bits
import platform
_bits = {'32bit':32,'64bit':64}
platform_bits = _bits[platform.architecture()[0]]
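
As an aside, the platform module documentation notes that architecture() can be unreliable for some builds (macOS universal binaries, for example) and suggests checking sys.maxsize instead. A minimal sketch of that alternative follows; detect_platform_bits is a hypothetical name used only for illustration, and the struct-based value is an additional cross-check that I am assuming agrees on common platforms.

```python
import struct
import sys


def detect_platform_bits():
    """Return 64 for a 64-bit Python interpreter, else 32.

    Hypothetical helper; uses the sys.maxsize test suggested in the
    platform module documentation.
    """
    return 64 if sys.maxsize > 2**32 else 32


# Size of a C pointer in bits, as seen by this interpreter.
pointer_bits = struct.calcsize("P") * 8

if __name__ == "__main__":
    print(detect_platform_bits(), pointer_bits)
```

Both values describe the running interpreter rather than the operating system, which is usually what matters when deciding where to look for libraries to link against.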


An outline of the solution

So far everything has been pretty straightforward. Now all that is left is to write libpaths.
def libpaths(paths, bits):
    """Return a list of library paths valid on 32 or 64 bit systems.

    Parameters
    ----------
    paths : sequence
        A sequence of strings (typically paths)
    bits : int
        An integer, the only valid values are 32 or 64.

    Examples
    --------
    >>> paths = ['/usr/lib']
    >>> libpaths(paths,32)
    ['/usr/lib']
    >>> libpaths(paths,64)
    ['/usr/lib64', '/usr/lib']
    """
    if bits not in (32, 64):
        raise ValueError

    # Handle 32bit case
    if bits==32:
        return paths

    # Handle 64bit case
    return ????


How to skin the cat?

So we finally arrive at the motivation for this post. At this point, I started thinking that, given two equal-sized lists, there should be a simple function for interleaving their elements to make a new list. Something like zip, except that zip returns a list of tuples. After discussing this with several people (Fernando Pérez, Brian Hawthorne, and Stéfan van der Walt), we came up with several different solutions.

  • Solution 1:

from itertools import cycle
paths64 = (p+'64' for p in paths)
return list((x.next() for x in cycle([iter(paths),paths64])))

  • Solution 2:

def _():
    for path in paths:
        yield path
        yield path+'64'
return list(_())

  • Solution 3:

out = [None]*(2*len(paths))
out[::2] = paths
out[1::2] = (p+'64' for p in paths)
return out

  • Solution 4:

out = []
for p in paths:
    out.append(p)
    out.append(p+'64')
return out

  • Solution 5:

out = []
for p in paths:
    out.extend([p, p+'64'])
return out

  • Solution 6:

return [item for items in zip(paths, (p+'64' for p in paths)) for item in items]

  • Solution 7:

from operator import concat
return reduce(concat, ([p, p+'64'] for p in paths))

I liked Solution 5 the best and it is what I used.
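
Putting the pieces together, a minimal sketch of the finished function built around the Solution 5 loop might look like the following. Three details are assumptions rather than something stated in the post: the 64-bit entry is emitted before the plain one so that the result matches the docstring example (swap the two if the other order is wanted), the 32-bit case returns a copy of the input, and the ValueError carries a message.

```python
def libpaths(paths, bits):
    """Return a list of library paths valid on 32 or 64 bit systems."""
    if bits not in (32, 64):
        # Assumption: the message text is illustrative only.
        raise ValueError("bits must be 32 or 64, got %r" % (bits,))

    # Handle 32bit case (assumption: return a copy, not the caller's list)
    if bits == 32:
        return list(paths)

    # Handle 64bit case: interleave each path with its '64' variant,
    # listing the '64' variant first as in the docstring example.
    out = []
    for p in paths:
        out.extend([p + '64', p])
    return out


if __name__ == "__main__":
    print(libpaths(['/usr/lib', '/usr/local/lib'], 64))
```

The explicit loop with extend keeps the interleaving visible at a glance and needs no imports, which is a reasonable trade for code that runs once at build time.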

An itertools recipe

While we were looking for a solution, Fernando and I came up with the following recipe:

from itertools import cycle,imap

def fromeach(*iters):
    """Take elements one at a time from each iterable, cycling them all.

    It returns a single iterable that stops whenever any of its arguments
    is exhausted.

    Note: it differs from roundrobin in the itertools recipes, in that
    roundrobin continues until all of its arguments are exhausted (for this
    reason roundrobin also needs more complex logic and thus has more
    overhead).

    Examples:
    >>> list(fromeach([1,2],[3,4]))
    [1, 3, 2, 4]
    >>> list(fromeach('ABC', 'D', 'EF'))
    ['A', 'D', 'E', 'B']
    """
    return (x.next() for x in cycle(imap(iter,iters)))
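
The recipe as written targets Python 2 (imap and the .next() method). A rough sketch of the same idea for Python 3 follows, assuming version 3.7 or later. There, a StopIteration that escapes from inside a generator expression is converted into a RuntimeError (PEP 479), so this version uses an explicit generator function and catches the exception itself.

```python
from itertools import cycle


def fromeach(*iterables):
    """Take elements one at a time from each iterable, cycling them all.

    Stops as soon as any one of the iterables is exhausted.
    """
    # Build the iterators once so that cycle() repeats the same objects.
    iterators = [iter(it) for it in iterables]
    for it in cycle(iterators):
        try:
            yield next(it)
        except StopIteration:
            # One input ran dry: end the interleaved stream here.
            return


if __name__ == "__main__":
    print(list(fromeach([1, 2], [3, 4])))
    print(list(fromeach('ABC', 'D', 'EF')))
```

With no arguments, cycle receives an empty list and the generator simply yields nothing.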