Saturday, October 9, 2010

What's the best way to interleave two Python lists?

[NOTE:  I wrote this in January 2009, but didn't publish it.  Originally, I planned to provide a short discussion about each of the potential solutions listed below, which I never got around to doing.  Anyway I just noticed my draft and decided to go ahead and publish it without adding any more discussion. The code snippets seem fairly self explanatory.  If anyone has any comments on the various solutions, I would be very interested in hearing them.]

Until early 2009, I had to add the following site.cfg file to build numpy or scipy on my 64-bit Fedora Linux box:
library_dirs = /usr/lib64
To make numpy aware of the default location required me to add /usr/lib64 to default_lib_dirs (which I will refer to as lib_dirs for brevity) in numpy/distutils/

Where do 64-bit libraries belong?

The lib64 directory is the default location for 64-bit libraries on Red Hat-based system. Unfortunately, not all Linux distributions conform to this convention; but, fortunately, most distributions that don't use lib64 as the default location for 64-bit libraries at least create a lib64 symlink pointing to whatever their default location happens to be. So it appears I can assume that if I am on a 64-bit machine, looking in lib64 before lib should work in most cases.

Since I only wanted to add the lib64 path on 64-bit machines, I changed the assignment to:
lib_dirs = libpaths(['/usr/lib'], platform_bits)
where libpaths returns ['/usr/lib'] when platform_bits is 32 and ['/usr/lib64', '/usr/lib'] when it is 64. I used the platform module to set platform_bits:
# Determine number of bits
import platform
_bits = {'32bit':32,'64bit':64}
platform_bits = _bits[platform.architecture()[0]]

An outline of the solution

So far everything has been pretty straight-forward. Now all that is left is to write libpaths.
def libpaths(paths, bits): """Return a list of library paths valid on 32 or 64 bit systems. Parameters ---------- paths : sequence A sequence of strings (typically paths) bits : int An integer, the only valid values are 32 or 64. Examples -------- >>> paths = ['/usr/lib'] >>> libpaths(paths,32) ['/usr/lib'] >>> libpaths(paths,64) ['/usr/lib64', '/usr/lib'] """ if bits not in (32, 64): raise ValueError # Handle 32bit case if bits==32: return paths # Handle 64bit case return ????

How to skin the cat?

So we finally arrive at the motivation for this post. At this point, I started thinking that if I had two equal-sized lists that there should be a simple function for interleaving the elements of the two lists to make a new list. Something like zip. But zip returns a list of tuples. After discussing this with several people (Fernando Pérez, Brian Hawthorne, and Stéfan van der Walt), we came up with several different solutions.

  • Solution 1:

from itertools import cycle paths64 = (p+'64' for p in paths) return list(( for x in cycle([iter(paths),paths64])))

  • Solution 2:

def _(): for path in paths: yield path yield path+'64' return list(_())

  • Solution 3:

out = [None]*(2*len(paths)) out[::2] = paths out[1::2] = (p+'64' for p in paths) return out

  • Solution 4:

out = [] for p in paths: out.append(p) out.append(p+'64') return out

  • Solution 5:

out = [] for p in paths: out.extend([p, p+'64']) return out

  • Solution 6:

return [item for items in zip(paths, (p+'64' for p in paths)) for item in items]

  • Solution 7:

from operator import concat return reduce(concat, ([p, p+'64'] for p in paths))
I liked Solution 5 the best and it is what I used.

An itertools recipe

While we were looking for a solution, Fernando and I came up with the following recipe:

from itertools import cycle,imap def fromeach(*iters): """Take elements one at a time from each iterable, cycling them all. It returns a single iterable that stops whenever any of its arguments is exhausted. Note: it differs from roundrobin in the itertools recipes, in that roundrobin continues until all of its arguments are exhausted (for this reason roundrobin also needs more complex logic and thus has more overhead). Examples: >>> list(fromeach([1,2],[3,4])) [1, 3, 2, 4] >>> list(fromeach('ABC', 'D', 'EF')) ['A', 'D', 'E', 'B'] """ return ( for x in cycle(imap(iter,iters)))


Lucas Wiman said...

I also like solutions 5 as a general solution, though stylistically, I prefer "map(out.extend, ([p, p + '64'] for p in paths))" to the for loop. I think yours is the more pythonic solution (and might even be more efficient). I like to use map if all a for loop does is call a function on each element of an iterable, since I think it's a bit clearer and it saves a line of code.

Another solution (a modification of solution 7) is "return sum(([p, p + '64'] for p in paths), [])", which saves an import. I'd probably go with that since it's shorter, and efficiency only matters if the lists are extremely long or this function gets called a lot, neither of which is true in your application.

Mike Müller said...

A variation of solution 6:
[item for items in ([p, p + '64'] for p in paths) for item in items]. No need for the zip just use a generator expression.

Lucas Wiman said...

Good thing there's only one way to do it with Python... :-)

Joe Kilner said...

Here's another simple one (note that you are throwing away the result of the map statement).

1> A = [1,2,3,4,5]
2> B = [6,7,8,9,10]
3> out = []
4> map(out.extend,zip(A,B))
<4> [None, None, None, None, None]
5> out
<5> [1, 6, 2, 7, 3, 8, 4, 9, 5, 10]

Joe Kilner said...

Or using a bit from the previous comment (and relating it to your example):

out = []
map(out.extend,([p, p + '64'] for p in paths))

remosu said...

another variation of solution 7:

reduce(lambda out, p: out + [p, p+'64'], paths, list())

Unknown said...

I would use itertools.chain:

from itertools import chain

a = list('abcde')
b = list('12345')


['a', '1', 'b', '2', 'c', '3', 'd', '4', 'e', '5']

It can be extended to merge more than two lists:

c = list('XYZZY')

If the lists are long, or are iterables, or you want to merge the lists lazily, use iterttools.izip:

from itertools import chain, count, izip

for value in chain(*izip(count(),a+b+c)):
print value

yarikoptic said...

Since you liked zip, why not to like reduce (although iirc Guido doesn't like it ;-)), so for the actual interleave just use both functions you would like:

*In [1]: reduce(tuple.__add__, zip([1,2],[11,21]))
Out[2]: (1, 11, 2, 21)