python - Why is converting a long 2D list to a numpy array so slow?
I have a long list of xy coordinates, and would like to convert it into a numpy array.
>>> import numpy as np
>>> xy = np.random.rand(1000000, 2).tolist()
The obvious way would be:
>>> a = np.array(xy)  # Slow...
However, the above code is unreasonably slow. Interestingly, transposing the long list first, converting it into a numpy array, and then transposing back is much faster (20x on my laptop).
>>> def longlist2array(longlist):
...     wide = [[row[c] for row in longlist] for c in range(len(longlist[0]))]
...     return np.array(wide).T
>>> a = longlist2array(xy)  # 20x faster!
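For reference, a minimal way to reproduce the comparison with the standard-library timeit module, assuming xy and longlist2array are defined as above (actual speedups will vary by machine and NumPy version):

>>> import timeit
>>> timeit.timeit(lambda: np.array(xy), number=1)        # slow path: plain conversion
>>> timeit.timeit(lambda: longlist2array(xy), number=1)  # fast path: transpose trick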
Is this a bug of numpy?
EDIT:
This list of points (with xy coordinates) is generated on-the-fly, so instead of preallocating an array and enlarging it when necessary, or maintaining two 1D lists for x and y, I think the current representation is most natural.
Why is looping through the 2nd index faster than through the 1st index, given that we are iterating through a Python list in both directions?
EDIT 2:
Based on @tiago's answer and this question, I found the following code to be twice as fast as my original version:
>>> from itertools import chain
>>> def longlist2array(longlist):
...     flat = np.fromiter(chain.from_iterable(longlist), np.array(longlist[0][0]).dtype, -1)  # without intermediate list :)
...     return flat.reshape((len(longlist), -1))
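A possible variant (a sketch, assuming the input is rectangular and holds plain Python floats): passing an explicit count to np.fromiter lets it preallocate the output array instead of growing it as it consumes the iterator.

>>> def longlist2array(longlist):
...     n = len(longlist) * len(longlist[0])  # total element count, assuming rectangular input
...     flat = np.fromiter(chain.from_iterable(longlist), np.float64, count=n)
...     return flat.reshape((len(longlist), -1))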
Answer:
Implementing this in Cython without the extra checking involved to determine the dimensionality, etc. eliminates the time difference you are seeing. Here's the .pyx file I used to verify that:
from numpy cimport ndarray as ar
import numpy as np
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
def toarr(xy):
    cdef int i, j, h = len(xy), w = len(xy[0])
    cdef ar[double, ndim=2] new = np.empty((h, w))
    # Copy element by element; the types are fixed up front,
    # so no per-item checks are needed.
    for i in xrange(h):
        for j in xrange(w):
            new[i, j] = xy[i][j]
    return new
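For anyone who wants to try this, a typical Cython build script looks like the following (a sketch; the file name toarr.pyx is an assumption):

# setup.py -- assumes toarr.pyx sits alongside this script
from distutils.core import setup
from Cython.Build import cythonize
import numpy

setup(ext_modules=cythonize("toarr.pyx"),
      include_dirs=[numpy.get_include()])

Build in place with python setup.py build_ext --inplace, then import toarr and call toarr.toarr(xy).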
I would assume that the time is mostly spent in checking the length and content of each sublist in order to determine the datatype, dimension, and size of the desired array. When there are only two sublists, it only has to check two lengths to determine the number of columns in the array, instead of checking 1000000 of them.
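A conceptual sketch of that checking (not NumPy's actual code, just an illustration of the asymmetry):

>>> def infer_shape(nested):
...     h, w = len(nested), len(nested[0])
...     for row in nested:        # 1000000 length checks for the long list,
...         assert len(row) == w  # but only 2 for the transposed one
...     return (h, w)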