Python - Why is converting a long 2D list to a numpy array so slow?

I have a long list of xy coordinates, and would like to convert it into a numpy array.

>>> import numpy as np
>>> xy = np.random.rand(1000000, 2).tolist()

The obvious way would be:

>>> a = np.array(xy) # slow...

However, the above code is unreasonably slow. Interestingly, transposing the long list first, converting it into a numpy array, and then transposing back is much faster (20x on my laptop).

>>> def longlist2array(longlist):
...     wide = [[row[c] for row in longlist] for c in range(len(longlist[0]))]
...     return np.array(wide).T
>>> a = longlist2array(xy) # 20x faster!
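For anyone who wants to reproduce the comparison, here is a minimal timing sketch. It assumes a smaller list than the one in the question (100,000 rows) so it runs quickly, and it first checks that both routes produce the same array. The measured ratio depends on the numpy version and the machine, so it may not match the 20x figure quoted above.

```python
import timeit

import numpy as np


def longlist2array(longlist):
    # Transpose in pure Python, convert, then transpose back with .T
    wide = [[row[c] for row in longlist] for c in range(len(longlist[0]))]
    return np.array(wide).T


# Assumption: 100,000 rows instead of 1,000,000 to keep the run short.
xy = np.random.rand(100_000, 2).tolist()

# Both routes should give identical arrays.
direct = np.array(xy)
via_transpose = longlist2array(xy)
assert direct.shape == via_transpose.shape == (100_000, 2)
assert np.array_equal(direct, via_transpose)

# Average wall-clock time per call over a few repetitions.
t_direct = timeit.timeit(lambda: np.array(xy), number=3) / 3
t_transpose = timeit.timeit(lambda: longlist2array(xy), number=3) / 3
print(f"np.array(xy):       {t_direct:.4f} s per call")
print(f"longlist2array(xy): {t_transpose:.4f} s per call")
```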

Is this a bug in numpy?


Edit:

This list of points (with xy coordinates) is generated on-the-fly, so instead of preallocating an array and enlarging it when necessary, or maintaining two 1D lists for x and y, I think the current representation is the most natural.

Why is looping through the 2nd index faster than looping through the 1st index, given that we are iterating through a Python list in both directions?

Edit 2:

Based on @tiago's answer and this question, I found the following code to be twice as fast as my original version:

>>> from itertools import chain
>>> def longlist2array(longlist):
...     flat = np.fromiter(chain.from_iterable(longlist), np.array(longlist[0][0]).dtype, -1) # without intermediate list:)
...     return flat.reshape((len(longlist), -1))
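A small variation on the snippet above, as a sketch only: when the list is known to be non-empty and rectangular, the total element count can be passed to `np.fromiter` so that the output can be preallocated instead of grown. The name `longlist2array_counted` is made up for illustration, and the final assertions only check that the result matches `np.array(xy)`; they say nothing about speed.

```python
from itertools import chain

import numpy as np


def longlist2array_counted(longlist):
    # Hypothetical variant: assumes a non-empty, rectangular list of numbers.
    n_rows, n_cols = len(longlist), len(longlist[0])
    # Infer the dtype from the first element, as in the original snippet.
    dtype = np.array(longlist[0][0]).dtype
    # Passing count lets fromiter size the output up front.
    flat = np.fromiter(chain.from_iterable(longlist), dtype, count=n_rows * n_cols)
    return flat.reshape(n_rows, n_cols)


xy = np.random.rand(1000, 2).tolist()
result = longlist2array_counted(xy)
assert result.shape == (1000, 2)
assert np.array_equal(result, np.array(xy))
```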

Implementing this in Cython, without the extra checking involved in determining dimensionality, etc., nearly eliminates the time difference you are seeing. Here's the .pyx file I used to verify that.

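from numpy cimport ndarray as ar
import numpy as np
cimport cython

# Disable bounds and negative-index checks for the array writes below.
@cython.boundscheck(False)
@cython.wraparound(False)
def toarr(xy):
    # The shape is taken from the first row only; no per-row validation.
    cdef int i, j, h=len(xy), w=len(xy[0])
    cdef ar[double,ndim=2] new = np.empty((h,w))
    for i in range(h):
        for j in range(w):
            new[i,j] = xy[i][j]
    return new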

I would assume that most of the time is spent checking the length and content of each sublist in order to determine the datatype, dimension, and size of the desired array. When there are only two sublists, it only has to check two lengths to determine the number of columns in the array, instead of checking 1000000 of them.
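One rough way to probe this assumption is to time `np.array` on the same number of elements arranged as many short sublists versus two long ones. This is only a sketch: `time_conversion` is a helper name invented here, the sizes are smaller than in the question, and the outcome will vary with the numpy version, so no particular ratio is claimed.

```python
import timeit

import numpy as np


def time_conversion(n_rows, n_cols, repeats=3):
    # Build a nested Python list of the requested shape, then time np.array on it.
    data = np.random.rand(n_rows, n_cols).tolist()
    return timeit.timeit(lambda: np.array(data), number=repeats) / repeats


# Same total number of elements, arranged two different ways.
t_tall = time_conversion(100_000, 2)   # many sublists of length 2
t_wide = time_conversion(2, 100_000)   # two sublists of length 100,000
print(f"tall (100000 x 2): {t_tall:.4f} s per call")
print(f"wide (2 x 100000): {t_wide:.4f} s per call")
```

If the explanation above holds, the tall layout should be the slower of the two, since it has far more sublists to inspect.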


