Python: Algorithm efficiency and stability with arrays
I have written two algorithms and want to check which of them is more 'efficient', i.e. uses less memory. The first one creates a NumPy array and then modifies that array. The second one creates an empty Python list and appends values to it. Which one is better? The first program:
    import numpy as np

    f = open('/users/marcortiz/documents/vlex/pylearn2/mlearning/classify/files/models/model_training.txt')
    lines = f.readlines()
    f.close()

    # preallocate the full matrix, then fill it in place
    zeros = np.zeros((60343, 4917))

    for l in lines:
        row = l.split(",")
        for element in row:
            zeros[lines.index(l), row.index(element)] = element

    x = zeros[1,:]
    y = zeros[:,0]
    one_hot = np.ones((counter, 2))  # counter is defined elsewhere in the script
The second one:
    import numpy as np

    f = open('/users/marcortiz/documents/vlex/pylearn2/mlearning/classify/files/models/model_training.txt')
    lines = f.readlines()
    f.close()

    # build plain Python lists, then convert to arrays at the end
    x = []
    y = []

    for l in lines:
        row = l.split(",")
        x.append([float(elem) for elem in row[1:]])
        y.append(float(row[0]))

    x = np.array(x)
    y = np.array(y)
    one_hot = np.ones((counter, 2))  # counter is defined elsewhere in the script
My theory is that the first one is slower but uses less memory, and so is more 'stable' when working with large files, while the second one is faster but uses a lot of memory and is not stable when working with large files (543 MB, 70,000 lines).
Thanks!
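One way to check the memory half of that theory (a minimal sketch, assuming Python 3; the loader names are placeholders standing in for the two snippets above, each wrapped in a function) is to measure peak allocations with tracemalloc:

    import tracemalloc

    def peak_mib(load):
        """Run a zero-argument loader and return its peak allocation in MiB."""
        tracemalloc.start()
        load()
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
        tracemalloc.stop()
        return peak / (1024 * 1024)

    # hypothetical wrappers around the two snippets above:
    # print(peak_mib(load_with_preallocated_numpy))
    # print(peak_mib(load_with_python_lists))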
The problem with both pieces of code is that you're loading the whole file into memory first with file.readlines(); instead you should iterate over the file object directly, one line at a time.
    from itertools import izip
    import numpy as np

    # generator function: yields one parsed row at a time instead of
    # keeping the whole file in memory
    def func():
        with open('filename.txt') as f:
            for line in f:
                row = map(float, line.split(","))
                yield row[1:], row[0]

    x, y = izip(*func())
    x = np.array(x)
    y = np.array(y)
    ...
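Note that itertools.izip only exists on Python 2; on Python 3 the built-in zip is already lazy and takes its place, and map returns an iterator there, so the parsed row would need to be list(map(float, ...)) before slicing.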
I'm not sure a pure NumPy solution is going to be faster than this.
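For comparison, a sketch of what a pure NumPy version could look like, assuming np.loadtxt can parse the file (comma-delimited, first column holding the label); whether it actually wins on speed or memory would need measuring:

    import numpy as np

    # loadtxt parses the comma-separated file straight into a float array,
    # without keeping the raw text lines around
    data = np.loadtxt('/users/marcortiz/documents/vlex/pylearn2/mlearning/classify/files/models/model_training.txt',
                      delimiter=',')
    y = data[:, 0]   # first column: labels
    x = data[:, 1:]  # remaining columns: features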