Python - pandas DataFrame aggregate - why does it return column names?
Here's a simple data frame:

       acid  balance_1  custid  balance_2
    0     1   0.082627       1        NaN
    1     2   0.397579       1   0.459942
    2     3   0.201596       2   0.596573
    3     4   0.616448       3   0.705697
    4     5   0.844865       3   0.483279
    5     6        NaN       4   0.360260
I have been trying to play around with the aggregate function after grouping by custid:

    groupby_obj = time_series.groupby(["custid"])
    df = groupby_obj.agg(set)
This returns:

                                                     acid  \
    custid
    1       set([balance_1, balance_2, acid, custid])
    2       set([balance_1, balance_2, acid, custid])
    3       set([balance_1, balance_2, acid, custid])
    4       set([balance_1, balance_2, acid, custid])

                                            balance_1  \
    custid
    1       set([balance_1, balance_2, acid, custid])
    2       set([balance_1, balance_2, acid, custid])
    3       set([balance_1, balance_2, acid, custid])
    4       set([balance_1, balance_2, acid, custid])

                                            balance_2
    custid
    1       set([balance_1, balance_2, acid, custid])
    2       set([balance_1, balance_2, acid, custid])
    3       set([balance_1, balance_2, acid, custid])
    4       set([balance_1, balance_2, acid, custid])
instead of what I thought it would do:

                  acid                  balance_1             balance_2
    custid
    1       set([1,2])  set([0.082627, 0.397579])  set([nan, 0.459942])

and so on for the other custids...
Why is aggregate filling the data frame with the set of column headers?
Thanks, Anne
Here's the frame:

    In [29]: df
    Out[29]:
       acid  balance_1  custid  balance_2
    0     1   0.082627       1        NaN
    1     2   0.397579       1   0.459942
    2     3   0.201596       2   0.596573
    3     4   0.616448       3   0.705697
    4     5   0.844865       3   0.483279
    5     6        NaN       4   0.360260
Here are the groupings it creates:

    In [24]: df.groupby(['custid']).groups
    Out[24]: {1: [0, 1], 2: [2], 3: [3, 4], 4: [5]}
Here's a way to see what's being passed to the function (it's a frame):

    In [25]: df.iloc[[0,1]]
    Out[25]:
       acid  balance_1  custid  balance_2
    0     1   0.082627       1        NaN
    1     2   0.397579       1   0.459942

    In [26]: df.iloc[[2]]
    Out[26]:
       acid  balance_1  custid  balance_2
    2     3   0.201596       2   0.596573
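Rather than slicing with iloc by hand, the same sub-frames can be seen by iterating over the groupby object. This is only a small sketch: the frame is rebuilt by hand from the values above, and the name `pieces` is made up for illustration.

```python
import numpy as np
import pandas as pd

# The frame from the question, rebuilt by hand.
df = pd.DataFrame(
    {
        "acid": [1, 2, 3, 4, 5, 6],
        "balance_1": [0.082627, 0.397579, 0.201596, 0.616448, 0.844865, np.nan],
        "custid": [1, 1, 2, 3, 3, 4],
        "balance_2": [np.nan, 0.459942, 0.596573, 0.705697, 0.483279, 0.360260],
    }
)

# Iterating a groupby yields (key, sub-frame) pairs, one per group.
# "pieces" is a hypothetical name, used only in this sketch.
pieces = {key: frame for key, frame in df.groupby("custid")}

for key, frame in pieces.items():
    # Show which original rows ended up in each group.
    print(key, list(frame.index))
```

Each value in `pieces` is an ordinary DataFrame holding the rows for one custid.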
And here's the set operation on a frame (you get the set of column labels), which is not an interesting/useful operation:

    In [27]: set(df.iloc[[2]])
    Out[27]: set(['balance_1', 'balance_2', 'acid', 'custid'])
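The reason is that iterating over a DataFrame walks its column labels, not its values, whereas iterating over a single column walks the values. A short sketch of the difference, with the frame rebuilt by hand and the variable names chosen just for this example:

```python
import numpy as np
import pandas as pd

# The frame from the question, rebuilt by hand.
df = pd.DataFrame(
    {
        "acid": [1, 2, 3, 4, 5, 6],
        "balance_1": [0.082627, 0.397579, 0.201596, 0.616448, 0.844865, np.nan],
        "custid": [1, 1, 2, 3, 3, 4],
        "balance_2": [np.nan, 0.459942, 0.596573, 0.705697, 0.483279, 0.360260],
    }
)

one_group = df.iloc[[2]]  # the single row belonging to custid 2

# set() iterates its argument; a DataFrame iterates over column labels.
labels = set(one_group)

# A single column (a Series) iterates over its values instead.
values = set(one_group["acid"])

print(labels)
print(values)
```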
The point of agg is to aggregate the passed frame or series. The operation should reduce the input's dimensionality:

    In [28]: df.groupby(['custid']).agg(lambda x: x.sum())
    Out[28]:
            acid  balance_1  balance_2
    custid
    1          3   0.480206   0.459942
    2          3   0.201596   0.596573
    3          9   1.461313   1.188976
    4          6        NaN   0.360260
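If the goal really is the per-column sets sketched in the question, one possible approach is to select each column from the grouped object and apply set to it, so that set always receives a Series of values and never a whole frame. This is a sketch under that assumption, not necessarily what the asker needs; `value_cols` and `result` are names invented here.

```python
import numpy as np
import pandas as pd

# The frame from the question, rebuilt by hand.
df = pd.DataFrame(
    {
        "acid": [1, 2, 3, 4, 5, 6],
        "balance_1": [0.082627, 0.397579, 0.201596, 0.616448, 0.844865, np.nan],
        "custid": [1, 1, 2, 3, 3, 4],
        "balance_2": [np.nan, 0.459942, 0.596573, 0.705697, 0.483279, 0.360260],
    }
)

grouped = df.groupby("custid")
value_cols = ["acid", "balance_1", "balance_2"]  # hypothetical name

# For each column, set() is applied to one group's Series at a time,
# so it collects that column's values rather than column labels.
result = pd.DataFrame({col: grouped[col].apply(set) for col in value_cols})

print(result)
```

Note that the cells of `result` hold Python sets, so the columns have object dtype and NaN values stay inside the sets.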
What are you trying to accomplish?