python - pandas DataFrame aggregate - why does it return column names?


Here's a simple data frame:

       acid  balance_1  custid  balance_2
    0     1   0.082627       1        nan
    1     2   0.397579       1   0.459942
    2     3   0.201596       2   0.596573
    3     4   0.616448       3   0.705697
    4     5   0.844865       3   0.483279
    5     6        nan       4   0.360260

I have been trying to play around with the aggregate function after grouping by custid.

    groupby_obj = time_series.groupby(["custid"])
    df = groupby_obj.agg(set)

This returns:

                                                 acid  \
    custid
    1       set([balance_1, balance_2, acid, custid])
    2       set([balance_1, balance_2, acid, custid])
    3       set([balance_1, balance_2, acid, custid])
    4       set([balance_1, balance_2, acid, custid])

                                            balance_1  \
    custid
    1       set([balance_1, balance_2, acid, custid])
    2       set([balance_1, balance_2, acid, custid])
    3       set([balance_1, balance_2, acid, custid])
    4       set([balance_1, balance_2, acid, custid])

                                            balance_2
    custid
    1       set([balance_1, balance_2, acid, custid])
    2       set([balance_1, balance_2, acid, custid])
    3       set([balance_1, balance_2, acid, custid])
    4       set([balance_1, balance_2, acid, custid])

instead of what I thought it might do:

            acid         balance_1                    balance_2
    custid
    1       set([1,2])   set([0.082627, 0.397579])    set([nan, 0.459942])
    etc. for the other custids...

Why is aggregate filling the data frame with sets of column headers?

Thanks, Anne

Here's the frame:

    In [29]: df
    Out[29]:
       acid  balance_1  custid  balance_2
    0     1   0.082627       1        nan
    1     2   0.397579       1   0.459942
    2     3   0.201596       2   0.596573
    3     4   0.616448       3   0.705697
    4     5   0.844865       3   0.483279
    5     6        nan       4   0.360260

Here are the groupings that get created:

    In [24]: df.groupby(['custid']).groups
    Out[24]: {1: [0, 1], 2: [2], 3: [3, 4], 4: [5]}

Here's a way to see what's being passed to the function (it's a frame):

    In [25]: df.iloc[[0,1]]
    Out[25]:
       acid  balance_1  custid  balance_2
    0     1   0.082627       1        nan
    1     2   0.397579       1   0.459942

    In [26]: df.iloc[[2]]
    Out[26]:
       acid  balance_1  custid  balance_2
    2     3   0.201596       2   0.596573
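As a rough sketch of the same inspection without hand-picking row positions, the groupby object can be iterated directly, which yields each key together with its sub-frame. The frame below is rebuilt from the question's data; the names `grouped` and `first_group` are only illustrative.

```python
import numpy as np
import pandas as pd

# Frame rebuilt from the question's data.
df = pd.DataFrame({
    "acid": [1, 2, 3, 4, 5, 6],
    "balance_1": [0.082627, 0.397579, 0.201596, 0.616448, 0.844865, np.nan],
    "custid": [1, 1, 2, 3, 3, 4],
    "balance_2": [np.nan, 0.459942, 0.596573, 0.705697, 0.483279, 0.360260],
})

grouped = df.groupby("custid")

# Each iteration yields the group key and the rows belonging to that key.
for custid, sub_frame in grouped:
    print(custid, type(sub_frame).__name__, list(sub_frame.index))

# A single group can also be pulled out by its key.
first_group = grouped.get_group(1)
```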

And here's the set operation on a frame (you get the set of column names), which is not an interesting or useful operation:

    In [27]: set(df.iloc[[2]])
    Out[27]: set(['balance_1', 'balance_2', 'acid', 'custid'])
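The reason is that iterating over a DataFrame walks its column labels, while iterating over a Series walks its values. A minimal sketch of the difference, using a one-row frame copied from the data above (the variable names are illustrative):

```python
import pandas as pd

# One-row frame matching the custid == 2 group.
frame = pd.DataFrame({
    "acid": [3],
    "balance_1": [0.201596],
    "custid": [2],
    "balance_2": [0.596573],
})

# set() iterates its argument: a DataFrame yields column labels...
labels_from_frame = set(frame)

# ...while a single column (a Series) yields the stored values.
values_from_series = set(frame["acid"])
```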

The point of agg is to aggregate the passed frame or series. The operation should reduce the input's dimensionality:

    In [28]: df.groupby(['custid']).agg(lambda x: x.sum())
    Out[28]:
            acid  balance_1  balance_2
    custid
    1          3   0.480206   0.459942
    2          3   0.201596   0.596573
    3          9   1.461313   1.188976
    4          6        nan   0.360260

What are you trying to accomplish?
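If the goal is the per-column sets of values sketched in the question, one possible approach is to select each column from the groupby and apply a function to it, so the function always receives a Series rather than a frame. This is only a sketch: `unique_values` and `sets_per_group` are hypothetical names, dropping NaN is an assumption about what is wanted, and how `agg(set)` itself behaves may differ between pandas versions.

```python
import numpy as np
import pandas as pd

# Frame rebuilt from the question's data.
df = pd.DataFrame({
    "acid": [1, 2, 3, 4, 5, 6],
    "balance_1": [0.082627, 0.397579, 0.201596, 0.616448, 0.844865, np.nan],
    "custid": [1, 1, 2, 3, 3, 4],
    "balance_2": [np.nan, 0.459942, 0.596573, 0.705697, 0.483279, 0.360260],
})

def unique_values(series):
    """Return the distinct non-missing values of one column as a set."""
    return set(series.dropna())

value_columns = ["acid", "balance_1", "balance_2"]
grouped = df.groupby("custid")

# Selecting a column first guarantees the function sees a Series per group.
sets_per_group = pd.DataFrame(
    {col: grouped[col].apply(unique_values) for col in value_columns}
)
print(sets_per_group)
```

Because NaN is dropped here, a group whose only value is missing (custid 4 in balance_1) ends up with an empty set; remove the `dropna()` call if missing values should be kept.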

