Python 2.7 - ReshapeError while trying to pivot a pandas DataFrame


Using pandas 0.11 on Python 2.7.3, I am trying to pivot a simple dataframe with the following values:

        studentid questionid answer daterecorded
    0        1234        bar          2012/01/21
    1        1234        foo      c   2012/01/22
    2        4321        bop          2012/01/22
    3        5678        bar          2012/01/24
    4        8765        baz      b   2012/02/13
    5        4321        baz      b   2012/02/15
    6        8765        bop      b   2012/02/16
    7        5678        bop      c   2012/03/15
    8        5678        foo          2012/04/01
    9        1234        baz      b   2012/04/11
    10       8765        bar          2012/05/03
    11       4321        bar          2012/05/04
    12       5678        baz      c   2012/06/01
    13       1234        bar      b   2012/11/01

I am using the following command:

    df.pivot(index='studentid', columns='questionid')

But I am getting the following error:

    ReshapeError: Index contains duplicate entries, cannot reshape

Note that with the same dataframe minus the last line

    13       1234        bar      b   2012/11/01

the pivot succeeds and results in the following:

               answer              daterecorded
    questionid bar  baz  bop  foo         bar         baz         bop         foo
    studentid
    1234              b  NaN    c  2012/01/21  2012/04/11         NaN  2012/01/22
    4321              b       NaN  2012/05/04  2012/02/15  2012/01/22         NaN
    5678              c    c       2012/01/24  2012/06/01  2012/03/15  2012/04/01
    8765              b    b  NaN  2012/05/03  2012/02/13  2012/02/16         NaN

I am new to pivoting and would like to know why having a duplicate (studentid, questionid) pair causes this problem. And how can I fix it using the df.pivot() function?

Thank you.

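What do you expect your pivot table to look like with the duplicate entries? I'm not sure it would make sense to have multiple elements for (1234, bar) in the pivot table: each (studentid, questionid) pair maps to exactly one cell, so two rows with the same pair leave pandas with no way to decide which value goes there. Your data looks like it's naturally indexed by (questionid, studentid, daterecorded).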
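To see which rows collide before pivoting, one option is to list every (studentid, questionid) pair that occurs more than once. The sketch below rebuilds the question's data by hand; storing the blank answers as empty strings is an assumption, since the question does not show what those cells hold. It also assumes a recent pandas release, where the same failure surfaces as a ValueError rather than a ReshapeError.

```python
import pandas as pd

# Data transcribed from the question. Blank answers are stored as empty
# strings, which is an assumption about the original values.
rows = [
    (1234, "bar", "", "2012/01/21"),
    (1234, "foo", "c", "2012/01/22"),
    (4321, "bop", "", "2012/01/22"),
    (5678, "bar", "", "2012/01/24"),
    (8765, "baz", "b", "2012/02/13"),
    (4321, "baz", "b", "2012/02/15"),
    (8765, "bop", "b", "2012/02/16"),
    (5678, "bop", "c", "2012/03/15"),
    (5678, "foo", "", "2012/04/01"),
    (1234, "baz", "b", "2012/04/11"),
    (8765, "bar", "", "2012/05/03"),
    (4321, "bar", "", "2012/05/04"),
    (5678, "baz", "c", "2012/06/01"),
    (1234, "bar", "b", "2012/11/01"),
]
df = pd.DataFrame(
    rows, columns=["studentid", "questionid", "answer", "daterecorded"]
)

# keep=False flags every row of a duplicated group, not only the repeats.
dupes = df[df.duplicated(subset=["studentid", "questionid"], keep=False)]
print(dupes)

# The pivot fails for exactly these rows: two values compete for one cell.
try:
    df.pivot(index="studentid", columns="questionid")
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print(err)
```

With this data, the only rows flagged are the two (1234, bar) records, at index 0 and 13.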

If you go with the hierarchical index approach (they're not that complicated!), I'd try:

    In [104]: df2 = df.set_index(['studentid', 'questionid', 'daterecorded'])

    In [105]: df2
    Out[105]:
                                      answer
    studentid questionid daterecorded
    1234      bar        2012/01/21
              foo        2012/01/22        c
    4321      bop        2012/01/22
    5678      bar        2012/01/24
    8765      baz        2012/02/13        b
    4321      baz        2012/02/15        b
    8765      bop        2012/02/16        b
    5678      bop        2012/03/15        c
              foo        2012/04/01
    1234      baz        2012/04/11        b
    8765      bar        2012/05/03
    4321      bar        2012/05/04
    5678      baz        2012/06/01        c
    1234      bar        2012/11/01        b

    In [106]: df2.unstack('questionid')
    Out[106]:
                            answer
    questionid                bar  baz  bop  foo
    studentid daterecorded
    1234      2012/01/21           NaN  NaN  NaN
              2012/01/22      NaN  NaN  NaN    c
              2012/04/11      NaN    b  NaN  NaN
              2012/11/01        b  NaN  NaN  NaN
    4321      2012/01/22      NaN  NaN       NaN
              2012/02/15      NaN    b  NaN  NaN
              2012/05/04           NaN  NaN  NaN
    5678      2012/01/24           NaN  NaN  NaN
              2012/03/15      NaN  NaN    c  NaN
              2012/04/01      NaN  NaN  NaN
              2012/06/01      NaN    c  NaN  NaN
    8765      2012/02/13      NaN    b  NaN  NaN
              2012/02/16      NaN  NaN    b  NaN
              2012/05/03           NaN  NaN  NaN

Otherwise, you can come up with a rule to determine which of the multiple entries to take for the pivot table, and avoid the hierarchical index.
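As one example of such a rule (chosen here purely for illustration; nothing in the data dictates it), you could keep only the most recent record for each (studentid, questionid) pair and then call pivot as before. The sketch below uses a subset of the question's rows, again with blank answers stored as empty strings as an assumption, and relies on the dates being 'YYYY/MM/DD' strings so that sorting them as text is also chronological.

```python
import pandas as pd

# A subset of the question's rows; empty strings stand in for the blank
# answers (an assumption about the original values).
rows = [
    (1234, "bar", "", "2012/01/21"),
    (1234, "foo", "c", "2012/01/22"),
    (4321, "bop", "", "2012/01/22"),
    (4321, "baz", "b", "2012/02/15"),
    (1234, "baz", "b", "2012/04/11"),
    (4321, "bar", "", "2012/05/04"),
    (1234, "bar", "b", "2012/11/01"),
]
df = pd.DataFrame(
    rows, columns=["studentid", "questionid", "answer", "daterecorded"]
)

# Rule: for each (studentid, questionid) pair keep the latest record.
# Sorting first guarantees that keep="last" means "most recent".
latest = df.sort_values("daterecorded").drop_duplicates(
    subset=["studentid", "questionid"], keep="last"
)

# Every pair is now unique, so the pivot has one value per cell.
wide = latest.pivot(index="studentid", columns="questionid", values="answer")
print(wide)
```

Keeping the first record instead is a matter of passing keep="first". If the rule needs to combine the duplicates rather than pick one, pivot_table with an aggfunc is the usual alternative to pivot.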

