python 2.7 - ReshapeError while trying to pivot pandas dataframe
I am using pandas 0.11 on Python 2.7.3 and trying to pivot a simple dataframe with the following values:
    studentid questionid answer daterecorded
0        1234        bar      a   2012/01/21
1        1234        foo      c   2012/01/22
2        4321        bop      a   2012/01/22
3        5678        bar      a   2012/01/24
4        8765        baz      b   2012/02/13
5        4321        baz      b   2012/02/15
6        8765        bop      b   2012/02/16
7        5678        bop      c   2012/03/15
8        5678        foo      a   2012/04/01
9        1234        baz      b   2012/04/11
10       8765        bar      a   2012/05/03
11       4321        bar      a   2012/05/04
12       5678        baz      c   2012/06/01
13       1234        bar      b   2012/11/01
I am using the following command:
df.pivot(index='studentid', columns='questionid')
but I am getting the following error:
ReshapeError: Index contains duplicate entries, cannot reshape
Note that with the same dataframe without the last line
13 1234 bar b 2012/11/01
the pivot results in the following:
            answer                 daterecorded
questionid     bar  baz  bop  foo           bar         baz         bop         foo
studentid
1234             a    b  NaN    c    2012/01/21  2012/04/11         NaN  2012/01/22
4321             a    b    a  NaN    2012/05/04  2012/02/15  2012/01/22         NaN
5678             a    c    c    a    2012/01/24  2012/06/01  2012/03/15  2012/04/01
8765             a    b    b  NaN    2012/05/03  2012/02/13  2012/02/16         NaN
I am new to pivoting and would like to know why having a duplicate (studentid, questionid) pair causes this problem. And how can I fix it using the df.pivot() function?
Thank you.
What do you expect your pivot table to look like with the duplicate entries? I'm not sure it would make sense to have multiple elements for (1234, bar) in a pivot table. Your data looks like it's naturally indexed by (questionid, studentid, daterecorded).
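To see which rows collide, something like the following may help. It is only a sketch: the small frame is an illustrative stand-in modeled on the question's data, and it assumes a more recent pandas than the 0.11 in the question (the `keep=False` argument of `duplicated` and the `ValueError` raised by `pivot` are from newer releases).

```python
import pandas as pd

# Illustrative stand-in for the question's dataframe (not the full data).
df = pd.DataFrame(
    {
        "studentid": [1234, 1234, 4321, 5678, 1234],
        "questionid": ["bar", "foo", "bop", "bar", "bar"],
        "answer": ["a", "c", "a", "a", "b"],
        "daterecorded": [
            "2012/01/21", "2012/01/22", "2012/01/22", "2012/01/24", "2012/11/01",
        ],
    }
)

# keep=False flags every row whose (studentid, questionid) pair occurs more than once.
pair = ["studentid", "questionid"]
dupes = df[df.duplicated(subset=pair, keep=False)]
print(dupes)

# pivot needs exactly one row per (index, columns) cell, so it refuses to reshape.
# Recent pandas versions raise ValueError here; 0.11 reports ReshapeError.
pivot_failed = False
try:
    df.pivot(index="studentid", columns="questionid", values="answer")
except ValueError as exc:
    pivot_failed = True
    print("pivot failed:", exc)
```

Both (1234, bar) rows show up in `dupes`, and those two rows are competing for the same cell of the pivoted table.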
If you go with the hierarchical index approach (they're not that complicated!) I'd try:
In [104]: df2 = df.set_index(['studentid', 'questionid', 'daterecorded'])

In [105]: df2
Out[105]:
                                   answer
studentid questionid daterecorded
1234      bar        2012/01/21         a
          foo        2012/01/22         c
4321      bop        2012/01/22         a
5678      bar        2012/01/24         a
8765      baz        2012/02/13         b
4321      baz        2012/02/15         b
8765      bop        2012/02/16         b
5678      bop        2012/03/15         c
          foo        2012/04/01         a
1234      baz        2012/04/11         b
8765      bar        2012/05/03         a
4321      bar        2012/05/04         a
5678      baz        2012/06/01         c
1234      bar        2012/11/01         b

In [106]: df2.unstack('questionid')
Out[106]:
                       answer
questionid                bar  baz  bop  foo
studentid daterecorded
1234      2012/01/21        a  NaN  NaN  NaN
          2012/01/22      NaN  NaN  NaN    c
          2012/04/11      NaN    b  NaN  NaN
          2012/11/01        b  NaN  NaN  NaN
4321      2012/01/22      NaN  NaN    a  NaN
          2012/02/15      NaN    b  NaN  NaN
          2012/05/04        a  NaN  NaN  NaN
5678      2012/01/24        a  NaN  NaN  NaN
          2012/03/15      NaN  NaN    c  NaN
          2012/04/01      NaN  NaN  NaN    a
          2012/06/01      NaN    c  NaN  NaN
8765      2012/02/13      NaN    b  NaN  NaN
          2012/02/16      NaN  NaN    b  NaN
          2012/05/03        a  NaN  NaN  NaN
Otherwise, you can come up with a rule to determine which of the multiple entries to take for the pivot table, and avoid the hierarchical index.
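As one possible rule, you could keep only the most recently recorded answer for each (studentid, questionid) pair and then pivot. This is a sketch, not the only sensible choice: "latest wins" is my assumption, the small frame is an illustrative stand-in for the question's data, and `sort_values` requires a newer pandas than 0.11.

```python
import pandas as pd

# Illustrative stand-in for the question's dataframe (not the full data).
df = pd.DataFrame(
    {
        "studentid": [1234, 1234, 4321, 1234],
        "questionid": ["bar", "foo", "bar", "bar"],
        "answer": ["a", "c", "a", "b"],
        "daterecorded": ["2012/01/21", "2012/01/22", "2012/05/04", "2012/11/01"],
    }
)

# Assumed rule: the latest record wins.
# The dates are zero-padded YYYY/MM/DD strings, so sorting them as text is chronological.
latest = df.sort_values("daterecorded").drop_duplicates(
    subset=["studentid", "questionid"], keep="last"
)

# Each (studentid, questionid) pair is now unique, so pivot can reshape.
wide = latest.pivot(index="studentid", columns="questionid", values="answer")
print(wide)
```

Swapping `keep="last"` for `keep="first"` would keep the earliest answer instead. Either way the earlier or later attempt is discarded, which the hierarchical index approach avoids.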