Pandas merge and join not working -


i have problem merging 2 dataframes

i'm processing list of 10 dataframe pairs, created same sql database , csv files.

on pairs merge(df1, df2) working correctly df1.join(df2) not. example, thesea subsets 1 of pairs

>>>          mod:user studentid 2010453   3891583   2010453 2112086    890910   2112086 2222220    201611   2222220 2346979      7084   2346979 2414996   1817436   2414996 2420317     52821   2420317 2438767    884012   2438767 2451924  20815145   2451924 2515531   2115829   2515531 2536751    494565   2536751 2549050    315295   2549050 2549530         0   2549530 2551532    544968   2551532 2551542       213   2551542 2610206   1257038   2610206 2624429    939670   2624429 2630017         6   2630017 2633815    190564   2633815 2633857   1147211   2633857 2634405   1093092   2634405 2641370   2038012   2641370 2644284    658743   2644284 2649427    220230   2649427 2712372      9468   2712372 2714617   1231577   2714617 2718450   3907345   2718450 2732910         0   2732910 2739711    396876   2739711 8200703       166   8200703 9906492    920875   9906492 oscarl        505    oscarl >>> b          assignment:5 studentid 2010453            70   2010453 2112086            82   2112086 2222220            76   2222220 2346979           nan   2346979 2414996            88   2414996 2438767            50   2438767 2451924           100   2451924 2515531            50   2515531 2536751           100   2536751 2538371            94   2538371 2549050           100   2549050 2551532           100   2551532 2610206            50   2610206 2624429           100   2624429 2630017           nan   2630017 2634405           100   2634405 2641370           100   2641370 2644284           100   2644284 2712372           100   2712372 2714617            69   2714617 2718450           100   2718450 2739711           100   2739711 9906492           100   9906492 >>> pd.merge(a, b, left_on="studentid", right_on="studentid", how="inner")     mod:user studentid  assignment:5 0    3891583   2010453            70 1     890910   2112086            82 2     201611   2222220            76 3       7084   2346979           nan 4    1817436   2414996            88 5     884012   2438767            50 6   20815145   2451924           100 7    2115829   2515531            50 8     494565   2536751           100 9     315295   2549050           100 10    544968   2551532           100 11   1257038   2610206            50 12    939670   2624429           100 13         6   2630017           nan 14   1093092   2634405           100 15   2038012   2641370           100 16    658743   2644284           100 17      9468   2712372           100 18   1231577   2714617            69 19   3907345   2718450           100 20    396876   2739711           100 21    920875   9906492           100 >>> a.join(b, on="studentid", rsuffix="r", how="inner") empty dataframe columns: [mod:user, studentid, assignment:5, studentidr] index: [] >>>  

now, make things strange, on other pair of dataframes merge(df1, df2) not woking df1.join(df2) working.

>>>          mod:user  studentid 2115728   1177712    2115728 2341322    142805    2341322 2447383   1642046    2447383 2510156       141    2510156 2512053    570889    2512053 2527456  12262284    2527456 2529917  11826381    2529917 2533588    183665    2533588 2535922    107131    2535922 2535991    542259    2535991 2543095  11614678    2543095 2548984       225    2548984 2549565   2059072    2549565 2632847  25408938    2632847 2634371    129605    2634371 2714666    755975    2714666 8307654     74576    8307654 >>> b          assignment:5 studentid 2115728         86.67   2115728 2341322         86.67   2341322 2447383         80.00   2447383 2512053         93.33   2512053 2527456         93.33   2527456 2529917         86.67   2529917 2533588         86.67   2533588 2535922         86.67   2535922 2535991         86.67   2535991 2543095        100.00   2543095 2548984        100.00   2548984 2549565         86.67   2549565 2632847        100.00   2632847 2634371         73.33   2634371 2714666         80.00   2714666 8307654         86.67   8307654 >>> pd.merge(a, b, left_on="studentid", right_on="studentid", how="inner") empty dataframe columns: [mod:user, studentid, assignment:5] index: [] >>> a.join(b, on="studentid", rsuffix="r", how="inner")          mod:user  studentid  assignment:5 studentidr 2115728   1177712    2115728         86.67    2115728 2341322    142805    2341322         86.67    2341322 2447383   1642046    2447383         80.00    2447383 2512053    570889    2512053         93.33    2512053 2527456  12262284    2527456         93.33    2527456 2529917  11826381    2529917         86.67    2529917 2533588    183665    2533588         86.67    2533588 2535922    107131    2535922         86.67    2535922 2535991    542259    2535991         86.67    2535991 2543095  11614678    2543095        100.00    2543095 2548984       225    2548984        100.00    2548984 2549565   2059072    2549565         86.67    2549565 2632847  25408938    2632847        100.00    2632847 2634371    129605    2634371         73.33    2634371 2714666    755975    2714666         80.00    2714666 8307654     74576    8307654         86.67    8307654 >>>  

i don't have clue happening , function use

thanks! found issue..it automatic data conversion when reading data frame.

as can see, in first example put string 'oscarl' studentid , resulted in whole columnt being threated string while in second example there no records this, converted int.

i found running suggested solution gave me errror pointed me right direction

pd.concat([a, b], axis=1) traceback (most recent call last):   file "<stdin>", line 1, in <module> ... ... exception: ('cannot have duplicate column names split across dtypes', 'occurred @ index assignment:5') 

Comments

Popular posts from this blog

c++ - Creating new partition disk winapi -

Android Prevent Bluetooth Pairing Dialog -

VBA function to include CDATA -