python - Drop NaNs from a pandas DataFrame
I don't understand how NaNs are treated in pandas, and I'd be happy for an explanation, because the logic seems "broken" to me.
I have a CSV file which I'm loading using read_csv. The file has a "comments" column, which is empty most of the time.
I've isolated that column and tried various ways to drop the empty values. First, when I write:

    marked_results.comments

I get:
    0      vp
    1      vp
    2      vp
    3    test
    4     NaN
    5     NaN
    ...

and the rest of the column is NaN. So pandas loads the empty entries as NaNs. Great so far. Now I'm trying to drop those entries. I tried:
    marked_results.comments.dropna()

and received the same column; nothing was dropped. Confused, I tried to understand why nothing was dropped, so I tried:
    marked_results.comments == nan

and received a Series of Falses. So nothing is NaN... confusing. I got a little pissed off there and thought I'd be smarter. I did:
    In [71]: comments_values = marked_results.comments.unique()
             comments_values
    Out[71]: array(['vp', 'test', nan], dtype=object)

Ah, gotcha! So I tried:
    marked_results.comments == comments_values[2]

and, surprisingly, this still resulted in Falses! The only thing that worked was:
    marked_results.comments.isnull()

which returned the desired outcome. Can anyone explain what has happened here?
You should use isnull and notnull to test for NaN (these are more robust than numpy comparisons when working with pandas dtypes); see "Values considered missing" in the docs.
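The comparisons in the question fail because of IEEE 754 semantics: NaN compares unequal to everything, including itself, so an elementwise == can never find it. A minimal sketch (the column values are invented for illustration):

```python
import numpy as np
import pandas as pd

# NaN is not equal to anything, not even itself, so == always gives False.
print(np.nan == np.nan)  # False

s = pd.Series(['vp', 'vp', 'test', np.nan, np.nan])

# Elementwise equality against NaN: every entry is False, even the NaN rows.
print((s == np.nan).any())  # False

# isnull/notnull are the reliable tests for missing values.
print(s.isnull().tolist())      # [False, False, False, True, True]
print(s[s.notnull()].tolist())  # ['vp', 'vp', 'test']
```

This is also why `marked_results.comments == comments_values[2]` returned all Falses even though `comments_values[2]` really was the NaN from the column.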
As for dropping: calling the Series method dropna on a column won't affect the original DataFrame, so you probably want:
    In [11]: df
    Out[11]:
      comments
    0       vp
    1       vp
    2       vp
    3     test
    4      NaN
    5      NaN

    In [12]: df.comments.dropna()
    Out[12]:
    0      vp
    1      vp
    2      vp
    3    test
    Name: comments, dtype: object

The DataFrame dropna method has a subset argument (to drop rows which have NaNs in specific columns):
    In [13]: df.dropna(subset=['comments'])
    Out[13]:
      comments
    0       vp
    1       vp
    2       vp
    3     test

    In [14]: df = df.dropna(subset=['comments'])
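The session above can be reproduced as a standalone script; here the DataFrame is constructed by hand rather than read from the asker's CSV:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'comments': ['vp', 'vp', 'vp', 'test', np.nan, np.nan]})

# Series.dropna returns a new Series; df itself is unchanged.
dropped = df.comments.dropna()
print(len(df), len(dropped))  # 6 4

# DataFrame.dropna(subset=...) drops rows with NaN in the named columns;
# reassign the result (or pass inplace=True) to actually keep it.
df = df.dropna(subset=['comments'])
print(df.comments.tolist())  # ['vp', 'vp', 'vp', 'test']
```

Reassignment (or `inplace=True`) is the step the asker was missing: dropna never mutates its input by default.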