python - Drop NaNs from a pandas DataFrame -
I don't understand how NaNs are treated in pandas, and I'd be happy for an explanation, because the logic seems "broken" to me.

I have a CSV file that I'm loading with read_csv. The file has a "comments" column, which is empty a lot of the time.

I've isolated that column and tried various ways to drop the empty values. First, when I write:
marked_results.comments
I get:

0      vp
1      vp
2      vp
3    test
4     NaN
5     NaN
...

The rest of the column is NaN. So pandas loads the empty entries as NaNs. Great so far. Now I'm trying to drop those entries. I tried:
marked_results.comments.dropna()
and received the same column back; nothing was dropped. Confused, I tried to understand why nothing was dropped, so I tried:
marked_results.comments==nan
and received a Series of Falses. So nothing is NaN... confusing. I tried:
marked_results.comments==nan
and again, nothing but Falses. I got a little pissed at that point and thought I'd be smarter. I did:
In [71]: comments_values = marked_results.comments.unique()
         comments_values
Out[71]: array(['vp', 'test', nan], dtype=object)
Ah, gotcha! So I tried:
marked_results.comments==comments_values[2]
and surprisingly, that still results in Falses! The only thing that worked was:
marked_results.comments.isnull()
which returned the desired outcome. Can anyone explain what happened here?
You should use isnull
and notnull
to test for NaN (these are more robust when using pandas dtypes than numpy's), see the "values considered missing" section in the docs.
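The equality comparisons all returned False because NaN compares unequal to everything, including itself (IEEE 754 semantics) — which is also why comparing against the nan pulled out of unique() found nothing. A quick sketch with made-up data mirroring the question's column:

```python
import numpy as np
import pandas as pd

# Toy Series standing in for marked_results.comments (assumed data)
s = pd.Series(['vp', 'vp', 'vp', 'test', np.nan, np.nan], name='comments')

# NaN != NaN, so elementwise equality is all False:
print((s == np.nan).sum())         # 0 matches, every entry is False

# isnull()/notnull() are the reliable tests for missing values:
print(s.isnull().tolist())         # [False, False, False, False, True, True]
print(s[s.notnull()].tolist())     # ['vp', 'vp', 'vp', 'test']
```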
Using the Series method dropna
on a column won't affect the original DataFrame, but it does what you want:
In [11]: df
Out[11]:
  comments
0       vp
1       vp
2       vp
3     test
4      NaN
5      NaN

In [12]: df.comments.dropna()
Out[12]:
0      vp
1      vp
2      vp
3    test
Name: comments, dtype: object
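This is likely why "nothing was dropped" in the question: dropna returns a new object and leaves the original untouched, so you have to assign the result back if you want to keep it. A minimal sketch, assuming the same toy data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'comments': ['vp', 'vp', 'vp', 'test', np.nan, np.nan]})

cleaned = df.comments.dropna()   # new Series; df itself is unchanged
print(len(df))                   # still 6 rows
print(cleaned.tolist())          # ['vp', 'vp', 'vp', 'test']

# Assign back to persist the change on the DataFrame:
df = df.dropna(subset=['comments'])
print(len(df))                   # now 4 rows
```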
The dropna
DataFrame method has a subset argument (to drop rows which have NaNs in specific columns):
In [13]: df.dropna(subset=['comments'])
Out[13]:
  comments
0       vp
1       vp
2       vp
3     test

In [14]: df = df.dropna(subset=['comments'])
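When subset names several columns, it combines with the how argument: the default how='any' drops a row if any listed column is NaN, while how='all' drops it only when all of them are. A sketch with a hypothetical second column 'rating':

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'comments': ['vp', np.nan, 'test', np.nan],
                   'rating':   [1.0,  2.0,    np.nan, np.nan]})

# how='any' (default): drop rows where either column is NaN -> 1 row left
any_missing = df.dropna(subset=['comments', 'rating'])

# how='all': drop rows only where both columns are NaN -> 3 rows left
all_missing = df.dropna(subset=['comments', 'rating'], how='all')

print(len(any_missing), len(all_missing))
```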