python - Drop NaNs from a pandas DataFrame


I don't understand how NaNs are being treated in pandas and would be happy to get an explanation, because the logic seems "broken" to me.

I have a CSV file, which I'm loading using read_csv. It has a "comments" column, which is empty most of the time.

I've isolated that column and tried various ways to drop the empty values. First, when I write:

marked_results.comments 

I get:

0       vp
1       vp
2       vp
3     test
4      NaN
5      NaN
....

The rest of the column is NaN. So pandas is loading the empty entries as NaNs. Great so far. Now I'm trying to drop those entries. I've tried:

marked_results.comments.dropna() 

and received the same column. Nothing was dropped. Confused, I tried to understand why nothing was dropped, so I tried:

marked_results.comments==NaN

and received a series of all Falses. Nothing was NaN... confusing. Then I tried:

marked_results.comments==nan 

and again, nothing but Falses. I got a little pissed there, and thought I'd be smarter. So I did:

In [71]: comments_values = marked_results.comments.unique()
comments_values

Out[71]: array(['vp', 'test', nan], dtype=object)

Ah, gotcha! So now I've tried:

marked_results.comments==comments_values[2] 

and surprisingly, the results were still all Falses! The only thing that worked was:

marked_results.comments.isnull() 

which returned the desired outcome. Can someone explain what has happened here?

You should use isnull and notnull to test for NaN (these are more robust, using pandas dtypes, than numpy); see "values considered missing" in the docs.
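
As a rough illustration of why the equality tests in the question return only Falses: NaN is defined to compare unequal to everything, including itself, so `== nan` can never match, while isnull and notnull check for missingness directly. The `comments` Series below is a made-up stand-in for the column in the question, not the asker's actual data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the "comments" column from the question.
comments = pd.Series(["vp", "vp", "vp", "test", np.nan, np.nan], name="comments")

# NaN never equals anything, not even itself.
nan_equals_itself = np.nan == np.nan
print(nan_equals_itself)        # False

# So an elementwise equality test cannot find the missing entries.
eq_mask = comments == np.nan
print(eq_mask.any())            # False

# isnull / notnull are the supported tests for missing values.
null_mask = comments.isnull()
print(null_mask.tolist())       # [False, False, False, False, True, True]

present = comments[comments.notnull()]
print(present.tolist())         # ['vp', 'vp', 'vp', 'test']
```

The same masks can be used for filtering, e.g. `comments[~null_mask]` gives the non-missing entries.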

Using the Series method dropna on a column won't affect the original DataFrame, but it does do what you want:

In [11]: df
Out[11]:
  comments
0       vp
1       vp
2       vp
3     test
4      NaN
5      NaN

In [12]: df.comments.dropna()
Out[12]:
0      vp
1      vp
2      vp
3    test
Name: comments, dtype: object
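
To make the "won't affect the original" point concrete, here is a small sketch (the frame is rebuilt by hand for illustration): dropna returns a new Series, and the DataFrame keeps all of its rows unless the result is assigned somewhere.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the example frame, for illustration only.
df = pd.DataFrame({"comments": ["vp", "vp", "vp", "test", np.nan, np.nan]})

# dropna returns a new Series; df itself is left alone.
cleaned = df.comments.dropna()

print(len(cleaned))                  # 4
print(len(df))                       # 6
print(df.comments.isnull().sum())    # 2
```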

The DataFrame method dropna has a subset argument (to drop rows which have NaNs in specific columns):

In [13]: df.dropna(subset=['comments'])
Out[13]:
  comments
0       vp
1       vp
2       vp
3     test

In [14]: df = df.dropna(subset=['comments'])
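
With more than one column the subset argument becomes more useful. The sketch below assumes a hypothetical extra "score" column that is not in the original data, and shows how subset interacts with the `how` parameter.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "score" is an invented second column.
df = pd.DataFrame({
    "comments": ["vp", np.nan, "test", np.nan],
    "score": [1.0, 2.0, np.nan, np.nan],
})

# Drop rows where "comments" is missing; "score" is ignored.
by_comments = df.dropna(subset=["comments"])
print(by_comments.index.tolist())   # [0, 2]

# Default how='any': drop rows missing either listed column.
by_either = df.dropna(subset=["comments", "score"])
print(by_either.index.tolist())     # [0]

# how='all': drop a row only if every listed column is missing.
by_both = df.dropna(subset=["comments", "score"], how="all")
print(by_both.index.tolist())       # [0, 1, 2]
```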
