R - Delete rows based on the values in the columns and a threshold value
I have a table; the start of it is shown below:
                    sm_h1455 sm_h1456 sm_h1457 sm_h1461 sm_h1462 sm_h1463
ensg00000001617.7          0        0        0        0        0        0
ensg00000001626.9          0        0        0        0        0        0
ensg00000002587.5         10        0        6        2        0        2
ensg00000002726.15         8       14        0        2       16        2
ensg00000002745.8          6        2        2        0        0        4
I want to delete the rows in which >= 80% of the columns have the value 0. There are 6 columns here, so if 5 or more of the columns in a row are 0, that row needs to be deleted.
I have this code:
data = data[!rowSums(data == 0), ]
but this code deletes a row as long as it has any 0 in it, without taking the 80% threshold into account.
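A minimal sketch of why that line behaves this way, using a small made-up data frame rather than the real table (the names `toy`, `zeros_per_row` and `keep` are hypothetical, chosen only for illustration). Negating a count with `!` turns 0 into TRUE and any non-zero count into FALSE, so only rows with no zeros at all survive:

```r
# Hypothetical 3-row example, not the full table from the question.
toy <- data.frame(a = c(0, 1, 3), b = c(0, 0, 4), c = c(0, 5, 6))

zeros_per_row <- rowSums(toy == 0)  # zeros in each row: 3, 1, 0
keep <- !zeros_per_row              # FALSE, FALSE, TRUE (only a count of 0 becomes TRUE)
kept <- toy[keep, ]                 # only the third row remains
```

The second row has a single zero out of three columns, well under 80%, yet it is dropped along with the all-zero row.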
I think @Hong Ooi's answer is incorrect in this case. This should give the result you have asked for:
data <- data[rowSums(data==0)/ncol(data) < 0.8, ]
data==0 returns a logical matrix with the same dimensions as data, filled with TRUE if the value at that location is equal to zero, otherwise FALSE. Numerically, R treats TRUE as having a value of 1 and FALSE as having a value of zero.

rowSums adds up the numerical equivalents of the TRUE and FALSE values for each row of the matrix returned by data==0. So rowSums(data==0) gives the number of elements in each row of data that are zero.

ncol gives the number of columns in the original data object. rowSums(data==0)/ncol(data) is therefore the proportion of elements equal to 0 in each row.

Finally, we discard the rows where that proportion is not less than 80% by filtering (using [] notation).
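As a rough walk-through of those steps, here is a sketch that rebuilds the five rows shown in the question by hand. The intermediate names `zero_count`, `zero_prop` and `filtered` are assumptions made for readability and are not part of the original one-liner:

```r
# The five sample rows from the question, rebuilt by hand for illustration.
data <- data.frame(
  sm_h1455 = c(0, 0, 10, 8, 6),
  sm_h1456 = c(0, 0, 0, 14, 2),
  sm_h1457 = c(0, 0, 6, 0, 2),
  sm_h1461 = c(0, 0, 2, 2, 0),
  sm_h1462 = c(0, 0, 0, 16, 0),
  sm_h1463 = c(0, 0, 2, 2, 4),
  row.names = c("ensg00000001617.7", "ensg00000001626.9",
                "ensg00000002587.5", "ensg00000002726.15",
                "ensg00000002745.8")
)

zero_count <- rowSums(data == 0)       # zeros per row: 6, 6, 2, 1, 2
zero_prop  <- zero_count / ncol(data)  # proportion of zeros in each row
filtered   <- data[zero_prop < 0.8, ]  # keep rows with fewer than 80% zeros
```

With six columns, only the two all-zero rows reach the 80% threshold, so `filtered` keeps the last three rows.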
Update: @Hong Ooi's edit means that answer is correct now.