Java - Hadoop Streaming Memory Usage
I'm wondering where the memory is used in the following job:
- Hadoop mapper/reducer heap size: -Xmx2G
- Streaming API:
  - Mapper: /bin/cat
  - Reducer: wc
- Input file: a 350MByte file containing a single line full of 'a's
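
As a minimal sketch, a job like this could be reproduced from the command line roughly as follows (file names, HDFS paths, and the location of the streaming jar are illustrative assumptions, not taken from the original post):

    # Generate a ~350MB file consisting of a single line of 'a's
    # followed by one newline (sizes and names are illustrative).
    dd if=/dev/zero bs=1M count=350 | tr '\0' 'a' > single_line.txt
    printf '\n' >> single_line.txt
    hadoop fs -put single_line.txt /user/me/

    # Run the streaming job with /bin/cat as mapper and wc as reducer;
    # the streaming jar path varies by installation.
    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming.jar \
        -input /user/me/single_line.txt \
        -output /user/me/wc_out \
        -mapper /bin/cat \
        -reducer wc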
This is a simplified version of the real problem we've encountered.
Reading the file from HDFS and constructing a Text object should not amount to more than 700MB of heap, assuming that Text uses 16 bits per character. I'm not sure about that, though; I could imagine that Text only uses 8 bits per character.
So there is this (worst-case) 700MB line. The line should fit at least twice into the heap, yet I'm getting out-of-memory errors.
Is this possibly a bug in Hadoop (e.g. unnecessary copies), or am I overlooking some required memory-intensive steps?
I would be thankful for any further hints.
The memory given to each child JVM running a task can be changed by setting the mapred.child.java.opts property. The default setting is -Xmx200m, which gives each task 200 MB of memory.
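
As a hedged sketch, the property can also be raised for a single streaming job through the generic -D option (the 2048m value is only an example; generic options must precede the streaming-specific ones):

    # Give each child JVM a 2 GB heap for this job only.
    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming.jar \
        -D mapred.child.java.opts=-Xmx2048m \
        -input /user/me/single_line.txt \
        -output /user/me/wc_out \
        -mapper /bin/cat \
        -reducer wc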
When you say:

    Input file: a 350MByte file containing a single line full of 'a's
I'm assuming the file has a single line of a's followed by a single endline delimiter.
If that is taken as the value in the map(key, value) function, then I think you might run into memory issues, since the task can only use 200 MB and you have a record in memory that is 350 MB.
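
Note that /bin/cat and wc themselves process their input as a byte stream; it is the Java side of the task that materializes the whole line as one record before handing it to the mapper process. A quick local check (outside Hadoop, using the illustrative file name from above) shows the pipeline itself runs in constant memory:

    # Runs fine locally even on a single 350MB line,
    # because both commands stream their input.
    /bin/cat single_line.txt | wc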