Java - Hadoop Streaming Memory Usage
I'm wondering where the memory is used in the following job:
- Hadoop mapper/reducer heap size: -Xmx2G
- Streaming API:
  - Mapper: /bin/cat
  - Reducer: wc
- Input file: a 350MByte file containing a single line full of 'a's
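
As a minimal sketch, a job like this could be reproduced from the command line roughly as follows (file names, HDFS paths, and the location of the streaming jar are illustrative assumptions, not taken from the original post):

    # Generate a ~350MB file consisting of a single line of 'a's
    # followed by one newline (sizes and names are illustrative).
    dd if=/dev/zero bs=1M count=350 | tr '\0' 'a' > single_line.txt
    printf '\n' >> single_line.txt
    hadoop fs -put single_line.txt /user/me/

    # Run the streaming job with /bin/cat as mapper and wc as reducer;
    # the streaming jar path varies by installation.
    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming.jar \
        -input /user/me/single_line.txt \
        -output /user/me/wc_out \
        -mapper /bin/cat \
        -reducer wc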
This is a simplified version of the real problem we've encountered.
Reading the file from HDFS and constructing a Text object should not amount to more than 700MB of heap, assuming that Text uses 16 bits per character. I'm not sure about that, though; I could imagine that Text only uses 8 bits per character.
So there is this (worst-case) 700MB line. The line should fit at least twice into the heap, yet I'm getting out-of-memory errors.
Is this possibly a bug in Hadoop (e.g. unnecessary copies), or am I overlooking some required memory-intensive steps?
I would be thankful for any further hints.
The memory given to each child JVM running a task can be changed by setting the mapred.child.java.opts property. The default setting is -Xmx200m, which gives each task 200 MB of memory.
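
As a hedged sketch, the property can also be raised for a single streaming job through the generic -D option (the 2048m value is only an example; generic options must precede the streaming-specific ones):

    # Give each child JVM a 2 GB heap for this job only.
    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming.jar \
        -D mapred.child.java.opts=-Xmx2048m \
        -input /user/me/single_line.txt \
        -output /user/me/wc_out \
        -mapper /bin/cat \
        -reducer wc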
When you say:

    Input file: a 350MByte file containing a single line full of 'a's
I'm assuming the file has a single line of a's followed by a single endline delimiter.
If that is taken as the value in the map(key, value) function, then I think you might run into memory issues, since the task can only use 200 MB and you have a record in memory that is 350 MB.
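
Note that /bin/cat and wc themselves process their input as a byte stream; it is the Java side of the task that materializes the whole line as one record before handing it to the mapper process. A quick local check (outside Hadoop, using the illustrative file name from above) shows the pipeline itself runs in constant memory:

    # Runs fine locally even on a single 350MB line,
    # because both commands stream their input.
    /bin/cat single_line.txt | wc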