java - Hadoop Streaming Memory Usage


I'm wondering about the memory used by the following job:

  • Hadoop mapper/reducer heap size: -Xmx2G
  • Streaming API:

    • mapper: /bin/cat
    • reducer: wc
  • Input file: a 350 MByte file containing a single line full of 'a's.

This is a simplified version of a real problem we've encountered.

Reading the file from HDFS and constructing the Text object should not amount to more than 700 MB of heap, assuming Text uses 16 bits per character. I'm not sure about that; I could also imagine Text uses 8 bits per character.
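
To make the estimate concrete, here is a minimal back-of-the-envelope sketch. It assumes Hadoop's Text stores the line as UTF-8 bytes (1 byte per ASCII 'a') while java.lang.String stores 16-bit chars; the class name and the printed numbers are illustrative only:

    import org.apache.hadoop.io.Text;

    public class TextSizeEstimate {
        public static void main(String[] args) {
            long lineChars = 350L * 1024 * 1024; // the 350 MByte line of 'a's

            // Text keeps UTF-8 bytes: 1 byte per ASCII character.
            long textMb = lineChars / (1024 * 1024);         // ~350 MB
            // String keeps 16-bit chars: 2 bytes per character.
            long stringMb = 2 * lineChars / (1024 * 1024);   // ~700 MB

            System.out.println("Text  (UTF-8):  ~" + textMb + " MB");
            System.out.println("String (UTF-16): ~" + stringMb + " MB");

            // Sanity check on a tiny input: Text.getLength() is the byte count.
            System.out.println(new Text("aaaa").getLength()); // prints 4
        }
    }

Under that assumption the Text object itself needs roughly 350 MB, and 700 MB is only reached if the line is additionally converted to a String.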

So there is this (worst-case) 700 MB line. That line should fit at least twice into the heap, yet I'm getting out-of-memory errors.

Is this possibly a bug in Hadoop (e.g. unnecessary copies), or do I not understand some required memory-intensive steps?

I would be thankful for any further hints.

The memory given to each child JVM running a task can be changed by setting the mapred.child.java.opts property. The default setting is -Xmx200m, which gives each task 200 MB of memory.
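
As a minimal sketch of that (using the old mapred API to match the property name; the class name is illustrative):

    import org.apache.hadoop.mapred.JobConf;

    public class ChildHeapConfig {
        public static void main(String[] args) {
            JobConf conf = new JobConf(ChildHeapConfig.class);
            // The default is -Xmx200m; raise the child heap so a large record fits.
            conf.set("mapred.child.java.opts", "-Xmx2048m");
            System.out.println(conf.get("mapred.child.java.opts"));
        }
    }

For a streaming job, the same property can typically be passed on the command line instead, e.g. -D mapred.child.java.opts=-Xmx2048m (older releases use -jobconf).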

When you say:

Input file: a 350 MByte file containing a single line full of 'a's.

I'm assuming the file has a single line of 'a's with a single end-of-line delimiter.

If that line is taken as the value in the map(key, value) function, then I think you might have memory issues, since each task can only use 200 MB while it has to hold a record of 350 MB in memory.
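
To illustrate why the whole record sits in memory, here is a minimal sketch with the old mapred API (the class is hypothetical; it mirrors what the /bin/cat streaming mapper does). With TextInputFormat, each call to map() receives one complete line as its value:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    // Pass-through mapper: the Java equivalent of the /bin/cat mapper.
    public class IdentityLineMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, LongWritable, Text> {
        public void map(LongWritable offset, Text line,
                        OutputCollector<LongWritable, Text> out, Reporter reporter)
                throws IOException {
            // 'line' holds the entire record. With a single 350 MB line, this
            // one Text object alone is bigger than the 200 MB default heap.
            out.collect(offset, line);
        }
    }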

