hadoop - Does the combiner run conditionally?
What does min.num.spills.for.combine (default 3) signify?
a) The minimum number of map spills required for the combiner to run? So even though I have specified a combiner, it is not guaranteed to run?
b) The minimum number of spills required before the combiner runs on the single merged/sorted file created via io.sort.factor. So each time a new file is created by merging, the combiner runs on it, provided the number of spills is at least 3.
I feel the correct answer is a). Can you confirm that?
When the map function generates intermediate results, they are first sent to a buffer. Partitioning and sorting then start, and if a combiner is specified, it is invoked at this time. This process runs in parallel with the map function. When the map function finishes, the spills on disk are merged, and the combiner is invoked at this time too.
The buffer threshold is set by io.sort.spill.percent; a spill file is created each time the buffer fills past it. If the number of spills is at least min.num.spills.for.combine, the combiner is run again while the spills are merged, before the final output is written to disk.
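The rule above can be sketched as a small model. This is only an illustration, not Hadoop's actual code: the class and method names are hypothetical, and it assumes the threshold is inclusive (at least min.num.spills.for.combine spills). The sortMb parameter stands in for the buffer size setting io.sort.mb.

```java
// Simplified model of the map-side combiner decision.
// Hypothetical names; this is NOT Hadoop's implementation.
final class SpillCombineSketch {

    // Bytes of buffered map output at which a spill to disk starts.
    // sortMb models io.sort.mb, spillPercent models io.sort.spill.percent.
    static long spillThresholdBytes(int sortMb, double spillPercent) {
        return (long) (sortMb * 1024L * 1024L * spillPercent);
    }

    // Every individual spill is combined whenever a combiner is configured.
    static boolean combinerRunsOnSpill(boolean combinerConfigured) {
        return combinerConfigured;
    }

    // The final merge of spill files is combined only when enough spills exist.
    // Assumption: the comparison is inclusive (numSpills >= minSpillsForCombine).
    static boolean combinerRunsOnFinalMerge(boolean combinerConfigured,
                                            int numSpills,
                                            int minSpillsForCombine) {
        return combinerConfigured && numSpills >= minSpillsForCombine;
    }

    public static void main(String[] args) {
        int minSpillsForCombine = 3; // models the default of min.num.spills.for.combine
        for (int numSpills = 1; numSpills <= 4; numSpills++) {
            System.out.println(numSpills + " spill(s): combine on final merge = "
                    + combinerRunsOnFinalMerge(true, numSpills, minSpillsForCombine));
        }
        System.out.println("spill threshold = "
                + spillThresholdBytes(100, 0.80) + " bytes");
    }
}
```

In this model, a map task that produces only one or two spills still has each spill combined, but the merge step skips the combiner, which is why specifying a combiner does not guarantee it runs at every stage.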
So, to answer your question: the right choice is a).
Ref: this mail thread.