mapreduce - XML data via API to land in Hadoop


We are receiving huge amounts of XML data via an API. In order to handle this large data set, we are planning to process it in Hadoop.

I need help understanding how to efficiently bring the data into Hadoop. What tools are available? Is there a possibility of bringing in this data in real time?

Please provide your inputs.

Thanks for your help.

Since you are receiving huge amounts of data, the appropriate way, IMHO, would be to use an aggregation tool like Flume. Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of data into your Hadoop cluster from different types of sources.

You can write custom sources based on your needs to collect the data. You might find this link helpful to get started. It presents a custom Flume source designed to connect to the Twitter Streaming API and ingest tweets in raw JSON format into HDFS. You could try something similar for your XML data.
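
To give a rough idea of what such a source has to do with XML, here is a minimal sketch of the record-splitting step only. A real Flume source would be written in Java against Flume's source API; this is plain Python using just the standard library, and the function name `iter_xml_events`, the `item` tag, and the sample payload are all made up for illustration. It assumes the API response is one large XML document containing many repeated record elements, and that each record should become one event.

```python
import io
import xml.etree.ElementTree as ET


def iter_xml_events(stream, record_tag):
    """Yield each <record_tag> element found in stream as UTF-8 bytes.

    Hypothetical helper: one yielded value stands for one event body
    that a collector would hand on towards HDFS.
    """
    # iterparse reads the stream incrementally, so the whole response
    # does not have to be parsed before the first record is emitted.
    for _, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == record_tag:
            elem.tail = None  # drop any whitespace that followed the element
            yield ET.tostring(elem, encoding="unicode").encode("utf-8")
            elem.clear()  # release the children of the record just emitted


# Made-up sample payload standing in for an API response.
sample = b"<feed><item id='1'><v>a</v></item><item id='2'><v>b</v></item></feed>"
events = list(iter_xml_events(io.BytesIO(sample), "item"))
```

Each entry in `events` is then a small, self-contained XML fragment. Splitting on a record element like this is only one possible choice; if your records are large or deeply nested, you may prefer to ship each API response as a single event instead.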

You might also want to have a look at Apache Chukwa, which does the same thing.

HTH

