- How do you configure a MapReduce job so that a single map task processes each input file regardless of how many blocks the input file occupies?
- Write a custom FileInputFormat and override its isSplitable() method to always return false.
- The isSplitable() method in your InputFormat is passed each filename; if it returns true, the file can be broken up and processed by multiple Mappers. If it returns false, the file is considered 'not splittable', meaning the entire file must be processed by a single Mapper. A sketch is shown below.
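Here is a minimal sketch of such an InputFormat, assuming the new (org.apache.hadoop.mapreduce) API and plain text input; the class name WholeFileTextInputFormat is illustrative:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Illustrative subclass: inherits TextInputFormat's record reading
// but refuses to split any input file.
public class WholeFileTextInputFormat extends TextInputFormat {
    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        // Never split: each input file becomes exactly one input split,
        // so a single map task processes the whole file no matter how
        // many HDFS blocks it spans.
        return false;
    }
}

In the driver, wire it in with job.setInputFormatClass(WholeFileTextInputFormat.class);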
- To make sure jar files other than the one containing the Driver class get distributed to all nodes in the cluster, the hadoop command should be:
% hadoop jar job.jar MyDriver -libjars ex1.jar,ex2.jar
- Remember that <hadoop jar job.jar MyDriver> is the main part of the command; -libjars takes a comma-separated jar list and is one of the generic options parsed by GenericOptionsParser, which only runs if the driver goes through ToolRunner (see the sketch below).
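A minimal driver sketch that supports -libjars, assuming the new API; the class name MyDriver, job name, and argument positions are illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// ToolRunner invokes GenericOptionsParser before run() is called,
// which is what actually interprets -libjars and ships the extra
// jars to the cluster.
public class MyDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), "my job");
        job.setJarByClass(MyDriver.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new MyDriver(), args));
    }
}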