In addition to the output part files, the job's output directory will contain a _SUCCESS file as well as a _logs directory that stores information about the job. To read the result of the job, cat the part file from the remote file system and pipe it to less:

    hostname $ hadoop fs -cat wordcounts/part-00000 | less

If something goes wrong in your MapReduce job, you'll need to be able to stop it (consider if you've accidentally added an infinite loop or some memory-intensive process to your mapper or reducer!). However, typing Ctrl+C (issuing a keyboard interrupt on Unix) will only kill the process displaying the progress; it won't actually stop the job. The hadoop job command allows you to manage currently running jobs on the cluster. List all running jobs with the -list command:

    hostname $ hadoop job -list

Use the output to identify the job ID of the job you'd like to terminate, then kill the job by issuing the -kill command:

    hostname $ hadoop job -kill $JOBID

Similar to the NameNode web interface, the ResourceManager also exposes a web interface to view the status of jobs and their log files. The ResourceManager web UI can be accessed via port 8088 of the machine hosting the ResourceManager service. This web UI displays all currently running jobs as well as the status of the NodeManagers across the cluster. The ResourceManager does not keep a historical record of jobs, however; instead, use the Job History server, which can be accessed on port 19888 of the machine hosting the Job History service (a command-line sketch of querying both services appears at the end of this chapter).

Conclusion

This chapter has presented a lot of detail about the architecture of a Hadoop cluster and briefly touched on many points about the requirements and implementation of a large-scale distributed computation system. However, we do not claim to have covered everything, simply enough to contextualize the concepts in this book. Additionally, by going over the conceptual details of MapReduce in the manner we did, we hope to have presented the foundation of algorithmic development that we will leverage later in the book, though we did not give an in-depth treatment of how to write more complex analytical jobs. In the next chapter, we will take a specific look at how to write MapReduce jobs in Python using Hadoop Streaming.
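Although the web UIs described above are meant for a browser, both services can also be queried from the command line. The following is a minimal sketch, assuming a ResourceManager reachable at resourcemanager.host and a Job History server at jobhistory.host (both hostnames are placeholders for your own cluster); the /ws/v1/... paths are the standard YARN and MapReduce Job History REST endpoints:

    # List applications tracked by the ResourceManager (port 8088)
    hostname $ curl -s http://resourcemanager.host:8088/ws/v1/cluster/apps

    # List completed MapReduce jobs from the Job History server (port 19888)
    hostname $ curl -s http://jobhistory.host:19888/ws/v1/history/mapreduce/jobs

Because the responses are JSON, this is a convenient way to script cluster health checks or collect job statistics without opening a browser.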
