10.5.15

Summary of Common Hadoop Commands

A summary of commands collected from Hadoop: The Definitive Guide, 3rd Edition:

1. Running Hadoop with HADOOP_CLASSPATH


Chapter 2. MapReduce / Analyzing the Data with Hadoop / Java MapReduce / A test run, p. 25

% export HADOOP_CLASSPATH=hadoop-examples.jar
% hadoop MaxTemperature input/ncdc/sample.txt output

To run this command successfully, besides compiling the program, you also need to set up a standalone Hadoop environment. Standalone mode is defined as:

Standalone (or local) mode

There are no daemons running and everything runs in a single JVM. Standalone mode is suitable for running MapReduce programs during development, since it is easy to test and debug them.

Also:
In standalone mode, there is no further action to take, since the default properties are set for standalone mode and there are no daemons to run.

Given this definition, running it in Docker is the most convenient option:

1. Start a Docker container with Java 7
2. Download Hadoop
3. Set HADOOP_HOME and add the hadoop binary to the PATH

Then you are ready to go; a sketch of these steps follows.
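
A minimal version, assuming a Debian-based Java 7 image and Hadoop 2.x (the image name, Hadoop version, and install path are my own choices, not from the book):

% docker run -it --name hadoop-standalone openjdk:7-jdk bash

Then, inside the container:

% curl -LO https://archive.apache.org/dist/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
% tar xzf hadoop-2.6.0.tar.gz -C /opt
% export HADOOP_HOME=/opt/hadoop-2.6.0
% export PATH=$PATH:$HADOOP_HOME/bin
% hadoop version

If hadoop version prints its banner, the standalone environment is ready and the MaxTemperature command above should run as-is.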

2. Checking the Output


% cat output/part-r-00000
1949 111
1950 22

3. HDFS Commands

hadoop fs is the command for working with the HDFS file system, and a standalone Hadoop installation is all you need to use it; in other words, a standalone Hadoop can act as a Hadoop client.

Without any configuration, you can access HDFS by using a full URI, i.e.

hdfs://<hostname>/....
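
For example (the hostname and path here are placeholders, not from the book):

% hadoop fs -ls hdfs://namenode.example.com/user/tom
% hadoop fs -cat hdfs://namenode.example.com/user/tom/quangle.txt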

But you can also set a default file system. Add the following to $HADOOP_HOME/etc/hadoop/core-site.xml:

  <property>
    <name>fs.default.name</name>
    <value>hdfs://<namenode-host>:8020</value>
    <description>NameNode host</description>
  </property>

This setting can simply be read as: the file system's default namenode URL.

You can also keep the settings in a separate configuration file and specify it at run time with -conf <config file>.
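
As a sketch, such a file could look like the book's conf/hadoop-localhost.xml (used in the last command of this post); the exact contents below are an assumption, showing only the file system property:

<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost/</value>
  </property>
</configuration>

It is then picked up like this:

% hadoop fs -conf conf/hadoop-localhost.xml -ls .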

3.1 Creating directories:
-mkdir

The problem you will most likely hit when creating directories is permission denied. The reason is HDFS's somewhat odd user policy: the local username is mapped directly to an HDFS username, and the superuser is typically hdfs, so you can do the following:

# useradd hdfs
# sudo -u hdfs hadoop fs -mkdir hdfs://<host>/blahblah...
# sudo -u hdfs hadoop fs -chown <yourname> hdfs://<host>/blahblah...

-put
-ls
-copyFromLocal
-copyToLocal
-text
-cat
-rm
-rmdir
-du
-df
-cp
-get
-moveFromLocal
-moveToLocal
-mv
-tail
-touchz


% hadoop fs -put max_temperature bin/max_temperature
% hadoop fs -put input/ncdc/sample.txt sample.txt
% hadoop pipes \
% hadoop fsck / -files -blocks
% hadoop fs -copyFromLocal input/docs/quangle.txt hdfs://localhost/user/tom/
% hadoop fs -copyFromLocal input/docs/quangle.txt /user/tom/quangle.txt
% hadoop fs -copyFromLocal input/docs/quangle.txt quangle.txt
% hadoop fs -copyToLocal quangle.txt quangle.copy.txt
% md5 input/docs/quangle.txt quangle.copy.txt

% hadoop fs -mkdir books
% hadoop fs -ls .
% hadoop fs -ls file:///

% hadoop URLCat hdfs://localhost/user/tom/quangle.txt
% hadoop FileSystemCat hdfs://localhost/user/tom/quangle.txt
% hadoop FileSystemDoubleCat hdfs://localhost/user/tom/quangle.txt
% hadoop FileCopyWithProgress input/docs/1400-8.txt hdfs://localhost/user/tom/
% hadoop ListStatus hdfs://localhost/ hdfs://localhost/user/tom
% hadoop distcp hdfs://namenode1/foo hdfs://namenode2/bar
% hadoop distcp -update hdfs://namenode1/foo hdfs://namenode2/bar/foo
% hadoop distcp hftp://namenode1:50070/foo hdfs://namenode2/bar
% hadoop distcp webhdfs://namenode1:50070/foo webhdfs://namenode2:50070/bar
% hadoop fs -lsr /my/files
% hadoop archive -archiveName files.har /my/files /my
% hadoop fs -ls /my
% hadoop fs -ls /my/files.har
% hadoop fs -lsr har:///my/files.har
% hadoop fs -lsr har:///my/files.har/my/files/dir
% hadoop fs -lsr har://hdfs-localhost:8020/my/files.har/my/files/dir
% hadoop fs -rmr /my/files.har
% echo "Text" | hadoop StreamCompressor org.apache.hadoop.io.compress.GzipCodec \
% hadoop FileDecompressor file.gz
% hadoop MaxTemperatureWithCompression input/ncdc/sample.txt.gz output

% hadoop fs -text numbers.seq | head
% hadoop fs -text sorted/part-00000 | head
% hadoop fs -text numbers.map/data | head
% hadoop fs -text numbers.map/index
% hadoop fs -mv numbers.map/part-00000 numbers.map/data
% hadoop fs -conf conf/hadoop-localhost.xml -ls .



