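Anyone who has set up Hadoop knows how tedious the process is: there are many environment dependencies to configure and many configuration files to edit, so building a usable test environment takes a lot of time. Docker helps with exactly this kind of problem. With Docker we can quickly bring up a usable Hadoop cluster for testing.
This article builds on a Dockerfile published on GitHub, with a few small changes that improve the experience for users in China. GitHub address
Clone the GitHub repository and enter the repository directory:
The following content is taken from README.md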
Apache Hadoop 2.7.1 Docker image
Note: this is the master branch - for a particular Hadoop version always check the related branch
A few weeks ago we released an Apache Hadoop 2.3 Docker image - this quickly became the most popular Hadoop image in the Docker registry.
Following the success of our previous Hadoop Docker images, the feedback and feature requests we received, we aligned with the Hadoop release cycle, so we have released an Apache Hadoop 2.7.1 Docker image - same as the previous version, it's available as a trusted and automated build on the official Docker registry.
FYI: All the former Hadoop releases (2.3, 2.4.0, 2.4.1, 2.5.0, 2.5.1, 2.5.2, 2.6.0) are available in the GitHub branches or our Docker Registry - check the tags.
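Modifications for use in China
This version sets the time zone in the Dockerfile to China's. Downloading the following files over networks in China is very slow, so the Dockerfile no longer fetches them with curl. Instead, they must be supplied locally, which means several files have to be placed in the current directory:
You can download each of them yourself from another source.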
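Because the modified Dockerfile expects these files to already be in the build context, it can help to confirm they are present before running the build. The sketch below is a hypothetical helper, not part of the repository: the function name `missing_files` is my own, and the file names must be supplied by you, since they depend on what the Dockerfile references.

```shell
# Hypothetical helper (not part of the repository): print every name given
# as an argument that does not exist as a regular file in the current directory.
missing_files() {
    for name in "$@"; do
        if [ ! -f "./$name" ]; then
            printf '%s\n' "$name"
        fi
    done
}
```

For example, `missing_files a.tar.gz b.tar.gz` (placeholder names) prints whichever of the two is absent. Empty output means every listed file was found.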
A docker-compose.yml file was added, with a mapping for the logs directory, so the cluster can be started quickly.
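The repository's actual docker-compose.yml is not reproduced in this article, so the following is only a sketch of what such a file might look like, written out by a small shell function. The service name `hadoop`, the host directory `./logs`, and the in-container path `/usr/local/hadoop/logs` are assumptions; the image tag and the two ports come from the commands and URLs elsewhere in this article.

```shell
# Sketch only: write a minimal docker-compose.yml to the given path
# (default: ./docker-compose.yml). The service name "hadoop", the host
# directory ./logs, and the in-container path /usr/local/hadoop/logs are
# assumptions, not taken from the repository.
write_compose_file() {
    target="${1:-docker-compose.yml}"
    cat > "$target" <<'EOF'
version: "2"
services:
  hadoop:
    image: sequenceiq/hadoop-docker:2.7.1
    ports:
      - "8088:8088"    # YARN ResourceManager web UI
      - "50070:50070"  # HDFS NameNode web UI
    volumes:
      - ./logs:/usr/local/hadoop/logs  # assumed Hadoop log directory
EOF
}
```

If you cloned the repository, prefer the docker-compose.yml it ships with. A file like this is only useful as a starting point when working from the plain image.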
Build the image
If you'd like to try directly from the Dockerfile you can build the image as:
docker build -t sequenceiq/hadoop-docker:2.7.1 .
Pull the image
The image is also released as an official Docker image from Docker's automated build repository - you can always pull or refer to the image when launching containers.
docker pull sequenceiq/hadoop-docker:2.7.1
Start with docker-compose
docker-compose up -d
Verify the environment
Run
docker exec -it <container-name> bash
to open a shell inside the container, then execute the following commands:
cd $HADOOP_PREFIX
# run the mapreduce
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar grep input output 'dfs[a-z.]+'
# check the output
bin/hdfs dfs -cat output/*
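The same check can be run from the host without opening an interactive shell. This is a minimal sketch, assuming the container exposes `HADOOP_PREFIX` in its environment as the commands above imply. The function name `run_grep_example` and its argument are illustrative.

```shell
# Sketch: run the example job and print its output without opening an
# interactive shell. The function name and its argument are illustrative.
run_grep_example() {
    container="$1"
    # Single quotes keep $HADOOP_PREFIX unexpanded on the host, so it is
    # resolved inside the container.
    docker exec "$container" bash -c '
        cd "$HADOOP_PREFIX" &&
        bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar \
            grep input output "dfs[a-z.]+" &&
        bin/hdfs dfs -cat "output/*"
    '
}
```

Call it as `run_grep_example <container-name>`. Running `docker ps --format '{{.Names}}'` lists the names of running containers. Hadoop refuses to write to an output directory that already exists, so a second run needs `bin/hdfs dfs -rm -r output` first.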
Hadoop native libraries, build, Bintray, etc
The Hadoop build process is no easy task - it requires lots of libraries at the right versions (protobuf, etc.) and takes some time. We have simplified all of this, made the build, and released a 64-bit version of the Hadoop native libs on this Bintray repo. Enjoy.
Automate everything
As we have mentioned previously, a Dockerfile was created and released in the official Docker repository.
Conclusion
Finally, here are a few commonly used Hadoop web URLs:
- Cluster status: http://server:8088/cluster
- Browse HDFS files: http://server:50070/explorer.html
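To confirm that both pages respond once the cluster is up, a small check with curl can be used. This is a sketch under the assumption that curl is installed on the host and that the two ports are published by the container. The function names are my own, and `server` is the same placeholder host used in the list above.

```shell
# Print the two web UI URLs for a given host (defaults to "server",
# the placeholder host used in the list above).
hadoop_ui_urls() {
    host="${1:-server}"
    printf 'http://%s:8088/cluster\n' "$host"
    printf 'http://%s:50070/explorer.html\n' "$host"
}

# Report whether each UI answers over HTTP. Requires curl on the host.
check_hadoop_uis() {
    hadoop_ui_urls "$1" | while read -r url; do
        if curl --silent --fail --max-time 5 --output /dev/null "$url"; then
            printf 'OK    %s\n' "$url"
        else
            printf 'DOWN  %s\n' "$url"
        fi
    done
}
```

Run `check_hadoop_uis localhost` on the Docker host, or pass the host name of the machine running the container.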