Happy Path: Set Up HDFS 2.6.5 in Pseudo-Distributed Operation
This article covers how to set up Hadoop in pseudo-distributed operation: a single-node deployment where the NameNode, DataNode, and SecondaryNameNode each run as a separate Java process.
Setup
- download the hadoop-2.6.5 binary tar.gz from this link
- transfer the tar.gz file into the VM
mkdir /opt/bigdata
tar -zxvf hadoop-2.6.5.tar.gz -C /opt/bigdata
-
vi /etc/profile
export HADOOP_HOME=/opt/bigdata/hadoop-2.6.5
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
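After saving, reload the profile so the new variables take effect in the current shell, and confirm HADOOP_HOME points where you expect:
source /etc/profile
echo $HADOOP_HOME
(once JAVA_HOME is configured in hadoop-env.sh below, running hadoop version is another quick check that the PATH entries work)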
cd $HADOOP_HOME/etc/hadoop
-
vi hadoop-env.sh
replace JAVA_HOME with the value you get from the following command
dirname $(dirname $(readlink -f $(which javac)))
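For example, on a CentOS VM with OpenJDK 8 that command might print a path like /usr/lib/jvm/java-1.8.0-openjdk (yours will likely differ); the line in hadoop-env.sh then becomes:
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk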
-
vi core-site.xml
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://node01:9000</value>
</property>
this sets fs.defaultFS, the NameNode address and port (node01:9000) that clients use as the default filesystem URI
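To sanity-check the value Hadoop actually picks up, hdfs getconf can read it back from the configuration (this assumes $HADOOP_HOME/bin is already on the PATH):
hdfs getconf -confKey fs.defaultFS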
-
vi hdfs-site.xml
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/var/bigdata/hadoop/local/dfs/name</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/var/bigdata/hadoop/local/dfs/data</value>
</property>
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>node01:50090</value>
</property>
<property>
  <name>dfs.namenode.checkpoint.dir</name>
  <value>/var/bigdata/hadoop/local/dfs/secondary</value>
</property>
this sets the replication factor to 1 (single node) and moves the NameNode, DataNode, and SecondaryNameNode (checkpoint) directories out of /tmp, since the default location under /tmp can be wiped at any time
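The NameNode format step below creates the name directory for you, and the daemons create the data and secondary directories when they start, but if you prefer to create them up front:
mkdir -p /var/bigdata/hadoop/local/dfs/{name,data,secondary}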
-
vi slaves
localhost
the slaves file lists the hosts that run DataNodes; in pseudo-distributed mode this is just localhost
Run
-
hdfs namenode -format
- creates the configured directories if they do not exist
- creates an empty fsimage
- the VERSION file records the generated clusterID
Image file /var/bigdata/hadoop/local/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 321 bytes saved in 0 seconds.
-
cd /var/bigdata/hadoop/local/dfs
only the name folder has been created at this point; data and secondary appear after the daemons start
-
cat name/current/VERSION
pay attention to the clusterID
start-dfs.sh
-
jps
[root@localhost dfs]# jps
7760 NameNode
8114 Jps
8009 SecondaryNameNode
7870 DataNode
-
ls /var/bigdata/hadoop/local/dfs
data name secondary
- each folder has its own VERSION file
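Each of those folders should carry the same clusterID in its VERSION file; a quick check (assuming you are still in /var/bigdata/hadoop/local/dfs) is:
grep clusterID */current/VERSION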
- on the Windows host, add an entry mapping node01 to the VM's IP in C:\Windows\System32\drivers\etc\hosts so the hostname links in the web UI resolve
- in the browser, open the NameNode web UI at
<ip of node01>:50070
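The same health information is available from the command line; hdfs dfsadmin -report prints the configured capacity and the list of live DataNodes:
hdfs dfsadmin -report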
-
hdfs dfs -mkdir -p /user/root
create a new folder in HDFS (this is the default home directory for the root user)
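You can confirm the directory was created with a listing:
hdfs dfs -ls /user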
hdfs dfs -put <file_to_upload> <target folder in hdfs>
upload a file onto HDFS
-
cd /var/.../current/finalized/subdir0/subdir0
check the uploaded file's block files on the DataNode's local disk
-
for i in `seq 100000`; do echo "hello hadoop $i" >> data.txt; done
generate a test file of 100,000 lines
-
hdfs dfs -D dfs.blocksize=1048576 -put test_data.txt
since no target path is given, the file is uploaded to /user/root/ by default; dfs.blocksize=1048576 sets a 1 MB block size, so the file is split into multiple blocks
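To see how the upload was split, hdfs fsck reports the file's blocks and where they live (the path below assumes the default target from the previous step):
hdfs fsck /user/root/test_data.txt -files -blocks -locations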