
This article covers how to set up Hadoop in Pseudo-Distributed Operation mode (a single-node cluster where each Hadoop daemon runs in its own Java process).

reference

Setup

  1. download hadoop-2.6.5 from this link
  2. transfer the tar.gz file into the VM
  3. mkdir /opt/bigdata
  4. tar -zxvf hadoop-2.6.5.tar.gz -C /opt/bigdata
  5. vi /etc/profile

    export HADOOP_HOME=/opt/bigdata/hadoop-2.6.5
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    
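    to make the new variables visible in the current shell, reload the profile (not listed as an explicit step above, but usually needed before the hadoop command is found):

    # reload the profile and sanity-check the variables
    source /etc/profile
    echo $HADOOP_HOME    # should print /opt/bigdata/hadoop-2.6.5
    hadoop version       # should report Hadoop 2.6.5
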
  6. cd $HADOOP_HOME/etc/hadoop
  7. vi hadoop-env.sh

    replace JAVA_HOME with the value you get from the following command (a hard-coded path is needed because daemons started over ssh do not inherit your shell environment)

    dirname $(dirname $(readlink -f $(which javac)))

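    for example, with OpenJDK 8 the command usually prints something like /usr/lib/jvm/java-1.8.0-openjdk (the exact path depends on your distribution and JDK version), so the edited line in hadoop-env.sh would look like:

    # example only -- use whatever path the command above printed on your VM
    export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
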
  8. vi core-site.xml

    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://node01:9000</value>
    </property>
    

    this sets fs.defaultFS, the address and port the NameNode listens on (hdfs://node01:9000)
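
    note that fs.defaultFS uses the hostname node01, so node01 must resolve on the VM itself; an /etc/hosts entry along these lines does the job (the IP is only an example, use your VM's real address):

    echo '192.168.56.101 node01' >> /etc/hosts   # example IP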

  9. vi hdfs-site.xml

    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/var/bigdata/hadoop/local/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/var/bigdata/hadoop/local/dfs/data</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>node01:50090</value>
    </property>
    <property>
        <name>dfs.namenode.checkpoint.dir</name>
        <value>/var/bigdata/hadoop/local/dfs/secondary</value>
    </property>
    

    this sets the replication factor to 1 (there is only one DataNode) and moves the NameNode, DataNode, and SecondaryNameNode directories under /var/bigdata, since the default location is under /tmp, which the OS may clean out at any time.
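
    a quick way to confirm Hadoop is actually picking these values up (a small sketch, assuming the config files above are in place):

    hdfs getconf -confKey dfs.replication         # expect 1
    hdfs getconf -confKey dfs.namenode.name.dir   # expect /var/bigdata/hadoop/local/dfs/name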

  10. vi slaves

    localhost
    
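    one thing worth checking before the Run section: start-dfs.sh launches the daemons over ssh for every host listed in slaves, so ssh localhost should work without a password. If it prompts for one, a key pair along these lines is the usual fix (a sketch, not part of the original steps):

    ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    chmod 600 ~/.ssh/authorized_keys
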

Run

  1. hdfs namenode -format

    1. creates the configured directories if they do not exist
    2. writes an empty fsimage
    3. the VERSION file records the clusterID
    Image file /var/bigdata/hadoop/local/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 321 bytes saved in 0 seconds.
       
    
  2. cd /var/bigdata/hadoop/local/dfs

    only the name folder has been created at this point; data and secondary appear once the daemons are running

  3. cat name/current/VERSION

    pay attention to the clusterID; the DataNode must report the same clusterID to be accepted by this NameNode
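
    the file typically looks something like this (every value below is made up; only the layout matters):

    namespaceID=1234567890
    clusterID=CID-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
    cTime=0
    storageType=NAME_NODE
    blockpoolID=BP-xxxxxxxxxx-127.0.0.1-1500000000000
    layoutVersion=-60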

  4. start-dfs.sh
  5. jps

    [root@localhost dfs]# jps
    7760 NameNode
    8114 Jps
    8009 SecondaryNameNode
    7870 DataNode
    
  6. ls /var/bigdata/hadoop/local/dfs

    data  name  secondary
    
    1. each of these folders has its own VERSION file, too
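    a handy check is that the DataNode carries the same clusterID as the NameNode (a mismatch is the classic symptom of reformatting the NameNode while an old data folder is still around):

    # run from /var/bigdata/hadoop/local/dfs
    grep clusterID name/current/VERSION data/current/VERSION
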
  7. on the Windows host, add a node01 entry to C:\Windows\System32\drivers\etc\hosts so the browser can resolve the hostname
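
    the entry simply maps the VM's IP address to the hostname, for example (example IP only):

    192.168.56.101  node01
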
  8. in the browser, open <ip of node01>:50070 to reach the NameNode web UI
  9. hdfs dfs -mkdir -p /user/root

    creates the home directory for root in HDFS; relative HDFS paths (like the uploads below) resolve against /user/root
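
    a quick check that it exists (not in the original steps):

    hdfs dfs -ls /user    # should list /user/root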

  10. hdfs dfs -put <file_to_upload> <target folder in hdfs>

    uploads a local file to HDFS

  11. cd /var/.../current/finalized/subdir0/subdir0

    inspect the block files of the uploaded file on the DataNode's local disk
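
    the exact subdirectory layout is easy to forget; searching the data dir configured earlier for block files also works (a sketch):

    find /var/bigdata/hadoop/local/dfs/data -name 'blk_*'
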
  12. for i in `seq 100000`; do echo "hello hadoop $i" >> data.txt; done
  13. hdfs dfs -D dfs.blocksize=1048576 -put data.txt

    with no destination given, the file is uploaded to /user/root/ by default, using a 1 MB block size instead of the default 128 MB
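
    to confirm the smaller block size really split the file into several blocks, fsck can list them (a sketch; the path assumes the data.txt generated in step 12 was uploaded):

    hdfs fsck /user/root/data.txt -files -blocks -locations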