This is a detailed walkthrough of setting up a Hadoop 3.2 cluster, a fairly recent stable release.
I built this cluster myself and am recording the steps here so I don't forget them.
Everything runs as root (purely for convenience) on CentOS 7.
1. Prepare two hosts, 192.168.0.233 and 192.168.0.234 (resources are limited, so a single machine plays both roles here)
2. The corresponding hostnames are m1.example.com and m2.example.com
The concrete commands are as follows:
vim /etc/hosts
192.168.0.233 m1.example.com
192.168.0.233 m2.example.com
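A quick sanity check that both names resolve (they deliberately point at the same machine in this single-host setup):
#both pings should answer from 192.168.0.233
ping -c 1 m1.example.com
ping -c 1 m2.example.com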
3. Passwordless SSH between the hosts
#generate a key pair on the main host
ssh-keygen -t rsa -C m1.example.com
#push the public key to each host so future logins skip the password
ssh-copy-id root@m1.example.com
ssh-copy-id root@m2.example.com
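Before moving on, it is worth confirming that key-based login actually works; each command should print the hostname without asking for a password:
ssh root@m1.example.com hostname
ssh root@m2.example.com hostname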
1. Download JDK 1.8
#create the install directory
mkdir /soft
#upload the downloaded package into /soft
jdk-8u201-linux-x64.tar.gz
#unpack and rename
cd /soft
tar -zxvf jdk-8u201-linux-x64.tar.gz
mv jdk1.8.0_201 jdk1.8
2. Configure the Java environment
#append the Java environment to /etc/profile
echo -e 'export JAVA_HOME=/soft/jdk1.8\nexport CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar\nexport PATH=$PATH:$JAVA_HOME/bin' >> /etc/profile
3. Reload the environment: source /etc/profile
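To confirm the Java setup took effect:
java -version      #should report 1.8.0_201
echo $JAVA_HOME    #should print /soft/jdk1.8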
1. Create the hadoop data folders on both hosts (with a single machine in use, one copy is enough)
mkdir -p /hadoop/hdfs/data
mkdir -p /hadoop/hdfs/name
mkdir -p /hadoop/hdfs/tmp
2. Environment setup for hadoop-3.2.0.tar.gz
#download (substitute a real Apache mirror URL)
wget http://<mirror-address>/hadoop-3.2.0.tar.gz
#unpack into /soft
cd /soft
tar -zxvf hadoop-3.2.0.tar.gz
#rename
mv hadoop-3.2.0 hadoop
#append the Hadoop environment to /etc/profile
echo -e 'export HADOOP_HOME=/soft/hadoop\nexport PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin' >> /etc/profile
3. Parameter configuration
Set the run-as users by adding the following lines to /soft/hadoop/etc/hadoop/hadoop-env.sh:
export JAVA_HOME=/soft/jdk1.8
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
4. Reload the environment: source /etc/profile
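A quick check that Hadoop is now on the PATH:
hadoop version     #should report Hadoop 3.2.0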
5. Hadoop configuration files (in each file below, the <property> entries go inside the <configuration> element)
core-site.xml:
<property>
  <name>hadoop.tmp.dir</name>
  <value>file:/hadoop/hdfs/tmp</value>
</property>
<property>
  <name>io.file.buffer.size</name>
  <value>131072</value>
</property>
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://m1.example.com:9000</value>
</property>
<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>
hdfs-site.xml:
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/hadoop/hdfs/name</value>
  <final>true</final>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/hadoop/hdfs/data</value>
  <final>true</final>
</property>
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>m2.example.com:9001</value>
</property>
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.permissions.enabled</name>
  <value>false</value>
</property>
yarn-site.xml:
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>m1.example.com</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>m1.example.com:8050</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>m1.example.com:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>m1.example.com:8088</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>m1.example.com:8025</value>
</property>
<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>m1.example.com:8041</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
mapred-site.xml:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
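One step the walkthrough leaves implicit: Hadoop 3 reads the list of worker (DataNode/NodeManager) hosts from etc/hadoop/workers, and a real second machine would also need the configured installation copied over. A minimal sketch, assuming both hostnames should run workers (adjust to your topology):
#declare the worker hosts (assumption: both names)
echo -e 'm1.example.com\nm2.example.com' > /soft/hadoop/etc/hadoop/workers
#only needed if m2 is genuinely a separate machine
scp -r /soft/hadoop root@m2.example.com:/soft/
scp /etc/profile root@m2.example.com:/etc/profile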
6. Format the NameNode: hdfs namenode -format
7. Start the hadoop cluster
cd /soft/hadoop
sbin/start-dfs.sh
sbin/start-yarn.sh
Verify that the daemons started:
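jps (shipped with the JDK) gives a quick first check; the expected process list below assumes the single-host roles configured above:
jps
#expect NameNode, DataNode, ResourceManager, NodeManager
#(the SecondaryNameNode also lands here, since m2.example.com resolves to the same machine)
hdfs dfsadmin -report    #should list the live DataNode(s)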
8. Verify: the NameNode web UI listens on port 9870
http://192.168.0.233:9870/dfshealth.html#tab-overview
9. Open the YARN management UI: http://192.168.0.233:8088/cluster
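Beyond the web UIs, the bundled MapReduce example job exercises HDFS and YARN end to end (jar path as laid out in the stock 3.2.0 distribution):
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0.jar pi 2 10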
10. Write a simple Python program to test

# -*- coding: utf-8 -*-
import pandas as pd
import hdfs

__author__ = 'joe'
__date__ = '2021/07/22'

hdfs_user = 'root'
hdfs_addr = 'http://192.168.0.233:9870'

# connect over WebHDFS without authentication
client = hdfs.InsecureClient(hdfs_addr, user=hdfs_user)

# load a local CSV and upload it to HDFS
df = pd.read_csv("D:\\user_visit_action.csv")
print(df)
hdfs_path = '/test/a.csv'
client.write(hdfs_path, df.to_csv(index=False), overwrite=True, encoding='utf-8')

# read the file back and print it line by line
with client.read(hdfs_path, encoding='utf-8') as reader:
    for row in reader:
        print(row)
Run the Python program, then check in HDFS that the data was written successfully.
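The same file can also be checked from the command line, which confirms the write really reached the cluster rather than some local path:
hdfs dfs -ls /test
hdfs dfs -cat /test/a.csv | head -n 5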
Original article: https://blog.csdn.net/Joe192/article/details/118995804