Installing Hive
Hive official website: http://hive.apache.org/
Documentation: https://cwiki.apache.org/confluence/display/Hive/GettingStarted
Downloads: http://archive.apache.org/dist/hive/
GitHub: https://github.com/apache/hive
Extract apache-hive-3.1.2-bin.tar.gz into /opt/module/
Rename the extracted directory apache-hive-3.1.2-bin to hive
Edit /etc/profile.d/my_env.sh to add the environment variables
tar -zxvf /opt/software/apache-hive-3.1.2-bin.tar.gz -C /opt/module/
mv /opt/module/apache-hive-3.1.2-bin /opt/module/hive
sudo vim /etc/profile.d/my_env.sh
Add:
#HIVE_HOME
export HIVE_HOME=/opt/module/hive
export PATH=$PATH:$HIVE_HOME/bin
Then reload the environment:
source /etc/profile.d/my_env.sh
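If this step is scripted and may be re-run, appending the block blindly would duplicate it. A minimal sketch, assuming bash and the file path above; `add_hive_env` is a hypothetical helper name, not part of Hive:

```shell
#!/bin/bash
# Sketch: append the HIVE_HOME block to a profile script only once, so that
# re-running the install step does not duplicate the lines.
# add_hive_env is a hypothetical helper; both arguments are optional.
add_hive_env() {
  local profile=${1:-/etc/profile.d/my_env.sh}
  local hive_home=${2:-/opt/module/hive}
  # Skip if an "export HIVE_HOME=" line already exists (missing file = not set)
  if grep -q '^export HIVE_HOME=' "$profile" 2>/dev/null; then
    echo "HIVE_HOME already set in $profile"
    return 0
  fi
  {
    echo '#HIVE_HOME'
    printf 'export HIVE_HOME=%s\n' "$hive_home"
    # Single quotes keep $PATH and $HIVE_HOME unexpanded in the file
    echo 'export PATH=$PATH:$HIVE_HOME/bin'
  } >> "$profile"
}
```

Because the file lives under /etc it has to be written as root, for example `sudo bash -c "$(declare -f add_hive_env); add_hive_env"`, followed by `source /etc/profile.d/my_env.sh`.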
Replace Hive's Guava jar with the version shipped by Hadoop
cp $HADOOP_HOME/share/hadoop/common/lib/guava-27.0-jre.jar $HIVE_HOME/lib/
rm $HIVE_HOME/lib/guava-19.0.jar
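The two commands above hard-code both jar versions. A sketch that discovers them with globs instead, assuming the standard $HADOOP_HOME/share/hadoop/common/lib layout; `replace_guava` is a hypothetical name:

```shell
#!/bin/bash
# Sketch: swap Hive's bundled Guava for the one Hadoop ships, finding both
# jars with globs rather than hard-coding guava-19.0 / guava-27.0-jre.
# replace_guava is a hypothetical helper name.
replace_guava() {
  local hadoop_home=$1 hive_home=$2 src="" jar
  # Take the first guava-*.jar in Hadoop's common lib directory
  for jar in "$hadoop_home"/share/hadoop/common/lib/guava-*.jar; do
    if [ -e "$jar" ]; then src=$jar; break; fi
  done
  if [ -z "$src" ]; then
    echo "no guava jar found under $hadoop_home" >&2
    return 1
  fi
  # Drop every Guava jar Hive bundles, then copy Hadoop's version in
  rm -f "$hive_home"/lib/guava-*.jar
  cp "$src" "$hive_home/lib/"
}
```

Usage: `replace_guava "$HADOOP_HOME" "$HIVE_HOME"`. It refuses to delete anything if Hadoop's jar cannot be found, so Hive is never left without Guava.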
Resolve the logging jar conflict
mv $HIVE_HOME/lib/log4j-slf4j-impl-2.10.0.jar $HIVE_HOME/lib/log4j-slf4j-impl-2.10.0.bak
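Hive's `log4j-slf4j-impl` jar and the SLF4J binding shipped with Hadoop both end up on the classpath, and the rename above takes Hive's out. A version-agnostic sketch that renames whatever `log4j-slf4j-impl-*.jar` is present; the function name is hypothetical:

```shell
#!/bin/bash
# Sketch: rename Hive's SLF4J-to-Log4j2 binding so SLF4J does not find two
# bindings (Hive's and Hadoop's). Renaming keeps the jar recoverable.
# disable_hive_slf4j_binding is a hypothetical helper name.
disable_hive_slf4j_binding() {
  local lib_dir=$1 jar
  for jar in "$lib_dir"/log4j-slf4j-impl-*.jar; do
    # An unmatched glob stays literal, so check the file really exists
    if [ -e "$jar" ]; then
      mv "$jar" "${jar%.jar}.bak"
    fi
  done
}
```

Usage: `disable_hive_slf4j_binding "$HIVE_HOME/lib"`. For version 2.10.0 this produces the same `log4j-slf4j-impl-2.10.0.bak` as the manual command, and running it again is a no-op.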
Store Hive's metadata in MySQL
Copy the MySQL JDBC driver
cp /opt/software/mysql-connector-java-8.0.23.jar $HIVE_HOME/lib
Create a Spark configuration file for Hive
vim /opt/module/hive/conf/spark-defaults.conf
Add the following (Spark jobs launched by Hive run with these parameters):
spark.master yarn
spark.eventLog.enabled true
spark.eventLog.dir hdfs://hadoop102:8020/spark-history
spark.executor.memory 1g
spark.driver.memory 1g
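The same file can be written without an interactive editor. A sketch using a here-document; the NameNode address is a parameter defaulting to the hadoop102:8020 used above, and `write_spark_defaults` is a hypothetical name:

```shell
#!/bin/bash
# Sketch: write spark-defaults.conf with a here-document instead of vim.
# The NameNode host:port is a parameter (default taken from the settings
# above); write_spark_defaults is a hypothetical helper name.
# Note: this overwrites the target file.
write_spark_defaults() {
  local conf_file=$1 namenode=${2:-hadoop102:8020}
  cat > "$conf_file" <<EOF
spark.master yarn
spark.eventLog.enabled true
spark.eventLog.dir hdfs://${namenode}/spark-history
spark.executor.memory 1g
spark.driver.memory 1g
EOF
}
```

Usage: `write_spark_defaults /opt/module/hive/conf/spark-defaults.conf hadoop102:8020`.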
Create the following directory in HDFS for the history logs (it must match spark.eventLog.dir):
hadoop fs -mkdir /spark-history
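To keep this HDFS path and spark.eventLog.dir from drifting apart, a sketch that reads the value from spark-defaults.conf, assuming whitespace-separated `key value` lines as above; `event_log_dir` is a hypothetical name:

```shell
#!/bin/bash
# Sketch: print the spark.eventLog.dir value from spark-defaults.conf so the
# HDFS directory is created from the same setting Spark will use.
# Assumes whitespace-separated "key value" lines; event_log_dir is hypothetical.
event_log_dir() {
  awk '$1 == "spark.eventLog.dir" { print $2; exit }' "$1"
}
# Example (requires a running HDFS):
#   hadoop fs -mkdir -p "$(event_log_dir /opt/module/hive/conf/spark-defaults.conf)"
```

`hadoop fs -mkdir` accepts a full `hdfs://host:port/path` URI, and `-p` makes the command safe to repeat.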
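Upload the without-hadoop ("pure") Spark jars to HDFS
Note 1: The regular Spark 3.0.0 distribution is built against Hive 2.3.7 by default, so using it directly causes compatibility problems with the installed Hive 3.1.2. The without-hadoop build contains no Hadoop or Hive dependencies, which avoids the conflict.
Note 2: Hive jobs are ultimately executed by Spark, and YARN schedules the resources for each Spark job, so a job may be placed on any node in the cluster. Spark's dependencies therefore have to be uploaded to an HDFS path, where every node can fetch them.
Upload and extract spark-3.0.0-bin-without-hadoop.tgz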
tar -zxvf /opt/software/spark-3.0.0-bin-without-hadoop.tgz
Upload the without-hadoop Spark jars to HDFS
hadoop fs -mkdir /spark-jars
hadoop fs -put spark-3.0.0-bin-without-hadoop/jars/* /spark-jars
Configure the metastore to use MySQL
Create a hive-site.xml file in $HIVE_HOME/conf
vim $HIVE_HOME/conf/hive-site.xml
Add the following:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hive.exec.parallel.thread.number</name>
<value>8</value>
</property>
<property>
<name>hive.spark.client.connect.timeout</name>
<value>1000000ms</value>
</property>
<property>
<name>hive.spark.client.server.connect.timeout</name>
<value>1000000000ms</value>
</property>
<property>
<name>hive.spark.client.future.timeout</name>
<value>1000000000ms</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://hadoop101:3306/metastore?useSSL=false</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.cj.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>root</value>
</property>
<!--Default HDFS location of the Hive warehouse (table data) -->
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property>
<property>
<name>datanucleus.schema.autoCreateAll</name>
<value>true</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://hadoop101:9083</value>
</property>
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
</property>
<property>
<name>hive.server2.thrift.bind.host</name>
<value>hadoop101</value>
</property>
<property>
<name>hive.metastore.event.db.notification.api.auth</name>
<value>false</value>
</property>
<!--Location of the Spark jars on HDFS (note: the host and port, here 8020, must match the NameNode address)-->
<property>
<name>spark.yarn.jars</name>
<value>hdfs://hadoop2:8020/spark-jars/*</value>
</property>
<property>
<name>hive.execution.engine</name>
<value>spark</value>
</property>
</configuration>
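Note that spark.yarn.jars above points at hdfs://hadoop2:8020, while spark-defaults.conf uses hdfs://hadoop102:8020; the host and port must be the NameNode's address, so this is worth double-checking. A rough consistency check, assuming each property's `<name>` and `<value>` sit on separate lines as in the file above (a text scan, not an XML parser) and that Hadoop's core-site.xml is at $HADOOP_HOME/etc/hadoop; all function names are hypothetical:

```shell
#!/bin/bash
# Sketch: compare the hdfs://host:port of spark.yarn.jars (hive-site.xml)
# with fs.defaultFS (core-site.xml). Line-based text scan, not an XML parser:
# assumes <name> and <value> sit on separate lines. Helper names are hypothetical.
get_prop() {
  # Print the <value> that follows <name>$2</name> in file $1
  awk -v name="$2" '
    index($0, "<name>" name "</name>") { found = 1; next }
    found && /<value>/ {
      sub(/.*<value>/, ""); sub(/<\/value>.*/, ""); print; exit
    }
  ' "$1"
}
hdfs_authority() {
  # Keep only scheme and authority, e.g. hdfs://hadoop102:8020
  printf '%s\n' "$1" | sed -E 's#^(hdfs://[^/]+).*#\1#'
}
check_spark_jars() {
  local hive_site=$1 core_site=$2 jars fs
  jars=$(hdfs_authority "$(get_prop "$hive_site" spark.yarn.jars)")
  fs=$(hdfs_authority "$(get_prop "$core_site" fs.defaultFS)")
  if [ -n "$jars" ] && [ "$jars" = "$fs" ]; then
    echo "OK: $jars"
  else
    echo "MISMATCH: spark.yarn.jars uses '$jars', fs.defaultFS is '$fs'"
    return 1
  fi
}
```

Usage: `check_spark_jars "$HIVE_HOME/conf/hive-site.xml" "$HADOOP_HOME/etc/hadoop/core-site.xml"`.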
Configure Hive runtime logging
cd /opt/module/hive/conf/
mv hive-log4j2.properties.template hive-log4j2.properties
vim hive-log4j2.properties
Set the log directory:
property.hive.log.dir=/opt/module/hive/logs
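In Hive 3's template the key is written as `property.hive.log.dir = ...`. A non-interactive sketch with GNU sed that rewrites that line (it also matches a bare `hive.log.dir`); it assumes the directory path contains no `#` character, and `set_hive_log_dir` is a hypothetical name:

```shell
#!/bin/bash
# Sketch: point Hive's log4j2 log directory at a new path without vim.
# Uses GNU sed -i; '#' is the s/// delimiter, so the path must not contain it.
# set_hive_log_dir is a hypothetical helper name.
set_hive_log_dir() {
  local props=$1 dir=$2
  # Replace the whole "[property.]hive.log.dir = ..." line, leaving other keys
  sed -i -E "s#^[[:space:]]*(property\.)?hive\.log\.dir[[:space:]]*=.*#property.hive.log.dir = ${dir}#" "$props"
}
```

Usage: `set_hive_log_dir /opt/module/hive/conf/hive-log4j2.properties /opt/module/hive/logs`.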
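Start Hive
Create the metastore database in MySQL (the name must match the database in javax.jdo.option.ConnectionURL): create database metastore;
Initialize the metastore schema (run from $HIVE_HOME): bin/schematool -initSchema -dbType mysql -verbose
Start the metastore and HiveServer2
Write a Hive service script: vim $HIVE_HOME/bin/hive-service.sh
Add the following: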
#!/bin/bash
# Log and PID locations; HIVE_HOME comes from /etc/profile.d/my_env.sh
HIVE_LOG_DIR=$HIVE_HOME/logs
META_PID=$HIVE_HOME/tmp/meta.pid
SERVER_PID=$HIVE_HOME/tmp/server.pid
mkdir -p "$HIVE_HOME/tmp"
mkdir -p "$HIVE_LOG_DIR"
# Start the metastore, then HiveServer2, saving each background PID
function hive_start()
{
    nohup hive --service metastore >"$HIVE_LOG_DIR/metastore.log" 2>&1 &
    echo $! > "$META_PID"
    # Give the metastore time to come up before HiveServer2 connects to it
    sleep 8
    nohup hive --service hiveserver2 >"$HIVE_LOG_DIR/hiveserver2.log" 2>&1 &
    echo $! > "$SERVER_PID"
}
# Kill both services by their saved PIDs and remove the PID files
function hive_stop()
{
    if [ -f "$META_PID" ]
    then
        xargs kill -9 < "$META_PID"
        rm "$META_PID"
    else
        echo "Metastore PID file missing; stop the service manually"
    fi
    if [ -f "$SERVER_PID" ]
    then
        xargs kill -9 < "$SERVER_PID"
        rm "$SERVER_PID"
    else
        echo "HiveServer2 PID file missing; stop the service manually"
    fi
}
case "$1" in
"start")
    hive_start
    ;;
"stop")
    hive_stop
    ;;
"restart")
    hive_stop
    sleep 2
    hive_start
    ;;
*)
    echo "Invalid args!"
    echo "Usage: $(basename "$0") start|stop|restart"
    ;;
esac
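The script records PIDs but never checks whether those processes are still alive, so a PID file left behind after a crash looks like a running service. A sketch of a status helper using `kill -0`, which sends no signal and only tests that the process exists and can be signalled by the current user; `pid_status` is a hypothetical name:

```shell
#!/bin/bash
# Sketch of a status check: a PID file only shows the service was started,
# so also test whether that process still exists. "kill -0" sends no signal;
# it succeeds only if the PID exists and the current user may signal it.
# pid_status is a hypothetical helper name.
pid_status() {
  local pid_file=$1 name=$2 pid
  if [ ! -f "$pid_file" ]; then
    echo "$name: not started (no PID file)"
    return 1
  fi
  pid=$(cat "$pid_file")
  if kill -0 "$pid" 2>/dev/null; then
    echo "$name: running (pid $pid)"
  else
    echo "$name: not running (stale PID file)"
    return 1
  fi
}
```

It could be wired into the case statement as a `"status")` branch that calls `pid_status "$META_PID" metastore` and `pid_status "$SERVER_PID" hiveserver2`. A live PID only shows the process exists, not that port 9083 or 10000 is already accepting connections.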
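Hive's execution engine is set to Spark, so start Spark first (with spark.master set to yarn, HDFS and YARN must also be running):
/opt/module/spark-yarn/sbin/start-master.sh
If Spark has worker (slave) nodes, run /opt/module/spark-yarn/sbin/start-all.sh instead
Make the service script executable, then start the Hive services:
chmod +x $HIVE_HOME/bin/hive-service.sh
$HIVE_HOME/bin/hive-service.sh start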
Connect to Hive with DataGrip (HiveServer2 listens on hadoop101:10000, per hive-site.xml)
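DataGrip's Hive driver connects to HiveServer2 over JDBC. A sketch that builds the URL from the hive.server2.thrift.* settings in hive-site.xml, falling back to localhost and HiveServer2's default port 10000; like any line-based scan it assumes `<name>` and `<value>` sit on separate lines, and the function names are hypothetical:

```shell
#!/bin/bash
# Sketch: derive the HiveServer2 JDBC URL from hive-site.xml.
# Line-based text scan, not an XML parser; helper names are hypothetical.
get_prop() {
  # Print the <value> that follows <name>$2</name> in file $1
  awk -v name="$2" '
    index($0, "<name>" name "</name>") { found = 1; next }
    found && /<value>/ {
      sub(/.*<value>/, ""); sub(/<\/value>.*/, ""); print; exit
    }
  ' "$1"
}
hs2_jdbc_url() {
  local site=$1 host port
  host=$(get_prop "$site" hive.server2.thrift.bind.host)
  port=$(get_prop "$site" hive.server2.thrift.port)
  # Fall back to localhost and the default HiveServer2 port
  echo "jdbc:hive2://${host:-localhost}:${port:-10000}/default"
}
```

For the configuration above this yields `jdbc:hive2://hadoop101:10000/default`. Supplying a user name, for example `beeline -u "$(hs2_jdbc_url "$HIVE_HOME/conf/hive-site.xml")" -n "$USER"` or the user field in DataGrip, typically makes HiveServer2 (with its default impersonation setting, hive.server2.enable.doAs) act as that user instead of `anonymous`, the user that appears in the permission errors below.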
Create a table
create table student(id int, name string);
Insert some data
insert into table student values(1001,"zhangsan");
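The insert fails with the following errors:
Permission denied: user=anonymous, access=WRITE, inode="/user/hive/warehouse/
Permission denied: user=anonymous, access=EXECUTE, inode="/tmp/hadoop-yarn"
Fix them by opening up the HDFS permissions (777 gives every user full access, so this is only suitable for a test cluster):
hdfs dfs -chmod -R 777 /user/hive/warehouse/
hdfs dfs -chmod -R 777 /tmp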
Then query the table:
select * from student;
select id,count(*) from student group by id;
Reference:
https://blog.csdn.net/weixin_43923463/article/details/123736847