Sqoop Usage Notes

1. Introduction and Installation

Sqoop (pronounced "scoop") is an open-source tool used mainly to transfer data between Hadoop (Hive) and traditional relational databases (MySQL, PostgreSQL, ...). It can import data from a relational database (e.g. MySQL, Oracle, Postgres) into HDFS, and it can also export data from HDFS back into a relational database.


(1) Download links:

               http://www.apache.org/dyn/closer.cgi/sqoop/1.99.6

               http://mirrors.cnnic.cn/apache/sqoop/1.99.6/

                http://dev.mysql.com/downloads/connector/j/5.1.html

(2) Installation:

tar -zxvf sqoop-1.99.6-bin-hadoop200.tar.gz 
mv sqoop-1.99.6-bin-hadoop200 /opt/sqoop2
vim /etc/profile
# Add Sqoop2 to the system environment variables:
export SQOOP2_HOME=/opt/sqoop2
export CATALINA_BASE=$SQOOP2_HOME/server
export PATH=.:$SQOOP2_HOME/bin:$PATH
# Copy the MySQL driver jar to $SQOOP2_HOME/server/lib
cp /tools/mysql-connector-java-5.1.34-bin.jar /opt/sqoop2/server/lib/
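
After editing /etc/profile, reload it so the new variables take effect in the current shell. A quick check (assuming a bash shell) is to print the two variables set above:

source /etc/profile
# Both should point under /opt/sqoop2
echo $SQOOP2_HOME
echo $CATALINA_BASE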

Note: /opt/hadoop-2.6.0/ is the Hadoop installation directory.

# Edit the configuration file: $SQOOP2_HOME/server/conf/sqoop.properties
vim /opt/sqoop2/server/conf/sqoop.properties
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/opt/hadoop-2.6.0/etc/hadoop
# Edit the configuration file: $SQOOP2_HOME/server/conf/catalina.properties
vim /opt/sqoop2/server/conf/catalina.properties
      
common.loader=/opt/hadoop-2.6.0/share/hadoop/common/*.jar,/opt/hadoop-2.6.0/share/hadoop/common/lib/*.jar,/opt/hadoop-2.6.0/share/hadoop/hdfs/*.jar,/opt/hadoop-2.6.0/share/hadoop/hdfs/lib/*.jar,/opt/hadoop-2.6.0/share/hadoop/mapreduce/*.jar,/opt/hadoop-2.6.0/share/hadoop/mapreduce/lib/*.jar,/opt/hadoop-2.6.0/share/hadoop/tools/*.jar,/opt/hadoop-2.6.0/share/hadoop/tools/lib/*.jar,/opt/hadoop-2.6.0/share/hadoop/yarn/*.jar,/opt/hadoop-2.6.0/share/hadoop/yarn/lib/*.jar,/opt/hadoop-2.6.0/share/hadoop/httpfs/tomcat/lib/*.jar

To change the Tomcat port or other server settings, edit $SQOOP2_HOME/server/conf/server.xml.
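
For example, to see which port the server will listen on (12000 by default), the <Connector> element in server.xml is the one to check. This is just a quick way to locate it, assuming the stock Tomcat layout bundled with Sqoop2; the port found here is what the client's "set server --port" must match later:

grep -n '<Connector' /opt/sqoop2/server/conf/server.xml
# Edit the port attribute if 12000 conflicts with another service
vim /opt/sqoop2/server/conf/server.xml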

2. Starting and Stopping Sqoop

(1) Start/stop the Sqoop server:

$SQOOP2_HOME/bin/sqoop.sh server start
$SQOOP2_HOME/bin/sqoop.sh server stop
or
$SQOOP2_HOME/bin/sqoop2-server start
$SQOOP2_HOME/bin/sqoop2-server stop
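
A quick way to confirm the server actually came up (a sketch, assuming the default port 12000 and the webapp name sqoop; the exact response of the version endpoint varies between releases):

# The embedded Tomcat shows up as a Bootstrap process
jps | grep Bootstrap
# The REST version endpoint answers once the server is ready
curl http://hadoop000:12000/sqoop/version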

(2) Start the Sqoop client:

$SQOOP2_HOME/bin/sqoop.sh client
or
$SQOOP2_HOME/bin/sqoop2-shell

(3) Configure the server connection in the client:

sqoop:000> set server --host hadoop000 --port 12000 --webapp sqoop

(4) Show server-side information:

sqoop:000> show server --all
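
To confirm the client can actually reach the server configured above, the shell can also report both the client and server versions (the values shown depend on your build):

sqoop:000> show version --all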

(5) Show all connectors:

$ show connector --all
2 connector(s) to show: 
        Connector with id 1:
            Name: hdfs-connector 
            Class: org.apache.sqoop.connector.hdfs.HdfsConnector
            Version: 1.99.4-cdh5.3.0
      
        Connector with id 2:
            Name: generic-jdbc-connector 
            Class: org.apache.sqoop.connector.jdbc.GenericJdbcConnector
            Version: 1.99.4-cdh5.3.0

3. Importing Data

# List all links:
show link
      
# Delete a specified link:
delete link --lid x
delete link -l x
      
# List all jobs:
show job
      
# Delete a specified job:
delete job --jid 1
delete job -j 1
      
# Create a link using the generic-jdbc-connector (connector id 2):
$ create link --cid 2
      
    Name: First Link
    JDBC Driver Class: com.mysql.jdbc.Driver
    JDBC Connection String: jdbc:mysql://hadoop000:3306/hive
    Username: root
    Password: ****
    JDBC Connection Properties: 
    There are currently 0 values in the map:
    entry# protocol=tcp
    There are currently 1 values in the map:
    protocol = tcp
    entry# 
    New link was successfully created with validation status OK and persistent id 3
      
$ show link
+----+-------------+-----------+---------+
| Id |    Name     | Connector | Enabled |
+----+-------------+-----------+---------+
| 3  | First Link  | 2         | true    |
+----+-------------+-----------+---------+
      
# Create a link using the hdfs-connector (connector id 1):
$ create link --cid 1
      
    Name: Second Link
    HDFS URI: hdfs://hadoop000:8020
    New link was successfully created with validation status OK and persistent id 4
      
$ show link
+----+-------------+-----------+---------+
| Id |    Name     | Connector | Enabled |
+----+-------------+-----------+---------+
| 3  | First Link  | 2         | true    |
| 4  | Second Link | 1         | true    |
+----+-------------+-----------+---------+
      
$ show link -all
      
    2 link(s) to show: 
    link with id 3 and name First Link (Enabled: true, Created by null at 15-2-2 11:28, Updated by null at 15-2-2 11:28)
    Using Connector id 2
      Link configuration
        JDBC Driver Class: com.mysql.jdbc.Driver
        JDBC Connection String: jdbc:mysql://hadoop000:3306/hive
        Username: root
        Password: 
        JDBC Connection Properties: 
          protocol = tcp
    link with id 4 and name Second Link (Enabled: true, Created by null at 15-2-2 11:32, Updated by null at 15-2-2 11:32)
    Using Connector id 1
      Link configuration
        HDFS URI: hdfs://hadoop000:8020
      
# Create a job from link id 3 (source) to link id 4 (target):
$ create job -f 3 -t 4
      
    Creating job for links with from id 3 and to id 4
    Please fill following values to create new job object
    Name: Sqoopy
      
    From database configuration
      
    Schema name: hive
    Table name: TBLS
    Table SQL statement: 
    Table column names: 
    Partition column name: 
    Null value allowed for the partition column: 
    Boundary query: 
      
    ToJob configuration
      
    Output format: 
      0 : TEXT_FILE
      1 : SEQUENCE_FILE
    Choose: 0
    Compression format: 
      0 : NONE
      1 : DEFAULT
      2 : DEFLATE
      3 : GZIP
      4 : BZIP2
      5 : LZO
      6 : LZ4
      7 : SNAPPY
      8 : CUSTOM
    Choose: 0
    Custom compression format: 
    Output directory: hdfs://hadoop000:8020/sqoop2/tbls_import_demo_sqoop1.99.4
      
    Throttling resources
      
    Extractors: 
    Loaders: 
    New job was successfully created with validation status OK  and persistent id 2
      
# List all jobs:
$ show job
+----+--------+----------------+--------------+---------+
| Id |  Name  | From Connector | To Connector | Enabled |
+----+--------+----------------+--------------+---------+
| 2  | Sqoopy | 2              | 1            | true    |
+----+--------+----------------+--------------+---------+
      
# Start the specified job. After it finishes, check the output files on HDFS (hdfs dfs -ls hdfs://hadoop000:8020/sqoop2/tbls_import_demo_sqoop1.99.4/):
start job --jid 2
      
# Check the execution status of the job:
status job --jid 2
      
# Stop the specified job:
stop job --jid 2
      
# A common error when running start job (e.g. start job --jid 2):
Exception has occurred during processing command
Exception: org.apache.sqoop.common.SqoopException Message: CLIENT_0001:Server has returned exception
      
# In the Sqoop client, turn on verbose output to see job details:
set option --name verbose --value true
show job --jid 2
# View an HDFS file:
hdfs dfs -cat /user/hive/warehouse/test/13a23dfe-82d5-477c-b02d-12c2e31254a3.txt
# Delete HDFS files:
hdfs dfs -rm -f -r /user/hive/warehouse/test/
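
As a final sanity check after an import job, the record count written to the job's output directory can be compared with the row count of the source table. This is only a rough check; it assumes a text output format, the mysql command-line client on the node, and the credentials used when creating the link above:

# Count the lines written by the job under its output directory
hdfs dfs -cat hdfs://hadoop000:8020/sqoop2/tbls_import_demo_sqoop1.99.4/* | wc -l
# Compare with the row count of the source table in MySQL
mysql -h hadoop000 -uroot -p -e "SELECT COUNT(*) FROM hive.TBLS;"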



