=====First Steps with Apache Sqoop 2 - v1.99=====

With Apache Sqoop (see => http://...) data can be transferred between relational databases and a Hadoop cluster.

Architecture:

{{ :... }}

Version 2 (v1.99) implements a client-server architecture.

====Installing Apache Sqoop====

Prerequisites:
  * Hadoop installed on the server, at least as a client
  * see for example this Hadoop environment [[nosql:...]]

Steps:
  * Download Sqoop from [[http://...]]
  * Unpack to /...
  * Set up the environment
  * Download the Oracle JDBC driver [[http://...]]
  * Copy the JDBC driver into the Sqoop server lib directory
  * Test

In detail:

<code bash>
# unpack
cd /...

wget http://...

tar xfvz sqoop-1.99.3-bin-hadoop200.tar.gz

rm sqoop-1.99.3-bin-hadoop200.tar.gz

# create a symlink
ln -s sqoop-1.99.3-bin-hadoop200 sqoop

# set up the environment
vi ~/.bashrc

..
# Sqoop
export SQOOP_HOME=/...
export PATH=$SQOOP_HOME/...
..

. ~/.bashrc

# copy the Oracle JDBC driver into the server lib directory
cp ~/... $SQOOP_HOME/...

# create the lib directory if it does not yet exist
mkdir $SQOOP_HOME/...

# first test of the client
sqoop.sh client

Sqoop home directory: /...
Sqoop Shell: Type '...

sqoop:000> show version
client version:
  Sqoop 1.99.3 revision 2404393160301df16a94716a3034e31b03e27b0b
  Compiled by mengweid on Fri Oct 18 14:15:53 EDT 2013
sqoop:000>
</code>

===Configuring Apache Sqoop===

To give the server access to the Hadoop libraries, extend the property common.loader in the file catalina.properties under $SQOOP_HOME/...:
<code bash>
vi $SQOOP_HOME/...

common.loader=${catalina.base}/...
</code>

The Sqoop server configuration is located under **$SQOOP_HOME/...**.

In **sqoop.properties** the Hadoop configuration directory must be set:
<code bash>
vi $SQOOP_HOME/...

# Hadoop configuration directory
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/...
</code>

The ports can be adjusted in the file **$SQOOP_HOME/...**.
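As a sketch of how such a port change looks in practice — the path `server/conf/server.xml` and the default HTTP port 12000 are assumptions based on a stock 1.99.3 install, not taken from this page:

```shell
# Minimal stand-in for the Tomcat server.xml shipped with the Sqoop server;
# the real file lives under the server conf directory (path assumed).
cat > /tmp/server.xml <<'EOF'
<Server port="12001" shutdown="SHUTDOWN">
  <Service name="Catalina">
    <Connector port="12000" protocol="HTTP/1.1" connectionTimeout="20000"/>
  </Service>
</Server>
EOF

# move the HTTP connector from the assumed default 12000 to 12100
sed -i 's/Connector port="12000"/Connector port="12100"/' /tmp/server.xml

grep 'Connector port' /tmp/server.xml
```

After such a change the server has to be restarted for the new port to take effect.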
+ | |||
+ | Server starten: | ||
+ | <code bash> | ||
+ | $SQOOP_HOME/ | ||
+ | </ | ||
+ | |||
+ | Über die URL < | ||
+ | <code bash> | ||
+ | curl http:// | ||
+ | </ | ||
+ | |||
+ | Server stoppen: | ||
+ | <code bash> | ||
+ | $SQOOP_HOME/ | ||
+ | </ | ||
+ | |||
+ | |||
===Testing the Server and Fixing the First Errors===

Log on to the server with the client and test:

<code bash>
sqoop.sh client

Sqoop Shell: Type '...

# log on to the server
sqoop:000> set server ...
Server is set successfully
sqoop:000>

# query the server
sqoop:000> show version --all
client version:
  Sqoop 1.99.3 revision 2404393160301df16a94716a3034e31b03e27b0b
  Compiled by mengweid on Fri Oct 18 14:15:53 EDT 2013

Exception has occurred during processing command
Exception: com.sun.jersey.api.client.UniformInterfaceException Message: GET http://...
</code>

If this error occurs, the cause is most likely a configuration problem.

Check the log file under $SQOOP_HOME/...:
<code bash>
vi localhost.2014-08-09.log

...
Caused by: org.apache.sqoop.common.SqoopException: ...
...

# Solution:

cd $SQOOP_HOME/...

vi sqoop.properties

# Hadoop configuration directory
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/...
</code>

An error when stopping the server:

<code bash>
Aug 09, 2014 2:44:41 PM org.apache.catalina.startup.ClassLoaderFactory validateFile
WARNING: Problem with directory [/...

# solution: create the missing directory
mkdir /...
</code>

==== Repository Database ====

In a default installation the repository database is located under $SQOOP_HOME/...
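By default Sqoop 2 keeps its repository in an embedded Derby database whose location is configured in sqoop.properties. The following sketch shows how to find those settings; the property names and the Derby URL mirror the 1.99.x defaults and are an assumption here, not copied from this installation:

```shell
# Stand-in for the server's sqoop.properties (path and values assumed,
# mirroring the stock 1.99.x defaults with an embedded Derby repository).
cat > /tmp/sqoop.properties <<'EOF'
org.apache.sqoop.repository.provider=org.apache.sqoop.repository.JdbcRepositoryProvider
org.apache.sqoop.repository.jdbc.handler=org.apache.sqoop.repository.derby.DerbyRepositoryHandler
org.apache.sqoop.repository.jdbc.url=jdbc:derby:@BASEDIR@/repository/db;create=true
EOF

# list every repository-related setting
grep '^org.apache.sqoop.repository' /tmp/sqoop.properties
```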
+ | |||
+ | |||
+ | |||
+ | ==== Die ersten Daten zwischen der Datenbank und Hadoop kopieren ==== | ||
+ | |||
+ | Es soll die Tabelle **SYSTEM.AUD$** aus der Oracle Datenbank als Text Datei in das Hadoop Cluster importiert werden. | ||
+ | |||
+ | Ablauf: | ||
+ | * Client starten und mit dem Sqoop Server verbinden | ||
+ | * Connection Object anlegen | ||
+ | * Job Object definieren | ||
+ | * Job starten und Daten nach Hadoop importieren | ||
+ | |||
+ | |||
+ | **Client starten und mit dem Sqoop Server** | ||
<code bash>
$SQOOP_HOME/...

sqoop:000> set server ...
</code>

**Create the connection object**

The following information is required for the connection object:
  * Driver name - oracle.jdbc.driver.OracleDriver
  * JDBC URL - <...>
  * Username and password

<code bash>
sqoop:000> create connection --cid 1
Creating connection for connector with id 1
Please fill following values to create new connection object
Name: GPI DB

Connection configuration

JDBC Driver Class: oracle.jdbc.driver.OracleDriver
JDBC Connection String: jdbc:...
Username: system
Password: ******
JDBC Connection Properties:
There are currently 0 values in the map:
entry#

Security related configuration options

Max connections: ...
New connection was successfully created with validation status FINE and persistent id 1

sqoop:000> show connection --all
1 connection(s) to show:
Connection with id 2 and name GPI DB (Enabled: true, Created by hadoop at 8/9/14 5:27 PM, Updated by hadoop at 8/9/14 5:27 PM)
Using Connector id 1
  Connection configuration
    JDBC Driver Class: oracle.jdbc.driver.OracleDriver
    JDBC Connection String: jdbc:...
    Username: system
    Password:
    JDBC Connection Properties:
  Security related configuration options
    Max connections: ...
</code>

**Define the job object**

The following information is required for the job object:
  * Name and schema of the table
  * Target directory in HDFS; make sure it exists and is writable for the job: <code bash>
hdfs dfs -mkdir -p /auditVault
hdfs dfs -chmod 777 /auditVault
</code>

<code bash>
sqoop:000> create job --xid 1 --type import
Creating job for connection with id 1
Please fill following values to create new job object
Name: GPI AUD Import

Database configuration

Schema name: SYSTEM
Table name: AUD$
Table SQL:
Table column names:
Partition column name: USERID
Nulls in partition column:
Boundary query:

Output configuration

Storage type:
  0 : HDFS
Choose: 0
Output format:
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
Choose: 1
Compression format:
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
Choose: 0
Output directory: /...

Throttling resources

Extractors:
Loaders:

New job was successfully created with validation status FINE and persistent id 1

sqoop:000> show job
+----+----------------+--------+-----------+---------+
| Id | Name           | Type   | Connector | Enabled |
+----+----------------+--------+-----------+---------+
| 1  | GPI AUD Import | IMPORT | 1         | true    |
+----+----------------+--------+-----------+---------+
</code>

**Start the job and import the data into Hadoop**

<code bash>
sqoop:000> ...
Exception has occurred during processing command
Unknown command: No such property: start for class: groovysh_evaluate

# ????????

sqoop:000> start job --jid 1
Exception has occurred during processing command
Exception: org.apache.sqoop.common.SqoopException Message: CLIENT_0001: ...

# make the error message visible:
sqoop:000> set option --name verbose --value true
sqoop:000> start job --jid 1

...

Caused by: Exception: java.lang.Throwable Message: GENERIC_JDBC_CONNECTOR_0005: ...
Stack trace:
...

sqoop:000> update job --jid 1

# keep all old values except
Partition column name: TIMESTAMP#
# and
Output directory: /...

sqoop:000> start job --jid 1
2014-08-09 17:56:17 CEST: FAILURE_ON_SUBMIT
Exception: java.io.FileNotFoundException: ...
</code>

This adjustment to the Hadoop configuration resolves the behavior:

<code bash>
vi $HADOOP_HOME/...

<property>
  <name>...</name>
  <value>...</value>
</property>
</code>

Next test:

<code bash>
sqoop:000> start job --jid 1
2014-08-09 18:01:18 CEST: FAILURE_ON_SUBMIT
Exception: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): ...
The reported blocks 10 has reached the threshold 0.9990 of total blocks 10. The number of live datanodes 1 has reached the minimum number 0. In safe mode extension
</code>

Solution - take the NameNode out of safe mode:
<code bash>
hdfs dfsadmin -safemode leave
</code>

Next test:

<code bash>
sqoop:000> start job --jid 1
Submission details
Job ID: 1
Server URL: http://...
Created by: hadoop
Creation date: 2014-08-09 18:06:13 CEST
Lastly updated by: hadoop
External ID: job_1407600059752_0001
        http://...
2014-08-09 18:06:13 CEST: BOOTING

# what is happening?

sqoop:000> status job --jid 1
Submission details
Job ID: 1
Server URL: http://...
Created by: hadoop
Creation date: 2014-08-11 21:17:42 CEST
Lastly updated by: hadoop
External ID: job_1407784150673_0001
        http://...
Connector schema: Schema{name=AUD$,
  Decimal{name=SESSIONID,
  Decimal{name=ENTRYID,
  Decimal{name=STATEMENT,
  Date{name=TIMESTAMP#,
  Text{name=USERID,
  Text{name=USERHOST,
  Text{name=TERMINAL,
  Decimal{name=ACTION#,
  Decimal{name=RETURNCODE,
  Text{name=OBJ$CREATOR,
  Text{name=OBJ$NAME,
  Text{name=AUTH$PRIVILEGES,
  Text{name=AUTH$GRANTEE,
  Text{name=NEW$OWNER,
  Text{name=NEW$NAME,
  Text{name=SES$ACTIONS,
  Decimal{name=SES$TID,
  Decimal{name=LOGOFF$LREAD,
  Decimal{name=LOGOFF$PREAD,
  Decimal{name=LOGOFF$LWRITE,
  Decimal{name=LOGOFF$DEAD,
  Date{name=LOGOFF$TIME,
  Text{name=COMMENT$TEXT,
  Text{name=CLIENTID,
  Text{name=SPARE1,
  Decimal{name=SPARE2,
  Binary{name=OBJ$LABEL,
  Binary{name=SES$LABEL,
  Decimal{name=PRIV$USED,
  Decimal{name=SESSIONCPU,
  Date{name=NTIMESTAMP#,
  Decimal{name=PROXY$SID,
  Text{name=USER$GUID,
  Decimal{name=INSTANCE#,
  Text{name=PROCESS#,
  Binary{name=XID,
  Text{name=AUDITID,
  Decimal{name=SCN,
  Decimal{name=DBID,
  FloatingPoint{name=SQLBIND,
  FloatingPoint{name=SQLTEXT,
  Text{name=OBJ$EDITION,
2014-08-11 21:17:42 CEST: BOOTING
</code>

Status in YARN:
<code bash>
yarn application -list

14/08/11 21:21:33 INFO client.RMProxy: ...
14/08/11 21:21:33 WARN util.NativeCodeLoader: ...
Total number of applications (application-types: ...
Application-Id ...
application_1407784150673_0001 ...
</code>

Unfortunately the load always aborts at the end:
<code bash>
Caused by: org.apache.sqoop.common.SqoopException: ...
        at org.apache.sqoop.connector.jdbc.GenericJdbcExecutor.<init>(GenericJdbcExecutor.java:...)
        at org.apache.sqoop.connector.jdbc.GenericJdbcImportExtractor.extract(GenericJdbcImportExtractor.java:...)
        at org.apache.sqoop.connector.jdbc.GenericJdbcImportExtractor.extract(GenericJdbcImportExtractor.java:...)
        at org.apache.sqoop.job.mr.SqoopMapper.run(SqoopMapper.java:...)
        ... 7 more
Caused by: java.sql.SQLRecoverableException: ...
</code>
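A java.sql.SQLRecoverableException with a reset connection during long-running Oracle JDBC sessions is often caused by the JVM blocking on /dev/random when the machine runs short of entropy. A commonly cited workaround — an assumption here, not verified against this setup — is to point the JVM at the non-blocking /dev/urandom, for example via CATALINA_OPTS for the Tomcat that hosts the Sqoop server:

```shell
# Workaround sketch: use the non-blocking entropy source for the JVM that runs
# the Sqoop server (assumes the embedded Tomcat honors CATALINA_OPTS).
export CATALINA_OPTS="${CATALINA_OPTS:-} -Djava.security.egd=file:/dev/./urandom"

# verify the flag is part of the options handed to the JVM
echo "$CATALINA_OPTS"
```

Whether this actually cures the abort would have to be confirmed by restarting the server and rerunning the job.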
+ | |||
+ | ?? Suche wir weiter ?? | ||
+ | |||
+ | |||
+ | Mehr Erfolg mit : [[nosql: | ||
==== Using OraOop ====

OraOop is a special plugin for Sqoop that provides faster access to Oracle'...

See => [[https://...]]

See: => http://...

??? Not maintained for two years ????

==== Sqoop Client API ====

The client API can also be used to access the server directly, for example to integrate Sqoop into your own programs.

=> http://...

==== Sources ====

  * http://...

Web:
  * http://...
  * http://...
nosql/sqoop_erste_schritte.txt · Last modified: 2014/11/17 11:41 by gpipperr