=====First Steps with Apache Sqoop 2 - v1.99=====

With Apache Sqoop (see => http:// ) data can be transferred between relational databases and Hadoop.

Architecture:


{{ :

Version 2 (v1.99) implements a client/server architecture.

====Installing Apache Sqoop====

Prerequisites:
  * Hadoop installed at least as a client on the server
  * see for example this Hadoop environment [[nosql:

Steps:
  * Download Sqoop from [[http://
  * Unpack to /
  * Set the environment
  * Download the Oracle JDBC driver [[http://
  * Copy the JDBC driver into the Sqoop server lib directory
  * Test

In detail:

<code bash>

# unpack

cd /

wget http://

tar xfvz sqoop-1.99.3-bin-hadoop200.tar.gz

rm sqoop-1.99.3-bin-hadoop200.tar.gz

# create a symlink

ln -s sqoop-1.99.3-bin-hadoop200 sqoop


# set the environment
vi ~/.bashrc

..
# Sqoop
export SQOOP_HOME=/
export PATH=$SQOOP_HOME/
..

. ~/.bashrc


# copy the Oracle JDBC driver into the server lib directory:
cp ~/

# create the lib directory
mkdir $SQOOP_HOME/


# first test of the client

sqoop.sh client

Sqoop home directory: /
Sqoop Shell: Type '

sqoop:
client version:
  Sqoop 1.99.3 revision 2404393160301df16a94716a3034e31b03e27b0b
  Compiled by mengweid on Fri Oct 18 14:15:53 EDT 2013
sqoop:

</code>

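The environment setup above can be sketched end to end. Note that the install path used below is an assumption for illustration only; the concrete path from the article is truncated in this wiki export:

```shell
# Sketch: set and verify the Sqoop environment.
# /opt/sqoop-1.99.3-bin-hadoop200 is an assumed install path, not the
# one from the article (which is truncated in the export).
export SQOOP_HOME=/opt/sqoop-1.99.3-bin-hadoop200
export PATH=$SQOOP_HOME/bin:$PATH
# verify that the variables are in place
echo "SQOOP_HOME=$SQOOP_HOME"
```

Putting the exports into ~/.bashrc, as shown above, makes them permanent for the hadoop user.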
===Configuring Apache Sqoop===

For access to the Hadoop libraries, adjust the common.loader entry in the file catalina.properties under $SQOOP_HOME/
<code bash>

vi $SQOOP_HOME/

common.loader=${catalina.base}/

</code>

The Sqoop server configuration is located under **$SQOOP_HOME/**

In **sqoop.properties** the Hadoop config directory must be configured:
<code bash>
vi $SQOOP_HOME/

# Hadoop configuration directory
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/

</code>

The ports can be adjusted via the file **$SQOOP_HOME/**

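A port change can be sketched like this. The file name server.xml and the default port 12000 are assumptions based on the Tomcat-style layout of the Sqoop 2 server; the demonstration edits a stand-in copy, not a live installation:

```shell
# Stand-in demonstration: edit a copy, not a real Sqoop installation.
# The actual file and its location under $SQOOP_HOME are assumptions.
cat > /tmp/server.xml.demo <<'EOF'
<Connector port="12000" protocol="HTTP/1.1" />
EOF
# switch the HTTP port from 12000 to 12001
sed -i 's/port="12000"/port="12001"/' /tmp/server.xml.demo
grep -o 'port="[0-9]*"' /tmp/server.xml.demo
```

After such a change the server must be restarted for the new port to take effect.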
Start the server:
<code bash>
$SQOOP_HOME/
</code>

Check whether the server is responding via its URL:
<code bash>
curl http://
</code>

Stop the server:
<code bash>
$SQOOP_HOME/
</code>


===Testing the Server and Fixing Possible First Errors===

Log on to the server with the client and test:

<code bash>

Sqoop Shell: Type '

# log on to the server
sqoop:
Server is set successfully
sqoop:

# query the server
sqoop:
client version:
  Sqoop 1.99.3 revision 2404393160301df16a94716a3034e31b03e27b0b
  Compiled by mengweid on Fri Oct 18 14:15:53 EDT 2013

Exception has occurred during processing command
Exception: com.sun.jersey.api.client.UniformInterfaceException Message: GET http://


</code>

If this error occurs, there is probably a configuration problem.

Check the logfile under $SQOOP_HOME/
<code bash>
vi localhost.2014-08-09.log

...
Caused by: org.apache.sqoop.common.SqoopException:

...

# Solution:

cd $SQOOP_HOME/

vi sqoop.properties

# Hadoop configuration directory
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/

</code>


Error when stopping the server:

<code bash>
Aug 09, 2014 2:44:41 PM org.apache.catalina.startup.ClassLoaderFactory validateFile
WARNING: Problem with directory [/

# solution: create the missing directory
mkdir /

</code>

==== Repository Database ====

In a default installation the repository database is located under $SQOOP_HOME/



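Since the default repository is an embedded database stored under $SQOOP_HOME, it can be backed up with tar while the server is stopped. The paths below are stand-ins for demonstration; the real repository location is truncated in this export:

```shell
# Stand-in demonstration of backing up the repository directory.
# /tmp/sqoop-repo-demo replaces the real repository path under $SQOOP_HOME.
mkdir -p /tmp/sqoop-repo-demo/db
echo "demo" > /tmp/sqoop-repo-demo/db/placeholder
# archive the whole repository directory
tar cfz /tmp/sqoop-repo-backup.tar.gz -C /tmp sqoop-repo-demo
# list the archive contents to verify the backup
tar tfz /tmp/sqoop-repo-backup.tar.gz
```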
==== Copying the First Data Between the Database and Hadoop ====

The table **SYSTEM.AUD$** from the Oracle database is to be imported into the Hadoop cluster as a text file.

Steps:
  * Start the client and connect to the Sqoop server
  * Create a connection object
  * Define a job object
  * Start the job and import the data into Hadoop


**Start the client and connect to the Sqoop server**
<code bash>

$SQOOP_HOME/


sqoop:

</code>

**Create a connection object**

The following information is required for the connection object:
  * Driver name - oracle.jdbc.driver.OracleDriver
  * JDBC URL - <
  * Username and password


<code bash>

sqoop:
Creating connection for connector with id 1
Please fill following values to create new connection object
Name: GPI DB

Connection configuration

JDBC Driver Class: oracle.jdbc.driver.OracleDriver
JDBC Connection String: jdbc:
Username: system
Password: ******
JDBC Connection Properties:
There are currently 0 values in the map:
entry#

Security related configuration options

Max connections:
New connection was successfully created with validation status FINE and persistent id 1

sqoop:
1 connection(s) to show:
Connection with id 2 and name GPI DB (Enabled: true, Created by hadoop at 8/9/14 5:27 PM, Updated by hadoop at 8/9/14 5:27 PM)
Using Connector id 1
  Connection configuration
    JDBC Driver Class: oracle.jdbc.driver.OracleDriver
    JDBC Connection String: jdbc:
    Username: system
    Password:
    JDBC Connection Properties:
  Security related configuration options
    Max connections:


</code>

**Define a job object**

The following information is required for the job object:
  * Name and schema of the table
  * Target location in HDFS <code bash>
hdfs dfs -chmod 777 /auditVault
</code>

<code bash>

sqoop:
Creating job for connection with id 1
Please fill following values to create new job object
Name: GPI AUD Import

Database configuration

Schema name: SYSTEM
Table name: AUD$
Table SQL:
Table column names:
Partition column name: USERID
Nulls in partition column:
Boundary query:

Output configuration

Storage type:
  0 : HDFS
Choose: 0
Output format:
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
Choose: 1
Compression format:
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
Choose: 0
Output directory: /

Throttling resources

Extractors:
Loaders:

New job was successfully created with validation status FINE and persistent id 1

sqoop:
+----+----------------+--------+-----------+---------+
| Id | Name           | Type   | Connector | Enabled |
+----+----------------+--------+-----------+---------+
| 1  | GPI AUD Import | IMPORT | 1         | true    |
+----+----------------+--------+-----------+---------+


</code>

**Start the job and import the data into Hadoop**

<code bash>

sqoop:
Exception has occurred during processing command
Unknown command: No such property: start for class: groovysh_evaluate


sqoop:
Exception has occurred during processing command
Exception: org.apache.sqoop.common.SqoopException Message: CLIENT_0001:

# make the error message visible:
sqoop:
sqoop:

...

Caused by: Exception: java.lang.Throwable Message: GENERIC_JDBC_CONNECTOR_0005:
Stack trace:

...
sqoop:

# keep all the old values except
Partition column name: TIMESTAMP#
# and
Output directory: /

sqoop:
2014-08-09 17:56:17 CEST: FAILURE_ON_SUBMIT
Exception: java.io.FileNotFoundException:

</code>

This adjustment to the Hadoop configuration resolves the behavior:

<code bash>

vi $HADOOP_HOME/

<
<
<
</

</code>




Next test:

<code bash>
sqoop:
2014-08-09 18:01:18 CEST: FAILURE_ON_SUBMIT
Exception: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException):
The reported blocks 10 has reached the threshold 0.9990 of total blocks 10. The number of live datanodes 1 has reached the minimum number 0. In safe mode extension

</code>

Solution:
<code bash>
hdfs dfsadmin -safemode leave
</code>

Next test:

<code bash>
sqoop:
Submission details
Job ID: 1
Server URL: http://
Created by: hadoop
Creation date: 2014-08-09 18:06:13 CEST
Lastly updated by: hadoop
External ID: job_1407600059752_0001
http://
2014-08-09 18:06:13 CEST: BOOTING

# what is happening:

sqoop:
Submission details
Job ID: 1
Server URL: http://
Created by: hadoop
Creation date: 2014-08-11 21:17:42 CEST
Lastly updated by: hadoop
External ID: job_1407784150673_0001
http://
Connector schema: Schema{name=AUD$,
  Decimal{name=SESSIONID,
  Decimal{name=ENTRYID,
  Decimal{name=STATEMENT,
  Date{name=TIMESTAMP#,
  Text{name=USERID,
  Text{name=USERHOST,
  Text{name=TERMINAL,
  Decimal{name=ACTION#,
  Decimal{name=RETURNCODE,
  Text{name=OBJ$CREATOR,
  Text{name=OBJ$NAME,
  Text{name=AUTH$PRIVILEGES,
  Text{name=AUTH$GRANTEE,
  Text{name=NEW$OWNER,
  Text{name=NEW$NAME,
  Text{name=SES$ACTIONS,
  Decimal{name=SES$TID,
  Decimal{name=LOGOFF$LREAD,
  Decimal{name=LOGOFF$PREAD,
  Decimal{name=LOGOFF$LWRITE,
  Decimal{name=LOGOFF$DEAD,
  Date{name=LOGOFF$TIME,
  Text{name=COMMENT$TEXT,
  Text{name=CLIENTID,
  Text{name=SPARE1,
  Decimal{name=SPARE2,
  Binary{name=OBJ$LABEL,
  Binary{name=SES$LABEL,
  Decimal{name=PRIV$USED,
  Decimal{name=SESSIONCPU,
  Date{name=NTIMESTAMP#,
  Decimal{name=PROXY$SID,
  Text{name=USER$GUID,
  Decimal{name=INSTANCE#,
  Text{name=PROCESS#,
  Binary{name=XID,
  Text{name=AUDITID,
  Decimal{name=SCN,
  Decimal{name=DBID,
  FloatingPoint{name=SQLBIND,
  FloatingPoint{name=SQLTEXT,
  Text{name=OBJ$EDITION,
2014-08-11 21:17:42 CEST: BOOTING


</code>

Status in YARN:
<code bash>

yarn application -list

14/08/11 21:21:33 INFO client.RMProxy:
14/08/11 21:21:33 WARN util.NativeCodeLoader:
Total number of applications (application-types:
Application-Id
application_1407784150673_0001


</code>

Unfortunately the load always aborts at the end:
<code bash>
Caused by: org.apache.sqoop.common.SqoopException:
        at org.apache.sqoop.connector.jdbc.GenericJdbcExecutor.<
        at org.apache.sqoop.connector.jdbc.GenericJdbcImportExtractor.extract(GenericJdbcImportExtractor.java:
        at org.apache.sqoop.connector.jdbc.GenericJdbcImportExtractor.extract(GenericJdbcImportExtractor.java:
        at org.apache.sqoop.job.mr.SqoopMapper.run(SqoopMapper.java:
        ... 7 more
Caused by: java.sql.SQLRecoverableException:
</code>

The search for the cause continues.


More success with: [[nosql:
==== Using OraOop ====

OraOop is a special plugin for Sqoop that provides faster access to Oracle'

See => [[https://


See: => http://

Apparently no longer maintained for the past two years.


==== Sqoop Client API ====

The client API can also be used to access the server directly, for example to integrate Sqoop into your own programs.

=> http://


==== Sources ====

  * http://


Web:
  * http://
  * http://
nosql/sqoop_erste_schritte.txt · Last modified: 2014/11/17 11:41 by gpipperr