===== First Steps with the Oracle Big Data Appliance =====

==== The BigDataLite VM from Oracle ====

An Oracle VM on the topic of Big Data can be found here: \\
=> http://


To use the VM with VMware:
=> [[nosql:


If HDFS is to be accessed remotely, all services must be reachable under the machine's real hostname, not only under localhost.

To do this, adjust /etc/hosts: remove the machine name from the localhost entry and add a dedicated entry for it:
<code bash>
127.0.0.1     localhost localhost.localdomain
10.10.10.12   bigdatalite.localdomain bigdatalite
</code>

And switch off the firewall (in a test environment only, of course .-) )!
<code bash>
iptables -L
service iptables stop
chkconfig iptables off

chkconfig --list iptables

</code>
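
A quick way to verify that the node can actually be reached from a remote client is sketched below; the hostname and the NameNode port 8020 are assumptions and have to match the actual cluster configuration:
<code bash>
# from the remote machine: check that the NameNode RPC port answers
# (8020 is the usual default, adjust if the cluster is configured differently)
telnet bigdatalite 8020

# with a local Hadoop client installation, list the remote HDFS root
hdfs dfs -ls hdfs://bigdatalite:8020/
</code>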
+ | |||
+ | ---- | ||
+ | |||
+ | ==== Daten aus Oracle mit sqoop 1.4 in das HDFS exportieren ==== | ||
+ | |||
+ | Ziel - die Aud$ Tabelle soll in das HDFS geladen werden | ||
+ | |||
+ | |||
+ | Datenbank Connect testen: | ||
+ | <code bash> | ||
+ | |||
+ | sqoop list-databases --connect jdbc: | ||
+ | |||
+ | </ | ||
+ | |||
+ | |||
+ | Tabellen eines Users auflisten: | ||
+ | <code bash> | ||
+ | |||
+ | sqoop list-tables --connect jdbc: | ||
+ | |||
+ | </ | ||
+ | |||
+ | |||
+ | Ein einfachen Import : | ||
+ | |||
+ | <code bash> | ||
+ | # | ||
+ | |||
+ | hdfs dfs -mkdir | ||
+ | |||
+ | |||
+ | # Einfachen Import durchführen | ||
+ | |||
+ | sqoop import | ||
+ | --connect jdbc: | ||
+ | --username "sys as sysdba" | ||
+ | --password welcome1 | ||
+ | --table " | ||
+ | -m 1 \ | ||
+ | --target-dir / | ||
+ | |||
+ | </ | ||
+ | |||
+ | |||
+ | Die Daten liegen nun im HDFS vor: | ||
+ | |||
+ | {{: | ||
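
The same check can be done on the command line; the target path below is only an example and has to match the --target-dir used in the import:
<code bash>
# list the import directory and peek at the first rows of the generated file
hdfs dfs -ls /user/oracle/audit
hdfs dfs -cat /user/oracle/audit/part-m-00000 | head
</code>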
+ | |||
+ | |||
+ | |||
+ | ---- | ||
+ | |||
+ | ==== Daten aus Oracle mit sqoop 2 - 1.9 in das HDFS exportieren ==== | ||
+ | |||
+ | Für erste Schritt mit sqoop siehe hier => [[nosql: | ||
+ | |||
+ | |||
+ | Cloudera ( die Basis der Oracle VM) verteilt leider die Werkzeuge sehr stark im Filesystem mit dem Nachteil das erst mühsam die eigentlichen Default Verzeichnisse und Pfade gesucht werden müssen. Zwar läßt sich mit sqoop2 " | ||
+ | |||
+ | Daher zuvor die Umgebung einrichten und manuell gestartet: | ||
+ | |||
+ | <code bash> | ||
+ | export SQOOP_HOME=/ | ||
+ | |||
+ | cd $SQOOP_HOME/ | ||
+ | |||
+ | ./sqoop.sh client | ||
+ | |||
+ | # erster Fehler | ||
+ | Sqoop home directory: / | ||
+ | Error: Could not find or load main class org.apache.sqoop.shell.SqoopShell | ||
+ | |||
+ | # Umgebung anpassen | ||
+ | export CLIENT_LIB=/ | ||
+ | export JAVA_OPTS=" | ||
+ | |||
+ | #nächster Fehler | ||
+ | qoop home directory: / | ||
+ | Sqoop Shell: Type ' | ||
+ | |||
+ | Exception in thread " | ||
+ | |||
+ | </ | ||
+ | |||
+ | Trotz intensiver Fehlersuche ist es mir auf Anhieb nicht gelungen, die Ursache für den ***" | ||
+ | |||
+ | In einer vergleichbaren Cloudera Standard VM tritt der Fehler nicht auf! | ||
+ | |||
+ | Daher sqoop2 neu in der VM aufgesetzt wie in [[nosql: | ||
+ | |||
+ | Gleicher Fehler, das deutet auf die unterschiedlichen Libs von groovy hin, d.h Umgebung Einstellungen sind nicht richtig? | ||
+ | |||
+ | Erneuter Start, diesmal aber in einem sauberen Enviroment: | ||
+ | |||
+ | <code bash> | ||
+ | |||
+ | env -i CLIENT_LIB="/ | ||
+ | |||
+ | Sqoop home directory: / | ||
+ | Sqoop Shell: Type ' | ||
+ | |||
+ | sqoop: | ||
+ | |||
+ | # Classpath im Verdacht? | ||
+ | |||
+ | unset CLASSPATH | ||
+ | export CLIENT_LIB=/ | ||
+ | / | ||
+ | |||
+ | Sqoop home directory: / | ||
+ | Sqoop Shell: Type ' | ||
+ | |||
+ | sqoop: | ||
+ | |||
+ | </ | ||
+ | |||
+ | Funktioniert! | ||
+ | |||
+ | **Lösung: | ||
+ | In der sqoop.sh hardcodiert mit "unset CLASSPATH" | ||
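
A minimal sketch of that workaround; the exact location of sqoop.sh in the Cloudera installation is an assumption and has to match the local setup:
<code bash>
# open the client start script (path is an assumption, see $SQOOP_HOME above)
vi $SQOOP_HOME/bin/sqoop.sh

# first statement after the shebang line:
unset CLASSPATH
</code>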
+ | |||
+ | **Ursache: | ||
+ | Die verschiedenen Versionen der groovy libararis in den verschienden Lib Verzeichnissen der ganzen Tools passen nicht zusammen! | ||
+ | |||
+ | Damit läßt sich nun der Client aufrufen. | ||
+ | |||
+ | Sqoop Server Starten: | ||
+ | |||
+ | <code bash> | ||
+ | sudo / | ||
+ | |||
+ | # testen ob der Server auch läuft: | ||
+ | |||
+ | curl http:// | ||
+ | |||
+ | </ | ||
+ | |||
+ | Am Server anmelden: | ||
+ | |||
+ | <code bash> | ||
+ | |||
+ | sqoop2 | ||
+ | |||
+ | |||
+ | sqoop: | ||
+ | |||
+ | # Den Connect prüfen | ||
+ | |||
+ | sqoop: | ||
+ | |||
+ | client version: | ||
+ | Sqoop 1.99.3-cdh5.0.0 revision Unknown | ||
+ | Compiled by jenkins on Thu Mar 27 23:34:52 PDT 2014 | ||
+ | server version: | ||
+ | Sqoop 1.99.3-cdh5.0.0 revision Unknown | ||
+ | Compiled by jenkins on Thu Mar 27 23:34:52 PDT 2014 | ||
+ | Protocol version: | ||
+ | [1] | ||
+ | |||
+ | </ | ||
+ | |||
+ | Home Directory für den sqoop2 User anlegen: | ||
+ | <code bash> | ||
+ | sudo -u hdfs hdfs dfs -mkdir / | ||
+ | sudo -u hdfs hdfs dfs -chown sqoop2: | ||
+ | </ | ||
+ | |||
+ | |||
+ | Nachdem nun die Umgebung steht kann der erste Job eingerichtet werden, gleichen Job angelegt wie unter [[nosql: | ||
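
As a rough sketch of what such a job definition looks like in the sqoop2 shell of this version (1.99.x, connection-based syntax); the connector and connection ids are assumptions:
<code bash>
sqoop2

# which connectors are available?
sqoop:000> show connector --all

# create a JDBC connection (connector id 1 is an assumption)
sqoop:000> create connection --cid 1

# create an import job on top of that connection, then run and monitor it
sqoop:000> create job --xid 1 --type import
sqoop:000> start job --jid 1
sqoop:000> status job --jid 1
</code>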
+ | |||
+ | |||
+ | === Ergebnis: sqoop ist nicht "SQL Injektion" | ||
+ | |||
+ | In den zu importierenden Daten befand sich ein Name mit folgenden Wert: **' | ||
+ | |||
+ | Sqoop erzeugt daraus das folgende SQL: | ||
+ | < | ||
+ | .GenericJdbcImportExtractor: | ||
+ | < | ||
+ | |||
+ | Und diese SQL ist dann natürlich ungültig! | ||
+ | <code > | ||
+ | |||
+ | org.apache.sqoop.common.SqoopException: | ||
+ | .. | ||
+ | caused by: org.apache.sqoop.common.SqoopException: | ||
+ | .. | ||
+ | Caused by: java.sql.SQLSyntaxErrorException: | ||
+ | ... | ||
+ | </ | ||
+ | |||
+ | Das heißt das sqoop per Default recht anfällig für SQL Injektion Attacken ist! | ||
+ | |||
+ | |||
+ | |||
+ | ---- | ||
+ | |||
+ | ==== Diese AUD$ Daten in der DB wieder als external Table zur Verfügung stellen ===== | ||
+ | |||
+ | ===Eine Hive Tabelle aus den zuvor erzeugen Daten (typ External) erstellen=== | ||
+ | |||
+ | Dabei ergeben sich mit den in Oracle durchaus typischen Spaltennamen wie, **PROCESS# | ||
+ | |||
+ | < | ||
+ | .. | ||
+ | FAILED: ParseException line 3:10 character '#' | ||
+ | .. | ||
+ | FAILED: ParseException line 2:6 cannot recognize input near ' | ||
+ | |||
+ | </ | ||
+ | |||
+ | .-( | ||
+ | |||
+ | |||
+ | Hive Tabelle ohne " | ||
+ | <code sql> | ||
+ | CREATE external TABLE ext_ora_audit_gpi( | ||
+ | SESSIONID Decimal | ||
+ | ,ENTRYID Decimal | ||
+ | ,STATEMENT Decimal | ||
+ | ,TIMESTAMP Date | ||
+ | ,USERID STRING | ||
+ | ,USERHOST STRING | ||
+ | ,TERMINAL STRING | ||
+ | ,ACTION Decimal | ||
+ | ,RETURNCODE Decimal | ||
+ | , | ||
+ | ,OBJ_NAME STRING | ||
+ | , | ||
+ | , | ||
+ | ,NEW_OWNER STRING | ||
+ | ,NEW_NAME STRING | ||
+ | , | ||
+ | ,SES_TID Decimal | ||
+ | , | ||
+ | , | ||
+ | , | ||
+ | , | ||
+ | , | ||
+ | , | ||
+ | ,CLIENTID STRING | ||
+ | ,SPARE1 STRING | ||
+ | ,SPARE2 Decimal | ||
+ | ,OBJ_LABEL Binary | ||
+ | ,SES_LABEL Binary | ||
+ | ,PRIV_USED Decimal | ||
+ | ,SESSIONCPU Decimal | ||
+ | ,NTIMESTAMP Date | ||
+ | ,PROXY_SID Decimal | ||
+ | ,USER_GUID STRING | ||
+ | ,INSTANCE Decimal | ||
+ | ,PROCESS STRING | ||
+ | ,XID Binary | ||
+ | ,AUDITID STRING | ||
+ | ,SCN Decimal | ||
+ | ,DBID Decimal | ||
+ | ,SQLBIND FLOAT | ||
+ | ,SQLSTRING FLOAT | ||
+ | , | ||
+ | ) | ||
+ | ROW FORMAT | ||
+ | DELIMITED FIELDS TERMINATED BY ',' | ||
+ | LINES TERMINATED BY ' | ||
+ | STORED AS TEXTFILE | ||
+ | location '/ | ||
+ | </ | ||
+ | |||
+ | Tipp:: Für das Anlegen der Tabelle den Output aus sqoop Job verwendet und Datentypen TEXT auf String, FloatingPoint auf Float geändert. | ||
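
A quick way to apply exactly that type mapping to the generated DDL could look like this (the file name is the one used in the next step):
<code bash>
# map the sqoop type names to Hive types in the generated DDL
sed -i -e 's/ TEXT/ STRING/g' -e 's/ FloatingPoint/ FLOAT/g' create_aud_gpi_hive_tab.sql
</code>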
+ | |||
+ | Anlegen mit: | ||
+ | <code bash> | ||
+ | hive -f create_aud_gpi_hive_tab.sql | ||
+ | </ | ||
+ | |||
+ | |||
+ | Test: | ||
+ | <code sql> | ||
+ | |||
+ | hive> | ||
+ | |||
+ | 2 ' | ||
+ | 2 ' | ||
+ | 124 ' | ||
+ | 76 ' | ||
+ | |||
+ | -- damit ok! | ||
+ | |||
+ | hive> show tables; | ||
+ | |||
+ | ext_ora_audit_gpi | ||
+ | |||
+ | hive> show databases; | ||
+ | |||
+ | default | ||
+ | |||
+ | </ | ||
+ | |||
+ | |||
+ | === Inmportierte Daten wiederum in der Datenbank mit dem Oracle SQL Connector for Hadoop OSCH | ||
+ | |||
+ | |||
+ | ==User und Directory in der Datenbank vorbereiten: | ||
+ | |||
+ | <code sql> | ||
+ | mkdir / | ||
+ | |||
+ | |||
+ | sqlplus / as sysdba | ||
+ | SSY> | ||
+ | SYS> | ||
+ | |||
+ | SYS> | ||
+ | SYS> | ||
+ | |||
+ | |||
+ | |||
+ | </ | ||
+ | |||
== First test with a CSV external table ==
<code bash>
[oracle@bigdatalite ~]$ hadoop jar $OSCH_HOME/ \
oracle.hadoop.exttab.ExternalTable \
-D oracle.hadoop.exttab.hive.tableName=ext_ora_audit_gpi \
-D oracle.hadoop.exttab.hive.databaseName=default \
-D oracle.hadoop.exttab.tableName=ext_ora_audit_gpi_tab \
-D oracle.hadoop.connection.url=jdbc: \
-D oracle.hadoop.connection.user=scott \
-D oracle.hadoop.exttab.defaultDirectory=HADOOP_EXT_TABS \
-D oracle.hadoop.exttab.dataPaths=hdfs:/ \
-D oracle.hadoop.exttab.columnCount=42 \
-createTable


Oracle SQL Connector for HDFS Release 3.0.0 - Production

Copyright (c) 2011, 2014, Oracle and/or its affiliates. All rights reserved.

[Enter Database Password:]
Aug 30, 2014 6:22:22 PM oracle.hadoop.exttab.Dataset addFileConditionally
WARNING: hdfs://
Aug 30, 2014 6:22:22 PM oracle.hadoop.exttab.Dataset addFileConditionally
WARNING: hdfs://
Aug 30, 2014 6:22:22 PM oracle.hadoop.exttab.Dataset addFileConditionally
WARNING: hdfs://
The create table command succeeded.

CREATE TABLE "
(
...
)
ORGANIZATION EXTERNAL
(
TYPE ORACLE_LOADER
...
) PARALLEL REJECT LIMIT UNLIMITED;

The following location files were created.

osch-20140830062222-4193-1 contains 1 URI, 11723955 bytes

  11723955 hdfs://

osch-20140830062222-4193-2 contains 5 URIs, 431745 bytes

      4293 hdfs://
    317306 hdfs://
       682 hdfs://
      4229 hdfs://
    105235 hdfs://

osch-20140830062222-4193-3 contains 1 URI, 820675 bytes

    820675 hdfs://

osch-20140830062222-4193-4 contains 1 URI, 8075063 bytes

   8075063 hdfs://

</code>
+ | |||
+ | == erster Test mit einer Hive Tabelle == | ||
+ | |||
+ | <code bash> | ||
+ | |||
+ | |||
+ | hadoop jar $OSCH_HOME/ | ||
+ | oracle.hadoop.exttab.ExternalTable | ||
+ | -D oracle.hadoop.exttab.hive.tableName=ext_ora_audit_gpi | ||
+ | -D oracle.hadoop.exttab.hive.databaseName=default | ||
+ | -D oracle.hadoop.exttab.sourceType=hive | ||
+ | -D oracle.hadoop.exttab.tableName=ext_ora_audit_gpi_hive | ||
+ | -D oracle.hadoop.connection.url=jdbc: | ||
+ | -D oracle.hadoop.connection.user=scott | ||
+ | -D oracle.hadoop.exttab.defaultDirectory=HADOOP_EXT_TABS | ||
+ | -createTable | ||
+ | |||
+ | Oracle SQL Connector for HDFS Release 3.0.0 - Production | ||
+ | |||
+ | Copyright (c) 2011, 2014, Oracle and/or its affiliates. All rights reserved. | ||
+ | |||
+ | [Enter Database Password:] | ||
+ | Exception in thread " | ||
+ | at oracle.hadoop.exttab.ExternalTable.createSource(ExternalTable.java: | ||
+ | at oracle.hadoop.exttab.ExternalTable.doCreateTable(ExternalTable.java: | ||
+ | at oracle.hadoop.exttab.ExternalTable.run(ExternalTable.java: | ||
+ | at oracle.hadoop.exttab.ExternalTable.run(ExternalTable.java: | ||
+ | at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java: | ||
+ | at oracle.hadoop.exttab.ExternalTable.main(ExternalTable.java: | ||
+ | at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) | ||
+ | at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java: | ||
+ | at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java: | ||
+ | at java.lang.reflect.Method.invoke(Method.java: | ||
+ | at org.apache.hadoop.util.RunJar.main(RunJar.java: | ||
+ | Caused by: java.lang.ClassNotFoundException: | ||
+ | at java.net.URLClassLoader$1.run(URLClassLoader.java: | ||
+ | at java.net.URLClassLoader$1.run(URLClassLoader.java: | ||
+ | at java.security.AccessController.doPrivileged(Native Method) | ||
+ | at java.net.URLClassLoader.findClass(URLClassLoader.java: | ||
+ | at java.lang.ClassLoader.loadClass(ClassLoader.java: | ||
+ | at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java: | ||
+ | at java.lang.ClassLoader.loadClass(ClassLoader.java: | ||
+ | ... 11 more | ||
+ | |||
+ | |||
+ | # Check Hadoop Classpath | ||
+ | |||
+ | hadoop classpath | ||
+ | |||
+ | / | ||
+ | |||
+ | |||
+ | Wir brauchen die HIVE Jars wie unter / | ||
+ | |||
+ | |||
+ | echo $CLASSPATH | ||
+ | :/ | ||
+ | |||
+ | |||
+ | export | ||
+ | # hat nicht funktioniert!! | ||
+ | |||
+ | classpath im "/ | ||
+ | |||
+ | #nächster Fehler! | ||
+ | |||
+ | oracle.hadoop.exttab.ExternalTableException: | ||
+ | |||
+ | |||
+ | </ | ||
+ | |||
+ | |||
+ | So sollte das eigentlich nicht ausschauen ...... | ||
+ | |||
+ | Hive Klassen fehlen im CLASSPATH, leider klappt das Hinzufügen in der Cloudera Distribution nur über das anpassen des Hadoop "/ | ||
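
A sketch of that adjustment; the file /etc/hadoop/conf/hadoop-env.sh and the Hive library path are assumptions based on a typical CDH 5 layout:
<code bash>
# append the Hive jars to the classpath used by all hadoop commands
cat >> /etc/hadoop/conf/hadoop-env.sh <<'EOF'
export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:/usr/lib/hive/lib/*
EOF

# verify that the jars show up
hadoop classpath | tr ':' '\n' | grep -i hive
</code>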
+ | |||
+ | |||
+ | ==oracle.hadoop.exttab.ExternalTableException: | ||
+ | |||
+ | Dieser Datentyp wird wohl nicht unterstützt! | ||
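
Judging from the Hive DDL above, the likely culprits are the columns declared as Binary (OBJ_LABEL, SES_LABEL, XID); that these are the only offenders is an assumption. A sketch of the adjustment in Hive, after which the -createTable step can be repeated:
<code sql>
-- remap the unsupported binary columns to string
ALTER TABLE ext_ora_audit_gpi CHANGE OBJ_LABEL OBJ_LABEL STRING;
ALTER TABLE ext_ora_audit_gpi CHANGE SES_LABEL SES_LABEL STRING;
ALTER TABLE ext_ora_audit_gpi CHANGE XID       XID       STRING;
</code>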
+ | |||
+ | Tabelle angepasst und nun neu aufgebaut: | ||
+ | |||
<code bash>

[oracle@bigdatalite ~]$ hadoop jar $OSCH_HOME/ \
oracle.hadoop.exttab.ExternalTable \
-D oracle.hadoop.exttab.hive.tableName=ext_ora_audit_gpi \
-D oracle.hadoop.exttab.hive.databaseName=default \
-D oracle.hadoop.exttab.sourceType=hive \
-D oracle.hadoop.exttab.tableName=ext_ora_audit_gpi_hive \
-D oracle.hadoop.connection.url=jdbc: \
-D oracle.hadoop.connection.user=scott \
-D oracle.hadoop.exttab.defaultDirectory=HADOOP_EXT_TABS \
-createTable

Oracle SQL Connector for HDFS Release 3.0.0 - Production

Copyright (c) 2011, 2014, Oracle and/or its affiliates. All rights reserved.

[Enter Database Password:]
...

14/08/30 19:28:46 INFO hive.metastore:
14/08/30 19:28:46 INFO hive.metastore:
Aug 30, 2014 7:28:47 PM oracle.hadoop.exttab.Dataset addFileConditionally
WARNING: hdfs://
Aug 30, 2014 7:28:47 PM oracle.hadoop.exttab.Dataset addFileConditionally
WARNING: hdfs://
Aug 30, 2014 7:28:47 PM oracle.hadoop.exttab.Dataset addFileConditionally
WARNING: hdfs://
The create table command succeeded.

CREATE TABLE "
(
...
)
ORGANIZATION EXTERNAL
(
TYPE ORACLE_LOADER
...
) PARALLEL REJECT LIMIT UNLIMITED;

The following location files were created.

osch-20140830072847-3775-1 contains 1 URI, 11723955 bytes

  11723955 hdfs://

osch-20140830072847-3775-2 contains 5 URIs, 431745 bytes

      4293 hdfs://
    317306 hdfs://
       682 hdfs://
      4229 hdfs://
    105235 hdfs://

osch-20140830072847-3775-3 contains 1 URI, 820675 bytes

    820675 hdfs://

osch-20140830072847-3775-4 contains 1 URI, 8075063 bytes

   8075063 hdfs://

</code>
+ | |||
+ | Test mit: | ||
+ | <code sql> | ||
+ | |||
+ | SCOTT> | ||
+ | |||
+ | |||
+ | no rows selected | ||
+ | |||
+ | ... hmmm | ||
+ | |||
+ | # logfiles kontrolliert | ||
+ | |||
+ | EXT_ORA_AUDIT_GPI_HIVE_5396.log: | ||
+ | |||
+ | error processing column TIMESTAMP in row 1 for datafile / | ||
+ | ORA-01841: (full) year must be between -4713 and +9999, and not be 0 | ||
+ | |||
+ | Hier passt das Datum nicht und wird nicht richtig geparst! | ||
+ | |||
+ | |||
+ | #Ursache: | ||
+ | |||
+ | Durch den Import per scoop nach dem HDFS und das lesen der Daten aus einer external Hive Table passt die Konvertierung von Null Values nicht mehr! | ||
+ | |||
+ | => " | ||
+ | |||
+ | Der null Values ist aber der wert ' | ||
+ | |||
+ | Richtig ist daher: | ||
+ | |||
+ | => " | ||
+ | |||
+ | </ | ||
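
What such a correction can look like in the access parameters of the generated external table is sketched below; the date mask and the literal 'null' are assumptions derived from the error above, not the exact values used here:
<code sql>
-- excerpt from the field list of the ORACLE_LOADER access parameters:
-- parse the timestamp with an explicit mask and treat the string 'null' as NULL
"TIMESTAMP" CHAR DATE_FORMAT DATE MASK "YYYY-MM-DD HH24:MI:SS"
            NULLIF "TIMESTAMP"='null'
</code>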
+ | |||
+ | |||
+ | |||
+ | |||
+ | ---- | ||
+ | |||
+ | ==== Mit dem Oracle Loader for Hadoop – OLH vom Hadoop Daten einlesen ==== | ||
+ | |||
+ | |||
+ | Ziel ist es Daten aus dem Hadoop Cluster aus einer Hive Table über das Oracle DataPump External Fileformat in die Oracle Datenbank zu importieren. | ||
+ | |||
+ | Ablauf: | ||
+ | |||
+ | * Tabelle in Hive mit den Test Daten anlegen | ||
+ | * Tabelle in der Oracle RDBMS anlegen | ||
+ | * Metadaten Definition der Tabelle mit dem OLH Tools erzeugen | ||
+ | * Job Daten für den MapReduce Job für eine Hive Table erzeugen | ||
+ | * Job aufrufen | ||
+ | |||
+ | |||
+ | === Export Tabelle im Hadoop Hive anlegen | ||
+ | |||
+ | <code sql> | ||
+ | # test daten erzeugen und auf das HDFS kopieren | ||
+ | for ((i=1; | ||
+ | do | ||
+ | echo ${i}, | ||
+ | done | ||
+ | |||
+ | hdfs dfs -put hdfs_export.csv / | ||
+ | |||
+ | #hive table anlegen und test Daten dort einlesen | ||
+ | hive | ||
+ | |||
+ | hive> | ||
+ | , wert1 string | ||
+ | , wert2 string) | ||
+ | COMMENT ' | ||
+ | ROW FORMAT | ||
+ | DELIMITED FIELDS TERMINATED BY ',' | ||
+ | LINES TERMINATED BY ' | ||
+ | ; | ||
+ | |||
+ | hive> describe hdfs_export; | ||
+ | |||
+ | hive> load data inpath '/ | ||
+ | |||
+ | hive> select count(*) from hdfs_export; | ||
+ | |||
+ | ... | ||
+ | |||
+ | 10000 | ||
+ | |||
+ | </ | ||
+ | |||
+ | === Import Tabelle im Oracle RDBMS anlegen === | ||
+ | |||
+ | <code sql> | ||
+ | |||
+ | sqlplus scott/tiger | ||
+ | |||
+ | create table hdfs_imports (id number, wert1 varchar2(100), | ||
+ | |||
+ | </ | ||
+ | |||
+ | |||
+ | === Metadaten Definition der Tabelle mit dem OLH Tools erzeugen === | ||
+ | |||
+ | |||
+ | <code bash> | ||
+ | #Check classpath | ||
+ | # we need all jar under the path to the oraloader.jar | ||
+ | # In this VM Enviroment / | ||
+ | echo $CLASSPATH | ||
+ | / | ||
+ | |||
+ | java oracle.hadoop.loader.metadata.OraLoaderMetadata | ||
+ | -user scott \ | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | |||
+ | [Enter Database Password:] | ||
+ | |||
+ | # The metadata files will be generated | ||
+ | |||
+ | cat hdfs_imports_metadata.xml | ||
+ | |||
+ | <?xml version=" | ||
+ | <!-- | ||
+ | Oracle Loader for Hadoop Release 3.0.0 - Production | ||
+ | |||
+ | Copyright (c) 2011, 2014, Oracle and/or its affiliates. All rights reserved. | ||
+ | |||
+ | --> | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | </ | ||
+ | |||
+ | ===Job Daten für den MapReduce Job für eine Hive Table erzeugen=== | ||
+ | |||
+ | |||
+ | Mit einer Job Steuerdatei nach dem Standard Hadoop Methoden eine MapReduce zu erstellen, wird der Job definiert. | ||
+ | |||
+ | <code xml> | ||
+ | <?xml version=" | ||
+ | < | ||
+ | |||
+ | < | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | < | ||
+ | < | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | < | ||
+ | < | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | < | ||
+ | < | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | < | ||
+ | < | ||
+ | </ | ||
+ | |||
+ | |||
+ | < | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | < | ||
+ | < | ||
+ | </ | ||
+ | |||
+ | <!-- Table information --> | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | </ | ||
+ | |||
+ | |||
+ | <!-- Connection information | ||
+ | |||
+ | < | ||
+ | < | ||
+ | < | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | < | ||
+ | < | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | < | ||
+ | < | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | < | ||
+ | < | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | < | ||
+ | < | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | </ | ||
+ | |||
+ | </ | ||
+ | |||
+ | |||
+ | </ | ||
+ | |||
+ | In der Dokumentation werden in der Job Konfiguration einige " | ||
+ | <code bash> | ||
+ | 4/09/20 19:22:53 INFO Configuration.deprecation: | ||
+ | </ | ||
+ | |||
+ | Falls diese Meldungen stört, einfach die " | ||
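
The deprecation warning itself prints the new name to use; one typical Hadoop 2 renaming, shown here with a made-up output path, is mapred.output.dir becoming mapreduce.output.fileoutputformat.outputdir:
<code xml>
<!-- old (deprecated) name -->
<property>
  <name>mapred.output.dir</name>
  <value>/user/oracle/olh_out</value>
</property>

<!-- current name -->
<property>
  <name>mapreduce.output.fileoutputformat.outputdir</name>
  <value>/user/oracle/olh_out</value>
</property>
</code>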
+ | |||
+ | |||
+ | ===Job aufrufen=== | ||
+ | |||
+ | |||
+ | <code bash> | ||
+ | hadoop jar / | ||
+ | </ | ||
+ | |||
+ | Über die Hadoop Job Oberfläche prüfen ob alles klappt. | ||
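
A quick command-line alternative to the web UI:
<code bash>
# list the running MapReduce jobs / YARN applications
mapred job -list
yarn application -list

# show the details of one job (the id is an example)
yarn application -status application_1410000000000_0001
</code>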
+ | |||
+ | Erster und folgende weitere Fehler, Klassen fehlen im Pfad: | ||
+ | < | ||
+ | Error: java.lang.ClassNotFoundException: | ||
+ | Error: java.lang.ClassNotFoundException: | ||
+ | Error: java.lang.ClassNotFoundException: | ||
+ | Error: java.lang.ClassNotFoundException: | ||
+ | </ | ||
+ | |||
+ | Nach ein Paar Testläufen ergibt sich als Lösung diese Klassen mit **-libjars** beim Aufruf übergeben: | ||
+ | <code bash> | ||
+ | hadoop jar / | ||
+ | -conf hdfs_job_config.xml | ||
+ | -libjars / | ||
+ | </ | ||
+ | |||
+ | Mit der obigen Konfiguration finden sich die Dateien im HDFS nun hier: | ||
+ | |||
+ | <code bash> | ||
+ | hdfs dfs -ls / | ||
+ | |||
+ | Found 4 items | ||
+ | -rw-r--r-- | ||
+ | drwxr-xr-x | ||
+ | -rw-r--r-- | ||
+ | -rw-r--r-- | ||
+ | |||
+ | </ | ||
+ | |||
+ | Diese Datein können nun als Datenquelle für eine external Table über das normale Filesystem auf einen anderen Rechner dienen: | ||
+ | |||
+ | <code sql> | ||
+ | |||
+ | # copy the files to the local Filesystem | ||
+ | # The used Oracle directory object HADOOP_EXT_TABS points to this location! | ||
+ | |||
+ | hdfs dfs -get / | ||
+ | hdfs dfs -get / | ||
+ | |||
+ | sqlplus scott/tiger | ||
+ | |||
+ | create table hdfs_import_local (id number | ||
+ | , wert1 varchar2(100) | ||
+ | , wert2 varchar2(100) | ||
+ | ) | ||
+ | | ||
+ | TYPE ORACLE_DATAPUMP | ||
+ | | ||
+ | | ||
+ | ) | ||
+ | / | ||
+ | |||
+ | select count(*) from hdfs_import_local; | ||
+ | |||
+ | COUNT(*) | ||
+ | ------------ | ||
+ | 10000 | ||
+ | </ | ||
+ | |||
+ | tip: | ||
+ | Ein SQL Script als Vorlage findet sich unter / | ||
+ | <code bash> | ||
+ | |||
+ | dfs -cat / | ||
+ | |||
+ | |||
+ | --Oracle Loader for Hadoop Release 3.0.0 - Production | ||
+ | -- | ||
+ | --Copyright (c) 2011, 2014, Oracle and/or its affiliates. All rights reserved. | ||
+ | -- | ||
+ | --Generated by DataPumpOutputFormat | ||
+ | -- | ||
+ | --CREATE OR REPLACE DIRECTORY OLH_EXTTAB_DIR AS ' | ||
+ | --GRANT READ, WRITE ON DIRECTORY OLH_EXTTAB_DIR TO SCOTT; | ||
+ | -- | ||
+ | --ALTER SESSION ENABLE PARALLEL DML; | ||
+ | --INSERT /*+ APPEND */ INTO " | ||
+ | -- | ||
+ | CREATE TABLE " | ||
+ | ( | ||
+ | " | ||
+ | " | ||
+ | " | ||
+ | ) | ||
+ | ORGANIZATION EXTERNAL | ||
+ | (TYPE ORACLE_DATAPUMP | ||
+ | | ||
+ | | ||
+ | ( | ||
+ | ' | ||
+ | ' | ||
+ | ) | ||
+ | ); | ||
+ | |||
+ | </ | ||
+ | |||
+ | |||
+ | Oracle Dokumentation Release 3 => http:// | ||
+ | |||
+ | |||
+ | ==== Quellen ==== | ||
+ | |||
+ | * http:// | ||
+ | |||
+ | External Table: | ||
+ | |||
+ | * https:// | ||
+ | * https:// | ||
+ | * https:// | ||
+ | |||
+ | |||
+ | |||
+ | Spark: | ||
+ | * http:// | ||
nosql/oracle_hadoop_big_data_appl_erste_schritte.txt · Zuletzt geändert: 2014/09/20 21:28 von gpipperr