nosql:oracle_hadoop_big_data_appl_erste_schritte [2014/09/20 21:23]
gpipperr [BigDataLite VM von Oracle]
nosql:oracle_hadoop_big_data_appl_erste_schritte [2014/09/20 21:28] (current)
gpipperr [Mit dem Oracle Loader for Hadoop – OLH vom Hadoop Daten einlesen]
 +===== First Steps with the Oracle Big Data Appliance =====
 +
 +==== The BigDataLite VM from Oracle ====
 +
 +An Oracle VM on the topic of BigData can be found here: \\
 +=> http://www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html
 +
 +
 +To use the VM in VMware:\\
 +=>  [[nosql:oracle_bigdatalite_vm_vmware|Importing the Oracle BigDataLite 3.0 VM into VMware]]
 +
 +
 +For remote access to HDFS, all services must listen on the network interface and not on localhost.
 +
 +To do this, adjust /etc/hosts: remove the machine name from the localhost entry and add a dedicated entry for it:
 +<code bash>
 +127.0.0.1    localhost localhost.localdomain localhost4 localhost4.localdomain4
 +10.10.10.12  bigdatalite bigdatalite.localdomain
 +</code>
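Whether the mapping is correct can be checked with a small helper; `check_hosts` is a made-up name, and this is only a sketch of the rule described above:

```shell
# check_hosts NAME FILE: fail if NAME is still mapped to 127.0.0.1 in FILE
check_hosts() {
  if grep -E '^127\.0\.0\.1' "$2" | grep -qw "$1"; then
    echo "ERROR: $1 is still mapped to 127.0.0.1"
    return 1
  fi
  echo "OK: $1 is not bound to localhost"
}
```

Usage on the VM would be e.g. `check_hosts bigdatalite /etc/hosts`.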
 +
 +And disable the firewall (only in a test environment, of course .-) )!
 +<code bash>
 +# show the current rules
 +iptables -L
 +
 +# stop the firewall and disable it at boot
 +service iptables stop
 +chkconfig iptables off
 +
 +# verify the runlevel settings
 +chkconfig --list iptables
 +</code>
 +
 +----
 +
 +==== Exporting data from Oracle into HDFS with sqoop 1.4 ====
 +
 +Goal: load the AUD$ table into HDFS.
 +
 +
 +Test the database connection:
 +<code bash>
 +
 +sqoop list-databases --connect jdbc:oracle:thin:@//10.10.10.12:1521/orcl --username system --password welcome1
 +
 +</code>
 +
 +
 +List the tables of a user:
 +<code bash>
 +
 +sqoop list-tables --connect jdbc:oracle:thin:@//10.10.10.12:1521/orcl --username "sys as sysdba" --password welcome1
 +
 +</code>
 +
 +
 +A simple import:
 +
 +<code bash>
 +# create a directory for the data in HDFS
 +
 +hdfs dfs -mkdir  /tmp/auditVault
 +
 +
 +# run a simple import
 +
 +sqoop import                                        \
 +--connect jdbc:oracle:thin:@//10.10.10.12:1521/orcl \
 +--username "sys as sysdba"                          \
 +--password welcome1                                 \
 +--table "AUD\$"                                     \
 +-m 1                                                \
 +--target-dir /tmp/auditVault/aud_orcl
 +
 +</code>
 +
 +
 +The data is now available in HDFS:
 +
 +{{:nosql:oraclebigdatavm_v01.png?500|Aud$ im HDFS }}
 +
 +
 +
 +----
 +
 +==== Exporting data from Oracle into HDFS with sqoop 2 (1.99) ====
 +
 +For first steps with sqoop see => [[nosql:sqoop_erste_schritte|First steps with Apache Sqoop 2 - v1.99]]
 +
 +
 +Cloudera (the basis of the Oracle VM) unfortunately scatters the tools widely across the filesystem, with the drawback that the actual default directories and paths first have to be tracked down laboriously. sqoop2 can "somehow" be started, but at first only with errors.
 +
 +Therefore, first set up the environment and start the client manually:
 +
 +<code bash>
 +export SQOOP_HOME=/usr/lib/sqoop2
 +
 +cd $SQOOP_HOME/bin
 +
 +./sqoop.sh client
 +
 +# first error
 +Sqoop home directory: /usr/lib/sqoop2
 +Error: Could not find or load main class org.apache.sqoop.shell.SqoopShell
 +
 +# adjust the environment
 +export CLIENT_LIB=/usr/lib/sqoop2/client-lib
 +export JAVA_OPTS="$JAVA_OPTS -Dsqoop.config.dir=/etc/sqoop2/conf"
 +
 +# next error
 +Sqoop home directory: /usr/lib/sqoop2
 +Sqoop Shell: Type 'help' or '\h' for help.
 +
 +Exception in thread "main" groovy.lang.MissingPropertyException: No such property: terminal for class: jline.Terminal
 +
 +</code>
 +
 +Despite intensive troubleshooting, I did not manage to find the cause of the **"Exception in thread "main" groovy.lang.MissingPropertyException: No such property: terminal for class: jline.Terminal"** right away.
 +
 +The error does not occur in a comparable Cloudera standard VM!
 +
 +Therefore sqoop2 was set up freshly in the VM, as described in [[nosql:sqoop_erste_schritte|First steps with Apache Sqoop 2 - v1.99]].
 +
 +Same error; this points to the different Groovy libs, i.e. the environment settings are not correct.
 +
 +Another start, this time in a clean environment:
 +
 +<code bash>
 +
 +env -i CLIENT_LIB="/usr/lib/sqoop2/client-lib" /usr/lib/sqoop2/bin/sqoop.sh client
 +
 +Sqoop home directory: /usr/lib/sqoop2
 +Sqoop Shell: Type 'help' or '\h' for help.
 +
 +sqoop:000>
 +
 +# is the CLASSPATH the culprit?
 +
 +unset CLASSPATH
 +export CLIENT_LIB=/usr/lib/sqoop2/client-lib
 +/usr/lib/sqoop2/bin/sqoop.sh client
 +
 +Sqoop home directory: /usr/lib/sqoop2
 +Sqoop Shell: Type 'help' or '\h' for help.
 +
 +sqoop:000>
 +
 +</code>
 +
 +It works!
 +
 +**Solution:**\\
 +Hard-coded "unset CLASSPATH" in sqoop.sh to clear the classpath, and "export CLIENT_LIB=/usr/lib/sqoop2/client-lib" to pull in the correct libs for sqoop!
 +
 +**Cause:** \\
 +The different versions of the Groovy libraries in the various lib directories of all the tools do not match!
 +
 +With that, the client can now be started.
 +
 +Start the sqoop server:
 +
 +<code bash>
 +sudo /sbin/service sqoop2-server start
 +
 +# check whether the server is actually running:
 +
 +curl http://localhost:12000/sqoop/version
 +
 +</code>
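The curl check only succeeds once the server is fully up; a small retry helper makes it less racy. The function name `wait_for` is our own, this is just a sketch:

```shell
# wait_for CMD TRIES: retry CMD up to TRIES times, one second apart
wait_for() {
  local i
  for ((i = 0; i < $2; i++)); do
    eval "$1" && return 0
    sleep 1
  done
  return 1
}
```

Usage: `wait_for "curl -fs http://localhost:12000/sqoop/version >/dev/null" 10`.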
 +
 +Connect to the server:
 +
 +<code bash>
 +
 +sqoop2
 +
 +
 +sqoop:000> set server --host localhost
 +
 +# verify the connection
 +
 +sqoop:000> show version --all
 +
 +client version:
 +  Sqoop 1.99.3-cdh5.0.0 revision Unknown
 +  Compiled by jenkins on Thu Mar 27 23:34:52 PDT 2014
 +server version:
 +  Sqoop 1.99.3-cdh5.0.0 revision Unknown
 +  Compiled by jenkins on Thu Mar 27 23:34:52 PDT 2014
 +Protocol version:
 +  [1]
 +
 +</code>
 +
 +Create a home directory for the sqoop2 user:
 +<code bash>
 +sudo -u hdfs hdfs dfs -mkdir /user/sqoop2
 +sudo -u hdfs hdfs dfs -chown sqoop2:sqoop2  /user/sqoop2
 +</code>
 +
 +
 +Now that the environment is in place, the first job can be set up; the same job was created as in [[nosql:sqoop_erste_schritte|First steps with Apache Sqoop 2 - v1.99]].
 +
 +
 +=== Finding: sqoop is not "SQL injection" safe ===
 +
 +The data to be imported contained a name with the value **'TESTER**; note the single quote at the beginning!
 +
 +Sqoop generates the following SQL from it:
 +<code>
 +.GenericJdbcImportExtractor: Using query: SELECT * FROM SYSTEM.AUD$ WHERE ''TESTER' <= USERID AND USERID < '+����'
 +</code>
 +
 +And this SQL is of course invalid!
 +<code>
 +
 +org.apache.sqoop.common.SqoopException: MAPRED_EXEC_0017:Error occurs during extractor run
 +..
 +caused by: org.apache.sqoop.common.SqoopException: GENERIC_JDBC_CONNECTOR_0002:Unable to execute the SQL statement
 +..
 +Caused by: java.sql.SQLSyntaxErrorException: ORA-00920: invalid relational operator
 +...
 +</code>
 +
 +This means that sqoop is, by default, quite vulnerable to SQL injection attacks!
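Sqoop pastes the boundary values into the WHERE clause as plain string literals. The textbook defense (which evidently is not applied here) is to use bind variables, or at minimum to double every single quote inside the value. A tiny sketch; the helper name `ora_quote` is made up:

```shell
# ora_quote VALUE: double every single quote so the value is safe
# inside an Oracle string literal ('TESTER becomes ''TESTER)
ora_quote() {
  printf '%s' "$1" | sed "s/'/''/g"
}
```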
 +
 +
 +
 +----
 +
 +==== Making these AUD$ data available again in the database as an external table ====
 +
 +=== Creating a Hive table (type external) from the previously generated data ===
 +
 +Column names that are quite typical in Oracle, such as **PROCESS#** and **OBJ$NAME**, immediately cause the first problems:
 +
 +<code>
 +..
 +FAILED: ParseException line 3:10 character '#' not supported here
 +..
 +FAILED: ParseException line 2:6 cannot recognize input near '$' 'NAME' 'Text' in column type
 +
 +</code>
 +
 +.-(
 +
 +
 +Hive table created without "special characters" (create_aud_gpi_hive_tab.sql):
 +<code sql>
 +CREATE external TABLE ext_ora_audit_gpi(
 +SESSIONID Decimal 
 +,ENTRYID Decimal 
 +,STATEMENT Decimal 
 +,TIMESTAMP Date 
 +,USERID STRING 
 +,USERHOST STRING 
 +,TERMINAL STRING 
 +,ACTION Decimal 
 +,RETURNCODE Decimal 
 +,OBJ_CREATOR STRING 
 +,OBJ_NAME STRING 
 +,AUTH_PRIVILEGES STRING 
 +,AUTH_GRANTEE STRING 
 +,NEW_OWNER STRING 
 +,NEW_NAME STRING 
 +,SES_ACTIONS STRING 
 +,SES_TID Decimal 
 +,LOGOFF_LREAD Decimal 
 +,LOGOFF_PREAD Decimal 
 +,LOGOFF_LWRITE Decimal 
 +,LOGOFF_DEAD Decimal 
 +,LOGOFF_TIME Date 
 +,COMMENT_STRING STRING 
 +,CLIENTID STRING 
 +,SPARE1 STRING 
 +,SPARE2 Decimal 
 +,OBJ_LABEL Binary 
 +,SES_LABEL Binary 
 +,PRIV_USED Decimal 
 +,SESSIONCPU Decimal 
 +,NTIMESTAMP Date 
 +,PROXY_SID Decimal 
 +,USER_GUID STRING 
 +,INSTANCE Decimal 
 +,PROCESS STRING 
 +,XID Binary 
 +,AUDITID STRING 
 +,SCN Decimal 
 +,DBID Decimal 
 +,SQLBIND FLOAT  
 +,SQLSTRING FLOAT  
 +,OBJ_EDITION STRING  
 +)
 +ROW FORMAT 
 +DELIMITED FIELDS TERMINATED BY ',' 
 +LINES TERMINATED BY '\n' 
 +STORED AS TEXTFILE 
 +location '/tmp/auditGPI';
 +</code>
 +
 +Tip: To create the table, the output of the sqoop job was used, changing the data types TEXT to STRING and FloatingPoint to FLOAT.
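The type renaming from the tip can be captured in a small helper; the function name `map_type` is ours, only the TEXT→STRING and FloatingPoint→FLOAT mappings come from the text above:

```shell
# map_type SQOOP_TYPE: print the Hive column type to use instead
map_type() {
  case "$1" in
    TEXT)          echo "STRING" ;;
    FloatingPoint) echo "FLOAT"  ;;
    *)             echo "$1"     ;;  # everything else is kept as-is
  esac
}
```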
 +
 +Create it with:
 +<code bash>
 +hive -f create_aud_gpi_hive_tab.sql
 +</code>
 +
 +
 +Test:
 +<code sql>
 +
 +hive>select count(*),userid from ext_ora_audit_gpi group by userid;
 +
 +2       'ADF'
 +2       'ADMINUSER'
 +124     'ADMUSER'
 +76      'ADMUSER83'
 +
 +-- that looks ok!
 +
 +hive> show tables;
 +
 +ext_ora_audit_gpi
 +
 +hive> show databases;
 +
 +default
 +
 +</code>
 +
 +
 +=== Exposing the imported data in the database again with the Oracle SQL Connector for Hadoop (OSCH) ===
 +
 +
 +==Prepare user and directory in the database:==
 +
 +<code sql>
 +mkdir /home/oracle/hadoop_ext_tab
 +
 +
 +sqlplus / as sysdba
 +SYS>create or replace directory HADOOP_EXT_TABS as '/home/oracle/hadoop_ext_tab';
 +SYS>grant read,write on directory HADOOP_EXT_TABS to scott;
 +
 +SYS>grant EXECUTE on directory OSCH_BIN_PATH to scott;
 +SYS>grant read,write on directory OSCH_BIN_PATH to scott;
 +
 +
 +
 +</code>
 +
 +==First test with a CSV external table:==
 +<code bash>
 +[oracle@bigdatalite ~]$ hadoop jar $OSCH_HOME/jlib/orahdfs.jar               \
 +oracle.hadoop.exttab.ExternalTable                                           \
 +-D oracle.hadoop.exttab.hive.tableName=ext_ora_audit_gpi                     \
 +-D oracle.hadoop.exttab.hive.databaseName=default                            \
 +-D oracle.hadoop.exttab.tableName=ext_ora_audit_gpi_tab                      \
 +-D oracle.hadoop.connection.url=jdbc:oracle:thin:@//localhost:1521/orcl      \
 +-D oracle.hadoop.connection.user=scott                                       \
 +-D oracle.hadoop.exttab.defaultDirectory=HADOOP_EXT_TABS                     \
 +-D oracle.hadoop.exttab.dataPaths=hdfs:/tmp/auditGPI/                        \
 +-D oracle.hadoop.exttab.columnCount=42                                       \
 +-createTable
 +
 +
 +
 +
 +Oracle SQL Connector for HDFS Release 3.0.0 - Production
 +
 +Copyright (c) 2011, 2014, Oracle and/or its affiliates. All rights reserved.
 +
 +[Enter Database Password:]
 +Aug 30, 2014 6:22:22 PM oracle.hadoop.exttab.Dataset addFileConditionally
 +WARNING: hdfs://bigdatalite.localdomain:8020/tmp/auditGPI/_SUCCESS has 0 length. Not being included in dataset
 +Aug 30, 2014 6:22:22 PM oracle.hadoop.exttab.Dataset addFileConditionally
 +WARNING: hdfs://bigdatalite.localdomain:8020/tmp/auditGPI/part-m-00004 has 0 length. Not being included in dataset
 +Aug 30, 2014 6:22:22 PM oracle.hadoop.exttab.Dataset addFileConditionally
 +WARNING: hdfs://bigdatalite.localdomain:8020/tmp/auditGPI/part-m-00005 has 0 length. Not being included in dataset
 +The create table command succeeded.
 +
 +CREATE TABLE "SCOTT"."EXT_ORA_AUDIT_GPI_TAB"
 +(
 + "C1"                             VARCHAR2(4000),
 + "C2"                             VARCHAR2(4000),
 + "C3"                             VARCHAR2(4000),
 + "C4"                             VARCHAR2(4000),
 + "C5"                             VARCHAR2(4000),
 + "C6"                             VARCHAR2(4000),
 + "C7"                             VARCHAR2(4000),
 + "C8"                             VARCHAR2(4000),
 + "C9"                             VARCHAR2(4000),
 + "C10"                            VARCHAR2(4000),
 + "C11"                            VARCHAR2(4000),
 + "C12"                            VARCHAR2(4000),
 + "C13"                            VARCHAR2(4000),
 + "C14"                            VARCHAR2(4000),
 + "C15"                            VARCHAR2(4000),
 + "C16"                            VARCHAR2(4000),
 + "C17"                            VARCHAR2(4000),
 + "C18"                            VARCHAR2(4000),
 + "C19"                            VARCHAR2(4000),
 + "C20"                            VARCHAR2(4000),
 + "C21"                            VARCHAR2(4000),
 + "C22"                            VARCHAR2(4000),
 + "C23"                            VARCHAR2(4000),
 + "C24"                            VARCHAR2(4000),
 + "C25"                            VARCHAR2(4000),
 + "C26"                            VARCHAR2(4000),
 + "C27"                            VARCHAR2(4000),
 + "C28"                            VARCHAR2(4000),
 + "C29"                            VARCHAR2(4000),
 + "C30"                            VARCHAR2(4000),
 + "C31"                            VARCHAR2(4000),
 + "C32"                            VARCHAR2(4000),
 + "C33"                            VARCHAR2(4000),
 + "C34"                            VARCHAR2(4000),
 + "C35"                            VARCHAR2(4000),
 + "C36"                            VARCHAR2(4000),
 + "C37"                            VARCHAR2(4000),
 + "C38"                            VARCHAR2(4000),
 + "C39"                            VARCHAR2(4000),
 + "C40"                            VARCHAR2(4000),
 + "C41"                            VARCHAR2(4000),
 + "C42"                            VARCHAR2(4000)
 +)
 +ORGANIZATION EXTERNAL
 +(
 +   TYPE ORACLE_LOADER
 +   DEFAULT DIRECTORY "HADOOP_EXT_TABS"
 +   ACCESS PARAMETERS
 +   (
 +     RECORDS DELIMITED BY 0X'0A'
 +     CHARACTERSET AL32UTF8
 +     PREPROCESSOR "OSCH_BIN_PATH":'hdfs_stream'
 +     FIELDS TERMINATED BY 0X'2C'
 +     MISSING FIELD VALUES ARE NULL
 +     (
 +       "C1" CHAR(4000),
 +       "C2" CHAR(4000),
 +       "C3" CHAR(4000),
 +       "C4" CHAR(4000),
 +       "C5" CHAR(4000),
 +       "C6" CHAR(4000),
 +       "C7" CHAR(4000),
 +       "C8" CHAR(4000),
 +       "C9" CHAR(4000),
 +       "C10" CHAR(4000),
 +       "C11" CHAR(4000),
 +       "C12" CHAR(4000),
 +       "C13" CHAR(4000),
 +       "C14" CHAR(4000),
 +       "C15" CHAR(4000),
 +       "C16" CHAR(4000),
 +       "C17" CHAR(4000),
 +       "C18" CHAR(4000),
 +       "C19" CHAR(4000),
 +       "C20" CHAR(4000),
 +       "C21" CHAR(4000),
 +       "C22" CHAR(4000),
 +       "C23" CHAR(4000),
 +       "C24" CHAR(4000),
 +       "C25" CHAR(4000),
 +       "C26" CHAR(4000),
 +       "C27" CHAR(4000),
 +       "C28" CHAR(4000),
 +       "C29" CHAR(4000),
 +       "C30" CHAR(4000),
 +       "C31" CHAR(4000),
 +       "C32" CHAR(4000),
 +       "C33" CHAR(4000),
 +       "C34" CHAR(4000),
 +       "C35" CHAR(4000),
 +       "C36" CHAR(4000),
 +       "C37" CHAR(4000),
 +       "C38" CHAR(4000),
 +       "C39" CHAR(4000),
 +       "C40" CHAR(4000),
 +       "C41" CHAR(4000),
 +       "C42" CHAR(4000)
 +     )
 +   )
 +   LOCATION
 +   (
 +     'osch-20140830062222-4193-1',
 +     'osch-20140830062222-4193-2',
 +     'osch-20140830062222-4193-3',
 +     'osch-20140830062222-4193-4'
 +   )
 +) PARALLEL REJECT LIMIT UNLIMITED;
 +
 +The following location files were created.
 +
 +osch-20140830062222-4193-1 contains 1 URI, 11723955 bytes
 +
 +    11723955 hdfs://bigdatalite.localdomain:8020/tmp/auditGPI/part-m-00003
 +
 +osch-20140830062222-4193-2 contains 5 URIs, 431745 bytes
 +
 +        4293 hdfs://bigdatalite.localdomain:8020/tmp/auditGPI/part-m-00007
 +      317306 hdfs://bigdatalite.localdomain:8020/tmp/auditGPI/part-m-00001
 +         682 hdfs://bigdatalite.localdomain:8020/tmp/auditGPI/part-m-00002
 +        4229 hdfs://bigdatalite.localdomain:8020/tmp/auditGPI/part-m-00008
 +      105235 hdfs://bigdatalite.localdomain:8020/tmp/auditGPI/part-m-00000
 +
 +osch-20140830062222-4193-3 contains 1 URI, 820675 bytes
 +
 +      820675 hdfs://bigdatalite.localdomain:8020/tmp/auditGPI/part-m-00006
 +
 +osch-20140830062222-4193-4 contains 1 URI, 8075063 bytes
 +
 +     8075063 hdfs://bigdatalite.localdomain:8020/tmp/auditGPI/part-m-00009
 +
 +
 +
 +</code>
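The `oracle.hadoop.exttab.columnCount=42` above has to match the data files. One quick way to derive it from a sample line is to count the comma-separated fields; the helper name `col_count` is ours:

```shell
# col_count: read the first line of stdin and count the ','-separated fields
col_count() {
  head -n 1 | awk -F',' '{print NF}'
}
```

Usage: `hdfs dfs -cat /tmp/auditGPI/part-m-00003 | col_count`.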
 +
 +== First test with a Hive table ==
 +
 +<code bash>
 +
 +
 +hadoop jar $OSCH_HOME/jlib/orahdfs.jar \
 +oracle.hadoop.exttab.ExternalTable   \
 +-D oracle.hadoop.exttab.hive.tableName=ext_ora_audit_gpi                    \
 +-D oracle.hadoop.exttab.hive.databaseName=default                            \
 +-D oracle.hadoop.exttab.sourceType=hive                                      \
 +-D oracle.hadoop.exttab.tableName=ext_ora_audit_gpi_hive                      \
 +-D oracle.hadoop.connection.url=jdbc:oracle:thin:@//localhost:1521/orcl      \
 +-D oracle.hadoop.connection.user=scott                                       \
 +-D oracle.hadoop.exttab.defaultDirectory=HADOOP_EXT_TABS                     \
 +-createTable       
 +
 +Oracle SQL Connector for HDFS Release 3.0.0 - Production
 +
 +Copyright (c) 2011, 2014, Oracle and/or its affiliates. All rights reserved.
 +
 +[Enter Database Password:]
 +Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hive/metastore/api/MetaException
 +        at oracle.hadoop.exttab.ExternalTable.createSource(ExternalTable.java:219)
 +        at oracle.hadoop.exttab.ExternalTable.doCreateTable(ExternalTable.java:2024)
 +        at oracle.hadoop.exttab.ExternalTable.run(ExternalTable.java:3031)
 +        at oracle.hadoop.exttab.ExternalTable.run(ExternalTable.java:3112)
 +        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 +        at oracle.hadoop.exttab.ExternalTable.main(ExternalTable.java:3225)
 +        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 +        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 +        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 +        at java.lang.reflect.Method.invoke(Method.java:606)
 +        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
 +Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.metastore.api.MetaException
 +        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
 +        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 +        at java.security.AccessController.doPrivileged(Native Method)
 +        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 +        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
 +        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
 +        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
 +        ... 11 more
 +
 +
 +# Check Hadoop Classpath
 +
 +hadoop classpath
 +
 +/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*:/u01/connectors/olh/jlib/*:/etc/hive/conf:/u01/connectors/osch/jlib/*:/u01/app/oracle/product/12.1.0/dbhome_1/jdbc/lib/*:/u01/nosql/kv-ee/lib/kvstore.jar
 +
 +
 +# We also need the Hive jars, e.g. /usr/lib/hive/lib/hive-metastore.jar!
 +
 +
 +echo $CLASSPATH
 +:/u01/connectors/olh/jlib/*:/usr/lib/hadoop/*:/usr/lib/hadoop/client/*:/u01/nosql/kv-ee/lib/kvstore.jar:.
 +
 +
 +export  CLASSPATH=$CLASSPATH:/usr/lib/hive/lib/*
 +# did not work!!
 +
 +# instead, the classpath was adjusted in "/usr/lib/hadoop/bin/hadoop"
 +
 +# next error!
 +
 +oracle.hadoop.exttab.ExternalTableException: Unsupported Hive column type binary
 +
 +
 +</code>
 +
 +
 +This is really not how it should look ......
 +
 +The Hive classes are missing from the CLASSPATH; unfortunately, in the Cloudera distribution adding them only worked by editing the Hadoop "/usr/lib/hadoop/bin/hadoop" script.
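Before editing the launcher script itself, it may be worth trying the `HADOOP_CLASSPATH` environment variable, which the stock hadoop script honors. A small helper that appends an entry without creating duplicates; the function name `append_cp` is our own:

```shell
# append_cp CLASSPATH ENTRY: print CLASSPATH with ENTRY appended,
# unless ENTRY is already contained in it
append_cp() {
  case ":$1:" in
    *":$2:"*) echo "$1" ;;
    *)        echo "${1:+$1:}$2" ;;
  esac
}
```

Usage: `export HADOOP_CLASSPATH=$(append_cp "$HADOOP_CLASSPATH" "/usr/lib/hive/lib/*")`.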
 +
 +
 +==oracle.hadoop.exttab.ExternalTableException: Unsupported Hive column type binary==
 +
 +This data type is apparently not supported!
 +
 +Table adjusted and rebuilt:
 +
 +<code bash>
 +
 +[oracle@bigdatalite ~]$ hadoop jar $OSCH_HOME/jlib/orahdfs.jar               \
 +oracle.hadoop.exttab.ExternalTable                                           \
 +-D oracle.hadoop.exttab.hive.tableName=ext_ora_audit_gpi                     \
 +-D oracle.hadoop.exttab.hive.databaseName=default                            \
 +-D oracle.hadoop.exttab.sourceType=hive                                      \
 +-D oracle.hadoop.exttab.tableName=ext_ora_audit_gpi_hive                     \
 +-D oracle.hadoop.connection.url=jdbc:oracle:thin:@//localhost:1521/orcl      \
 +-D oracle.hadoop.connection.user=scott                                       \
 +-D oracle.hadoop.exttab.defaultDirectory=HADOOP_EXT_TABS                     \
 +-createTable
 +Oracle SQL Connector for HDFS Release 3.0.0 - Production
 +
 +Copyright (c) 2011, 2014, Oracle and/or its affiliates. All rights reserved.
 +
 +[Enter Database Password:]
 +...
 +
 +14/08/30 19:28:46 INFO hive.metastore: Trying to connect to metastore with URI thrift://bigdatalite.localdomain:9083
 +14/08/30 19:28:46 INFO hive.metastore: Connected to metastore.
 +Aug 30, 2014 7:28:47 PM oracle.hadoop.exttab.Dataset addFileConditionally
 +WARNING: hdfs://bigdatalite.localdomain:8020/tmp/auditGPI/_SUCCESS has 0 length. Not being included in dataset
 +Aug 30, 2014 7:28:47 PM oracle.hadoop.exttab.Dataset addFileConditionally
 +WARNING: hdfs://bigdatalite.localdomain:8020/tmp/auditGPI/part-m-00004 has 0 length. Not being included in dataset
 +Aug 30, 2014 7:28:47 PM oracle.hadoop.exttab.Dataset addFileConditionally
 +WARNING: hdfs://bigdatalite.localdomain:8020/tmp/auditGPI/part-m-00005 has 0 length. Not being included in dataset
 +The create table command succeeded.
 +
 +CREATE TABLE "SCOTT"."EXT_ORA_AUDIT_GPI_HIVE"
 +(
 + "SESSIONID"                      NUMBER(10,0),
 + "ENTRYID"                        NUMBER(10,0),
 + "STATEMENT"                      NUMBER(10,0),
 + "TIMESTAMP"                      DATE,
 + "USERID"                         VARCHAR2(4000),
 + "USERHOST"                       VARCHAR2(4000),
 + "TERMINAL"                       VARCHAR2(4000),
 + "ACTION"                         NUMBER(10,0),
 + "RETURNCODE"                     NUMBER(10,0),
 + "OBJ_CREATOR"                    VARCHAR2(4000),
 + "OBJ_NAME"                       VARCHAR2(4000),
 + "AUTH_PRIVILEGES"                VARCHAR2(4000),
 + "AUTH_GRANTEE"                   VARCHAR2(4000),
 + "NEW_OWNER"                      VARCHAR2(4000),
 + "NEW_NAME"                       VARCHAR2(4000),
 + "SES_ACTIONS"                    VARCHAR2(4000),
 + "SES_TID"                        NUMBER(10,0),
 + "LOGOFF_LREAD"                   NUMBER(10,0),
 + "LOGOFF_PREAD"                   NUMBER(10,0),
 + "LOGOFF_LWRITE"                  NUMBER(10,0),
 + "LOGOFF_DEAD"                    NUMBER(10,0),
 + "LOGOFF_TIME"                    DATE,
 + "COMMENT_STRING"                 VARCHAR2(4000),
 + "CLIENTID"                       VARCHAR2(4000),
 + "SPARE1"                         VARCHAR2(4000),
 + "SPARE2"                         NUMBER(10,0),
 + "OBJ_LABEL"                      VARCHAR2(4000),
 + "SES_LABEL"                      VARCHAR2(4000),
 + "PRIV_USED"                      NUMBER(10,0),
 + "SESSIONCPU"                     NUMBER(10,0),
 + "NTIMESTAMP"                     DATE,
 + "PROXY_SID"                      NUMBER(10,0),
 + "USER_GUID"                      VARCHAR2(4000),
 + "INSTANCE"                       NUMBER(10,0),
 + "PROCESS"                        VARCHAR2(4000),
 + "XID"                            VARCHAR2(4000),
 + "AUDITID"                        VARCHAR2(4000),
 + "SCN"                            NUMBER(10,0),
 + "DBID"                           NUMBER(10,0),
 + "SQLBIND"                        NUMBER,
 + "SQLSTRING"                      NUMBER,
 + "OBJ_EDITION"                    VARCHAR2(4000)
 +)
 +ORGANIZATION EXTERNAL
 +(
 +   TYPE ORACLE_LOADER
 +   DEFAULT DIRECTORY "HADOOP_EXT_TABS"
 +   ACCESS PARAMETERS
 +   (
 +     RECORDS DELIMITED BY 0X'0A'
 +     CHARACTERSET AL32UTF8
 +     PREPROCESSOR "OSCH_BIN_PATH":'hdfs_stream'
 +     FIELDS TERMINATED BY 0X'2C'
 +     MISSING FIELD VALUES ARE NULL
 +     (
 +       "SESSIONID" CHAR NULLIF "SESSIONID"=0X'5C4E',
 +       "ENTRYID" CHAR NULLIF "ENTRYID"=0X'5C4E',
 +       "STATEMENT" CHAR NULLIF "STATEMENT"=0X'5C4E',
 +       "TIMESTAMP" CHAR DATE_FORMAT DATE MASK 'YYYY-MM-DD' NULLIF "TIMESTAMP"=0X'5C4E',
 +       "USERID" CHAR(4000) NULLIF "USERID"=0X'5C4E',
 +       "USERHOST" CHAR(4000) NULLIF "USERHOST"=0X'5C4E',
 +       "TERMINAL" CHAR(4000) NULLIF "TERMINAL"=0X'5C4E',
 +       "ACTION" CHAR NULLIF "ACTION"=0X'5C4E',
 +       "RETURNCODE" CHAR NULLIF "RETURNCODE"=0X'5C4E',
 +       "OBJ_CREATOR" CHAR(4000) NULLIF "OBJ_CREATOR"=0X'5C4E',
 +       "OBJ_NAME" CHAR(4000) NULLIF "OBJ_NAME"=0X'5C4E',
 +       "AUTH_PRIVILEGES" CHAR(4000) NULLIF "AUTH_PRIVILEGES"=0X'5C4E',
 +       "AUTH_GRANTEE" CHAR(4000) NULLIF "AUTH_GRANTEE"=0X'5C4E',
 +       "NEW_OWNER" CHAR(4000) NULLIF "NEW_OWNER"=0X'5C4E',
 +       "NEW_NAME" CHAR(4000) NULLIF "NEW_NAME"=0X'5C4E',
 +       "SES_ACTIONS" CHAR(4000) NULLIF "SES_ACTIONS"=0X'5C4E',
 +       "SES_TID" CHAR NULLIF "SES_TID"=0X'5C4E',
 +       "LOGOFF_LREAD" CHAR NULLIF "LOGOFF_LREAD"=0X'5C4E',
 +       "LOGOFF_PREAD" CHAR NULLIF "LOGOFF_PREAD"=0X'5C4E',
 +       "LOGOFF_LWRITE" CHAR NULLIF "LOGOFF_LWRITE"=0X'5C4E',
 +       "LOGOFF_DEAD" CHAR NULLIF "LOGOFF_DEAD"=0X'5C4E',
 +       "LOGOFF_TIME" CHAR DATE_FORMAT DATE MASK 'YYYY-MM-DD' NULLIF "LOGOFF_TIME"=0X'5C4E',
 +       "COMMENT_STRING" CHAR(4000) NULLIF "COMMENT_STRING"=0X'5C4E',
 +       "CLIENTID" CHAR(4000) NULLIF "CLIENTID"=0X'5C4E',
 +       "SPARE1" CHAR(4000) NULLIF "SPARE1"=0X'5C4E',
 +       "SPARE2" CHAR NULLIF "SPARE2"=0X'5C4E',
 +       "OBJ_LABEL" CHAR(4000) NULLIF "OBJ_LABEL"=0X'5C4E',
 +       "SES_LABEL" CHAR(4000) NULLIF "SES_LABEL"=0X'5C4E',
 +       "PRIV_USED" CHAR NULLIF "PRIV_USED"=0X'5C4E',
 +       "SESSIONCPU" CHAR NULLIF "SESSIONCPU"=0X'5C4E',
 +       "NTIMESTAMP" CHAR DATE_FORMAT DATE MASK 'YYYY-MM-DD' NULLIF "NTIMESTAMP"=0X'5C4E',
 +       "PROXY_SID" CHAR NULLIF "PROXY_SID"=0X'5C4E',
 +       "USER_GUID" CHAR(4000) NULLIF "USER_GUID"=0X'5C4E',
 +       "INSTANCE" CHAR NULLIF "INSTANCE"=0X'5C4E',
 +       "PROCESS" CHAR(4000) NULLIF "PROCESS"=0X'5C4E',
 +       "XID" CHAR(4000) NULLIF "XID"=0X'5C4E',
 +       "AUDITID" CHAR(4000) NULLIF "AUDITID"=0X'5C4E',
 +       "SCN" CHAR NULLIF "SCN"=0X'5C4E',
 +       "DBID" CHAR NULLIF "DBID"=0X'5C4E',
 +       "SQLBIND" CHAR NULLIF "SQLBIND"=0X'5C4E',
 +       "SQLSTRING" CHAR NULLIF "SQLSTRING"=0X'5C4E',
 +       "OBJ_EDITION" CHAR(4000) NULLIF "OBJ_EDITION"=0X'5C4E'
 +     )
 +   )
 +   LOCATION
 +   (
 +     'osch-20140830072847-3775-1',
 +     'osch-20140830072847-3775-2',
 +     'osch-20140830072847-3775-3',
 +     'osch-20140830072847-3775-4'
 +   )
 +) PARALLEL REJECT LIMIT UNLIMITED;
 +
 +The following location files were created.
 +
 +osch-20140830072847-3775-1 contains 1 URI, 11723955 bytes
 +
 +    11723955 hdfs://bigdatalite.localdomain:8020/tmp/auditGPI/part-m-00003
 +
 +osch-20140830072847-3775-2 contains 5 URIs, 431745 bytes
 +
 +        4293 hdfs://bigdatalite.localdomain:8020/tmp/auditGPI/part-m-00007
 +      317306 hdfs://bigdatalite.localdomain:8020/tmp/auditGPI/part-m-00001
 +         682 hdfs://bigdatalite.localdomain:8020/tmp/auditGPI/part-m-00002
 +        4229 hdfs://bigdatalite.localdomain:8020/tmp/auditGPI/part-m-00008
 +      105235 hdfs://bigdatalite.localdomain:8020/tmp/auditGPI/part-m-00000
 +
 +osch-20140830072847-3775-3 contains 1 URI, 820675 bytes
 +
 +      820675 hdfs://bigdatalite.localdomain:8020/tmp/auditGPI/part-m-00006
 +
 +osch-20140830072847-3775-4 contains 1 URI, 8075063 bytes
 +
 +     8075063 hdfs://bigdatalite.localdomain:8020/tmp/auditGPI/part-m-00009
 +
 +</code>
 +
 +Test with:
 +<code sql>
 +
 +SCOTT>select count(*),userid from "SCOTT"."EXT_ORA_AUDIT_GPI_HIVE" group by userid;
 +
 +
 +no rows selected
 +
 +... hmmm
 +
 +# checked the log files
 +
 +EXT_ORA_AUDIT_GPI_HIVE_5396.log:ORA-01841: (full) year must be between -4713 and +9999, and not be 0
 +
 +error processing column TIMESTAMP in row 1 for datafile /home/oracle/hadoop_ext_tab/osch-20140830073731-58-2
 +ORA-01841: (full) year must be between -4713 and +9999, and not be 0
 +
 +</code>
 +
 +The date does not match the mask here and is not parsed correctly!
 +
 +**Cause:**\\
 +After the import into HDFS via sqoop and reading the data through the external Hive table, the NULL value conversion no longer matches:
 +
 +=>  "TIMESTAMP" CHAR DATE_FORMAT DATE MASK 'YYYY-MM-DD' NULLIF "TIMESTAMP"=0X'5C4E'
 +
 +0X'5C4E' is \N, but the NULL marker in the data is the string 'null'!
 +
 +Correct is therefore:
 +
 +=> "TIMESTAMP" CHAR DATE_FORMAT DATE MASK 'YYYY-MM-DD' NULLIF "TIMESTAMP"=0X'276E756C6C27' (hex for 'null')!
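The hex literals in the NULLIF clauses are just the byte values of the marker strings, which can be verified with a small helper (the name `to_hex` is ours):

```shell
# to_hex STRING: print STRING as the uppercase hex byte sequence
# used in the 0X'...' literals of the access parameters
to_hex() {
  printf '%s' "$1" | od -An -tx1 | tr -d ' \n' | tr 'a-f' 'A-F'
}
```

`to_hex "'null'"` prints 276E756C6C27, and `to_hex '\N'` prints 5C4E, matching the two literals discussed above.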
 +
 +
 +
 +
 +----
 +
 +==== Loading data from Hadoop with the Oracle Loader for Hadoop (OLH) ====
 +
 +
 +The goal is to import data from a Hive table in the Hadoop cluster into the Oracle database via the Oracle DataPump external file format.
 +
 +Procedure:
 +
 +  * Create the table with the test data in Hive
 +  * Create the table in the Oracle RDBMS
 +  * Generate the metadata definition of the table with the OLH tool
 +  * Generate the job definition for the MapReduce job for a Hive table
 +  * Run the job
 +
 +
 +=== Create the export table in Hive ===
 +
 +<code bash>
 +# generate test data and copy it to HDFS
 +for ((i=1;i<=10000;i++));
 +do
 +  echo ${i},AWert${i},BWert${i} >> hdfs_export.csv
 +done
 +
 +hdfs dfs -put hdfs_export.csv /user/oracle
 +
 +# create the hive table and load the test data into it
 +hive
 +
 +hive>create table hdfs_export (id  BIGINT
 +                               , wert1 string
 +                               , wert2 string)
 +COMMENT 'Oracle OKData.'
 +ROW FORMAT 
 +DELIMITED FIELDS TERMINATED BY ',' 
 +LINES TERMINATED BY '\n' 
 +;   
 +
 +hive> describe hdfs_export;                            
 +
 +hive> load data inpath '/user/oracle/hdfs_export.csv' overwrite into table hdfs_export;
 +
 +hive> select count(*) from hdfs_export;
 +
 +...
 +
 +10000
 +
 +</code>
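The generator loop above can be wrapped in a reusable function so the row count becomes a parameter; `gen_rows` is a made-up name:

```shell
# gen_rows N: emit N csv rows of the form id,AWert<id>,BWert<id>
gen_rows() {
  local i
  for ((i = 1; i <= $1; i++)); do
    echo "${i},AWert${i},BWert${i}"
  done
}
```

Usage: `gen_rows 10000 > hdfs_export.csv`.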
 +
 +=== Create the import table in the Oracle RDBMS ===
 +
 +<code sql>
 +
 +sqlplus scott/tiger
 +
 +create table hdfs_imports (id  number, wert1 varchar2(100), wert2 varchar2(100));
 +
 +</code>
 +
 +
 +=== Generate the metadata definition of the table with the OLH tool ===
 +
 +
 +<code bash>
 +# Check the classpath
 +# we need all jars next to the oraloader.jar
 +# in this VM environment: /u01/connectors/olh/jlib/oraloader.jar
 +echo $CLASSPATH
 +/u01/connectors/olh/jlib/*:/usr/lib/hadoop/*:/usr/lib/hadoop/client/
 +
 +java oracle.hadoop.loader.metadata.OraLoaderMetadata         \
 +   -user scott                                               \
 +   -connection_url jdbc:oracle:thin:@//localhost:1521/orcl   \
 +   -schema scott                                             \
 +   -table hdfs_imports                                       \
 +   -output hdfs_imports_metadata.xml                         
 +
 +[Enter Database Password:]
 +
 +# the metadata file is generated
 +
 +cat hdfs_imports_metadata.xml
 +
 +<?xml version="1.0" encoding="UTF-8"?>
 +<!--
 +Oracle Loader for Hadoop Release 3.0.0 - Production
 +
 +Copyright (c) 2011, 2014, Oracle and/or its affiliates. All rights reserved.
 +
 +-->
 +<DATABASE>
 +<ROWSET><ROW>
 +<TABLE_T>
 + <VERS_MAJOR>2</VERS_MAJOR>
 + <VERS_MINOR>5 </VERS_MINOR>
 + <OBJ_NUM>99959</OBJ_NUM>
 + <SCHEMA_OBJ>
 + ...
 +</code>
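To make the classpath requirement explicit, the metadata call can be wrapped in a small script (a sketch; the paths assume the BigDataLite VM layout, and the guard only actually runs the tool where the connector jar is installed):

```shell
# Sketch: pass the classpath to the metadata tool explicitly via -cp
# instead of relying on an exported $CLASSPATH.
OLH_JLIB=/u01/connectors/olh/jlib
CP="${OLH_JLIB}/*:/usr/lib/hadoop/*:/usr/lib/hadoop/client/*"

CMD=(java -cp "$CP" oracle.hadoop.loader.metadata.OraLoaderMetadata
     -user scott
     -connection_url "jdbc:oracle:thin:@//localhost:1521/orcl"
     -schema scott
     -table hdfs_imports
     -output hdfs_imports_metadata.xml)

# Show the command, then run it only if the connector jar exists:
echo "${CMD[@]}"
if [ -e "${OLH_JLIB}/oraloader.jar" ]; then
  "${CMD[@]}"
fi
```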
 +
 +=== Create the MapReduce job configuration for a Hive table ===
 +
 +
 +The job is defined via a job configuration file, following the standard Hadoop approach for setting up a MapReduce job.
 +
 +<code xml>
 +<?xml version="1.0" encoding="UTF-8" ?>
 +<configuration>
 + 
 +<!--    Input settings  -->
 + <property>
 +    <name>mapreduce.job.inputformat.class</name>
 +    <value>oracle.hadoop.loader.lib.input.HiveToAvroInputFormat</value>
 + </property>
 + 
 + <property>
 +    <name>oracle.hadoop.loader.input.hive.databaseName</name>
 +    <value>default</value>
 +  </property>
 + 
 + <property>
 +    <name>oracle.hadoop.loader.input.hive.tableName</name>
 +    <value>hdfs_export</value>
 + </property>
 + 
 + <property>
 +    <name>oracle.hadoop.loader.input.fieldNames</name>
 +    <value>id,wert1,wert2</value>
 + </property>
 + 
 + <property>
 +    <name>oracle.hadoop.loader.defaultDateFormat</name>
 +    <value>yyyy-MM-dd:HH:mm:ss</value>
 + </property>
 + 
 + 
 +<!--   Output settings   -->
 + <property>
 +    <name>mapreduce.job.outputformat.class</name>
 +    <value>oracle.hadoop.loader.lib.output.DataPumpOutputFormat</value>
 + </property>
 + 
 + <property>
 +    <name>mapreduce.output.fileoutputformat.outputdir</name>
 +    <value>/tmp/hdfs_exports</value>
 + </property>
 + 
 +<!-- Table information -->
 + <property>
 +    <name>oracle.hadoop.loader.loaderMap.targetTable</name>
 +    <value>hdfs_imports</value>
 + </property>
 + 
 + 
 +<!-- Connection information  -->
 + 
 +<property>
 +  <name>oracle.hadoop.loader.connection.url</name>
 +  <value>jdbc:oracle:thin:@${HOST}:${TCPPORT}/${SERVICE_NAME}</value>
 +</property>
 + 
 +<property>
 +  <name>TCPPORT</name>
 +  <value>1521</value>
 +</property>
 + 
 +<property>
 +  <name>HOST</name>
 +  <value>bigdatalite</value>
 +</property>
 + 
 +<property>
 + <name>SERVICE_NAME</name>
 +  <value>orcl</value>
 +</property>
 + 
 +<property>
 +  <name>oracle.hadoop.loader.connection.user</name>
 +  <value>scott</value>
 +</property>
 + 
 +<property>
 +  <name>oracle.hadoop.loader.connection.password</name>
 +  <value>tiger</value>
 +  <description> A password in clear text is NOT RECOMMENDED. Use an Oracle wallet instead.</description>
 +</property>
 + 
 +</configuration>
 +
 +
 +</code>
 +
 +The job configuration examples in the documentation use some deprecated property names:
 +<code bash>
 +14/09/20 19:22:53 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
 +</code>
 +
 +If these messages are annoying, simply replace the deprecated "mapred" property names with the modern "mapreduce" variants.
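Such leftovers can also be found mechanically: the sketch below greps a job configuration for property names still starting with `mapred.` (the sample file content is made up for the demonstration):

```shell
# Write a minimal sample config and list any deprecated "mapred." names in it.
cat > /tmp/job_config_sample.xml <<'EOF'
<configuration>
 <property><name>mapred.output.dir</name><value>/tmp/out</value></property>
 <property><name>mapreduce.output.fileoutputformat.outputdir</name><value>/tmp/out</value></property>
</configuration>
EOF

# Only the old-style name is reported; "mapreduce.*" names do not match:
grep -o '<name>mapred\.[^<]*</name>' /tmp/job_config_sample.xml
```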
 +
 +
 +=== Run the job ===
 +
 +
 +<code bash>
 +hadoop jar /u01/connectors/olh/jlib/oraloader.jar oracle.hadoop.loader.OraLoader -conf hdfs_job_config.xml
 +</code>
 +
 +Check in the Hadoop job web UI that everything is working.
 +
 +The first run fails; this and the following errors show that Hive classes are missing from the classpath:
 +<code>
 +Error: java.lang.ClassNotFoundException: org.apache.hadoop.hive.metastore.TableType at 
 +Error: java.lang.ClassNotFoundException: org.apache.hadoop.hive.serde2.SerDeException at 
 +Error: java.lang.ClassNotFoundException: org.apache.thrift.TBase at 
 +Error: java.lang.ClassNotFoundException: org.apache.hadoop.hive.common.io.NonSyncByteArrayOutputStream at 
 +</code>
 +
 +After a few test runs, the solution turns out to be passing these jars to the job with **-libjars**:
 +<code bash>
 +hadoop jar /u01/connectors/olh/jlib/oraloader.jar oracle.hadoop.loader.OraLoader \
 +-conf hdfs_job_config.xml                                                        \
 +-libjars /usr/lib/hive/lib/hive-metastore-0.12.0-cdh5.0.0.jar,/usr/lib/hive/lib/hive-serde-0.12.0-cdh5.0.0.jar,/usr/lib/hive/lib/libthrift-0.9.0.cloudera.2.jar,/usr/lib/hive/lib/hive-common-0.12.0-cdh5.0.0.jar
 +</code>
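The exact jar version numbers differ between CDH releases, so the **-libjars** value can also be assembled with a small helper (a sketch; `join_jars` is a made-up helper demonstrated on a throwaway directory, the real directory on the VM would be /usr/lib/hive/lib):

```shell
# Build a comma-separated jar list from glob patterns in a lib directory.
join_jars() {
  local dir=$1 out="" f pat
  shift
  for pat in "$@"; do
    for f in "$dir"/$pat; do
      [ -e "$f" ] && out="${out:+$out,}$f"
    done
  done
  echo "$out"
}

# Demonstrate with a throwaway directory instead of /usr/lib/hive/lib:
mkdir -p /tmp/demo_hive_lib
touch /tmp/demo_hive_lib/hive-metastore-0.12.0.jar \
      /tmp/demo_hive_lib/hive-serde-0.12.0.jar
join_jars /tmp/demo_hive_lib 'hive-metastore-*.jar' 'hive-serde-*.jar'

# On the VM this would then feed the job call, e.g.:
#   LIBJARS=$(join_jars /usr/lib/hive/lib 'hive-metastore-*.jar' \
#             'hive-serde-*.jar' 'libthrift-*.jar' 'hive-common-*.jar')
#   hadoop jar /u01/connectors/olh/jlib/oraloader.jar oracle.hadoop.loader.OraLoader \
#     -conf hdfs_job_config.xml -libjars "$LIBJARS"
```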
 +
 +With the job configured this way, the result files can now be found in the configured HDFS output directory, in this run /user/oracle/temp_out_session:
 +
 +<code bash>
 +hdfs dfs -ls /user/oracle/temp_out_session
 +
 +Found 4 items
 +-rw-r--r--   1 oracle oracle          0 2014-09-20 19:22 /user/oracle/temp_out_session/_SUCCESS
 +drwxr-xr-x   - oracle oracle          0 2014-09-20 19:22 /user/oracle/temp_out_session/_olh
 +-rw-r--r--   1 oracle oracle     151552 2014-09-20 19:22 /user/oracle/temp_out_session/oraloader-00000-dp-0.dat
 +-rw-r--r--   1 oracle oracle     151552 2014-09-20 19:22 /user/oracle/temp_out_session/oraloader-00001-dp-0.dat
 +
 +</code>
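With more reducers the job produces more part files, so it helps to filter the listing for the datapump files before downloading them. The filter is plain text processing, so the sketch below demonstrates it on the captured listing from above:

```shell
# Filter an "hdfs dfs -ls" listing down to the datapump part files.
ls_output='-rw-r--r--   1 oracle oracle          0 2014-09-20 19:22 /user/oracle/temp_out_session/_SUCCESS
drwxr-xr-x   - oracle oracle          0 2014-09-20 19:22 /user/oracle/temp_out_session/_olh
-rw-r--r--   1 oracle oracle     151552 2014-09-20 19:22 /user/oracle/temp_out_session/oraloader-00000-dp-0.dat
-rw-r--r--   1 oracle oracle     151552 2014-09-20 19:22 /user/oracle/temp_out_session/oraloader-00001-dp-0.dat'

echo "$ls_output" | awk '/oraloader-.*-dp-.*\.dat$/ {print $NF}'

# On the cluster the same filter feeds the download loop:
#   hdfs dfs -ls /user/oracle/temp_out_session \
#     | awk '/oraloader-.*-dp-.*\.dat$/ {print $NF}' \
#     | while read -r f; do hdfs dfs -get "$f" .; done
```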
 +
 +These files can now serve, via the normal file system, as the data source for an external table on another machine:
 +
 +<code sql>
 +
 +# Copy the files to the local file system.
 +# The Oracle directory object HADOOP_EXT_TABS used below points to this location!
 +
 +hdfs dfs -get /user/oracle/temp_out_session/oraloader-00000-dp-0.dat .
 +hdfs dfs -get /user/oracle/temp_out_session/oraloader-00001-dp-0.dat .
 +
 +sqlplus scott/tiger
 +
 +create table hdfs_import_local (id  number
 +                                        , wert1 varchar2(100)
 +                                        , wert2 varchar2(100)
 +                                        )
 + ORGANIZATION EXTERNAL (
 +     TYPE ORACLE_DATAPUMP
 +     DEFAULT DIRECTORY HADOOP_EXT_TABS
 +     LOCATION ('oraloader-00000-dp-0.dat','oraloader-00001-dp-0.dat')
 +  )
 +/
 +
 +select count(*) from hdfs_import_local;
 +
 +    COUNT(*)
 +------------
 +       10000
 +</code>
 +
 +Tip: a template SQL script can be found in the HDFS working directory at /user/oracle/temp_out_session/_olh/oraloader-dp.sql:
 +<code bash>
 +
 +hdfs dfs -cat /user/oracle/temp_out_session/_olh/oraloader-dp.sql
 +
 +
 +--Oracle Loader for Hadoop Release 3.0.0 - Production
 +--
 +--Copyright (c) 2011, 2014, Oracle and/or its affiliates. All rights reserved.
 +--
 +--Generated by DataPumpOutputFormat
 +--
 +--CREATE OR REPLACE DIRECTORY OLH_EXTTAB_DIR AS '...';
 +--GRANT READ, WRITE ON DIRECTORY OLH_EXTTAB_DIR TO SCOTT;
 +--
 +--ALTER SESSION ENABLE PARALLEL DML;
 +--INSERT /*+ APPEND */ INTO "SCOTT"."HDFS_IMPORTS" SELECT * FROM "SCOTT"."EXT_HDFS_IMPORTS";
 +--
 +CREATE TABLE  "SCOTT"."EXT_HDFS_IMPORTS"
 +(
 +   "ID"                             NUMBER,
 +   "WERT1"                          VARCHAR2(100),
 +   "WERT2"                          VARCHAR2(100)
 +)
 +ORGANIZATION EXTERNAL
 +  (TYPE ORACLE_DATAPUMP
 +   DEFAULT DIRECTORY OLH_EXTTAB_DIR
 +   LOCATION
 +   (
 +    'oraloader-00000-dp-0.dat',
 +    'oraloader-00001-dp-0.dat'
 +   )
 +  );
 +
 +</code>
 +
 +
 +Oracle documentation, Release 3 => http://docs.oracle.com/cd/E53356_01/doc.30/e53067/olh.htm#BDCUG464
 +
 +
 +==== Sources ====
 +
 +  * http://hadooped.blogspot.de/2013/05/apache-sqoop-for-data-integration.html
 +
 +External Table:
 +
 +  * https://blogs.oracle.com/bigdataconnectors/entry/how_to_load_oracle_tables4
 +  * https://blogs.oracle.com/bigdataconnectors/entry/how_to_load_oracle_tables
 +  * https://blogs.oracle.com/bigdataconnectors/entry/accessing_a_hive_table_from
 +
 +
 +
 +Spark:
 +  * http://www.rittmanmead.com/2014/05/exploring-apache-spark-on-the-new-bigdatalite-3-0-vm/
  
"Autor: Gunther Pipperr"
nosql/oracle_hadoop_big_data_appl_erste_schritte.txt · Zuletzt geändert: 2014/09/20 21:28 von gpipperr