===== Monitoring, checking and controlling the time on Linux for an Oracle Cluster installation =====

An essential prerequisite for a working Oracle Real Application Cluster is exactly the same time on all cluster nodes.

The system time must be exactly right, especially when the cluster starts. This means that the BIOS / Real Time Clock must also be set to the correct time.

For Windows, see [[windows:windows_ntp_w32tm_zeitdienst|Configuring the NTP service on MS Windows 2008 / 2012 to use your own NTP server for an Oracle Real Application Cluster installation]]

==== Configuring NTP for the Oracle cluster with the " -x" slewing option ====


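NTP must be configured with the option **" -x"** (for slew).

This means that the time is not changed abruptly when it deviates. Instead, it is gradually brought to the correct value. More precisely, -x raises ntpd's step threshold from the default 128 ms to 600 s, so offsets below 600 s are slewed instead of stepped.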


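<code bash>
$ vi /etc/sysconfig/ntpd
…
# -x: slew instead of step (see above); -g: allow the first correction to exceed the 1000 s panic threshold
OPTIONS=" -x -g"
…
</code>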

Note the leading space before the **-x**.
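To verify the setting on every node without opening an editor, a small helper can read the last OPTIONS line and look for a standalone -x flag. This is a sketch; the function name ntpd_has_slew_option is hypothetical. It assumes the sysconfig file format shown above and does not detect -x when it is combined with other flags (e.g. -xg).

<code bash>
# Hypothetical helper: print "yes" if the OPTIONS line of an ntpd sysconfig
# file contains a standalone -x flag, otherwise "no".
ntpd_has_slew_option() {
  local conf opts
  conf=${1:-/etc/sysconfig/ntpd}
  # Use the last OPTIONS= assignment, drop the key and all quote characters
  opts=$(grep -E '^[[:space:]]*OPTIONS=' "$conf" 2>/dev/null | tail -n 1 | cut -d= -f2- | tr -d "\"'")
  # Pad with spaces so that -x also matches at the beginning or the end
  case " $opts " in
    *" -x "*) echo "yes" ;;
    *)        echo "no" ;;
  esac
}

# Usage on a cluster node:
# ntpd_has_slew_option /etc/sysconfig/ntpd
</code>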

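Make sure the correct NTP servers are used by checking the file "/etc/ntp.conf":
<code bash>
$ cat /etc/ntp.conf
…
server <ip_address_of_your_time_server>
# allow time service from this server, but no modification, traps or queries
restrict <ip_address_of_your_time_server> mask 255.255.255.255 nomodify notrap noquery
…
</code>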


If Internet access is available, suitable servers can be found here: http://www.pool.ntp.org/zone/de

Example of a server entry:
<code bash>

# Permit all access over the loopback interface.  This could
# be tightened as well, but to do so would effect some of
# the administrative functions.
restrict 127.0.0.1
restrict ::1

# Hosts on local network are less restricted.
restrict 10.10.10.0 mask 255.255.255.0 nomodify notrap

# Server
server 192.168.178.1

</code> 


Testing the time service (and setting the current time):
<code bash>
$ service ntpd stop
# -s sends ntpdate's output to syslog; -q would only query the offset without setting the clock
$ ntpdate -s <ip_address_of_your_time_server>

$ service ntpd start
</code>

Enabling autostart on Linux 6:
<code bash>
$ chkconfig --level 35 ntpd on

$ chkconfig --list | grep ntpd
ntpd            0:off   1:off   2:on    3:on    4:on    5:on    6:off

</code>

Enabling autostart on Linux 7:
<code bash>
systemctl enable ntpd.service
systemctl restart ntpd.service
</code>

===Checking the accuracy via the drift file===

<code bash>
cat /var/lib/ntp/drift
-7.437
</code>

The file contains the "system clock's frequency offset", i.e. the frequency error of the local oscillator that ntpd compensates for, in "PPM" (parts per million).

Unit conversion:
1 PPM = 1 part per million = 1 microsecond per second = 3.6 ms per hour = 86.4 ms per day
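The conversion above can be applied directly to the drift value. The following sketch uses awk for the floating point arithmetic; the function names are hypothetical.

<code bash>
# Convert a frequency offset in PPM into an accumulated time error.
# 1 ppm = 1 us/s = 3.6 ms/hour = 86.4 ms/day
ppm_to_ms_per_hour() {
  awk -v ppm="$1" 'BEGIN { printf "%.3f\n", ppm * 3.6 }'
}

ppm_to_ms_per_day() {
  awk -v ppm="$1" 'BEGIN { printf "%.3f\n", ppm * 86.4 }'
}

# Usage with the drift file shown above:
# ppm_to_ms_per_day "$(cat /var/lib/ntp/drift)"
</code>

For the value -7.437 shown above this gives -642.557 ms per day, i.e. an oscillator error of roughly 0.64 s per day that ntpd is compensating.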


===The file "/etc/adjtime"===

The file "/etc/adjtime" plays an important role in this context.

It can be queried with "adjtimex", if that tool is available on your Linux system.


The file adjtime records the daily drift and the time of the last correction.

Example:
<code bash>
cat /etc/adjtime

0.000081 1417182843 0.000000
1417182843
UTC


date -d @1417182843

Fri Nov 28 14:54:03 CET 2014

</code>

  * Line 1: the systematic drift of the hardware clock in seconds per day (floating point), the time of the last adjustment in epoch seconds, and the value 0.000000 (kept for compatibility)
  * Line 2: the time of the last calibration in epoch seconds, or "0" if no calibration has taken place yet
  * Line 3: "LOCAL" or "UTC", indicating whether the hardware clock is set to the local time zone or to Coordinated Universal Time
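The three lines can also be decoded in one step. This is a minimal sketch; the function name show_adjtime is hypothetical. It assumes the three-line format described above and GNU date for the epoch conversion. The output is printed in UTC, so it differs from the CET output of the example above by one hour.

<code bash>
# Hypothetical helper: print the fields of an adjtime file in readable form.
show_adjtime() {
  local file drift last_adj zero last_cal mode
  file=${1:-/etc/adjtime}
  {
    read -r drift last_adj zero   # line 1: drift, last adjustment, 0 (unused)
    read -r last_cal              # line 2: last calibration or 0
    read -r mode                  # line 3: UTC or LOCAL
  } < "$file"
  echo "drift_s_per_day=$drift"
  echo "last_adjustment=$(date -u -d "@$last_adj" '+%Y-%m-%d %H:%M:%S UTC')"
  echo "last_calibration_epoch=$last_cal"
  echo "rtc_mode=$mode"
}

# Usage:
# show_adjtime /etc/adjtime
</code>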

Running "hwclock --systohc" sets the hardware clock to the current system time and at the same time updates (or creates) the file "/etc/adjtime".

During boot, this information is then used (possibly, depending on the Linux distribution?) to set the clock. In one environment it was observed that, as a result, the time was set very wrong during boot. Combined with the " -x" option in the RAC, this in turn led to massive problems when starting the cluster node.
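Why a wrong time at boot is so harmful together with " -x" can be estimated with a quick calculation. The ntpd(8) man page notes that the slew rate of typical Unix kernels is limited to 0.5 ms/s, so each second of offset takes 2000 s to correct, and an offset of 600 s takes almost 14 days. The sketch below assumes exactly that 0.5 ms/s limit; the function names are hypothetical.

<code bash>
# Estimate how long slewing away a given offset (in seconds) takes,
# assuming a kernel slew limit of 0.5 ms per second (500 ppm).
slew_seconds() {
  awk -v off="$1" 'BEGIN { off = off + 0; if (off < 0) off = -off; printf "%.0f\n", off / 0.0005 }'
}

slew_days() {
  awk -v off="$1" 'BEGIN { off = off + 0; if (off < 0) off = -off; printf "%.1f\n", off / 0.0005 / 86400 }'
}

# Example: an RTC that is 600 s off at boot
# slew_days 600
</code>

For 600 s this yields 1200000 s, i.e. about 13.9 days during which the node's time differs from the rest of the cluster.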


===Test===


Query the running time service with the program [[http://www.eecis.udel.edu/~mills/ntp/html/ntpq.html|**ntpq**]].

<code bash>

/usr/sbin/ntpq

ntpq> lopeers
     remote           local      st t when poll reach   delay   offset    disp
==============================================================================
*iris.wf-hosting 192.168.178.181  2 u    7   64    1   27.943   -0.030 187.594
 alvo.fungus.at  192.168.178.181  3 u    6   64    1   18.343  -35.754 187.564
 hv02.nebie.de   192.168.178.181  3 u    5   64    1   17.690   -1.484 187.533
 tischi.de       192.168.178.181  2 u    4   64    1   13.030    0.406 187.549


#
# Parameters:
# Tally code in front of remote: * system peer, + candidate, - outlier, x falseticker
# remote:      The remote peer or server being synced to. "LOCAL" is this local host (included in case there are no remote peers or servers available)
# refid:       Where or what the remote peer or server is itself synchronised to (lopeers shows the local interface address instead)
# st:          The stratum of the remote peer or server
# t:           Type (u: unicast or manycast client, b: broadcast or multicast client, l: local reference clock, s: symmetric peer, A: manycast server, B: broadcast server, M: multicast server, see "Automatic Server Discovery")
# when:        Number of seconds since the last response
# poll:        Polling interval for this source, in seconds
# reach:       Octal register of the last 8 polls; 377 means all 8 attempts were successful
# delay:       Round-trip time to receive a reply, in milliseconds
# offset:      Time difference between the local clock and the source, in milliseconds
# disp/jitter: Dispersion or jitter (variation between samples), in milliseconds


ntpq> rl
associd=0 status=0614 leap_none, sync_ntp, 1 event, freq_mode,
version="ntpd 4.2.6p5@1.2349-o Fri Oct 11 03:18:05 UTC 2013 (1)",
processor="x86_64", system="Linux/2.6.39-400.17.1.el6uek.x86_64",
leap=00, stratum=3, precision=-24, rootdelay=153.275, rootdisp=1115.335,
refid=176.31.45.66,
reftime=d7125e40.97fd54d9  Mon, May  5 2014 21:00:16.593,
clock=d7125e50.c5a2a83c  Mon, May  5 2014 21:00:32.772, peer=16613, tc=6,
mintc=3, offset=0.091, frequency=0.001, sys_jitter=0.996,
clk_jitter=0.032, clk_wander=0.000

# further useful commands:

ntpq> peers
ntpq> associations

# read the variables of a single peer by its association ID (assID)

ntpq> readlist <assID>

ntpq> exit

</code>
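The reach column is an octal shift register of the last eight polls. 377 means 8 of 8 polls were answered. The value 1 in the lopeers output above means that only the most recent poll has been answered so far (the service had just started). The following sketch counts the successful polls. The function name reach_successes is hypothetical. It relies on the fact that each octal digit encodes three poll results.

<code bash>
# Count successful polls encoded in an ntpq reach value (octal, max 377).
reach_successes() {
  printf '%s\n' "$1" | awk '{
    n = 0
    for (i = 1; i <= length($1); i++) {
      d = substr($1, i, 1) + 0
      # count the set bits of this octal digit
      while (d > 0) { n += d % 2; d = int(d / 2) }
    }
    print n
  }'
}

# Usage:
# reach_successes 377   # all 8 of the last polls answered
</code>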

Monitoring with [[http://www.eecis.udel.edu/~mills/ntp/html/ntpdc.html|**ntpdc**]]:

<code bash>

ntpdc

ntpdc> monlist

remote address          port local address      count m ver rstr avgint  lstint
===============================================================================
hv02.nebie.de            123 192.168.178.181        9 4 4    1d0     29      56
alvo.fungus.at           123 192.168.178.181        9 4 4    1d0     29      56
tischi.de                123 192.168.178.181        9 4 4    1d0     28      59
iris.wf-hosting.de       123 192.168.178.181        9 4 4    1d0     29      60


ntpdc> sysstat

time since restart:     320
time since reset:       320
packets received:       62
packets processed:      40
current version:        40
previous version:       0
declined:               0
access denied:          0
bad length or format:   0
bad authentication:     0
rate exceeded:          0


# Query everything at once:


ntpq -c peer -c as -c rl

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+minion.webershe 192.53.103.108   2 u   58   64  377   18.594  -106.79   1.281
-89.163.176.81   178.63.97.57     3 u   46   64  377   13.130  -105.77   1.919
*monitman.com    158.43.128.33    2 u   27   64  377   18.966  -105.97   0.992
+freiwuppertal.d 130.133.1.10     2 u   10   64  377   18.365  -105.42   1.286

ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 10798  941a   yes   yes  none candidate    sys_peer  1
  2 10799  9324   yes   yes  none   outlyer   reachable  2
  3 10800  961a   yes   yes  none  sys.peer    sys_peer  1
  4 10801  941a   yes   yes  none candidate    sys_peer  1
associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync,
version="ntpd 4.2.6p5@1.2349-o Fri Oct 11 03:18:05 UTC 2013 (1)",
processor="x86_64", system="Linux/3.8.13-16.2.1.el6uek.x86_64", leap=00,
stratum=3, precision=-21, rootdelay=35.476, rootdisp=162.095,
refid=144.76.118.85,
reftime=d80b8e6a.b134c8b9  Mon, Nov 10 2014 20:20:10.692,
clock=d80b8ec8.a6f43595  Mon, Nov 10 2014 20:21:44.652, peer=10800, tc=6,
mintc=3, offset=-106.861, frequency=78.044, sys_jitter=0.803,
clk_jitter=12.849, clk_wander=2.351

</code>
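In the peers output, the selected system peer is marked with "*", and the offset in milliseconds is the ninth column. A small sketch can extract both from the peers output (e.g. from `ntpq -pn`) and compare the offset against a threshold. The function names are hypothetical. The threshold is a parameter; 1 ms is used in the usage note only because that is the difference mentioned for CRS-2409 further below.

<code bash>
# Print "<peer> <offset_ms>" for the system peer (line starting with '*').
# Reads ntpq peers output on stdin; exits non-zero if no system peer is found.
sys_peer_offset() {
  awk '$1 ~ /^\*/ { sub(/^\*/, "", $1); print $1, $9; found = 1 }
       END { exit !found }'
}

# Print "yes" if |offset| (ms) is larger than the threshold (ms), else "no".
offset_exceeds() {
  awk -v o="$1" -v t="$2" 'BEGIN {
    o = o + 0; t = t + 0
    if (o < 0) o = -o
    r = (o > t) ? "yes" : "no"
    print r
  }'
}

# Usage:
# ntpq -pn | sys_peer_offset
# offset_exceeds -105.97 1
</code>

With the output above (offset around -106 ms), the check against 1 ms reports "yes".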


==How accurate is the time actually?==

In a cluster, the clusterware often detects and logs time differences in the range of one millisecond.

Another question is how large the offset caused by the " -x" option currently is.

Querying the details:
<code bash>
/usr/sbin/ntpdc -nc  loopinfo 

offset:               0.003636 s
frequency:            12.004 ppm
poll adjust:          16
watchdog timer:       1098 s

# From the man page:
# The `offset' is the last offset given to the loop filter by the packet processing code.
# The `frequency' is the frequency error of the local clock in parts-per-million (ppm).
# The `time_const' controls the stiffness of the phase-lock loop and thus the speed at which it can adapt to oscillator drift.
# The `watchdog timer' value is the number of seconds which have elapsed since the last sample offset was given to the loop filter.

/usr/sbin/ntpdc -c kerninfo

pll offset:           0 s
pll frequency:        0.000 ppm
maximum error:        16 s
estimated error:      1.6e-05 s
status:               0041  pll unsync
pll time constant:    7
precision:            1e-06 s
frequency tolerance:  500 ppm

#

/usr/sbin/ntpdc -c sysinfo

system peer:          192.168.178.1
system peer mode:     client
leap indicator:       00
stratum:              3
precision:            -24
root distance:        0.00586 s
root dispersion:      0.04466 s
reference ID:         [138.35.51.113]
reference time:       d8bd4cd1.4522887b  Wed, Mar 25 2014 16:03:45.270
system flags:         auth monitor ntp stats 
jitter:               0.003906 s
stability:            0.000 ppm
broadcastdelay:       0.000000 s
authdelay:            0.000000 s


/usr/sbin/ntptime

ntp_gettime() returns code 0 (OK)
  time d80b8e61.a8555000  Mon, Nov 10 2014 20:20:01.657, (.657552),
  maximum error 12666016 us, estimated error 16 us, TAI offset 0
ntp_adjtime() returns code 0 (OK)
  modes 0x0 (),
  offset 0.000 us, frequency 0.000 ppm, interval 1 s,
  maximum error 12666016 us, estimated error 16 us,
  status 0x1 (PLL),
  time constant 7, precision 1.000 us, tolerance 500 ppm,

# On a Red Hat system:

/usr/sbin/ntptime

ntp_gettime() returns code 5 (ERROR)

# Precision

/usr/sbin/ntpq -c rl

..
stratum=3, precision=-23, rootdelay=6.058, rootdisp=52.754
..
tc=10, mintc=3, offset=4.686, frequency=13.050, sys_jitter=4.660,clk_jitter=1.201, clk_wander=0.003
..

## precision is a base-2 exponent in seconds: 2^precision s = 2^-23 s ≈ 0.119 microseconds (about 119 ns)
## When repeatedly reading the time, the difference may vary almost randomly. The difference of these differences (second derivative) is called jitter.


</code>
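The precision values reported by ntpq (-21, -23 and -24 in the outputs above) can be converted to microseconds with a one-liner. This is a sketch; the function name precision_to_us is hypothetical.

<code bash>
# Convert ntpd's precision exponent (log2 of seconds) to microseconds.
precision_to_us() {
  awk -v p="$1" 'BEGIN { printf "%.3f\n", 2 ^ p * 1e6 }'
}

# Usage:
# precision_to_us -23
</code>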


==Troubleshooting==

See:
  * http://www.ntp.org/ntpfaq/NTP-s-trouble.htm


==== Real Time Clock ====
During boot, the OS obtains the time from the hardware, i.e. from the Real Time Clock.

The value of the Real Time Clock can be queried here:

<code bash>
cat /sys/class/rtc/rtc0/since_epoch
</code>


For epoch time, see this good epoch converter: http://www.epochconverter.com

==Comparing the local OS time with the hardware Real Time Clock==

<code bash>

# Convert the epoch value:
date -d @`cat /sys/class/rtc/rtc0/since_epoch`

# Alternatively:
cat /sys/class/rtc/rtc0/since_epoch | awk '{print strftime("%d.%b %H:%M:%S %z",$1)}'

# Query both at the same time:

echo NTP Time=`date +"%d.%b %H:%M:%S %z"` :: Internal RTC time=`cat /sys/class/rtc/rtc0/since_epoch | awk '{print strftime("%d.%b %H:%M:%S %z",$1)}'` :: Epoch RTC Timestamp=`cat /sys/class/rtc/rtc0/since_epoch`

</code>
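Instead of comparing the two formatted times by eye, the difference in seconds can be computed directly from the two epoch values. This is a minimal sketch; the function name clock_diff_seconds is hypothetical. It assumes that the RTC keeps UTC ("UTC" in /etc/adjtime). If the RTC runs in local time, the result should roughly equal the UTC offset (e.g. 3600 s for CET).

<code bash>
# Print the absolute difference in seconds between two epoch timestamps.
clock_diff_seconds() {
  local d
  d=$(( $1 - $2 ))
  if [ "$d" -lt 0 ]; then
    d=$(( -d ))
  fi
  echo "$d"
}

# Usage on a system that exposes the RTC in sysfs:
# clock_diff_seconds "$(date +%s)" "$(cat /sys/class/rtc/rtc0/since_epoch)"
</code>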

For the time formats, see: http://www.gnu.org/software/gawk/manual/html_node/Time-Functions.html


The time of the hardware clock MUST match the OS time, and the two should not drift apart significantly.


Management cards such as HP iLO can set the time correctly at boot, but they must themselves be set to the correct time and use the correct time servers! Pay attention to the time zone!

== Setting the clock ==

The HW clock can be set with [[http://linux.die.net/man/8/hwclock|hwclock]]:
<code bash>
# read as root with:
hwclock -r

# set the HW clock to the system time with:
hwclock --systohc

</code>

==== Checking with the Oracle cluvfy utility ====

Checking the time service in the Oracle cluster:
<code bash>

# Set the environment to the Oracle home of the cluster (Grid Infrastructure)

cd $ORACLE_HOME/bin

./cluvfy comp clocksync -n all -verbose

</code>


Checking only the OS time service (NTP), without the CTSS check:
<code bash>

# Set the environment to the Oracle home of the cluster (Grid Infrastructure)

cd $ORACLE_HOME/bin

./cluvfy comp clocksync -noctss -n all -verbose

</code>

<note warning>Neither command may report an error or a warning!</note>

Checking whether the Oracle CTSS (Cluster Time Synchronization Service) is active:

<code bash>
# As the grid user

crsctl check ctss
CRS-4700: The Cluster Time Synchronization Service is in Observer mode.

</code>

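As soon as CTSS detects that NO NTP is configured (CTSS checks whether /etc/ntp.conf exists), it switches to active mode and synchronizes the time itself. In observer mode, as in the output above, CTSS only monitors and logs the time offsets.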
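For scripted checks, the CTSS mode can be read from the crsctl output, and the rule described above can be reproduced as a simplified illustration. Both functions are hypothetical sketches, not Oracle code. ctss_mode only matches the words "Observer mode" / "Active mode" in the message text; the exact wording of the active-mode message is an assumption. expected_ctss_mode only implements the /etc/ntp.conf rule from the text.

<code bash>
# Classify the output of "crsctl check ctss" (read from stdin).
ctss_mode() {
  local out
  out=$(cat)
  case "$out" in
    *"Observer mode"*) echo "observer" ;;
    *"Active mode"*)   echo "active" ;;
    *)                 echo "unknown" ;;
  esac
}

# Simplified rule from the text: NTP config file present -> observer mode.
expected_ctss_mode() {
  if [ -e "${1:-/etc/ntp.conf}" ]; then
    echo "observer"
  else
    echo "active"
  fi
}

# Usage as the grid user:
# crsctl check ctss | ctss_mode
# expected_ctss_mode /etc/ntp.conf
</code>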


===Example of a faulty time service:===

==First signs in the cluster alert log==

<code bash>
# Set the environment to the Oracle home of the cluster (Grid Infrastructure)

cd $ORACLE_HOME/log/gpidb-db01

grep -B 1 clock alertgpidb01.log

...
2014-05-05 18:27:49.677:
[ctssd(21703)]CRS-2409:The clock on host GPIDB-db01 is not synchronous with the mean cluster time. No action has been taken as the Cluster Time Synchronization Service is running in 
...

</code>


For the analysis, see the messages of the cluvfy commands and the following My Oracle Support notes:


  * CRS-2409:The clock on host <hostname> is not synchronous with the mean cluster time (Doc ID 1135337.1)
  * CRS-2409 messages seen from CTSS despite NTP running properly. (Doc ID 1311163.1)


**Cause:**

The time difference between this node and the reference node in the cluster is greater than 1 ms.
According to Note 1311163.1, this can be ignored if the NTP check with cluvfy reports no errors.


== Serious error ==

<code bash>
[ctssd(7784)]CRS-2412:The Cluster Time Synchronization Service detects that the local time is significantly different from the mean cluster time. Details in ......./octssd.log.
</code>
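To see how often CTSS has reported these problems, the message codes can be counted in the alert log. This is a sketch; the function name count_ctss_messages is hypothetical. It only counts occurrences of CRS-2409 and CRS-2412 and does not evaluate timestamps.

<code bash>
# Count CRS-2409 / CRS-2412 messages in a clusterware alert log.
# Output: one line per code, "<code> <count>".
count_ctss_messages() {
  grep -oE 'CRS-24(09|12)' "$1" | sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage (path as in the example above):
# count_ctss_messages $ORACLE_HOME/log/gpidb-db01/alertgpidb01.log
</code>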


Analysis:
  * Analyze the detail log file **$ORACLE_HOME/log/gpidb-db01/ctssd/octssd.log**
  * Pay attention to the cluvfy output!


  * Linux: CRS-2412: The Cluster Time Synchronization Service detects that the local time is significantly different from the mean cluster time (Doc ID 1632514.1)

==== Sources ====

NTP:

  * http://www.ntp.org/ntpfaq/NTP-s-algo.htm
  * http://support.ntp.org/bin/view/Main/WebHome
  * https://wiki.archlinux.org/index.php/Time
  * http://www.cisco.com/c/en/us/support/docs/availability/high-availability/19643-ntpm.html
  * http://www.ntp.org/ntpfaq/NTP-s-algo.htm#Q-ACCURATE-CLOCK
  * https://www.meinbergglobal.com/download/ntp/docs/ntp_cheat_sheet.pdf


Oracle:

  * http://docs.oracle.com/cd/E11882_01/rac.112/e16794/cvu.htm#CWADD92257
  * How to Configure NTP or Windows Time for Oracle Clusterware (Doc ID 1056693.1)
  * Linux: An Example NTP Client Configuration to use with Oracle Clusterware 11gR2 (Doc ID 1104473.1)
  * Grid Infrastructure Does not Start after Node Reboot as Master octssd.bin Stuck (Doc ID 1215893.1)

Windows ntpd:
  * https://www.meinbergglobal.com/english/sw/ntp.htm#ntp_stable

Blogs:

  * http://dbakevin.blogspot.de/2012/05/ctssd-in-11g.html
  * http://blog.sina.com.cn/s/blog_701271e80101c4ur.html
  * http://de.linwiki.org/wiki/Linuxfibel_-_System-Administration_-_Zeit_und_Steuerung#Die_Datei_.2Fetc.2Fadjtime
  * http://tech.kulish.com/2007/10/30/ntp-ntpq-output-explained/
  * http://www.satsignal.eu/ntp/