PANIC: watchdog timer expired - exiting

Message ID 1181418045.7288.143.camel@core
State New
Commit Message

Petri Hintukainen June 9, 2007, 7:40 p.m. UTC
  On Sat, 2007-06-09 at 12:28 +0200, Udo Richter wrote:
> And, from the original post:
> > May 31 20:23:38 localhost vdr: [3413] System Time = Thu May 31 20:23:38 2007 (1180632218)
> > May 31 20:23:38 localhost vdr: [3413] Local Time = Thu May 31 20:19:37 2007 (1180631977)
> > May 31 20:21:01 localhost vdr: [3405] PANIC: watchdog timer expired - exiting!

Turning the clock is always a very bad and dangerous thing to do. It can
cause a lot of other problems too, for example incomplete builds,
duplicate cron jobs, destroyed logs and files, incomplete backups ...

> The clock was set to 20:19:37, and the watchdog fires at 20:21:01 - 84 
> seconds later. There must be something different causing the watchdog to 
> expire.

It might even be some plugin. All timeouts (cTimeMs, cCondVar,
cCondWait) use the current wall clock time to set the timeout.
Example:
  cCondWait c;
  c.Wait(100);

If the clock is turned back 2 minutes in the middle of this, the code
will wait 120100 ms instead of 100 ms ... This might cause some quite
weird problems. I believe there's no way to change the
pthread_..._timedwait functions, but cTimeMs can be changed to use
monotonic timers instead of gettimeofday (patch attached).
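
To make the arithmetic above concrete, here is a minimal sketch (not VDR
code). FakeWallClock, DeadlineAfter and RemainingMs are hypothetical
names invented for this illustration. The sketch only assumes, as
described above, that the wait is implemented as an absolute deadline
computed from the wall clock.

```cpp
#include <stdint.h>

// Hypothetical stand-in for a wall clock that can be stepped by an admin
// or a time-sync tool. Not part of VDR.
struct FakeWallClock {
    int64_t now_ms;
    void Step(int64_t delta_ms) { now_ms += delta_ms; }
};

// Absolute deadline for a relative timeout, taken from the wall clock.
int64_t DeadlineAfter(const FakeWallClock &clock, int64_t timeout_ms)
{
    return clock.now_ms + timeout_ms;
}

// Time still left until the deadline, as seen by the (possibly stepped) clock.
int64_t RemainingMs(const FakeWallClock &clock, int64_t deadline_ms)
{
    int64_t left = deadline_ms - clock.now_ms;
    return left > 0 ? left : 0;
}
```

With the fake clock at an arbitrary 1000000 ms, DeadlineAfter(clock, 100)
leaves 100 ms to wait. After clock.Step(-120000) the same deadline is
120100 ms away, which is the stretched wait described above.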


- Petri
  

Comments

Clemens Kirchgatterer June 10, 2007, 10:31 a.m. UTC | #1
Petri Hintukainen <phintuka@users.sourceforge.net> wrote:

> If the clock is turned back 2 minutes in the middle of this, the code
> will wait 120100 ms instead of 100 ms ... This might cause some quite
> weird problems. I believe there's no way to change the
> pthread_..._timedwait functions, but cTimeMs can be changed to use
> monotonic timers instead of gettimeofday (patch attached).

just for the record, i think you have to change the Makefile to include
-lrt in the LIBS for the patch to work.

best regards ...
clemens
  
Udo Richter June 10, 2007, 12:59 p.m. UTC | #2
Petri Hintukainen wrote:
> It might even be some plugin. All timeouts (cTimeMs, cCondVar,
> cCondWait) use the current wall clock time to set the timeout.

That's not even all: There are 140 references to time(NULL) in VDR, and
most of them are used for timeouts between a few seconds and some hours. 
Even the famous "video data stream broken" (causing an emergency 
shutdown) can be triggered by a 30-second time step.
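
A rough sketch of how a forward time step can trip such a check.
TimedOut is a hypothetical helper, and the 20-second limit used in the
note below is an assumed value chosen for illustration; neither is taken
from the VDR sources.

```cpp
#include <time.h>

// Hypothetical interval check written against wall-clock seconds, in the
// style of "now = time(NULL); if (now - last > limit) ...".
// The caller passes 'now' explicitly so a clock step can be simulated.
bool TimedOut(time_t last_activity, time_t now, int timeout_s)
{
    return (long)(now - last_activity) > (long)timeout_s;
}
```

If the last activity was recorded at second 1000 and the check runs 2
real seconds later, TimedOut(1000, 1002, 20) is false. If the clock was
stepped 30 seconds forward in between, the same check sees
TimedOut(1000, 1032, 20) and reports a timeout although only 2 seconds
have really passed.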

Cheers,

Udo
  
Petri Hintukainen June 10, 2007, 2:37 p.m. UTC | #3
On Sun, 2007-06-10 at 14:59 +0200, Udo Richter wrote:
> Petri Hintukainen wrote:
> > It might even be some plugin. All timeouts (cTimeMs, cCondVar,
> > cCondWait) use the current wall clock time to set the timeout.
> 
> That's not even all: There are 140 references to time(NULL) in VDR, and
> most of them are used for timeouts between a few seconds and some hours. 
> Even the famous "video data stream broken" (causing an emergency 
> shutdown) can be triggered by a 30-second time step.

He :)
All places where time(NULL) is used to measure some time interval could
easily be changed to use the monotonic cTimeMs.
But for timers the current wall clock time is really required ...

It might be enough to replace most of the time(NULL) calls with something like

time_t cTimeMs::Time(void) {
	return (time_t)(cTimeMs::Now() / 1000);
}

Using the cTimeMs timer/trigger mechanism requires some more changes and
debugging.
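
For interval checks the idea could look roughly like the sketch below. It
is standalone and does not use the VDR classes: MonotonicSeconds and
IntervalElapsed are hypothetical names, and only clock_gettime with
CLOCK_MONOTONIC is a real POSIX call. The returned value counts from an
arbitrary starting point, so it is only useful for differences and must
not be compared with calendar times.

```cpp
#include <stdint.h>
#include <time.h>

// Seconds on the monotonic clock, or -1 if the clock cannot be read.
// The value is not related to the calendar time returned by time(NULL).
int64_t MonotonicSeconds(void)
{
    struct timespec tp;
    if (clock_gettime(CLOCK_MONOTONIC, &tp) != 0)
        return -1;
    return (int64_t)tp.tv_sec;
}

// True once at least timeout_s seconds have passed since start_s, where
// start_s was obtained earlier from MonotonicSeconds().
// A failed clock read is reported as "not elapsed".
bool IntervalElapsed(int64_t start_s, int64_t timeout_s)
{
    int64_t now = MonotonicSeconds();
    return now >= 0 && now - start_s >= timeout_s;
}
```

A caller would store start = MonotonicSeconds() when the activity begins
and later test IntervalElapsed(start, limit). On older glibc versions
this needs -lrt at link time, as noted in comment #1.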


- Petri
  

Patch

--- ../../vdr-1.4.5-orig/tools.c	2007-01-02 06:18:41.000000000 +0200
+++ ../../vdr-1.4.5/tools.c	2007-01-02 06:14:03.000000000 +0200
@@ -549,6 +549,54 @@ 
 
 uint64_t cTimeMs::Now(void)
 {
+#if _POSIX_TIMERS > 0 && defined(_POSIX_MONOTONIC_CLOCK)
+  static bool initialized = false;
+  static bool monotonic = false;
+  struct timespec tp;
+
+  // initialization:
+  // check if monotonic timer is available and
+  // provides sufficiently accurate resolution
+  if(!initialized) {
+
+     if(clock_getres(CLOCK_MONOTONIC, &tp)) 
+        esyslog("cTimeMs: clock_getres(CLOCK_MONOTONIC) failed");
+
+     else {
+        dsyslog("cTimeMs: clock_gettime(CLOCK_MONOTONIC): clock resolution %d us",
+		((int)tp.tv_nsec) / 1000);
+
+	// require at least 1 ms resolution
+	if( tp.tv_sec == 0 && tp.tv_nsec <= 1000000 ) {
+
+	   if(clock_gettime(CLOCK_MONOTONIC, &tp))
+	      esyslog("cTimeMs: clock_gettime(CLOCK_MONOTONIC) failed");
+
+	   else {
+	      dsyslog("cTimeMs: using monotonic clock");
+	      monotonic = true;
+	      }
+	   }
+        }
+
+     initialized = true;
+     }
+
+
+  if(monotonic) {
+
+     if(!clock_gettime(CLOCK_MONOTONIC, &tp))
+        return (uint64_t(tp.tv_sec)) * 1000 + tp.tv_nsec / 1000000;
+        
+     esyslog("cTimeMs: clock_gettime(CLOCK_MONOTONIC) failed");
+     monotonic = false;
+     //return 0;
+     }
+
+#else
+#  warning Posix monotonic clock not available
+#endif
+
   struct timeval t;
   if (gettimeofday(&t, NULL) == 0)
      return (uint64_t(t.tv_sec)) * 1000 + t.tv_usec / 1000;