PANIC: watchdog timer expired - exiting

Message ID 4752A02B.60504@cadsoft.de
State New

Commit Message

Klaus Schmidinger Dec. 2, 2007, 12:08 p.m. UTC
  On 06/09/07 21:40, Petri Hintukainen wrote:
> On Sat, 2007-06-09 at 12:28 +0200, Udo Richter wrote:
>> And, from the original post:
>>> May 31 20:23:38 localhost vdr: [3413] System Time = Thu May 31 20:23:38 2007 (1180632218)
>>> May 31 20:23:38 localhost vdr: [3413] Local Time = Thu May 31 20:19:37 2007 (1180631977)
>>> May 31 20:21:01 localhost vdr: [3405] PANIC: watchdog timer expired - exiting!
> 
> Turning the clock is always a very bad and dangerous thing to do. It can
> cause a lot of other problems too, such as incomplete builds, duplicate
> cron jobs, destroyed logs and files, incomplete backups ...
> 
>> The clock was set to 20:19:37, and the watchdog fires at 20:21:01 - 84 
>> seconds later. There must be something different causing the watchdog to 
>> expire.
> 
> It might be even some plugin. All timeouts (cTimeMs, cCondVar,
> cCondWait) use current wall clock time to set the timeout. 
> Example:
>   cCondWait c;
>   c.Wait(100);
> 
> If the clock is turned 2 minutes back in the middle of this, the code will
> wait 120100 ms instead of 100 ms ... That might cause some quite weird
> problems. I believe there's no way to change the pthread_..._timedwait
> functions, but cTimeMs can be changed to use monotonic timers instead of
> gettimeofday (patch attached).
> ...
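
The arithmetic in the quoted example can be checked with a small simulation. This is only a sketch: FakeClocks and the two Elapsed... functions are made-up names for illustration, not VDR code, and the real classes go through the pthread timed-wait functions rather than anything like this.

```cpp
#include <cassert>
#include <stdint.h>

// Hypothetical pair of clocks, for illustration only (not VDR code).
struct FakeClocks {
  int64_t wallMs;       // wall clock: jumps when the system time is set
  int64_t monotonicMs;  // monotonic clock: only ever moves forward
  void Advance(int64_t ms) { assert(ms >= 0); wallMs += ms; monotonicMs += ms; }
  void StepWallClock(int64_t deltaMs) { wallMs += deltaMs; }
};

// Real time (in ms) that passes before a deadline of "wall clock now + timeoutMs"
// is reached, if the wall clock is stepped by stepMs right after the deadline
// has been computed.
int64_t ElapsedWithWallDeadline(int64_t timeoutMs, int64_t stepMs)
{
  FakeClocks c = { 1000000, 500000 }; // arbitrary start values
  int64_t start = c.monotonicMs;
  int64_t deadline = c.wallMs + timeoutMs;
  c.StepWallClock(stepMs);
  if (c.wallMs < deadline)
     c.Advance(deadline - c.wallMs); // wait until the wall clock reaches the deadline
  return c.monotonicMs - start;
}

// The same wait, but with the deadline taken from the monotonic clock.
int64_t ElapsedWithMonotonicDeadline(int64_t timeoutMs, int64_t stepMs)
{
  FakeClocks c = { 1000000, 500000 }; // arbitrary start values
  int64_t start = c.monotonicMs;
  int64_t deadline = c.monotonicMs + timeoutMs;
  c.StepWallClock(stepMs); // has no effect on the monotonic clock
  if (c.monotonicMs < deadline)
     c.Advance(deadline - c.monotonicMs);
  return c.monotonicMs - start;
}
```

With timeoutMs = 100 and stepMs = -120000 (clock turned 2 minutes back), the wall clock variant waits 120100 ms, while the monotonic variant still waits 100 ms.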

I have (finally, sorry for the big delay) adopted this patch (in the
attached form) to fix a problem with SVDRP connections when the system
time is adjusted.

While testing this, I found that on my system the monotonic clock
only has a resolution of 4000250 ns (about 4 ms), which in your original
patch would have caused VDR not to use the monotonic clock.
Are there actually systems that have a 1 ms resolution?
Or is there some parameter that needs to be adjusted to get a better
resolution?

Maybe we should set the limit to, say, 10 ms, so that systems like
mine can also benefit from this. After all, the advantage of having
a monotonic clock outweighs the coarser resolution (typically such
timeouts are not below 10 ms).

Klaus
  

Comments

Darren Salt Dec. 2, 2007, 1:34 p.m. UTC | #1
I demand that Klaus Schmidinger may or may not have written...

[snip]
> While testing this, I found that on my system the monotonic clock only has
> a resolution of 4000250 ns (about 4 ms), which in your original patch would
> have caused VDR not to use the monotonic clock.

That suggests that your kernel is built with HZ=250 (CONFIG_HZ in
/proc/config.gz).

> Are there actually systems that have a 1 ms resolution?

Any with HZ=1000, I expect :-)

[snip]
  
Klaus Schmidinger Dec. 2, 2007, 1:47 p.m. UTC | #2
On 12/02/07 14:34, Darren Salt wrote:
> I demand that Klaus Schmidinger may or may not have written...
> 
> [snip]
>> While testing this, I found that on my system the monotonic clock only has
>> a resolution of 4000250 ns (about 4 ms), which in your original patch would
>> have caused VDR not to use the monotonic clock.
> 
> That suggests that your kernel is built with HZ=250 (CONFIG_HZ in
> /proc/config.gz).

I'm running the default SUSE 10.2 kernel.

>> Are there actually systems that have a 1 ms resolution?
> 
> Any with HZ=1000, I expect :-)

Ok, I see.

I'll make it a 5 ms limit then, to allow default kernels to work.

Klaus
  
Matthias Schwarzott Dec. 2, 2007, 2:05 p.m. UTC | #3
On Sonntag, 2. Dezember 2007, Klaus Schmidinger wrote:
> On 12/02/07 14:34, Darren Salt wrote:
> > I demand that Klaus Schmidinger may or may not have written...
> >
> > [snip]
> >
> >> While testing this, I found that on my system the monotonic clock only
> >> has a resolution of 4000250 ns (about 4 ms), which in your original
> >> patch would have caused VDR not to use the monotonic clock.
> >
> > That suggests that your kernel is built with HZ=250 (CONFIG_HZ in
> > /proc/config.gz).
>
> I'm running the default SUSE 10.2 kernel.
>
> >> Are there actually systems that have a 1 ms resolution?
> >
> > Any with HZ=1000, I expect :-)
>
> Ok, I see.
>
> I'll make it a 5 ms limit then, to allow default kernels to work.
>
Then please do fix it so that it also works with HZ=100, as that is also a 
valid setting I think VDR should be able to run with.

Matthias
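
To make the HZ argument concrete, here is a rough sketch of the arithmetic. It assumes that the reported resolution is simply the nominal tick period 1/HZ (the 4000250 ns measured above shows the real value can be slightly off). TickPeriodNs and PassesLimit are made-up helper names, not part of the patch.

```cpp
#include <cassert>

// Nominal tick period in nanoseconds for a kernel built with the given CONFIG_HZ.
// Assumption: the monotonic clock resolution equals one tick.
long TickPeriodNs(long hz)
{
  assert(hz > 0);
  return 1000000000L / hz;
}

// True if a clock with that tick period would pass a "resolution <= limit" check
// like the one in the patch.
bool PassesLimit(long hz, long limitNs)
{
  return TickPeriodNs(hz) <= limitNs;
}
```

Under that assumption HZ=250 gives 4 ms and passes a 5 ms limit, while HZ=100 gives 10 ms and only passes if the limit is 10 ms or more.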
  
Darren Salt Dec. 2, 2007, 2:16 p.m. UTC | #4
I demand that Klaus Schmidinger may or may not have written...

> On 12/02/07 14:34, Darren Salt wrote:
>> I demand that Klaus Schmidinger may or may not have written...
>> [snip]
>>> While testing this, I found that on my system the monotonic clock only
>>> has a resolution of 4000250 ns (about 4 ms), which in your original
>>> patch would have caused VDR not to use the monotonic clock.
>> That suggests that your kernel is built with HZ=250 (CONFIG_HZ in
>> /proc/config.gz).

> I'm running the default SUSE 10.2 kernel.

That says nothing (to me) about how it's configured... :-)

>>> Are there actually systems that have a 1 ms resolution?
>> Any with HZ=1000, I expect :-)

> Ok, I see.

> I'll make it a 5 ms limit then, to allow default kernels to work.

Valid HZ options are 100, 250, 300 and 1000, unless overridden by an
arch-specific Kconfig file. (AFAICS, only mips does this, offering 48, 100,
128, 250, 256, 1000 and 1024.)

I have one computer on which I use HZ=100; however, it has no DVB devices.
  
Grégoire Favre Dec. 2, 2007, 3:09 p.m. UTC | #5
On 02/12/2007, Darren Salt <linux@youmustbejoking.demon.co.uk> wrote:

> Valid HZ options are 100, 250, 300 and 1000, unless overridden by an
> arch-specific Kconfig file. (AFAICS, only mips does this, offering 48, 100,
> 128, 250, 256, 1000 and 1024.)

zen-sources, which is a really good option for a multimedia desktop,
offers many more choices than those.
  
Klaus Schmidinger Dec. 2, 2007, 4:50 p.m. UTC | #6
On 12/02/07 16:09, Grégoire FAVRE wrote:
> On 02/12/2007, Darren Salt <linux@youmustbejoking.demon.co.uk> wrote:
> 
>> Valid HZ options are 100, 250, 300 and 1000, unless overridden by an
>> arch-specific Kconfig file. (AFAICS, only mips does this, offering 48, 100,
>> 128, 250, 256, 1000 and 1024.)
> 
> zen-sources, which is a really good option for a multimedia desktop,
> offers many more choices than those.

Well, I guess then it's probably best to not check this at all.

Klaus
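
For illustration, dropping the resolution check could look roughly like the sketch below. This is not the code that went into VDR: NowMs is a hypothetical free function standing in for cTimeMs::Now(), and it only keeps the two calls already used in the patch (clock_gettime and gettimeofday), without the logging and the static flags.

```cpp
#include <stdint.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

// Hypothetical stand-in for cTimeMs::Now() without any resolution check.
// Returns milliseconds from the monotonic clock if it is available,
// otherwise from gettimeofday(), and 0 if both calls fail.
uint64_t NowMs(void)
{
#if defined(_POSIX_TIMERS) && _POSIX_TIMERS > 0 && defined(_POSIX_MONOTONIC_CLOCK)
  struct timespec tp;
  if (clock_gettime(CLOCK_MONOTONIC, &tp) == 0)
     return uint64_t(tp.tv_sec) * 1000 + tp.tv_nsec / 1000000;
  // fall through to gettimeofday() if the monotonic clock can't be read
#endif
  struct timeval t;
  if (gettimeofday(&t, NULL) == 0)
     return uint64_t(t.tv_sec) * 1000 + t.tv_usec / 1000;
  return 0;
}
```

On older C libraries this needs -lrt for clock_gettime(), as in the Makefile change of the patch.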
  
Ville Skyttä Dec. 2, 2007, 5:52 p.m. UTC | #7
On Sunday 02 December 2007, Darren Salt wrote:
> I demand that Klaus Schmidinger may or may not have written...
>
> > On 12/02/07 14:34, Darren Salt wrote:
> >> I demand that Klaus Schmidinger may or may not have written...
> >
> > I'll make it a 5 ms limit then, to allow default kernels to work.
>
> Valid HZ options are 100, 250, 300 and 1000, unless overridden by an
> arch-specific Kconfig file. (AFAICS, only mips does this, offering 48, 100,
> 128, 250, 256, 1000 and 1024.)

Not that I really know much at all about this, but how would this change 
behave with NOHZ kernels?
  
Rainer Zocholl Dec. 2, 2007, 8:59 p.m. UTC | #8
Klaus.Schmidinger@cadsoft.de(Klaus Schmidinger)  02.12.07 14:47


>On 12/02/07 14:34, Darren Salt wrote:
>> I demand that Klaus Schmidinger may or may not have written...
>>
>> [snip]
>>> While testing this, I found that on my system the monotonic clock
>>> only has a resolution of 4000250 ns (about 4 ms), which in your
>>> original patch would have caused VDR not to use the monotonic
>>> clock.
>>
>> That suggests that your kernel is built with HZ=250 (CONFIG_HZ in
>> /proc/config.gz).

>I'm running the default SUSE 10.2 kernel.

>>> Are there actually systems that have a 1 ms resolution?
>>
>> Any with HZ=1000, I expect :-)

>Ok, I see.

>I'll make it a 5 ms limit then, to allow default kernels to work.

http://tldp.org/HOWTO/IO-Port-Programming-4.html

 For delays of under about 50 milliseconds (depending on the speed of your
 processor and machine, and the system load), giving up the CPU takes too much
 time, because the Linux scheduler (for the x86 architecture) usually takes at
 least about 10-30 milliseconds before it returns control to your process. Due
 to this, in small delays, usleep(3) usually delays somewhat more than the
 amount that you specify in the parameters, and at least about 10 ms.

So I assume it's not just a problem of the tick interval.

It might also be necessary to distinguish between "wall clock" and "time delays".


VDR is (IMHO) a "strong (hard?) real time" application, not just another
file manager with a video interface ;-)
I wonder why the Linux "high resolution timers" can't be used for timeout and
delay timings.
I assume that those timers use CPU/ACPI counters and not
the good old interrupt ticker, which originates from a time when a tick faster
than 10 ms would have used up the entire CPU and which was never intended
to be used in "real time" applications.

See
http://www.opengroup.org/rtforum/jan2002/slides/linux/mehaffey.pdf
etc.

So a 10 ms "sleep" would always be a 10 ms sleep, not a 0 ms or 5 ms or 15 ms
or 20 ms one, depending on when the last tick occurred and which interval was
chosen.


Another question:

What if the CPU clock is modulated to save power?
  
Anssi Hannula Dec. 3, 2007, 2:05 p.m. UTC | #9
Ville Skyttä wrote:
> On Sunday 02 December 2007, Darren Salt wrote:
>> I demand that Klaus Schmidinger may or may not have written...
>>
>>> On 12/02/07 14:34, Darren Salt wrote:
>>>> I demand that Klaus Schmidinger may or may not have written...
>>> I'll make it a 5 ms limit then, to allow default kernels to work.
>> Valid HZ options are 100, 250, 300 and 1000, unless overridden by an
>> arch-specific Kconfig file. (AFAICS, only mips does this, offering 48, 100,
>> 128, 250, 256, 1000 and 1024.)
> 
> Not that I really know much at all about this, but how would this change 
> behave with NOHZ kernels?

Apparently resolution is reported as 1 ns regardless of HZ when NO_HZ is 
used:

$ ./hz
cTimeMs: using monotonic clock (resolution is 1 ns)
$ zcat /proc/config.gz | grep "_HZ="
CONFIG_NO_HZ=y
CONFIG_HZ=100
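
The source of the ./hz test program isn't shown here. A minimal sketch of how the reported resolution can be queried, using the same clock_getres() call as the patch, might look like this (MonotonicResolutionNs and ReportMonotonicResolution are made-up names, and the output text is not the one from the patch):

```cpp
#include <stdio.h>
#include <time.h>

// Returns the resolution reported for CLOCK_MONOTONIC in nanoseconds,
// or -1 if clock_getres() fails.
long long MonotonicResolutionNs(void)
{
  struct timespec tp;
  if (clock_getres(CLOCK_MONOTONIC, &tp) != 0)
     return -1;
  return (long long)tp.tv_sec * 1000000000LL + tp.tv_nsec;
}

// Prints the reported resolution, e.g. for comparing kernels with different
// CONFIG_HZ / CONFIG_NO_HZ settings.
void ReportMonotonicResolution(void)
{
  long long ns = MonotonicResolutionNs();
  if (ns < 0)
     printf("clock_getres(CLOCK_MONOTONIC) failed\n");
  else
     printf("monotonic clock resolution is %lld ns\n", ns);
}
```

Calling ReportMonotonicResolution() from main() and running it on the systems discussed above would show what the kernel claims; it says nothing about how accurate a short sleep actually is.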
  

Patch

--- Makefile	2007/11/04 10:15:59	1.110
+++ Makefile	2007/12/02 11:29:22
@@ -20,7 +20,7 @@ 
 MANDIR   = $(PREFIX)/share/man
 BINDIR   = $(PREFIX)/bin
 LOCDIR   = ./locale
-LIBS     = -ljpeg -lpthread -ldl -lcap -lfreetype -lfontconfig
+LIBS     = -ljpeg -lpthread -ldl -lcap -lrt -lfreetype -lfontconfig
 INCLUDES = -I/usr/include/freetype2
 
 PLUGINDIR= ./PLUGINS
--- tools.c	2007/11/03 15:34:07	1.137
+++ tools.c	2007/12/02 11:52:31
@@ -545,6 +545,40 @@ 
 
 uint64_t cTimeMs::Now(void)
 {
+#if _POSIX_TIMERS > 0 && defined(_POSIX_MONOTONIC_CLOCK)
+  static bool initialized = false;
+  static bool monotonic = false;
+  struct timespec tp;
+  if (!initialized) {
+     // check if monotonic timer is available and provides enough accurate resolution:
+     if (clock_getres(CLOCK_MONOTONIC, &tp) == 0) {
+        long Resolution = tp.tv_nsec;
+        // require at least 10 ms resolution:
+        if (tp.tv_sec == 0 && tp.tv_nsec <= 10000000) {
+           if (clock_gettime(CLOCK_MONOTONIC, &tp) == 0) {
+              dsyslog("cTimeMs: using monotonic clock (resolution is %ld ns)", Resolution);
+              monotonic = true;
+              }
+           else
+              esyslog("cTimeMs: clock_gettime(CLOCK_MONOTONIC) failed");
+           }
+        else
+           dsyslog("cTimeMs: not using monotonic clock - resolution is too bad (%ld s %ld ns)", tp.tv_sec, tp.tv_nsec);
+        }
+     else
+        esyslog("cTimeMs: clock_getres(CLOCK_MONOTONIC) failed");
+     initialized = true;
+     }
+  if (monotonic) {
+     if (clock_gettime(CLOCK_MONOTONIC, &tp) == 0)
+        return (uint64_t(tp.tv_sec)) * 1000 + tp.tv_nsec / 1000000;
+     esyslog("cTimeMs: clock_gettime(CLOCK_MONOTONIC) failed");
+     monotonic = false;
+     // fall back to gettimeofday()
+     }
+#else
+#  warning Posix monotonic clock not available
+#endif
   struct timeval t;
   if (gettimeofday(&t, NULL) == 0)
      return (uint64_t(t.tv_sec)) * 1000 + t.tv_usec / 1000;