New shutdown procedure not working as expected

Message ID 46B0FC36.5040708@googlemail.com
State New
Headers

Commit Message

Petri Helin Aug. 1, 2007, 9:33 p.m. UTC
  Hi,

It seems that the new shutdown procedure introduced for versions 1.5.x 
is (at least for me) not working as expected. The problem is that it 
does not recognize correctly whether VDR was started manually or not. In 
fact it thinks every time that it has been started manually. I traced 
the problem to the SystemExec call in 
cShutdownHandler::CallShutdownCommand where Setup.NextWakeupTime is 
updated only if the SystemExec call returns 0. i changed it to accept 
all values greater or equal to 0 and now Setup.NextWakeupTime gets 
updated properly. Can some explain me why 0 is expected?

Here is the patch that I came up with:

for comparison on reboot
  }


-Petri
  

Comments

Udo Richter Aug. 1, 2007, 10:07 p.m. UTC | #1
Petri Helin wrote:
> I traced the problem to the SystemExec call in 
> cShutdownHandler::CallShutdownCommand where Setup.NextWakeupTime is 
> updated only if the SystemExec call returns 0. i changed it to accept 
> all values greater or equal to 0 and now Setup.NextWakeupTime gets 
> updated properly. Can some explain me why 0 is expected?

SystemExec called with true as second parameter should only return -1 in 
case of an error (logged to syslog) or the status of waitpid() in case 
of success. (see thread.c) Since the direct child instantly does 
exit(0), I assumed that status should also be 0.

On the other hand, the man page of waitpid mentions lots of macros to 
inspect the status return of waitpid. Maybe checking on these helps here.

Cheers,

Udo
  
Petri Helin Aug. 7, 2007, 7:27 p.m. UTC | #2
Udo Richter wrote:
> Petri Helin wrote:
>> I traced the problem to the SystemExec call in 
>> cShutdownHandler::CallShutdownCommand where Setup.NextWakeupTime is 
>> updated only if the SystemExec call returns 0. i changed it to accept 
>> all values greater or equal to 0 and now Setup.NextWakeupTime gets 
>> updated properly. Can some explain me why 0 is expected?
> 
> SystemExec called with true as second parameter should only return -1 in 
> case of an error (logged to syslog) or the status of waitpid() in case 
> of success. (see thread.c) Since the direct child instantly does 
> exit(0), I assumed that status should also be 0.
> 
> On the other hand, the man page of waitpid mentions lots of macros to 
> inspect the status return of waitpid. Maybe checking on these helps here.
> 

It seems to me that VDR still fails writing the NextWakeupTime every now 
and then, even after I made the change described earlier. Do you have 
suggestions on how to shutdown properly in order to get the 
NextWakeupTime written to setup.conf? I run VDR as a daemon and shut it 
down in shutdown script by calling start-stop-daemon (I use Ubuntu Feisty).

-Petri
  
Udo Richter Aug. 8, 2007, 6:55 p.m. UTC | #3
Petri Helin wrote:
> It seems to me that VDR still fails writing the NextWakeupTime every now 
> and then, even after I made the change described earlier. Do you have 
> suggestions on how to shutdown properly in order to get the 
> NextWakeupTime written to setup.conf? I run VDR as a daemon and shut it 
> down in shutdown script by calling start-stop-daemon (I use Ubuntu Feisty).

You can try something like this for standards conformity:

   int ret = SystemExec(cmd, true);
   if (WIFEXITED(ret) && WEXITSTATUS(ret) == 0)
      Setup.NextWakeupTime = WakeupTime; // Remember this wakeup time

(not tested)
Or, you can try the permissive way and test for != -1, which will only 
catch fatal errors like fork failing.

In any case, try dumping the return value to syslog, I would really like 
to know what additional flag is set on the return value.

Cheers,

Udo
  
Petri Helin Aug. 8, 2007, 7:40 p.m. UTC | #4
Udo Richter wrote:
> 
> In any case, try dumping the return value to syslog, I would really like 
> to know what additional flag is set on the return value.
> 

I got 6, 9 and 11 when I did some debugging earlier.


-Petri
  
Udo Richter Aug. 9, 2007, 5:32 p.m. UTC | #5
Petri Helin wrote:
> Udo Richter wrote:
>> In any case, try dumping the return value to syslog, I would really like 
>> to know what additional flag is set on the return value.
>>
> 
> I got 6, 9 and 11 when I did some debugging earlier.

I *think* that these are kill signals received by the child process. 
Which is strange, as the child does an exit immediately. (Unless you're 
somewhere between 1.5.1 and 1.5.3 - this changed in 1.5.4)

6 is SIGABRT, 11 is SIGSEGV and 9 is SIGKILL.


Cheers,

Udo
  
Petri Helin Aug. 9, 2007, 6:52 p.m. UTC | #6
Udo Richter wrote:
> Petri Helin wrote:
>> Udo Richter wrote:
>>> In any case, try dumping the return value to syslog, I would really like 
>>> to know what additional flag is set on the return value.
>>>
>> I got 6, 9 and 11 when I did some debugging earlier.
> 
> I *think* that these are kill signals received by the child process. 
> Which is strange, as the child does an exit immediately. (Unless you're 
> somewhere between 1.5.1 and 1.5.3 - this changed in 1.5.4)
> 
> 6 is SIGABRT, 11 is SIGSEGV and 9 is SIGKILL.
> 
> 

I use version 1.5.6. I tested with my normal shutdown script and with a 
script that just does an "exit 0". No difference there. Can the script 
itself have any influence on the return value?

-Petri
  
Udo Richter Aug. 9, 2007, 9:22 p.m. UTC | #7
Petri Helin wrote:
> Udo Richter wrote:
>> I *think* that these are kill signals received by the child process. 
>> Which is strange, as the child does an exit immediately. (Unless you're 
>> somewhere between 1.5.1 and 1.5.3 - this changed in 1.5.4)
>>
>> 6 is SIGABRT, 11 is SIGSEGV and 9 is SIGKILL.
> 
> I use version 1.5.6. I tested with my normal shutdown script and with a 
> script that just does an "exit 0". No difference there. Can the script 
> itself have any influence on the return value?

Since 1.5.4, VDR forks into a child that does nothing but fork off a 
second child, and do an exit(0). The second child exec's the shutdown 
script:

VDR
  +-- Child 1 -> exits
        +-- Child 2 -> runs shutdown

That way the waitpid quickly returns, the second child becomes orphaned, 
and no process ends up being a zombie.

Since the shutdown script runs orphaned, the return value of the script 
is not evaluated, nor does anyone wait for the script to terminate.

Cheers,

Udo
  
C.Y.M Aug. 9, 2007, 10:29 p.m. UTC | #8
On 8/9/07, Udo Richter <udo_richter@gmx.de> wrote:
>
> Petri Helin wrote:
> > Udo Richter wrote:
> >> I *think* that these are kill signals received by the child process.
> >> Which is strange, as the child does an exit immediately. (Unless you're
> >> somewhere between 1.5.1 and 1.5.3 - this changed in 1.5.4)
> >>
> >> 6 is SIGABRT, 11 is SIGSEGV and 9 is SIGKILL.
> >
> > I use version 1.5.6. I tested with my normal shutdown script and with a
> > script that just does an "exit 0". No difference there. Can the script
> > itself have any influence on the return value?
>
> Since 1.5.4, VDR forks into a child that does nothing but fork off a
> second child, and do an exit(0). The second child exec's the shutdown
> script:
>
> VDR
>   +-- Child 1 -> exits
>         +-- Child 2 -> runs shutdown
>
> That way the waitpid quickly returns, the second child becomes orphaned,
> and no process ends up being a zombie.
>
> Since the shutdown script runs orphaned, the return value of the script
> is not evaluated, nor does anyone wait for the script to terminate.



I noticed my vdr-1.5.6 did an emergency shutdown when I had poor signal
reception during a recording, which in itself is a fine thing to do... but
my "runvdr" script didn't seem to catch the bad exit code (exit 1) when vdr
did that, so VDR never restarted.   This code snippet used to work for me,
but doesn't now:

      eval "screen -D -m -S vdr $VDRCMD &"
      # Remember PID of VDR process
      PID=$!
      # Wait for VDR to end or signal to arrive
      wait $PID
      # Remember return value of VDR
      RET=$?
      if test $RET -eq 0 -o $RET -eq 2; then exit; fi
      TIMEOFDEATH=$(date +%s)

so, something is wrong with RET.  Does this have to do with the children
spawning the exit code?

Groeten.
  
Udo Richter Aug. 10, 2007, 9:26 p.m. UTC | #9
Stone wrote:
> I noticed my vdr-1.5.6 did an emergency shutdown when I had poor signal 
> reception during a recording, which in itself is a fine thing to do... 
> but my "runvdr" script didn't seem to catch the bad exit code (exit 1) 
> when vdr did that, so VDR never restarted. 
> 
> Does this have to do with the children spawning the exit code?

No, the exit code of VDR is independent of what the script returns. 
You're sure that it did not exit with 0 or 2? Maybe some other problem 
prevented the next restart?

Cheers,

Udo
  
C.Y.M Aug. 10, 2007, 10:13 p.m. UTC | #10
On 8/10/07, Udo Richter <udo_richter@gmx.de> wrote:
>
> Stone wrote:
> > I noticed my vdr-1.5.6 did an emergency shutdown when I had poor signal
> > reception during a recording, which in itself is a fine thing to do...
> > but my "runvdr" script didn't seem to catch the bad exit code (exit 1)
> > when vdr did that, so VDR never restarted.
> >
> > Does this have to do with the children spawning the exit code?
>
> No, the exit code of VDR is independent of what the script returns.
> You're sure that it did not exit with 0 or 2? Maybe some other problem
> prevented the next restart?


It is possible there was something else going on. I will investigate
further, but thanks for the  info.  Perhaps I will skip my runvdr script and
just use your "extreme" runvdr version instead :)

Groeten.
  

Patch

--- shutdown.c.orig     2007-08-02 00:14:49.000000000 +0300
+++ shutdown.c  2007-08-02 00:15:10.000000000 +0300
@@ -126,7 +126,7 @@ 
    time_t Delta = WakeupTime ? WakeupTime - time(NULL) : 0;
    cString cmd = cString::sprintf("%s %ld %ld %d \"%s\" %d", 
shutdownCommand, WakeupTime, Delta, Channel, *strescape(File, "\"$"), 
UserShutdown);
    isyslog("executing '%s'", *cmd);
-  if (SystemExec(cmd, true) == 0)
+  if (SystemExec(cmd, true) >= 0)
       Setup.NextWakeupTime = WakeupTime; // Remember this wakeup time