improving i18n-to-gettext.pl

Message ID 200708151301.28007.zzam@gentoo.org
State New

Commit Message

Matthias Schwarzott Aug. 15, 2007, 11:01 a.m. UTC
Hi there!

This patch should improve i18n-to-gettext.pl so that it works with all plugins:

1. It also checks i18n.h, as some plugin writers seem to put their
   translations there.
2. It also strips // comments on lines that do not end with ",".
3. It strips /* ... */ comments contained in a single line.
4. It strips /* ... */ comments spanning multiple lines.
5. It ignores #if and #endif lines (as used to make plugins compile
   with older VDR versions that supported fewer languages).
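
The comment handling in points 2 to 5 can be sketched as follows. This is only an illustration in Python (the script itself is Perl), and the function name strip_comments is made up for this example. Like the patch, it does not understand string literals, so a "//" inside a quoted text would be cut off as well.

```python
import re


def strip_comments(lines):
    """Yield lines with C/C++ comments and #if/#endif lines removed.

    Illustrative sketch only: string literals are not parsed, so
    comment markers inside quoted texts are treated as comments too.
    """
    in_comment = False
    for line in lines:
        line = line.rstrip("\n")
        if in_comment:
            end = line.find("*/")
            if end < 0:
                continue                       # still inside a multi-line comment
            line = line[end + 2:]              # keep what follows the comment end
            in_comment = False
        line = re.sub(r"/\*.*?\*/", "", line)  # /* ... */ within one line
        line = re.sub(r" *//.*", "", line)     # // comments
        start = line.find("/*")
        if start >= 0:                         # a multi-line comment starts here
            line = line[:start]
            in_comment = True
        if re.match(r"\s*#\s*(if|endif)", line):
            continue                           # ignore preprocessor guards
        if line.strip() == "":
            continue                           # skip lines that became empty
        yield line.rstrip()
```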


General question: Why is the locale dir called de_DE and not just de, as that
seems to be what most other programs on my system do?

Matthias
  

Comments

Matthias Fechner Aug. 15, 2007, 11:18 a.m. UTC | #1
Hi Matthias,

Matthias Schwarzott wrote:
> General question: Why is the locale dir called de_DE and not just de - as that 
> seems what most other programs on my system do?

Because German is spoken in more than one country: de_DE, de_AT and, I
think, de_CH and more. I don't have a list of all locales at hand right now.

Best regards
Matthias
  
Matthias Schwarzott Aug. 15, 2007, 12:02 p.m. UTC | #2
On Mittwoch, 15. August 2007, Matthias Fechner wrote:
> Hi Matthias,
>
> Matthias Schwarzott wrote:
> > General question: Why is the locale dir called de_DE and not just de - as
> > that seems what most other programs on my system do?
>
> because German is spoken in more then one country: de_DE, de_AT and I
> think de_CH and more. I havn't not list with all locales here now.
>

Yes, German is spoken in other countries. Is there then a reason to restrict
the translation to Germany?

An example:
wget installs the file /usr/share/locale/de/LC_MESSAGES/wget.mo
to provide translations for "all" de* locales: not just the German one, but
also those for Austria and Switzerland.

Matthias
  
Klaus Schmidinger Aug. 15, 2007, 12:43 p.m. UTC | #3
On 08/15/07 14:02, Matthias Schwarzott wrote:
> On Mittwoch, 15. August 2007, Matthias Fechner wrote:
>> Hi Matthias,
>>
>> Matthias Schwarzott wrote:
>>> General question: Why is the locale dir called de_DE and not just de - as
>>> that seems what most other programs on my system do?
>> because German is spoken in more then one country: de_DE, de_AT and I
>> think de_CH and more. I havn't not list with all locales here now.
>>
> 
> Yeah, german is spoken in other countries. Is there then a reason to restrict 
> the translation to germany? 
> 
> some example:
> wget installs the file /usr/share/locale/de/LC_MESSAGES/wget.mo
> this is to provide translations for "all" de* locales. Not just the german 
> one, but also for austria and swiss.

I just tried renaming VDR's "de_DE" locale to "de" and did

LC_ALL=de_AT ./vdr

but it came up with the default English texts. Then I renamed
"de" to "de_AT" and did the same again, and I got the German texts.

I was hoping that gettext would be a little more intelligent and
look for

- an exact match ("de_AT")
- a default ("de")
- any suitable language ("de_DE")

but apparently that's not the case - unless I'm doing something wrong.
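
The lookup order hoped for above could be sketched like this. It is a hypothetical illustration in Python of the wished-for behaviour, not what gettext or VDR actually does; the function name pick_locale_dir and its arguments are made up for this example, and codeset suffixes such as ".utf8" are not handled.

```python
def pick_locale_dir(requested, available):
    """Return the best matching locale directory name, or None.

    Order: exact match, bare language, any directory of the same language.
    """
    if requested in available:
        return requested                  # exact match, e.g. "de_AT"
    language = requested.split("_", 1)[0]
    if language in available:
        return language                   # default for the language, e.g. "de"
    for name in sorted(available):
        if name.split("_", 1)[0] == language:
            return name                   # any suitable language, e.g. "de_DE"
    return None                           # fall back to the untranslated texts
```

With only "de_DE" installed, a request for "de_AT" would then end up at "de_DE".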

Klaus
  
Matthias Schwarzott Aug. 15, 2007, 1:07 p.m. UTC | #4
On Mittwoch, 15. August 2007, Klaus Schmidinger wrote:
> On 08/15/07 14:02, Matthias Schwarzott wrote:
> > On Mittwoch, 15. August 2007, Matthias Fechner wrote:
> >> Hi Matthias,
> >>
> >>
> >> because German is spoken in more then one country: de_DE, de_AT and I
> >> think de_CH and more. I havn't not list with all locales here now.
> >
> > Yeah, german is spoken in other countries. Is there then a reason to
> > restrict the translation to germany?
> >
> > some example:
> > wget installs the file /usr/share/locale/de/LC_MESSAGES/wget.mo
> > this is to provide translations for "all" de* locales. Not just the
> > german one, but also for austria and swiss.
>
> I just tried renaming VDR's "de_DE" locale to "de" and did
>
> LC_ALL=de_AT ./vdr
>
This will work, but only if the locale de_AT you set actually exists (i.e. it
is listed in the output of locale -a).
> but it came up with the default English texts. Then I renamed
> "de" to "de_AT" and did the same again, and I got the German texts.
>
> I was hoping that gettext would be a little more intelligent and
> look for
>
> - an exact match ("de_AT")
> - a default ("de")
> - any suitable language ("de_DE")

I think it does this, except for the "any suitable language" step.

Trying it with ls:
# LC_ALL=de_DE strace ls xxx

open("/usr/share/locale/de_DE.utf8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/de_DE/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/de.utf8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/de/LC_MESSAGES/coreutils.mo", O_RDONLY) = 3

You can see that gettext does this:
1. It tries the locale that was set, with different charsets (de_DE.utf8, de_DE).
2. It strips off the country and tries the bare language, again with different
   charsets (de.utf8, de).

The only condition is that the locale LC_MESSAGES is set to must exist.
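
The search order seen in the strace output can be reproduced with a small sketch. It is written in Python for illustration only and mirrors just the four paths observed above; the real gettext lookup handles more cases (modifiers, other codeset spellings). The function name candidate_catalogs and the defaults for codeset and base are assumptions of this example.

```python
def candidate_catalogs(locale_name, domain, codeset="utf8",
                       base="/usr/share/locale"):
    """List the .mo paths tried for a locale, in the order observed above."""
    lang_territory = locale_name.split(".", 1)[0]   # "de_DE.utf8" -> "de_DE"
    language = lang_territory.split("_", 1)[0]      # "de_DE" -> "de"
    names = []
    for stem in (lang_territory, language):
        for name in (f"{stem}.{codeset}", stem):    # with, then without charset
            if name not in names:
                names.append(name)
    return [f"{base}/{name}/LC_MESSAGES/{domain}.mo" for name in names]
```

Calling candidate_catalogs("de_DE", "coreutils") yields the same four paths, in the same order, as the open() calls in the trace.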


Now my tests: I created de_DE and de_AT locales on my system, but not de_CH.

# LC_ALL=de_DE ls zzzz
ls: Zugriff auf zzzz nicht möglich: Datei oder Verzeichnis nicht gefunden
# LC_ALL=de_AT ls zzzz
ls: Zugriff auf zzzz nicht möglich: Datei oder Verzeichnis nicht gefunden
# LC_ALL=de_CH ls zzzz
ls: cannot access zzzz: No such file or directory

# LC_ALL=de ls zzzz
ls: cannot access zzzz: No such file or directory

The reason VDR does not work with a directory called de is the same reason
LC_ALL=de does not work: there is no locale called de, even if the directory
is called de.
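
Whether a given locale name is accepted on a system can also be checked programmatically. The following is a minimal Python sketch that simply asks setlocale(); the helper name locale_exists is made up for this example, and the results depend on which locales have been generated on the machine.

```python
import locale


def locale_exists(name):
    """Return True if setlocale() accepts the given locale name."""
    saved = locale.setlocale(locale.LC_ALL)      # query the current setting
    try:
        locale.setlocale(locale.LC_ALL, name)
        return True
    except locale.Error:
        return False                             # unknown locale, e.g. "de"
    finally:
        locale.setlocale(locale.LC_ALL, saved)   # always restore the old setting


if __name__ == "__main__":
    for name in ("de_DE", "de_AT", "de_CH", "de"):
        print(name, locale_exists(name))
```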


Matthias
  
Klaus Schmidinger Aug. 15, 2007, 5:17 p.m. UTC | #5
On 08/15/07 15:07, Matthias Schwarzott wrote:
> On Mittwoch, 15. August 2007, Klaus Schmidinger wrote:
>> On 08/15/07 14:02, Matthias Schwarzott wrote:
>>> On Mittwoch, 15. August 2007, Matthias Fechner wrote:
>>>> Hi Matthias,
>>>>
>>>>
>>>> because German is spoken in more then one country: de_DE, de_AT and I
>>>> think de_CH and more. I havn't not list with all locales here now.
>>> Yeah, german is spoken in other countries. Is there then a reason to
>>> restrict the translation to germany?
>>>
>>> some example:
>>> wget installs the file /usr/share/locale/de/LC_MESSAGES/wget.mo
>>> this is to provide translations for "all" de* locales. Not just the
>>> german one, but also for austria and swiss.
>> I just tried renaming VDR's "de_DE" locale to "de" and did
>>
>> LC_ALL=de_AT ./vdr
>>
> This will work, but only if the locale de_AT you set does exist (being in 
> output of locale -a).
>> but it came up with the default English texts. Then I renamed
>> "de" to "de_AT" and did the same again, and I got the German texts.
>>
>> I was hoping that gettext would be a little more intelligent and
>> look for
>>
>> - an exact match ("de_AT")
>> - a default ("de")
>> - any suitable language ("de_DE")
> 
> I think it does this but not doing "any suitable language".
> 
> trying it with ls:
> # LC_ALL=de_DE strace ls xxx
> 
> open("/usr/share/locale/de_DE.utf8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 
> ENOENT (No such file or directory)
> open("/usr/share/locale/de_DE/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT 
> (No such file or directory)
> open("/usr/share/locale/de.utf8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 
> ENOENT (No such file or directory)
> open("/usr/share/locale/de/LC_MESSAGES/coreutils.mo", O_RDONLY) = 3
> 
> You can see that gettext does this:
> 1. Trying the set locale with some different charsets (de_DE.utf8, de_DE)
> 2. stripping of country and trying language with different charsets (de.utf8, 
> de).
> 
> Only condition is that the locale one sets LC_MESSAGES to must exist.
> 
> 
> Now my tests: I created de_DE and de_AT locales on my system, but not de_CH.
> 
> # LC_ALL=de_DE ls zzzz
> ls: Zugriff auf zzzz nicht möglich: Datei oder Verzeichnis nicht gefunden
> # LC_ALL=de_AT ls zzzz
> ls: Zugriff auf zzzz nicht möglich: Datei oder Verzeichnis nicht gefunden
> # LC_ALL=de_CH ls zzzz
> ls: cannot access zzzz: No such file or directory
> 
> # LC_ALL=de ls zzz
> ls: cannot access zzzz: No such file or directory
> 
> The reason vdr does not work with directory called de is the same as LC_ALL=de 
> will not work.
> There is no locale called de even if the directory is called de.

Well, if setlocale() can only be called with something like "de_DE"
and not just "de", then it is clear that VDR's locale directories
must be named "de_DE" etc., because VDR uses these names to call
setlocale() when switching the language during runtime.

I'll add the missing intelligence to I18nInitialize...

Klaus
  

Patch

Index: vdr-1.5.7/i18n-to-gettext.pl
===================================================================
--- vdr-1.5.7.orig/i18n-to-gettext.pl
+++ vdr-1.5.7/i18n-to-gettext.pl
@@ -75,7 +75,7 @@  die "can't find plugin name!" unless ($P
 # Locate the file containing the texts:
 
 $I18NFILE = "";
-for ("i18n.c", `ls *.c`) { # try i18n.c explicitly first
+for ("i18n.c", "i18n.h", `ls *.c`) { # try i18n.c explicitly first
     chomp($f = $_);
     if (-f $f && `grep tI18nPhrase $f`) {
        $I18NFILE = $f;
@@ -204,13 +204,26 @@  $POTFILE = "$PODIR/$PLUGIN.pot";
 # Collect all translated texts:
 
 open(F, $I18NFILE) || die "$I18NFILE: $!\n";
+$in_comment=0;
 while (<F>) {
       chomp;
       s/\t/ /g; # get rid of tabs
       s/  *$//; # get rid of trailing blanks
       s/^ *\/\/.*//; # remove comment lines
-      s/, *\/\/.*/,/; # strip trailing comments
+      s/ *\/\/.*//; # strip trailing comments
+      s/\/\*.*\*\///g; # strip c comments
+      if (/\/\*/) {
+        $in_comment=1;
+	s/\/\*.*$//; # remove starting of comment
+      } elsif (/\*\//) {
+        $in_comment=0;
+	s/^.*\*\///; # remove end of comment
+      } elsif ($in_comment) {
+        next;
+      }
       next if (/^ *$/); # skip empty lines
+      next if (/#if/);
+      next if (/#endif/);
       next unless ($found or $found = /const *tI18nPhrase .*{/); # sync on phrases
       next if (/const *tI18nPhrase .*{/); # skip sync line
       last if (/{ *NULL *}/); # stop after last phrase