improving i18n-to-gettext.pl

Message ID 46C5967B.2040003@cadsoft.de
State New

Commit Message

Klaus Schmidinger Aug. 17, 2007, 12:37 p.m. UTC
  On 08/15/07 15:07, Matthias Schwarzott wrote:
> On Mittwoch, 15. August 2007, Klaus Schmidinger wrote:
>> On 08/15/07 14:02, Matthias Schwarzott wrote:
>>> On Mittwoch, 15. August 2007, Matthias Fechner wrote:
>>>> Hi Matthias,
>>>>
>>>>
>>>> because German is spoken in more than one country: de_DE, de_AT and I
>>>> think de_CH and more. I don't have a list of all locales here right now.
>>> Yeah, German is spoken in other countries. Is there then a reason to
>>> restrict the translation to Germany?
>>>
>>> An example:
>>> wget installs the file /usr/share/locale/de/LC_MESSAGES/wget.mo
>>> This is to provide translations for "all" de* locales. Not just the
>>> German one, but also those for Austria and Switzerland.
>> I just tried renaming VDR's "de_DE" locale to "de" and did
>>
>> LC_ALL=de_AT ./vdr
>>
> This will work, but only if the locale de_AT you set does exist (being in 
> output of locale -a).
>> but it came up with the default English texts. Then I renamed
>> "de" to "de_AT" and did the same again, and I got the German texts.
>>
>> I was hoping that gettext would be a little more intelligent and
>> look for
>>
>> - an exact match ("de_AT")
>> - a default ("de")
>> - any suitable language ("de_DE")
> 
> I think it does this but not doing "any suitable language".
> ...

Could you please try the attached patch and see whether this
works for you?

This should, e.g., select any "de*" locale in case there is no fully
matching one.
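
The selection order described above can be sketched roughly as follows. This is only an illustration, not the code from the attached patch: PickLocale and its parameters are hypothetical names. It assumes a "full" match means the requested locale name begins with the candidate name, and a "partial" match means both share the same two-letter language code.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of VDR): returns the index of the best
// matching entry in 'available' for the locale name 'wanted', or -1.
int PickLocale(const std::string &wanted, const std::vector<std::string> &available)
{
  int partial = -1;
  for (size_t i = 0; i < available.size(); i++) {
      const std::string &candidate = available[i];
      if (candidate.empty())
         continue;
      // Full match: 'wanted' begins with the candidate ("de_DE.UTF-8" vs. "de_DE").
      if (wanted.compare(0, candidate.size(), candidate) == 0)
         return int(i);
      // Partial match: same two-letter language code; remember the first one.
      if (partial < 0 && wanted.size() >= 2 && candidate.compare(0, 2, wanted, 0, 2) == 0)
         partial = int(i);
      }
  return partial;
}
```

With the locales "en_US" and "de_DE" available, a request for "de_AT" would fall back to "de_DE", while a request for "fr_FR" would find nothing and leave the default in place.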

Klaus
  

Comments

Matthias Schwarzott Aug. 18, 2007, 10:55 a.m. UTC | #1
On Friday, 17 August 2007, Klaus Schmidinger wrote:
> On 08/15/07 15:07, Matthias Schwarzott wrote:
> > On Mittwoch, 15. August 2007, Klaus Schmidinger wrote:
> >
> > This will work, but only if the locale de_AT you set does exist (being in
> > output of locale -a).
> >
> >> but it came up with the default English texts. Then I renamed
> >> "de" to "de_AT" and did the same again, and I got the German texts.
> >>
> >> I was hoping that gettext would be a little more intelligent and
> >> look for
> >>
> >> - an exact match ("de_AT")
> >> - a default ("de")
> >> - any suitable language ("de_DE")
> >
> > I think it does this but not doing "any suitable language".
> > ...
>
> Could you please try the attached patch and see whether this
> works for you?
>
> This should, e.g., select any "de*" locale in case there is no fully
> matching one.
>
Not yet tested, but the code looks promising.

Another way to get the list of usable locales is this:
Check the subdirs of /usr/lib/locale/
and then use all of those that have an associated .mo file under VDR's
LOCALEDIR. Sadly, I don't know of a better way than hardcoding that directory.

The "locale -a" command gives the same result, though. Maybe analyzing its
code will help (or just calling this external command).

Matthias
  
Klaus Schmidinger Aug. 18, 2007, 11:05 a.m. UTC | #2
On 08/18/07 12:55, Matthias Schwarzott wrote:
> On Freitag, 17. August 2007, Klaus Schmidinger wrote:
>> On 08/15/07 15:07, Matthias Schwarzott wrote:
>>> On Mittwoch, 15. August 2007, Klaus Schmidinger wrote:
>>>
>>> This will work, but only if the locale de_AT you set does exist (being in
>>> output of locale -a).
>>>
>>>> but it came up with the default English texts. Then I renamed
>>>> "de" to "de_AT" and did the same again, and I got the German texts.
>>>>
>>>> I was hoping that gettext would be a little more intelligent and
>>>> look for
>>>>
>>>> - an exact match ("de_AT")
>>>> - a default ("de")
>>>> - any suitable language ("de_DE")
>>> I think it does this but not doing "any suitable language".
>>> ...
>> Could you please try the attached patch and see whether this
>> works for you?
>>
>> This should, e.g., select any "de*" locale in case there is no fully
>> matching one.
>>
> Not yet tested, but code looks promising.
> 
> Another way to get list of usable locales is this:
> Checking the subdirs of /usr/lib/locale/
> And then using all, that have associated mo file under vdr's LOCALEDIR.
> Sadly I don't know if there is a better way than hardcoding that directory.
> 
> But "locale -a" command will give the same result - maybe analyzing its code 
> will help (or just calling this external command).

Currently VDR has its own directory with all its supported locales.
It can quickly collect all locales by going through the entries
in that directory. I can even compile my VDR so that it searches
for the locales in "./locale" inside the source directory.

I like the simplicity of this, and wouldn't want to make it any
more complex.

Klaus
  
Matthias Schwarzott Aug. 18, 2007, 11:10 a.m. UTC | #3
On Saturday, 18 August 2007, Klaus Schmidinger wrote:
> On 08/18/07 12:55, Matthias Schwarzott wrote:
> > On Freitag, 17. August 2007, Klaus Schmidinger wrote:
> >> On 08/15/07 15:07, Matthias Schwarzott wrote:
> >>> On Mittwoch, 15. August 2007, Klaus Schmidinger wrote:
> >>>
> >>> This will work, but only if the locale de_AT you set does exist (being
> >>> in output of locale -a).
> >>>
> >>>> but it came up with the default English texts. Then I renamed
> >>>> "de" to "de_AT" and did the same again, and I got the German texts.
> >>>>
> >>>> I was hoping that gettext would be a little more intelligent and
> >>>> look for
> >>>>
> >>>> - an exact match ("de_AT")
> >>>> - a default ("de")
> >>>> - any suitable language ("de_DE")
> >>>
> >>> I think it does this but not doing "any suitable language".
> >>> ...
> >>
> >> Could you please try the attached patch and see whether this
> >> works for you?
> >>
> >> This should, e.g., select any "de*" locale in case there is no fully
> >> matching one.
> >
> > Not yet tested, but code looks promising.
> >
> > Another way to get list of usable locales is this:
> > Checking the subdirs of /usr/lib/locale/
> > And then using all, that have associated mo file under vdr's LOCALEDIR.
> > Sadly I don't know if there is a better way than hardcoding that
> > directory.
> >
> > But "locale -a" command will give the same result - maybe analyzing its
> > code will help (or just calling this external command).
>
> Currently VDR has its own directory with all its supported locales.
> It can quickly collect all locales by going through the entries
> in that directory. I can even compile my VDR so that it searches
> for the locales in "./locale" inside the source directory.
>
> I like the simplicity of this, and wouldn't want to make it any
> more complex.
>
The directory /usr/lib/locale does NOT contain any translations, but rather a 
directory for every locale you can set via setlocale.
It's meant as a replacement for the setlocale loop.

Btw., aren't these two calls identical?
setlocale(LC_MESSAGES, oldLocale);
setlocale(LC_MESSAGES, "");


Matthias
  
Klaus Schmidinger Aug. 18, 2007, 11:24 a.m. UTC | #4
On 08/18/07 13:10, Matthias Schwarzott wrote:
> On Samstag, 18. August 2007, Klaus Schmidinger wrote:
>> On 08/18/07 12:55, Matthias Schwarzott wrote:
>>> On Freitag, 17. August 2007, Klaus Schmidinger wrote:
>>>> On 08/15/07 15:07, Matthias Schwarzott wrote:
>>>>> On Mittwoch, 15. August 2007, Klaus Schmidinger wrote:
>>>>>
>>>>> This will work, but only if the locale de_AT you set does exist (being
>>>>> in output of locale -a).
>>>>>
>>>>>> but it came up with the default English texts. Then I renamed
>>>>>> "de" to "de_AT" and did the same again, and I got the German texts.
>>>>>>
>>>>>> I was hoping that gettext would be a little more intelligent and
>>>>>> look for
>>>>>>
>>>>>> - an exact match ("de_AT")
>>>>>> - a default ("de")
>>>>>> - any suitable language ("de_DE")
>>>>> I think it does this but not doing "any suitable language".
>>>>> ...
>>>> Could you please try the attached patch and see whether this
>>>> works for you?
>>>>
>>>> This should, e.g., select any "de*" locale in case there is no fully
>>>> matching one.
>>> Not yet tested, but code looks promising.
>>>
>>> Another way to get list of usable locales is this:
>>> Checking the subdirs of /usr/lib/locale/
>>> And then using all, that have associated mo file under vdr's LOCALEDIR.
>>> Sadly I don't know if there is a better way than hardcoding that
>>> directory.
>>>
>>> But "locale -a" command will give the same result - maybe analyzing its
>>> code will help (or just calling this external command).
>> Currently VDR has its own directory with all its supported locales.
>> It can quickly collect all locales by going through the entries
>> in that directory. I can even compile my VDR so that it searches
>> for the locales in "./locale" inside the source directory.
>>
>> I like the simplicity of this, and wouldn't want to make it any
>> more complex.
>>
> The directory /usr/lib/locale does NOT contain any translations, but rather a 
> directory for every locale you can set via setlocale.
> Its meant as a replacement of the setlocale loop.

I'm afraid I don't see what you mean.
I know that the "locale" directory doesn't contain translations directly,
but rather subdirectories. VDR gathers the names of these subdirectories
and does a setlocale() for each of them. Then it tries to get the
translation of "LanguageName$English" in order to build a list of all
available languages. How else do you suggest that could be done?
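
For illustration, the probing step described here might look roughly like the sketch below. It is not VDR's actual code: AcceptedLocales is a hypothetical name, and the gettext() lookup of the language name is left out to keep the example self-contained.

```cpp
#include <locale.h>
#include <string>
#include <vector>

// Hypothetical helper: returns those names that setlocale() accepts for
// LC_MESSAGES, and restores the previous locale afterwards.
std::vector<std::string> AcceptedLocales(const std::vector<std::string> &names)
{
  std::vector<std::string> accepted;
  const char *current = setlocale(LC_MESSAGES, NULL);
  std::string saved = current ? current : "C"; // copy: later calls may overwrite the buffer
  for (const std::string &name : names) {
      if (setlocale(LC_MESSAGES, name.c_str()))
         accepted.push_back(name); // the real code would call gettext() here
      }
  setlocale(LC_MESSAGES, saved.c_str());
  return accepted;
}
```

On glibc, setlocale() returns NULL for names that are not installed on the system, which is why a directory name in VDR's locale directory does not by itself guarantee a usable locale. Other C libraries may be more permissive.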

> Btw. arent these two calls identical
> setlocale(LC_MESSAGES, oldLocale);
> setlocale(LC_MESSAGES, "");

I guess so.
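
One caveat, as a hedged aside: setlocale(LC_MESSAGES, "") selects the locale from the environment (LC_ALL, LC_MESSAGES, LANG), while passing a saved name restores exactly what was active before. The two only coincide if the earlier locale was itself taken from the environment. A minimal sketch of the save-and-restore variant, with LocaleGuard as a hypothetical name:

```cpp
#include <locale.h>
#include <string>

// Hypothetical RAII helper: remembers the current LC_MESSAGES locale and
// restores exactly that locale when it goes out of scope.
class LocaleGuard {
private:
  std::string saved;
public:
  LocaleGuard()
  {
    const char *current = setlocale(LC_MESSAGES, NULL);
    saved = current ? current : "C";
  }
  ~LocaleGuard() { setlocale(LC_MESSAGES, saved.c_str()); }
  const std::string &Saved() const { return saved; }
};
```

If the program had explicitly selected "C" before the loop, the guard would bring back "C", whereas setlocale(LC_MESSAGES, "") could switch to whatever LANG happens to say.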

Klaus
  
Anssi Hannula Aug. 18, 2007, 11:46 a.m. UTC | #5
Klaus Schmidinger wrote:
> On 08/18/07 13:10, Matthias Schwarzott wrote:
>> On Samstag, 18. August 2007, Klaus Schmidinger wrote:
>>> On 08/18/07 12:55, Matthias Schwarzott wrote:
>>>> On Freitag, 17. August 2007, Klaus Schmidinger wrote:
>>>>> On 08/15/07 15:07, Matthias Schwarzott wrote:
>>>>>> On Mittwoch, 15. August 2007, Klaus Schmidinger wrote:
>>>>>>
>>>>>> This will work, but only if the locale de_AT you set does exist (being
>>>>>> in output of locale -a).
>>>>>>
>>>>>>> but it came up with the default English texts. Then I renamed
>>>>>>> "de" to "de_AT" and did the same again, and I got the German texts.
>>>>>>>
>>>>>>> I was hoping that gettext would be a little more intelligent and
>>>>>>> look for
>>>>>>>
>>>>>>> - an exact match ("de_AT")
>>>>>>> - a default ("de")
>>>>>>> - any suitable language ("de_DE")
>>>>>> I think it does this but not doing "any suitable language".
>>>>>> ...
>>>>> Could you please try the attached patch and see whether this
>>>>> works for you?
>>>>>
>>>>> This should, e.g., select any "de*" locale in case there is no fully
>>>>> matching one.
>>>> Not yet tested, but code looks promising.
>>>>
>>>> Another way to get list of usable locales is this:
>>>> Checking the subdirs of /usr/lib/locale/
>>>> And then using all, that have associated mo file under vdr's LOCALEDIR.
>>>> Sadly I don't know if there is a better way than hardcoding that
>>>> directory.
>>>>
>>>> But "locale -a" command will give the same result - maybe analyzing its
>>>> code will help (or just calling this external command).
>>> Currently VDR has its own directory with all its supported locales.
>>> It can quickly collect all locales by going through the entries
>>> in that directory. I can even compile my VDR so that it searches
>>> for the locales in "./locale" inside the source directory.
>>>
>>> I like the simplicity of this, and wouldn't want to make it any
>>> more complex.
>>>
>> The directory /usr/lib/locale does NOT contain any translations, but rather a 
>> directory for every locale you can set via setlocale.
>> Its meant as a replacement of the setlocale loop.
> 
> I'm afraid I don't see what you mean.
> I know that the "locale" directory doesn't contain translations directly,
> but rather subdirectories. VDR gathers the names of these subdirectories
> and does a setlocale() for each of them. Then it tries to get the
> translation of "LanguageName$English" in order to build a list of all
> available languages. How else do you suggest that could be done?

I think he meant to traverse the system locales directory to gather the 
list of potentially valid locales that can be used to call setlocale().

The VDR locale directory names may or may not be valid locale names on 
the running system.

This is what AFAICS "locale -a" uses (glibc/locale/programs/locale.c). 
It also checks the existence of the locale identification file and 
parses locale aliases from locale.alias.
  
Matthias Schwarzott Aug. 18, 2007, 12:20 p.m. UTC | #6
On Saturday, 18 August 2007, Klaus Schmidinger wrote:
> On 08/18/07 13:10, Matthias Schwarzott wrote:
> >
> > The directory /usr/lib/locale does NOT contain any translations, but
> > rather a directory for every locale you can set via setlocale.
> > Its meant as a replacement of the setlocale loop.
>
> I'm afraid I don't see what you mean.
> I know that the "locale" directory doesn't contain translations directly,
> but rather subdirectories. VDR gathers the names of these subdirectories
> and does a setlocale() for each of them. Then it tries to get the
> translation of "LanguageName$English" in order to build a list of all
> available languages. How else do you suggest that could be done?
>

First, the general directory layout:
/usr/share/locale/*/LC_MESSAGES/*.mo contains translations

The opposite is /usr/lib/locale. This does NOT contain translations (and if you
insist on the difference: nowhere in its subdirs are there translations). There
are just descriptions of the available locales.

You now do this:
Loop over the subdirs of the vdr-private-locale directory and then check which
of these are actually available by calling setlocale.

This can also be done by checking whether there is a matching directory
under /usr/lib/locale/.

Regarding the English name of the associated language:
locale -a -v
will print not only the list of locales, but also a lot of detail info. This
info should also be available via some API, but searching did not turn up
any API to query it.

Some cut out example:
# locale -a -v
...
locale: de_DE.utf8      directory: /usr/lib/locale/de_DE.utf8
-------------------------------------------------------------------------------
    title | German locale for Germany
   source | Free Software Foundation, Inc.
  address | 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA
    email | bug-glibc-locales@gnu.org
 language | German
territory | Germany
 revision | 1.0
     date | 2000-06-24
  codeset | UTF-8

locale: en_GB           directory: /usr/lib/locale/en_GB
-------------------------------------------------------------------------------
    title | English locale for Britain
   source | RAP
  address | Sankt Jørgens Alle 8, DK-1615 København V, Danmark
  contact | Keld Simonsen
    email | bug-glibc-locales@gnu.org
 language | English
territory | Great Britain
 revision | 1.0
     date | 2000-06-28
  codeset | ISO-8859-1
...


Matthias
  
Anssi Hannula Aug. 18, 2007, 12:26 p.m. UTC | #7
Matthias Schwarzott wrote:
> On Samstag, 18. August 2007, Klaus Schmidinger wrote:
>> On 08/18/07 13:10, Matthias Schwarzott wrote:
>>> The directory /usr/lib/locale does NOT contain any translations, but
>>> rather a directory for every locale you can set via setlocale.
>>> Its meant as a replacement of the setlocale loop.
>> I'm afraid I don't see what you mean.
>> I know that the "locale" directory doesn't contain translations directly,
>> but rather subdirectories. VDR gathers the names of these subdirectories
>> and does a setlocale() for each of them. Then it tries to get the
>> translation of "LanguageName$English" in order to build a list of all
>> available languages. How else do you suggest that could be done?
>>
> 
> First: general directory layout:
> /usr/share/locale/*/LC_MESSAGES/*.mo contains translations
> 
> the oposite is /usr/lib/locale. This does NOT contain translations (and if you 
> insisit on the difference: nowhere in the subdirs are translations). There 
> are just descriptions of the available locales.

There is no /usr/lib/locale on my system. All the files are in 
/usr/share/locale.
  
Klaus Schmidinger Aug. 18, 2007, 12:29 p.m. UTC | #8
On 08/18/07 14:20, Matthias Schwarzott wrote:
> On Samstag, 18. August 2007, Klaus Schmidinger wrote:
>> On 08/18/07 13:10, Matthias Schwarzott wrote:
>>> The directory /usr/lib/locale does NOT contain any translations, but
>>> rather a directory for every locale you can set via setlocale.
>>> Its meant as a replacement of the setlocale loop.
>> I'm afraid I don't see what you mean.
>> I know that the "locale" directory doesn't contain translations directly,
>> but rather subdirectories. VDR gathers the names of these subdirectories
>> and does a setlocale() for each of them. Then it tries to get the
>> translation of "LanguageName$English" in order to build a list of all
>> available languages. How else do you suggest that could be done?
>>
> 
> First: general directory layout:
> /usr/share/locale/*/LC_MESSAGES/*.mo contains translations
> 
> the oposite is /usr/lib/locale. This does NOT contain translations (and if you 
> insisit on the difference: nowhere in the subdirs are translations). There 
> are just descriptions of the available locales.

Sorry, apparently we have a "lib" vs. "share" mixup here.
VDR searches in /usr/share/vdr/locale by default.
                     *****

> You now do this:
> Loop over the subdirs of vdr-private-locale directory and then check which of 
> these are actually available by doing setlocale.
> 
> Now this can also be done by a check if there is some matching directory 
> under /usr/lib/locale/.

But VDR needs to do the setlocale() call anyway, because it needs
to know the language names and the three-letter language code.
While the language name might even be derived otherwise, the
three-letter language code can surely only be derived from a VDR
*.mo file.

Klaus
  
Anssi Hannula Aug. 18, 2007, 12:36 p.m. UTC | #9
Matthias Schwarzott wrote:
> Regarding the english name of the associated language:
> locale -a -v
> will not only print the list of locales,  but also a lot of detail info. This 
> info should also be available via some API.
> But searching did not produce any API to query this.

nl_langinfo(_NL_IDENTIFICATION_LANGUAGE);
nl_langinfo(_NL_IDENTIFICATION_TERRITORY);

etc etc

But I do not see how these could be used.
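
For reference, a sketch of how such a query could look. It assumes glibc, where the _NL_IDENTIFICATION_* items are a GNU extension read from the locale's LC_IDENTIFICATION data. LocaleLanguageName is a hypothetical name, and on other C libraries the sketch simply returns an empty string.

```cpp
#include <langinfo.h>
#include <locale.h>
#include <string>

// Hypothetical helper: the language name recorded in the identification
// data of the given locale, or an empty string if it is not available.
std::string LocaleLanguageName(const char *locale)
{
  std::string result;
#ifdef __GLIBC__
  const char *current = setlocale(LC_ALL, NULL);
  std::string saved = current ? current : "C";
  if (setlocale(LC_ALL, locale)) {
     // GNU extension: one of the fields that "locale -a -v" prints.
     const char *language = nl_langinfo(_NL_IDENTIFICATION_LANGUAGE);
     if (language)
        result = language;
     }
  setlocale(LC_ALL, saved.c_str());
#else
  (void)locale; // no portable equivalent is known to this sketch
#endif
  return result;
}
```

This would only yield the English language name, not the three-letter language code, so it could not replace the lookup in VDR's own *.mo files.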

> Some cut out example:
> # locale -a -v
> ...
> locale: de_DE.utf8      directory: /usr/lib/locale/de_DE.utf8
> -------------------------------------------------------------------------------
>     title | German locale for Germany
>    source | Free Software Foundation, Inc.
>   address | 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA
>     email | bug-glibc-locales@gnu.org
>  language | German
> territory | Germany
>  revision | 1.0
>      date | 2000-06-24
>   codeset | UTF-8
> 
> locale: en_GB           directory: /usr/lib/locale/en_GB
> -------------------------------------------------------------------------------
>     title | English locale for Britain
>    source | RAP
>   address | Sankt Jørgens Alle 8, DK-1615 København V, Danmark
>   contact | Keld Simonsen
>     email | bug-glibc-locales@gnu.org
>  language | English
> territory | Great Britain
>  revision | 1.0
>      date | 2000-06-28
>   codeset | ISO-8859-1
> ...
> 
> 
> Matthias
>
  

Patch

--- i18n.c	2007/08/15 14:17:56	1.309
+++ i18n.c	2007/08/17 12:31:17
@@ -100,12 +100,15 @@ 
   cFileNameList Locales(I18nLocaleDir, true);
   if (Locales.Size() > 0) {
      dsyslog("found %d locales in %s", Locales.Size(), I18nLocaleDir);
+     int MatchFull = 0, MatchPartial = 0;
      char *OldLocale = strdup(setlocale(LC_MESSAGES, NULL));
      for (int i = 0; i < Locales.Size(); i++) {
          if (i < I18N_MAX_LANGUAGES - 1) {
             if (setlocale(LC_MESSAGES, Locales[i])) {
                if (strstr(OldLocale, Locales[i]) == OldLocale)
-                  CurrentLanguage = LanguageLocales.Size();
+                  MatchFull = LanguageLocales.Size();
+               else if (strncmp(OldLocale, Locales[i], 2) == 0)
+                  MatchPartial = LanguageLocales.Size();
                LanguageLocales.Append(strdup(Locales[i]));
                LanguageNames.Append(strdup(gettext(LanguageName)));
                const char *Code = gettext(LanguageCode);
@@ -121,7 +124,8 @@ 
          else
             esyslog("ERROR: too many locales - increase I18N_MAX_LANGUAGES!");
          }
-     setlocale(LC_MESSAGES, OldLocale);
+     CurrentLanguage = MatchFull ? MatchFull : MatchPartial;
+     setlocale(LC_MESSAGES, CurrentLanguage ? LanguageLocales[CurrentLanguage] : OldLocale);
      free(OldLocale);
      }
   // Prepare any known language codes for which there was no locale: