wrong characters in EPG (vdr-1.5.18)

Message ID 47E186DD.6030507@cadsoft.de
State New

Commit Message

Klaus Schmidinger March 19, 2008, 9:34 p.m. UTC
  On 03/19/08 11:11, Éric Laly wrote:
> Klaus Schmidinger wrote:
>> On 03/19/08 10:51, Éric Laly wrote:
>>> Hello all,
>>> I've rebuilt my vdr with two DVB-T cards.
>>> Until then it was running with a very old vdr (1.3) and DVB-S.
>>> My locale was set to fr_FR in order to have the correct charset, and
>>> everything was fine (vdr menu and EPG in French).
>>> With the new 1.5 series I've understood that vdr now supports Unicode
>>> (since 1.5.12?), so my locale is now set to fr_FR.UTF-8.
>>> The vdr menus are in French and the accented characters are correct (é,
>>> è, î, à ...), but in the EPG the accented characters are wrong on
>>> some channels.
>>> For example, right now the EPG is showing "l'odyssøe" instead of "l'odyssée"
>>> on ARTE but is showing "Bien-être" on direct8 (which are the correct
>>> characters).
>> Does the problem persist if you stop VDR, delete the epg.data file, and
>> restart it?
> I've just tried and unfortunately yes.

Please do this

and check which encodings are listed in EPG strings for ARTE and
direct8.

Have you set VDR_CHARSET_OVERRIDE?

Éric Laly March 20, 2008, 8:46 a.m. UTC | #1
Klaus Schmidinger wrote:


> Please do this
> --- libsi/si.c  2008/03/05 17:00:55     1.25
> +++ libsi/si.c  2008/03/19 21:30:47
> @@ -416,6 +416,10 @@
>      // FIXME Need to make this UTF-8 aware (different control codes).
>      // However, there's yet to be found a broadcaster that actually
>      // uses UTF-8 for the SI data... (kls 2007-06-10)
> +   if (size > 20) {
> +      to = stpcpy(to, cs);
> +      to = stpcpy(to, "@");
> +      }
>      for (int i = 0; i < len; i++) {
>         if (*from == 0)
>            break;
> and check which encodings are listed in EPG strings for ARTE and
> direct8.
I'm not at home now, but I've just tried over the network and got the results via
SVDRP.
See the attached files.

It seems that the EPG entries that are displayed correctly are in 8859-9 and the
others in ISO6937.

> Have you set VDR_CHARSET_OVERRIDE?
No.


215-X 2 01 fra 
215-X 1 01 fra 
215-E 26679 1206003300 3900 4F 8
215-T ISO-8859-9@US 15
215-S ISO-8859-9@Le classement des meilleurs titres américains du moment.
215-X 2 01 fra 
215-X 1 01 fra 
215-C T-8442-2-519 France 4
215-E 44829 1206001500 3300 4F 16
215-T ISO-8859-9@P.J.
215-S ISO-8859-9@Série policière française avec Bruno Wolkowitch, Lisa Martino, Charles Schneider. Saison 2. (4/6). "Carte bancaire". 
215-D ISO-8859-9@|Episode : 4 / 6|REDIFFUSION : le 20 Mars à 23:40|REDIFFUSION : le 22 Mars à 09:20|REDIFFUSION : le 24 Mars à 22:25|REDIFFUSION : le 25 Mars à 06:25|REDIFFUSION : le 27 Mars à 08:20
215-X 2 03 fra 
215-X 1 01 fra 
215-E 44830 1206004800 1500 4F 16
215-T ISO-8859-9@Cinq soeurs
215-S ISO-8859-9@Série sentimentale française avec Charlotte Becquin, Emmanuelle Boidron, Théa Boswell. Saison 1. (n°34). 
215-D ISO-8859-9@|Episode : 34 / 0|REDIFFUSION : le 21 Mars à 02:00
215-X 2 03 fra 
215-X 1 01 fra 
215 End of EPG data
Klaus Schmidinger March 20, 2008, 8:52 a.m. UTC | #2
On 03/20/08 09:46, Éric Laly wrote:
> Klaus Schmidinger wrote:
> ...
>> Please do this
>> --- libsi/si.c  2008/03/05 17:00:55     1.25
>> +++ libsi/si.c  2008/03/19 21:30:47
>> @@ -416,6 +416,10 @@
>>      // FIXME Need to make this UTF-8 aware (different control codes).
>>      // However, there's yet to be found a broadcaster that actually
>>      // uses UTF-8 for the SI data... (kls 2007-06-10)
>> +   if (size > 20) {
>> +      to = stpcpy(to, cs);
>> +      to = stpcpy(to, "@");
>> +      }
>>      for (int i = 0; i < len; i++) {
>>         if (*from == 0)
>>            break;
>> and check which encodings are listed in EPG strings for ARTE and
>> direct8.
> I'm not at home now, but I've just tried over the network and got the results via
> SVDRP.
> See the attached files.
> It seems that the EPG entries that are displayed correctly are in 8859-9 and the
> others in ISO6937.
>> Have you set VDR_CHARSET_OVERRIDE?
> No.

Please try setting VDR_CHARSET_OVERRIDE=ISO-8859-9 before starting
VDR. This should fix it.

Éric Laly March 20, 2008, 8:59 a.m. UTC | #3
Klaus Schmidinger wrote:
> On 03/20/08 09:46, Éric Laly wrote:
>> Klaus Schmidinger wrote:
>> ...
>>> Please do this
>>> --- libsi/si.c  2008/03/05 17:00:55     1.25
>>> +++ libsi/si.c  2008/03/19 21:30:47
>>> @@ -416,6 +416,10 @@
>>>      // FIXME Need to make this UTF-8 aware (different control codes).
>>>      // However, there's yet to be found a broadcaster that actually
>>>      // uses UTF-8 for the SI data... (kls 2007-06-10)
>>> +   if (size > 20) {
>>> +      to = stpcpy(to, cs);
>>> +      to = stpcpy(to, "@");
>>> +      }
>>>      for (int i = 0; i < len; i++) {
>>>         if (*from == 0)
>>>            break;
>>> and check which encodings are listed in EPG strings for ARTE and
>>> direct8.
>> I'm not at home now, but I've just tried over the network and got the results via
>> SVDRP.
>> See the attached files.
>> It seems that the EPG entries that are displayed correctly are in 8859-9 and the
>> others in ISO6937.
>>> Have you set VDR_CHARSET_OVERRIDE?
>> No.
> Please try setting VDR_CHARSET_OVERRIDE=ISO-8859-9 before starting
> VDR. This should fix it.

This is fixed !

Thank you.

Lucian Muresan March 26, 2008, 6:35 a.m. UTC | #4
Éric Laly wrote:
> Klaus Schmidinger wrote:
>> On 03/20/08 09:46, Éric Laly wrote:
>>> Klaus Schmidinger wrote:
>>> ...
>>>> Please do this
>>>> --- libsi/si.c  2008/03/05 17:00:55     1.25
>>>> +++ libsi/si.c  2008/03/19 21:30:47
>>>> @@ -416,6 +416,10 @@
>>>>      // FIXME Need to make this UTF-8 aware (different control codes).
>>>>      // However, there's yet to be found a broadcaster that actually
>>>>      // uses UTF-8 for the SI data... (kls 2007-06-10)
>>>> +   if (size > 20) {
>>>> +      to = stpcpy(to, cs);
>>>> +      to = stpcpy(to, "@");
>>>> +      }
>>>>      for (int i = 0; i < len; i++) {
>>>>         if (*from == 0)
>>>>            break;
>>>> and check which encodings are listed in EPG strings for ARTE and
>>>> direct8.
>>> I'm not at home now, but I've just tried over the network and got the results via
>>> SVDRP.
>>> See the attached files.
>>> It seems that the EPG entries that are displayed correctly are in 8859-9 and the
>>> others in ISO6937.
>>>> Have you set VDR_CHARSET_OVERRIDE?
>>> No.
>> Please try setting VDR_CHARSET_OVERRIDE=ISO-8859-9 before starting
>> VDR. This should fix it.
> This is fixed !
> Thank you.

Looks like this is set globally, for all of the epg data, right? What 
about mixed charsets from different providers (I know for sure there 
are, and there are also the "external" data sources like tvmovie2vdr and 
the like fetching some xmltv listings and injecting the data via SVDRP)?

Klaus Schmidinger March 26, 2008, 8:51 a.m. UTC | #5
On 03/26/08 07:35, Lucian Muresan wrote:
> Éric Laly wrote:
>> Klaus Schmidinger wrote:
>>> On 03/20/08 09:46, Éric Laly wrote:
>>>> Klaus Schmidinger wrote:
>>>> ...
>>>> It seems that the EPG entries that are displayed correctly are in 8859-9 and the
>>>> others in ISO6937.
>>>>> Have you set VDR_CHARSET_OVERRIDE?
>>>> No.
>>> Please try setting VDR_CHARSET_OVERRIDE=ISO-8859-9 before starting
>>> VDR. This should fix it.
>> This is fixed !
>> Thank you.
> Looks like this is set globally, for all of the epg data, right? What 
> about mixed charsets from different providers (I know for sure there 
> are, and there are also the "external" data sources like tvmovie2vdr and 
> the like fetching some xmltv listings and injecting the data via SVDRP)?

The DVB standard provides for a way to mark text strings, so that
applications can correctly determine the actual encoding. The
VDR_CHARSET_OVERRIDE is just a workaround in case your "main"
provider fails to correctly encode their strings.
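
As an illustration of that marking scheme, here is a minimal sketch in C of how the leading byte of an SI text field could be mapped to a charset name. This is not VDR's libsi code: the function names are hypothetical, and the selector values follow Annex A of ETSI EN 300 468 as best recalled here, so they should be checked against the specification before being relied on. Selectors other than the ISO-8859 family and UTF-8 are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not VDR's libsi code): returns the charset announced
 * by the marker at the start of a DVB SI text field, or NULL if the field is
 * unmarked (first byte >= 0x20) or uses a marker this sketch does not handle. */
const char *dvb_marked_charset(const unsigned char *s, size_t len)
{
    /* Single selector bytes 0x01..0x0B; 0x08 is reserved. */
    static const char *const one_byte[0x0C] = {
        NULL,          "ISO-8859-5",  "ISO-8859-6",  "ISO-8859-7",
        "ISO-8859-8",  "ISO-8859-9",  "ISO-8859-10", "ISO-8859-11",
        NULL,          "ISO-8859-13", "ISO-8859-14", "ISO-8859-15",
    };
    /* 0x10 0x00 0xNN selects ISO-8859-NN; NN = 0x0C is reserved. */
    static const char *const three_byte[0x10] = {
        NULL,          "ISO-8859-1",  "ISO-8859-2",  "ISO-8859-3",
        "ISO-8859-4",  "ISO-8859-5",  "ISO-8859-6",  "ISO-8859-7",
        "ISO-8859-8",  "ISO-8859-9",  "ISO-8859-10", "ISO-8859-11",
        NULL,          "ISO-8859-13", "ISO-8859-14", "ISO-8859-15",
    };

    assert(s != NULL || len == 0);
    if (len == 0 || s[0] >= 0x20)
        return NULL;                 /* no marker: the default table applies */
    if (s[0] < 0x0C)
        return one_byte[s[0]];
    if (s[0] == 0x10 && len >= 3 && s[1] == 0x00 && s[2] < 0x10)
        return three_byte[s[2]];
    if (s[0] == 0x15)
        return "UTF-8";
    return NULL;                     /* other selectors are not covered here */
}

/* Convenience wrapper: nonzero if the field is explicitly marked as `name`. */
int dvb_is_marked_as(const unsigned char *s, size_t len, const char *name)
{
    const char *cs = dvb_marked_charset(s, len);
    return cs != NULL && strcmp(cs, name) == 0;
}
```

A string for which this returns NULL is the case the thread is about: the receiver has to assume some default for it.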

External data sources simply need to provide the strings in the
encoding used on your local system (presumably UTF-8).

Füley István March 26, 2008, 9:42 a.m. UTC | #6
> External data sources simply need to provide the strings in the
> encoding used on your local system (presumably UTF-8).
> Klaus

This is what I did in my xmltv grab process:

iconv --silent --from-code=ISO-8859-2 --to-code=UTF-8 \
  --output=/opt/tigervdr/xmltv/hu-utf.xml /opt/tigervdr/xmltv/all.xml

And this gives vdr the EPG data in the correct encoding.
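
For a tool that injects EPG data itself rather than going through a file, the same ISO-8859-2 to UTF-8 conversion can be done in-process. The following is a minimal sketch using the POSIX iconv(3) API; `to_utf8` is a hypothetical helper name, and the sketch assumes a stateless single-byte source encoding and a NUL-terminated input string.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: converts the NUL-terminated string `in`, encoded in
 * charset `from`, to UTF-8 in out[0..outsize-1] (always NUL-terminated on
 * success). Returns 0 on success, -1 on any conversion error, including an
 * unknown charset name or an output buffer that is too small. */
int to_utf8(const char *from, const char *in, char *out, size_t outsize)
{
    assert(from != NULL && in != NULL && out != NULL && outsize > 0);

    iconv_t cd = iconv_open("UTF-8", from);
    if (cd == (iconv_t)-1)
        return -1;

    char *src = (char *)in;           /* iconv() takes char **, input is only read */
    size_t srcleft = strlen(in);
    char *dst = out;
    size_t dstleft = outsize - 1;     /* keep one byte for the terminator */

    size_t rc = iconv(cd, &src, &srcleft, &dst, &dstleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;

    *dst = '\0';
    return 0;
}
```

With glibc no extra library is needed; on some other systems the program may have to be linked against a separate libiconv.
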
Lucian Muresan March 26, 2008, 12:39 p.m. UTC | #7
Klaus Schmidinger wrote:
>>>> Please try setting VDR_CHARSET_OVERRIDE=ISO-8859-9 before starting
>>>> VDR. This should fix it.
>>> This is fixed !
>>> Thank you.
>> Looks like this is set globally, for all of the epg data, right? What 
>> about mixed charsets from different providers (I know for sure there 
>> are, and there are also the "external" data sources like tvmovie2vdr and 
>> the like fetching some xmltv listings and injecting the data via SVDRP)?
> The DVB standard provides for a way to mark text strings, so that
> applications can correctly determine the actual encoding. The
> VDR_CHARSET_OVERRIDE is just a workaround in case your "main"
> provider fails to correctly encode their strings.

Am I missing something, is there a way to mark a provider as being my
"main" one? Or does the workaround rather replace the character set for
all incorrectly recognized ones (assuming that the application can
determine that it is incorrect)? If the latter is the case,
what if there are several providers not marking the encoding right, but
their EPG content actually needs different encodings? Will they all use
the same encoding specified in VDR_CHARSET_OVERRIDE? (This reminds me of
the early UTF-8 patch which required setting the encoding for every
channel in channels.conf, which of course is ugly, but could handle
different EPG encoding needs in case of multiple providers failing to
mark this correctly).

> External data sources simply need to provide the strings in the
> encoding used on your local system (presumably UTF-8).

So it should work as long as the external data is handled correctly, thanks.

BTW, OSD then stays unaffected by VDR_CHARSET_OVERRIDE? It might be 
worth renaming this to something more clearly specifying that it only 
affects EPG.

Lucian Muresan March 26, 2008, 12:43 p.m. UTC | #8
Füley István wrote:
>> External data sources simply need to provide the strings in the
>> encoding used on your local system (presumably UTF-8).
>> Klaus
> This is what I did in my xmltv grab process:
> iconv --silent --from-code=ISO-8859-2 --to-code=UTF-8 
> --output=/opt/tigervdr/xmltv/hu-utf.xml /opt/tigervdr/xmltv/all.xml
> And this provides vdr the correct encoding for epg.

Looks like you might be using www.port.hu / www.port.ro as your data
source. I used to use the Romanian version some time ago, and now I would
like to set the whole thing up again. If you're really using that, could
you please provide some relevant config snippets, scripts and requirements?

Klaus Schmidinger March 26, 2008, 12:52 p.m. UTC | #9
On 03/26/08 13:39, Lucian Muresan wrote:
> Klaus Schmidinger wrote:
> [..]
>>>>> Please try setting VDR_CHARSET_OVERRIDE=ISO-8859-9 before starting
>>>>> VDR. This should fix it.
>>>> This is fixed !
>>>> Thank you.
>>> Looks like this is set globally, for all of the epg data, right? What 
>>> about mixed charsets from different providers (I know for sure there 
>>> are, and there are also the "external" data sources like tvmovie2vdr and 
>>> the like fetching some xmltv listings and injecting the data via SVDRP)?
>> The DVB standard provides for a way to mark text strings, so that
>> applications can correctly determine the actual encoding. The
>> VDR_CHARSET_OVERRIDE is just a workaround in case your "main"
>> provider fails to correctly encode their strings.
> Am I missing something, is there a way to mark a provider as being my
> "main" one? Or does the workaround rather replace the character set for
> all incorrectly recognized ones (assuming that the application can
> determine that it is incorrect)? If the latter is the case,
> what if there are several providers not marking the encoding right, but
> their EPG content actually needs different encodings? Will they all use
> the same encoding specified in VDR_CHARSET_OVERRIDE? (This reminds me of
> the early UTF-8 patch which required setting the encoding for every
> channel in channels.conf, which of course is ugly, but could handle
> different EPG encoding needs in case of multiple providers failing to
> mark this correctly).

Well, first and foremost providers should actually do their homework
and encode their stuff according to the standard.

The problem is with providers who don't add a codeset marker to their
strings. This is ok as long as they actually encode in ISO6937.
Unfortunately some providers use ISO-8859-9 instead (or maybe even
others). With VDR_CHARSET_OVERRIDE set, all strings that are not
explicitly marked as using a specific codeset are assumed to be
encoded in the way given by VDR_CHARSET_OVERRIDE.
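
The rule just described can be sketched in a few lines of C. This is a hypothetical illustration, not VDR's actual implementation: the function names are made up, the marker is assumed to have been decoded already into a charset name (or NULL for an unmarked string), and only the environment variable name VDR_CHARSET_OVERRIDE and the ISO6937 default come from this thread.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of the rule described above, not VDR's actual code.
 * marked_cs: charset named by the string's own marker, or NULL if unmarked.
 * override:  value of VDR_CHARSET_OVERRIDE, or NULL if it is not set. */
const char *pick_charset(const char *marked_cs, const char *override)
{
    if (marked_cs != NULL && marked_cs[0] != '\0')
        return marked_cs;    /* an explicit marker always wins */
    if (override != NULL && override[0] != '\0')
        return override;     /* unmarked: assume the user's override */
    return "ISO6937";        /* unmarked, no override: standard default */
}

/* Same rule, reading the override from the environment. */
const char *effective_charset(const char *marked_cs)
{
    return pick_charset(marked_cs, getenv("VDR_CHARSET_OVERRIDE"));
}

/* Usage example covering the three cases. */
void pick_charset_examples(void)
{
    assert(strcmp(pick_charset("ISO-8859-9", "ISO-8859-2"), "ISO-8859-9") == 0);
    assert(strcmp(pick_charset(NULL, "ISO-8859-9"), "ISO-8859-9") == 0);
    assert(strcmp(pick_charset(NULL, NULL), "ISO6937") == 0);
}
```

Because the override is a single process-wide value, every unmarked string gets the same assumed encoding, which is exactly the limitation raised in the question about multiple providers.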

>> External data sources simply need to provide the strings in the
>> encoding used on your local system (presumably UTF-8).
> So it should work as long as the external data is handled correctly, thanks.
> BTW, OSD then stays unaffected by VDR_CHARSET_OVERRIDE? It might be 
> worth renaming this to something more clearly specifying that it only 
> affects EPG.

This was just a last minute quick workaround (initially this was hardcoded),
since some (esp. Czech) providers actually do encode their strings in
ISO6937, and I didn't want to cause problems with those who do adhere to
the standard.

A more elaborate workaround would probably require a separate file
in which transponders can be marked as using a specific default
codeset (and then VDR_CHARSET_OVERRIDE would vanish again).

But it would be so much better if these providers would just follow
the standard! Yes, I know, these are multi-million-dollar enterprises,
so they can't be bothered with "standards" - oh well...

Lucian Muresan March 26, 2008, 2:02 p.m. UTC | #10
Klaus Schmidinger wrote:
> On 03/26/08 13:39, Lucian Muresan wrote:
>> Klaus Schmidinger wrote:
>> [..]
>>>>>> Please try setting VDR_CHARSET_OVERRIDE=ISO-8859-9 before starting
>>>>>> VDR. This should fix it.
>>>>> This is fixed !
>>>>> Thank you.
>>>> Looks like this is set globally, for all of the epg data, right? What 
>>>> about mixed charsets from different providers (I know for sure there 
>>>> are, and there are also the "external" data sources like tvmovie2vdr and 
>>>> the like fetching some xmltv listings and injecting the data via SVDRP)?
>>> The DVB standard provides for a way to mark text strings, so that
>>> applications can correctly determine the actual encoding. The
>>> VDR_CHARSET_OVERRIDE is just a workaround in case your "main"
>>> provider fails to correctly encode their strings.
>> Am I missing something, is there a way to mark a provider as being my
>> "main" one? Or does the workaround rather replace the character set for
>> all incorrectly recognized ones (assuming that the application can
>> determine that it is incorrect)? If the latter is the case,
>> what if there are several providers not marking the encoding right, but
>> their EPG content actually needs different encodings? Will they all use
>> the same encoding specified in VDR_CHARSET_OVERRIDE? (This reminds me of
>> the early UTF-8 patch which required setting the encoding for every
>> channel in channels.conf, which of course is ugly, but could handle
>> different EPG encoding needs in case of multiple providers failing to
>> mark this correctly).
> Well, first and foremost providers should actually do their homework
> and encode their stuff according to the standard.
> The problem is with providers who don't add a codeset marker to their
> strings. This is ok as long as they actually encode in ISO6937.
> Unfortunately some providers use ISO-8859-9 instead (or maybe even
> others). With VDR_CHARSET_OVERRIDE set, all strings that are not
> explicitly marked as using a specific codeset are assumed to be
> encoded in the way given by VDR_CHARSET_OVERRIDE.
>>> External data sources simply need to provide the strings in the
>>> encoding used on your local system (presumably UTF-8).
>> So it should work as long as the external data is handled correctly, thanks.
>> BTW, OSD then stays unaffected by VDR_CHARSET_OVERRIDE? It might be 
>> worth renaming this to something more clearly specifying that it only 
>> affects EPG.
> This was just a last minute quick workaround (initially this was hardcoded),
> since some (esp. Czech) providers actually do encode their strings in
> ISO6937, and I didn't want to cause problems with those who do adhere to
> the standard.
> A more elaborate workaround would probably require a separate file
> in which transponders can be marked as using a specific default
> codeset (and then VDR_CHARSET_OVERRIDE would vanish again).
> But it would be so much better if these providers would just follow
> the standard! Yes, I know, these are multi-million-dollar enterprises,
> so they can't be bothered with "standards" - oh well...

You're so right about these providers :-). Thanks for enlightening me on
the current state of this workaround. Maybe, if proven necessary, the
extra file concept will not be too difficult or "unclean" to implement
(possibly as a patch by someone else, myself not excluded).



--- libsi/si.c  2008/03/05 17:00:55     1.25
+++ libsi/si.c  2008/03/19 21:30:47
@@ -416,6 +416,10 @@ 
     // FIXME Need to make this UTF-8 aware (different control codes).
     // However, there's yet to be found a broadcaster that actually
     // uses UTF-8 for the SI data... (kls 2007-06-10)
+   if (size > 20) {
+      to = stpcpy(to, cs);
+      to = stpcpy(to, "@");
+      }
     for (int i = 0; i < len; i++) {
        if (*from == 0)
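
To see what this debugging hunk does to a string, here is a minimal standalone sketch of the same prefixing step. The names `cs` and `size` are taken from the hunk; that `cs` holds the name of the charset chosen for the string and that `size` is the size of the output buffer are assumptions, as is the helper name `tag_with_charset`. The `size > 20` condition is copied as-is; the thread does not state its rationale.

```c
#define _POSIX_C_SOURCE 200809L   /* for stpcpy() */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical standalone version of the debug hunk: for output buffers
 * larger than 20 bytes, "<cs>@" is written in front of the text.
 * Unlike the hunk, this sketch checks that everything fits and returns
 * -1 if it does not; it returns 0 on success. */
int tag_with_charset(char *to, size_t size, const char *cs, const char *text)
{
    assert(to != NULL && cs != NULL && text != NULL);

    int tagged = size > 20;                     /* condition from the hunk */
    size_t need = strlen(text) + 1;             /* text plus terminator */
    if (tagged)
        need += strlen(cs) + 1;                 /* charset name plus '@' */
    if (need > size)
        return -1;

    if (tagged) {
        to = stpcpy(to, cs);                    /* returns pointer to the NUL */
        to = stpcpy(to, "@");
    }
    strcpy(to, text);
    return 0;
}
```

With a 64-byte buffer, `tag_with_charset(buf, sizeof buf, "ISO-8859-9", "P.J.")` leaves `ISO-8859-9@P.J.` in `buf`, which matches the shape of the `T` and `S` lines in the SVDRP listing earlier in the thread.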