wrong characters in EPG (vdr-1.5.18)

Message ID 47E186DD.6030507@cadsoft.de
State New

Commit Message

Klaus Schmidinger March 19, 2008, 9:34 p.m. UTC
  On 03/19/08 11:11, Éric Laly wrote:
> Klaus Schmidinger wrote:
>> On 03/19/08 10:51, Éric Laly wrote:
>>> Hello all,
>>>
>>> I've rebuilt my vdr with two DVB-T cards.
>>> Until then it was running with a very old vdr (1.3) and DVB-S.
>>> My locale was set to fr_FR in order to have the right charset and
>>> everything was fine (vdr menu and EPG in French).
>>>
>>> With the new 1.5 series I've understood that vdr now supports Unicode
>>> (since 1.5.12?) so my locale is now set to fr_FR.UTF-8.
>>> The vdr menus are in French and the accented characters are correct (é,
>>> è, î, à ...) but in the EPG the accented characters are wrong on
>>> some channels.
>>> For example, right now the EPG is showing "l'odyssøe" instead of "l'odyssée"
>>> on ARTE but is showing "Bien-être" on direct8 (which are the correct
>>> characters).
>> Does the problem persist if you stop VDR, delete the epg.data file, and
>> restart it?
> I've just tried and unfortunately yes.

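Please do this

--- libsi/si.c  2008/03/05 17:00:55     1.25
+++ libsi/si.c  2008/03/19 21:30:47
@@ -416,6 +416,10 @@
     // FIXME Need to make this UTF-8 aware (different control codes).
     // However, there's yet to be found a broadcaster that actually
     // uses UTF-8 for the SI data... (kls 2007-06-10)
+   if (size > 20) {
+      to = stpcpy(to, cs);
+      to = stpcpy(to, "@");
+      }
     for (int i = 0; i < len; i++) {
        if (*from == 0)
           break;

and check which encodings are listed in EPG strings for ARTE and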
direct8.

Have you set VDR_CHARSET_OVERRIDE?

Klaus
  

Comments

Éric Laly March 20, 2008, 8:46 a.m. UTC | #1
Klaus Schmidinger wrote:

...

> Please do this
> 
> --- libsi/si.c  2008/03/05 17:00:55     1.25
> +++ libsi/si.c  2008/03/19 21:30:47
> @@ -416,6 +416,10 @@
>      // FIXME Need to make this UTF-8 aware (different control codes).
>      // However, there's yet to be found a broadcaster that actually
>      // uses UTF-8 for the SI data... (kls 2007-06-10)
> +   if (size > 20) {
> +      to = stpcpy(to, cs);
> +      to = stpcpy(to, "@");
> +      }
>      for (int i = 0; i < len; i++) {
>         if (*from == 0)
>            break;
> 
> and check which encodings are listed in EPG strings for ARTE and
> direct8.
I'm not at home right now, but I've just tried over the network and got the results via
SVDRP.
See the attached files.

It seems that the EPG entries that are displayed correctly are in ISO-8859-9 and the
others in ISO6937.
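
To repeat this check on a longer listing, the prefixes can be counted per channel. The following is only a minimal C++ sketch: it assumes the debug patch quoted above is applied, looks only at the T and S lines, and treats everything before the first "@" on such a line as the charset name. TallyCharsets is a made-up helper for illustration, not part of VDR.

```cpp
#include <map>
#include <sstream>
#include <string>

// Counts, per channel, how often each "CHARSET@" prefix appears in the
// T and S lines of an SVDRP LSTE listing (debug patch applied).
// Returns: channel name -> (charset name -> number of strings).
std::map<std::string, std::map<std::string, int> >
TallyCharsets(const std::string &lste)
{
  std::map<std::string, std::map<std::string, int> > tally;
  std::istringstream in(lste);
  std::string line, channel;
  while (std::getline(in, line)) {
    if (!line.empty() && line[line.size() - 1] == '\r')
      line.erase(line.size() - 1);          // SVDRP lines may end in CR LF
    if (line.size() < 6 || line.compare(0, 4, "215-") != 0)
      continue;                             // not an EPG data line
    char tag = line[4];
    std::string rest = line.substr(6);
    if (tag == 'C') {
      // "C <channel id> <channel name>": keep the name after the first blank
      std::string::size_type sp = rest.find(' ');
      channel = sp == std::string::npos ? rest : rest.substr(sp + 1);
    }
    else if (tag == 'T' || tag == 'S') {
      // Assumption: text before the first '@' is the charset the patch printed
      std::string::size_type at = rest.find('@');
      if (at != std::string::npos)
        tally[channel][rest.substr(0, at)]++;
    }
  }
  return tally;
}
```

Feeding it the listing below should show which channels only ever carry the ISO6937 prefix, which are the ones worth checking for wrong accents.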

> 
> Have you set VDR_CHARSET_OVERRIDE?
No.

Éric.
LSTE
215-C T-8442-1-275 France 3
215-E 16575 1206000600 1500 4E C
215-T ISO6937@C'est arrivØ pr?s de chez vous
215-S ISO6937@Magazine d'information.
215-X 2 03 fra ISO6937@Stereo
215-X 1 01 fra 
215-e
215-E 16576 1206002100 3000 4E C
215-T ISO6937@La famille Serrano
215-S ISO6937@«Prix Nobel de littØrature». SØrie humoristique. 2003.
215-X 2 03 fra ISO6937@Stereo
215-X 1 01 fra 
215-e
215-c
215-C T-8442-4-1025 M6
215-E 53 1206000543 3069 4F 9
215-T ISO-8859-9@M6 BOUTIQUE
215-X 1 01 fra ISO-8859-9@4:3
215-X 2 03 fra ISO-8859-9@stereo
215-e
215-E 54 1206003968 2916 4F 9
215-T ISO-8859-9@STAR6 MUSIC
215-X 1 01 fra ISO-8859-9@4:3
215-X 2 03 fra ISO-8859-9@stereo
215-e
215-c
215-C T-8442-1-257 France 2
215-E 16403 1206001200 1500 4E 12
215-T ISO6937@Amour, gloire et beautØ
215-S ISO6937@Feuilleton sentimental. Episode 4739.
215-X 2 03 fra ISO6937@Stereo
215-X 1 01 fra 
215-e
215-E 16404 1206002700 3300 4E 12
215-T ISO6937@C'est au programme
215-S ISO6937@Magazine de sociØtØ.
215-X 2 02 fra ISO6937@Mono
215-X 1 01 fra 
215-e
215-c
215-C T-8442-1-260 France 5
215-E 16744 1205999700 4800 4E 1F
215-T ISO6937@Les maternelles
215-S ISO6937@Magazine de sociØtØ.
215-X 2 03 fra ISO6937@Stereo
215-X 1 01 fra 
215-e
215-E 16745 1206004500 3300 4E 1F
215-T ISO6937@On n'est pas que des parents
215-S ISO6937@Magazine de sociØtØ.
215-X 2 03 fra ISO6937@Stereo
215-X 1 01 fra 
215-e
215-c
215-C T-8442-1-261 ARTE
215-E 16872 1206001500 1740 4E 1
215-T ISO6937@Tous les habits du monde
215-S ISO6937@«Le SØnØgal». Magazine de sociØtØ.
215-X 2 03 fra ISO6937@Stereo
215-X 1 01 fra 
215-e
215-E 16873 1206003240 60 4E 1
215-T ISO6937@Ouverture
215-S ISO6937@«Le Congo, infiniment riche, dØmocratiquement pauvre». Magazine de gØopolitique.
215-X 2 02 fra ISO6937@Mono
215-X 1 01 fra 
215-e
215-c
215-C T-8442-1-262 LCP
215-E 17010 1206001800 3600 4E 10
215-T ISO6937@Parlez-moi d'ailleurs
215-S ISO6937@«Oø va la Russie ?». Magazine de gØopolitique.
215-X 2 02 fra ISO6937@Mono
215-X 1 01 fra 
215-e
215-E 17011 1206005400 6300 4E 10
215-T ISO6937@En direct de l'hØmicycle
215-S ISO6937@«Projet de loi «OGM»». DØbat parlementaire.
215-X 2 02 fra ISO6937@Mono
215-X 1 01 fra 
215-e
215-c
215-C T-8442-2-515 BFM TV
215-E 10277 1206000000 10800 4F 5
215-T ISO-8859-9@Non Stop
215-S ISO-8859-9@Présenté par Jean-Alexandre Baril. 
215-X 2 01 fra 
215-X 1 01 fra 
215-e
215-E 10278 1206010800 7200 4F 5
215-T ISO-8859-9@Aujourd'hui le monde
215-S ISO-8859-9@Présenté par Florence Duprat, Thomas Misrachi. 
215-X 2 01 fra 
215-X 1 01 fra 
215-e
215-c
215-C T-8442-2-513 Direct 8
215-E 32793 1206001800 1800 4F 13
215-T ISO-8859-9@Culture VIP
215-S ISO-8859-9@Jeu. Présenté par Valérie Benaïm. 
215-X 2 03 fra 
215-X 1 01 fra 
215-e
215-E 33298 1206003600 3600 4F 13
215-T ISO-8859-9@Bien-être
215-S ISO-8859-9@"Coachez votre intuition... pour provoquer le hasard !" Invités : Vanessa Mielczareck, Catherine Balance, Yonelle Delle. 
215-X 2 03 fra 
215-X 1 01 fra 
215-e
215-c
215-C T-8442-2-516 i>TELE
215-E 25165 1206001800 540 4F 0
215-T ISO-8859-9@Journal
215-X 2 01 fra 
215-X 1 01 fra 
215-e
215-E 25166 1206002340 360 4F 0
215-T ISO-8859-9@Chronique politique
215-S ISO-8859-9@Présenté par Nicolas Domenach. 
215-X 2 01 fra 
215-X 1 01 fra 
215-e
215-c
215-C T-8442-2-517 Virgin 17
215-E 26678 1205998800 4500 4F 8
215-T ISO-8859-9@Puissance tubes
215-S ISO-8859-9@Tous les tubes du moment.
215-X 2 01 fra 
215-X 1 01 fra 
215-e
215-E 26679 1206003300 3900 4F 8
215-T ISO-8859-9@US 15
215-S ISO-8859-9@Le classement des meilleurs titres américains du moment.
215-X 2 01 fra 
215-X 1 01 fra 
215-e
215-c
215-C T-8442-2-518 Gulli
215-E 44678 1206000600 1500 4F 12
215-T ISO-8859-9@Inspecteur Gadget
215-S ISO-8859-9@"La malédiction de Toutankharton". 
215-D ISO-8859-9@|Episode : 18 / 86
215-X 2 01 fra 
215-X 1 01 fra 
215-e
215-E 44679 1206002100 1500 4F 12
215-T ISO-8859-9@Famille Pirate
215-S ISO-8859-9@"Vacances pirates". 
215-X 2 01 fra 
215-X 1 01 fra 
215-e
215-c
215-C T-8442-3-771 CANAL+ SPORT
215-E 19 1205998260 5880 4F 5
215-T ISO-8859-9@Au nom de la liberté
215-S ISO-8859-9@Réalisé par Phillip Noyce en 2006. Avec Tim Robbins, Derek Luke, Bonnie Henna. Drame britannico-franco-américain. 
215-X 2 03 fra 
215-X 2 03 eng 
215-X 1 03 fra 
215-X 3 01 fra 
215-e
215-E 20 1206004140 6960 4F 5
215-T ISO-8859-9@The Host
215-S ISO-8859-9@Réalisé par Bong Joon-ho en 2006. Avec Song Kang-ho, Byeon Hie-bong, Park Hae-il. Film d'horreur sud-coréen. 
215-X 2 03 fra 
215-X 2 03 kor 
215-X 1 03 fra 
215-X 3 01 fra 
215-e
215-c
215-C T-8442-4-1029 TF6
215-E 6 1205999963 5545 4F B
215-T ISO6937@MEURTRE SUR ECOUTE
215-D ISO6937@Une femme enquŒte sur la disparition de son mari et dØcouvre un enregistrement de l'une de ses conversations tØlØphoniques avec une call-girl.ISO6937@Apr?s quelques annØes de mariage, l'harmonie entre Adrienne et Richard Welles est quelque peu gÐtØe par les difficultØs du quotidien et usØe par l'habitude. Plut?t que de tenter de mettre les probl?mes ? plat pour sauver leur couple, Richard essaieISO6937@d'oublier ses prØoccupations en ayant recours aux services tØlØphoniques d'une call-girl de luxe. MalgrØ des probl?mes d'argent importants, il entame une relation assez suivie avec cette femme, qui se fait appeler Laura.
215-X 1 01 fra ISO6937@VIDEO 4/3  
215-X 2 03 fra ISO6937@AUDIO STEREO fre
215-e
215-E 7 1206005935 3313 4F B
215-T ISO6937@LA VIE DEVANT NOUS
215-D ISO6937@«Une semaine mouvementØe».|Bertrand comprend que son p?re est au ch?mage depuis plusieurs mois. De leur c?tØ, Barthe et AlizØ se racontent leurs dØboires sentimentaux
215-e
215-c
215-C T-8442-3-770 CANAL+ CINEMA
215-E 21 1205997300 5340 4F 13
215-T ISO-8859-9@Nue propriété
215-S ISO-8859-9@Réalisé par Joachim Lafosse en 2006. Avec Isabelle Huppert, Jérémie Renier, Yannick Renier. Drame franco-belgo-luxembourgeois. 
215-X 2 03 fra 
215-X 1 03 fra 
215-e
215-E 22 1206002640 900 4F 13
215-T ISO-8859-9@A suivre : Infernal A...
215-S ISO-8859-9@A suivre dans quelques minutes : Infernal Affairs 3
215-X 2 01 fra 
215-X 1 01 fra 
215-X 3 01 fra 
215-e
215-c
215-C T-8442-6-1538 NRJ12
215-E 100 1205999420 3187 4F 15
215-T ISO-8859-9@SHERIF FAIS-MOI PEUR
215-e
215-E 101 1206002606 3020 4F 15
215-T ISO-8859-9@FLIPPER
215-e
215-c
215-C T-8442-6-1537 TF1
215-E 39 1206001200 3000 4F 13
215-T ISO-8859-9@Melrose Place. "Le piège"
215-S ISO-8859-9@ Série (USA). ST. Avec J Bisset, T Calabro. Michael loue les services d'une prostituée pour piéger Robert, l'ami de Jane, pendant son voyage à San Diego. Amanda décide de partir en vacances à Hawaï avec Jake...
215-e
215-E 40 1206004200 2700 4F 13
215-T ISO-8859-9@Melrose Place. "Révélation"
215-S ISO-8859-9@ Série (USA). ST. Avec J Bisset, T Calabro. Sydney raconte à Jane le piège monté par Michael avec une prostituée pour compromettre Robert. Furieuse, elle le chasse de chez elle. Billy bénéficie d'une promotion...
215-e
215-c
215-C T-8442-6-1540 Eurosport France
215-E 5 1206000000 3600 4F 2
215-T ISO6937@NATATION : Championnat d\'Europe ? Eindhoven, Pays-Bas
215-S ISO6937@Finales
215-e
215-E 6 1206003600 2700 4F 2
215-T ISO6937@NATATION : Championnat d\'Europe ? Eindhoven, Pays-Bas
215-S ISO6937@Plongeon 1m messieurs
215-e
215-c
215-C T-8442-6-1542 TMC
215-E 94 1206001677 1780 4F 1D
215-T ISO-8859-9@TELE ACHAT
215-e
215-E 95 1206003895 438 4F 1D
215-T ISO-8859-9@C'EST POURTANT VRAI
215-e
215-c
215-C T-8442-3-774 TPS STAR
215-E 32335 1205999160 5580 4F 15
215-T ISO-8859-9@Vu à la TV
215-S ISO-8859-9@Réalisé par Daniel Losset en 2002. Avec Jean-Michel Noirey, Pascale Arbillot, Jackie Berroyer. Téléfilm sentimental français. 
215-X 2 03 fra 
215-X 1 01 fra 
215-e
215-E 32336 1206004740 6600 4F 15
215-T ISO-8859-9@Wonder Boys
215-S ISO-8859-9@Réalisé par C Hanson en 2000. Avec Michael Douglas, Tobey Maguire, Frances McDormand. Comédie dramatique britannico-germano-américaine. 
215-X 2 03 fra 
215-X 2 03 eng 
215-X 1 01 fra 
215-e
215-c
215-C T-8442-3-772 PLANETE
215-E 5682 1205999100 3000 4F 14
215-T ISO-8859-9@Les ailes de la guerre
215-S ISO-8859-9@Documentaire américain réalisé en 2007. "Bombardiers contre chasseurs". 
215-X 2 01 fra 
215-X 1 01 fra 
215-e
215-E 5683 1206002100 1800 4F 14
215-T ISO-8859-9@Dingos, les hors-la-loi du bush
215-S ISO-8859-9@Dingos, les hors-la-loi du bush Documentaire britannique réalisé par Holly Spearing. 
215-X 2 01 fra 
215-X 1 01 fra 
215-e
215-c
215-C T-8442-2-519 France 4
215-E 44829 1206001500 3300 4F 16
215-T ISO-8859-9@P.J.
215-S ISO-8859-9@Série policière française avec Bruno Wolkowitch, Lisa Martino, Charles Schneider. Saison 2. (4/6). "Carte bancaire". 
215-D ISO-8859-9@|Episode : 4 / 6|REDIFFUSION : le 20 Mars à 23:40|REDIFFUSION : le 22 Mars à 09:20|REDIFFUSION : le 24 Mars à 22:25|REDIFFUSION : le 25 Mars à 06:25|REDIFFUSION : le 27 Mars à 08:20
215-X 2 03 fra 
215-X 1 01 fra 
215-e
215-E 44830 1206004800 1500 4F 16
215-T ISO-8859-9@Cinq soeurs
215-S ISO-8859-9@Série sentimentale française avec Charlotte Becquin, Emmanuelle Boidron, Théa Boswell. Saison 1. (n°34). 
215-D ISO-8859-9@|Episode : 34 / 0|REDIFFUSION : le 21 Mars à 02:00
215-X 2 03 fra 
215-X 1 01 fra 
215-e
215-c
215-C T-8442-3-773 CANAL J
215-E 12444 1206001980 480 4F E
215-T ISO-8859-9@Titeuf
215-S ISO-8859-9@"Crapauch'mar". 
215-X 2 01 fra 
215-X 1 01 fra 
215-e
215-E 12445 1206002460 540 4F E
215-T ISO-8859-9@Titeuf
215-S ISO-8859-9@"Train d'enfer". 
215-X 2 01 fra 
215-X 1 01 fra 
215-e
215-c
215-C T-8442-4-1026 W9
215-E 131 1206001942 214 4F C
215-T ISO-8859-9@LE TUNNEL D'OR
215-D |ISO6937@Interpr?te: ISO6937@AARON|
215-X 1 01 fra ISO-8859-9@4:3
215-X 2 03 fra ISO-8859-9@stereo
215-e
215-E 132 1206002156 209 4F C
215-T ISO-8859-9@J TRAINE DES PIEDS
215-D |ISO6937@Interpr?te: ISO6937@Olivia RUIZ|
215-X 1 01 fra ISO-8859-9@4:3
215-X 2 03 fra ISO-8859-9@stereo
215-e
215-c
215-C T-8442-4-1030 AB1
215-E 30906 1206001662 1487 4F F
215-T ISO6937@Premiers baisers
215-S ISO6937@((Premiers baisers)), Couleur, 1991
215-D ISO6937@Roger est parti en repØrage pour le tournage d'un Øpisode d' Amour toujours ; Annette apprend que Marie, qui a pris un jour de congØ, doit dØjeuner avec un homme. Annette trouve un prØtexte pour retourner chez les Girard et ...|ISO6937@Acteur: ISO6937@Camille RAYMOND|ISO6937@Acteur: ISO6937@Fabien REMBLIER|ISO6937@Acteur: ISO6937@Christophe RIPPERT|
215-X 1 01 fra 
215-X 2 01 fra 
215-e
215-E 30907 1206003149 1462 4F F
215-T ISO6937@Premiers baisers
215-S ISO6937@((Premiers baisers)), Couleur, 1991
215-D ISO6937@Virginie confie ? Annette que Daniel, dont elle est amoureuse, lui semble tr?s indiffØrent ; Annette lui conseille de le rendre jaloux. De son c?tØ, Daniel tr?s attirØ par Virginie, dØcide d'attirer son attention en faisant le ...|ISO6937@Acteur: ISO6937@Camille RAYMOND|ISO6937@Acteur: ISO6937@Fabien REMBLIER|ISO6937@Acteur: ISO6937@Christophe RIPPERT|
215-X 1 01 fra 
215-X 2 01 fra 
215-e
215-c
215-C T-8442-4-1027 NT1
215-E 22168 1206000345 1855 4F 1D
215-T ISO6937@TØlØ achat NT1
215-S ISO6937@((TØlØ achat NT1)), Couleur, 2005
215-X 1 01 fra 
215-X 2 01 fra 
215-e
215-E 22169 1206002200 1697 4F 1D
215-T ISO6937@TØlØ achat NT1
215-S ISO6937@((TØlØ achat NT1)), Couleur, 2005
215-X 1 01 fra 
215-X 2 01 fra 
215-e
215-c
215-C T-8442-4-1028 PARIS PREMIERE
215-E 22 1206001859 3590 4F B
215-T ISO-8859-9@PARIS PREMIERE BOUTIQUE
215-X 1 01 fra ISO-8859-9@4:3
215-X 2 01 fra ISO-8859-9@single mono channel
215-e
215-c
215-C T-8442-6-1539 LCI
215-E 47 1206000600 6600 4F 7
215-T ISO6937@ON EN PARLE
215-D ISO6937@«InvitØs : Jean-Luc Romero, Julien Dourgnon, Jean-Paul HØvin et Odon Vallet.».|Au sommaire : «Fin de vie : changer la loi ?». «Mobile : trop cher ?». «PÐques : faites vos oeufs !».
215-X 1 01 fra ISO6937@VIDEO 4/3  
215-X 2 01 fra ISO6937@AUDIO MONO   fre
215-e
215-E 13 1206007200 600 4F 7
215-T ISO6937@LE JOURNAL
215-D ISO6937@Toute l'actualitØ passØe en revue.
215-e
215-c
215-C T-8442-3-769 CANAL+
215-E 34 1205998980 5520 4F 18
215-T ISO-8859-9@Twelve and Holding
215-S ISO-8859-9@Réalisé par Michael Cuesta en 2006. Avec Conor Donovan, Zoe Weizenbaum, Jesse Camacho. Drame américain. 
215-X 2 03 fra 
215-X 2 03 eng 
215-X 1 03 fra 
215-X 3 01 fra 
215-e
215-E 35 1206004500 360 4F 18
215-T ISO-8859-9@A suivre : Elijah Woo...
215-S ISO-8859-9@A suivre dans quelques minutes : Elijah Wood, Alex de la Iglesia : la re
215-X 2 01 fra 
215-X 1 01 fra 
215-X 3 01 fra 
215-e
215-c
215 End of EPG data
LSTE 7
215-C T-8442-1-261 ARTE
215-E 16872 1206001500 1740 4E 1
215-T ISO6937@Tous les habits du monde
215-S ISO6937@«Le SØnØgal». Magazine de sociØtØ.
215-X 2 03 fra ISO6937@Stereo
215-X 1 01 fra 
215-e
215-E 16873 1206003240 60 4E 1
215-T ISO6937@Ouverture
215-S ISO6937@«Le Congo, infiniment riche, dØmocratiquement pauvre». Magazine de gØopolitique.
215-X 2 02 fra ISO6937@Mono
215-X 1 01 fra 
215-e
215-c
215 End of EPG data
LSTE 8
215-C T-8442-2-513 Direct 8
215-E 32793 1206001800 1800 4F 13
215-T ISO-8859-9@Culture VIP
215-S ISO-8859-9@Jeu. Présenté par Valérie Benaïm. 
215-X 2 03 fra 
215-X 1 01 fra 
215-e
215-E 33298 1206003600 3600 4F 13
215-T ISO-8859-9@Bien-être
215-S ISO-8859-9@"Coachez votre intuition... pour provoquer le hasard !" Invités : Vanessa Mielczareck, Catherine Balance, Yonelle Delle. 
215-X 2 03 fra 
215-X 1 01 fra 
215-e
215-c
215 End of EPG data
  
Klaus Schmidinger March 20, 2008, 8:52 a.m. UTC | #2
On 03/20/08 09:46, Éric Laly wrote:
> Klaus Schmidinger wrote:
> 
> ...
> 
>> Please do this
>>
>> --- libsi/si.c  2008/03/05 17:00:55     1.25
>> +++ libsi/si.c  2008/03/19 21:30:47
>> @@ -416,6 +416,10 @@
>>      // FIXME Need to make this UTF-8 aware (different control codes).
>>      // However, there's yet to be found a broadcaster that actually
>>      // uses UTF-8 for the SI data... (kls 2007-06-10)
>> +   if (size > 20) {
>> +      to = stpcpy(to, cs);
>> +      to = stpcpy(to, "@");
>> +      }
>>      for (int i = 0; i < len; i++) {
>>         if (*from == 0)
>>            break;
>>
>> and check which encodings are listed in EPG strings for ARTE and
>> direct8.
> I'm not at home right now, but I've just tried over the network and got the results via
> SVDRP.
> See the attached files.
>
> It seems that the EPG entries that are displayed correctly are in ISO-8859-9 and the
> others in ISO6937.
> 
>>
>> Have you set VDR_CHARSET_OVERRIDE?
> No.

Please try setting VDR_CHARSET_OVERRIDE=ISO-8859-9 before starting
VDR. This should fix it.

Klaus
  
Éric Laly March 20, 2008, 8:59 a.m. UTC | #3
Klaus Schmidinger wrote:
> On 03/20/08 09:46, Éric Laly wrote:
>> Klaus Schmidinger wrote:
>>
>> ...
>>
>>> Please do this
>>>
>>> --- libsi/si.c  2008/03/05 17:00:55     1.25
>>> +++ libsi/si.c  2008/03/19 21:30:47
>>> @@ -416,6 +416,10 @@
>>>      // FIXME Need to make this UTF-8 aware (different control codes).
>>>      // However, there's yet to be found a broadcaster that actually
>>>      // uses UTF-8 for the SI data... (kls 2007-06-10)
>>> +   if (size > 20) {
>>> +      to = stpcpy(to, cs);
>>> +      to = stpcpy(to, "@");
>>> +      }
>>>      for (int i = 0; i < len; i++) {
>>>         if (*from == 0)
>>>            break;
>>>
>>> and check which encodings are listed in EPG strings for ARTE and
>>> direct8.
>> I'm not at home right now, but I've just tried over the network and got the results via
>> SVDRP.
>> See the attached files.
>>
>> It seems that the EPG entries that are displayed correctly are in ISO-8859-9 and the
>> others in ISO6937.
>>
>>> Have you set VDR_CHARSET_OVERRIDE?
>> No.
> 
> Please try setting VDR_CHARSET_OVERRIDE=ISO-8859-9 before starting
> VDR. This should fix it.

This is fixed !

Thank you.

Éric.
  
Lucian Muresan March 26, 2008, 6:35 a.m. UTC | #4
Éric Laly wrote:
> Klaus Schmidinger wrote:
>> On 03/20/08 09:46, Éric Laly wrote:
>>> Klaus Schmidinger wrote:
>>>
>>> ...
>>>
>>>> Please do this
>>>>
>>>> --- libsi/si.c  2008/03/05 17:00:55     1.25
>>>> +++ libsi/si.c  2008/03/19 21:30:47
>>>> @@ -416,6 +416,10 @@
>>>>      // FIXME Need to make this UTF-8 aware (different control codes).
>>>>      // However, there's yet to be found a broadcaster that actually
>>>>      // uses UTF-8 for the SI data... (kls 2007-06-10)
>>>> +   if (size > 20) {
>>>> +      to = stpcpy(to, cs);
>>>> +      to = stpcpy(to, "@");
>>>> +      }
>>>>      for (int i = 0; i < len; i++) {
>>>>         if (*from == 0)
>>>>            break;
>>>>
>>>> and check which encodings are listed in EPG strings for ARTE and
>>>> direct8.
>>> I'm not at home right now, but I've just tried over the network and got the results via
>>> SVDRP.
>>> See the attached files.
>>>
>>> It seems that the EPG entries that are displayed correctly are in ISO-8859-9 and the
>>> others in ISO6937.
>>>
>>>> Have you set VDR_CHARSET_OVERRIDE?
>>> No.
>> Please try setting VDR_CHARSET_OVERRIDE=ISO-8859-9 before starting
>> VDR. This should fix it.
> 
> This is fixed !
> 
> Thank you.

Looks like this is set globally, for all of the epg data, right? What 
about mixed charsets from different providers (I know for sure there 
are, and there are also the "external" data sources like tvmovie2vdr and 
the like fetching some xmltv listings and injecting the data via SVDRP)?

Cheers,
Lucian
  
Klaus Schmidinger March 26, 2008, 8:51 a.m. UTC | #5
On 03/26/08 07:35, Lucian Muresan wrote:
> Éric Laly wrote:
>> Klaus Schmidinger wrote:
>>> On 03/20/08 09:46, Éric Laly wrote:
>>>> Klaus Schmidinger wrote:
>>>>
>>>> ...
>>>> It seems that the EPG entries that are displayed correctly are in ISO-8859-9 and the
>>>> others in ISO6937.
>>>>
>>>>> Have you set VDR_CHARSET_OVERRIDE?
>>>> No.
>>> Please try setting VDR_CHARSET_OVERRIDE=ISO-8859-9 before starting
>>> VDR. This should fix it.
>> This is fixed !
>>
>> Thank you.
> 
> Looks like this is set globally, for all of the epg data, right? What 
> about mixed charsets from different providers (I know for sure there 
> are, and there are also the "external" data sources like tvmovie2vdr and 
> the like fetching some xmltv listings and injecting the data via SVDRP)?

The DVB standard provides for a way to mark text strings, so that
applications can correctly determine the actual encoding. The
VDR_CHARSET_OVERRIDE is just a workaround in case your "main"
provider fails to correctly encode their strings.

External data sources simply need to provide the strings in the
encoding used on your local system (presumably UTF-8).
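
As a rough illustration of that marking scheme, here is a minimal C++ sketch of how the leading byte(s) of an SI text string select the character set. It follows ETSI EN 300 468, Annex A as far as I understand it and only covers the ISO-8859, ISO 10646 and UTF-8 tags. DvbCharset is a hypothetical helper written for this example and is not the code in libsi.

```cpp
#include <cstddef>
#include <string>

// Returns the character set announced by the leading byte(s) of a DVB SI
// text string. 'skip' receives the number of marker bytes to drop before
// the text itself. An empty result means no (recognized) marker, in which
// case the standard's default table (ISO 6937) applies.
std::string DvbCharset(const unsigned char *data, std::size_t len, std::size_t &skip)
{
  skip = 0;
  if (len == 0 || data[0] >= 0x20)
    return "";                        // first byte is already text: unmarked
  unsigned char tag = data[0];
  if (tag >= 0x01 && tag <= 0x0B) {
    skip = 1;                         // 0x01..0x0B select ISO-8859-5..ISO-8859-15
    return "ISO-8859-" + std::to_string(static_cast<int>(tag) + 4);
  }
  if (tag == 0x10 && len >= 3) {
    skip = 3;                         // 0x10 0x00 0xNN selects ISO-8859-NN
    return "ISO-8859-" + std::to_string(static_cast<int>(data[2]));
  }
  if (tag == 0x11) {
    skip = 1;
    return "ISO-10646";               // Basic Multilingual Plane
  }
  if (tag == 0x15) {
    skip = 1;
    return "UTF-8";
  }
  skip = 1;
  return "";                          // other tags are not handled in this sketch
}
```

A string starting with byte 0x05 is therefore explicitly ISO-8859-9, while a string starting directly with a printable character carries no marker at all.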

Klaus
  
Füley István March 26, 2008, 9:42 a.m. UTC | #6
> External data sources simply need to provide the strings in the
> encoding used on your local system (presumably UTF-8).
>
> Klaus

This is what I did in my xmltv grab process:

iconv --silent --from-code=ISO-8859-2 --to-code=UTF-8 \
  --output=/opt/tigervdr/xmltv/hu-utf.xml /opt/tigervdr/xmltv/all.xml

And this provides vdr the correct encoding for epg.
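
For a grabber that would rather convert in-process than shell out, the same step can be sketched in C++ with iconv(3). This is only an illustration: Recode is a hypothetical helper, it assumes the glibc-style iconv() signature taking char ** for the input buffer, and it handles the whole input in one call without chunking.

```cpp
#include <iconv.h>
#include <stdexcept>
#include <string>

// Converts 'input' from the 'from' character set to 'to' using iconv(3).
// Throws std::runtime_error if the conversion is unsupported or the input
// contains bytes that are invalid in the source character set.
std::string Recode(const std::string &input, const char *from, const char *to)
{
  iconv_t cd = iconv_open(to, from);        // note the order: target first
  if (cd == (iconv_t)-1)
    throw std::runtime_error("unsupported conversion");
  std::string in = input;                   // iconv wants a mutable buffer
  std::string out(in.size() * 4 + 4, '\0'); // generous bound for UTF-8 output
  char *src = &in[0];
  char *dst = &out[0];
  size_t srcLeft = in.size();
  size_t dstLeft = out.size();
  size_t rc = iconv(cd, &src, &srcLeft, &dst, &dstLeft);
  iconv_close(cd);
  if (rc == (size_t)-1)
    throw std::runtime_error("conversion failed");
  out.resize(out.size() - dstLeft);         // keep only the bytes written
  return out;
}
```

Calling Recode(text, "ISO-8859-2", "UTF-8") mirrors the --from-code and --to-code options of the command line above.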
  
Lucian Muresan March 26, 2008, 12:39 p.m. UTC | #7
Klaus Schmidinger wrote:
[..]
>>>> Please try setting VDR_CHARSET_OVERRIDE=ISO-8859-9 before starting
>>>> VDR. This should fix it.
>>> This is fixed !
>>>
>>> Thank you.
>> Looks like this is set globally, for all of the epg data, right? What 
>> about mixed charsets from different providers (I know for sure there 
>> are, and there are also the "external" data sources like tvmovie2vdr and 
>> the like fetching some xmltv listings and injecting the data via SVDRP)?
> 
> The DVB standard provides for a way to mark text strings, so that
> applications can correctly determine the actual encoding. The
> VDR_CHARSET_OVERRIDE is just a workaround in case your "main"
> provider fails to correctly encode their strings.

Am I missing something, is there a way to mark a provider as being my 
"main" one? Or is the workaround rather replacing the character set for 
all incorrectly recognized ones (assuming that the application 
determines the fact that it is incorrect)? If the latter case occurs, 
what if there are several providers not marking the encoding right, but 
their epg content actually need different encodings, will they all use 
the same encoding specified in VDR_CHARSET_OVERRIDE? (This reminds me of 
the early UTF-8 patch which required setting the encoding for every 
channel in channels.conf, which of course is ugly, but could handle 
different EPG encoding needs in case of multiple providers failing to 
mark this correctly).

> External data sources simply need to provide the strings in the
> encoding used on your local system (presumably UTF-8).

So it should work in the case of correctly handling external data, thanks.

BTW, OSD then stays unaffected by VDR_CHARSET_OVERRIDE? It might be 
worth renaming this to something more clearly specifying that it only 
affects EPG.

Lucian
  
Lucian Muresan March 26, 2008, 12:43 p.m. UTC | #8
Füley István wrote:
>> External data sources simply need to provide the strings in the
>> encoding used on your local system (presumably UTF-8).
>>
>> Klaus
> 
> This is what I did in my xmltv grab process:
> 
> iconv --silent --from-code=ISO-8859-2 --to-code=UTF-8 \
>   --output=/opt/tigervdr/xmltv/hu-utf.xml /opt/tigervdr/xmltv/all.xml
> 
> And this provides vdr the correct encoding for epg.

Looks like you might be using www.port.hu / www.port.ro as your data 
source. I used to use the Romanian version some time ago, now I would 
like to set the whole thing up again. If you're really using that, could 
you please provide some relevant config snippets, scripts and requirements?

Lucian
  
Klaus Schmidinger March 26, 2008, 12:52 p.m. UTC | #9
On 03/26/08 13:39, Lucian Muresan wrote:
> Klaus Schmidinger wrote:
> [..]
>>>>> Please try setting VDR_CHARSET_OVERRIDE=ISO-8859-9 before starting
>>>>> VDR. This should fix it.
>>>> This is fixed !
>>>>
>>>> Thank you.
>>> Looks like this is set globally, for all of the epg data, right? What 
>>> about mixed charsets from different providers (I know for sure there 
>>> are, and there are also the "external" data sources like tvmovie2vdr and 
>>> the like fetching some xmltv listings and injecting the data via SVDRP)?
>> The DVB standard provides for a way to mark text strings, so that
>> applications can correctly determine the actual encoding. The
>> VDR_CHARSET_OVERRIDE is just a workaround in case your "main"
>> provider fails to correctly encode their strings.
> 
> Am I missing something, is there a way to mark a provider as being my 
> "main" one? Or is the workaround rather replacing the character set for 
> all incorrectly recognized ones (assuming that the application 
> determines the fact that it is incorrect)? If the latter case occurs, 
> what if there are several providers not marking the encoding right, but 
> their epg content actually need different encodings, will they all use 
> the same encoding specified in VDR_CHARSET_OVERRIDE? (This reminds me of 
> the early UTF-8 patch which required setting the encoding for every 
> channel in channels.conf, which of course is ugly, but could handle 
> different EPG encoding needs in case of multiple providers failing to 
> mark this correctly).

Well, first and foremost providers should actually do their homework
and encode their stuff according to the standard.

The problem is with providers who don't add a codeset marker to their
strings. This is ok as long as they actually encode in ISO6937.
Unfortunately some providers use ISO-8859-9 instead (or maybe even
others). With VDR_CHARSET_OVERRIDE set, all strings that are not
explicitly marked as using a specific codeset are assumed to be
encoded in the way given by VDR_CHARSET_OVERRIDE.
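
Put as code, the rule looks roughly like the sketch below. The function names are invented for this illustration and the real logic lives inside libsi; the only thing taken from VDR is the VDR_CHARSET_OVERRIDE variable itself, which is assumed here to be read from the environment since it has to be set before starting VDR.

```cpp
#include <cstdlib>
#include <string>

// Picks the character set used to decode one SI string.
// 'marked' is the codeset announced by the string itself ("" if unmarked);
// 'overrideCs' is the value of VDR_CHARSET_OVERRIDE ("" if not set).
std::string EffectiveCharset(const std::string &marked, const std::string &overrideCs)
{
  if (!marked.empty())
    return marked;       // an explicit marker always wins
  if (!overrideCs.empty())
    return overrideCs;   // unmarked string: assume the user's override
  return "ISO6937";      // unmarked and no override: the standard's default
}

// Reads the override from the environment (assumption: plain getenv lookup).
std::string OverrideFromEnv()
{
  const char *value = std::getenv("VDR_CHARSET_OVERRIDE");
  return value ? value : "";
}
```

So the override never touches strings that carry their own marker; it only replaces the ISO6937 assumption for the unmarked ones, and it does so for every provider at once.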

>> External data sources simply need to provide the strings in the
>> encoding used on your local system (presumably UTF-8).
> 
> So it should work in the case of correctly handling external data, thanks.
> 
> BTW, OSD then stays unaffected by VDR_CHARSET_OVERRIDE? It might be 
> worth renaming this to something more clearly specifying that it only 
> affects EPG.

This was just a last minute quick workaround (initially this was hardcoded),
since some (esp. Czech) providers actually do encode their strings in
ISO6937, and I didn't want to cause problems with those who do adhere to
the standard.

A more elaborate workaround would probably require a separate file
in which transponders can be marked as using a specific default
codeset (and then VDR_CHARSET_OVERRIDE would vanish again).
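
Purely as a thought experiment, such a file could be as simple as one "<transponder> <codeset>" pair per line. The C++ sketch below assumes that invented format, with the transponder written as the channel ID minus its service ID (e.g. "T-8442-1"); neither the file format nor the function names exist in VDR.

```cpp
#include <map>
#include <sstream>
#include <string>

// Parses the hypothetical file: one "<transponder> <codeset>" pair per line,
// e.g. "T-8442-1 ISO-8859-9". Blank lines and lines starting with '#' are skipped.
std::map<std::string, std::string> ParseDefaults(const std::string &text)
{
  std::map<std::string, std::string> defaults;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    std::string transponder, codeset;
    if (fields >> transponder >> codeset && transponder[0] != '#')
      defaults[transponder] = codeset;
  }
  return defaults;
}

// Looks up the default codeset for a channel ID such as "T-8442-1-261" by
// dropping the trailing service ID. Falls back to ISO6937 if nothing is listed.
std::string DefaultCharset(const std::map<std::string, std::string> &defaults,
                           const std::string &channelId)
{
  std::string::size_type dash = channelId.rfind('-');
  std::string transponder =
    dash == std::string::npos ? channelId : channelId.substr(0, dash);
  std::map<std::string, std::string>::const_iterator it = defaults.find(transponder);
  return it != defaults.end() ? it->second : "ISO6937";
}
```

As with the override, such a default would only apply to strings that carry no codeset marker of their own.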

But it would be so much better if these providers would just follow
the standard! Yes, I know, these are multi million dollar enterprises,
so they can't be bothered with "standards" - oh well...

Klaus
  
Lucian Muresan March 26, 2008, 2:02 p.m. UTC | #10
Klaus Schmidinger wrote:
> On 03/26/08 13:39, Lucian Muresan wrote:
>> Klaus Schmidinger wrote:
>> [..]
>>>>>> Please try setting VDR_CHARSET_OVERRIDE=ISO-8859-9 before starting
>>>>>> VDR. This should fix it.
>>>>> This is fixed !
>>>>>
>>>>> Thank you.
>>>> Looks like this is set globally, for all of the epg data, right? What 
>>>> about mixed charsets from different providers (I know for sure there 
>>>> are, and there are also the "external" data sources like tvmovie2vdr and 
>>>> the like fetching some xmltv listings and injecting the data via SVDRP)?
>>> The DVB standard provides for a way to mark text strings, so that
>>> applications can correctly determine the actual encoding. The
>>> VDR_CHARSET_OVERRIDE is just a workaround in case your "main"
>>> provider fails to correctly encode their strings.
>> Am I missing something, is there a way to mark a provider as being my 
>> "main" one? Or is the workaround rather replacing the character set for 
>> all incorrectly recognized ones (assuming that the application 
>> determines the fact that it is incorrect)? If the latter case occurs, 
>> what if there are several providers not marking the encoding right, but 
>> their epg content actually need different encodings, will they all use 
>> the same encoding specified in VDR_CHARSET_OVERRIDE? (This reminds me of 
>> the early UTF-8 patch which required setting the encoding for every 
>> channel in channels.conf, which of course is ugly, but could handle 
>> different EPG encoding needs in case of multiple providers failing to 
>> mark this correctly).
> 
> Well, first and foremost providers should actually do their homework
> and encode their stuff according to the standard.
> 
> The problem is with providers who don't add a codeset marker to their
> strings. This is ok as long as they actually encode in ISO6937.
> Unfortunately some providers use ISO-8859-9 instead (or maybe even
> others). With VDR_CHARSET_OVERRIDE set, all strings that are not
> explicitly marked as using a specific codeset are assumed to be
> encoded in the way given by VDR_CHARSET_OVERRIDE.
> 
>>> External data sources simply need to provide the strings in the
>>> encoding used on your local system (presumably UTF-8).
>> So it should work in the case of correctly handling external data, thanks.
>>
>> BTW, OSD then stays unaffected by VDR_CHARSET_OVERRIDE? It might be 
>> worth renaming this to something more clearly specifying that it only 
>> affects EPG.
> 
> This was just a last minute quick workaround (initially this was hardcoded),
> since some (esp. Czech) providers actually do encode their strings in
> ISO6937, and I didn't want to cause problems with those who do adhere to
> the standard.
> 
> A more elaborate workaround would probably require a separate file
> in which transponders can be marked as using a specific default
> codeset (and then VDR_CHARSET_OVERRIDE would vanish again).
> 
> But it would be so much better if these providers would just follow
> the standard! Yes, I know, these are multi million dollar enterprises,
> so they can't be bothered with "standards" - oh well...

You're so right about these providers :-). Thanks for shedding light on
the current state of this workaround. Maybe, if it proves necessary, the
extra file concept will not be too difficult or "unclean" to implement
(possibly as a patch by someone else, myself not excluded).

Lucian
  

Patch

--- libsi/si.c  2008/03/05 17:00:55     1.25
+++ libsi/si.c  2008/03/19 21:30:47
@@ -416,6 +416,10 @@ 
     // FIXME Need to make this UTF-8 aware (different control codes).
     // However, there's yet to be found a broadcaster that actually
     // uses UTF-8 for the SI data... (kls 2007-06-10)
+   if (size > 20) {
+      to = stpcpy(to, cs);
+      to = stpcpy(to, "@");
+      }
     for (int i = 0; i < len; i++) {
        if (*from == 0)
           break;