Merge lp://qastaging/~robert-ancell/intltool/xml_numeric_charrefs into lp://qastaging/intltool

Proposed by Robert Ancell
Status: Rejected
Rejected by: Данило Шеган
Proposed branch: lp://qastaging/~robert-ancell/intltool/xml_numeric_charrefs
Merge into: lp://qastaging/intltool
Diff against target: 25 lines (+4/-0)
2 files modified
intltool-extract.in (+2/-0)
intltool-merge.in (+2/-0)
To merge this branch: bzr merge lp://qastaging/~robert-ancell/intltool/xml_numeric_charrefs
Reviewer Review Type Date Requested Status
Данило Шеган code Needs Fixing
Review via email: mp+16111@code.qastaging.launchpad.net
To post a comment you must log in.
Revision history for this message
Данило Шеган (danilo) wrote :

Thanks for the fix, Robert. What is the reason to support hexadecimal digits in regular decimal entities (i.e. to have a-fA-F in a line for chr() call)?

Also, I am not sure we want this as such. What's the reasoning to convert them to real characters? Was it not possible to input them in the source document as real characters as well? intltool tries to do minimal conversion in these cases, allowing developers and translators full flexibility. I.e. you do convert these to actual characters, but never convert them back (though, that would be pretty bad because we can't tell which should be converted and which should not).

One big reason against it is that it might break many existing translations in the sense that it's an incompatible change (i.e. if someone has a message with these in, it would have to be re-translated to all the languages). Can you at least give the basic idea of how many of eg. GNOME projects are already using these entities and would be affected?

Also, what happens if intltool is not running in a Unicode locale and unsupported character for current locale is tried?

So far, I am leaning on rejecting this patch, but would like to hear more reasons behind it. For once, I know it'd be friendlier to translators in a general case, which is good except when you make them re-translate many things because of a toolchain change.

If we agree we want it, can you also add at least a minimal test case to tests/? I know it'd be best if we had unit tests for a thing like this, but since we don't, another test case added would be nice.

review: Needs Fixing (code)
703. By Robert Ancell

Fix decimal charref regexp

Revision history for this message
Robert Ancell (robert-ancell) wrote :

Hi,

I've fixed up the copy-paste regexp error in the BZR branch.

The reasoning is purely so the translations are easier to read for translators. My motivation is gcalctool which I maintain has many unicode characters in the button text and they look indecipherable to translators.

Many projects would be affected as Glade likes to escape all unicode characters (they are entered as unicode characters in the interface and always appear non-escaped when editing), I couldn't say how many use characters that need escaping though I would guess it would be a non-trivial amount.

I guess if you're not in a Unicode locale it can never guarantee to work.

I tried making a test case but diff doesn't seem to be utf-8 aware!?
--- tests/cases/extract8.glade.h 2009-12-21 09:49:33.937921631 +1100
+++ tests/results/extract8.glade.h 2009-12-21 09:49:31.200422524 +1100
@@ -2,7 +2,7 @@
 char *s = N_("A label");
 char *s = N_("Cancel (and exit) the test.");
 char *s = N_("Click the button below");
-char *s = N_("Multiply symbol: �");
+char *s = N_("Multiply symbol: ×");
 char *s = N_("Testing");
 char *s = N_("Testing2");
 char *s = N_("This action\n"

So in short, I don't know enough about it to say how safe it would be but it is a pain for translators currently (and I would expect increasingly so as correct Unicode symbols are used more and more in applications).

I'm not sure if it would be better to change the XML escape sequences to C escape sequences (e.g. "\xD7") so then Launchpad translations could at least change them in it's GUI.

Revision history for this message
Robert Ancell (robert-ancell) wrote :

Request on Glade to not escape these characters:
https://bugzilla.gnome.org/show_bug.cgi?id=596205

Revision history for this message
Данило Шеган (danilo) wrote :

Considering this was fixed in Glade yesterday, does it make sense to call this merge proposal obsolete?

Revision history for this message
Robert Ancell (robert-ancell) wrote :

The Glade problem no longer occurs. If the intltool can't tell the character set then this is obsolete.

Unmerged revisions

703. By Robert Ancell

Fix decimal charref regexp

702. By Robert Ancell <bob@alchemy>

make intltool recognise numeric character references (they occur in glade files)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'intltool-extract.in'
2--- intltool-extract.in 2009-07-16 02:39:26 +0000
3+++ intltool-extract.in 2009-12-20 22:41:12 +0000
4@@ -246,6 +246,8 @@
5 s/&lt;/</g;
6 s/&gt;/>/g;
7 s/&amp;/&/g;
8+ s/&\#([0-9]+);/chr($1)/eg;
9+ s/&\#x([0-9a-fA-F]+);/chr(hex($1))/eg;
10
11 return $_;
12 }
13
14=== modified file 'intltool-merge.in'
15--- intltool-merge.in 2009-07-16 02:39:26 +0000
16+++ intltool-merge.in 2009-12-20 22:41:12 +0000
17@@ -541,6 +541,8 @@
18 s/&lt;/</g;
19 s/&gt;/>/g;
20 s/&amp;/&/g;
21+ s/&\#([0-9]+);/chr($1)/eg;
22+ s/&\#x([0-9a-fA-F]+);/chr(hex($1))/eg;
23
24 return $_;
25 }

Subscribers

People subscribed via source and target branches