Code review comment for lp://qastaging/~thelinuxguy/openlp/fix-newline-bug

Revision history for this message
Phill (phill-ridout) wrote :

I've done some research in to this. Officially the only code points allowed are:

"
[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
" ( https://www.w3.org/TR/REC-xml/#charsets)

So going on this our regex should be:

re.compile(r'[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F]')

we probably should filter out the other code points too.

Its worth noting that REPLACMENT_CHARS_MAP replaces the vertical tab and form feed chars with "\n\n" before the CONTROL_CHARS regex filters them out!

So in summary please replace the regex with the one in the inline comment. (Note: its not tested)

review: Needs Fixing

« Back to merge proposal