...or the script that generates the lists used in the JavaScriptArrayLayout selectors?
Why?
Because I want to use UTF-8 encoding so that all the fancy chars in my product and section names show up properly!
Oh dear, why don't they show up?
Because somewhere in SearchScript.pl, and presumably in whichever other script generates Act_sections.js, Act_section_tree.js and the other 'JavaArray' code snippets, there is a call to a function that 'html-ifies' the text of section names and product names.
That sounds good though - turning all the £ signs into &pound; entities etc. makes the text more portable, doesn't it?
ONLY if the html-ification is done in the context of the correct character encoding!
Sorry, I'm from Actinic Support, you've lost me...
OK - if we assume every encoding uses one byte per character, then reading a byte and deciding whether it needs converting to an HTML entity is painless, and only requires a single byte-to-entity lookup table.
BUT, this is the modern world, and the brilliance of the UTF-8 encoding is that it provides the full range of Unicode characters without embedding nulls in strings and without upsetting the normal ASCII collation order - the only price we pay is that some characters take more than one byte.
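To see what "more than one byte" means in practice, here's a tiny Perl sketch (nothing to do with Actinic's own code) showing that a single £ character becomes two bytes once it is UTF-8 encoded:

use strict;
use warnings;
use Encode qw(encode);

# A pound sign is one Unicode character (U+00A3) but two bytes in UTF-8.
my $pound = "\x{A3}";
my $bytes = encode('UTF-8', $pound);

printf "%d byte(s): %s\n",
    length($bytes),
    join ' ', map { sprintf '0x%02X', ord $_ } split //, $bytes;
# prints: 2 byte(s): 0xC2 0xA3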
So here is what happens: the simple translation takes those two- and three-byte sequences and converts each byte individually into an HTML entity, and the browser then displays them as separate ISO-8859-1 (Latin-1) characters instead of the single character they were meant to be.
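Here's a minimal Perl sketch of that failure mode, plus a guess at what the fix looks like. The function names are mine for illustration - they don't appear anywhere in SearchScript.pl:

use strict;
use warnings;
use Encode qw(decode);

# What the stock script appears to do: walk the raw bytes and turn
# every high byte into a numeric entity - one entity per BYTE.
sub htmlify_bytes {
    my ($bytes) = @_;
    return join '',
        map { ord($_) > 127 ? sprintf('&#%d;', ord $_) : $_ } split //, $bytes;
}

# What it needs to do instead: decode the UTF-8 bytes into characters
# first, then emit one entity per CHARACTER.
sub htmlify_chars {
    my ($bytes) = @_;
    my $text = decode('UTF-8', $bytes);
    return join '',
        map { ord($_) > 127 ? sprintf('&#%d;', ord $_) : $_ } split //, $text;
}

my $name = "Caf\xC3\xA9 \xC2\xA310";   # the UTF-8 bytes of "Café £10"
print htmlify_bytes($name), "\n";      # Caf&#195;&#169; &#194;&#163;10 -> browser shows "CafÃ© Â£10"
print htmlify_chars($name), "\n";      # Caf&#233; &#163;10             -> browser shows "Café £10"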
So - I just want to get in there, modify the scripts that generate the section lists and the search results, and STOP this brain-dead html-ification of the multi-byte sequences!
Anybody know how I can do that?