<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xml:base="http://www.eiffelroom.org" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
 <title>eiffelroom - Mixing Unicode and Latin-1 class texts - Comments</title>
 <link>http://www.eiffelroom.org/blog/colin_adams/mixing_unicode_and_latin_1_class_texts</link>
 <description>Comments for &quot;Mixing Unicode and Latin-1 class texts&quot;</description>
 <language>en</language>
<item>
 <title>Relevance</title>
 <link>http://www.eiffelroom.org/blog/colin_adams/mixing_unicode_and_latin_1_class_texts#comment-173</link>
 <description>&lt;p&gt;The main benefit for homogeneous clusters is simpler heuristics - there is no possibility of confusing Latin-1 with UTF-8.&lt;/p&gt;

&lt;p&gt;If you can&#039;t rule out one or the other, I know of no way to disambiguate them in general.&lt;/p&gt;

&lt;p&gt;Of course, there are lots of cases where it is easy to see which of the two is meant, but in other cases it is not.&lt;/p&gt;

&lt;p&gt;Start with the case where the file is pure ASCII: did the author intend it to be treated as Latin-1 or UTF-8?&lt;/p&gt;

&lt;p&gt;In this case, it doesn&#039;t matter (the only thing it could affect is the type of manifest string constants, but these are defined to be of type STRING).&lt;/p&gt;

&lt;p&gt;But mutate just one character in a string literal and, if the mutation is chosen carefully, the case immediately becomes undecidable.&lt;/p&gt;
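
&lt;p&gt;A minimal sketch of one such mutation, in Python. It assumes the mutated character is stored as the two bytes 0xC3 0xA9; the helper name could_be_utf8 is made up for illustration. Both decodings succeed, so the bytes alone do not say which encoding was meant.&lt;/p&gt;

&lt;pre&gt;
```python
# Assumed example bytes: the UTF-8 encoding of U+00E9 (e with acute accent).
data = bytes([0xC3, 0xA9])

# Read as UTF-8, the two bytes form a single character.
as_utf8 = data.decode("utf-8")

# Read as Latin-1, the same bytes are two characters (U+00C3, U+00A9).
# Every byte value is valid Latin-1, so this decoding never fails.
as_latin1 = data.decode("latin-1")


def could_be_utf8(raw):
    """Return True if raw is well-formed UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Pure ASCII is well-formed UTF-8 and valid Latin-1 alike.
ascii_ok = could_be_utf8(b"hello")

# A lone 0xE9 byte is not well-formed UTF-8, so that file must be Latin-1.
lone_byte_ok = could_be_utf8(bytes([0xE9]))
```
&lt;/pre&gt;

&lt;p&gt;A validity check like this can only rule UTF-8 out; it can never confirm it, which is why the two-byte case stays ambiguous.&lt;/p&gt;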

&lt;p&gt;Colin Adams&lt;/p&gt;

</description>
 <pubDate>Sun, 01 Apr 2007 10:29:33 -0700</pubDate>
 <dc:creator>colin-adams</dc:creator>
 <guid isPermaLink="false">comment 173 at http://www.eiffelroom.org</guid>
</item>
<item>
 <title>Is this relevant?</title>
 <link>http://www.eiffelroom.org/blog/colin_adams/mixing_unicode_and_latin_1_class_texts#comment-165</link>
 <description>&lt;p&gt;The source code will be stored as a plain text file (i.e. a sequence of character codes between 0 and 255), as UTF-8, or in any other Unicode encoding. Once you know the encoding, the semantics are properly defined.&lt;/p&gt;

&lt;p&gt;Of course if one library author is using Unicode characters beyond 255, the user of that library will be forced to use a Unicode encoding for his source code, but is this relevant to the project specification? I don&#039;t think so.&lt;/p&gt;

</description>
 <pubDate>Fri, 30 Mar 2007 10:12:00 -0700</pubDate>
 <dc:creator>manus_eiffel</dc:creator>
 <guid isPermaLink="false">comment 165 at http://www.eiffelroom.org</guid>
</item>
<item>
 <title>Heuristics</title>
 <link>http://www.eiffelroom.org/blog/colin_adams/mixing_unicode_and_latin_1_class_texts#comment-164</link>
 <description>&lt;p&gt;See also &lt;a href=&quot;http://eiffelsoftware.origo.ethz.ch/index.php/Heuristics_for_detecting_class_text_encoding&quot;&gt;Heuristics for detecting class text encoding&lt;/a&gt;. Colin Adams&lt;/p&gt;

</description>
 <pubDate>Fri, 30 Mar 2007 10:07:50 -0700</pubDate>
 <dc:creator>colin-adams</dc:creator>
 <guid isPermaLink="false">comment 164 at http://www.eiffelroom.org</guid>
</item>
<item>
 <title>Mixing Unicode and Latin-1 class texts</title>
 <link>http://www.eiffelroom.org/blog/colin_adams/mixing_unicode_and_latin_1_class_texts</link>
 <description>&lt;p&gt;Since ECMA allows class texts to be written as sequences of either CHARACTER_8 or CHARACTER_32 (which, although not properly specified yet, we can assume means Latin-1 or Unicode), the question arises of to what extent the two can be mixed.&lt;/p&gt;

&lt;p&gt;It is clear that fully unrestricted mixing is not possible. For instance, if a class written in Unicode has a routine named 了, then this routine cannot be called from a class written in Latin-1 (unless it is passed as an agent).&lt;/p&gt;

&lt;p&gt;I would suggest that a suitable rule is that all classes within a cluster must either be all CHARACTER_8 or all CHARACTER_32. Furthermore, no class in a CHARACTER_8 cluster may depend upon a class from a CHARACTER_32 cluster.&lt;/p&gt;

&lt;p&gt;This rule suggests a requirement for the ACE/XACE/ECF file formats to be able to specify the character size used for writing class texts within a cluster (and perhaps within a library too).&lt;/p&gt;
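
&lt;p&gt;As a rough sketch of how the suggested rule could be checked, in Python. It assumes each cluster declares a character width of 8 or 32 and lists the clusters it depends on. The function name rule_violations and the cluster names are hypothetical, and dependencies are modelled per cluster rather than per class, which is a simplification.&lt;/p&gt;

&lt;pre&gt;
```python
def rule_violations(widths, depends_on):
    """Return sorted (client, supplier) pairs that break the suggested rule.

    widths maps a cluster name to its declared character width (8 or 32).
    depends_on maps a cluster name to the clusters it depends on.
    A pair is a violation when a CHARACTER_8 cluster depends on a
    CHARACTER_32 cluster.
    """
    return sorted(
        (client, supplier)
        for client, suppliers in depends_on.items()
        for supplier in suppliers
        if widths[client] == 8 and widths[supplier] == 32
    )


# Hypothetical configuration: wide clusters may use narrow ones.
widths = {"base": 8, "i18n": 32, "app": 32}
depends_on = {"base": [], "i18n": ["base"], "app": ["base", "i18n"]}
violations = rule_violations(widths, depends_on)

# Reversing one dependency makes a narrow cluster rely on a wide one.
bad = rule_violations({"base": 8, "i18n": 32}, {"base": ["i18n"], "i18n": []})
```
&lt;/pre&gt;

&lt;p&gt;The first configuration yields no violations; the second reports the pair (base, i18n).&lt;/p&gt;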

</description>
 <comments>http://www.eiffelroom.org/blog/colin_adams/mixing_unicode_and_latin_1_class_texts#comments</comments>
 <category domain="http://www.eiffelroom.org/tag/ecma">Ecma</category>
 <category domain="http://www.eiffelroom.org/tag/unicode">Unicode</category>
 <pubDate>Fri, 30 Mar 2007 09:40:34 -0700</pubDate>
 <dc:creator>colin-adams</dc:creator>
 <guid isPermaLink="false">156 at http://www.eiffelroom.org</guid>
</item>
</channel>
</rss>
