<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xml:base="http://www.eiffelroom.org" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
 <title>eiffelroom - UTF-8 in .NET, revisited - Comments</title>
 <link>http://www.eiffelroom.org/blog/peter_gummer/utf_8_in_net_revisited</link>
 <description>Comments for &quot;UTF-8 in .NET, revisited&quot;</description>
 <language>en</language>
<item>
 <title>There is an overhead involved</title>
 <link>http://www.eiffelroom.org/blog/peter_gummer/utf_8_in_net_revisited#comment-255</link>
 <description>&lt;p&gt;There is an overhead involved, but most of the time I still get a pretty good indication of where I can start to optimize. It&#039;s also possible to trace only selected functions, which reduces the profiling overhead a lot. If you look at the data in kcachegrind or a similar tool, there is also more data available than this single graph can show you; for example, the number of calls is also listed.&lt;/p&gt;

</description>
 <pubDate>Mon, 21 May 2007 00:49:07 -0700</pubDate>
 <dc:creator>patrickr</dc:creator>
 <guid isPermaLink="false">comment 255 at http://www.eiffelroom.org</guid>
</item>
<item>
 <title>No guarantees</title>
 <link>http://www.eiffelroom.org/blog/peter_gummer/utf_8_in_net_revisited#comment-254</link>
 <description>&lt;p&gt;Because you have no guarantees that the overhead is distributed proportionally to the true execution time of the routines involved. Instead, I would expect the overhead to be distributed according to the frequency of the routine calls, although I don&#039;t know this for sure.&lt;/p&gt;

&lt;p&gt;So I am interested in the proportion of the time spent in UTF-8 routines, not the frequency at which they are called. I don&#039;t care much about the latter as long as the overall time spent is small (because if it is small, there is no point in trying to optimize it). Colin Adams&lt;/p&gt;

</description>
 <pubDate>Sat, 19 May 2007 23:51:00 -0700</pubDate>
 <dc:creator>colin-adams</dc:creator>
 <guid isPermaLink="false">comment 254 at http://www.eiffelroom.org</guid>
</item>
<item>
 <title>Overhead</title>
 <link>http://www.eiffelroom.org/blog/peter_gummer/utf_8_in_net_revisited#comment-253</link>
 <description>&lt;p&gt;Hey Colin, I don&#039;t understand why profiler overhead invalidates the percentages.&lt;/p&gt;

&lt;p&gt;I&#039;m no expert on using profilers, but my experience with NProf 0.9 was that it increased the total run time from about twenty seconds up to about four minutes: a factor of 10 or so. In a normal twenty-second run outside the profiler, there&#039;s &lt;strong&gt;(1)&lt;/strong&gt; a delay of about five seconds while our VB application loads; then &lt;strong&gt;(2)&lt;/strong&gt; I click a few buttons, which takes a couple of seconds; then &lt;strong&gt;(3)&lt;/strong&gt; there&#039;s an eight- or nine-second delay which (as far as I can figure out) is the .NET jitter just-in-time compiling a huge amount of Eiffel code (our stuff + base library routines + Gobo geyacc and gelex stuff); followed by &lt;strong&gt;(4)&lt;/strong&gt; a second or so of parsing and then &lt;strong&gt;(5)&lt;/strong&gt; a second or so of populating some GUI controls. Therefore, the twenty-second total run time includes only about five or six seconds that are likely to be affected by profiler overhead, which would be an overhead factor of 80 or so.&lt;/p&gt;

&lt;p&gt;In other words, I think the NProf overhead factor is somewhere between 10 and 100.&lt;/p&gt;

&lt;p&gt;Despite this, I found the percentages reported by NProf to be very useful. They showed me two places where I should optimise, and doing so has cut steps &lt;strong&gt;(4)&lt;/strong&gt; and &lt;strong&gt;(5)&lt;/strong&gt; from about three seconds down to about one second. Without NProf&#039;s guidance, I might have attempted the &lt;code class=&quot;geshifilter eiffel&quot;&gt;SYSTEM_STRING_FACTORY&lt;/code&gt; optimisation, but there&#039;s no way I would have figured out the other one.&lt;/p&gt;

&lt;p&gt;(NProf hasn&#039;t helped me with the big delay at step &lt;strong&gt;(3)&lt;/strong&gt;. This delay is constant, regardless of what data file I feed to our application, and seems to be due to the .NET jitter infrastructure, so I don&#039;t think a profiler is going to help me there.)&lt;/p&gt;

</description>
 <pubDate>Sat, 19 May 2007 20:08:13 -0700</pubDate>
 <dc:creator>peter_gummer</dc:creator>
 <guid isPermaLink="false">comment 253 at http://www.eiffelroom.org</guid>
</item>
<item>
 <title>Overhead</title>
 <link>http://www.eiffelroom.org/blog/peter_gummer/utf_8_in_net_revisited#comment-252</link>
 <description>&lt;p&gt;OK. I shall remember that. But I don&#039;t know if it shows anything meaningful.&lt;/p&gt;

&lt;p&gt;I&#039;d rather see the total accumulated time (as a %) of all routines in UC_UTF8_STRING.&lt;/p&gt;

&lt;p&gt;But it is only meaningful if the overhead of profiling is low. Last time I used the ES profiler, it lengthened runtimes by a factor of about 100, which meant it was useless.&lt;/p&gt;

&lt;p&gt;Do you have the elapsed times with and without profiling for comparison? Colin Adams&lt;/p&gt;

</description>
 <pubDate>Sat, 19 May 2007 12:15:45 -0700</pubDate>
 <dc:creator>colin-adams</dc:creator>
 <guid isPermaLink="false">comment 252 at http://www.eiffelroom.org</guid>
</item>
<item>
 <title>It&#039;s generated by running</title>
 <link>http://www.eiffelroom.org/blog/peter_gummer/utf_8_in_net_revisited#comment-251</link>
 <description>&lt;p&gt;It&#039;s generated by running the app with the callgrind tool from valgrind (&lt;a href=&quot;http://valgrind.org/&quot;&gt;http://valgrind.org/&lt;/a&gt;), then back-translating the C names into Eiffel names using &lt;a href=&quot;http://eiffelroom.com/tool/valgrind_converter&quot;&gt;http://eiffelroom.com/tool/valgrind_converter&lt;/a&gt;, and then viewing the result in kcachegrind (&lt;a href=&quot;http://kcachegrind.sourceforge.net/cgi-bin/show.cgi&quot;&gt;http://kcachegrind.sourceforge.net/cgi-bin/show.cgi&lt;/a&gt;).&lt;/p&gt;

</description>
 <pubDate>Sat, 19 May 2007 11:11:19 -0700</pubDate>
 <dc:creator>patrickr</dc:creator>
 <guid isPermaLink="false">comment 251 at http://www.eiffelroom.org</guid>
</item>
<item>
 <title>Pretty diagram</title>
 <link>http://www.eiffelroom.org/blog/peter_gummer/utf_8_in_net_revisited#comment-250</link>
 <description>&lt;p&gt;That&#039;s a really pretty diagram, Patrick.&lt;/p&gt;

&lt;p&gt;How do you produce it? Colin Adams&lt;/p&gt;

</description>
 <pubDate>Sat, 19 May 2007 04:44:33 -0700</pubDate>
 <dc:creator>colin-adams</dc:creator>
 <guid isPermaLink="false">comment 250 at http://www.eiffelroom.org</guid>
</item>
<item>
 <title>Gobo xml parser</title>
 <link>http://www.eiffelroom.org/blog/peter_gummer/utf_8_in_net_revisited#comment-249</link>
 <description>&lt;p&gt;Here&#039;s a profile run of the Gobo XML parser. The percentages are relative to XM_EIFFEL_PARSER_SKELETON::parse_from_string. &lt;a href=&quot;http://www.eiffelroom.com/node/182&quot;&gt;&lt;img src=&quot;http://www.eiffelroom.com/files/images/xmlparser.png&quot; alt=&quot;gobo xml parser profile trace&quot; title=&quot;gobo xml parser profile trace&quot; width=&quot;2020&quot; height=&quot;1867&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

</description>
 <pubDate>Sat, 19 May 2007 03:04:33 -0700</pubDate>
 <dc:creator>patrickr</dc:creator>
 <guid isPermaLink="false">comment 249 at http://www.eiffelroom.org</guid>
</item>
<item>
 <title>Apply Application Optimizations</title>
 <link>http://www.eiffelroom.org/blog/peter_gummer/utf_8_in_net_revisited#comment-245</link>
 <description>&lt;p&gt;&lt;em&gt;Apply Application Optimizations&lt;/em&gt; is a new option in EiffelStudio 6.0. I don&#039;t see it in 5.7. When we move to 6.0 I&#039;ll be sure to try it; I don&#039;t think any of our VB classes inherit from our Eiffel classes.&lt;/p&gt;

&lt;p&gt;Thanks for the advice, Paul.&lt;/p&gt;

</description>
 <pubDate>Thu, 17 May 2007 18:18:27 -0700</pubDate>
 <dc:creator>peter_gummer</dc:creator>
 <guid isPermaLink="false">comment 245 at http://www.eiffelroom.org</guid>
</item>
<item>
 <title>UC_UTF8_STRING performance</title>
 <link>http://www.eiffelroom.org/blog/peter_gummer/utf_8_in_net_revisited#comment-242</link>
 <description>&lt;p&gt;This was very interesting, Peter. I have long suspected that the poor performance of the Gobo Eiffel XML parser was down to the UTF-8 implementation, and this seems to corroborate that conjecture. Colin Adams&lt;/p&gt;

</description>
 <pubDate>Thu, 17 May 2007 09:44:59 -0700</pubDate>
 <dc:creator>colin-adams</dc:creator>
 <guid isPermaLink="false">comment 242 at http://www.eiffelroom.org</guid>
</item>
<item>
 <title>Squeezing more performance from Eiffel for .NET</title>
 <link>http://www.eiffelroom.org/blog/peter_gummer/utf_8_in_net_revisited#comment-241</link>
 <description>&lt;p&gt;We&#039;ve all been pretty busy around here of late so there is a whole list of articles I need to get to. One of them deals with performance optimization in .NET.&lt;/p&gt;

&lt;p&gt;There are two additional things you can do to boost the performance of your .NET application. First, inherit from a .NET type (&lt;code class=&quot;geshifilter eiffel&quot;&gt;SYSTEM_OBJECT&lt;/code&gt; will probably be the most used) where multiple inheritance is not required. This will create an Eiffel single type. Single types do not have separate interface and implementation types, so the CLR is able to optimize the jitted code. The CLR/JIT does not heavily optimize calls through interfaces, if at all!&lt;/p&gt;

&lt;p&gt;The second thing is to set the &lt;em&gt;Apply Application Optimizations&lt;/em&gt; target configuration option to &lt;code class=&quot;geshifilter eiffel&quot;&gt;&lt;span style=&quot;color: #800080;&quot;&gt;True&lt;/span&gt;&lt;/code&gt;. This is intended for end-point applications and libraries: the optimization only marks end-point classes as frozen so the jitter can optimize the virtually dispatched calls. There is no rule, however, that says the &lt;em&gt;Apply Application Optimizations&lt;/em&gt; option can only be used on end-point applications/libraries. If you want to enable it for your precompiled libraries then feel free. The side effect is that all the end-point classes are marked frozen and so cannot be extended. Fortunately, the &lt;em&gt;Apply Application Optimizations&lt;/em&gt; option can be applied at the target, cluster and class level for fine-grained control.&lt;/p&gt;

&lt;p&gt;In future versions of Eiffel for .NET there will be no need to perform the first step because it will be part of the application optimization options.&lt;/p&gt;

</description>
 <pubDate>Thu, 17 May 2007 09:30:35 -0700</pubDate>
 <dc:creator>paulbates</dc:creator>
 <guid isPermaLink="false">comment 241 at http://www.eiffelroom.org</guid>
</item>
<item>
 <title>UTF-8 in .NET, revisited</title>
 <link>http://www.eiffelroom.org/blog/peter_gummer/utf_8_in_net_revisited</link>
 <description>&lt;p&gt;A couple of months ago I described how I managed to get Eiffel for .NET to work with UTF-8 Unicode strings. The solution given in &lt;a class=&quot;&quot; style=&quot;&quot; href=&quot;/locate/UTF-8_Unicode_in_Eiffel_for_.NET&quot;&gt;UTF-8 Unicode in Eiffel for .NET&lt;/a&gt; seems to be working fine, but we&#039;ve noticed that our VB application is running more slowly than before. Well I did say, &amp;quot;This implementation is no doubt inefficient...&amp;quot;&lt;/p&gt;

&lt;p&gt;Another problem with that solution is that it creates a dependency on &lt;a href=&quot;http://www.gobosoft.com&quot;&gt;Gobo&lt;/a&gt;. This wasn&#039;t a problem for us, but it might bother others.&lt;/p&gt;

&lt;p&gt;Because we had a performance problem, I started profiling my application. EiffelStudio&#039;s built-in profiler doesn&#039;t work in .NET (despite the fact that Project Settings misleadingly offers this as an option in .NET projects), so I used &lt;a href=&quot;http://www.mertner.com/confluence/display/NProf/Home&quot;&gt;NProf&lt;/a&gt;, a free profiler for .NET. I tried NProf 0.10, the latest, but I found it strangely minimalist. Then I tried &lt;strong&gt;NProf 0.9.1&lt;/strong&gt;, and it was much better, because it gives more options for viewing the results of a profiling run. (I wonder whether NProf is being rewritten from scratch.)&lt;/p&gt;

&lt;p&gt;NProf showed me that, in a common usage scenario, our application was spending &lt;strong&gt;18.43%&lt;/strong&gt; of its time in &lt;code class=&quot;geshifilter eiffel&quot;&gt;&lt;a href=&quot;http://www.google.com/search?q=site%3Ahttp%3A%2F%2Fdocs.eiffel.com%2Feiffelstudio%2Flibraries+STRING&amp;btnI=I%27m+Feeling+Lucky&quot;&gt;&lt;span style=&quot;color: #800000&quot;&gt;STRING&lt;/span&gt;&lt;/a&gt;.&lt;span style=&quot;color: #000060;&quot;&gt;make_from_cil&lt;/span&gt;&lt;/code&gt;! Most of this was spent in one of the routines whose implementation I had overridden: &lt;strong&gt;16.25%&lt;/strong&gt; of the total time was in &lt;code class=&quot;geshifilter eiffel&quot;&gt;SYSTEM_STRING_FACTORY.&lt;span style=&quot;color: #000060;&quot;&gt;read_string_into&lt;/span&gt;&lt;/code&gt;. To quantify just how bad my UTF-8 implementation was, I removed it and profiled again: NProf showed &lt;strong&gt;2.66%&lt;/strong&gt; and &lt;strong&gt;0.47%&lt;/strong&gt; respectively.&lt;/p&gt;

&lt;p&gt;I saw that &lt;code class=&quot;geshifilter eiffel&quot;&gt;make_from_cil&lt;/code&gt; and &lt;code class=&quot;geshifilter eiffel&quot;&gt;read_string_into&lt;/code&gt; were being called 39,000 times, and that most of these calls were from a particular &lt;code class=&quot;geshifilter eiffel&quot;&gt;&lt;a href=&quot;http://www.google.com/search?q=site%3Ahttp%3A%2F%2Fdocs.eiffel.com%2Feiffelstudio%2Flibraries+STRING&amp;btnI=I%27m+Feeling+Lucky&quot;&gt;&lt;span style=&quot;color: #800000&quot;&gt;STRING&lt;/span&gt;&lt;/a&gt;&lt;/code&gt; function of ours that concatenates strings and string constants. This function was amenable to optimisation by caching its &lt;code class=&quot;geshifilter eiffel&quot;&gt;&lt;a href=&quot;http://www.google.com/search?q=site%3Ahttp%3A%2F%2Fdocs.eiffel.com%2Feiffelstudio%2Flibraries+STRING&amp;btnI=I%27m+Feeling+Lucky&quot;&gt;&lt;span style=&quot;color: #800000&quot;&gt;STRING&lt;/span&gt;&lt;/a&gt;&lt;/code&gt; result in an attribute. This worked: it noticeably improved performance. According to NProf, there were now only 9,738 calls, reducing the respective percentages of total time to &lt;strong&gt;11.47%&lt;/strong&gt; and &lt;strong&gt;10.85%&lt;/strong&gt;. &lt;em&gt;Better, but still bad.&lt;/em&gt; Could I optimise &lt;code class=&quot;geshifilter eiffel&quot;&gt;SYSTEM_STRING_FACTORY&lt;/code&gt; itself?&lt;/p&gt;

&lt;p&gt;My override of &lt;code class=&quot;geshifilter eiffel&quot;&gt;SYSTEM_STRING_FACTORY&lt;/code&gt; was using Gobo&#039;s &lt;code class=&quot;geshifilter eiffel&quot;&gt;UC_UTF8_STRING&lt;/code&gt; to convert UTF-8 bytes to characters. In .NET, however, there is an obvious alternative: the &lt;code class=&quot;geshifilter csharp&quot;&gt;&lt;span style=&quot;color: #000000;&quot;&gt;System&lt;/span&gt;.&lt;span style=&quot;color: #000000;&quot;&gt;Text&lt;/span&gt;.&lt;span style=&quot;color: #000000;&quot;&gt;UTF8Encoding&lt;/span&gt;&lt;/code&gt; class. This class has various methods for encoding and decoding between .NET &lt;code class=&quot;geshifilter csharp&quot;&gt;String&lt;/code&gt; objects and character arrays, on the one hand, and .NET &lt;code class=&quot;geshifilter csharp&quot;&gt;Byte&lt;/code&gt; arrays on the other. The strings and character arrays are encoded in UTF-16; the byte arrays are encoded in UTF-8.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The following rewrite of &lt;code class=&quot;geshifilter eiffel&quot;&gt;SYSTEM_STRING_FACTORY.&lt;span style=&quot;color: #000060;&quot;&gt;read_string_into&lt;/span&gt;&lt;/code&gt; takes only 1.10% of the application&#039;s total time&lt;/strong&gt;, a dramatic improvement which of course is reflected in &lt;code class=&quot;geshifilter eiffel&quot;&gt;&lt;a href=&quot;http://www.google.com/search?q=site%3Ahttp%3A%2F%2Fdocs.eiffel.com%2Feiffelstudio%2Flibraries+STRING&amp;btnI=I%27m+Feeling+Lucky&quot;&gt;&lt;span style=&quot;color: #800000&quot;&gt;STRING&lt;/span&gt;&lt;/a&gt;.&lt;span style=&quot;color: #000060;&quot;&gt;make_from_cil&lt;/span&gt;&lt;/code&gt;, which now takes only &lt;strong&gt;1.81%&lt;/strong&gt;. The application runs noticeably faster too!&lt;/p&gt;

&lt;p&gt;&lt;div class=&quot;geshifilter eiffel&quot; style=&quot;font-family: monospace;&quot;&gt;&lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;local&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp; i, nb: &lt;a href=&quot;http://www.google.com/search?q=site%3Ahttp%3A%2F%2Fdocs.eiffel.com%2Feiffelstudio%2Flibraries+INTEGER&amp;btnI=I%27m+Feeling+Lucky&quot;&gt;&lt;span style=&quot;color: #800000&quot;&gt;INTEGER&lt;/span&gt;&lt;/a&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp; l_str8: &lt;a href=&quot;http://www.google.com/search?q=site%3Ahttp%3A%2F%2Fdocs.eiffel.com%2Feiffelstudio%2Flibraries+STRING&amp;btnI=I%27m+Feeling+Lucky&quot;&gt;&lt;span style=&quot;color: #800000&quot;&gt;STRING&lt;/span&gt;&lt;/a&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp; bytes: NATIVE_ARRAY &lt;span style=&quot;color: #FF0000;&quot;&gt;&amp;#91;&lt;/span&gt;NATURAL_8&lt;span style=&quot;color: #FF0000;&quot;&gt;&amp;#93;&lt;/span&gt;&lt;br /&gt;
&lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;do&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp; &lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;if&lt;/span&gt; a_result.&lt;span style=&quot;color: #000060;&quot;&gt;is_string_8&lt;/span&gt; &lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;then&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; i := a_str.&lt;span style=&quot;color: #000060;&quot;&gt;length&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;create&lt;/span&gt; bytes.&lt;span style=&quot;color: #000060;&quot;&gt;make&lt;/span&gt; &lt;span style=&quot;color: #FF0000;&quot;&gt;&amp;#40;&lt;/span&gt;&lt;span style=&quot;color: #FF0000;&quot;&gt;&amp;#123;&lt;/span&gt;ENCODING&lt;span style=&quot;color: #FF0000;&quot;&gt;&amp;#125;&lt;/span&gt;.&lt;span style=&quot;color: #000060;&quot;&gt;utf8&lt;/span&gt;.&lt;span style=&quot;color: #000060;&quot;&gt;get_max_byte_count&lt;/span&gt; &lt;span style=&quot;color: #FF0000;&quot;&gt;&amp;#40;&lt;/span&gt;i&lt;span style=&quot;color: #FF0000;&quot;&gt;&amp;#41;&lt;/span&gt;&lt;span style=&quot;color: #FF0000;&quot;&gt;&amp;#41;&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; i := &lt;span style=&quot;color: #FF0000;&quot;&gt;&amp;#123;&lt;/span&gt;ENCODING&lt;span style=&quot;color: #FF0000;&quot;&gt;&amp;#125;&lt;/span&gt;.&lt;span style=&quot;color: #000060;&quot;&gt;utf8&lt;/span&gt;.&lt;span style=&quot;color: #000060;&quot;&gt;get_bytes&lt;/span&gt; &lt;span style=&quot;color: #FF0000;&quot;&gt;&amp;#40;&lt;/span&gt;a_str, &lt;span style=&quot;color: #FF0000;&quot;&gt;0&lt;/span&gt;, i, bytes, &lt;span style=&quot;color: #FF0000;&quot;&gt;0&lt;/span&gt;&lt;span style=&quot;color: #FF0000;&quot;&gt;&amp;#41;&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; l_str8 ?= a_result&lt;br /&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; l_str8.&lt;span style=&quot;color: #000060;&quot;&gt;make&lt;/span&gt; &lt;span style=&quot;color: #FF0000;&quot;&gt;&amp;#40;&lt;/span&gt;i&lt;span style=&quot;color: #FF0000;&quot;&gt;&amp;#41;&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; l_str8.&lt;span style=&quot;color: #000060;&quot;&gt;set_count&lt;/span&gt; &lt;span style=&quot;color: #FF0000;&quot;&gt;&amp;#40;&lt;/span&gt;i&lt;span style=&quot;color: #FF0000;&quot;&gt;&amp;#41;&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;span style=&quot;color: #FF0000;&quot;&gt;&amp;#123;&lt;/span&gt;SYSTEM_ARRAY&lt;span style=&quot;color: #FF0000;&quot;&gt;&amp;#125;&lt;/span&gt;.&lt;span style=&quot;color: #000060;&quot;&gt;copy&lt;/span&gt; &lt;span style=&quot;color: #FF0000;&quot;&gt;&amp;#40;&lt;/span&gt;bytes, l_str8.&lt;span style=&quot;color: #000060;&quot;&gt;area&lt;/span&gt;.&lt;span style=&quot;color: #000060;&quot;&gt;native_array&lt;/span&gt;, i&lt;span style=&quot;color: #FF0000;&quot;&gt;&amp;#41;&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp; &lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;else&lt;/span&gt;&lt;/div&gt;&lt;/p&gt;

&lt;p&gt;This new implementation creates a .NET &lt;code class=&quot;geshifilter csharp&quot;&gt;Byte&lt;/code&gt; array (a &lt;code class=&quot;geshifilter eiffel&quot;&gt;NATIVE_ARRAY &lt;span style=&quot;color: #FF0000;&quot;&gt;&amp;#91;&lt;/span&gt;NATURAL_8&lt;span style=&quot;color: #FF0000;&quot;&gt;&amp;#93;&lt;/span&gt;&lt;/code&gt;, in Eiffel-speak) large enough to hold the biggest possible UTF-8 encoding of the .NET &lt;code class=&quot;geshifilter csharp&quot;&gt;String&lt;/code&gt; (a &lt;code class=&quot;geshifilter eiffel&quot;&gt;SYSTEM_STRING&lt;/code&gt; in Eiffel-speak). It then calls &lt;code class=&quot;geshifilter csharp&quot;&gt;UTF8Encoding.&lt;span style=&quot;color: #000000;&quot;&gt;GetBytes&lt;/span&gt;&lt;span style=&quot;color: #000000;&quot;&gt;&amp;#40;&lt;/span&gt;&lt;span style=&quot;color: #000000;&quot;&gt;&amp;#41;&lt;/span&gt;&lt;/code&gt; to encode the UTF-16 characters in the &lt;code class=&quot;geshifilter csharp&quot;&gt;String&lt;/code&gt; as UTF-8.&lt;/p&gt;
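
&lt;p&gt;The allocate-then-encode pattern described above can be sketched outside Eiffel. The following is a minimal Python illustration, not the attached implementation: the helper names are hypothetical, and the bound of three UTF-8 bytes per UTF-16 code unit is an assumption made for this sketch rather than a statement of what the .NET method returns.&lt;/p&gt;

```python
def utf16_unit_count(text):
    """Number of UTF-16 code units needed to hold text."""
    return len(text.encode("utf-16-le")) // 2


def max_utf8_bytes(unit_count):
    """Hypothetical upper bound: no UTF-16 code unit needs more than 3 UTF-8 bytes."""
    return 3 * unit_count


def encode_into_buffer(text):
    """Encode text as UTF-8 into an oversized buffer; return (buffer, bytes used)."""
    buffer = bytearray(max_utf8_bytes(utf16_unit_count(text)))
    encoded = text.encode("utf-8")
    used = len(encoded)
    buffer[:used] = encoded  # same-length slice assignment keeps the buffer size
    return buffer, used


# Sample text with one 2-byte and one 3-byte UTF-8 character.
buffer, used = encode_into_buffer("na\u00efve \u20ac")
payload = bytes(buffer[:used])  # only the first `used` bytes are meaningful
print(len(buffer), used, payload)
```

&lt;p&gt;As in the Eiffel routine, the buffer is usually larger than needed, so the count returned by the encoding step, not the buffer size, determines how many bytes are copied afterwards.&lt;/p&gt;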

&lt;p&gt;Finally, it copies these bytes straight into the &lt;code class=&quot;geshifilter eiffel&quot;&gt;&lt;a href=&quot;http://www.google.com/search?q=site%3Ahttp%3A%2F%2Fdocs.eiffel.com%2Feiffelstudio%2Flibraries+STRING&amp;btnI=I%27m+Feeling+Lucky&quot;&gt;&lt;span style=&quot;color: #800000&quot;&gt;STRING&lt;/span&gt;&lt;/a&gt;&lt;/code&gt; result&#039;s &lt;code class=&quot;geshifilter eiffel&quot;&gt;native_array&lt;/code&gt;. This part is tricky; it took me quite a while to understand what I needed to do. The Eiffel &lt;code class=&quot;geshifilter eiffel&quot;&gt;&lt;a href=&quot;http://www.google.com/search?q=site%3Ahttp%3A%2F%2Fdocs.eiffel.com%2Feiffelstudio%2Flibraries+STRING&amp;btnI=I%27m+Feeling+Lucky&quot;&gt;&lt;span style=&quot;color: #800000&quot;&gt;STRING&lt;/span&gt;&lt;/a&gt;&lt;/code&gt;&#039;s &lt;code class=&quot;geshifilter eiffel&quot;&gt;native_array&lt;/code&gt; is a .NET array of .NET characters (a &lt;code class=&quot;geshifilter eiffel&quot;&gt;NATIVE_ARRAY &lt;span style=&quot;color: #FF0000;&quot;&gt;&amp;#91;&lt;/span&gt;CHARACTER_8&lt;span style=&quot;color: #FF0000;&quot;&gt;&amp;#93;&lt;/span&gt;&lt;/code&gt;, in Eiffel-speak). Because .NET characters are UTF-16, you might expect that the &lt;code class=&quot;geshifilter eiffel&quot;&gt;native_array&lt;/code&gt; would be UTF-16 too. I sure did. But it isn&#039;t; it&#039;s UTF-8. Only the least significant of each character&#039;s two bytes is used by normal Eiffel code. 
This can get really confusing, because it is possible, via .NET classes, to stuff &lt;code class=&quot;geshifilter eiffel&quot;&gt;native_array&lt;/code&gt; with UTF-16 characters; this can produce weird logic errors, such as when the EiffelStudio 5.7 debugger told me that a particular character &lt;code class=&quot;geshifilter eiffel&quot;&gt;a_char&lt;/code&gt; had the ordinal value &lt;code class=&quot;geshifilter eiffel&quot;&gt;&lt;span style=&quot;color: #FF0000;&quot;&gt;45&lt;/span&gt;&lt;/code&gt;, and a debugger watch expression told me that &lt;code class=&quot;geshifilter eiffel&quot;&gt;a_char &amp;lt;= &lt;span style=&quot;color: #FF0000;&quot;&gt;127&lt;/span&gt;&lt;/code&gt; was &lt;code class=&quot;geshifilter eiffel&quot;&gt;&lt;span style=&quot;color: #800080;&quot;&gt;True&lt;/span&gt;&lt;/code&gt;, but the running program evaluated &lt;code class=&quot;geshifilter eiffel&quot;&gt;a_char &amp;lt;= &lt;span style=&quot;color: #FF0000;&quot;&gt;127&lt;/span&gt;&lt;/code&gt; as &lt;code class=&quot;geshifilter eiffel&quot;&gt;&lt;span style=&quot;color: #800080;&quot;&gt;False&lt;/span&gt;&lt;/code&gt;. Weird! After many hours, I figured out that the character&#039;s ordinal value was actually not &lt;code class=&quot;geshifilter eiffel&quot;&gt;&lt;span style=&quot;color: #FF0000;&quot;&gt;45&lt;/span&gt;&lt;/code&gt;, but that it had something in the high byte due to UTF-16 encoding. Once I understood this important point, I realised that I simply needed to copy the UTF-8 bytes straight into the &lt;code class=&quot;geshifilter eiffel&quot;&gt;native_array&lt;/code&gt;. Simple!&lt;/p&gt;
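
&lt;p&gt;One way such a discrepancy can arise is sketched below in Python. It assumes, purely for illustration, a 16-bit character cell whose low byte is 45 and whose high byte is non-zero; the concrete value 0x202D and both helper names are hypothetical, since the real value is not recorded here.&lt;/p&gt;

```python
def low_byte(cell):
    """Keep only the least significant byte of a 16-bit character cell."""
    return cell % 256


def is_seven_bit(value):
    """True when value fits in 7 bits, i.e. lies in 0..127."""
    return value in range(128)


cell = 0x202D  # hypothetical cell: high byte 0x20, low byte 0x2D (45)

print(low_byte(cell))                # the value a low-byte-only view would show
print(is_seven_bit(low_byte(cell)))  # a check made on the low byte alone
print(is_seven_bit(cell))            # the same check made on the whole cell
```

&lt;p&gt;A tool that looks only at the low byte and code that compares the whole cell can therefore disagree about the same character.&lt;/p&gt;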

&lt;p&gt;This dealt with the worst inefficiency, but I decided to tackle &lt;code class=&quot;geshifilter eiffel&quot;&gt;SYSTEM_STRING_FACTORY.&lt;span style=&quot;color: #000060;&quot;&gt;from_string_to_system_string&lt;/span&gt;&lt;/code&gt; too.&lt;/p&gt;

&lt;p&gt;&lt;div class=&quot;geshifilter eiffel&quot; style=&quot;font-family: monospace;&quot;&gt;nb := a_str.&lt;span style=&quot;color: #000060;&quot;&gt;count&lt;/span&gt;&lt;br /&gt;
&lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;create&lt;/span&gt; bytes.&lt;span style=&quot;color: #000060;&quot;&gt;make&lt;/span&gt; &lt;span style=&quot;color: #FF0000;&quot;&gt;&amp;#40;&lt;/span&gt;nb&lt;span style=&quot;color: #FF0000;&quot;&gt;&amp;#41;&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;from&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp; i := &lt;span style=&quot;color: #FF0000;&quot;&gt;1&lt;/span&gt;&lt;br /&gt;
&lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;until&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp; i &amp;gt; nb&lt;br /&gt;
&lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;loop&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp; bytes.&lt;span style=&quot;color: #000060;&quot;&gt;put&lt;/span&gt; &lt;span style=&quot;color: #FF0000;&quot;&gt;&amp;#40;&lt;/span&gt;i - &lt;span style=&quot;color: #FF0000;&quot;&gt;1&lt;/span&gt;, a_str.&lt;span style=&quot;color: #000060;&quot;&gt;code&lt;/span&gt; &lt;span style=&quot;color: #FF0000;&quot;&gt;&amp;#40;&lt;/span&gt;i&lt;span style=&quot;color: #FF0000;&quot;&gt;&amp;#41;&lt;/span&gt;.&lt;span style=&quot;color: #000060;&quot;&gt;to_natural_8&lt;/span&gt;&lt;span style=&quot;color: #FF0000;&quot;&gt;&amp;#41;&lt;/span&gt;&lt;br /&gt;
&amp;nbsp; &amp;nbsp; i := i + &lt;span style=&quot;color: #FF0000;&quot;&gt;1&lt;/span&gt;&lt;br /&gt;
&lt;span style=&quot;color: #0600FF; font-weight: bold;&quot;&gt;end&lt;/span&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;span style=&quot;color: #800080;&quot;&gt;Result&lt;/span&gt; := &lt;span style=&quot;color: #FF0000;&quot;&gt;&amp;#123;&lt;/span&gt;ENCODING&lt;span style=&quot;color: #FF0000;&quot;&gt;&amp;#125;&lt;/span&gt;.&lt;span style=&quot;color: #000060;&quot;&gt;utf8&lt;/span&gt;.&lt;span style=&quot;color: #000060;&quot;&gt;get_string&lt;/span&gt; &lt;span style=&quot;color: #FF0000;&quot;&gt;&amp;#40;&lt;/span&gt;bytes&lt;span style=&quot;color: #FF0000;&quot;&gt;&amp;#41;&lt;/span&gt;&lt;/div&gt;&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;geshifilter csharp&quot;&gt;UTF8Encoding.&lt;span style=&quot;color: #000000;&quot;&gt;GetString&lt;/span&gt;&lt;span style=&quot;color: #000000;&quot;&gt;&amp;#40;&lt;/span&gt;&lt;span style=&quot;color: #000000;&quot;&gt;&amp;#41;&lt;/span&gt;&lt;/code&gt; helps out by decoding the bytes in the &lt;code class=&quot;geshifilter eiffel&quot;&gt;&lt;a href=&quot;http://www.google.com/search?q=site%3Ahttp%3A%2F%2Fdocs.eiffel.com%2Feiffelstudio%2Flibraries+STRING&amp;btnI=I%27m+Feeling+Lucky&quot;&gt;&lt;span style=&quot;color: #800000&quot;&gt;STRING&lt;/span&gt;&lt;/a&gt;&lt;/code&gt;&#039;s &lt;code class=&quot;geshifilter eiffel&quot;&gt;native_array&lt;/code&gt; to create a .NET &lt;code class=&quot;geshifilter eiffel&quot;&gt;SYSTEM_STRING&lt;/code&gt;. That&#039;s all there is to it. The only complication is the loop, which converts the character array &lt;code class=&quot;geshifilter eiffel&quot;&gt;native_array&lt;/code&gt; into the byte array &lt;code class=&quot;geshifilter eiffel&quot;&gt;bytes&lt;/code&gt;.&lt;/p&gt;
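
&lt;p&gt;The reverse direction can be sketched the same way. This minimal Python illustration assumes each character cell already holds a single UTF-8 byte value (0 to 255); the function name and the sample cells are hypothetical, and it stands in for the loop plus the decoding call rather than reproducing the Eiffel code.&lt;/p&gt;

```python
def cells_to_string(cells):
    """Narrow each character cell to one byte, then decode the bytes as UTF-8."""
    raw = bytes(cells)  # raises ValueError if any cell is outside 0..255
    return raw.decode("utf-8")


# Hypothetical cells: the UTF-8 bytes of "cafe" with an acute accent on the e.
cells = [99, 97, 102, 195, 169]
text = cells_to_string(cells)
print(len(cells), len(text), text)
```

&lt;p&gt;Five cells decode to four characters, because the last two cells together encode one accented letter.&lt;/p&gt;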

&lt;p&gt;My new override of &lt;code class=&quot;geshifilter eiffel&quot;&gt;SYSTEM_STRING_FACTORY&lt;/code&gt; is attached. Like the old implementation, it assumes that we are working with UTF-8 strings.&lt;/p&gt;

</description>
 <comments>http://www.eiffelroom.org/blog/peter_gummer/utf_8_in_net_revisited#comments</comments>
 <category domain="http://www.eiffelroom.org/tag/utf_8_unicode_net">UTF-8 Unicode .NET</category>
 <enclosure url="http://www.eiffelroom.org/files/system_string_factory_0.zip" length="1163" type="application/zip" />
 <pubDate>Thu, 17 May 2007 07:11:03 -0700</pubDate>
 <dc:creator>peter_gummer</dc:creator>
 <guid isPermaLink="false">181 at http://www.eiffelroom.org</guid>
</item>
</channel>
</rss>
