<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>var/log &#187; Programming</title>
	<atom:link href="http://www.varslashlog.com/category/programming/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.varslashlog.com</link>
	<description>Yet another weblog</description>
	<lastBuildDate>Sat, 12 Sep 2009 13:34:27 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>How to stop spam without captcha or javascript</title>
		<link>http://www.varslashlog.com/2009/06/03/how-to-stop-spam-without-pictures-or-javascript/</link>
		<comments>http://www.varslashlog.com/2009/06/03/how-to-stop-spam-without-pictures-or-javascript/#comments</comments>
		<pubDate>Wed, 03 Jun 2009 00:15:46 +0000</pubDate>
		<dc:creator>AHSauge</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Antispam]]></category>
		<category><![CDATA[captcha]]></category>
		<category><![CDATA[Header]]></category>
		<category><![CDATA[HTTP]]></category>
		<category><![CDATA[Javascript]]></category>
		<category><![CDATA[picture]]></category>
		<category><![CDATA[spam]]></category>
		<category><![CDATA[spambot]]></category>

		<guid isPermaLink="false">http://www.varslashlog.com/?p=169</guid>
		<description><![CDATA[This time I&#8217;m going to put up a really bold statement. I believe it&#8217;s currently possible to fight spam on the web without pictures, javascript or anything else that might compromise usability. Sounds quite impossible and too good to be true, right? Well, it might not be as far-fetched as it seems. Please read [...]]]></description>
			<content:encoded><![CDATA[<p>This time I&#8217;m going to put up a really bold statement. I believe it&#8217;s currently possible to fight spam on the web without pictures, javascript or anything else that might compromise usability. Sounds quite impossible and too good to be true, right? Well, it might not be as far-fetched as it seems. Please read on and I&#8217;ll explain why.<span id="more-169"></span></p>
<p>We&#8217;ve all seen it: <a title="Article on wikipedia about captcha" href="http://en.wikipedia.org/wiki/Captcha">captchas</a>, or pictures, that protect a lot of web forms against those nasty spam-bots while at the same time blissfully destroying the usability of the web (<a href="http://www.johnmwillis.com/other/top-10-worst-captchas/">10 &#8220;good&#8221; examples</a>). While a lot of attempts have been made to make them usable for more people (pictures aren&#8217;t exactly user-friendly for blind people &#8230;), it still boils down to some requirement either for the browser (e.g. javascript) or for the user (typing into an input field). While requiring the browser to support something isn&#8217;t that bad, requiring the user to do something definitely isn&#8217;t usability at its best. Wouldn&#8217;t it be nice to instantly see whether the current request is made by a bot or a browser? Well, you might be able to &#8230;</p>
<p>For the last two weeks I&#8217;ve run a personal pet project logging spam attempts in some scripts running at <a href="http://www.ascdevel.com">Ascended Development</a>. After more than 300 attempts from over 100 unique spam-bots (or IPs anyway), the results shock me &#8230; a lot. The spam-bots that have visited the site are incredibly stupid. In simplest terms they don&#8217;t even seem to be able to handle cookies (and thereby also sessions in PHP), let alone the pictures or javascript. The most ridiculous thing is that they actually do send data in the antispam input field. The problem is that it&#8217;s 2 to 4 times longer than the number of characters in the picture, and yes, they didn&#8217;t supply the session cookie, meaning the attempt wouldn&#8217;t succeed even if the wild guess was right.</p>
<p>So what can we do without requiring user input or javascript? Certainly not add a hidden input field to the form and hope that the bot adds something to it. All bots (yes, actually every single one of them) either provided the default value or didn&#8217;t provide the field at all. Out of the 300 attempts only 50 didn&#8217;t provide the field, and a massive 250 attempts had the default value. None tried any other values, and it seems that those that didn&#8217;t provide the field tried again with the default value. Simply put, a hidden field doesn&#8217;t work, at least not on the bots visiting our site at the logged locations.</p>
<p>So here&#8217;s the trick: simply read the HTTP headers. Sounds too good to be true? Well, the log is equally clear on this matter too. None of the bots provided the Accept-Language and Accept-Encoding headers, both of which quite frankly any decent browser sends these days (Opera, Firefox, Konqueror, Chrome, Safari and even IE). Even lynx, a text browser, sends these headers, and I tested it with a two-year-old release. It does make sense if you think about it. Browsers add Accept-Language so pages can be correctly localized and Accept-Encoding so that compression can be used. Both are beneficial to the user, and therefore present despite being optional. The spambots on the other hand seem to be using libraries like libcurl to build their HTTP client, and by default these libraries don&#8217;t seem to add Accept-Language or Accept-Encoding. Add the fact that few, if any, sort spam-bots from browsers this way, and we can see that it&#8217;s not really such a surprise after all.</p>
<p>This isn&#8217;t without its flaws though. I&#8217;ve only encountered simple, general spam-bots and not the ones attacking widespread software like phpBB or vBulletin. Also, the site isn&#8217;t subject to targeted attacks. That being said, I wouldn&#8217;t be surprised if they too fail to provide these HTTP headers, and for the time being I&#8217;m quite confident that this method is at least as effective as the current widespread method of using pictures. It should also continue to be that way until this type of checking is more widely used. So until then, I believe this is a good way to avoid spam in your web applications:</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">&lt;?php</span>
<span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #990000;">isset</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$_SERVER</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'HTTP_ACCEPT_ENCODING'</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">===</span> <span style="color: #009900; font-weight: bold;">false</span> <span style="color: #339933;">||</span>
    <span style="color: #990000;">isset</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$_SERVER</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'HTTP_ACCEPT_LANGUAGE'</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">===</span> <span style="color: #009900; font-weight: bold;">false</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
    <span style="color: #b1b100;">echo</span> <span style="color: #0000ff;">'Spambot'</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
<span style="color: #b1b100;">else</span>
<span style="color: #009900;">&#123;</span>
    <span style="color: #b1b100;">echo</span> <span style="color: #0000ff;">'Browser'</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
<span style="color: #000000; font-weight: bold;">?&gt;</span></pre></div></div>

<p>If only this worked against email spam too &#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.varslashlog.com/2009/06/03/how-to-stop-spam-without-pictures-or-javascript/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>How to use Unicode/UTF-8 in PHP properly (part 1)</title>
		<link>http://www.varslashlog.com/2009/02/09/how-to-use-unicodeutf-8-in-php-properly-part-1/</link>
		<comments>http://www.varslashlog.com/2009/02/09/how-to-use-unicodeutf-8-in-php-properly-part-1/#comments</comments>
		<pubDate>Mon, 09 Feb 2009 01:45:13 +0000</pubDate>
		<dc:creator>AHSauge</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[howto]]></category>
		<category><![CDATA[iconv]]></category>
		<category><![CDATA[mbstring]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Unicode]]></category>
		<category><![CDATA[UTF-8]]></category>

		<guid isPermaLink="false">http://www.varslashlog.com/?p=119</guid>
		<description><![CDATA[I&#8217;ve previously written about why PHP and Unicode/UTF-8 is a bad combination. Even though UTF-8 in PHP should (for now) be avoided, it is sometimes a necessity to use it. As UTF-8 can be quite problematic for some people to use, I thought that this time I should write about how to actually use it [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve <a title="Why PHP and Unicode/UTF-8 is a bad combination" href="http://www.varslashlog.com/2008/10/27/why-php-and-unicode-is-a-bad-combination/">previously</a> written about why PHP and Unicode/UTF-8 is a bad combination. Even though UTF-8 in PHP should (for now) be avoided, it <em>is</em> sometimes a necessity to use it. As UTF-8 can be quite problematic for some people to use, I thought that this time I should write about how to actually use it properly. In this first part I&#8217;ll deal with the basic handling of UTF-8 in PHP using the PHP extensions mbstring and/or iconv.</p>
<p><span id="more-119"></span></p>
<h4>The basic facts</h4>
<p>The key element when using a multibyte character set in PHP is to know exactly what you&#8217;re doing. If you don&#8217;t, you can easily end up with partially corrupted text and wrong results. The one big reason for this is the fact that PHP by default does not support anything other than single-byte character sets. In fact, strictly speaking, PHP doesn&#8217;t really know what a character set is. All it sees are bytes, not characters. This means that every string-function in PHP works on the assumption that a byte is a character. When dealing with for instance UTF-8 this is no longer true. The result is that strlen reports the number of bytes in the string, not the number of characters. Similarly, strpos will give you the position in bytes, not characters, and many of the other string-functions have similar problems. So what to do?</p>
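
<p>To make the difference concrete, here is a small sketch comparing the byte-based functions with their mbstring counterparts. It assumes the mbstring extension is loaded, and the sample word is just an example, written with escape sequences so the result doesn&#8217;t depend on how the file is saved:</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;">&lt;?php
// &quot;blåbær&quot; in UTF-8: å is 0xC3 0xA5 and æ is 0xC3 0xA6, two bytes each
$word = &quot;bl\xC3\xA5b\xC3\xA6r&quot;;

echo strlen($word), &quot;\n&quot;;                      // 8, the number of bytes
echo mb_strlen($word, 'UTF-8'), &quot;\n&quot;;          // 6, the number of characters
echo strpos($word, 'r'), &quot;\n&quot;;                 // 7, a byte offset
echo mb_strpos($word, 'r', 0, 'UTF-8'), &quot;\n&quot;;  // 5, a character offset
?&gt;</pre></div></div>
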
<p>First off, a very handy fact about UTF-8 is that it&#8217;s ASCII-compatible (7-bit ASCII that is), meaning these characters are represented in binary as 0xxx xxxx (where x are ASCII bits). Another handy fact about valid UTF-8 strings is that any encoded Unicode character has a unique byte sequence, meaning it can&#8217;t be confused with a part of another character. This means that if you encounter a byte 00100000 (20 hex or 32 dec) it cannot be anything other than a space character, or else it&#8217;s not a valid UTF-8 string. For those interested in how this is achieved, here&#8217;s the binary representation of UTF-8 characters (skip if you&#8217;re not into that type of stuff ;o)</p>
<blockquote>
<pre>1 byte:  0xxx xxxx
2 bytes: 110x xxxx 10xx xxxx
3 bytes: 1110 xxxx 10xx xxxx 10xx xxxx
4 bytes: 1111 0xxx 10xx xxxx 10xx xxxx 10xx xxxx</pre>
</blockquote>
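
<p>One consequence of the layout above is that every byte of the form 10xx xxxx is a continuation byte, so the number of characters equals the number of bytes that are <em>not</em> continuation bytes. A minimal sketch of that idea follows; the function name countUtf8Characters is made up for this example, and it assumes the input is already valid UTF-8:</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;">&lt;?php
// Hypothetical helper: counts characters in a valid UTF-8 string
// without using mbstring or iconv.
function countUtf8Characters($string)
{
    $count  = 0;
    $length = strlen($string); // length in bytes
    for ($i = 0; $i &lt; $length; $i++) {
        // 0xC0 masks the two top bits; 0x80 (10xx xxxx) marks a
        // continuation byte, anything else starts a new character.
        if ((ord($string[$i]) &amp; 0xC0) !== 0x80) {
            $count++;
        }
    }
    return $count;
}

echo countUtf8Characters(&quot;bl\xC3\xA5b\xC3\xA6r&quot;), &quot;\n&quot;; // 6
?&gt;</pre></div></div>
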
<p>Well, enough talk about UTF-8 in general. Here&#8217;s a step-by-step guide on how to use UTF-8 in PHP.</p>
<h4>1. Find out what you have available</h4>
<p>If you&#8217;re going to use UTF-8 in PHP you&#8217;ll first need to find out what&#8217;s available to you. Ideally you should have the PHP extension <a title="link to the PHP manual for mbstring" href="http://www.php.net/mbstring">mbstring</a> installed and it should <span style="text-decoration: underline;">not</span> be set to overload str-functions. With mbstring you&#8217;ll have a set of functions that are multibyte aware. If you don&#8217;t have mbstring available, check for <a title="Link to PHP manual for iconv" href="http://www.php.net/iconv">iconv</a> (also a PHP extension). In PHP 5 and later this extension will give you some very simple functions to work with (strlen, strpos, strrpos, substr and validation). If you have neither mbstring nor iconv available at your host (I&#8217;m assuming that you&#8217;re going to run something on a hosted server), you should strongly consider changing host and/or reconsidering your need for Unicode/UTF-8, as you&#8217;re going to have to write your own functions that work properly on UTF-8 encoded strings. For the sake of simplicity, I&#8217;m going to assume you have mbstring or iconv installed.</p>
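
<p>A quick way to check this on a given host might look like the sketch below. It only uses standard PHP functions, but the variable names are arbitrary, and on PHP versions where the mbstring.func_overload setting doesn&#8217;t exist it simply reports 0:</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;">&lt;?php
$hasMbstring = extension_loaded('mbstring');
$hasIconv    = function_exists('iconv_strlen');
// 0 means no str-functions are overloaded. ini_get() returns false when
// the setting does not exist, which also casts to 0.
$overload    = (int) ini_get('mbstring.func_overload');

echo 'mbstring: ', $hasMbstring ? 'yes' : 'no', &quot;\n&quot;;
echo 'iconv:    ', $hasIconv ? 'yes' : 'no', &quot;\n&quot;;
echo 'overload: ', $overload, &quot;\n&quot;;
?&gt;</pre></div></div>
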
<h4>2. Store your files as UTF-8</h4>
<p>This should be somewhat obvious. If you&#8217;re going to use UTF-8, you should store your files in it too. The simple reason is that any strings you have stored in your scripts will then be UTF-8 too, and should be output properly provided you&#8217;ve done everything else correctly.</p>
<h4>3. Define input and output as UTF-8</h4>
<p>This depends a lot on what you&#8217;re doing, but chances are that you&#8217;re dealing with HTTP and HTML. If so you have to add the following function call before any output (if you&#8217;re not using output buffering):</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #990000;">header</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'Content-Type: text/html; charset=utf-8'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>and add the following to the head-part of your HTML-document</p>

<div class="wp_syntax"><div class="code"><pre class="html4strict" style="font-family:monospace;"><span style="color: #009900;">&lt;<span style="color: #000000; font-weight: bold;">meta</span> <span style="color: #000066;">http-equiv</span><span style="color: #66cc66;">=</span>Content-<span style="color: #000066;">Type</span> <span style="color: #000066;">content</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">&quot;text/html; charset=utf-8&quot;</span>&gt;</span></pre></div></div>

<p>This will state that any output is HTML and UTF-8. If you&#8217;re not dealing with HTML, just replace text/html with whatever you&#8217;re using (and probably drop the meta tag too). This should also ensure that input from most browsers is UTF-8, but to be really sure it might be a good idea to add the attribute accept-charset to any forms you might have, like this:</p>

<div class="wp_syntax"><div class="code"><pre class="html4strict" style="font-family:monospace;"><span style="color: #009900;">&lt;<span style="color: #000000; font-weight: bold;">form</span> <span style="color: #000066;">accept-charset</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">&quot;UTF-8&quot;</span> <span style="color: #000066;">method</span><span style="color: #66cc66;">=</span><span style="color: #ff0000;">&quot;post&quot;</span>&gt;</span>
<span style="color: #808080; font-style: italic;">&lt;!-- Input stuff here --&gt;</span>
<span style="color: #009900;">&lt;<span style="color: #66cc66;">/</span><span style="color: #000000; font-weight: bold;">form</span>&gt;</span></pre></div></div>

<p>This attribute can also be used with a comma-separated list of character sets your script accepts. To keep it simple you should just use UTF-8 as the only accepted &#8220;character set&#8221; (strictly speaking, UTF-8 is an encoding of the Unicode character set, and not a charset by itself).</p>
<p>If you don&#8217;t have control over the input, for instance an RSS feed from a third-party server, transform it to Unicode and encode it as UTF-8. This can be done like this:</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">//mbstring: the parameters are (string, target encoding, source encoding)</span>
<span style="color: #000088;">$UTF8string</span> <span style="color: #339933;">=</span> <span style="color: #990000;">mb_convert_encoding</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$string</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">'UTF-8'</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">'other charset'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #666666; font-style: italic;">// you can also call</span>
<span style="color: #990000;">mb_internal_encoding</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'UTF-8'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #666666; font-style: italic;">// once to use UTF-8 as the default for the other mb-functions, but the source</span>
<span style="color: #666666; font-style: italic;">// encoding must still be given above (if omitted, the internal encoding is used as the source)</span>
&nbsp;
<span style="color: #666666; font-style: italic;">//iconv: the parameters are (source encoding, target encoding, string)</span>
<span style="color: #000088;">$UTF8string</span> <span style="color: #339933;">=</span> <span style="color: #990000;">iconv</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'other charset'</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">'UTF-8'</span><span style="color: #339933;">,</span> <span style="color: #000088;">$string</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>Consult the PHP manual for supported character sets.</p>
<h4>4. Validate your input</h4>
<p>I can not stress this point enough. Check that your input actually is UTF-8, or the multibyte aware functions might not work as expected. Also, if not validated, those who view or store your data might get security problems like SQL-injection (though it would be their fault it&#8217;s happening &#8230;). This can be done as follows for mbstring:</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">//the last parameter can be dropped if mb_internal_encoding('UTF-8') has been called</span>
<span style="color: #000088;">$validUTF8</span> <span style="color: #339933;">=</span> <span style="color: #990000;">mb_check_encoding</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$string</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">'UTF-8'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">//will give true if valid, false if not</span></pre></div></div>

<p>For iconv you&#8217;ll have to go for a slightly dirtier solution:</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">function</span> validateUTF8_iconv<span style="color: #009900;">&#40;</span><span style="color: #000088;">$before</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
    <span style="color: #000088;">$after</span> <span style="color: #339933;">=</span> <span style="color: #990000;">iconv</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'UTF-8'</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">'UTF-8'</span><span style="color: #339933;">,</span> <span style="color: #000088;">$before</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #b1b100;">return</span> <span style="color: #009900;">&#40;</span><span style="color: #000088;">$before</span> <span style="color: #339933;">===</span> <span style="color: #000088;">$after</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

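<p>The reason this works is that iconv stops at the first invalid byte sequence and, depending on the PHP version, returns either a cut-off string or false (along with a notice), so the result will no longer be identical to the input.</p>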
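
<p>The two approaches can also be combined into one helper that uses whatever is available. This is only a sketch: the name isValidUtf8 is made up, and the final preg_match fallback is an addition that assumes PCRE was built with UTF-8 support (with the u modifier, preg_match fails on invalid UTF-8):</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;">&lt;?php
// Hypothetical helper combining the checks shown above.
function isValidUtf8($string)
{
    if (function_exists('mb_check_encoding')) {
        return mb_check_encoding($string, 'UTF-8');
    }
    if (function_exists('iconv')) {
        // @ hides the notice iconv raises on invalid input
        return @iconv('UTF-8', 'UTF-8', $string) === $string;
    }
    // Assumption: PCRE has UTF-8 support; an empty pattern matches any
    // valid string, and matching fails when the subject is invalid.
    return preg_match('//u', $string) === 1;
}

var_dump(isValidUtf8(&quot;\xC3\xA5&quot;)); // bool(true), a valid two-byte character
var_dump(isValidUtf8(&quot;\xC3\x28&quot;)); // bool(false), lead byte without continuation
?&gt;</pre></div></div>
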
<h4>5. Store your data correctly</h4>
<p>Now this is really a pit a lot of people fall into. Storing UTF-8 on disk is no problem, and is done just as before. Storing data in a database however, is in the case of MySQL definitely <em>not</em> as before. First off, you&#8217;ll need MySQL 4.1 or later as earlier versions don&#8217;t support what we&#8217;re about to do. The big problem with PHP and MySQL is that the connection is by default set to latin1, also known as ISO-8859-1, and a lot of people then actually store their data as UTF-8-transformed ISO-8859-1. That is, MySQL thinks your input is ISO-8859-1 and then converts it to Unicode and encodes it as UTF-8. This will lead to problems when you view your data in for instance phpMyAdmin, which connects to MySQL the proper way when dealing with UTF-8. The correct way to connect to MySQL is now</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000088;">$link</span> <span style="color: #339933;">=</span> <span style="color: #990000;">mysql_connect</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'localhost'</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">'mysql_user'</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">'mysql_password'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #990000;">mysql_query</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;SET NAMES 'utf8' COLLATE 'utf8_general_ci'&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>The difference is the second line, which states what charset is used for all further communication on the connection. The collate part can be dropped if you&#8217;re using the default one (utf8_general_ci). In addition to this, the tables and fields should be defined with a utf8_[something] collation, and to save space don&#8217;t use char fields, as they will reserve 3 bytes for every character you define. Instead you should use varchar. Please also note that MySQL&#8217;s utf8 only covers the Basic Multilingual Plane, that is code points up to U+FFFF, or at most 3 bytes per character in UTF-8. If you need any code points above that, you&#8217;re unfortunately in for some trouble &#8230;</p>
<p>This is however only the case of MySQL. For other database systems you should check what is the default charset and how to change it. The documentation/manual of the system is a good place to start to find this type of information.</p>
<h4>6. Functions operating on UTF-8</h4>
<p>The points above should make sure that you&#8217;re using UTF-8. The last thing you have to remember is that the ordinary str-functions might not work correctly. Try to use the str-functions defined in mbstring or iconv (see documentation). Some functions do however work as expected. Strcmp only does a binary compare and still works; strcasecmp however doesn&#8217;t. Str_replace will also work on valid UTF-8 strings as any given character has a unique byte sequence, but the case-insensitive version str_ireplace doesn&#8217;t. In general any str-function that is not case-insensitive and doesn&#8217;t need to operate on the number of characters in the string should work just fine, as long as any input to the function is valid UTF-8.</p>
<p>There are also some functions that are not part of mbstring or iconv that do support UTF-8. Htmlspecialchars, htmlentities and preg_* (with the u modifier) are examples of this. There are also functions that operate on a purely binary level without any regard to charset. Examples of this are the md5 and sha1 functions.</p>
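
<p>A few of these cases can be tried out with the short sketch below. It assumes mbstring is loaded and PCRE has UTF-8 support; the sample characters are arbitrary, and the strcasecmp result is what you get when case folding is done byte by byte, as in the default C locale:</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;">&lt;?php
$upper = &quot;\xC3\x85&quot;; // Å
$lower = &quot;\xC3\xA5&quot;; // å

// str_replace compares bytes, which is safe on valid UTF-8
var_dump(str_replace($lower, 'a', 'bl' . $lower . 'b') === 'blab'); // bool(true)
// strcasecmp folds case one byte at a time, so Å and å are not equal
var_dump(strcasecmp($upper, $lower) === 0);                        // bool(false)
// mbstring knows about case in Unicode
var_dump(mb_strtolower($upper, 'UTF-8') === $lower);               // bool(true)
// preg_* only treats the subject as UTF-8 with the u modifier
var_dump(preg_match('/^.$/u', $lower));                            // int(1)
var_dump(preg_match('/^.$/', $lower));                             // int(0)
?&gt;</pre></div></div>
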
<p>That concludes this part of the howto. You should now be able to use UTF-8 the right way. If you have any questions, please leave a comment and I&#8217;ll happily answer. The next part will hopefully deal with some common pitfalls and how to debug and solve them.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.varslashlog.com/2009/02/09/how-to-use-unicodeutf-8-in-php-properly-part-1/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Busting &#8220;echo is faster than print&#8221;</title>
		<link>http://www.varslashlog.com/2008/11/05/busting-echo-is-faster-than-print/</link>
		<comments>http://www.varslashlog.com/2008/11/05/busting-echo-is-faster-than-print/#comments</comments>
		<pubDate>Wed, 05 Nov 2008 23:17:09 +0000</pubDate>
		<dc:creator>AHSauge</dc:creator>
				<category><![CDATA[Optimization]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[benchmark]]></category>
		<category><![CDATA[echo]]></category>
		<category><![CDATA[echo or print]]></category>
		<category><![CDATA[echo vs print]]></category>
		<category><![CDATA[print]]></category>

		<guid isPermaLink="false">http://www.varslashlog.com/?p=89</guid>
		<description><![CDATA[I&#8217;ve previously been talking about premature optimization. Today I thought I should illustrate how ridiculous some of the stuff is by busting &#8220;echo is faster than print&#8221;. There&#8217;s a lot of people claiming that this is true (1, 2, 3, 4 and a lot of others). Now to be clear, they are somewhat right about [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve previously been talking about <a title="Premature optimization is bad" href="http://www.varslashlog.com/2008/11/01/premature-optimization-is-bad/" target="_blank">premature optimization</a>. Today I thought I should illustrate how ridiculous some of the stuff is by busting &#8220;echo is faster than print&#8221;. There&#8217;s a lot of people claiming that this is true (<a href="http://elliottback.com/wp/php-performance-echo-print/trackback/" target="_blank">1</a>, <a href="http://reinholdweber.com/?p=3" target="_blank">2</a>, <a href="http://www.chazzuka.com/blog/wp-trackback.php?p=163" target="_blank">3</a>, <a href="http://hmvrulz.wordpress.com/2008/09/23/20-php-optimization-tips-make-it-faster/" target="_blank">4</a> and <a title="Search result for 'echo is faster than print'" href="http://www.google.no/search?q=echo+is+faster+than+print" target="_blank">a lot of others</a>). Now to be clear, they are somewhat right about this. Echo <em>is</em> really faster than print, which it really should be due to the fact that echo doesn&#8217;t return anything while print does. The problem, and my point here, is how big this difference really is. The answer is that it&#8217;s so tiny that you can forget about actually measuring it in a real-world application. Here&#8217;s why:<span id="more-89"></span><br />
First off, the code I&#8217;ve been using is</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">&lt;?php</span>
<span style="color: #000088;">$res</span> <span style="color: #339933;">=</span> <span style="color: #990000;">array_fill</span><span style="color: #009900;">&#40;</span><span style="color: #cc66cc;">0</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">5</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">0</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #b1b100;">for</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$i</span> <span style="color: #339933;">=</span> <span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span> <span style="color: #000088;">$i</span> <span style="color: #339933;">&lt;</span> <span style="color: #cc66cc;">5</span><span style="color: #339933;">;</span> <span style="color: #000088;">$i</span><span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
    <span style="color: #990000;">ob_start</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #000088;">$start</span> <span style="color: #339933;">=</span> <span style="color: #990000;">microtime</span><span style="color: #009900;">&#40;</span><span style="color: #009900; font-weight: bold;">true</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #b1b100;">for</span> <span style="color: #009900;">&#40;</span><span style="color: #000088;">$j</span> <span style="color: #339933;">=</span> <span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span> <span style="color: #000088;">$j</span> <span style="color: #339933;">&lt;</span> <span style="color: #cc66cc;">1000000</span><span style="color: #339933;">;</span> <span style="color: #000088;">$j</span><span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span>
        <span style="color: #b1b100;">echo</span> <span style="color: #0000ff;">&quot;.&quot;</span><span style="color: #339933;">;</span>
    <span style="color: #000088;">$end</span> <span style="color: #339933;">=</span> <span style="color: #990000;">microtime</span><span style="color: #009900;">&#40;</span><span style="color: #009900; font-weight: bold;">true</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #000088;">$res</span><span style="color: #009900;">&#91;</span><span style="color: #000088;">$i</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$end</span><span style="color: #339933;">-</span><span style="color: #000088;">$start</span><span style="color: #339933;">;</span>
    <span style="color: #990000;">ob_end_clean</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
<span style="color: #b1b100;">for</span> <span style="color: #009900;">&#40;</span><span style="color: #000088;">$i</span> <span style="color: #339933;">=</span> <span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span> <span style="color: #000088;">$i</span> <span style="color: #339933;">&lt;</span> <span style="color: #cc66cc;">5</span><span style="color: #339933;">;</span> <span style="color: #000088;">$i</span><span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span>
    <span style="color: #b1b100;">echo</span> <span style="color: #000088;">$res</span><span style="color: #009900;">&#91;</span><span style="color: #000088;">$i</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">.</span><span style="color: #0000ff;">&quot;<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">?&gt;</span></pre></div></div>

<p>and</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">&lt;?php</span>
<span style="color: #000088;">$res</span> <span style="color: #339933;">=</span> <span style="color: #990000;">array_fill</span><span style="color: #009900;">&#40;</span><span style="color: #cc66cc;">0</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">5</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">0</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #b1b100;">for</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$i</span> <span style="color: #339933;">=</span> <span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span> <span style="color: #000088;">$i</span> <span style="color: #339933;">&lt;</span> <span style="color: #cc66cc;">5</span><span style="color: #339933;">;</span> <span style="color: #000088;">$i</span><span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
    <span style="color: #990000;">ob_start</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #000088;">$start</span> <span style="color: #339933;">=</span> <span style="color: #990000;">microtime</span><span style="color: #009900;">&#40;</span><span style="color: #009900; font-weight: bold;">true</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #b1b100;">for</span> <span style="color: #009900;">&#40;</span><span style="color: #000088;">$j</span> <span style="color: #339933;">=</span> <span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span> <span style="color: #000088;">$j</span> <span style="color: #339933;">&lt;</span> <span style="color: #cc66cc;">1000000</span><span style="color: #339933;">;</span> <span style="color: #000088;">$j</span><span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span>
        <span style="color: #b1b100;">print</span> <span style="color: #0000ff;">&quot;.&quot;</span><span style="color: #339933;">;</span>
    <span style="color: #000088;">$end</span> <span style="color: #339933;">=</span> <span style="color: #990000;">microtime</span><span style="color: #009900;">&#40;</span><span style="color: #009900; font-weight: bold;">true</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #000088;">$res</span><span style="color: #009900;">&#91;</span><span style="color: #000088;">$i</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$end</span><span style="color: #339933;">-</span><span style="color: #000088;">$start</span><span style="color: #339933;">;</span>
    <span style="color: #990000;">ob_end_clean</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
&nbsp;
<span style="color: #b1b100;">for</span> <span style="color: #009900;">&#40;</span><span style="color: #000088;">$i</span> <span style="color: #339933;">=</span> <span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span> <span style="color: #000088;">$i</span> <span style="color: #339933;">&lt;</span> <span style="color: #cc66cc;">5</span><span style="color: #339933;">;</span> <span style="color: #000088;">$i</span><span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span>
    <span style="color: #b1b100;">echo</span> <span style="color: #000088;">$res</span><span style="color: #009900;">&#91;</span><span style="color: #000088;">$i</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">.</span><span style="color: #0000ff;">&quot;<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">?&gt;</span></pre></div></div>

<p>As you can see, I&#8217;ve tried to make this as simple as possible. I&#8217;ve used output buffering to take the two huge variables, the browser and the network, out of the equation (yes, they really are a bottleneck here; trust me, or try comparing Firefox with Lynx). Testing was done on the 64-bit version of Fedora 9 (a Linux distro) with PHP 5.2.6 on an AMD Athlon 64 X2 4600 (a dual-core CPU running at its standard 2.4GHz).</p>
<p>Running this gives me an average of 0.333 seconds for echo and 0.346 seconds for print. Spread over the 1,000,000 calls in each loop, the average difference between a single echo and a single print is 13ns, or 0.000000013 seconds. Even though this is a 3.8% difference, it&#8217;s not even a millionth of a second, but 13 billionths! Hardly any difference if you ask me. I know, it&#8217;s just a test with a single, silly punctuation mark. To prevent people from going &#8220;There must be a bigger difference with a longer string!&#8221;, I ran the test with the first paragraph of lorem ipsum.</p>
<blockquote><p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Nunc luctus arcu vehicula est. Donec facilisis iaculis magna. Mauris neque dui, varius in, fermentum id, scelerisque et, massa. Nam ac velit nec odio molestie pellentesque. Nulla dolor mauris, tempus ultrices, cursus in, ultrices sollicitudin, orci. Sed at ligula. Sed id erat id nisl molestie tempus. Vestibulum nibh dolor, vulputate nec, dictum non, sollicitudin nec, nibh. Sed vitae diam eget felis dignissim tempor. Aenean vel risus. Integer consectetuer nibh. Ut eu nunc. Donec at sapien.</p>
</blockquote>
<p>This is 551 characters, which should satisfy most people. Results? 0.987 seconds for echo and 1.014 seconds for print, meaning echo is 2.6% faster. Again, hardly any difference, as this is 27ns or 0.000000027 seconds per call. Conclusion: Myth busted! You&#8217;ll have to do a ridiculous amount of outputting to make any actual difference.</p>
<p>PS: If you run this on your own computer, I would love to hear about your results (including type of CPU, OS and PHP version). Please bear in mind that you&#8217;ll have to disable any CPU throttling and run the two tests in separate files to get accurate results.</p>
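<p>For anyone who wants to redo the arithmetic, here is a minimal sketch of how the per-call figures follow from the loop averages quoted above. The function name <code>perCallDifferenceNs</code> is made up for this illustration and is not part of the test scripts.</p>

```php
<?php
// Sketch: turn two loop timings (in seconds) into a per-call difference.
// perCallDifferenceNs() is an illustrative name, not part of the original tests.

// Difference per call in nanoseconds, given the total time of $iterations calls.
function perCallDifferenceNs($slowSeconds, $fastSeconds, $iterations)
{
    return ($slowSeconds - $fastSeconds) / $iterations * 1e9;
}

// Averages quoted in the text, each measured over 1,000,000 calls.
printf("single dot:  %.0f ns per call\n", perCallDifferenceNs(0.346, 0.333, 1000000));
printf("lorem ipsum: %.0f ns per call\n", perCallDifferenceNs(1.014, 0.987, 1000000));
```

<p>Plugging in your own averages and loop count should give a comparable per-call figure for your machine.</p>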
]]></content:encoded>
			<wfw:commentRss>http://www.varslashlog.com/2008/11/05/busting-echo-is-faster-than-print/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Premature optimization is bad</title>
		<link>http://www.varslashlog.com/2008/11/01/premature-optimization-is-bad/</link>
		<comments>http://www.varslashlog.com/2008/11/01/premature-optimization-is-bad/#comments</comments>
		<pubDate>Sat, 01 Nov 2008 00:45:15 +0000</pubDate>
		<dc:creator>AHSauge</dc:creator>
				<category><![CDATA[Optimization]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[premature optimization]]></category>

		<guid isPermaLink="false">http://www.varslashlog.com/?p=76</guid>
		<description><![CDATA[Donald Knuth once wrote that &#8220;premature optimization is the root of all evil&#8221;. I have to agree with him on that. One thing I&#8217;ve been dying to write about is the massive amount of crappy PHP optimization tips found on the net. Two examples are Reinhold Weber&#8217;s 40 Tips for optimizing your php [...]]]></description>
			<content:encoded><![CDATA[<p>Donald Knuth once wrote that &#8220;premature optimization is the root of all evil&#8221;. I have to agree with him on that. One thing I&#8217;ve been dying to write about is the massive amount of crappy PHP optimization tips found on the net. Two examples are Reinhold Weber&#8217;s <a title="40 Tips for optimizing your php code" href="http://reinholdweber.com/?p=3" target="_blank">40 Tips for optimizing your php code</a> and chazzuka&#8217;s <a href="http://www.chazzuka.com/blog/?p=163" target="_blank">63+ best practice to optimize PHP code performances</a>. They <em>do</em> both have some good tips here and there, but in general these are just stupid tips, or more precisely, premature optimization. There are reasons why such tips are bad.<span id="more-76"></span></p>
<p>First of all, most of these tips are undocumented and presented as obvious facts. This makes them hard to disprove or confirm, as the reality is a bit different and more complicated than what these people tend to think. A good example of such a claim is &#8220;a foreach-loop is faster than a for-loop&#8221; or vice versa. I&#8217;ve tested this a couple of times, and though I get a consistent result in favor of the for-loop, just changing the platform from Linux to Windows seems to reverse the result. This displays a major problem. Even the slightest difference in OS, software version, hardware etc. can change the results significantly. This is never mentioned by people presenting optimization &#8220;tips&#8221;.</p>
<p>Second, even when the result is correct, it&#8217;s often pointless to actually make the change because you&#8217;ll never save a considerable amount of time. Let&#8217;s look at the first point in both my links.</p>
<blockquote><p>If a method can be static, declare it static. Speed improvement is by a factor of 4.</p></blockquote>
<p>Sounds good, doesn&#8217;t it? Speed improvements by a factor of 4! That must be significant, right? No, not necessarily. It can very well be an improvement by a factor of 4 (even though my testing seems to show about equal speed), but that&#8217;s only as significant as the time it initially takes. If one call takes, say, 1µs, saving 0.75µs isn&#8217;t that significant (µs = microsecond = 10^-6 s = 0.000001 s), considering the function itself might be using 100µs on each execution. This is very often the case when these people present their &#8220;tips&#8221;, if they&#8217;re actually correct about which is the faster option at all.</p>
<p>Third, with the previous stuff in mind: even if the claim is correct and even if there is some performance to be gained, is it worth it? Not necessarily. There are several things to keep in mind here. Primarily, optimization degrades readability, and there&#8217;s also the issue that someone actually has to make these changes. Is it really worth it if you have to spend an hour changing a lot of code just to have a cumulative saving of about 1ms? I don&#8217;t think so &#8230;</p>
<p>The final nail in the coffin for these &#8220;tips&#8221; is the fact that they&#8217;re never ever tested under any load. Load is important, as that&#8217;s the reality. It doesn&#8217;t matter if solution A is x times faster than solution B, saving y seconds, if solution A hits a bottleneck when it&#8217;s run in 10 parallel requests, making it much slower than solution B under the same conditions. Here&#8217;s where real optimization comes in handy. Caching stuff, using a PHP accelerator, having a well-thought-out algorithm, actually knowing where the performance hit is and so on is always going to beat these types of lists hands down. That&#8217;s the reality of optimization, not whether you use single or double quotes.</p>
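<p>To put a number on that, here is a small sketch using the made-up figures from the paragraph above: a 1µs call, a 100µs function body, and a call that becomes 4 times faster. The function name <code>overallSavingPercent</code> is purely illustrative.</p>

```php
<?php
// Sketch: how much of the total time a faster call actually saves.
// Assumed numbers from the text: 1 µs per call, 100 µs function body,
// and the call itself sped up by a factor of 4.
// overallSavingPercent() is an illustrative name.

function overallSavingPercent($callMicros, $bodyMicros, $speedupFactor)
{
    $before = $callMicros + $bodyMicros;                  // 101 µs in total
    $after  = $callMicros / $speedupFactor + $bodyMicros; // 100.25 µs in total
    return ($before - $after) / $before * 100;
}

printf("%.2f%% saved per call\n", overallSavingPercent(1, 100, 4)); // 0.74% saved per call
```

<p>A &#8220;factor of 4&#8221; on the call shrinks to well under one percent of the whole, which is the point: the speedup only matters relative to everything else the code does.</p>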
]]></content:encoded>
			<wfw:commentRss>http://www.varslashlog.com/2008/11/01/premature-optimization-is-bad/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>PHP gets a new namespace separator: \</title>
		<link>http://www.varslashlog.com/2008/10/27/php-gets-a-new-namespace-separator/</link>
		<comments>http://www.varslashlog.com/2008/10/27/php-gets-a-new-namespace-separator/#comments</comments>
		<pubDate>Mon, 27 Oct 2008 23:20:24 +0000</pubDate>
		<dc:creator>AHSauge</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[backslash]]></category>
		<category><![CDATA[namespace]]></category>
		<category><![CDATA[separator]]></category>

		<guid isPermaLink="false">http://www.varslashlog.com/?p=22</guid>
		<description><![CDATA[No, I&#8217;m not joking. As of last Saturday, the development team of PHP decided that backslash will be the new namespace separator in PHP, replacing :: which has been used up until now. The reason for this is quite alarming to be honest. A bug.
Because PHP basically fails to distinguish between for instance foo::bar(); as [...]]]></description>
			<content:encoded><![CDATA[<p>No, I&#8217;m not joking. As of last Saturday, <a href="http://news.php.net/php.internals/41374" target="_blank">the development team of PHP decided</a> that backslash will be the new namespace separator in PHP, replacing :: which has been used up until now. The reason for this is quite alarming to be honest. <em>A bug</em>.<span id="more-22"></span></p>
<p>Because PHP basically fails to distinguish between, for instance, foo::bar(); as in namespace foo with function bar and class foo with static function bar, they&#8217;ve just decided to scrap the whole :: thing and use backslash instead. I can&#8217;t tell you how stupid I think this really is. Instead of actually fixing the problem, the faulty look-up, they&#8217;ve taken the very easy way out. This just adds to the inconsistencies of PHP. There are several programming languages that I know of that don&#8217;t have these problems, and none of them separate namespaces in this way. Actually, I can&#8217;t even remember ever seeing any language use backslash for anything other than escaping, but that might just have something to do with other languages being <em>consistent</em>. This quote from a <a title="Comment on slashdot about the new namespace separator" href="http://developers.slashdot.org/comments.pl?sid=1008291&amp;cid=25522773" target="_blank">comment on slashdot</a> really says it all:</p>
<blockquote><p>Java:<br />
Attribute/Method access: foo.bar<br />
Static method access:    Foo.bar<br />
Package access:          foo.bar.baz</p>
<p>C#:<br />
Attribute/Method access: foo.bar<br />
Static method access:    Foo.bar<br />
Namespace access:        foo.bar.baz</p>
<p>Python:<br />
Attribute/Method access: foo.bar<br />
Static method access:    Foo.bar<br />
Module access:           foo.bar.baz</p>
<p>PHP:<br />
Attribute/Method access: $foo-&gt;bar<br />
Static method access:    Foo::bar<br />
Namespace access:        foo\bar\baz</p></blockquote>
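<p>To make the clash concrete, here is a minimal sketch in the new syntax, assuming PHP 5.3 or later. It reuses the names foo and bar from the example above; the return strings are made up for illustration. With :: as the namespace separator, both calls at the bottom would have been written foo::bar().</p>

```php
<?php
// Sketch of the ambiguity: a namespace "foo" with a function bar(),
// next to a global class "foo" with a static method bar().
// Names follow the foo/bar example in the text; return values are illustrative.

namespace foo {
    function bar()
    {
        return 'namespace function';
    }
}

namespace {
    class foo
    {
        public static function bar()
        {
            return 'static method';
        }
    }

    echo \foo\bar(), "\n"; // function bar in namespace foo
    echo foo::bar(), "\n"; // static method bar of class foo
}
```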
<p>Now I&#8217;m not quite done here. If you look at the <a title="Request for Comments: Namespace Separators" href="http://wiki.php.net/rfc/namespaceseparator" target="_blank">RFC at PHP about the change</a>, they list &#8220;IDE compatibility&#8221; as a criterion for the new namespace separator. This is just plain stupid. Who are you designing a language for, humans or IDEs? I would say it&#8217;s for humans, but it seems the PHP devs think otherwise &#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.varslashlog.com/2008/10/27/php-gets-a-new-namespace-separator/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Why PHP and Unicode/UTF-8 is a bad combination</title>
		<link>http://www.varslashlog.com/2008/10/27/why-php-and-unicode-is-a-bad-combination/</link>
		<comments>http://www.varslashlog.com/2008/10/27/why-php-and-unicode-is-a-bad-combination/#comments</comments>
		<pubDate>Mon, 27 Oct 2008 00:09:46 +0000</pubDate>
		<dc:creator>AHSauge</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[bugs]]></category>
		<category><![CDATA[mbstring]]></category>
		<category><![CDATA[Unicode]]></category>
		<category><![CDATA[UTF-8]]></category>

		<guid isPermaLink="false">http://www.varslashlog.com/?p=6</guid>
		<description><![CDATA[One thing that really annoys me these days is the unrestricted enthusiasm for PHP and Unicode, primarily in the form of UTF-8. These people seem to think that Unicode is some fantastic thing they just have to use even though a single-byte charset like ISO-8859-1 is more than sufficient for their needs. Don&#8217;t get [...]]]></description>
			<content:encoded><![CDATA[<p>One thing that really annoys me these days is the unrestricted enthusiasm for PHP and Unicode, primarily in the form of UTF-8. These people seem to think that Unicode is some fantastic thing they just have to use even though a single-byte charset like ISO-8859-1 is more than sufficient for their needs. Don&#8217;t get me wrong, Unicode is good and everything, but not when you&#8217;re using PHP. Why? Because PHP by itself does <em>not</em> support Unicode. Now I know a lot of people will object and say that there is mbstring. If you&#8217;re one of them, please read this carefully: mbstring is a Swiss cheese. I also know a lot of people are not using mbstring and are basically using Unicode (most likely UTF-8) blissfully unaware of its dangers. So before I get into why mbstring is a horrible piece of coding, I&#8217;d like to explain why you shouldn&#8217;t use Unicode without taking some measures.<span id="more-6"></span></p>
<p>First of all, you shouldn&#8217;t be using a foreign charset or encoding without knowing its risks. Unicode requires a multibyte encoding, which has its own set of problems compared to single-byte charsets like ISO-8859-1 and similar. As I&#8217;m only really familiar with UTF-8, I&#8217;m going to use that as an example. I could probably write a book about the problems, but here&#8217;s a small list:</p>
<ul>
<li>BOM (U+FEFF) and &#8220;reverse&#8221;-BOM (U+FFFE). This is not really a problem in UTF-8 as it doesn&#8217;t have the issues with little and big endian (<a title="Wikipedia article explaining endianness" href="http://en.wikipedia.org/wiki/Endian" target="_blank">Wikipedia on endianness</a>), but if you convert to for instance UTF-16, having a &#8220;reverse&#8221;-BOM at the start of your document would be a disaster.</li>
<li>Str-functions aren&#8217;t multibyte aware and might &#8220;break&#8221; your strings.</li>
<li>Non-shortest form. There&#8217;s a bit of math behind this one, which I&#8217;m not going to explain today, so you&#8217;ll just have to trust me. A non-shortest form is an illegal representation of a Unicode character: it is encoded using more bytes than necessary. In UTF-8 this means that &#39;, which in hex terms is 0x27, could be represented by 0xC0A7. If those who developed, for example, the RDBMS you&#8217;re using are just as unaware of this as a lot of people out there, they will decode or recode this to UTF-16, UCS-2 or UCS-4 and get a single quote ( &#39; ), which might just result in an SQL injection &#8230;</li>
<li>Surrogates. In UTF-8, surrogates (U+D800-U+DFFF) are not allowed, and are actually regarded as a potential security risk. Allowing these will result in a � or a similar &#8220;this is an illegal byte&#8221;-character being displayed.</li>
<li>Illegal combinations of bytes, resulting in a � or a similar &#8220;this is an illegal byte&#8221;-character being displayed.</li>
</ul>
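<p>The non-shortest form point in the list above can be shown with a few lines of bit arithmetic. This is only a sketch for 2-byte sequences, with made-up function names; a real decoder has to handle sequences of 1 to 4 bytes and validate the lead and continuation bytes as well.</p>

```php
<?php
// Sketch of why a non-shortest (overlong) form is dangerous.
// Function names are illustrative; only 2-byte sequences are handled.

// Decode a 2-byte UTF-8 sequence WITHOUT checking for overlong forms.
function naiveDecodeTwoBytes($bytes)
{
    $lead  = ord($bytes[0]) & 0x1F; // payload bits of 110xxxxx
    $trail = ord($bytes[1]) & 0x3F; // payload bits of 10xxxxxx
    return ($lead << 6) | $trail;
}

// A 2-byte sequence is overlong when its code point would fit in a single
// byte (below U+0080). That only happens with the lead bytes 0xC0 and 0xC1.
function isOverlongTwoBytes($bytes)
{
    return naiveDecodeTwoBytes($bytes) < 0x80;
}

$sneaky = "\xC0\xA7";
printf("decoded: U+%04X\n", naiveDecodeTwoBytes($sneaky)); // decoded: U+0027
var_dump(isOverlongTwoBytes($sneaky));     // bool(true)
var_dump(isOverlongTwoBytes("\xC3\xA9"));  // bool(false), this is a regular U+00E9
```

<p>A decoder that skips the second check happily turns 0xC0A7 into a single quote, after any byte-level escaping has already looked at the input and found nothing to escape.</p>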
<p>In addition to those above, there is also the possibility of security problems when you&#8217;re actually escaping input with functions not aware of multibyte charsets (<a href="http://shiflett.org/blog/2005/dec/google-xss-example">here&#8217;s an example</a>). With all this in mind I can tell you why mbstring is such a bad piece of coding.</p>
<p>First of all, it doesn&#8217;t have a complete replacement for every str-function. We&#8217;re missing *sort, *trim, strcasecmp, str_ireplace, ucfirst, ucwords and wordwrap, just to mention some. The second thing is that mbstring seems to have been written by someone Japanese, which is quite obvious considering strcasecmp is missing. &#8230; and no, combining mb_strtolower and strcasecmp is not a valid workaround, as a proper case-insensitive comparison requires either simple or full case folding, which differs from the &#8220;to lower&#8221; process. Anyway, mbstring was written more or less to enable Japanese, Chinese and similar languages to work. The thing is, these languages don&#8217;t have letter case the way most other languages do, which explains the lack of the above.</p>
<p>Third, there are some bugs in mbstring which result in invalid validation. Here are some examples:</p>
<ul>
<li>Anything behind a null-byte isn&#8217;t validated</li>
<li>Surrogates are allowed</li>
<li>&#8220;Reverse&#8221;-BOM is allowed</li>
<li>Title case isn&#8217;t working</li>
</ul>
<p>Fourth and finally, the overloading abilities make the str-functions behave differently from the original str-functions. These are undocumented differences, possibly breaking applications expecting the original str-functionality. For instance, if the second parameter of strrchr is an int, it is converted to a character. In mb_strrchr, however, this will give an error.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.varslashlog.com/2008/10/27/why-php-and-unicode-is-a-bad-combination/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
