As we build up to the final article (see this thread for details), I’ll be asking questions in separate threads, in order to get feedback from the Sitepoint PHP community on some of the issues mentioned in the article PHP: a fractal of bad design. I’ll be doing one or two a day.
Today, I’d like to take on the “predictableness” of PHP as a programming language.
Eevee started off by saying this:
A language must be predictable. It’s a medium for expressing human ideas and having a computer execute them, so it’s critical that a human’s understanding of a program actually be correct.
No arguing with that. But, he added later that…
PHP is full of surprises: mysql_real_escape_string, E_ALL
As we all know, the mysql extension is deprecated. If you go to the manual looking for this function, you get in a nice big red box with this warning:
So, no argument there anymore. Not unless mysqli_real_escape_string() has the same issues. What would that be?
And how about E_ALL? What is wrong with this? In Eevee’s article, he mentions E_STRICT being included in 5.4. Was the missing strict warnings the only issue?
What else might make PHP unpredictable in your opinion? Is it unpredictable at all? Or, is it predictable enough? If you could wish for an improvement in PHP to make it more predictable, what would it be?
Maybe off topic regarding error_reporting(…). PHP 7 introduced a new declare(strict_types=1); declaration which is ONLY file wide. I think it would have been better if it also applied to included and/or required files.
Agreed. I sincerely believe we’ll be seeing the ability to enforce strictness extended over time, as it is a necessary path to head towards a functional JIT compiler.
Talking about strictness of types, I still wouldn’t call that or rather PHP’s dynamic typing (I like that term better than weak typing), an issue with predictability. There are clear rules for type juggling and coercion and using dynamic typing is maybe a tool that should be less used, but it also allows for some degree of flexibility. It is definitely something a PHP developer should be aware of. So, I wouldn’t call weak typing an issue with PHP’s predictability.
Thank you for asking. And especially for asking in separate threads, as, though I too value my time to enter a holywar thread, I’d be happy to answer a particular programming question.
See, MRES case is very exemplary, as it clearly demonstrates the fact that most of the blames for the language can be explained by a mere blamer’s own ignorance or cheating.
So, to clarify: there are no issues with mysql_real_escape_string, save for the improper use. Poor mysql ext has been deprecated not because of some essential drawbacks of this function, but out of laziness of the support team: they just didn’t want to support two extensions, leaving only new one. That is all.
So, neither mysqli counterpart suffers from any drawbacks. Save, again, for the improper use. But this blame have to be directed to the users and - I have to admit - to the manual, that is indeed promoting bad practices. But by no means these blames can be applied to the language itself.
Jumbling SQL and data together in the one statement when commands are available that can keep them separate.
You only ever need an escaping function when data can be misinterpreted as code. By not putting both in the same statement you do away with the need to escape the data at all.
My assumption is that mysql_real_escape_string continues to exist for the moment because many migrating from mysql_ to mysqli_ haven’t yet taken advantage of the ‘new’ features introduced in July 2004 that allow the data to be kept completely separate from the code.
Yes, I just thought myself that I should have posted the rules as well. So, here it goes:
This function have to be used to escape string literals only, to overcome the delusion shared by generations of PHP users: that this function magically makes whatever data “safe”.
Encoding have to be set through mysqli::set_charset(), to overcome the issue with peculiar encodings.
By no means the use of this function should be manually applied in the application code. Instead, it have to be used by some behind-the-scenes process only. Which will guarantee that all the proper formatting will be applied regardless of any possible human error.
Although this latter point means that the only justified usage of this function would be only as a part of placeholder processor, but it doesn’t affect the main point - there is nothing wrong with the function itself: it just does the job it is intended for.
As to mysql_escape_string I think he meant something different:
Warts like mysql_real_escape_string, even though it has the same arguments as the broken mysql_escape_string, just because it’s part of the MySQL C API.
These two functions have actually different argument signatures: mysql_escape_string vs mysql_real_escape_string. The former is “broken” in that it doesn’t accept the connection link and therefore cannot take the character set into account and lessens security in certain scenarios. Then it is true that mysql_escape_string is broken and the existence of the two functions can be confusing to programmers.
Historically, PHP has been a quick and dirty language and is full of such confusing traps. But it is also evolving rapidly in a good direction so now most of those bad ideas are now deprecated or removed. Even at the time the article was written mysql_escape_string was far on the way out and the reported problem was outdated. Currently, this is nothing but history.
But the article is right that the PHP page on SQL injection needs updating - at least there should be proper code examples of good practices instead of using the half-arsed sprintf!
The data should be known to be valid before a call to any escape function.
The purpose of an escape function is to convert anything in the data that has a special meaning in the code to a value that has the equivalent meaning for data for the particular output that you are giving the data to.
Escaping is therefore the very last step before outputting data jumbled with code as the type of escape to apply is dependent on the code you are jumbling it with.
It certainly doesn’t make data safe - you should already know that long before your data gets anywhere near an escape function.
The mysqli_real_escape_string function could also with being deprecated then removed as the mysqli_* extension has got the ability to use prepared statements which are a much safer way of preventing SQL Injection (once any user submitted data has been validated). I personally use prepared statements even if there’s no user submitted data involved, so that if I do change where some data comes from I don’t accidentally create a security hole
You are the one getting confused. I am arguing against the large number of people who think it is validation.
This function has NOTHING to do with validation as the data it receives should already BE valid. Its purpose is to escape valid data that would otherwise be misunderstood as a part of the code it is jumbled with.
I am attempting to show all those who mistakenly think that it is validation that they are WRONG.
Validate user input.
Escape outputs where the data will be jumbled with code.
As a side note, just going with that example, my personal preference (for a live site) is to have the MySQL user that is used by PHP to access a MySQL to just be able to access just the tables and/or fields required, so that in the event that something slips passed the validation and somehow gets passed prepared statements, any attempt at doing something like dropping or truncating tables wouldn’t work as the MySQL user won’t have the permission to do such an action
Personally I find this approach highly inconvenient and at the same time rather ineffective:
in general, all the tables have to be accessible, it’s hard to keep track of all the permissions.
at the same time it doesn’t prevent an injection but only can soften the effect
I prefer not to fight particular injection exploits, as it’s a sort of blacklist-like approach with all its drawbacks: you can overcome 100 kinds of exploits but suffer from 101th. Say, to me, injection that involved only SELECT query is no less disastrous than that notorious DROP TABLE case.
Instead of softening the effect of a successful injection, I prefer to prevent the injection itself.
This is something I also think is smart. If there is a hole, then only the data readable by the “restricted” user can be seen and thus, if you have much more sensitive data, it isn’t also compromised. An example of this would be having one database user for the “front-end” application connection and another user, for the back-end. You would have a lot more security to access the back-end (hopefully) and also a lot more trust in the users you allow into that area of the software. It isn’t hard to keep track of permissions and users either. This is along the lines of security through “need to know”. https://en.wikipedia.org/wiki/Need_to_know Whereas, this is more like “need to do”.