- It's easy to learn
- It's very efficient to code for
- It's widely available
- It has a large amount of useful built-in functionality
- It has a huge ecosystem of available libraries
The criticism PHP receives generally falls into 5 areas:
- Criticism of mistakes long-corrected (e.g. safe mode, register_globals, and mysql_real_escape_string)
- Messiness in the language (e.g. inconsistent function naming)
- Genuine problems with the language and its standard library
- Real-world inconsistencies when deploying on different servers
- General pedantry from Java-lovers
In this blog post I will bring up some suggestions for fixing all the major problems in 3 & 4. I've never been interested in language pedantry, and if someone gets confused by inconsistency then a good IDE will help them well enough.
Security
PHP has had a bad reputation for security because it generally leaves things to programmers to get right for every individual case in their code, while also encouraging newbie coders to dive right in. The unfortunate result is a lot of very bad code.
It may well be that we can never get newbie coders to write perfectly secure code, but we can add some advanced mechanisms that help experienced coders, or help frameworks better assist the newbies.
String tainting
One thing we implemented for our own custom PHP fork many years ago, to great success, is a very simple string tainting system. Strings start off as "not escaped", and then we can mark them escaped in certain core functions within the PHP library (e.g. htmlentities), or within our own functions.
Then when we output those strings, we emit an error if the string is not marked as escaped.
This is something we only do during development, where performance doesn't matter. There's no need for constant ongoing scanning.
Of course it's a little more complex than I've described it. We need to propagate the marking correctly across concatenations (the result string is only escaped if both concatenated strings are escaped). And, it's never a perfect system – there are still obscure corner cases where insecurities could leak through. And, there are various different escaping contexts that we should not conflate. But, perfection is the enemy of good, and this is a good system that has proved itself for us.
PHP could build on the idea with a more sophisticated system using bitmasks.
Imagine being able to mark a string as escaped for a number of potential different situations, and then being able to run tests to ensure any particular taint was present.
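As a rough illustration of the bitmask idea, here is a minimal userland sketch. It is not our fork's implementation, and it cannot hook into the engine the way a real tainting system would. The class name MarkedString and the ESCAPED_* constants are made up for this example.

```php
<?php
// Hypothetical escaping contexts, one bit each.
const ESCAPED_HTML = 1;
const ESCAPED_SQL  = 2;
const ESCAPED_URL  = 4;

// A string paired with a bitmask of the contexts it has been escaped for.
final class MarkedString
{
    private string $value;
    private int $mask;

    public function __construct(string $value, int $mask = 0)
    {
        $this->value = $value;
        $this->mask = $mask;
    }

    // Escape raw text for HTML and record that fact in the mask.
    public static function html(string $raw): self
    {
        return new self(htmlspecialchars($raw, ENT_QUOTES, 'UTF-8'), ESCAPED_HTML);
    }

    // Concatenation keeps only the marks that both operands share.
    public function concat(MarkedString $other): self
    {
        return new self($this->value . $other->value, $this->mask & $other->mask);
    }

    public function isEscapedFor(int $context): bool
    {
        return ($this->mask & $context) === $context;
    }

    // Return the string for output, refusing if the required mark is missing.
    public function emit(int $context): string
    {
        if (!$this->isEscapedFor($context)) {
            throw new RuntimeException('String is not marked as escaped for this context');
        }
        return $this->value;
    }
}
```

The bitwise AND in concat() is what enforces the "only escaped if both sides are escaped" rule, separately for each context.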
One use case would be for SQL queries, already covered in this RFC:
Even stricter typing
I contributed to the argument about strict typing for PHP7, which we won (after much drama). However, we can do better.
Doing consistent strict typing is a good way to spot unexpected bugs in your code, including security bugs.
A great improvement would be for PHP to have a register_type_coercion function. We'd pass it a callback that would be called for any instance of type coercion in the engine, including with the use of operators. We could then do cool things inside that function.
For example, a senior engineer doing a security audit may register such a callback and log the back-traces for them all, so that each case can be corrected, or at least they could be scanned for bugs (manually, or via some kind of lint tool that spots known problematic patterns).
Safer HTTP headers
The PHP header function should be able to take an array as its first parameter.
This stops potential "header split" vulnerabilities by abstracting away the notion that programmers should code direct to the HTTP syntax.
As a general rule, PHP should always encourage a clean abstraction around syntaxes, to prevent vulnerabilities when translating between languages. Even if modern PHP avoids header split vulnerabilities by some other way (I didn't check), it should maintain abstractions as a basic principle so that people think about things in a better way.
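A minimal userland sketch of what an array-based abstraction might look like follows. build_header_lines() is a hypothetical helper name, not an existing PHP function, and the validation rules are my own assumption of what "safe" should mean here.

```php
<?php
// Turn a name => value map into header lines, rejecting anything that could
// smuggle in an extra header. build_header_lines() is a hypothetical helper.
function build_header_lines(array $headers): array
{
    $lines = [];
    foreach ($headers as $name => $value) {
        $name = (string) $name;
        $value = (string) $value;
        // Header names are restricted to HTTP "token" characters.
        if (!preg_match('/^[A-Za-z0-9!#$%&\'*+.^_`|~-]+$/D', $name)) {
            throw new InvalidArgumentException("Invalid header name: $name");
        }
        // CR, LF and NUL are what make header splitting possible.
        if (preg_match('/[\r\n\0]/', $value)) {
            throw new InvalidArgumentException("Invalid value for header $name");
        }
        $lines[] = "$name: $value";
    }
    return $lines;
}

// Usage: foreach (build_header_lines([...]) as $line) { header($line); }
```

The caller only ever deals with names and values, never with the raw "Name: value" syntax.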
Saner cryptography
The PHP crypto is frankly a mess.
Let's decide for each of the following use cases, what the canonical way should be:
- Generating random strings
- Generating random numbers
- Checking passwords against hashed copies
- Generating secure hashes
- Generating checksums (possibly recommend to consider using hashes instead though)
For each canonical way, we would:
- Support it regardless of build settings
- Support it regardless of platform
- Support it in a compatibility library, for all non-EOL versions of PHP
- Make the API as easy as is possible, while maintaining security
All non-canonical functions would consistently be redirected to the canonical ones (crypt function, openssl functions, sha1, md5, rand, mt_rand, uniqid, …).
Maybe everything should be moved into a new crypto extension (non-optional of course), just to clearly delineate this stuff.
Due to the proliferation of different ways of doing things over the years, functions should be put on a deprecation track ASAP.
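For reference, here is a sketch of functions that PHP 7.0 and later already ship and that could plausibly fill the canonical role for each use case listed above. Which ones should be canonical is exactly the decision being asked for, so treat this as one possible mapping and not a recommendation from the PHP project.

```php
<?php
// Random string: 16 random bytes, hex-encoded to 32 characters.
$token = bin2hex(random_bytes(16));

// Random number: uniformly distributed integer in an inclusive range.
$roll = random_int(1, 6);

// Password storage and checking.
$stored = password_hash('correct horse', PASSWORD_DEFAULT);
$ok = password_verify('correct horse', $stored);

// Secure hash, plus a timing-safe comparison of two hashes.
$digest = hash('sha256', 'some data');
$same = hash_equals($digest, hash('sha256', 'some data'));

// Checksum for detecting accidental corruption only (not for security).
$checksum = hash('crc32b', 'some data');
```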
basename
Allow the suffix parameter to be passed simply as boolean true, so that the file extension(s) are stripped off.
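Until something like that exists, the closest built-in is pathinfo. A small sketch, with basename_without_extension() as a hypothetical helper name. Note that this strips only the final extension, which is an assumption about how "extension(s)" should be read.

```php
<?php
// Hypothetical helper: basename() with the final extension stripped.
function basename_without_extension(string $path): string
{
    return pathinfo($path, PATHINFO_FILENAME);
}
```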
strpos
There are 2 major problems with the core strpos function:
a) The needle is essentially a delimiter, but unlike in most other functions, it comes in parameter position #2 rather than position #1.
b) It is very easy to confuse "substring not present" with "substring exists at offset 0". You need to use !==false to do it right.
I propose an incredibly simple solution to both these problems: adding a new function, in_str.
boolean in_str($delimiter, $string)
This serves the more common use case for strpos in a way totally consistent with functions such as in_array.
One improvement to make to strpos though – add a new $end_pos = false parameter. If set to true, it adds the length of the delimiter string to the returned value. This is a common thing to want, often saving a line of code or copying and pasting the delimiter into a strlen call.
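Both ideas are easy to sketch in userland. in_str() follows the signature proposed above. strpos_end() is a hypothetical stand-in for strpos with $end_pos set to true, since a userland function cannot add a parameter to the built-in.

```php
<?php
// Userland sketch of the proposed in_str(): needle first, boolean result.
function in_str(string $needle, string $haystack): bool
{
    return strpos($haystack, $needle) !== false;
}

// Hypothetical stand-in for strpos() with $end_pos = true: returns the offset
// just past the match, or false if the needle is absent.
function strpos_end(string $haystack, string $needle, int $offset = 0)
{
    $pos = strpos($haystack, $needle, $offset);
    return $pos === false ? false : $pos + strlen($needle);
}
```

PHP 8.0 added str_contains(), which covers the same use case as in_str() but keeps the haystack-first parameter order.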
tempnam
This function should not need to be given parameters. The average programmer just wants the system temp dir to be used, and has no interest in any prefix. Why make the programmer think about these things?
What's more, it is not uncommon for webhosts to not allow writing to the defined system temp dir. The open-ended nature of this standard PHP function gives them an excuse – they can just say, "yeah, you need to pick your own temp dir". Let's have things sane and standard by default so that we don't leave room for interpretation.
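The parameter-free behaviour is a one-line wrapper today, assuming the host's system temp dir is actually writable. temp_file() is a hypothetical name.

```php
<?php
// Hypothetical wrapper: create a temp file with no arguments needed.
function temp_file(): string
{
    $path = tempnam(sys_get_temp_dir(), '');
    if ($path === false) {
        throw new RuntimeException('Could not create a temporary file');
    }
    return $path;
}
```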
ini_get
We should guarantee that boolean ini settings will return from ini_get as either 0 or 1, never blank.
Right now it's not sane for a programmer to check the status of settings.
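As a stop-gap, the normalisation can be sketched in userland. The helper names ini_bool_value() and ini_get_bool() are made up for this example, and the sketch assumes the filter extension is available.

```php
<?php
// Normalise the value ini_get() returns for a boolean setting to 0 or 1.
function ini_bool_value($raw): int
{
    if ($raw === false || $raw === null) {
        return 0; // setting does not exist
    }
    // FILTER_VALIDATE_BOOLEAN accepts "1", "true", "on", "yes" (any case) as true.
    return filter_var($raw, FILTER_VALIDATE_BOOLEAN) ? 1 : 0;
}

function ini_get_bool(string $name): int
{
    return ini_bool_value(ini_get($name));
}
```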
error_reporting
Right now this function is plain confusing when trying to read the current error reporting status. I suggest being able to do:
file_get_contents
This function should respect locking. It currently does not. Locking should be considered a basic feature that just works, not a luxury. Multiple people use websites at the same time, you know.
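A sketch of what a lock-respecting read looks like when done by hand. file_get_contents_locked() is a hypothetical helper. It relies on flock, which is advisory, so it only helps if writers also take a lock (for example file_put_contents with the LOCK_EX flag).

```php
<?php
// Hypothetical helper: read a whole file while holding a shared lock.
function file_get_contents_locked(string $path)
{
    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        return false;
    }
    try {
        // LOCK_SH blocks while another process holds an exclusive lock.
        if (!flock($handle, LOCK_SH)) {
            return false;
        }
        $contents = stream_get_contents($handle);
        flock($handle, LOCK_UN);
        return $contents;
    } finally {
        fclose($handle);
    }
}
```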
explode
Allow explode to produce array() if the exploded string is empty. Currently it produces array('').
We would add a new flags parameter to handle it:
$parts = explode(',', '', null, EXPLODE_TO_EMPTY); // Produces array()
It might seem a simple thing the ternary syntax could handle, but it cleans up an ugly repetitive coding pattern.
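The repetitive pattern in question, wrapped up as a hypothetical userland helper (explode_to_empty() is not an existing function):

```php
<?php
// Hypothetical helper with the proposed behaviour: '' gives [] rather than [''].
function explode_to_empty(string $separator, string $string): array
{
    return $string === '' ? [] : explode($separator, $string);
}
```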
umask
umask is an important function to make sure that system-written files are deletable by account users on a non-suEXEC server (maybe not the best approach, but users panic when they can't delete files).
However, we simply cannot use it because it is not thread-safe.
I suggest that for threaded PHP builds we simply reimplement it within the PHP filesystem implementation.
Right now the programmer needs to chmod manually throughout their code.
Simplify the function namespace
As PHP grows, so does the function namespace. It's reached an unmanageable level.
People argue that there should not even be a function namespace, and it should all be object-orientated. You can agree or not. For what it's worth, I love the quick simplicity of global functions. However, my preference is irrelevant – it's not realistic for it to be changed at this point.
So, we should curate what we have better.
Let's start by dropping some unnecessary function aliases: show_source, doubleval, ini_alter, get_required_files, user_error, chop, diskfreespace, is_double, is_long, is_real, join, key_exists, strchr, pos, sizeof
There's really no good reason to have these apart from compatibility, but we could draw a line in the sand and just ship a compat library as a stop-gap, for those who need them added back in.
Then there are some functions that are just plain excessive nowadays: str_rot13, get_browser, date_sun*, most of the Calendar functions
Let's remove them. If people want them, they can use a library.
Really obscure extensions should be kicked out of the main PHP manual, or at least buried. It just pollutes the mind-space of programmers to have things like the following: OpenAL, Radius, Cyrus, Trader, PS, Judy, Lua.
I mean, WTF?
die vs exit
Currently die is an alias of exit. Given that these are language constructs rather than true functions, we can't implement 'die' in a compatibility library.
However, there's a particular point of confusion with die/exit, that we could resolve by splitting these constructs out.
The exit function takes either an exit status code, or a message to output. When debugging it is common to exit with the value of a variable, but if that variable is a number then you won't see it in the output, because it's used as the exit status code rather than a message. This is confusing and annoying. Therefore it would be good to make it so that the 'die' language construct always outputs the parameter given, while the 'exit' language construct behaves as it currently does.
Cross platform consistency
Function library
There are some functions that are not cross platform but I can't see any good reason for them not to be.
The first is getallheaders. Could we not just have that implemented for all the SAPIs? If there are too many old SAPIs to update, dump them – there is a lot of historic cruft to prune out there anyway.
The other is sys_getloadavg, which doesn't work on Windows. There's an obvious reason, but surely it would not be too hard to implement something that is equivalent enough for common use.
Line endings
It's important to be able to have consistent text across the web, considering that there are users providing text originating from different platforms.
Add a simple new fix_line_endings function:
fix_line_endings($str[, $convention = OS_UNIX]);
Also add a php.ini setting to allow this to automatically apply for request parameters.
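A userland sketch of the function, following the signature above. OS_UNIX comes from that signature. The other two constant names, and the choice to make each constant the line-ending string itself, are assumptions for this example.

```php
<?php
// Conventions named after the post's OS_UNIX; the other two are assumed names.
const OS_UNIX = "\n";
const OS_WINDOWS = "\r\n";
const OS_CLASSIC_MAC = "\r";

function fix_line_endings(string $str, string $convention = OS_UNIX): string
{
    // Match \r\n first so a Windows line ending is not counted twice.
    return preg_replace('/\r\n|\r|\n/', $convention, $str);
}
```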
Domain names
It should be trivial to get the domain name for the current PHP request. Right now it is conflated with the request port.
Yes, I know HTTP_HOST comes from the CGI spec, but let's move on please.
I suggest we have HTTP_HOST_DOMAIN, which is the equivalent of stripping any port number from HTTP_HOST.
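What HTTP_HOST_DOMAIN would contain can be sketched as a small helper. host_without_port() is a hypothetical name, and the handling of bracketed IPv6 literals is my own assumption about what "stripping any port number" should cover.

```php
<?php
// Hypothetical helper: strip a trailing ":port" from a Host header value.
function host_without_port(string $host): string
{
    // The port is the final colon plus digits; bracketed IPv6 literals such as
    // "[::1]" end in "]" and so are left alone unless a port follows.
    return preg_replace('/:\d+$/D', '', $host);
}

// Usage: $domain = host_without_port($_SERVER['HTTP_HOST'] ?? '');
```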
$_SERVER
On some IIS servers (I'm not sure if this is still the case), $_ENV is populated instead of $_SERVER. The SAPI layer should guarantee $_SERVER to be populated as expected.
Locale
Locale in PHP is currently a mess. It is not per-thread. Surely this can be fixed at the PHP layer somehow?
As a separate issue, app localisation using locales can potentially completely mess up basic number formatting for use across standard formats/interfaces/protocols/APIs. It should be possible for programmers to declare that locale is not used during type coercions and/or for use of strval. Instead, number_format would become the place where number locale was applied. This puts control back in the hands of those experienced programmers that choose to be careful.
Inconsistencies on web hosting
PHP is a very flexible environment, which in most cases is a bigger problem than it is actually useful.
It would be great if the PHP project could have a "base line" environment that a webhost meets. It could be done without too much trouble by having a github-hosted "setting checker" tool that people could contribute to. A host would be granted the right to use a trademarked "Quality PHP hosting" logo only if their hosting passed the benchmark (without configuration changes), and they continued to reassess it on a bi-yearly basis (declaring when they last did in the alt-text for their use of the logo).
The benchmark would check:
- The memory_limit setting is no lower than the standard default
- The host hasn't done something to hack in a fake internal memory limit (some hosts do unfortunately)
- The max_execution_time setting is no lower than the standard default
- Only certain white-listed functions may be included within the disable_functions setting (e.g. shell_exec and the like). The particular functions that we allow to be disabled would be highlighted within the PHP manual.
- post_max_size is no lower than the standard default
- upload_max_filesize is no lower than the standard default
- PHP sessions are configured and working
- The temp directory is writable
- PHP scripts, or shell processes, are not able to probe into directories of other users on shared servers
- The PHP mail function is working
- All the following $_SERVER variables work exactly as expected: SCRIPT_NAME, SCRIPT_FILENAME, REQUEST_URI, PHP_SELF, PATH_TRANSLATED
- php.ini will always be searched within the base directory of the web hosting
- The PHP version is not EOL
- Scripts can make HTTP calls both to self-hosted domain names, and external domain names, without getting blocked by any software firewall or routing problems
- None of the default PHP features that custom builds can disable are actually disabled (e.g. the ctype, filter, and XML extensions)
- mbstring.func_overload must not be set
- (also see my notes about Suhosin below)
This should go a long way to helping programmers writing Open Source PHP code.
We could consider having multiple base lines. For example, a second "pro" level would allow shell commands and SSH (or remote desktop) access.
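To give a feel for how small the core of such a "setting checker" could be, here is a sketch covering only the size-based settings from the list above. The function names and the BASELINE_MINIMUMS constant are hypothetical. The minimum values are the stock php.ini defaults for those three settings.

```php
<?php
// Stock defaults for the size-based settings in the benchmark list.
const BASELINE_MINIMUMS = [
    'memory_limit'        => '128M',
    'post_max_size'       => '8M',
    'upload_max_filesize' => '2M',
];

// Convert php.ini shorthand such as "128M" to bytes; "-1" means unlimited.
function ini_bytes(string $value): int
{
    $value = trim($value);
    if ($value === '-1') {
        return PHP_INT_MAX;
    }
    $number = (int) $value;
    switch (strtoupper(substr($value, -1))) {
        case 'G': return $number * 1024 ** 3;
        case 'M': return $number * 1024 ** 2;
        case 'K': return $number * 1024;
        default:  return $number;
    }
}

// Return a human-readable failure for every setting below its baseline.
function check_baseline(array $actual, array $minimums): array
{
    $failures = [];
    foreach ($minimums as $name => $minimum) {
        $value = $actual[$name] ?? '0';
        if (ini_bytes($value) < ini_bytes($minimum)) {
            $failures[] = "$name is $value, below the baseline of $minimum";
        }
    }
    return $failures;
}
```

On a live host the $actual array would be filled from ini_get() for each setting name. The other checks in the list (sessions, mail, outbound HTTP and so on) would need real probes and not just setting comparisons.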
Unicode
Enough with the problems around Unicode. We can move forward by just paving a few things over:
- Make either iconv or mbstring standard. Pick one, always include it. If necessary include it in the PHP source package in the same way that GD is.
- utf-8 is king, nothing else in Unicode matters. Prioritise easy conversion between utf-8 and ISO-8859-1 as core, nothing else unless the code already exists within PHP.
- Move utf8_decode and utf8_encode outside the XML extension into the string extension.
- Add a new function to fix up bad utf-8, to help programmers deal with bad data. Perform deletion of non-recognised code-points, while preserving the rest of the string. This will be incredibly handy when passing off data to other systems that may be very unforgiving with the dirty data PHP picks up.
It's not perfect, but we don't need perfect. We just need "good enough" and "consistent".
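The "fix up bad utf-8" function from the list above can be approximated today with nothing but PCRE. This is a sketch under my own assumptions: utf8_strip_invalid() is a hypothetical name, and "non-recognised" is taken to mean any byte that is not part of a well-formed UTF-8 sequence (including overlong forms and surrogates).

```php
<?php
// Hypothetical helper: drop bytes that are not part of a well-formed UTF-8
// sequence and keep everything else. Uses only PCRE, in byte (non-/u) mode.
function utf8_strip_invalid(string $str): string
{
    $pattern = '/(
          [\x00-\x7F]                        # ASCII
        | [\xC2-\xDF][\x80-\xBF]             # 2-byte sequences
        | \xE0[\xA0-\xBF][\x80-\xBF]         # 3-byte, no overlongs
        | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # 3-byte
        | \xED[\x80-\x9F][\x80-\xBF]         # 3-byte, no surrogates
        | \xF0[\x90-\xBF][\x80-\xBF]{2}      # 4-byte, no overlongs
        | [\xF1-\xF3][\x80-\xBF]{3}          # 4-byte
        | \xF4[\x80-\x8F][\x80-\xBF]{2}      # 4-byte, up to U+10FFFF
        ) | . /xs';
    // Valid sequences are captured and kept; any other single byte is dropped.
    return (string) preg_replace($pattern, '$1', $str);
}
```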
Exceptions
Allow a new "exceptions for everything" mode. If turned on, you don't need to ever check error functions, error values, or classic-style errors, just use try…catch.
Over time the error functions can then be removed, and we can take a parallel approach of having both exceptions and classic-style errors being exact equivalents of each other.
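Part of this can already be approximated with the well-known set_error_handler and ErrorException pattern, sketched below. It only covers classic-style errors that reach a user error handler, not fatal errors or functions that quietly return false, so it is not the full mode being proposed. read_config() is just an illustrative function.

```php
<?php
error_reporting(E_ALL);

// Convert classic-style errors (warnings, notices, ...) into ErrorException.
set_error_handler(function (int $severity, string $message, string $file, int $line): bool {
    if (!(error_reporting() & $severity)) {
        return false; // respect the current error_reporting level and @
    }
    throw new ErrorException($message, 0, $severity, $file, $line);
});

function read_config(string $path): string
{
    try {
        return file_get_contents($path); // a missing file now throws
    } catch (ErrorException $e) {
        return '';
    }
}
```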
max_input_vars
max_input_vars is the new Safe Mode or register_globals – a setting that will mangle the behaviour of your code from one host to another.
It's especially bad if you have forms that may grow based on the amount of user data (e.g. a menu editor).
I would think there is no good reason for the setting at all, except for laziness. I recognise it comes from the DoS vulnerability, but can't we just detect if that hash population is taking too long and trigger an error for that specific situation?
If it is too problematic, at least let's have a new function to allow us to re-parse from within authenticated parts of admin areas:
reparse_superglobals($max_input_vars = null);
This way the programmer can code-in when they know that no DoS vulnerability could occur (due to a particular part of the system being private). This would cover the majority of cases where max_input_vars is currently problematic, as front-end code is already going to be heavily paginated.
array_peek
We have array_push, array_pop, array_unshift, array_shift, and can do $array[], but there's no good way to read the last PHP array element.
Use the "end" function I hear you say? No, because it takes a reference so cannot be used with expressions. Plus it changes array iteration state which is an undesirable side-effect.
So we either implement array_peek, or similarly follow this RFC: https://wiki.php.net/rfc/array_key_first_last_index
I like array_peek because it's consistent with the existing naming, and only a single new function.
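A userland sketch of array_peek, assuming PHP 7.3 or later for array_key_last. Returning null for an empty array is my own choice for the example.

```php
<?php
// Userland sketch of array_peek(): the last element, or null for an empty array.
// Takes the array by value, so it works on expressions and leaves the caller's
// internal pointer alone. Requires PHP 7.3+ for array_key_last().
function array_peek(array $array)
{
    if ($array === []) {
        return null;
    }
    return $array[array_key_last($array)];
}
```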
GD
Why does PHP still support ancient versions of GD that don't support gifs or PNG alpha blending? Time to move on – have a minimum GD version, and simplify the PHP manual.
Suhosin
Most webhosts install Suhosin (or suPHP). It's time the PHP community makes a call on how to approach this. Either:
- Specifically recommend Suhosin, and include base line reasonable settings for the hosting benchmark
- Specifically discourage use of Suhosin, and reflect that in the hosting benchmark
- Merge Suhosin's functionality into mainline PHP
- Partially merge Suhosin's functionality, then discourage use of Suhosin in the hosting benchmark
Optional function parameters
It's beyond time we implemented named parameters:
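For illustration, this is how a call looks with the named-argument syntax available in PHP 8.0 and later. render_button() is a hypothetical function invented for this example.

```php
<?php
// Hypothetical function with several optional parameters.
function render_button(string $label, string $type = 'button', bool $disabled = false, string $class = ''): string
{
    return sprintf(
        '<button type="%s" class="%s"%s>%s</button>',
        $type,
        $class,
        $disabled ? ' disabled' : '',
        htmlspecialchars($label)
    );
}

// Skip straight to the optional parameter we care about (PHP 8.0+ syntax).
$html = render_button('Save', disabled: true);
```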
Perhaps the sticking point is many people thinking you should just restrict the number of parameters to a reasonable number.
I don't agree. I'd often rather do a one-line function call than 20 lines of code setting each individual parameter into an object instance, before calling a method. Help me write good terse code, rather than awful terse code. Because I'm sure as hell not going to do it the Java way.
PHP excels at quick and terse code, let's build on that.
Cleanup of PHP manual
How about a wiki-style approach to the comments section? There's some ancient stuff in there that is often no longer even valid, so let users edit or delete comments, while keeping a revision history. Maybe only for people who have a certain amount of karma. Obviously include a system for mass-reverting edits by particular IPs.
Finally (dreaming)
How about implementing server push for HTTP/2?
How about being able to set autoboxing classes for all primitive types, including overriding custom type coercion?