
Type-strictness in PHP7

Over the last week or so, the PHP community has been on fire with a big controversy: whether or not to allow strict-typing into the language. In the words of one of our developers this evening: "damn, things are getting heated".

The RFC is up for voting, and requires a 2/3 majority to pass. It has been achieving this for most of the last few days.

https://wiki.php.net/rfc/scalar_type_hints

The split is roughly between the old hands and most other developers.

I'm very much pro-strict-typing, and have tried to be a cheerleader for it.

For the last 7 years or so, I've developed on a custom fork of PHP with strict-typing enabled. It's not quite the same as the RFC, as it is controlled via an ini_set option, which we toggle on/off as we dive into and out of third-party code that doesn't support it, but largely it is the same concept.

This is coupled with our PHP static analyser (our "code quality checker"), which derives type information from phpdoc comments (including a commented version of the main PHP library) and enforces correctness across it all.

I feel it's time for us to make the case for strict typing. I don't have a lot of time to spend writing this up, but I'll do my best.

$security=$correctness && other_stuff(); $integrity=$correctness && more_other_stuff()

PHP is renowned as a language where sloppiness prevails. Ironically I consider myself a moderate, because I get annoyed by people who over-engineer just as much as I get annoyed by people who write messy code.

However, I just do not comprehend how a programmer can seriously think it is "okay" to have data types being converted around all over the place. When you write some code you are trying to meet a tightly-defined algorithm, and it doesn't take a super-pedantic theoretical mathematician to know that having sensible constraints on inputs and outputs is a part of any properly implemented software system. It's not enough for something to just "work" in normal cases, and it is masochistic to use a language where ensuring it works in all cases is a matter of very careful tracing, or of manually constraining all the possible side-effects of unconstrained input parameters.

Let's put it another way: non-strict PHP gives the programmer 2 choices:
  1. Pray to God
  2. Make the programmer as smart as God

And in the real world this means people are either going to be very irresponsible and go for '1', or probably incompetent and try for '2'.

When I make these arguments, I'm not being an extremist. There is no magic bullet. There is no will to impose a complex development process. There is just a programmer, who doesn't have much time, and a desire to get something done with minimal surprises. A language which brings incorrect assumptions to the surface rather than burying them is a safer language. It won't be a perfect language, it won't save the programmer from many mistakes they might make – but it's better.

Some people seem to think it is onerous to make sure that type conversions are explicit. I can say from experience it is not. If you look at ocPortal's source code you will not see type conversions going on in a particularly excessive way, yet it is 100% type-strict and has been for years.
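To give a feel for how light explicit conversion can be, here is a minimal sketch. The function describe_page and its parameters are hypothetical, invented for illustration, and are not taken from ocPortal's source. The conversion happens once, where the untrusted string enters, and everything after that works with known types.

```php
<?php
/**
 * Hypothetical helper, for illustration only.
 *
 * @param  array $params Raw request parameters (string values, as in $_GET)
 * @return string Summary line
 */
function describe_page(array $params): string
{
    // Explicit conversion at the boundary: request values arrive as strings
    $page = isset($params['page']) ? intval($params['page']) : 1;

    // From here on, everything is integer arithmetic
    $per_page = 20;
    $offset = ($page - 1) * $per_page;

    // Explicit conversion back to string when building output
    return 'Page ' . strval($page) . ', starting at row ' . strval($offset);
}

$summary = describe_page(['page' => '3']); // 'Page 3, starting at row 40'
```

That is two kinds of conversion in the whole function, both at the edges.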

Type-strictness gives a programmer guarantees, removes risk, and makes a code-base more intuitive (less magical). It removes cognitive burden, it doesn't add it. At least, it removes it from those somewhat-responsible programmers who think that just praying to God is not a reasonable approach.

Security

I'm not just going to make some vague suggestions that type-strict is more secure. I'm also going to repeat that there is no magic bullet. But it helps, and every little bit helps.

Let's consider this code, called with ?files[]=a.txt&files[]=b.txt:

PHP code

$operation_id=uniqid();
$backup_stub='backups/'.$operation_id;
mkdir($backup_stub,0777,true);

foreach ($_GET['files'] as $i => $file) {
    if (!preg_match('#^\w+\.txt$#',$file)) { // Don't allow processing non-conformant file names
        exit('Security error');
    }

    // Take backup
    copy('files/'.$file,$backup_stub.'/'.$i);

    // Delete file
    unlink('files/'.$file);
}


The hypothetical programmer making this tried to make it secure (assume that the user authorised to enter this script is allowed to manage files under 'files'). He/she tried to be as smart as God.

However, there is a huge security hole in the above code that type-strictness would have avoided. I copied all the files pegged for deletion into an operation-specific backup folder, based on their sequential order in the operation. Don't ask why I did it that way, assume there's some reason (it's hard to concoct good succinct examples when under pressure). $i is a numeric index. Except, no, it's not. PHP arrays and hash maps are the same thing, so I could call the script with ?files[../../config.php]=a.txt. The key $i is then a string, and the backup copy gets written to backups/<operation id>/../../config.php, a path outside the backup folder.

Whoops, the hacker just wiped your app config.

Bad assumptions about input parameters are a lot easier to make if PHP is doing hidden magical conversions for you.

(I am assuming the RFC does do type checking on operators, if not I'll put on a red face – but I could have come up with an example where the issue was in call arguments too)
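As a minimal sketch of one way to close the hole in the example, the request data can be checked for integer keys and string values before it is used. The helper name sanitise_file_list is hypothetical and not part of the original script, and the plain arrays below stand in for $_GET['files'].

```php
<?php
/**
 * Hypothetical helper, for illustration only: accept the files parameter
 * only if it has integer keys and conformant string values.
 *
 * @param  mixed $raw Untrusted value, e.g. $_GET['files']
 * @return array Validated file names, re-indexed from 0
 */
function sanitise_file_list($raw): array
{
    if (!is_array($raw)) {
        throw new InvalidArgumentException('files must be an array');
    }

    $clean = [];
    foreach ($raw as $key => $file) {
        // ?files[../../config.php]=a.txt arrives with a string key
        if (!is_int($key)) {
            throw new InvalidArgumentException('Unexpected non-integer key');
        }
        // ?files[0][]=a.txt would make $file an array rather than a string.
        // The D modifier stops $ from matching before a trailing newline.
        if (!is_string($file) || !preg_match('#^\w+\.txt$#D', $file)) {
            throw new InvalidArgumentException('Non-conformant file name');
        }
        $clean[] = $file;
    }

    return $clean;
}

// Simulated request data standing in for $_GET['files']
$safe = sanitise_file_list(['a.txt', 'b.txt']);

try {
    sanitise_file_list(['../../config.php' => 'a.txt']);
    $rejected = false;
} catch (InvalidArgumentException $e) {
    $rejected = true;
}
```

The point is not that this check is hard to write, but that nothing in non-strict PHP prompts you to write it.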

We've spotted real security holes in ocPortal through type-strictness that we might have instead missed.

Integrity

Let's say you have a function taking a few parameters, and for a good reason you add a new parameter in the middle somewhere. You fix all the calls, but then you merge in a different feature branch which was not updated. Of course, unit tests should find the problem, but we're not in a perfect world. Having strict typing will very commonly find mismatched parameters, because the types don't line up correctly.
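Here is a rough sketch of that situation, assuming the declare(strict_types=1) directive as it exists in released PHP 7. The function log_event and its parameters are hypothetical, made up for this example.

```php
<?php
declare(strict_types=1); // must be the first statement in the file

// Hypothetical function. It used to be
//     log_event(int $member_id, int $timestamp = 0)
// and the string $ip_address parameter was later added in the middle.
function log_event(int $member_id, string $ip_address, int $timestamp = 0): string
{
    return $member_id . '|' . $ip_address . '|' . $timestamp;
}

// Updated call: fine
$ok = log_event(5, '203.0.113.7', 1424000000);

// Stale call from an unmerged branch, still using the old argument order.
// In weak mode 1424000000 would quietly become the string "1424000000";
// in strict mode the call throws a TypeError instead.
try {
    log_event(5, 1424000000);
    $stale_call_caught = false;
} catch (TypeError $e) {
    $stale_call_caught = true;
}
```

Without the declare line the stale call would run to completion and log a timestamp as an IP address.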

Workflow

Being able to scan your code for type-consistency before you run it is a useful part of a development flow. Having strict-typing in the language empowers tool-makers to build on it.

Consider this code:

PHP code

for ($x=0;$x<$count;$x++) {
    // ...

    $x=array_diff($a,$b);

    // ...
}


It's really handy to have tools that find this kind of mismatch in advance of debugging through your code. It saves a lot of time up-front.
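A static checker would flag the reuse of $x without running anything. The sketch below only shows the run-time counterpart, to make the mismatch concrete: once some code along the way declares that it expects an integer, the array cannot slip through. The helper describe_step is hypothetical.

```php
<?php
// Hypothetical typed helper standing in for any code that expects a counter
function describe_step(int $step): string
{
    return 'step ' . $step;
}

$a = [1, 2, 3];
$b = [2];

$x = 0;
$first = describe_step($x); // 'step 0'

// The accidental reuse from the loop: $x is now an array, not a counter
$x = array_diff($a, $b);

try {
    describe_step($x);
    $mismatch_caught = false;
} catch (TypeError $e) {
    // An array can never be passed where an int is declared
    $mismatch_caught = true;
}
```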

Moving towards greater complexity

When PHP was originally conceived, it was a simple alternative to writing CGI-scripts. CGI-scripts typically were things like hit counters, or other relatively primitive systems.

We're now in 2015, and expectations are nowhere near where they were in the past. Programmer cognitive overhead is much higher, and trading a bit more professional care for the ability to have constraints maintained across thousands of interfaces is a very worthy trade in my opinion.

I don't wish to knock the people who brought us the super-efficient language that is PHP, but based on the discussions held, I just don't think there's a mindset that is quite appropriate for certain aspects of modern coding.

Is Facebook creating hacklang for the fun of it, or because the latest generation of programmers know that programming with some basic constraints is an important part of building out complex systems?

Choice

Another aspect to this whole debate is choice. The RFC proposes giving programmers a choice. A lot of people wanting only weak typing do not want PHP to have this diversity of choice, fearing that a broader language will put off newbie developers, or force developers to convert to strict-typing so that they look like professionals. Those may be valid points, but when a huge army of developers want something, I just think it is totally wrong for people to stand in the way. The benefits outweigh the risk, and on a basic philosophical level, innovation and leadership should trump anyone looking to keep things in the past. Andrea has made the effort and provided us with an implementation, taking a lot of time, a lot of heat, and a lot of risk, so I think she has earned the right to be a thought leader if she can win the popular backing. It's not a matter of a few n00bs coming along to make some uneducated case for how the language should automatically escape input data for use in MySQL queries (sarcasm).

No reasonable programmer is going to insist everyone do strict-typing. When I knock out quick and dirty scripts I don't bother running them in the strict mode of our PHP fork.

Perfection

The RFC is a compromise, and everyone knows that. The syntax is not perfect, and there's an attempt to meet the needs of a diverse group.

However, I don't think it is nearly as bad as some people are making out. Any deficiencies are very minor, and already Andrea has been progressing conversations about improving things before PHP7 is finalised.

Particular areas of discussion involve supporting a general 'number' pseudo-type and also changing the syntax of how strict-typing is enabled.

I find two things about this discussion distasteful:
  1. Pedants. Pedants are always annoying. You find them everywhere. It is better we move forward than get into a never-ending argument about how tiny deficiencies should be resolved. People have been trying to get this through for 10 years, it's time to get it done.
  2. Hypocrisy. PHP is the language that gave us 'mysql_real_escape_string', 'magic quotes', 'run-time magic quotes', inconsistent naming, flip-flops about array operators, and a whole lost version. Are we really so bothered about whether 'declare' is an ugly word to use?



That's it :). I've tried to get my thoughts across. I very rarely engage directly in discussions regarding the PHP language. I don't feel I've earned anything in the internals community, but I feel strongly about this subject.
