Code comments and execution time

The irony here is that I have found all your comments in this thread meaningless. Could you please remove them? They are slowing down the server :lol:

Let’s calm down everyone. This topic isn’t about deathshadow’s comments, it’s about comments and code execution time.

While I disagree with a good deal of what deathshadow claims, I do see where he is coming from and it does make sense.

It seems most of us prioritize readability and understandability, while he prioritizes execution efficiency. Neither is a bad place to be, and neither of them is wrong. Let’s stop the personal attacks and discuss the topic at hand.

I had a perfect score on the reading comprehension section of the ACT, as well as the English and Social Studies sections (interestingly, math was my weakest score and I program - go figure). I can assure you there’s nothing wrong with my reading skills.

Yeah, by maybe 2/1000ths of a second.

I use PHP’s alternate syntax exclusively when making templates. My sites execute very fast.
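
For anyone unfamiliar with it, here’s a minimal sketch of what I mean (the variable and field names are just made up for illustration):

<?php if (!empty($users)): ?>
<ul>
    <?php foreach ($users as $user): ?>
    <li><?php echo htmlspecialchars($user['name']); ?></li>
    <?php endforeach; ?>
</ul>
<?php else: ?>
<p>No users found.</p>
<?php endif; ?>

The control structures read almost like markup, and everything outside the <?php ?> tags is passed through to output untouched.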

The slowest part of doing anything in PHP is file access. Slower still are database queries (which access tables in the file system). Compiling and executing PHP code is very fast, even with lots of comments and even when “dropping out” of the compiler by using PHP’s alternate syntax alongside straight PHP code.

Besides, how do you know how the PHP compiler executes behind the scenes to be able to say the alternate syntax slows things down by any significant amount?

:lol: Now I see where you are coming from. SMF is one of the biggest piles of crap ever written in PHP. Template code echoed out from here, there, and everywhere. It’s plain awful. Difficult to modify. One big pile of spaghetti code. Mods open up existing files and dump ugly code right in the middle of SMF’s ugly code.

echo 'SMF really ';
if (!empty($sucks))
{
    echo 'sucks!';
}
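
For comparison, the same fragment written in the alternate syntax might look something like this:

SMF really <?php if (!empty($sucks)): ?>sucks!<?php endif; ?>

One line, no string building, and the static text never passes through an echo at all.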

Write some code for us and I’ll rewrite it using PHP’s alternate syntax and with tons of comments. Then we’ll do a little comparison and see how much faster yours is. :cool:

You want to see some slow code? Look at WordPress 3.1. That thing includes 76 separate files on every pageview using the default template, including a few things that are absolutely not needed.

WordPress › Support » 77 Files Included Per Page View! Insane!

If you want to speed up a site, limit the number of files included. File access is orders of magnitude slower than executing code, with or without comments and alternate syntax. PHP can execute thousands of lines of code in the time it takes to seek and open one file from the hard drive.

Oh, I just realized something worth noting.

If you keep all of your HTML inside of PHP, you generally have to build lots of large strings, which take up memory (or you echo a billion times). If the server has little memory to spare, you may not be able to handle all of that.

However, with the alternate syntax, the HTML-only parts can be sent through to output directly, so you don’t have to hold huge strings in memory.
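
Roughly, the difference between the two styles (the markup and variable names are just illustrative):

<?php
// everything in PHP: the whole fragment accumulates in one string
$html = '<ul>';
foreach ($products as $product) {
    $html .= '<li>' . htmlspecialchars($product) . '</li>';
}
$html .= '</ul>';
echo $html;
?>

versus dropping out of PHP for the static parts:

<ul>
<?php foreach ($products as $product): ?>
    <li><?php echo htmlspecialchars($product); ?></li>
<?php endforeach; ?>
</ul>

In the second version the static markup is streamed straight to the output buffer instead of being copied into an ever-growing string first.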

Why doesn’t someone put together an example and do some actual tests, instead of all this speculating? :slight_smile: I would, but I’m away from my normal computer (business trip) and don’t have what I need to set up such a test.

That’s why Facebook developed HipHop, which transforms PHP into C++ that is then compiled and executed.

HipHop for PHP: Move Fast - Facebook Developers

No interpreted language is going to execute as fast as something that is already compiled and memory resident.

AGAIN, speed aside, that has to be a maintenance NIGHTMARE code-wise.

That’s true of anything on computers – disk access remains the largest bottleneck… including the disk access time of reading in that .php file.

Because I’ve written lexical analysis programs, compilers and interpreters giving me a decent clue how they work? It might not be a significant amount, BUT AGAIN if it provides ANY benefit and is just a simple change in code writing behavior requiring no extra REAL effort to do it… why all the griping about it even being suggested?

Funny since I say the same thing about vBull, phpBB and myBB… and those are some of the BETTER ones. vBulletin 4 in particular being such a train wreck it makes wordpress 2 look good. (gotta love that 170k of markup to deliver 16k of plain text, 6k of that plain text being content-cloaking nonsense thanks to the use of non-semantic markup applied under the banner of semantics)

I wasn’t aware that self contained directories of theme.functionName.php files for each theme constituted “here, there and everywhere”… In fact I’m fairly certain that it makes it EASIER for skinners than having to dive into the code that handles database and data interactions to make changes… without completely neutering what the skinner can do using those stupid ‘template systems’ like what myBB has.

The clear definition of “get the data” and then “output the data” always seemed more efficient; certainly more so than building the output buffer prematurely.

It’s really how I like to build programs; Get the data – THEN show it. The get / show one line / get / show one line approach just makes things more complicated… especially if in the middle of the data you up and decide you want to output a HEADER.

See why in my own CMS the theme_header() function starts out with "foreach ($data['headers'] as $header)" to call header() before any markup is started.

I build all the page contents up into an array, then I output the results releasing the array values from memory with unset as I’m done with them. (since I prefer to do my own garbage collection as the output buffer builds… since I’m in mod_deflate from the start of output)
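
In rough outline, the pattern looks like this (theme_header() and the $data keys are from my description above; the rest is illustrative, build_content() being a hypothetical "get the data" step):

function theme_header($data) {
    // send every queued HTTP header before a single byte of markup
    foreach ($data['headers'] as $header) {
        header($header);
    }
    echo '<!DOCTYPE html><html><head>';
    // ... rest of the header markup ...
}

// get the data...
$data['headers'][] = 'Content-Type: text/html; charset=utf-8';
$data['content']   = build_content(); // hypothetical expensive step

// ...THEN show it, freeing each chunk as it's sent
theme_header($data);
echo $data['content'];
unset($data['content']);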

Given that the files are clearly named, have a clear and logical directory structure, have prefixes to tell you what directory they might be contained in when back-referencing, and the output routines ONLY handle output and the data handling only handles data handling, I fail to see where you are getting the “here, there, everywhere” part.

AND AGAIN, sure, the speed benefit might be minuscule, but it’s there – and if it means less code, simpler code, more legible code, and less typing, WHERE’S THE PROBLEM?!?

Now on that we can agree… turdpress is an unmitigated disaster under the hood; it’s up to what? 400+ separate .php files when overall it has maybe 30 actual operations it performs? IF THAT? In a lot of ways you poke your head into the code, and it’s like the people writing WP are deathly afraid of functions AND objects; almost more like a giant old-fashioned batch file or BASIC program written without even using GOSUB. Actually, no, it’s more like the old CHAIN command used in 8 and 16 bit memory spaces to let one program halt itself and unload its code from memory, while leaving its variable data intact, to load and run some other code.

WP’s idea of a function often seems to be to just make another file… then we wonder why it won the Pwnie for M4ss 0wnage a year or two ago.

ABSOLUTELY.

I actually find breaking my PHP up and intermixing it with HTML to be easier to maintain than having it all together.

The reason is that I first get all of my information into one area, then output it in pieces. It’s kind of an ad hoc MVC model, where the HTML output would be the View.

I think it’s all up to personal preference. =p

Ooh, I know! I’ve sadly seen that stuff before. That link is unrelated to the charset itself. Instead, he’s leaving himself a reminder of that site so that he doesn’t forget to go and visit it later on. It’s a poor use of a comment.

Because he doesn’t want to forget about that line, but he doesn’t want it to run either. What should actually be done is to remove the line completely, and add the comment to his todo list.

Because he wants to match up with the if(0) statements. You’re dealing with someone there who doesn’t have very much mustard on his sandwich.

Those are the colors that he wants to use in his color scheme.

One of the few beneficial uses of an explicitly stated true condition is the “loop-and-a-half”, where the exit test belongs in the middle of the body:


while (TRUE) {
    $row = mysql_fetch_assoc($result);
    if ($row === FALSE) {
        break; // no more rows – exit from the middle of the loop
    }
    // ... process $row ...
}

However, the following can be more understandable, even though linting programs may dislike the assignment inside the condition:


while ($row = mysql_fetch_assoc($result)) {
    // ... process $row ...
}

Agreed :tup:

That actually seems like a low number of includes for something as big as WP.

I don’t know that the number of includes is as important as the size of the includes - for example, is including a hundred 10k files really slower than including one 1,000k file? I doubt it, and if you are trying to create a maintainable and organized application (instead of just something that runs as fast as possible) it would be ridiculous to even attempt that. Hell, I’m working on a relatively small application right now that has over 150 includes per page.

At any rate, all these supposed performance issues can easily be fixed by installing APC.

I’m afraid it really is.

  • for example, is including a hundred 10k files really slower than including one 1,000k file?

I think it’s a LOT slower. I haven’t done tests myself but I’ve read of people gaining massive speed-ups by combining many small include files into one big chunk. PHP code parsing is lightning fast compared to file access. But as you said, with a good opcode cache the problem might not exist at all. It also depends on the efficiency of file caching by the OS - when I start a php site on localhost that has lots of includes I have to wait a few seconds hearing disk thrashing but after that the files are grabbed from a memory cache and the slowness almost disappears.

Depends on the transfer sizes actually; Though with the numbers you used – yeah, unlikely…

But if you’re talking a hundred 100 byte files vs. a 10k file, then sure it might be enough of a difference – 100 bytes is smaller than the sector size on most filesystems :smiley: – meaning that’s a hundred separate sectors read as opposed to 20 sectors. (gets worse the larger the sector size – you try that on a 2048 bytes per sector drive…)

You also have the overhead of requests – It’s not just about the time to transfer the file, you have the head seek to the file system to find the filename’s information and starting sector, the write to say the file is opened by someone, the seek to the position on the disk of the file ALL before you even start receiving or sending data…

It’s the same phenomenon as handshaking for HTTP or worse, FTP…

Gah, FTP… Upload a hundred 10k files, then a single 1000K file… Guess which one runs about a hundred times faster. THANKS BE that hard disk access isn’t as inefficient. Apart from the sector round-up it’s nowhere NEAR as bad.

Happens on HTTP too – each separate file request past the first few (depending on the number of simultaneous connects) can add real-world anywhere from 150ms to a full second and a half depending on ping-time, connection latency, distance from the server, etc… See why image recombination techniques are so good and using 20 separate .js files with a bloated library for two or three goofy animations are such total trash. (especially if requesting the file from a separate server given the delay on the second connect)

IF you allocate enough memory to it, IF the program you are trying to run is already in its cache…

Which is why it’s not a 100% solution, but it’s a damned fine one. In fact, the performance difference is so big it shows WHY calling a file from the disk and parsing it is where the bottleneck really is – since when it does work those are removed from the equation.

But that often comes down to whether you’re on your own dedicated server, a properly configured shared host, or a dime-a-dozen stuff-as-many-people-on-as-possible shared host – the further down that scale you go, the less effective APC can be, as the cache doesn’t keep up with requests – especially if the host isn’t tuning it properly.

Which is why as a developer it’s often not a good idea to assume it’s even going to be present/available/properly configured – though praise the stars if it is.
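
Which is the reasoning behind feature-testing for it at runtime. A rough sketch of that kind of defensive caching (the cache key and build_menu() are hypothetical):

// fall back gracefully when the APC extension isn't loaded
if (function_exists('apc_fetch')) {
    $menu = apc_fetch('site_menu'); // hypothetical cache key
    if ($menu === FALSE) {
        $menu = build_menu();               // hypothetical expensive call
        apc_store('site_menu', $menu, 300); // keep it for five minutes
    }
} else {
    $menu = build_menu(); // no cache available, build it every request
}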

Better option - use APC_CACHE.

Combining files needlessly makes for hard-to-manage code on a large project.

So out of curiosity I did a quick timed test:

  1. I included 1000 22KB files in a loop.
  2. I placed all the contents of the 1000 files into one 20MB+ file and included it.

The single include ran about .07 seconds faster with no APC. Not something I’m likely to worry about anytime soon, as that is a pretty minimal time difference for all the added benefits.
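
For anyone who wants to reproduce it, the harness is trivial – something along these lines (the file paths are whatever you generated; run the two halves as separate requests so the duplicated function names don’t collide):

// request 1: a thousand small includes
$start = microtime(TRUE);
for ($i = 1; $i <= 1000; $i++) {
    include "funcs/functions_$i.php";
}
printf("1000 includes: %.4f s\n", microtime(TRUE) - $start);

// request 2 (run separately): the same code concatenated into one big file
$start = microtime(TRUE);
include 'functions_all.php';
printf("1 include: %.4f s\n", microtime(TRUE) - $start);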

Was it the same 22kb file 1000 times? It’s possible PHP does an internal cache so it doesn’t need to do a full read again.

It does. Even without APC, the PHP engine only compiles bytecode for an include or require once per page request. The cache gives the engine a mechanism to keep that bytecode for later page requests.

No, it was 1000 different files – each was a 750-line file of functions. I duplicated the files by loading the original, appending the iterator number to each function name, and saving the result as a separately named file.

.07 seconds doesn’t sound like much until you multiply it by 1000 user requests a second – that’s 70 extra CPU-seconds of work every second – then you have a problem.

Anything that takes over a hundredth of a second is worth looking into improving, but an APC cache is the answer, not file concatenation.

But you didn’t tell us the exact time each of the tests took – then we would have a better idea of the relative difference.

  • It’s possible PHP does an internal cache so it doesn’t need to do a full read again.

I don’t think PHP caches separate include files, but every modern OS does, so in the test there was actually no physical disk access. The .07 seconds was spent by the OS looking those files up in its memory cache. If you measured the time on the first page load after a system restart, I believe the difference would be huge.

Drop those to 256 bytes apiece and try it. Then make each file a size that isn’t a power of two, nearest its current size (say 21.25k), and try it.