
18 Critical Oversights in Web Development

By George Fekete

Over the past few years I've had the opportunity to work on some interesting projects, complex in nature and under ongoing development: constantly being upgraded, refactored and extended with new features.

This article covers the biggest coding oversights PHP developers make when dealing with medium and large projects: oversights such as not differentiating between development environments, or not implementing caching and backups.

The examples below are in PHP, but the idea behind each problem is generic.

The root of these problems lies mainly in developers' knowledge and experience, or rather the lack of it. I'm not trying to bash anybody; I don't consider myself the perfect developer who knows everything, so bear with me.

In my experience, these problems fall into three main groups: design level, application level and database level oversights. We'll break down each one separately.

Application Level Oversights

Developing with error reporting off

The only question I can ask is: why? Why do you not turn error reporting on when developing an application?

PHP has many levels of error reporting, and all of them should be turned on in the development phase.

If you think errors will never occur, you are coding for the ideal scenario, which only happens in an ideal world.

Error reporting and displaying those errors are not the same thing either: error_reporting() sets the level of errors reported (e.g. notices, warnings, fatal errors), while the display_errors directive controls whether those errors are output or not.

Error reporting should always be at the highest setting in development: error_reporting(E_ALL); and ini_set('display_errors', true);

Note: E_ALL is the highest level since PHP 5.4, because E_STRICT errors became part of E_ALL in that version. If you use a PHP version older than 5.4, use error_reporting(E_ALL | E_STRICT); to include strict warnings too.
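
A minimal sketch of what this could look like in practice, switching on an APP_ENV environment variable (the variable name and log path are assumptions for illustration, not something the article prescribes):

// Report everything in every environment...
error_reporting(E_ALL);

if (getenv('APP_ENV') === 'production') {
    // ...but never show errors to users in production: log them instead.
    ini_set('display_errors', '0');
    ini_set('log_errors', '1');
    ini_set('error_log', '/var/log/app/php_errors.log');
} else {
    // In development, display errors right away.
    ini_set('display_errors', '1');
}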

Suppressing errors

Suppressing errors using the @ operator is even worse than turning error reporting off altogether, because you're consciously sweeping dirt under the carpet. You know the error is happening; you just want to hide it, close the task and go home early. What you don't realize is that building something on a shaky foundation will have much bigger consequences later on.

You can read an in-depth explanation on this here.
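
As a quick illustration (the file name is hypothetical), instead of silencing a possible failure with @, check the return value and handle it:

// Bad: if the file is missing or unreadable, the failure is silently hidden.
$config = @file_get_contents('config.json');

// Better: let the failure surface, then handle it explicitly.
$config = file_get_contents('config.json');
if ($config === false) {
    error_log('Could not read config.json, falling back to defaults');
    $config = '{}';
}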

No logging anywhere in the code

Developing a project has to happen with logging in mind from the start. You can’t just bolt on logging at the end.

Most developers do use logging one way or another, but almost none of them take the time to actually check those logs for errors. What's the point of logging if nobody looks at the logs?

PSR recommendations do exist for logging, PSR-3 to be exact, and this excellent article explains how to implement PSR-3 logging.
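
As a minimal sketch, here is roughly what PSR-3 style logging looks like with Monolog, a popular PSR-3 implementation (this assumes monolog/monolog is installed via Composer; the channel name and log path are illustrative):

use Monolog\Logger;
use Monolog\Handler\StreamHandler;

// One channel for the whole application, writing warnings and above to a file.
$log = new Logger('app');
$log->pushHandler(new StreamHandler(__DIR__ . '/logs/app.log', Logger::WARNING));

// PSR-3 methods: debug(), info(), warning(), error(), critical(), ...
$log->warning('Payment gateway responded slowly', ['elapsed_ms' => 2300]);
$log->error('Could not connect to the database');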

Not implementing caching

Caching can be done in many different ways on multiple levels in an application, such as on a server level, application level, database level, etc.

Caching, too, should be implemented from the start. You can always disable it in development, but make sure everything still works with caching enabled once the code is pushed to a production environment.

On a server level you can use Varnish, a reverse HTTP proxy that caches responses in memory and sits in front of the web server.

To speed up PHP itself, you can install or enable an opcode cache, which caches the compiled bytecode of PHP scripts so they don't have to be recompiled on every request. For PHP 5.5 and later, an opcode cache called OpCache is already compiled into the core.

You can read about it in depth in this article: SitePoint PHP – Understanding OpCache.

Before PHP 5.5, you could use APC, which has user cache functionality too.

On an application level, you can use APCu, which is the user cache extracted from APC; Yet Another Cache, which has similar functionality to APCu; or Memcached, which is a distributed caching system with solid PHP support. Memcached can also be used to cache database queries.

There are a couple of techniques when implementing caching in an application. A good practice is to cache data which doesn’t change very often, but is queried repeatedly.

Cache database queries heavily, because the database is always the biggest bottleneck in every PHP application.
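
As a minimal sketch of application-level query caching with APCu (the function, key and table names are illustrative assumptions; pick a TTL that matches how often the data actually changes):

function getPopularArticles(PDO $pdo)
{
    $key = 'popular_articles';

    // Serve from the cache when possible.
    $articles = apcu_fetch($key, $success);
    if ($success) {
        return $articles;
    }

    // Cache miss: run the query once, then cache the result for 5 minutes.
    $stmt = $pdo->query('SELECT id, title FROM articles ORDER BY views DESC LIMIT 10');
    $articles = $stmt->fetchAll(PDO::FETCH_ASSOC);
    apcu_store($key, $articles, 300);

    return $articles;
}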

Disregarding best practices and design patterns

How many times have you seen someone implement their own password encryption algorithm? Sadly, this still happens today, because of a lack of knowledge or, more dangerously, because of an "I know it better" attitude.

Well, I hate to bring you the bad news, but 99% of the time you don't know it better.

These best practices and design patterns were thought out and created for a reason by software engineers way smarter than you and me; the developer's job is simply to pick the right pattern for the task at hand.

There are many books and resources on this subject. I’ll mention two:

  1. Patterns of Enterprise Application Architecture by Martin Fowler
  2. PHP Objects, Patterns, and Practice by Matt Zandstra

Not using automated tests

Tests should be added for every feature of the web application, but tests, just like logs, are good for nothing if nobody looks at them and actually runs the test code to see if something breaks.

Running tests manually is a tiresome process. Fortunately, there "is a tool for that". In fact, there are lots of tools that can help automate your tests, and a whole practice built around them called Continuous Integration.

One such tool that is widely used in the PHP community is Jenkins, a CI server that can do a lot more than just test an application. Sebastian Bergmann created an excellent Jenkins template specifically constructed to work with PHP projects.

If you find this too overwhelming, then at least write unit tests for your application using PHPUnit, Behat or PHPSpec. It may seem like a lot of work at first, but it has been proven countless times that tests help projects in the long run.
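
A minimal sketch of what a PHPUnit test looks like (the Calculator class is a hypothetical example; the namespaced TestCase assumes a reasonably recent PHPUnit, while older versions use PHPUnit_Framework_TestCase instead):

use PHPUnit\Framework\TestCase;

class CalculatorTest extends TestCase
{
    // One small, focused assertion per behaviour keeps failures easy to read.
    public function testAddReturnsTheSumOfTwoIntegers()
    {
        $calculator = new Calculator();

        $this->assertSame(4, $calculator->add(2, 2));
    }
}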

Not reviewing / auditing code

Working in a team can be challenging, especially if every team member is used to a different style of programming; without a good specification, a project can go sideways really fast.

If you’re in a team and not inspecting each other’s code, you really should start. Just like unit tests, it helps a project stay clean and consistent.

The difference between a review and an audit is when you inspect the code: a review usually happens before any code is merged into the code base, an audit after the code is merged in.

Review is the much better practice, because you have the opportunity to talk about the code and suggest improvements or fixes before it gets merged with the other team members’ code.

The disadvantage of reviews is that they block development: before every merge (after all tests are green), at least two developers need to discuss the code. This is where audits come into play.

An audit happens post-merge and is non-blocking, but it’s significantly less powerful, because it misses the opportunity to catch bugs early on.

Audit is still better than not inspecting code at all.

To help this process go as smoothly as possible, you can use a tool called Phabricator, which was created specifically for this purpose by the engineers at Facebook. It supports both code inspection strategies.

Coding for the ideal scenario

Ever found yourself in, or heard about, a case where some insignificant, boilerplate code was merged in and all hell broke loose? I sure have.

Most of the time this happens because developers are lazy and write code for the ideal scenario, one where database failures, PHP fatal errors and server hacks are non-existent.

Code should be written with the exact opposite in mind: developers should write for the worst possible scenario they can think of, and even then the code won’t cover some obscure corner case where a user types in a $ sign and gains instant full administrator access.

Assuming that your server won’t be hacked or your code won’t break at some point and your database will always be up and running is just wrong. Production code should cover these scenarios and log errors accordingly.

In PHP, it is so easy to commit errors without even realizing it. This is mainly because of poor language design decisions that were made in the past and not corrected in time.

PHP tries to make it easy for developers not to think about security, encodings and corner cases, when in fact developers should be very aware of these issues and always practice defensive programming.
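
A minimal defensive-programming sketch along these lines: don’t assume the database connection will succeed (the DSN and credentials are placeholders):

try {
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'app_user', 'secret', [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
} catch (PDOException $e) {
    // Log the real reason, show the user something harmless, and stop.
    error_log('Database connection failed: ' . $e->getMessage());
    http_response_code(503);
    exit('The service is temporarily unavailable. Please try again later.');
}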

Not using OOP principles correctly

Many developers new to PHP don’t use object oriented programming in their code, because the concept is a little hard to grasp at first. OOP was first used in the 1960s and has been constantly refined over the years; there is a ton of information about it on the Web.

Also, OOP is a lot more than just procedural code organized in classes.

The concepts of objects, properties, methods, inheritance, encapsulation, etc. are all integral parts of OOP.

A developer who uses these principles correctly knows about OO design patterns, the SOLID principles (Single responsibility, Open-closed, Liskov substitution, Interface segregation and Dependency inversion) and how to write clean code in general: code that is flexible, doesn’t have hard-coded dependencies and is easy to extend and build upon.

Alejandro Gervasio covers these principles from top to bottom.

It’s never too late to learn about OOP and start writing clean code which doesn’t rely on hard dependencies (looking at you, PHP frameworks).
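
As a small sketch of one of these ideas, dependency inversion, the class below depends on an interface rather than a concrete implementation (the Mailer and SignupService names are hypothetical):

interface Mailer
{
    public function send($to, $subject, $body);
}

class SignupService
{
    private $mailer;

    // The dependency is injected, so any Mailer implementation
    // (SMTP, a message queue, a test double) can be swapped in.
    public function __construct(Mailer $mailer)
    {
        $this->mailer = $mailer;
    }

    public function register($email)
    {
        // ...persist the new user here...
        $this->mailer->send($email, 'Welcome!', 'Thanks for signing up.');
    }
}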

“On-the-fly” coding

What most developers do when they get yelled at, “Quick, the client needs this feature up and running ASAP!”, is hack some code together and push it directly to the live server. This is called on-the-fly coding, or cowboy coding.

As in every other industry, workflows and sane processes need to be in place in software development too, in order for a project to succeed.

PHP, and dynamic languages in general, encourage rapid changes to the codebase where you see the results of a modification instantly, but such changes should be strictly limited in a production environment.

Only critical bug fixes should be committed and pushed directly to the production server. For everything else, a workflow should be used, such as GitHub’s fork and pull request model, or Gitflow. More on workflows using Git can be found here: https://www.atlassian.com/git/workflows.

Managers and clients who think these processes are unnecessary should be educated to see otherwise. I’ve never once met a client who couldn’t wait a couple of hours or a day for a little feature to go through the necessary steps in order to be deployed live.

One other thing to note: don’t confuse Continuous Delivery with cowboy coding and chaotic management. Continuous Delivery is precisely about implementing and optimizing the development workflow so code can be deployed to the production environment as soon as reasonably possible.

Database Level Oversights

Not differentiating between read / write queries

To support a long-running, complex project, scaling needs to be in the back of every developer’s mind. Admittedly, 99% of the time a web application doesn’t need to scale, because it won’t reach that kind of traffic.

If you know for sure that the web application will be used by many people, such as an enterprise application used internally by hundreds of employees, you can take the necessary steps to make scaling the project easier.

So why separate read / write queries?

The database is always the first bottleneck in every application; it will be the first thing to fail under heavy traffic. To offload traffic to multiple database servers, developers use either Master – Slave or Master – Master replication. Master – Slave is the more popular one: every SELECT statement is routed to the Slave database server(s), and everything else to the Master, in order to balance traffic.

If your application doesn’t distinguish between read and write queries, it won’t know which database server to connect to.

Keep this in mind if you know that you will eventually need to set up a Master – Slave replication scheme.
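
A minimal sketch of what this separation could look like in application code (the class and routing logic are illustrative; a real setup would also handle multiple slaves and replication lag):

class Db
{
    private $master;
    private $slave;

    public function __construct(PDO $master, PDO $slave)
    {
        $this->master = $master;
        $this->slave  = $slave;
    }

    public function query($sql, array $params = array())
    {
        // Reads go to the slave, everything else (INSERT/UPDATE/DELETE) to the master.
        $isRead = stripos(ltrim($sql), 'SELECT') === 0;
        $pdo = $isRead ? $this->slave : $this->master;

        $stmt = $pdo->prepare($sql);
        $stmt->execute($params);

        return $stmt;
    }
}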

Only coding for one database connection

This strongly relates to the above oversight, but sometimes developers can have other reasons to connect to multiple databases. For example, if you keep user logs, activity streams, analytics or other data where you know the read/write operations happen often, it’s good to offload this traffic to a different database server.

Make sure you use a database library which allows you to connect to multiple database servers and makes it easy to switch between them. A good solution is to build on PDO and use Aura.SQL, which extends PDO.

Not testing queries for exploits

This oversight relates to the “coding for the ideal scenario” oversight above. Same thing, different platform.

If you don’t test your database (and your application) for exploits, some hacker will, and he may succeed.

Databases are vulnerable to a whole range of exploits, the most common being SQL injection attacks.

Use this cheat sheet and run its queries through your application’s database access layer. Enter these statements into front-end fields, such as the username and password fields on a sign-up page.

If none of the queries go through, you can buy yourself a beer and celebrate.
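
The standard defence is to use parameterized (prepared) queries, so user input is never concatenated into the SQL string. A minimal sketch with PDO (table and column names are illustrative):

// The :username placeholder is bound separately from the SQL,
// so a value like "admin' OR '1'='1" is treated as data, not as SQL.
$stmt = $pdo->prepare('SELECT id, username FROM users WHERE username = :username');
$stmt->execute(['username' => $_POST['username']]);
$user = $stmt->fetch(PDO::FETCH_ASSOC);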

Not adding indexes to tables

Indexes are like a table of contents for a table: they’re a performance boost and should be added to every table, on the columns that queries filter on (e.g. the columns that appear in the WHERE clause).

There’s a whole theory behind database indexes: when to create them, on which columns, and what they should cover. A whole separate article series has been written about that.
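
As a tiny sketch (table, column and index names are hypothetical), adding an index to a column that is frequently filtered on can turn a full table scan into an index lookup:

// One-off schema change: index the column used for lookups.
$pdo->exec('CREATE INDEX idx_users_email ON users (email)');

// This query can now use idx_users_email instead of scanning every row.
$stmt = $pdo->prepare('SELECT id FROM users WHERE email = :email');
$stmt->execute(['email' => $email]);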

Not using transactions

Data integrity is very important for web applications. Whole websites could break if data is handled incorrectly.

You use transactions for related data that is handled together, either persisted or deleted together.

For example, you save account data for a user, such as e-mail, username and password, in table 1, and profile data like first name, last name, gender, age, etc. in table 2.

Now, if a user wants to delete his account, this should happen as a single operation, using a transaction, even though it involves multiple SQL queries. If you don’t use transactions, you risk losing data integrity, because the operations on the data run separately.

If deleting the data from table 1 succeeds but fails on table 2, the profile data for that user will remain in the database and, worse, it won’t be connected to anything; it will be orphaned.

By using transactions this can’t happen, because the whole operation succeeds only if all the separate operations in it (e.g. deleting data from table 1 and table 2) succeed; otherwise the database rolls back to its previous state.
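
A minimal sketch of the account-deletion example with PDO (the table names follow the example above and are illustrative):

try {
    $pdo->beginTransaction();

    $pdo->prepare('DELETE FROM users WHERE id = :id')
        ->execute(['id' => $userId]);
    $pdo->prepare('DELETE FROM profiles WHERE user_id = :id')
        ->execute(['id' => $userId]);

    // Both deletes succeeded: make them permanent together.
    $pdo->commit();
} catch (Exception $e) {
    // Something failed: undo everything, leaving no orphaned profile rows.
    $pdo->rollBack();
    error_log('Account deletion failed: ' . $e->getMessage());
}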

Not securing sensitive data

Storing passwords in plain text, or rolling your own encryption algorithm in 2014 is unacceptable. The PHP community has matured enough to know better by now.

Still, there are probably thousands of databases out there where sensitive data is stored unencrypted, begging to be stolen by hackers.

PHP 5.5 added strong hashing functions just for this purpose, simply called the password hashing API. It’s really simple to use: you create a hash from the plain-text password with this function:

$hash = password_hash( $password, PASSWORD_BCRYPT );

Note: There’s no need to salt the password yourself, because salting is already handled for you.

Store $hash in the database; later, you verify a login attempt against it with this function:

if ( password_verify( $password, $hash ) ) { ... }

Note: If you don’t have PHP 5.5 (you really should by now), you can use the password_compat library, which implements the exact same functions.

Handling financial data is much trickier, because you need to have PCI compliance on the server, application and database levels. A more in-depth article has already been written on the subject: SitePoint PHP – PCI Compliance and the PHP Developer.

Application Design Oversights

Not differentiating between development environments

I’ve seen many developers and even small teams set up poor development environments for themselves.

For example, working on a new feature or fixing a bug and then FTPing the files directly to the live website. This is wrong on so many levels.

There is an infinite number of workflows that teams can create, but the classical one for web development is to use at least three environments: development, staging and production.

A development environment can be local to each programmer, while staging and production are usually remote and share some parts between them. Development is for coding, staging is for testing and, finally, production is for consumption.

The oversight happens when these environments are not set up the same way: for example, each developer runs a different version of PHP, or the staging configuration differs from production.

Guess what happens? You’re right. Everything will work in development and even in staging, and when you push it to the production server all hell breaks loose, resulting in long nights and lots of caffeine.

No wonder the most common phrase in development circles is: “It works for me.”

So what’s the solution? Make sure everything is set up the same way in EVERY environment. The operating system, PHP, the database and the web server should all be the same versions across the environments.

Since the arrival of Vagrant, Docker and VirtualBox, it is now very easy to create identical environments with the exact same configuration for each one. If you haven’t used these tools before, you should stop whatever you’re doing and start using them immediately.

No backup

Everything is going well, the website is live, launched on time, everything is up and running, users consume the beautiful data. Nom, nom, nom… Until you receive an e-mail at 3AM.

Backups, just like logging, caching, security and defensive programming, should be an integral part of developing a web application, but most developers (or sysadmins) forget about them.

Backups should be automated as well; if that’s not possible, at least a weekly manual backup will do. Any backup is better than no backup.

Store your code base in version control and use a distributed version control system like Git or Mercurial. This setup makes the code base very redundant, because every developer working on the project has a copy of it. Likewise, host the code base on GitHub or Bitbucket; they keep backups too.

Backing up the database is even more important, because it holds user-created content. ALWAYS store the actual data and the backup in different places.

Not backing up data can ruin businesses, and it has done exactly that: see the famous case of Ma.gnolia, one of the better social bookmarking websites back in the day. Wired has a cover story on the whole disaster.

No monitoring

“Everything’s amazing and nobody’s happy.” – Louis C.K.

You’re not happy, because you don’t know what’s going on. Implementing an intelligent monitoring framework for your application is really important. Monitoring answers the following questions:

  1. Did somebody access the main application server?
  2. Are the servers under heavy load?
  3. Do we need to scale to another database server?
  4. Where is the application failing?
  5. Is it offline, or is it just not working for me?

It is important to know the answers to these questions at any given moment, and with real-time monitoring, you will. To make this happen, tools like Nagios or New Relic should be part of your application’s infrastructure.


Conclusion

Use this knowledge to become a better programmer. Keep these oversights in mind and try not to commit them; the application and database level ones are the most important to remember.

Backups are very important. Always practice defensive programming and be prepared for the worst; that’s how web development works. Programming is hard, but when done right, it’s a lot of fun.

Checklist

Below you’ll find a checklist of all the oversights covered in this article. See how many you can cross off right now, and always aim to cross off all of them.

  1. Is error reporting on and display errors on in development and off in production?
  2. Do not suppress errors in your code.
  3. Implement a logging framework.
  4. Use a caching strategy.
  5. Keep in mind and use programming design patterns and best practices.
  6. Use tests in your code and try to automate running these tests every time a change occurs in the code base.
  7. Review or at least audit team members’ code.
  8. Practice defensive programming.
  9. Learn and use OOP principles correctly.
  10. Have a solid workflow and processes for developing and deploying code.
  11. Differentiate between read / write database queries.
  12. Use a solid database library which can connect to multiple databases.
  13. Test SQL queries for exploits.
  14. Learn and use indexes on database tables.
  15. Use database transactions.
  16. Secure sensitive data in the database.
  17. Use different coding environments: development, staging, production.
  18. Implement a backup and monitoring strategy.
Comments

  • Bruno Cassol

    Very well written article. This is obligatory reading for any aspiring web developer.

  • Bryan Latten

    Set error_reporting to -1 (instead of a specific level constant), you’ll be set even when new levels are created

    • George Fekete

      Cool trick!

  • Wesam Alalem

    That is a great checklist we have here. and punch of new tools to discover. thank you George for the amazing article. keep the good job :)

    • George Fekete

You’re welcome, thank you for reading and learning what I have learned the hard way :)

  • http://jeditux.wordpress.com/ Fernando Basso

    Awesome article. The links for other readings are also very useful, since they all relate to the subject. Congrats are in order.

  • http://jelmerschreuder.nl Jelmer Schreuder

    For the SQL exploits bit, you might want to add: always use prepared queries, there’s no excuse not to and you won’t have to worry about injection anymore unless you still put input directly into a query. Which one should never ever do and because of prepared queries doesn’t have to.

  • Saiful Bahri

    Nice checklist. Should be applicable for any coding stuffs, not just web development. Thanks for this article. ;)

    • George Fekete

      In general most of the oversights can be applicable to other programming languages too.

      I’m glad you like it!

  • Oliver Kastler

    Re your point “Caching, too, should be implemented from the start.”.
    I disagree, there are some forms of caching that happen on the lower level (e.g. opcode caching) that won’t do any harm, but otherwise what you’re doing is premature optimization. You should add the caching once the business logic is in place, tested and working. Then you can speed it up.
    But if you try to achieve both at the same time, implement logic and make it work fast you’ll end up cutting corners.
    The first objective should be to implement it in the best way, and as there’s usually some refactoring happening during the initial implementation you’ll end up fixing your caches again and again.

    • Michael

      @Oliver Kastler: I also agree with your disagreement! Putting in caching from the start is a premature optimization, a version of YAGNI. One should first write clean code, then measure where the bottlenecks are and cache where necessary. There are probably less parts of the app that need caching than you think and it would be a waste of time and harder to maintain if everything is cached.

      • Aaron Saray

        I agree with both of you on this too. Write great code, and great database. Then introduce caching.

        I think a good middle of the road solution would be something like: write code with a cache scaffolding, but do not enable the cache mechanism inside of it. That is to say, I’d like the calls to my ‘cache’ to be in my code, but maybe always return ‘not cached’ for my first round of tests. Then, I can introduce various different caching mechanisms without having to alter my core code again. (Then again, if you run tests without caching, and then enable it, doesn’t your code execution path change? Ah man!)

        Anyway, for the sake of discussion, I disagree with introducing cache right away – it really becomes a crutch way too often.

    • Radek Dvořák

      I would agree with you in the sense that code should not be frantically wrapped in caching conditions. On the other hand I think applications should be designed for caching from the very beginning. It is good to know which subsystems can be cached and in which manner. Besides opcache, which is transparent to the application, there may be several other caches for which the application needs to be appropriately structured.

  • http://cneude-createur-web.com Matthieu

    Very good article, I think i’ll read it many times !

    However I think coding for the worst scenario is not a solution : it can multiply tests in the code, sometimes for no reason at all and create a big mess. Testing everything is not really good in my point of view.

    • George Fekete

      You should strive for 100% test coverage, all the time, and in my book it shouldn’t be any exception from this rule.

  • Christian Snodgrass

    This is a very good list. The only one I have a slight problem with is #1, “Is error reporting on and display errors on in development and off in production?”. You should definitely have full-blown error reporting on, monitoring all levels. However, I don’t think you need to use display errors. And in fact, since it can break many things itself (sending headers too soon, breaking layouts, etc), it’s best instead of displaying them you let them go to the log file. Then you can just keep the log file “tail -f”-ed to watch the errors come through.

  • Dhhdjd

    All good points, though I would say be careful using transactions. It’s easy to “abuse” transactions in place of proper application-level error handling. I’ve seen developers get too carried away with transactions and lock up the database (as the transaction puts a write lock on affected tables for some write queries)

    I would ask “what is the use case for a transaction”. Surprisingly, I can think of very few instances I’ve seen them used where they were actually needed. I would avoid using them blindly – not that I think you are advocating this – and be very cautious about using them.

  • Ryan

    A good article though what you said about Master-Slave is slightly misleading.

    Having master-slave means that any writes to the master are replicated to the slave and the slave is read only. That’s it.

    It is up to the implementer to decide how to route the SELECT’S, though writes will always go to the master. The master can still process reads.

    Everything else said seems spot-on.

  • Олег Абражаев

    I’m implemented PSR-3 module for ZF2

    https://github.com/seyfer/ZendPsrLogger

    pull requests welcome

  • Bill

    Awesome article. One that I’ll need to come back repeatedly. Thanks for the cheatsheet.

  • Roy

    ‘Disregarding best practices and design patterns’ is poorly written. How about actually explaining what best practices and design patterns are instead of patronising the reader and relying on an appeal to authority?

  • ptdorf

    I’d say there’s a big one missing from that list: profiling your app.

    Learn where the bottlenecks are to optimize them first.

  • http://yapf.blogspot.nl/ vinny42

    “Cache database queries heavily, because the database is always the biggest bottleneck in every PHP application.”

    Why do PHP users keep repeating this nonsense?

    And why am I wasting my breath again trying to educate them….

  • Pritesh Jain

    You summed up all my concerns in a nice manner. Have been trying to make all these points relevant for others, now i can just share this post as reference.
