  1. #1
    SitePoint Guru
    Join Date
    Nov 2004
    Location
    England
    Posts
    698
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)

    To use file_exists() or not?

    Hi guys,

    We're having a debate over the pros and cons of using file_exists() to check whether an image exists or not, prior to deciding whether to show it or a holding image. Generally we agree that it's a good thing as it prevents image placeholders being displayed by accident, but on some pages that have a lot of images, especially ones with a lot of traffic, it hammers the server somewhat and introduces significant processing delays.

    The way that it's used in this case is basically to check whether a resized, cached version of an image exists or not. If not then the image will be processed and saved as necessary. It seems to be the only way to do this particular task but we still have these performance considerations, especially on large directories on busy sites.

    So I'm wondering if anyone has any thoughts and real-world solutions? Is file_exists() the best solution? is_file() instead? stream_resolve_include_path()? Just curious about what different people have chosen to do with these requirements.

    Thanks

  2. #2
    SitePoint Zealot
    Join Date
    Jun 2010
    Location
    Arizona
    Posts
    109
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    file_exists() is what I've always used. According to the PHP documentation, "Note: The results of this function are cached. See clearstatcache() for more details." The same is true of is_file(). You would have to look at the PHP source code to see which function does more work and what order that work is done in (my guess is that both are equally performant).
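
    As a rough illustration of the difference between the two checks and of the stat cache mentioned above, here is a minimal sketch. The helper name pick_image() and the placeholder filename are made up for the example; only file_exists(), is_file(), clearstatcache(), tempnam() and unlink() are real PHP functions.

```php
<?php
// Hypothetical helper: use the cached image if it is there, otherwise a placeholder.
function pick_image(string $cachedPath, string $placeholder): string
{
    // is_file() is true only for regular files, whereas file_exists()
    // is also true for directories.
    return is_file($cachedPath) ? $cachedPath : $placeholder;
}

// A directory "exists" but is not a regular file.
$dir       = sys_get_temp_dir();
$dirExists = file_exists($dir);
$dirIsFile = is_file($dir);

// Create a real temporary file to stand in for a cached image.
$tmp    = tempnam(sys_get_temp_dir(), 'img');
$before = pick_image($tmp, 'placeholder.png');

unlink($tmp);
// unlink() already clears PHP's stat cache for this path. The explicit call
// shows what would be needed if something outside this script had removed
// the file during the same request.
clearstatcache(true, $tmp);
$after = pick_image($tmp, 'placeholder.png');
```

    The stat cache only lives for the duration of one request, so it helps when the same path is checked several times while building a page, not across requests.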

    The bigger problem is how you are processing the images. You are waiting until the last possible second to create the cached, resized image. Traditional "lazy evaluation" doesn't work well in multi-user environments under load, because multiple users will request the same resource at the same time and the system then does the same work more than once (i.e. both users run the image resizer code on the same image). You will also start seeing weird scenarios that involve file locking.

    What you want to do instead is schedule the resize operation by storing finished images in one directory and working images in another. Then, scan the working directory periodically and resize and move images to the final location. Your script can check for the existence of a scheduled image and show a different image (e.g. "This image isn't available yet.").
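
    The two-directory idea could be sketched roughly like this. Everything here is an assumption made for illustration: the function names process_pending() and image_to_show(), the directory layout, and the $resize callable, which stands in for whatever GD, ImageMagick or GraphicsMagick call actually does the work.

```php
<?php
/**
 * Resize every file waiting in $pendingDir and publish it in $finishedDir.
 * $resize is a hypothetical callable: fn(string $source, string $target): bool
 * Returns the number of images published. Meant to be run from cron or similar.
 */
function process_pending(string $pendingDir, string $finishedDir, callable $resize): int
{
    $done = 0;
    foreach (glob($pendingDir . '/*') ?: [] as $source) {
        if (!is_file($source)) {
            continue;
        }
        $name = basename($source);
        // Write to a temporary name first, then rename into place, so that a
        // request never sees a half-written image.
        $tmp = $finishedDir . '/.' . $name . '.' . getmypid() . '.tmp';
        if ($resize($source, $tmp) && rename($tmp, $finishedDir . '/' . $name)) {
            unlink($source);
            $done++;
        } elseif (is_file($tmp)) {
            unlink($tmp); // clean up after a failed resize
        }
    }
    return $done;
}

/** Request side: show the finished image, or a "not available yet" image. */
function image_to_show(string $finishedDir, string $name, string $notice): string
{
    $path = $finishedDir . '/' . $name;
    return is_file($path) ? $path : $notice;
}
```

    Because only the scheduled worker ever writes to the finished directory, two visitors hitting the same missing image no longer trigger two resizes.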

    Ideally, you would create all the sizes of the image you will ever need during the upload process of the original image, but that isn't always possible.

    Just a few ideas to chew on.
    Thomas Hruska

    Single Sign-On Server/Client - The PHP login system that rocks.

  3. #3
    SitePoint Guru
    Thanks for that. Different articles I've read (including comments on PHP.net) suggest that file_exists() can be as little as half as fast as the other two, though these appear to be very simple benchmarks and don't take into account a heavily loaded server, for example, or massive directories.

    I agree that this is a lazy way of doing it. The problem with doing it at upload is that a) not all images need to be resized to all sizes and b) if we add the requirement for another size at a later date then we'll not have the sizes that we need. Ideally we could do with a combination, I suppose, or schedule them to be resized as you say.

    My boss at a previous employer wrote a caching system (for HTML really) whereby he had a status file that was checked first. It had either a 0, 1 or 2 in it. I forget exactly which meant what, but one stated that a page was already cached, one stated that a new cached version was being generated (you wouldn't expect this to last for long) and another that the cached file had expired. If I hit an expired page then the status file would be changed to indicate that a new one was being generated but I'd still see the old file. If you then hit it while the cached file was still being generated then you'd be given the old file again but no further action would be taken. After the page was generated the status file would be updated to say that it was cached and the next person would see the new page. If you wanted to manually expire a page then you just changed the status.
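
    A rough sketch of that status-file scheme follows. The mapping of 0, 1 and 2 to states is a guess (as said, I forget which meant what), the names serve_cached() and read_status() are invented for the example, and regeneration runs inline here for simplicity, whereas the original kicked it off separately so that nobody waited.

```php
<?php
// Assumed mapping; the real system's numbering is not known.
const STATUS_FRESH      = 0; // cached copy is current
const STATUS_GENERATING = 1; // a new copy is being built
const STATUS_EXPIRED    = 2; // cached copy needs rebuilding

function read_status(string $statusFile): int
{
    // No status file yet is treated the same as "expired".
    return is_file($statusFile) ? (int) file_get_contents($statusFile) : STATUS_EXPIRED;
}

/** $generate is a hypothetical callable that returns the fresh content. */
function serve_cached(string $cacheFile, string $statusFile, callable $generate): string
{
    $old = is_file($cacheFile) ? file_get_contents($cacheFile) : null;

    if (read_status($statusFile) === STATUS_EXPIRED) {
        // Mark as "generating" so other requests keep serving the old copy
        // and do not start a second rebuild.
        file_put_contents($statusFile, (string) STATUS_GENERATING, LOCK_EX);
        $new = $generate();
        file_put_contents($cacheFile, $new, LOCK_EX);
        file_put_contents($statusFile, (string) STATUS_FRESH, LOCK_EX);
        // The visitor who triggered the rebuild still gets the old copy,
        // unless there was none yet.
        return $old ?? $new;
    }

    // Fresh or currently generating: serve whatever is cached.
    return $old ?? '';
}
```

    Note that reading the status and then writing it is not atomic in this sketch, so two requests arriving at the same instant could both start a rebuild; a real version would need a lock around that step.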

    It prevented people ever having to wait for a new page to be generated (in some cases that was a god-send as some pages took forever to generate due to poorly designed database schemas and massive data-sets) but it also meant that you might be looking at data that was no longer valid. Some files we regenerated overnight to ensure that they were up to date every day (like what the best selling products were yesterday, as that would be the same from midnight to midnight). Not sure if you can implement something like this for images though. In fact, it's going to be significantly more intensive than just checking if a file exists or not.

  4. #4
    SitePoint Mentor bronze trophy
    John_Betong's Avatar
    Join Date
    Aug 2005
    Location
    City of Angels
    Posts
    1,832
    Mentioned
    73 Post(s)
    Tagged
    6 Thread(s)
    Hi Antnee,

    Have you looked at file-based page caching for your site? I believe there are lots of PHP options available, so search and see if any are suitable.

    file_exists() should only need to be called once, with the results used to build the HTML page. The resultant HTML page should be cached and then used for subsequent requests.
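
    A minimal sketch of that idea, assuming a simple time-based expiry. The function name cached_html(), the $ttl parameter and the $build callable are all made up for the example; $build is where the file_exists() checks and the rest of the HTML generation would go.

```php
<?php
/**
 * Return the cached HTML if it is younger than $ttl seconds, otherwise
 * rebuild it with $build (a hypothetical callable returning a string),
 * store it, and return it.
 */
function cached_html(string $cacheFile, int $ttl, callable $build): string
{
    if (is_file($cacheFile) && time() - filemtime($cacheFile) < $ttl) {
        return file_get_contents($cacheFile);
    }

    // Cache miss: this is the only place the expensive checks run.
    $html = $build();
    file_put_contents($cacheFile, $html, LOCK_EX);
    return $html;
}
```

    Deleting the cache file at any time simply forces the next request to rebuild it. The same function works for a section of a page as well as for a whole page.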

    Cached pages can be deleted at any time and the php script used again to generate a new page which is cached once again...

    edit:
    Looks like you posted just as I was creating my response to your original post

  5. #5
    SitePoint Guru
    We can't cache whole pages, just sections of them, unfortunately. But yes, I agree and have done so in the past.

  6. #6
    SitePoint Zealot
    Quote: Originally posted by Antnee
    I agree that this is a lazy way of doing it. The problem with doing it at upload is that a) not all images need to be resized to all sizes and b) if we add the requirement for another size at a later date then we'll not have the sizes that we need. Ideally we could do with a combination, I suppose, or schedule them to be resized as you say.
    For the first issue: With hard drive space cheap as dirt, is this really an issue?

    For the second issue: When that happens, write a script that walks the directory and converts the originals to the new size. Only has to be done one time and can be done all in one go.
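
    Such a one-off script might look roughly like the sketch below. The function name backfill_size() and the $resize callable are placeholders for illustration, and it assumes that the target directory sits outside the originals directory and that filenames are unique across subdirectories.

```php
<?php
/**
 * Walk $originalsDir recursively and create the new size for every original
 * that does not have one yet in $targetDir.
 * $resize is a hypothetical callable: fn(string $source, string $target): bool
 * Returns the number of images created.
 */
function backfill_size(string $originalsDir, string $targetDir, callable $resize): int
{
    $made = 0;
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($originalsDir, FilesystemIterator::SKIP_DOTS)
    );

    foreach ($files as $file) {
        if (!$file->isFile()) {
            continue;
        }
        $target = $targetDir . '/' . $file->getFilename();
        if (is_file($target)) {
            continue; // already converted, so the script is safe to re-run
        }
        if ($resize($file->getPathname(), $target)) {
            $made++;
        }
    }
    return $made;
}
```

    Skipping targets that already exist means the script can be interrupted and restarted without redoing finished work.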

    Also, when working with large numbers of images, prefer GraphicsMagick over GD and ImageMagick. It takes advantage of multi-core hardware to pipeline the conversion and also has built-in features like "I'm going to be sizing this to WxH, so don't bother loading the whole image into RAM."
    Thomas Hruska


  7. #7
    SitePoint Guru
    Hard drive space is cheap, managed backup space is not. I'm currently on a mission to find as many directories as possible to exclude from some of our backups as they already cost a lot and we're way over quota on some servers. We're also running a lot of sites on a small number of servers. Some sites get a lot of traffic, some get the odd person every few weeks. I'm reluctant to schedule anything as it'll probably end up doing a load of unnecessary work. Lots of considerations to make before making a start on anything... like how am I going to update 1200+ sites with no common codebase? :O

  8. #8
    SitePoint Wizard wonshikee's Avatar
    Join Date
    Jan 2007
    Posts
    1,223
    Mentioned
    3 Post(s)
    Tagged
    0 Thread(s)
    You could also consider using JS to check if an image exists, this would take the burden off your server.

  9. #9
    SitePoint Guru
    Really? Would doing it via JS not still be making additional checks to the filesystem?

  10. #10
    SitePoint Wizard wonshikee's Avatar
    Quote: Originally posted by Antnee
    Really? Would doing it via JS not still be making additional checks to the filesystem?
    No, you would output the image tag from PHP without checking. In JS, you would check whether the image loaded: the image element provides an onerror event handler, from which you could replace it with the placeholder image.
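
    On the PHP side that could be as small as the following sketch. The helper name img_with_fallback() and the example paths are made up; the only browser feature relied on is the onerror handler of the img element.

```php
<?php
/**
 * Build an <img> tag without touching the filesystem. If the browser fails
 * to load $src, the inline onerror handler swaps in $placeholder.
 */
function img_with_fallback(string $src, string $placeholder, string $alt = ''): string
{
    // Clearing this.onerror first avoids an endless loop if the placeholder
    // itself cannot be loaded. json_encode() produces a safe JS string literal.
    $onerror = sprintf(
        'this.onerror=null;this.src=%s;',
        json_encode($placeholder, JSON_UNESCAPED_SLASHES)
    );

    return sprintf(
        '<img src="%s" alt="%s" onerror="%s">',
        htmlspecialchars($src, ENT_QUOTES),
        htmlspecialchars($alt, ENT_QUOTES),
        htmlspecialchars($onerror, ENT_QUOTES)
    );
}

// Example: no file_exists() call is made for the cached image.
$tag = img_with_fallback('/cache/a_200.jpg', '/img/placeholder.png', 'A');
```

    The trade-off is that a missing image still costs the visitor one failed request before the placeholder appears, and nothing on the server learns that the resized image needs generating.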

  11. #11
    @php.net Salathe's Avatar
    Join Date
    Dec 2004
    Location
    Edinburgh
    Posts
    1,397
    Mentioned
    63 Post(s)
    Tagged
    0 Thread(s)
    Is file_exists() really the source of "hammering" the server? I only ask because it sounds like the machine will have a lot of other things happening, including the actual image processing too, which would commonly be more of a problem.
    Salathe
    Software Developer and PHP Manual Author.

  12. #12
    SitePoint Guru
    It's still a lot slower if the cached images are found than with file_exists() disabled

  13. #13
    Hosting Team Leader silver trophybronze trophy
    cpradio's Avatar
    Join Date
    Jun 2002
    Location
    Ohio
    Posts
    5,127
    Mentioned
    152 Post(s)
    Tagged
    0 Thread(s)
    Quote: Originally posted by Antnee
    It's still a lot slower if the cached images are found than with file_exists() disabled
    What about using .htaccess to detect requests for missing image files and serving a placeholder instead? Not sure if this is viable, but you would then be able to drop the file_exists() call and let Apache do the work.

  14. #14
    SitePoint Mentor bronze trophy
    John_Betong's Avatar
    @Antnee

    We're having a debate over the pros and cons of using file_exists() to check whether an image exists or not, prior to deciding whether to show it or a holding image. Generally we agree that it's a good thing as it prevents image placeholders being displayed by accident, but on some pages that have a lot of images, especially ones with a lot of traffic, it hammers the server somewhat and introduces significant processing delays.
    Can you supply a link to your site?

    It's still a lot slower if the cached images are found than with file_exists() disabled
    Have you tried keeping the file_exists() checks but not displaying the images? Does not loading the images "still hammer the server" and slow the page rendering?

    Have you tried moving the images to a sub-domain? There are many helpful articles explaining why. I like this one:

    http://www.thehobbyblogger.com/don’t...own-your-blog/

    2. Host images on a subdomain

    Did you ever have a kiddie pool? Do you remember how long it took to fill it up with your one hose? What if you could’ve used your neighbor’s hose at the same time and filled up the pool twice as fast?

    Most web browsers allow only two to four “hoses” or connections from a single domain (like TheHobbyBlogger.com) to download content. So if you have a lot of images on your blog, your readers might have to wait for one or more images to load before they can begin reading your content.

    By creating a subdomain such as img.yourdomain.com to host your images, you effectively add a second set of hoses to fill your readers web browsers. This allows your blog content to load simultaneously with your images and speed up your page loads.

  15. #15
    SitePoint Guru
    @cpradio - that might not be a bad idea and would be worth investigating. All requests already go through an .htaccess to direct the requests to the right place so I suppose it should theoretically be possible. Have never done such a thing though so would require some additional learning. Never a bad thing though!

    @John_Betong - As I mentioned earlier, there are a huge number of sites on a small number of servers and it's the cumulative effect that is causing the problem. It's not a massive problem - the servers get by fine - but I just noticed that if we remove the file_exists() checks, performance increases even more.

    Ultimately this was a hypothetical question, just to back up a debate we were having, though I really appreciate the practical solutions, so thanks guys.

  16. #16
    Hosting Team Leader silver trophybronze trophy
    cpradio's Avatar
    @Antnee,

    A quick search shows it should be possible. Example 1 and Example 2

  17. #17
    SitePoint Guru
    Thanks. Saved me some effort later.

  18. #18
    Hosting Team Leader silver trophybronze trophy
    cpradio's Avatar
    @Antnee, you could use it two ways (maybe even more).

    1) You could just always try to output the file requested, let Apache pick up on the fact that it's missing, and send the REQUEST_URI to a new PHP file that outputs a placeholder and kicks off a background task that generates the necessary image (i.e. placeholder.php?requestedFile=%{REQUEST_URI}).

    2) You can continue your image resizing the way it exists today, and let Apache pick up on any broken images after your request has been served to the user. (ie: using placeholder.jpg for your rewriterule).
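
    For option 1, the placeholder script could be sketched as below. This is only an outline under several assumptions: Apache is set up to rewrite requests for missing images to this script (for example with a RewriteCond %{REQUEST_FILENAME} !-f test), the queue directory and placeholder.jpg sit next to the script, a separate worker picks up the .job files, and the function name queue_resize() is made up.

```php
<?php
/**
 * Record that $requestedUri needs generating by dropping a small job file
 * into $queueDir. Returns the job file path, or null if the URI is rejected.
 */
function queue_resize(string $requestedUri, string $queueDir): ?string
{
    $path = parse_url($requestedUri, PHP_URL_PATH);
    // Reject malformed input and anything trying to climb out of the web root.
    if (!is_string($path) || strpos($path, '..') !== false) {
        return null;
    }

    // One job file per image, so repeated requests do not queue duplicates.
    $job = $queueDir . '/' . sha1($path) . '.job';
    if (!is_file($job)) {
        file_put_contents($job, $path, LOCK_EX);
    }
    return $job;
}

// When called through the web server: queue the work, then send the placeholder.
if (PHP_SAPI !== 'cli' && isset($_GET['requestedFile']) && is_string($_GET['requestedFile'])) {
    queue_resize($_GET['requestedFile'], __DIR__ . '/queue');
    header('Content-Type: image/jpeg');
    header('Cache-Control: no-store'); // so the real image shows once it exists
    readfile(__DIR__ . '/placeholder.jpg');
}
```

    With this arrangement PHP is only involved when an image is actually missing; requests for images that already exist are served by Apache directly.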

    I'd personally go with option #1, but it gives you discussion opportunities in your team.

  19. #19
    SitePoint Guru
    Indeed. I like option #1 too. Thanks

