  1. #1
    SitePoint Wizard Nikolas's Avatar
    Join Date
    Feb 2005
    Location
    Greece
    Posts
    1,221

    An introduction to caching with PHP

    Introduction

    In this article I will try to explain what custom caching with PHP is, why you would want it, and how to use it.

    These days most sites are database driven. That means that your site is actually an application that retrieves data from a DBMS (database management system, e.g. MySQL), processes the data, and shows the result to the user. Most of this data changes rarely or not at all; the reason we use a database is that it lets us update the site and its content easily.

    A problem with this process is server overhead. Every time we execute a query, our script calls the DBMS and then waits for the DBMS to send back the results. This takes time, and for sites with heavy traffic it becomes a real problem.

    How can we solve this problem?

    There are two ways to make your site faster. The first is to optimize the queries, which this article does not cover. The second, and often the more effective, is to use some kind of custom caching technique.

    Custom caching with PHP

    First let me explain the idea behind custom caching. When we have dynamic pages whose data is not updated frequently, we can use a 'system' that creates the page once and then stores it for later use. After the page has been created, our application does not run the queries again to display it; it shows the cached copy instead. Of course, this system must keep each cached page only for a time period that we set.

    Let's code it

    Here is a simple class that will do the job. Let's see the code first:

    PHP Code:
    <?php
    class cache
    {
        var $cache_dir = './tmp/cache/'; // Directory where the cache files are stored
        var $cache_time = 1000;          // How long to keep a cache file, in seconds
        var $caching = false;
        var $file = '';

        // PHP 8 no longer calls the PHP 4-style constructor cache() automatically,
        // so delegate to it explicitly.
        function __construct()
        {
            $this->cache();
        }

        function cache()
        {
            // Constructor of the class
            $this->file = $this->cache_dir . urlencode( $_SERVER['REQUEST_URI'] );
            // filemtime() is used here instead of fileatime(): the access time can be
            // refreshed by every read, so a frequently served file might never expire.
            if ( file_exists( $this->file ) && ( filemtime( $this->file ) + $this->cache_time ) > time() )
            {
                // Grab the cache:
                $handle = fopen( $this->file, "r" );
                do {
                    $data = fread( $handle, 8192 );
                    if ( strlen( $data ) == 0 ) {
                        break;
                    }
                    echo $data;
                } while ( true );
                fclose( $handle );
                exit();
            }
            else
            {
                // Create cache:
                $this->caching = true;
                ob_start();
            }
        }

        function close()
        {
            // You should have this at the end of each page
            if ( $this->caching )
            {
                // You were caching the contents, so display them and write the cache file
                $data = ob_get_clean();
                echo $data;
                $fp = fopen( $this->file, 'w' );
                fwrite( $fp, $data );
                fclose( $fp );
            }
        }
    }

    // Example:
    $ch = new cache();
    echo date("D M j G:i:s T Y");
    $ch->close();
    ?>
    Now let me explain:

    function cache()

    This is the (PHP 4-style) constructor of the class. Its job is to check whether a cached file already exists for the requested page or whether one has to be created. Here is how this is done:

    $this->file = $this->cache_dir . urlencode( $_SERVER['REQUEST_URI'] );

    This line builds the file name of our cached page, so the cached file will be something like /path/to/cache/dir/request_uri

    if ( file_exists ( $this->file ) && ( filemtime ( $this->file ) + $this->cache_time ) > time() )

    Here we check whether there is a cached version of this page and whether it has expired and must be recreated. The check should use filemtime() (last modification) rather than fileatime() (last access), because reading a cached file can refresh its access time, so a busy page might never expire. If a valid cached file exists, the constructor outputs the cached page and then exits. I will explain later why it exits. If the cache file must be created, this code is executed:

    $this->caching = true;
    ob_start();

    The first statement tells the close() function that a cache file is being created, and ob_start() starts buffering the output. The close() function later uses the buffered data to save the cache file.

    function close()

    This function must be called at the end of your script, and it does the rest of the job. It is only needed when we are in the process of caching, which is why it starts with the statement if ( $this->caching )
    Let me explain what happens here:

    $data = ob_get_clean();

    Here we read all the data from the output buffer and discard the buffer, storing the data in the $data variable. The four statements that follow output the data and then write the cache file.
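
    The buffering step can be tried in isolation. This is only a minimal sketch: capture_output() is a hypothetical helper, not part of the class above, and it assumes the page can be produced by a callback.

```php
<?php
// Hypothetical helper (not part of the cache class): run a page-rendering
// callback and return everything it echoed as a string.
function capture_output(callable $render)
{
    ob_start();             // from here on, echo goes into a buffer, not to the client
    $render();              // anything echoed by the callback lands in the buffer
    return ob_get_clean();  // return the buffer contents and discard the buffer
}

$html = capture_output(function () {
    echo "<h1>Hello</h1>";
    echo "<p>Generated once, stored for later.</p>";
});

// $html now holds the whole page; it can be echoed to the visitor
// and written to a cache file, as close() does.
echo strlen($html), " bytes captured\n";
```

    Because nothing reaches the browser until the buffer is released, the same string can be both shown and saved, which is what close() relies on.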

    Troubleshooting

    This is a very simple class, and its purpose is to show how you can implement a caching solution for your site. The limitation of this class is that you must use it only in this form:

    PHP Code:
    <?php
     $a = new cache();
     ....
     ....
     ....
     $a->close();
    ?>
    If you have code after the $a->close() statement, the class will not work correctly: when a cached copy is served, the exit() statement in the cache() function ends the script, so that code never runs.

    Of course you can take this code and adapt it to your own needs.

    A quick solution is to remove the exit() statement from the cache() function and then use the class this way:

    PHP Code:
    <?php
     $a = new cache();
     if ( $a->caching )
     {
     ....
     ....
     ....
     }
     $a->close();
    ?>

    * You can publish this article on your site, but only if you give credit and a link to http://www.webdigity.com/ Thanks

  2. #2
    SitePoint Addict dek's Avatar
    Join Date
    Oct 2004
    Location
    UK
    Posts
    352
    Essential knowledge.
    Another wrinkle on this is the 'publish' model, where the site is published (either periodically, or after a set of changes have been made) to create a set of static pages, which the webserver serves directly. It's a little more involved, but does result in an even faster site.
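
    A rough sketch of what such a publish step might look like, under stated assumptions: render_page() and publish_site() are hypothetical names, and the page data is hard-coded here where a real site would query its database.

```php
<?php
// Hypothetical page source: a real site would build this from database queries.
function render_page($slug)
{
    $pages = array(
        'index' => '<h1>Home</h1>',
        'about' => '<h1>About us</h1>',
    );
    return isset($pages[$slug]) ? $pages[$slug] : null;
}

// Write every known page to $out_dir as a static .html file.
// Returns the list of paths that were written.
function publish_site(array $slugs, $out_dir)
{
    if (!is_dir($out_dir)) {
        mkdir($out_dir, 0755, true);
    }
    $written = array();
    foreach ($slugs as $slug) {
        $html = render_page($slug);
        if ($html === null) {
            continue; // unknown page, nothing to publish
        }
        $path = rtrim($out_dir, '/') . '/' . $slug . '.html';
        // Write to a temporary file and rename it, so a visitor
        // never receives a half-written page.
        $tmp = $path . '.tmp';
        file_put_contents($tmp, $html);
        rename($tmp, $path);
        $written[] = $path;
    }
    return $written;
}

$out = sys_get_temp_dir() . '/published_' . uniqid();
$files = publish_site(array('index', 'about', 'missing'), $out);
echo count($files), " pages published to $out\n";
```

    The script would be run periodically or after a batch of edits; the web server then serves the .html files without touching PHP or the database.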
    Only dead fish go with the flow

  3. #3
    Non-Member
    Join Date
    Oct 2005
    Posts
    205
    Does this script create an original "cache" file then keep overwriting the same file?

    If so, this is exactly what I needed. If not how could it be done? I want to be able to store the last say 100 dynamic pages as static....

    Thanks

  4. #4
    SitePoint Wizard Nikolas's Avatar
    Join Date
    Feb 2005
    Location
    Greece
    Posts
    1,221
    That's exactly what this script does. You set the time between updates, and you are done

  5. #5
    SitePoint Evangelist Will Kelly's Avatar
    Join Date
    May 2005
    Location
    London
    Posts
    475
    Having recently looked into caching (and written my own app), I found time-based cache updates to be next to useless for most applications! If content is updated, the cache should be updated there and then (or on the next page load), not after a set time. Otherwise content updates are not reflected straight away, and the cache is updated unnecessarily when no changes have been made. Just my two cents.

  6. #6
    SitePoint Evangelist luxinterior's Avatar
    Join Date
    Aug 2004
    Location
    Here, there and everywhere!
    Posts
    458
    Thanks for that, Nikolas. Caching is something I've been meaning to implement on a few of my sites for ages, and your article was just the kick in the *** I needed

    Cheers!

    Lux

  7. #7
    SitePoint Wizard Nikolas's Avatar
    Join Date
    Feb 2005
    Location
    Greece
    Posts
    1,221
    Hey luxinterior, welcome


    Will Kelly, there is a simple solution to your problem: you can just delete the cache files on every update, from your data input/update script.
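
    As a sketch of that idea, assuming the cache file name is built as in the class (cache directory plus urlencode() of the request URI): cache_path() and invalidate_page() are hypothetical helpers that an update script could call after saving a change.

```php
<?php
// Assumed to match the class: cache file name = cache dir + urlencode(request URI).
function cache_path($cache_dir, $request_uri)
{
    return $cache_dir . urlencode($request_uri);
}

// Hypothetical helper to call from the update script after saving a change.
// Returns true if a cached copy existed and was removed.
function invalidate_page($cache_dir, $request_uri)
{
    $file = cache_path($cache_dir, $request_uri);
    if (is_file($file)) {
        return unlink($file);
    }
    return false; // nothing was cached for this page
}

// Demo in a throwaway directory.
$dir = sys_get_temp_dir() . '/cache_demo_' . uniqid() . '/';
mkdir($dir, 0755, true);
file_put_contents(cache_path($dir, '/news.php?id=7'), '<p>old article</p>');

$removed = invalidate_page($dir, '/news.php?id=7'); // article 7 was just edited
$again   = invalidate_page($dir, '/news.php?id=7'); // cached copy is already gone
var_dump($removed, $again); // bool(true) bool(false)
```

    The next request for that page finds no cache file, so the class regenerates and stores it.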

  8. #8
    SitePoint Evangelist Will Kelly's Avatar
    Join Date
    May 2005
    Location
    London
    Posts
    475
    Yes I know that, having recently written my own caching system...

    What I'm saying is that your example and many others on this site and elsewhere (such as PEAR Cache_Lite) by default use time to update the cache, which to me doesn't make sense: content is hardly ever updated at set intervals, so the timer is redundant in the first place.

    Therefore why keep promoting that method? It just seems the wrong way of going about it. Any thoughts? discussion?

  9. #9
    SitePoint Wizard Nikolas's Avatar
    Join Date
    Feb 2005
    Location
    Greece
    Posts
    1,221
    Sure. Here is an example of the same class, slightly modified:

    PHP Code:
    <?php
    class cache
    {
        var $cache_dir = './tmp/cache/'; // Directory where the cache files are stored
        var $cache_by_time = false;      // Set this to false if you don't want to update the cache every $cache_time seconds
        var $cache_time = 1000;          // How long to keep a cache file, in seconds
        var $caching = false;
        var $file = '';

        // PHP 8 no longer calls the PHP 4-style constructor cache() automatically,
        // so delegate to it explicitly.
        function __construct()
        {
            $this->cache();
        }

        function cache()
        {
            // Constructor of the class
            $this->file = $this->cache_dir . urlencode( $_SERVER['REQUEST_URI'] );
            // Serve the cached file if it exists and either never expires
            // ($cache_by_time is false) or has not expired yet.
            // filemtime() is used instead of fileatime(), which reads can refresh.
            if ( file_exists( $this->file ) && ( !$this->cache_by_time || ( filemtime( $this->file ) + $this->cache_time ) > time() ) )
            {
                // Grab the cache:
                $handle = fopen( $this->file, "r" );
                do {
                    $data = fread( $handle, 8192 );
                    if ( strlen( $data ) == 0 ) {
                        break;
                    }
                    echo $data;
                } while ( true );
                fclose( $handle );
                exit();
            }
            else
            {
                // Create cache:
                $this->caching = true;
                ob_start();
            }
        }

        function close()
        {
            // You should have this at the end of each page
            if ( $this->caching )
            {
                // You were caching the contents, so display them and write the cache file
                $data = ob_get_clean();
                echo $data;
                $fp = fopen( $this->file, 'w' );
                fwrite( $fp, $data );
                fclose( $fp );
            }
        }
    }

    // Example:
    $ch = new cache();
    echo date("D M j G:i:s T Y");
    $ch->close();
    ?>

    By using the extra property $cache_by_time you can decide whether the file is refreshed every n seconds or never refreshed at all (until you delete it).
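
    One way to read the intended behaviour of that flag is sketched below. cache_is_fresh() is a hypothetical stand-alone function, and it assumes that a false $cache_by_time means "never expire"; the timestamps are made-up values.

```php
<?php
// Hypothetical helper mirroring the check in the constructor.
// $mtime: when the cache file was written, $now: current time (both Unix timestamps).
function cache_is_fresh($mtime, $now, $cache_time, $cache_by_time)
{
    if (!$cache_by_time) {
        return true; // expiry switched off: serve the file until someone deletes it
    }
    return ($mtime + $cache_time) > $now;
}

$now = 1000000;
var_dump(cache_is_fresh($now - 500,  $now, 1000, true));  // bool(true): written 500 s ago
var_dump(cache_is_fresh($now - 5000, $now, 1000, true));  // bool(false): expired
var_dump(cache_is_fresh($now - 5000, $now, 1000, false)); // bool(true): timer ignored
```

    Keeping the rule in a small function like this makes it easy to test the two modes without touching the file system.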

  10. #10
    SitePoint Evangelist Will Kelly's Avatar
    Join Date
    May 2005
    Location
    London
    Posts
    475
    Thanks but I don't want code, I was just trying to discuss caching itself!

    (and btw I think my point still stands, time based is still redundant in that example.)

  11. #11
    SitePoint Wizard Nikolas's Avatar
    Join Date
    Feb 2005
    Location
    Greece
    Posts
    1,221
    Hehe. Right. I just posted that in case someone wants it.

    I think that when you have a site with no data updates at all, the database is useful only for storing your data. As your pages are effectively static, serving them through a dynamic system is a convenience for the programmer but not for the user, because it produces overhead.

    So my opinion is to use a cache system for these pages, as you will get a better server response. For instance, I own a portal site ( http://www.topsites.gr ) which has a news section.

    Once an article is added, it won't be updated again. It would be easy to do something like "SELECT ...... WHERE id = ....", but as there are more than 10,000 records in the news table and the server has some traffic, it would be too slow to serve the pages that way. The real problem comes when one or two bots are crawling the site: that can even crash the server.

    So I suppose that caching can really help in these situations, and the caching process will not slow down the load time, even on the first request when the page is cached.

  12. #12
    Keep it simple, stupid! bokehman's Avatar
    Join Date
    Jul 2005
    Posts
    1,935
    The only realistic thing to do is delete the cache when the database is updated, either selectively or globally; then no timed updates would be needed.
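
    The global variant could look roughly like this. clear_cache() is a hypothetical helper, and the sketch assumes all cache files sit directly in one directory, as they do in the class from the first post.

```php
<?php
// Hypothetical helper: delete every file in the cache directory.
// Returns the number of files removed.
function clear_cache($cache_dir)
{
    $deleted = 0;
    $files = glob(rtrim($cache_dir, '/') . '/*');
    foreach ($files ? $files : array() as $file) {
        if (is_file($file) && unlink($file)) {
            $deleted++;
        }
    }
    return $deleted;
}

// Demo in a throwaway directory with three cached pages.
$dir = sys_get_temp_dir() . '/cache_clear_' . uniqid();
mkdir($dir, 0755, true);
foreach (array('a', 'b', 'c') as $name) {
    file_put_contents($dir . '/' . urlencode('/page.php?id=' . $name), 'cached');
}

$count = clear_cache($dir);
echo "$count cache files deleted\n"; // 3 cache files deleted
```

    A global clear is the simplest thing to call after a layout change; for a single edited record, deleting only that page's file avoids regenerating everything.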

  13. #13
    SitePoint Wizard Nikolas's Avatar
    Join Date
    Feb 2005
    Location
    Greece
    Posts
    1,221
    bokehman, that's true.

    I actually use all of the methods mentioned for my sites. I have pages that will not be updated until the file is deleted, others that are cached on a time-frame basis, and others where the cache is a PHP file (the cached file executes some code, e.g. <?echo date("Y , m, d");?>).

  14. #14
    SitePoint Evangelist Will Kelly's Avatar
    Join Date
    May 2005
    Location
    London
    Posts
    475
    Nikolas, you haven't really addressed the subject I was talking about! You are describing a situation that doesn't need timed caching. You've created the article and it's cached; unless you update the page layout or change the data, it never needs to be updated. So why promote a caching system that will unnecessarily update that cache file every X days (or not update it when you make a change)?

    Quote Originally Posted by Nikolas
    I actually use all of the methods mentioned for my sites. I have pages that will not be updated until the file is deleted, others that are cached on a time-frame basis, and others where the cache is a PHP file (the cached file executes some code, e.g. <?echo date("Y , m, d");?>).
    This probably needs to be included in your example/tutorial, as it seems incomplete without it (or would that be part II?)

  15. #15
    SitePoint Wizard Nikolas's Avatar
    Join Date
    Feb 2005
    Location
    Greece
    Posts
    1,221
    For example, take a look at the http://www.thetopsites.net/ directory.

    On the category pages (e.g. http://www.thetopsites.net/sublime%2...ism_2_1_1.html ), the only things that need to change on a timed basis are the date in the top header and the menu (because the onMouseOver data can actually change).

    The solution for this is to create a cache page that is actually a PHP page. The cached page is then not just printed to the screen but included by the caching script.

    On the other hand, on a listing page ( e.g. http://www.thetopsites.net/sites/?24.2.1.HumanWorks.Gr ) there is data that can change regularly. For example, the rating of a site can change or a review can be added, so it needs a mixture of timed caching and the deletion approach we talked about (if a new review is added, the cache file is deleted; otherwise it continues to exist until it expires).
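
    A minimal sketch of that approach, with made-up content and a temporary file name chosen only for the demo: the expensive part of the page is stored as plain HTML, while the date stays in the cache file as live PHP.

```php
<?php
// Demo-only file name; a real script would derive it from the request.
$cache_file = sys_get_temp_dir() . '/category_' . uniqid() . '.php';

// The expensive part (e.g. the category listing) is built once and stored as HTML.
$static_body = '<ul><li>Site A</li><li>Site B</li></ul>';

// The date is written into the cache file as PHP code, not as a fixed string.
$template = '<p>Today is <?php echo date("Y-m-d"); ?></p>' . "\n" . $static_body;
file_put_contents($cache_file, $template);

// Serving: include runs the embedded PHP, so the date is current on every hit
// while the listing comes straight from the file.
ob_start();
include $cache_file;
$page = ob_get_clean();
echo $page;
```

    One caution with this design: anything written into an included cache file is executed as PHP, so user-supplied content (reviews, titles) has to be escaped before it is stored there.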

  16. #16
    Non-Member
    Join Date
    Oct 2005
    Posts
    205

    Nice script, take a look at mine please!

    Quote Originally Posted by webhead2
    Does this script create an original "cache" file then keep overwriting the same file?

    If so, this is exactly what I needed. If not how could it be done? I want to be able to store the last say 100 dynamic pages as static....

    Thanks
    I am biting my tongue.... what I meant to say is "this is not what I needed." I would like to be able to cache 100+ versions of my pages.

    Keep in mind my goal of doing this is not to speed up the website.....
    I just want a directory full of static pages, so that Google LOVES it!

    I have a "cache wrapper" script that I modified, and it works decently. It creates a static page named with an md5 hash, then appends the model, make and year of my equipment.

    So the end file is something like:

    903b56ba9d422d9ad2c73dfa1b0a6f48_1996_Grove_MZ71CXT.htm

    Which is really awesome. My only problem is that I don't need the md5 hash. I'd be perfectly happy with a file name like:

    "_1996_Grove_MZ71CXT.htm"

    Am I correct in assuming Google will like this filename a LOT more?

    I have tried to recode this line, omitting the md5 hash, but I always get empty files or other errors.

    $cachefile = $cachedir . md5($page) . '_' . '.' . $cacheext; // Cache file to either load or create.

    Here is my complete code..

    Code:
    <?php
    
      // Settings
      $cachedir = '/home/********public_html/***/cache/'; // Directory to cache files in (keep outside web root)
      //$cachetime = 10; // Seconds to cache files for
      $cacheext = 'htm'; // Extension to give cached files (usually cache, htm, txt)
    
      // Ignore List
      $ignore_list = array(
        'www.high-lift.com/***/cache/style.css',
        'www.high-lift.com/***/cache/show_image.php',
        'www.high-lift.com/***/cache/config.php',
        'www.high-lift.com/***/cache/images/'
      );
    
      // Script
      $page = 'http://' . $_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI']; // Requested page
      $cachefile = $cachedir . md5($page) . '_' . '.' . $cacheext; // Cache file to either load or create
    
      $ignore_page = false;
      for ($i = 0; $i < count($ignore_list); $i++) {
        $ignore_page = (strpos($page, $ignore_list[$i]) !== false) ? true : $ignore_page;
      }
    
     // If we're still here, we need to generate a cache file
    
      ob_start();
    
    
    ?>





    Code:
    <?php
    
      $newfile = $cachedir . md5($page) . '_' . $year . '_' . $make . '_' . $model .  '.' . $cacheext;
    
      $fp = @fopen($newfile, 'w');
         echo"$cachefile <br> $page";
      // save the contents of output buffer to the file
      @fwrite($fp, ob_get_contents());
      @fclose($fp);
    
      ob_end_flush();
    
    //
    
    
    ?>

  17. #17
    SitePoint Wizard Nikolas's Avatar
    Join Date
    Feb 2005
    Location
    Greece
    Posts
    1,221
    About Google: it doesn't matter whether the page is static or dynamic (that was a long time ago).

    What matters for Google is the filename. It picks up the words contained in it as keywords.

    As for your script, if you get rid of the md5, you should use urlencode so that spaces and other special URL characters like &, / etc. are encoded.
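
    A sketch of how such a file name could be built. cache_filename() is a hypothetical helper, the second listing is invented for illustration, and the approach assumes that year, make and model together are unique, since the md5 prefix is what previously guaranteed distinct names.

```php
<?php
// Hypothetical helper: build a readable cache file name from the listing's fields.
function cache_filename($year, $make, $model, $ext = 'htm')
{
    $parts = array();
    foreach (array($year, $make, $model) as $part) {
        // urlencode() escapes spaces, '/', '&' etc. so they cannot break the path
        $parts[] = urlencode(trim($part));
    }
    return implode('_', $parts) . '.' . $ext;
}

echo cache_filename('1996', 'Grove', 'MZ71CXT'), "\n"; // 1996_Grove_MZ71CXT.htm
echo cache_filename('2001', 'JLG', '450 A/J'), "\n";   // 2001_JLG_450+A%2FJ.htm
```

    The result would replace the md5($page) part when the cache file path is assembled; the encoding matters mostly for the slash, which would otherwise be read as a directory separator.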

  18. #18
    Non-Member
    Join Date
    Oct 2005
    Posts
    205
    Quote Originally Posted by Nikolas
    About Google, it doesn't matter if the page is static or dynamic ( that was a long time ago )

    How can you say that? That's not true at all... anyone?

    I have a page details.php that pulls a huge amount of info from the database.

    My cache script captures the page as it is rendered and then saves it. Then, it's going to write a link to an file of the file it just created.

    How else would google see the details of the equipment?

    Can google index MySQL now? lol

  19. #19
    SitePoint Wizard Nikolas's Avatar
    Join Date
    Feb 2005
    Location
    Greece
    Posts
    1,221
    For Google it doesn't matter if the uri is a.php?t=blabla or if it is blalbla.html

    This is from here

    Fiction: Sites aren't included in Google's index if they use ASP (or some other non-html file type.)
    Fact:
    At Google, we're able to index most types of pages and files with very few exceptions. A sampling of the file extensions we're able to index includes: pdf, asp, jsp, html, shtml, xml, doc, xls, ppt, rtf, wks, lwp, wri, swf, cfm, and php.

  20. #20
    Non-Member
    Join Date
    Oct 2005
    Posts
    205
    Quote Originally Posted by Nikolas
    For Google it doesn't matter if the uri is a.php?t=blabla or if it is blalbla.html

    This is from here
    You missed my point. I already know the above. What I said was that Google can't reference VALUABLE data stored in a DATABASE without some sort of caching system.

    In my situation I had a SINGLE PHP file that was responsible for getting all of the information from my database.

    Of course Google can see this file!!! It just can't see the information that the file is supposed to RETRIEVE.

    I just finished a custom cache system that even builds the sitemap.xml file dynamically.

    Think before you speak

  21. #21
    SitePoint Wizard Nikolas's Avatar
    Join Date
    Feb 2005
    Location
    Greece
    Posts
    1,221
    I really don't understand what you mean. Sorry, my English really sucks.

    About your caching system, may I ask how you implement it?

    Are you running a cron job to create the files, or using .htaccess?

    I used a cron job for my directory, but once the site had a lot of pages it became impossible to continue this way, because creating the files took more than 2 hours. So I made an .htaccess-based caching system.

  22. #22
    Keep it simple, stupid! bokehman's Avatar
    Join Date
    Jul 2005
    Posts
    1,935
    Quote Originally Posted by Nikolas
    I really don't understand what you mean.
    Basically what he is saying is that if the data is stored in a database and accessed using a query string (not a real file), Google can't find it with a directory read. Since Google doesn't "brute force" query strings, how would Googlebot know the data existed in the first place? The only way is by luck: i.e. it has crawled another page with an anchor to the query string.

  23. #23
    Non-Member
    Join Date
    Oct 2005
    Posts
    205
    Quote Originally Posted by bokehman
    Basically what he is saying is that if the data is stored in a database and accessed using a query string (not a real file), Google can't find it with a directory read. Since Google doesn't "brute force" query strings, how would Googlebot know the data existed in the first place? The only way is by luck: i.e. it has crawled another page with an anchor to the query string.
    bokehman, yeah, you understand me.

    However
    The only way is by luck: i.e. it has crawled another page with an anchor to the query string.
    that is one way but not the only way.

    The other way is to save the resulting page as a static one when it's resolved. It also helps to give the file itself a specific name.

    For an extra measure.... then have PHP append to your sitemap.xml file!

    Go to www.high-lift.com/cache...... these are all static pages that google would have otherwise never seen. It's still in testing phase, but looks very promising.
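
    Appending to the sitemap from PHP might be sketched like this. sitemap_add_url() is a hypothetical helper, the example.com URLs are placeholders, and the code assumes a simple sitemap with one <url><loc> entry per page and a single closing </urlset> tag.

```php
<?php
// Hypothetical helper: add one URL to a sitemap file, creating the file if needed.
// Returns true if the URL was added, false if it was already listed.
function sitemap_add_url($sitemap_file, $url)
{
    $entry = '  <url><loc>' . htmlspecialchars($url, ENT_QUOTES | ENT_XML1) . "</loc></url>\n";
    if (is_file($sitemap_file)) {
        $xml = file_get_contents($sitemap_file);
        if (strpos($xml, $entry) !== false) {
            return false; // already listed, leave the file alone
        }
        // Insert the new entry just before the closing tag.
        $xml = str_replace('</urlset>', $entry . '</urlset>', $xml);
    } else {
        $xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
             . '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n"
             . $entry . '</urlset>' . "\n";
    }
    file_put_contents($sitemap_file, $xml, LOCK_EX);
    return true;
}

// Demo with placeholder URLs and a temporary file.
$sitemap = sys_get_temp_dir() . '/sitemap_' . uniqid() . '.xml';
$first  = sitemap_add_url($sitemap, 'http://www.example.com/cache/1996_Grove_MZ71CXT.htm');
$second = sitemap_add_url($sitemap, 'http://www.example.com/cache/2001_JLG_450.htm');
$dupe   = sitemap_add_url($sitemap, 'http://www.example.com/cache/2001_JLG_450.htm');
echo file_get_contents($sitemap);
```

    Calling the helper right after a cache file is written keeps the sitemap in step with the static pages, and the duplicate check means regenerating a page does not list it twice.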

