Why do we create 'files'? "fopen($filename, 'w')"

Hello Sitepoint PHP,

Quick question regarding the use of files: creating, reading, writing, etc. As I understand it, you can create text files to store information and then read or append to them later on.

How is this function used in php development?
Could you not use MySQL to store the same information?

I searched online to see if there were any posts explaining why and when this function is used, but turned up nothing except tutorials on how to call it.

Thank you,
Chris

It all depends on what you intend to do with the files afterwards. If the file you are creating is intended to be downloaded and read into another program then saving the information in a database is not an option as you still need to create the actual file before it can be downloaded.

Are you seriously saying that files have absolutely no use?

What about logs? Do you want to make a DB query every time you need to log (i.e. every page)? Sessions? XML? Queued emails? Configuration? Uploaded files?

This can’t be a serious question.

It’s a perfectly reasonable question, I think. Of course you can do stacks of stuff with a database, including storing files. The heart of this question seems to be more about why you might choose to do one or the other.

Let’s try and shed some light on the pros and cons of each approach before we dismiss the whole idea out of hand.

Thank you raena, I am new to PHP so initially I was curious as to what tasks would be better suited for file creation rather than using a database.

Aliendev, could you perhaps elaborate a bit on some of your examples?

Sure.

  • Logs - you virtually always want to log at least 1 piece of info about every page hit. That could be IP, referer, etc. Database queries are slow, so adding 1 query to every page load will slow your website significantly with high traffic volumes. Also, these are literally never searched in the way tables are.
  • Sessions - Same as above. The data they hold is undefined and may be big and multidimensional, so there is no point trying to put it in a DB.
  • Configuration - This will typically contain (among other things) the database connection details, so those obviously have to be in a file (XML, YML, whatever). Also, these files will grow to have many different keys and structures.
  • Uploaded files - There are some people who think storing images in a database is a good idea, but they are idiots, so ignore them.

Basically, anything that you don’t NEED to do joins/searches on, or that isn’t an exactly defined data set, shouldn’t be stored in the database without good reason.
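To make the logging case concrete, here is a minimal sketch of appending one line per page hit to a plain text file. The function name `logHit`, the log path and the tab-separated layout are all made up for illustration; they are not from anyone's actual code in this thread.

```php
<?php
// Hypothetical helper: append one tab-separated line per page hit.
function logHit(string $logFile, string $ip, string $referer): void
{
    $line = sprintf("%s\t%s\t%s\n", date('c'), $ip, $referer);

    // FILE_APPEND writes at the end of the file instead of overwriting it.
    // LOCK_EX takes an exclusive lock so concurrent requests don't interleave lines.
    file_put_contents($logFile, $line, FILE_APPEND | LOCK_EX);
}

// Assumed location; a real site would use its own log directory.
$logFile = sys_get_temp_dir() . '/example-hits.log';

logHit(
    $logFile,
    $_SERVER['REMOTE_ADDR'] ?? '127.0.0.1',
    $_SERVER['HTTP_REFERER'] ?? '-'
);
```

Nothing here needs a connection, a schema or a query: it is one append per request, which is roughly what the "logs" bullet is getting at.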

Actually, here is an example I have on hand. It records who is active on the website and when, to create a “currently online” feature. Doing this in a DB is of course possible but adds unnecessary overhead when a simple file works.


class LatestOnlineTable extends Singleton
{
    private $file, $list;


    protected function __construct()
    {
        $this->file = '/data/latest-online';
        $data = (is_file($this->file))
            ? file_get_contents($this->file)
            : false;

        $this->list = ($data !== false)
            ? unserialize($data)
            : array();

        $this->purgeOld(600);
    }


    /**
     * Remove entries that are now past the expiry limit.
     *
     * @param integer $secondLimit
     */
    private function purgeOld($secondLimit)
    {
        $now = time();

        foreach ($this->list as $username => $time)
        {
            // Remove entry if old
            if ($time < ($now - $secondLimit))
            {
                unset($this->list[$username]);
            }
        }
    }


    /**
     * Set a username's timestamp to right now.
     *
     * @param string $username
     */
    public function touch($username)
    {
        $this->list[$username] = time();
    }


    /**
     * Get a list of usernames that were recently active.
     *
     * @return array
     */
    public function getActiveMembers()
    {
        return array_keys($this->list);
    }


    public function __destruct()
    {
        // LOCK_EX stops two requests from writing the file at the same moment.
        file_put_contents($this->file, serialize($this->list), LOCK_EX);
    }
}

To me it’s about how often the data is going to change and be needed. For example:
I have an RSS feed that reads from a (relatively small - 12KB) CSV file, the data doesn’t change that often so it’s easy for me to edit-upload the file when necessary.
One of my plugins writes to log files. They may be appended to several times during a day, but after that day they’re essentially static. Only the admin may need to read them later.
WordPress uses a database for most everything except the config file, yet if someone wants to speed things up, it’s recommended to cache (create files for) the post pages.
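For the small CSV case, reading the file takes only a few lines. This is a rough sketch, assuming the first line of the file holds the column names; the function name `readCsv`, the sample columns and the temp path are invented for the example.

```php
<?php
// Hypothetical helper: read a small CSV file into an array of associative rows.
function readCsv(string $path): array
{
    $rows = [];
    if (!is_readable($path)) {
        return $rows;
    }

    $handle = fopen($path, 'r');
    if ($handle === false) {
        return $rows;
    }

    // First line is assumed to be the header row.
    $header = fgetcsv($handle, 0, ',', '"', '\\');
    if (is_array($header)) {
        while (($fields = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
            // Skip blank or malformed lines that don't match the header width.
            if (count($fields) === count($header)) {
                $rows[] = array_combine($header, $fields);
            }
        }
    }

    fclose($handle);
    return $rows;
}

// Sample data written to an assumed temp location so the sketch runs on its own.
$path = sys_get_temp_dir() . '/example-feed.csv';
file_put_contents($path, "title,link\nFirst post,/first\nSecond post,/second\n");

$items = readCsv($path);
```

Editing the feed then means editing and re-uploading one text file, with no schema to maintain.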

IMHO I don’t consider optimized queries to an optimized database to be slow at all, yet there is always the question of a possible database error as opposed to a possible filesystem error. I don’t know that one is all that much faster or more reliable than the other, but it’s something to consider. Probably the real bottleneck is poorly written PHP code.

Not true. Using mod_rewrite and the proper header() calls to get the MIME type right, PHP can, in theory, output everything from the database without ever creating a file. It’s always going to be slower than outputting files, though, which brings up the more pertinent question of whether it should be done. Because it certainly can be done.

Static is always faster than dynamic. Doesn’t matter the language, and no amount of skill on the programmer’s part is going to come within spitting distance of closing the gap.

When I built my own framework I designed it so that mod_rewrite passes control to PHP only when the file doesn’t exist. The .htdocs folder of the project becomes my main cache area - the framework php files themselves are all stored elsewhere with only one PHP file in the .htdocs folder called pamwf.php which the mod_rewrite code points to (and if I could bypass the creation of said file I would). This way when possible static file sends are used at all times.

I wasn’t talking about HTTP requests for static files vs. HTTP requests for dynamically created files. I meant a dynamically created file pulling data by reading a file vs. querying a database. (well, in the beginning and last part of the reply). As said, to speed WordPress up, ditch the dynamic db pages and create static cached pages (middle of the reply).

:blush: Now even I’m getting confuzzled.

That’s what I thought you meant and you’re right - doing a file passthrough is not likely to be any faster or slower than using a database unless the file is the end result of a LOT of queries (say, more than 10). But if you’re able to build a cache file why not put it where Apache can statically read it and dodge PHP entirely?

That’s the approach I use with this caveat - I still use PHP snippets to handle file expiry, taking advantage of PHP’s ability to write a file it can eval as code later. While this does involve using the parser, I can dodge my framework startup (starting my framework involves reading at least 10 files because it is object-oriented) and the overhead associated with that.
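I can't show that framework's actual code, but the general idea of "write a file PHP can run later, with an expiry" can be sketched as below. Note the swap: this uses `var_export()` plus `include` rather than `eval`, and the function names, the `expires`/`data` layout and the paths are all assumptions for the example.

```php
<?php
// Hypothetical helpers: store data as a small PHP file that returns an array.
function writeCache(string $cacheFile, array $data, int $ttlSeconds): void
{
    $payload = ['expires' => time() + $ttlSeconds, 'data' => $data];

    // var_export() emits valid PHP source for arrays and scalar values.
    $code = '<?php return ' . var_export($payload, true) . ';' . "\n";
    file_put_contents($cacheFile, $code, LOCK_EX);
}

function readCache(string $cacheFile): ?array
{
    if (!is_file($cacheFile)) {
        return null;
    }

    // include runs the file and hands back whatever it returns.
    $payload = include $cacheFile;

    if (!is_array($payload) || $payload['expires'] < time()) {
        return null; // missing, malformed or expired: caller rebuilds it
    }

    return $payload['data'];
}

// Assumed temp locations so the sketch runs on its own.
$fresh = sys_get_temp_dir() . '/example-cache-fresh.php';
$stale = sys_get_temp_dir() . '/example-cache-stale.php';

writeCache($fresh, ['title' => 'Home', 'hits' => 3], 300);
writeCache($stale, ['title' => 'Old'], -1); // already expired

$hit  = readCache($fresh);
$miss = readCache($stale);
```

Reading the cache costs one small parse and skips whatever work went into building the data. Only do this with data you generated yourself, since the file is executed as code.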

Do keep in mind that checking whether the file exists with mod_rewrite adds some overhead of its own :wink:

The thing that gets me every time I see a file VS database conversation is that people seem to forget that databases store data in files. Technically, a database is nothing more than an application which provides structured access, statistics about, and easy ways to replicate, data that’s in files.

You can’t really compare database engines in general, to basic file system functions. Technically the database engines are going to be using those same file system functions anyways. You can compare database engines, to other applications designed to store and retrieve data from files though.

Where database engines shine is reading. They do so because the engine maintains indexes about where certain types of data are currently kept in files. Whereas basic file system functions don’t really track what’s in the files as much as they do track the files themselves.

These indexes however, put database engines behind basic file system functions by themselves when it comes to writing. While the file system functions can just seek to the file, determine where to start writing, and write, the database engine typically has to update the indexes as well which means looking at the data and determining where to put it along with what indexes need to be altered/recalculated/etc.

To get a picture of how this works, imagine you have someone standing in the back of a moving truck taking boxes from you and placing them in the back of the truck.

Basic filesystem functions would take the box, find somewhere in the back of the truck the box would fit without wasting a bunch of space, and be ready for another box immediately after putting it in that spot.

A database engine, on the other hand, would look in the back of the truck, then look in the box, then look at a categorized list of things that have already been put in the back of the truck, determine if there are enough of a certain type of item already in the truck to trigger lunch time, update the list of items, recalculate the statistics about everything in the back of the truck, and a few other odds and ends before being ready for another box.

This is why you typically don’t see things like server access logs using a database. You might see something that takes previous logs and indexes them using a database engine, but not so much something that logs directly to a database engine.

Imagine that same moving truck, but now imagine there are a dozen people bringing boxes all at the same time instead of just you.

Off Topic:

@joebert, Another night-owl, or early riser?

I like that analogy. Now imagine someone coming and saying “I need the blue dishes back”.

So I guess the moral is “What are you planning on doing with the data?” which goes back to #2 felgall

If you do not want to work with the file system directly using fopen etc., and also do not want to run a database server, then SQLite is a good option. It is a file-based database, so it takes care of many things the way a database does, but stores everything in a file.
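As a rough illustration of the SQLite route, here is the "currently online" idea done through PDO. It assumes the pdo_sqlite extension is enabled; the table name, column names and usernames are invented, and it uses an in-memory database so it runs without touching disk (a DSN such as `sqlite:/path/to/file.db` would use a file instead).

```php
<?php
// Assumes the pdo_sqlite extension is available.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical table: one row per user with the time they were last seen.
$db->exec(
    'CREATE TABLE online (username TEXT PRIMARY KEY, last_seen INTEGER NOT NULL)'
);

// INSERT OR REPLACE updates the timestamp if the user already has a row.
$touch = $db->prepare(
    'INSERT OR REPLACE INTO online (username, last_seen) VALUES (:u, :t)'
);
$touch->execute([':u' => 'chris', ':t' => time()]);
$touch->execute([':u' => 'raena', ':t' => time() - 3600]); // an hour ago

// Everyone seen in the last 10 minutes.
$stmt = $db->prepare(
    'SELECT username FROM online WHERE last_seen >= :cutoff ORDER BY username'
);
$stmt->execute([':cutoff' => time() - 600]);
$active = $stmt->fetchAll(PDO::FETCH_COLUMN);
```

You get SQL queries without running a server, at the cost of pulling in an extension for something a serialized array might already handle.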

My localhost uses MySQL, and in mysql/data/[db name]/ there are these files:
db.opt - containing the config info, e.g.

default-character-set=latin1
default-collation=latin1_general_ci

[table name].frm - binary and text
[table name].MYD - binary and text (the “data”)
[table name].MYI - binary

These files are the database/tables

Uploaded files - There are some people who think storing images in a database is a good idea, but they are idiots, so ignore them.

I would actually argue that it is a good idea to store some images in the database. Personally, I think avatars are a good candidate for an image that would do well in the database. Of course, photos, wallpapers and other images should be on filesystem.

But you have to do that anyway, so your point is? (And mod_rewrite in the httpd.conf file does not incur a noteworthy performance hit unless you care about trillionths of a second.)

Why are you arguing that using a hammer to insert screws is better than using a screwdriver for the task?

You should use the most appropriate tool for each task, and where the most appropriate tool is a file you should create one, rather than using some convoluted approach to avoid it by creating a file with a more complex interface (a database) instead.