Large file uploads

I’m looking forward to building on what I learned in the fifth edition of “PHP & MYSQL: NOVICE TO NINJA” to learn how to handle large file uploads, covering more file types than the book’s examples used. Will this still involve only binary data? If so, then (and I hope I’m wrong) as far as I can tell, binary data doesn’t seem to be one of the more important topics in the sixth edition (though it can’t cover everything). One option is to look for a book that (1) covers how to handle these large files and (2) has more recent information about building websites than the 2012 fifth edition. Hopefully both will be in the same paper book. (I prefer paper books, but I would rely more on websites for learning about code if that would be better for this. By paper, I mean what the pages are made of; what the cover is made of doesn’t matter to me.) There are also PHP frameworks, such as the ones the fifth edition mentions (Zend Framework, CakePHP, and Symfony). Or is there a better way to learn how to work with large files?

I’m not sure of any books that teach this, but the gist is to slice the file up with JavaScript and send the pieces sequentially to the server via XHR requests. The server then stitches these chunks back together.
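As a rough illustration of the slicing step, the sketch below computes the byte ranges a file of a given size would be cut into. chunkRanges is a hypothetical helper, not part of any library; in the browser, each [start, end) pair could be passed to Blob.slice(start, end).

```javascript
// Split a file of `fileSize` bytes into [start, end) byte ranges of at most `chunkSize` bytes.
// chunkRanges is a hypothetical helper name used only for this illustration.
function chunkRanges(fileSize, chunkSize) {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError('chunkSize must be a positive integer');
  }
  const ranges = [];
  for (let start = 0; start < fileSize; start += chunkSize) {
    // The last range may be shorter than chunkSize
    ranges.push([start, Math.min(start + chunkSize, fileSize)]);
  }
  return ranges;
}

console.log(JSON.stringify(chunkRanges(2500000, 1000000)));
// [[0,1000000],[1000000,2000000],[2000000,2500000]]
```

Note that a file whose size is an exact multiple of the chunk size produces no empty trailing chunk.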

On the server side, the PHP is not very complicated. When you receive each chunk, you just need to do something like this:

// decode the base64 chunk and append it to the end of the file being rebuilt
file_put_contents($fileName, base64_decode($data), FILE_APPEND);

The JavaScript, on the other hand, is a little more involved, but I’m happy to give you a head start:


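// assumes the user has already picked a file in the page's file input
const file = document.querySelector('input[type="file"]').files[0];
const reader = new FileReader();
const fragmentSize = 1000000; // the size of each file fragment to upload in bytes. 1000000 = 1 MB
let fragmentIndex = 0;
let position = 0;

// reads the next fragment as a base64 data URL; reader.onload fires when it is ready
const readFile = () => {
    const blob = file.slice(position, position + fragmentSize);
    reader.readAsDataURL(blob);
    position += fragmentSize;
    fragmentIndex++;
};

reader.onload = () => {
    let progress = Math.floor(fragmentIndex * fragmentSize * 100 / file.size) || 0;
    if (progress > 100) progress = 100;

    uploadFragment(file, sanitizeFragment(reader.result), progress, (response, requestPayload) => {
        // >= (not >) so that a file whose size is an exact multiple of fragmentSize
        // does not trigger one extra, empty upload
        if (position >= file.size) {
            // file has completed uploading
            return;
        }
        readFile();
    });
};

const uploadFragment = (file, data, progress, callback) => {
    // this function makes an AJAX request to the server and sends the file data.
    // '/upload-fragment.php' and the field names are hypothetical; match them to your PHP script,
    // which should not trust fileName as-is (e.g. run it through basename())
    const payload = new FormData();
    payload.append('fileName', file.name);
    payload.append('data', data);

    const xhr = new XMLHttpRequest();
    xhr.open('POST', '/upload-fragment.php');
    xhr.onload = () => {
        // callback function gets called upon successful completion of AJAX request
        if (xhr.status === 200) {
            console.log(`${progress}% uploaded`);
            callback(xhr.response, payload);
        }
    };
    xhr.onerror = () => {
        // network error handling here
    };
    xhr.send(payload);
};

reader.onerror = event => {
    // error handling here
};

// trims the "data:...;base64," prefix from the data URL, leaving only the base64 data
const sanitizeFragment = (data) => {
    const marker = 'base64,';
    const start = data.indexOf(marker);
    if (start !== -1) {
        data = data.substring(start + marker.length);
    }
    return data;
};

readFile();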
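To see why appending each decoded chunk on the server (as the PHP file_put_contents line does) rebuilds the original file, here is a small Node.js simulation. It is not browser code: encodeChunks and reassemble are hypothetical names, and Node’s Buffer stands in for Blob and FileReader. Because each chunk is base64-encoded on its own, any "=" padding sits only at the end of that chunk’s string, which is why decoding chunk by chunk and appending the bytes is the safe order.

```javascript
// Client side (simulated): encode each chunk separately, like readAsDataURL on a Blob slice
function encodeChunks(bytes, chunkSize) {
  const encoded = [];
  for (let start = 0; start < bytes.length; start += chunkSize) {
    encoded.push(bytes.subarray(start, start + chunkSize).toString('base64'));
  }
  return encoded;
}

// Server side (simulated): decode each chunk on its own and append the bytes in order
function reassemble(encodedChunks) {
  return Buffer.concat(encodedChunks.map(chunk => Buffer.from(chunk, 'base64')));
}

const original = Buffer.from('Hello, large file uploads!'); // 26 bytes
// Chunks of 10, 10 and 6 bytes; the 10-byte ones are not a multiple of 3,
// so their base64 strings end in "==" padding
const chunks = encodeChunks(original, 10);
console.log(reassemble(chunks).equals(original)); // true
```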

How large is large? And in what way do you think that uploading a large file will be any different from uploading a smaller file?


I haven’t figured out what range the file sizes should be in yet. It would be great if there were a source that matched the purposes of files with a very loose approximation of their typical sizes, so I could learn the vocabulary I need to answer your question. (For example, the book I mentioned describes one purpose of files as storing head shots for a staff directory; those are small enough to be handled one way, but not a way that I can use.) Until I learn that vocabulary, the best I can say (which probably isn’t very useful) is: as small as possible, while still letting artists give people a taste of their artwork.

As for the difference, beginning on page 371, it says “large files lead to large databases, and large databases lead to reduced performance and humongous backup files.” My understanding is that this could be the result of storing BLOBs in MySQL. Also, on page 388, it says “fully developed solutions to these problems [of large files] are beyond the scope of this book.” I think an alternative is to store filenames in the database instead, using code like the following (which is on page 370).

$filename = 'C:/uploads/' . time() . $_SERVER['REMOTE_ADDR'] . $ext;

$ext could be ‘.jpg’, ‘.gif’, or ‘.png’. As far as I can tell, this doesn’t have to involve storing BLOBs in MySQL, but rather uses a path/filename.
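If the upload form should only accept the extensions the book lists, a quick check in the browser can reject other files before any bytes are sent. This is a sketch with hypothetical names (ALLOWED_EXTENSIONS, getExtension, isAllowedImage). It only gives the user faster feedback; the PHP script still has to do its own check, because anything coming from the browser can be faked.

```javascript
// Extensions from the book's example; the names here are hypothetical, for illustration only.
const ALLOWED_EXTENSIONS = ['.jpg', '.gif', '.png'];

// Returns the lower-cased extension (including the dot), or '' if there is none.
function getExtension(fileName) {
  const dot = fileName.lastIndexOf('.');
  return dot > 0 ? fileName.slice(dot).toLowerCase() : '';
}

// Client-side convenience check only; the server must validate again.
function isAllowedImage(fileName) {
  return ALLOWED_EXTENSIONS.includes(getExtension(fileName));
}

console.log(isAllowedImage('portrait.JPG'));    // true
console.log(isAllowedImage('notes.txt'));       // false
console.log(isAllowedImage('archive.png.exe')); // false
```

In the page, the name to check would be the selected file’s name property.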

Thank you, Andres_Vaquero

Do you think the book I mentioned said “fully developed solutions to these problems [of large files] are beyond the scope of this book” because it didn’t cover any JavaScript at all, and/or because it didn’t cover more advanced PHP? If I can use either JavaScript or PHP for this, what are the advantages and disadvantages of each? While my knowledge of PHP isn’t very advanced, it is more advanced than my knowledge of JavaScript, so I might prefer PHP, but only if, for what I’m working on, it isn’t much harder to learn than JavaScript and can do as much.

Yes, you should avoid storing images in a database. Store them in the server’s file system instead, and just store the path in the database if you need to.
If you only want to use PHP, there is a simple way to upload files through an HTML form with a file input. Below is a simplistic working example without any error handling:

<form method="POST" enctype="multipart/form-data" action="/upload-file.php">
    <div>
        <label for="my-file-input">My File:</label>
        <input id="my-file-input" name="my-file" type="file" />
    </div>
    <button type="submit">Upload</button>
</form>

The PHP in upload-file.php would be something like:

if (isset($_FILES['my-file']['tmp_name'])) {
    // basename() strips any directory parts from the client-supplied file name
    move_uploaded_file($_FILES['my-file']['tmp_name'], '/path/to/my/images/folder/' . basename($_FILES['my-file']['name']));
}

This lets you upload files up to the server’s maximum upload size. That limit is controlled by two php.ini settings, upload_max_filesize and post_max_size, which you can change.

This approach is fine as long as your files are below that limit, and you won’t need any JavaScript to upload the files. Its limitation, however, is that you cannot give the user any feedback while the file is uploading, so with large files they may have to wait a while until the upload finishes and the server responds.
With proper error handling in place, you can avoid making the user wait in vain, because the server by itself does not cancel the upload until it has reached its maximum size limit.
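One form of that error handling is to check the file size in the browser before sending anything. The browser cannot read php.ini, so this sketch assumes the server writes the two setting values into the page; iniSizeToBytes and fitsUploadLimit are hypothetical helper names. It follows PHP’s shorthand, where K, M and G are multiples of 1024, and treats the smaller of the two settings as a rough limit (the real request also carries some form overhead).

```javascript
// Converts a php.ini size value such as "2M" or "512K" to bytes.
// PHP treats K, M and G (case-insensitive) as multiples of 1024; a bare number is bytes.
function iniSizeToBytes(value) {
  const match = /^\s*(\d+)\s*([kmg]?)\s*$/i.exec(String(value));
  if (!match) throw new Error(`Unrecognized size: ${value}`);
  const units = { '': 1, k: 1024, m: 1024 ** 2, g: 1024 ** 3 };
  return Number(match[1]) * units[match[2].toLowerCase()];
}

// Rough check for a single-file form upload: the file must fit under the smaller setting.
// The two setting values are assumed to be supplied by the server (hypothetical here).
function fitsUploadLimit(fileSize, uploadMaxFilesize, postMaxSize) {
  const limit = Math.min(iniSizeToBytes(uploadMaxFilesize), iniSizeToBytes(postMaxSize));
  return fileSize <= limit;
}

console.log(iniSizeToBytes('2M'));                         // 2097152
console.log(fitsUploadLimit(5 * 1024 * 1024, '2M', '8M')); // false
console.log(fitsUploadLimit(1500000, '2M', '8M'));         // true
```

In the page, the first argument would be the selected file’s size property, so an oversized file can be rejected immediately instead of after a long wait.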

The approach I described in my first post, which uses JavaScript, is a bit more advanced. It uploads asynchronously, which lets you show progress feedback to the user and lets the user keep interacting with the page while the files are uploading.

That is one way, yes. Another way would be to use some kind of cloud storage, like Amazon S3 or Google Cloud Storage.

The advantage of those is that they decouple the storage from your server, so your server is no longer a single point of failure for that storage. If the server ever fails, you get a new one and continue as normal.

Another advantage is that if you ever get enough traffic to need multiple servers, you don’t need to change anything, because all of the servers can access all files in the cloud storage.

Will the JavaScript Andres_Vaquero mentions still be useful if I use the cloud storage rpkamp mentions?

No, cloud vendors provide their own APIs. A more flexible approach would be to use a library like Flysystem to handle all this for you. Flysystem not only makes it easy to read from and write to a local file system, but also has several adaptors for cloud vendors. Flysystem is really powerful once you learn some more advanced techniques, such as using dependency managers like Composer, and OOP. The PHP ecosystem has a lot of really great libraries once you embrace Composer and using other people’s work. Writing low-level code like this when libraries exist is really not productive, professional, or considerate to future developers.
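To illustrate the idea behind adaptors, here is a minimal JavaScript sketch. It is not Flysystem’s API (Flysystem is PHP); the class and method names are hypothetical. Application code only talks to a Storage object, and the adapter passed in decides where the files actually live; an in-memory adapter stands in for a local-disk or cloud one.

```javascript
// Hypothetical adapter: keeps files in a Map. A disk or cloud adapter would expose the
// same write/read methods but store the bytes somewhere else.
class InMemoryAdapter {
  constructor() {
    this.files = new Map();
  }
  write(path, contents) {
    this.files.set(path, contents);
  }
  read(path) {
    if (!this.files.has(path)) {
      throw new Error(`File not found: ${path}`);
    }
    return this.files.get(path);
  }
}

// Application code depends only on Storage, never on a specific adapter.
class Storage {
  constructor(adapter) {
    this.adapter = adapter;
  }
  write(path, contents) {
    this.adapter.write(path, contents);
  }
  read(path) {
    return this.adapter.read(path);
  }
}

const storage = new Storage(new InMemoryAdapter());
storage.write('artwork/sample.txt', 'a taste of the artwork');
console.log(storage.read('artwork/sample.txt')); // a taste of the artwork
```

Real cloud storage calls go over the network, so a real adapter would most likely be asynchronous; the sketch stays synchronous to keep the idea visible.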


After reading what Andres_Vaquero wrote, I got the (possibly wrong) impression that to do what I want, I need “more” than PHP, such as JavaScript.

Is Flysystem “more” than PHP?

Also, is there a reason Flysystem was mentioned, and not one of the web frameworks on Wikipedia’s Comparison_of_web_frameworks page?

Flysystem is a library, not a framework. I don’t think anyone is suggesting you should use a framework.

Sorry, a Wikipedia search using the word “Flysystem” only redirected me to the Laravel page, and I think Laravel is a framework. Thank you for the explanation.

Apologies if I was misleading. You’ll need JavaScript if you need to upload large files that exceed the server’s file upload size limit. You’ll also need JavaScript if you want to display upload progress to the user while the files are uploading. Otherwise, PHP is just fine.

I think that’s because Laravel uses that library, but it is possible to use the library without Laravel. Laravel is, however, one of the most popular PHP frameworks.

Correct. The other one would be Symfony, which is a better framework in my opinion.


Does SitePoint, or another publisher, have a print book about how to do the following? I know I already asked this question, but I only got one reply to it, and that person didn’t seem to know.

For what I am about to describe again, does JavaScript have adaptors for cloud vendors that might be useful? If so, how do they compare with Flysystem’s adaptors, and with similar libraries?

In such a book, I hope to find examples of common uses of different file types and their sizes, because what I want to use files for might not be common, but I hope to match my uses with common uses, so I can figure out what size files I should prepare to work with.

Which, if any, of those examples (of common uses of different file types) are likely to have a file size that surpasses the server’s file upload size limit? For those that surpass, can/should Flysystem, or anything like Flysystem, handle those files? While uploading files, which websites would be better if they (1) let users keep interacting with a page, and/or (2) displayed upload progress to users? Can Flysystem, or anything like Flysystem, do one or two of these two? I think JavaScript, but not Flysystem, can; but I want to double-check.

Again, I’m working on a website to let artists share samples of their art with other users. Considering this, are there any obvious things that I should have asked, but didn’t?

I think there may be some slight misunderstanding. I get that we’re discussing large files, and then there are frameworks, but I’m not really seeing how they relate.

OK, large files are, errmm, large. By extension, they will occupy more storage space and take longer to read and write (I/O).

It’s subjective how much is too much or too slow, but AFAIK these will be the bottlenecks. I have a feeling that finding a “specialized” host server would be time well spent, e.g. a CDN, a streaming service, or something else.

It is possible to tweak settings in PHP, but I’m not so sure it’s the best choice of language. I have some experience with Java and I know it has mature I/O classes, so I’d look there first. But it’s very likely another language, e.g. Python, would be a good choice too.

One important distinction to make here is between client-side code (which runs in your user’s browser) and server-side code.
For client-side code you can only use JavaScript; browsers don’t support anything else.
For the server side there are indeed multiple options, but it doesn’t matter much; they are all able to handle large file uploads.

Regarding the upload progress, it is only possible to show it with JavaScript. Server-side code on its own can never display it, because the page would need to refresh to show an update, and that would break the upload.

So progress bars only work with JavaScript.
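For chunked uploads, a progress bar needs to combine the chunks already finished with the progress of the current one. This sketch (overallProgress is a hypothetical name) takes the fraction of the current chunk that has been sent rather than a raw byte count, because base64 and form encoding make each request larger than the raw chunk.

```javascript
// Overall upload progress (0-100) for a file sent in fixed-size chunks.
// chunkIndex is the zero-based chunk currently uploading; chunkFraction (0..1) is how much
// of that chunk's request has been sent so far.
function overallProgress(fileSize, chunkSize, chunkIndex, chunkFraction) {
  if (fileSize === 0) return 100;
  const chunkStart = chunkIndex * chunkSize;
  // The last chunk may be shorter than chunkSize
  const chunkLength = Math.min(chunkSize, fileSize - chunkStart);
  const sent = chunkStart + chunkFraction * chunkLength;
  return Math.min(100, Math.floor((sent * 100) / fileSize));
}

console.log(overallProgress(2500000, 1000000, 0, 0.5)); // 20
console.log(overallProgress(2500000, 1000000, 1, 0.5)); // 60
console.log(overallProgress(2500000, 1000000, 2, 1));   // 100
```

In the browser, chunkFraction could come from a progress event on xhr.upload: when the event’s lengthComputable is true, event.loaded / event.total gives the fraction of the current request that has been sent.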

Then there is the question of file types and file size. That’s easy, there is no relation. There are some typical ranges for certain file types, but there are no hard rules. For example, plain text files are typically a few kilobytes, but I’ve seen text files of several gigabytes. Images are typically a few megabytes, but when you get into RAW they easily surpass 100 megabytes.
So anything you do based on this will be pure speculation, and quite likely to be wrong.

Lastly, regarding the uploading of really large files, people normally do what was discussed earlier in this thread: break the file up into chunks and upload these chunks individually. One reason is that large uploads take a long time, and any backend language will time out after processing an upload for a while. Another reason is that if a single large upload fails right at the end, the entire thing needs to be uploaded again, whereas if a chunk fails it should be possible to retry just that chunk, saving a lot of data and time.
What you’d do then is wait until the entire file is on your server, and then ship it off to some cloud storage.
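Here is a sketch of the retry idea: chunks are sent one after another, and a failed chunk is re-sent on its own, up to a maximum number of attempts. uploadChunksWithRetry and sendChunk are hypothetical names; sendChunk is a callback you would supply (for example, wrapping the XHR upload of one fragment), and the fake sender in the usage part just simulates one failure.

```javascript
// Upload chunks one at a time, retrying only a chunk that fails.
// sendChunk(index) is a hypothetical caller-supplied function: it must return a Promise that
// resolves once the server has stored chunk `index`, and rejects on failure.
async function uploadChunksWithRetry(chunkCount, sendChunk, maxAttempts = 3) {
  for (let index = 0; index < chunkCount; index++) {
    for (let attempt = 1; ; attempt++) {
      try {
        await sendChunk(index);
        break; // this chunk succeeded; move on to the next one
      } catch (error) {
        if (attempt >= maxAttempts) {
          throw new Error(`Chunk ${index} failed after ${maxAttempts} attempts: ${error.message}`);
        }
        // Only this chunk is re-sent; chunks already on the server are kept.
      }
    }
  }
}

// Usage with a fake sender that fails the first time it sees chunk 1:
const attempts = [];
let failedOnce = false;
uploadChunksWithRetry(3, async index => {
  attempts.push(index);
  if (index === 1 && !failedOnce) {
    failedOnce = true;
    throw new Error('network hiccup');
  }
}).then(() => console.log(attempts.join(','))); // 0,1,1,2
```

A real implementation would probably also wait a little before each retry; that is left out to keep the sketch short.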


If I wanted to write my own code instead of relying on a CDN to do what I need, would you please tell me roughly how many more books I would have to read, so I can figure out whether it is realistic for me to get that advanced within my time frame? To give you an idea of my programming skill: in college, I passed two terms of Java and one of data structures. I’ve read a book on JDBC (Java Database Connectivity), “Web coding & development all-in-one” (a long book in the For Dummies series), and “PHP & MYSQL: NOVICE TO NINJA” (published by SitePoint).

Just to double-check, will Flysystem let me do this? For what I’m working on, is there one or more better ways than Flysystem?

“Libs” means library, right?

Will “Learning PHP : a gentle introduction to the Web’s most popular language” or “PROGRAMMING PHP : creating dynamic web pages” (both of which are published by SitePoint’s partner, O’Reilly) try to teach me enough about Composer to learn as much about Flysystem as I need for this?

By using JavaScript, right? Does the following involve doing any more?

People also discussed JavaScript uploading files that surpass the server’s file upload size limit setting, providing upload progress feedback to users, and letting users keep interacting with a page while a file is being uploaded. Will “JavaScript: Novice to Ninja, 2nd Edition” (which is published by SitePoint) try to teach me how to do all this with JavaScript?

You really do want a CDN for this kind of stuff, though. It’s a solved problem. Use other people’s solutions instead of rolling your own. Rolling your own will take a lot of time, and it will never be as good as what is out there.

Flysystem would work for this. There may be other solutions, but they all do the same thing: copy files from your machine onto a CDN. The advantage of Flysystem is that the code is quite good, it’s been battle tested by lots of projects and it has support for a lot of different storage mechanisms.

Correct

Regarding the questions on the books, I’m not sure as I have not read the books you’re referring to.

To double-check, Flysystem can be used to communicate with a CDN in addition to communicating with a cloud? That would be great. Before rpkamp’s last post, we were only talking about using Flysystem with a cloud, not a CDN.

Because Flysystem seems so useful, it seems like it would be in WorldCat, which is the world’s largest bibliographic database. Why isn’t it?

Can you please recommend any other (preferably print) books for all this?

zee, because you knew about the https://flysystem.thephpleague.com/docs/ link, maybe you will also be able to help me get such (preferably print) books.