  1. #1
    SitePoint Wizard
    Join Date
    Dec 2003
    Location
    USA
    Posts
    2,582

    PHP + cURL + MD5 = Slow, Help?

    I have a PHP script which uploads a file to a CDN using cURL.

    They send me back an MD5 checksum (or I can generate one and send it to them to check).

    Either way, I need to generate an MD5 on my end. The problem is that MD5 is SLOW: in about an hour, the file had only uploaded about 65% (100MB). When I took out the MD5 to test, a 500MB file uploaded much more quickly.

    It's only one file at a time (for now), so I can't really parallelize any calls, since there is nothing to run in parallel.

    Any tips to make this faster?

    Thanks.

  2. #2
    SitePoint Addict
    Join Date
    Dec 2005
    Posts
    336
    Dumb question: can you (or they) create the checksum, store it somewhere (as text), upload the file, and then check it after the upload has finished?
    Another dumb question: does it have to be PHP?
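    A minimal PHP sketch of that store-then-verify idea, assuming the checksum is kept in a text file next to the original. The ".md5" suffix and both function names are hypothetical, not part of any CDN's API:

```php
<?php
// Hypothetical helpers: write a file's MD5 to "<file>.md5" before uploading,
// then compare that stored value with the checksum the CDN sends back.

function storeChecksum(string $path): string
{
    $md5 = md5_file($path);
    if ($md5 === false) {
        throw new RuntimeException("Could not read $path");
    }
    file_put_contents($path . '.md5', $md5);
    return $md5;
}

function verifyChecksum(string $path, string $remoteMd5): bool
{
    $stored = trim((string) file_get_contents($path . '.md5'));
    // Case-insensitive compare, in case the remote side returns upper-case hex.
    return strtolower($stored) === strtolower(trim($remoteMd5));
}
```

    Because the hash is computed once and saved, the upload itself never waits on MD5; the check afterwards is just a string comparison.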

  3. #3
    SitePoint Wizard (TheRedDevil)
    Join Date
    Sep 2004
    Location
    Norway
    Posts
    1,196
    Quote, originally posted by samanime:
    Either way I check it, I need to generate an MD5 on my end. Problem is, MD5 is SLOW. In about an hour the file had only uploaded about 65% (100MB). I took out the MD5 to check and a 500MB file uploaded much quicker.
    What do you use to generate the md5 checksum?

    Since you mention it was faster when you removed the MD5 check, I assume you create the MD5 checksum before uploading. Is that right?

  4. #4
    SitePoint Guru
    Join Date
    Aug 2009
    Posts
    669
    I use MD5 on files myself. I have one file of around 650MB, and running md5_file() on it takes less than a minute.

    You're either on a shared server or using your own MD5 function, because it shouldn't take that long.
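    A rough way to check this on a given server is to time md5_file() on its own, with no upload involved. This sketch uses only stock PHP; the function name and the MB/s figure are illustrative:

```php
<?php
// Hypothetical helper: time md5_file() on one file to see whether hashing
// alone (rather than the upload) is the bottleneck.
function timeMd5File(string $path): array
{
    $start = microtime(true);
    $md5 = md5_file($path);
    $seconds = microtime(true) - $start;
    if ($md5 === false) {
        throw new RuntimeException("Could not read $path");
    }
    $mb = filesize($path) / (1024 * 1024);
    return [
        'md5' => $md5,
        'seconds' => $seconds,
        // Throughput in MB/s; avoid dividing by zero on very small files.
        'mb_per_sec' => $seconds > 0 ? $mb / $seconds : INF,
    ];
}
```

    For example, print_r(timeMd5File('/path/to/test/file')); (path hypothetical) on the 500MB test file would show whether hashing by itself accounts for the slowdown.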

  5. #5
    Non-Member
    Join Date
    Apr 2004
    Location
    Miami, FL, USA
    Posts
    449
    Sounds like a processor issue... beef up the machine you're using.

    Are you doing an md5() on each packet you send? How big is a packet?
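    On the per-packet question: MD5 can be computed incrementally, so there is no need to md5() each packet separately. A sketch using PHP's hash_init()/hash_update()/hash_final(); the 1MB default chunk size and the function name are assumptions, and the result should match md5_file() on the same file:

```php
<?php
// Hypothetical helper: hash a file chunk by chunk. The final digest is the
// same as md5_file() over the whole file.
function md5InChunks(string $path, int $chunkSize = 1048576): string
{
    $fh = fopen($path, 'rb');
    if ($fh === false) {
        throw new RuntimeException("Could not open $path");
    }
    $ctx = hash_init('md5');
    while (!feof($fh)) {
        $chunk = fread($fh, $chunkSize);
        if ($chunk === false) {
            fclose($fh);
            throw new RuntimeException("Read error on $path");
        }
        // Feed each chunk into the running MD5 state.
        hash_update($ctx, $chunk);
    }
    fclose($fh);
    return hash_final($ctx);
}
```

    In principle the same loop could also hand each chunk to the upload, so the file is read from disk once instead of twice; wiring that into the cURL call is not shown here.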

  6. #6
    SitePoint Wizard
    Join Date
    Dec 2003
    Location
    USA
    Posts
    2,582
    The first time, I was generating the MD5 before sending, though in this case I can do it either before or after.

    I'm using md5_file() to generate the MD5.

    We're on a cloud server that is currently running with 512MB. I can (and will, when it goes to production) beef it up, but would that make a substantial difference?

    I'm running md5_file() on the entire file once it is uploaded (by fopen()ing the tmp_name and giving it that).
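    Worth double-checking here: md5_file() (and hash_file()) take a file path, not a handle returned by fopen(), so tmp_name can be passed in directly. A sketch of an upload-handler helper; the function name and the form field name 'upload' are hypothetical:

```php
<?php
// Hypothetical helper: MD5 of an uploaded file, given one $_FILES entry.
// md5_file() reads the path in 'tmp_name' itself; no fopen() is needed.
function uploadedFileMd5(array $file): string
{
    if (($file['error'] ?? UPLOAD_ERR_NO_FILE) !== UPLOAD_ERR_OK) {
        throw new RuntimeException('Upload did not complete');
    }
    $md5 = md5_file($file['tmp_name']);
    if ($md5 === false) {
        throw new RuntimeException('Could not read ' . $file['tmp_name']);
    }
    return $md5;
}

// Inside the upload handler ('upload' is a hypothetical field name):
// $localMd5 = uploadedFileMd5($_FILES['upload']);
// hash_file('md5', $path) returns the same hex string as md5_file($path).
```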

  7. #7
    SitePoint Wizard (TheRedDevil)
    Join Date
    Sep 2004
    Location
    Norway
    Posts
    1,196
    Mentioned
    4 Post(s)
    Tagged
    0 Thread(s)
    It seems your limitation may be the cloud server, especially disk I/O and how much memory your script can use (the PHP memory limit). Cloud servers are nice because they let you expand easily when required, but in some cases they can be "slower" than a normal dedicated server with the same specs, due to the restrictions of the cloud system (whether that applies to you will of course depend on the cloud provider you use, etc.).

    What you could try is calling md5sum directly; in the past we used it because we found it faster than md5_file().
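    A sketch of calling md5sum through exec(), assuming a Linux host with the md5sum binary (GNU coreutils) on the PATH and exec() not disabled; if either assumption fails, it falls back to md5_file(). The function name is hypothetical:

```php
<?php
// Hypothetical helper: MD5 via the external md5sum binary.
// Falls back to PHP's md5_file() if exec() or md5sum is unavailable.
function md5ViaMd5sum(string $path): string
{
    if (function_exists('exec')) {
        $output = [];
        $status = 1;
        // escapeshellarg() quotes the path so spaces/metacharacters are safe.
        exec('md5sum ' . escapeshellarg($path) . ' 2>/dev/null', $output, $status);
        // md5sum prints "<32 hex chars>  <filename>"; keep only the digest.
        if ($status === 0 && isset($output[0])
            && preg_match('/[0-9a-f]{32}/', $output[0], $m)) {
            return $m[0];
        }
    }
    $md5 = md5_file($path);
    if ($md5 === false) {
        throw new RuntimeException("Could not read $path");
    }
    return $md5;
}
```

    escapeshellarg() matters here because the path is passed through a shell.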

  8. #8
    SitePoint Wizard
    Join Date
    Dec 2003
    Location
    USA
    Posts
    2,582
    md5sum directly, as in using exec() to call it?

    That may be a good idea. I'll test it next week and see whether it helps substantially.

    It's not a huge deal if it takes a while, because I can upload first and then take my time with the checksums. However, we'll likely have a large number of people uploading a large number of files over a few hours at the end of each week, so I want things to be as quick and efficient as possible, and right now this seems to be the largest bottleneck I have some control over.

  9. #9
    Non-Member
    Join Date
    Apr 2004
    Location
    Miami, FL, USA
    Posts
    449
    I doubt it's a memory issue... more likely I/O and processor. You should be on a dedicated server with a decent processor and direct, non-shared I/O access. Try running top and iotop to confirm this.

  10. #10
    SitePoint Wizard (TheRedDevil)
    Join Date
    Sep 2004
    Location
    Norway
    Posts
    1,196
    Quote, originally posted by samanime:
    md5sum directly, as in use exec() to call it?
    Yes, or by using backticks, whatever floats your boat.

    I can't remember how much faster it was in our case, as it's been a few years since we did that project, but the difference was big enough to matter when checking larger files.
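    For comparison, the backtick form: it behaves like shell_exec() and returns the command's whole output as a string, or null if the command fails or prints nothing. Same assumption as before (md5sum on the PATH); the function name is hypothetical:

```php
<?php
// Hypothetical helper: MD5 via md5sum using the backtick operator.
// Returns null if shell execution is disabled or no hash was printed.
function md5ViaBackticks(string $path): ?string
{
    if (!function_exists('shell_exec')) {
        return null; // backticks are unavailable when shell_exec is disabled
    }
    $arg = escapeshellarg($path);
    // Variables are interpolated inside backticks, as in double quotes.
    $out = `md5sum $arg 2>/dev/null`;
    if (!is_string($out) || !preg_match('/[0-9a-f]{32}/', $out, $m)) {
        return null;
    }
    return $m[0];
}
```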

  11. #11
    SitePoint Wizard
    Join Date
    Dec 2003
    Location
    USA
    Posts
    2,582
    This project has been put on hold until late this week/early next week, so I haven't had a chance to test this.

    Once I do, I'll report back my findings.

