  1. #1
    $books++ == true matsko's Avatar
    Join Date
    Sep 2004
    Location
    Toronto
    Posts
    795
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Simple Question about MD5_File and SHA1_File functions

    I imported a database of users from one website to another, and for some reason a few of the downloaded images defaulted to an "Image not found" image, so I still ended up downloading an image.

    I'm not sure which profiles have this image, but I was thinking of getting an MD5 or SHA-1 hash of the file (md5_file or sha1_file) and then comparing that hash against every image on my website.

    So if the hashes match...
    PHP Code:
    $imageNotFoundHash = md5_file('...');
    ...
    if ($imageNotFoundHash === md5_file(DIR . $images[$i])) {
        // ohh noooo the image is the default one!!
    }

    Does this mean that this code will work? Is the hash unique to a given file's contents?
    I can't believe I ate the whole thing

  2. #2
    Follow Me On Twitter: @djg gold trophysilver trophybronze trophy Dan Grossman's Avatar
    Join Date
    Aug 2000
    Location
    Philadelphia, PA
    Posts
    20,580
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    It's not unique, but the likelihood of two images having the same hash is low enough that you can act as if it is.
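
    A minimal sketch of that comparison, assuming the placeholder and the profile images are ordinary local files. The helper name findFilesMatchingHash() and the throwaway demo files are hypothetical, not from the thread. The strict === comparison matters here: with ==, two different hex strings that both look like numbers (for example, both starting with 0e followed only by digits) compare as equal in PHP.

```php
<?php
// Hypothetical helper: return the paths whose MD5 matches the reference file's MD5.
function findFilesMatchingHash(string $referencePath, array $paths): array
{
    $referenceHash = md5_file($referencePath);
    if ($referenceHash === false) {
        throw new RuntimeException("Cannot read reference file: $referencePath");
    }

    $matches = [];
    foreach ($paths as $path) {
        // Strict comparison: compares the hex strings as strings, never as numbers.
        if (md5_file($path) === $referenceHash) {
            $matches[] = $path;
        }
    }
    return $matches;
}

// Demo with throwaway files standing in for real images.
$dir = sys_get_temp_dir() . '/hash_demo_' . uniqid();
mkdir($dir);
file_put_contents("$dir/not_found.jpg", 'placeholder bytes');
file_put_contents("$dir/user1.jpg", 'placeholder bytes'); // same contents as the placeholder
file_put_contents("$dir/user2.jpg", 'a real photo');      // different contents

$matches = findFilesMatchingHash("$dir/not_found.jpg", ["$dir/user1.jpg", "$dir/user2.jpg"]);
print_r(array_map('basename', $matches));
```

    Only the file whose bytes are identical to the placeholder is reported; the file name and modification time play no part in the hash.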

  3. #3
    SitePoint Addict ArunB's Avatar
    Join Date
    Jun 2008
    Location
    Hyderabad
    Posts
    252
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Upon a quick look, I think this is going to work out.

    I have two images (the same file, copied):
    Code:
    -rw-rw-r-- 1 ArunB ArunB 152875 Apr  9 05:14 linux1.jpg
    -rw-rw-r-- 1 ArunB ArunB 152875 Apr  9 00:05 linux.jpg
    I applied sha1_file() to both files and got the same hash for each.

    PHP Code:
    <?php
    // Hash both copies; identical contents produce identical hashes.
    $str  = sha1_file('/home/ArunB/Desktop/linux.jpg');
    $str1 = sha1_file('/home/ArunB/Desktop/linux1.jpg');
    echo $str, PHP_EOL, $str1;
    Result:
    Code:
    83d1dfbe5ec9ad4b20fcc9f30770be418b2a265b
    83d1dfbe5ec9ad4b20fcc9f30770be418b2a265b

  4. #4
    Follow Me On Twitter: @djg gold trophysilver trophybronze trophy Dan Grossman's Avatar
    Join Date
    Aug 2000
    Location
    Philadelphia, PA
    Posts
    20,580
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    Those files contain the same image, I assume? Then that's the definition of a hash function: it's deterministic, so the same input always gives the same output. The caveat is that it's not a 1:1 mapping in the reverse direction; two different images can have the same hash value. A good hash function is designed to minimize the number of such collisions.
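
    To illustrate both halves of that, here is a small sketch using sha1() on strings rather than files (the inputs are made up for the example). It shows that identical input gives an identical digest, and that the digest has a fixed length however large the input is, which is why collisions have to exist in principle.

```php
<?php
// Same input always yields the same digest (the function is deterministic).
$a = sha1('same bytes');
$b = sha1('same bytes');
var_dump($a === $b);

// Changing a single character yields a different digest.
$c = sha1('same bytes!');
var_dump($a === $c);

// The digest length is fixed regardless of input size: SHA-1 is 160 bits,
// i.e. 40 hex characters. There are more possible files than possible
// digests, so two different inputs can share a digest; it is just very
// unlikely to happen by accident.
var_dump(strlen($a), strlen(sha1(str_repeat('x', 100000))));
```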

  5. #5
    SitePoint Addict
    Join Date
    Apr 2009
    Posts
    248
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Note that both md5 and SHA-1 are considered "insecure" hashing algorithms, in that it is feasible to deliberately construct two different files that hash to the same value. That probably doesn't matter for your purposes, but if you'd like to make an overlap even less likely, I'd recommend a "secure" hash algorithm like SHA-512 or Whirlpool (both of which are available in the PHP core build).
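
    For reference, a sketch of how those algorithms can be applied to a file with hash_file() from PHP's hash extension. The helper name fileDigest() and the temporary file are hypothetical; the sketch also assumes the build lists both algorithms in hash_algos(), which it checks before hashing.

```php
<?php
// Hypothetical helper: hash a file with a chosen algorithm via the hash extension.
function fileDigest(string $path, string $algo = 'sha512'): string
{
    if (!in_array($algo, hash_algos(), true)) {
        throw new InvalidArgumentException("Unsupported algorithm: $algo");
    }
    $digest = hash_file($algo, $path);
    if ($digest === false) {
        throw new RuntimeException("Cannot read file: $path");
    }
    return $digest;
}

// Demo on a throwaway file standing in for an image.
$tmp = tempnam(sys_get_temp_dir(), 'img');
file_put_contents($tmp, 'placeholder bytes');

$sha512    = fileDigest($tmp);              // 512-bit digest, 128 hex characters
$whirlpool = fileDigest($tmp, 'whirlpool'); // also a 512-bit digest
echo strlen($sha512), ' ', strlen($whirlpool), PHP_EOL;
unlink($tmp);
```

    Swapping md5_file() for a call like this changes nothing else in the comparison logic; only the digest gets longer.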

  6. #6
    Programming Since 1978 silver trophybronze trophy felgall's Avatar
    Join Date
    Sep 2005
    Location
    Sydney, NSW, Australia
    Posts
    16,840
    Mentioned
    25 Post(s)
    Tagged
    1 Thread(s)
    Quote Originally Posted by SituationSoap View Post
    Note that both md5 and SHA-1 are considered "insecure" hashing algorithms,
    That is irrelevant when you are actually using them as a hash to confirm that the file content is unchanged rather than using it for some other security-related purpose.
    Stephen J Chapman

    javascriptexample.net, Book Reviews, follow me on Twitter
    HTML Help, CSS Help, JavaScript Help, PHP/mySQL Help, blog
    <input name="html5" type="text" required pattern="^$">

  7. #7
    SitePoint Addict
    Join Date
    Apr 2009
    Posts
    248
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by felgall View Post
    That is irrelevant when you are actually using them as a hash to confirm that the file content is unchanged rather than using it for some other security-related purpose.

    The OP isn't checking for unchanged content; he's checking to see which users have an image set that matches the hash of a default image. As I noted, the chance of two arbitrary images matching is exceedingly small, but md5 and SHA-1 do not minimize the chance of this happening. Does it really matter either way? Probably not.

  8. #8
    Programming Since 1978 silver trophybronze trophy felgall's Avatar
    Join Date
    Sep 2005
    Location
    Sydney, NSW, Australia
    Posts
    16,840
    Mentioned
    25 Post(s)
    Tagged
    1 Thread(s)
    Quote Originally Posted by SituationSoap View Post
    The OP isn't checking for unchanged content; he's checking to see which users have an image set that matches the hash of a default image.
    Which is effectively the same thing since the hash works the same way for both.

