I am creating an account and download system so that my users can download my script from their account.
Do you suggest that I show an md5 or sha1 checksum for the zip files in my account area so that my users can see it? What’s its purpose?
The idea is that users can generate their own checksum on the downloaded file, compare it to the one that you display on your site, and if the two are different they can wonder whether it was a bad download, or a corrupted file, or that you’ve posted the wrong file, or someone else has done that. Basically it tells them whether or not the file they have downloaded is the same as the file you provided for them.
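As a rough illustration of the comparison described above, here is a minimal PHP sketch. The function name verifyDownload() and its parameters are made up for this example; only hash_file() and hash_equals() are real PHP functions.

```php
<?php
// Sketch: compare a downloaded file's checksum with the one published on the site.
// verifyDownload() and its parameters are hypothetical names, not from the thread.

function verifyDownload(string $path, string $publishedChecksum, string $algo = 'md5'): bool
{
    // hash_file() returns false if the file cannot be read.
    $actual = hash_file($algo, $path);
    if ($actual === false) {
        return false;
    }

    // Published checksums are often copied by hand, so normalise case and whitespace.
    $expected = strtolower(trim($publishedChecksum));

    // True only when the two hex digests are identical.
    return hash_equals($expected, $actual);
}
```

A user would call something like verifyDownload('product.zip', 'checksum copied from the site') and treat a false result as a reason to download the file again.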
I guess whether you should do it is down to your user base - a lot of users will see that there is an MD5 checksum there, and either just ignore it or not know what to do with it.
I googled and found that this is more often done for .iso or .exe files, so is it still good for a php product that is zipped or .tar.gzipped?
I am aware of the md5sum command on Linux, but how would a shell script calculate and compare the checksum?
If your user is the kind of user that checks MD5 checksums, they will have something to do it with already.
I can’t offer an opinion on whether you should or shouldn’t do it for your type of file. I’m not sure what you mean by " a php product that is zipped or .tar.gzipped " in terms of a file or its contents, but zip files can contain unwelcome stuff. If you were running on a server that had a compromised version of zip then it might add anything - but if it was the zip on your server doing it, then an MD5 wouldn’t help because it would always be correct.
If I want to create an account/download manager script to give away for public use, is showing/hiding a checksum next to the files listed for download a useful feature for my download manager script?
If the file is redistributable and can be downloaded from mirror sites, providing a checksum will allow others to determine whether the file is the same.
If the file has been tampered with in any way even if the file name is identical the checksum will be different.
If the file has become corrupt during download the checksum will be different.
So for an account/download manager where the user needs to purchase a product and then log in to download it, is providing a checksum not very useful?
Well if the file doesn’t download correctly the hash would tell you straight away.
The purpose for creating the concept of hashes such as md5 in the first place is that a small change in the file (even one bit lost during the transmission) will result in a completely different hash making it obvious straight away that the file received is not identical to the one sent.
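To see that effect for yourself, here is a small PHP sketch that flips a single bit of the input and counts how many of the 128 bits in the md5 digest change. The helper name countDifferingBits() and the sample text are just assumptions for the demonstration.

```php
<?php
// Sketch: flip a single bit of the input and count how many hash bits change.
// countDifferingBits() is a hypothetical helper name.

function countDifferingBits(string $hexA, string $hexB): int
{
    // XOR the raw digests byte by byte, then count the 1 bits.
    $xor = hex2bin($hexA) ^ hex2bin($hexB);
    $bits = 0;
    foreach (str_split($xor) as $byte) {
        $bits += substr_count(decbin(ord($byte)), '1');
    }
    return $bits;
}

$original = 'The quick brown fox jumps over the lazy dog';
$altered = $original;
// Flip the lowest bit of the first character ("T" becomes "U").
$altered[0] = chr(ord($altered[0]) ^ 1);

$changed = countDifferingBits(md5($original), md5($altered));
echo "md5 bits changed: $changed of 128\n";
```

For a well-mixed hash you would expect roughly half of the bits to change, which is why the two hex strings look nothing alike.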
So is it a good feature to offer the ability to show checksum to my account/download manager users?
Yes - it allows people to check that there were no transmission errors in sending the file. The bigger the file, the more useful it is.
I’m wondering what type of checksum might be best. It’s easy enough to use
$checksum = md5_file($filename);
to generate an md5 checksum.
I know md5 has gotten a reputation for being “rainbowed to death” and is not recommended for hashing anything sensitive. (eg. passwords, CC numbers etc.)
But I can’t imagine anyone wanting to “crack” a checksum hash.
I have noticed some sites offer SHA256 and other checksums
Personally, I have a DIY localhost PHP script that I use.
But there are a lot of “tools”, some specific to various OS’s
My guess is the md5 is “universal” and would be good enough.
But that’s only a guess.
Is that guess close as in horseshoes and hand-grenades?
That’s not relevant for this purpose (a rainbow table only matters when a hash is used for its secondary purpose). It isn’t the exact value of the hash that matters, but that the smallest change in the file received will produce a completely different hash. What you are checking is that the hash stays the same, which confirms the file was transmitted correctly.
Anyway, you have the original file to produce the hash from, so you don’t need a rainbow table to find some other completely different (and much smaller) input that would produce the same hash.
“Rainbowed to death” is not a problem for file integrity checking in most cases; however, there are other weaknesses of md5 that can matter depending on circumstances. Imagine someone wants to spread a malware file that pretends to be the original one (e.g. an installer of a popular piece of software) - the known collision weaknesses of md5 allow a skilled person to craft different files that share the same md5. Still, as far as I know, this procedure is not easy and is limited in some ways, so I think it would be very hard to produce a different executable of the same or similar size that contains working malware and matches the md5 of an existing file. However, hackers keep evolving their techniques and this may become easier. That’s why, if the site offers downloads that are of critical importance for some reason, I would suggest a stronger hash such as sha256 (sha1 has known collision attacks too). But for most general use cases I’d say md5 is good enough (like if you don’t work for the military, secret services and the like…).
However, having downloaded loads of files over many years, I haven’t had even one case where a downloaded file was corrupt, even when I had connectivity problems. I suppose the network transport mechanisms already implement checksums for data packets to make sure the data is not corrupted. The worst thing that happened sometimes was that I got a partial file, in which case a simple check of the file size showed something was wrong.
@Mittineague If you are concerned, I suggest using sha384.
hash_file('SHA384', …)
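If you would rather not pick just one algorithm, a download manager could show several digests side by side. This is only a sketch: fileChecksums() and its default algorithm list are assumptions for illustration, while hash_algos() and hash_file() are the standard PHP functions.

```php
<?php
// Sketch: compute several checksums for one download so users can pick the tool they have.
// fileChecksums() and the default algorithm list are assumptions, not from the thread.

function fileChecksums(string $path, array $algos = ['md5', 'sha1', 'sha256', 'sha384']): array
{
    $supported = hash_algos();
    $result = [];
    foreach ($algos as $algo) {
        // Skip anything this PHP build does not provide.
        if (!in_array($algo, $supported, true)) {
            continue;
        }
        // Note: the file is read once per algorithm.
        $digest = hash_file($algo, $path);
        if ($digest !== false) {
            $result[$algo] = $digest;
        }
    }
    return $result;
}
```

The returned array maps each algorithm name to its hex digest, so it can be looped over directly when rendering the download list. For large files it may be worth computing the digests once at upload time and storing them rather than hashing on every page view.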
As md5 is a weak algorithm, I expected to find an md5 decoder on the net, but there isn’t one. I know it is intended to be one-way, but a decoder for a weak algorithm would not be surprising.
Which of the trillions of trillions of trillions of trillions … of possible values that hash to the same single md5 hash do you want it to return to you? Even if your decoder could return a trillion values that generate that single hash every millisecond on each and every computer that exists today, your script would still be running and returning more values that generate that same first MD5 hash when the universe ends (even if it never ends). There would never be time to move on to decoding the values that hash to a second MD5 hash value.
You mean trillions of different values may have the same md5 hash, so any one of those trillions would match the same md5-hashed password? Is this why md5 is vulnerable?
Actually, @felgall was a little off with the number trillions of trillions of trillions of trillions - in fact the number is infinity. There is an infinite number of values that hash to the same value, and this applies both to md5 and to any other algorithm. And this is not md5’s weakness.
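The reason is simply that there are more possible inputs than possible hash values. Here is a sketch that makes it visible by shrinking the hash on purpose: tinyHash() is a made-up function that keeps only the first two hex digits of md5, so there are only 256 possible values, and 257 different inputs are guaranteed to produce a repeat. The "file-N" input names are arbitrary.

```php
<?php
// Sketch: pigeonhole principle on a deliberately tiny hash.
// tinyHash() is hypothetical: the first two hex digits of md5, so only 256 possible values.

function tinyHash(string $input): string
{
    return substr(md5($input), 0, 2);
}

// Returns the first pair of distinct inputs that share a tiny hash, plus that hash.
function findTinyCollision(): array
{
    $seen = [];
    // 257 distinct inputs into 256 slots guarantees at least one repeat.
    for ($i = 0; $i <= 256; $i++) {
        $input = "file-$i";
        $h = tinyHash($input);
        if (isset($seen[$h])) {
            return [$seen[$h], $input, $h];
        }
        $seen[$h] = $input;
    }
    return []; // unreachable: the loop above must find a collision
}
```

The full md5 works the same way, only with 2^128 possible values instead of 256, which is why stumbling on such a pair by accident is not a practical concern.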
I did imply that the actual number is infinity: the number I entered was incomplete in that I left an infinite number of “of trillions” off the end, and I also said that all the computers in the world running nonstop forever would still never output all the different values that map to a single hash.
So many people are used to only using hashes for their secondary purpose of concealing passwords that they overlook that hashes were not designed for that purpose. They were intended to provide a way to detect small changes in the original content: even the smallest change in the content produces a completely different hash, and the same hash only repeats for content that is completely different.