File extension validation on upload

I have a site where a customer can upload a wireframe of their house. I need to allow all possible formats (Word, OpenOffice, all image types, including extensions from professional programs like Illustrator, CDR, …).

My problem is that I don’t know which extensions should be allowed for upload. I found some information here, but the problem is that IE changes the type names. For example, it changes jpeg into pjpeg. And I don’t know how many similar cases there might be.

Does anybody know of a source or library that lists all the extensions for each program?

Thanks!

I think you just have to create your own array of allowed extensions and, importantly, check the MIME type as well. This is to prevent a user from uploading some executable file by simply naming it image.jpg.
I mean, you can’t just trust the file extension alone, always check the mime type.

Then log every case where a disallowed file is uploaded and examine it; you may spot another extension that you should allow. This way you will soon build a good list of allowed extensions that suits your site’s requirements.
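As a rough illustration of the allowlist-plus-logging idea, here is a minimal sketch. The allowlist contents, the function name isExtensionAllowed() and the log file argument are all assumptions made for this example, not something taken from a library.

```php
<?php
// Hypothetical allowlist for this sketch; adjust it to the site's needs.
$allowedExtensions = ['jpg', 'jpeg', 'png', 'gif', 'pdf', 'doc', 'odt', 'ai', 'cdr'];

/**
 * Returns true if the file name's last extension is on the allowlist.
 * Rejected extensions are appended to $logFile so they can be reviewed later.
 */
function isExtensionAllowed(string $filename, array $allowed, string $logFile): bool
{
    // pathinfo() returns the part after the last dot; normalise the case.
    $ext = strtolower(pathinfo($filename, PATHINFO_EXTENSION));

    if ($ext !== '' && in_array($ext, $allowed, true)) {
        return true;
    }

    // json_encode() keeps odd characters in the extension from breaking the log line.
    $line = date('c') . ' rejected extension: ' . json_encode($ext) . "\n";
    file_put_contents($logFile, $line, FILE_APPEND | LOCK_EX);

    return false;
}
```

Reading the log from time to time shows which rejected extensions come up often enough to be worth adding. On its own this only checks the name, so it does not replace a check of the file content.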

Available MIME types (copied from the CodeIgniter framework):

<?php
/*
| -------------------------------------------------------------------
| MIME TYPES
| -------------------------------------------------------------------
| This file contains an array of mime types.  It is used by the
| Upload class to help identify allowed file types.
|
*/

$mimes = array(	'hqx'	=>	'application/mac-binhex40',
				'cpt'	=>	'application/mac-compactpro',
				'csv'	=>	array('text/x-comma-separated-values', 'text/comma-separated-values', 'application/octet-stream', 'application/vnd.ms-excel'),
				'bin'	=>	'application/macbinary',
				'dms'	=>	'application/octet-stream',
				'lha'	=>	'application/octet-stream',
				'lzh'	=>	'application/octet-stream',
				'exe'	=>	'application/octet-stream',
				'class'	=>	'application/octet-stream',
				'psd'	=>	'application/x-photoshop',
				'so'	=>	'application/octet-stream',
				'sea'	=>	'application/octet-stream',
				'dll'	=>	'application/octet-stream',
				'oda'	=>	'application/oda',
				'pdf'	=>	array('application/pdf', 'application/x-download'),
				'ai'	=>	'application/postscript',
				'eps'	=>	'application/postscript',
				'ps'	=>	'application/postscript',
				'smi'	=>	'application/smil',
				'smil'	=>	'application/smil',
				'mif'	=>	'application/vnd.mif',
				'xls'	=>	array('application/excel', 'application/vnd.ms-excel'),
				'ppt'	=>	'application/powerpoint',
				'wbxml'	=>	'application/wbxml',
				'wmlc'	=>	'application/wmlc',
				'dcr'	=>	'application/x-director',
				'dir'	=>	'application/x-director',
				'dxr'	=>	'application/x-director',
				'dvi'	=>	'application/x-dvi',
				'gtar'	=>	'application/x-gtar',
				'gz'	=>	'application/x-gzip',
				'php'	=>	'application/x-httpd-php',
				'php4'	=>	'application/x-httpd-php',
				'php3'	=>	'application/x-httpd-php',
				'phtml'	=>	'application/x-httpd-php',
				'phps'	=>	'application/x-httpd-php-source',
				'js'	=>	'application/x-javascript',
				'swf'	=>	'application/x-shockwave-flash',
				'sit'	=>	'application/x-stuffit',
				'tar'	=>	'application/x-tar',
				'tgz'	=>	'application/x-tar',
				'xhtml'	=>	'application/xhtml+xml',
				'xht'	=>	'application/xhtml+xml',
				'zip'	=> array('application/x-zip', 'application/zip', 'application/x-zip-compressed'),
				'mid'	=>	'audio/midi',
				'midi'	=>	'audio/midi',
				'mpga'	=>	'audio/mpeg',
				'mp2'	=>	'audio/mpeg',
				'mp3'	=>	'audio/mpeg',
				'aif'	=>	'audio/x-aiff',
				'aiff'	=>	'audio/x-aiff',
				'aifc'	=>	'audio/x-aiff',
				'ram'	=>	'audio/x-pn-realaudio',
				'rm'	=>	'audio/x-pn-realaudio',
				'rpm'	=>	'audio/x-pn-realaudio-plugin',
				'ra'	=>	'audio/x-realaudio',
				'rv'	=>	'video/vnd.rn-realvideo',
				'wav'	=>	'audio/x-wav',
				'bmp'	=>	'image/bmp',
				'gif'	=>	'image/gif',
				'jpeg'	=>	array('image/jpeg', 'image/pjpeg'),
				'jpg'	=>	array('image/jpeg', 'image/pjpeg'),
				'jpe'	=>	array('image/jpeg', 'image/pjpeg'),
				'png'	=>	array('image/png',  'image/x-png'),
				'tiff'	=>	'image/tiff',
				'tif'	=>	'image/tiff',
				'css'	=>	'text/css',
				'html'	=>	'text/html',
				'htm'	=>	'text/html',
				'shtml'	=>	'text/html',
				'txt'	=>	'text/plain',
				'text'	=>	'text/plain',
				'log'	=>	array('text/plain', 'text/x-log'),
				'rtx'	=>	'text/richtext',
				'rtf'	=>	'text/rtf',
				'xml'	=>	'text/xml',
				'xsl'	=>	'text/xml',
				'mpeg'	=>	'video/mpeg',
				'mpg'	=>	'video/mpeg',
				'mpe'	=>	'video/mpeg',
				'qt'	=>	'video/quicktime',
				'mov'	=>	'video/quicktime',
				'avi'	=>	'video/x-msvideo',
				'movie'	=>	'video/x-sgi-movie',
				'doc'	=>	'application/msword',
				'word'	=>	array('application/msword', 'application/octet-stream'),
				'xl'	=>	'application/excel',
				'eml'	=>	'message/rfc822'
			);


?>

Agreed! Never trust the extension. What if someone just renames a file with a different extension and uploads it? I always check the MIME types. I think the list of MIME types from CodeIgniter posted above by PHPycho is good enough so far. If you find anything else, you can add it to the array.

Additionally,
consider a case when someone renames a .exe file to .jpg and uploads it.
In such a case the server hangs when it tries to make a thumbnail from it.

So MIME types must be checked.

Thank you. I used $_FILES['uploadedfile']['type'] to get the MIME type and wrote the following code. Please tell me if it is OK:


$mimes = array(    'hqx'    =>    'application/mac-binhex40',

                'cpt'    =>    'application/mac-compactpro',

                'csv'    =>    array('text/x-comma-separated-values', 'text/comma-separated-values', 'application/octet-stream', 'application/vnd.ms-excel'),

                'bin'    =>    'application/macbinary',

                'dms'    =>    'application/octet-stream',

                'lha'    =>    'application/octet-stream',

                'lzh'    =>    'application/octet-stream',

                'exe'    =>    'application/octet-stream',

                'class'    =>    'application/octet-stream',

                'psd'    =>    'application/x-photoshop',

                'so'    =>    'application/octet-stream',

                'sea'    =>    'application/octet-stream',

                'dll'    =>    'application/octet-stream',

                'oda'    =>    'application/oda',

                'pdf'    =>    array('application/pdf', 'application/x-download'),

                'ai'    =>    'application/postscript',

                'eps'    =>    'application/postscript',

                'ps'    =>    'application/postscript',

                'smi'    =>    'application/smil',

                'smil'    =>    'application/smil',

                'mif'    =>    'application/vnd.mif',

                'xls'    =>    array('application/excel', 'application/vnd.ms-excel'),

                'ppt'    =>    'application/powerpoint',

                'wbxml'    =>    'application/wbxml',

                'wmlc'    =>    'application/wmlc',

                'dcr'    =>    'application/x-director',

                'dir'    =>    'application/x-director',

                'dxr'    =>    'application/x-director',

                'dvi'    =>    'application/x-dvi',

                'gtar'    =>    'application/x-gtar',

                'gz'    =>    'application/x-gzip',

                'php'    =>    'application/x-httpd-php',

                'php4'    =>    'application/x-httpd-php',

                'php3'    =>    'application/x-httpd-php',

                'phtml'    =>    'application/x-httpd-php',

                'phps'    =>    'application/x-httpd-php-source',

                'js'    =>    'application/x-javascript',

                'swf'    =>    'application/x-shockwave-flash',

                'sit'    =>    'application/x-stuffit',

                'tar'    =>    'application/x-tar',

                'tgz'    =>    'application/x-tar',

                'xhtml'    =>    'application/xhtml+xml',

                'xht'    =>    'application/xhtml+xml',

                'zip'    => array('application/x-zip', 'application/zip', 'application/x-zip-compressed'),

                'mid'    =>    'audio/midi',

                'midi'    =>    'audio/midi',

                'mpga'    =>    'audio/mpeg',

                'mp2'    =>    'audio/mpeg',

                'mp3'    =>    'audio/mpeg',

                'aif'    =>    'audio/x-aiff',

                'aiff'    =>    'audio/x-aiff',

                'aifc'    =>    'audio/x-aiff',

                'ram'    =>    'audio/x-pn-realaudio',

                'rm'    =>    'audio/x-pn-realaudio',

                'rpm'    =>    'audio/x-pn-realaudio-plugin',

                'ra'    =>    'audio/x-realaudio',

                'rv'    =>    'video/vnd.rn-realvideo',

                'wav'    =>    'audio/x-wav',

                'bmp'    =>    'image/bmp',

                'gif'    =>    'image/gif',

                'jpeg'    =>    array('image/jpeg', 'image/pjpeg'),

                'jpg'    =>    array('image/jpeg', 'image/pjpeg'),

                'jpe'    =>    array('image/jpeg', 'image/pjpeg'),

                'png'    =>    array('image/png',  'image/x-png'),

                'tiff'    =>    'image/tiff',

                'tif'    =>    'image/tiff',

                'css'    =>    'text/css',

                'html'    =>    'text/html',

                'htm'    =>    'text/html',

                'shtml'    =>    'text/html',

                'txt'    =>    'text/plain',

                'text'    =>    'text/plain',

                'log'    =>    array('text/plain', 'text/x-log'),

                'rtx'    =>    'text/richtext',

                'rtf'    =>    'text/rtf',

                'xml'    =>    'text/xml',

                'xsl'    =>    'text/xml',

                'mpeg'    =>    'video/mpeg',

                'mpg'    =>    'video/mpeg',

                'mpe'    =>    'video/mpeg',

                'qt'    =>    'video/quicktime',

                'mov'    =>    'video/quicktime',

                'avi'    =>    'video/x-msvideo',

                'movie'    =>    'video/x-sgi-movie',

                'doc'    =>    'application/msword',
   
                'word'    =>    array('application/msword', 'application/octet-stream'),

                'xl'    =>    'application/excel',

                'eml'    =>    'message/rfc822',
                
                // ms office
                
                'rtf' => 'application/rtf',
                'xls' => 'application/vnd.ms-excel',
                'ppt' => 'application/vnd.ms-powerpoint',
    
                // open office
                'odt' => 'application/vnd.oasis.opendocument.text',
                'ods' => 'application/vnd.oasis.opendocument.spreadsheet'                

            );

$allowed = false;

foreach ($mimes as $extension => $mime_type) {
    if ($_FILES['uploadedfile']['type'] == $mime_type) {
        $allowed = true;
    }
}

if ($allowed != true) {
    echo "Sorry, this file is not allowed";
} else {
    // save the file
}

I don’t see you validating extensions, only MIME types.
Second, and this is most important: you are trusting the MIME types as reported by the uploading browser. This is not at all secure. You should use a script on the server to determine the MIME type. Look into PHP’s fileinfo extension: http://us3.php.net/fileinfo

And by the way, some of the values in your array are themselves arrays, but your script assumes every value is a string.
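To make the two points above concrete, here is a minimal sketch that detects the type on the server with the fileinfo extension and copes with map values that are either a string or an array. The function names detectMimeType() and mimeMatchesExtension() are made up for this example, and the $mimes excerpt is shortened.

```php
<?php
/**
 * Detects the MIME type from the file's content using the fileinfo
 * extension, rather than trusting the type reported by the browser.
 */
function detectMimeType(string $path): string
{
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $type = $finfo->file($path);

    // Fall back to a generic type if detection fails.
    return $type === false ? 'application/octet-stream' : $type;
}

/**
 * Checks a detected MIME type against a map whose values may be
 * either a single string or an array of strings.
 */
function mimeMatchesExtension(string $ext, string $mime, array $mimes): bool
{
    $ext = strtolower($ext);
    if (!isset($mimes[$ext])) {
        return false;
    }

    // Casting to array turns a single string into a one-element array.
    return in_array($mime, (array) $mimes[$ext], true);
}

// Shortened excerpt of the map, for illustration only.
$mimes = [
    'gif' => 'image/gif',
    'jpg' => ['image/jpeg', 'image/pjpeg'],
];
```

In an upload handler the path passed to detectMimeType() would typically be $_FILES['uploadedfile']['tmp_name'], so the check runs on the bytes the server actually received.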

This is a list of MIMEs that I found recently; it will probably cover all the MIME types that you need. I did file uploading in a class, and it basically works like this:

  1. The file is uploaded to a temporary files folder (on a live site, anti-virus software would probably be set up to scan that folder). If no file is detected, an error is given.
  2. The extension checker function receives an array of allowed file extensions for whatever the purpose of the upload is (e.g. for an avatar the allowed extensions would be GIF, JPEG (JPG) and PNG).
  3. The extension checker function calls a function to get the extension from the file name. If more than one extension is detected (e.g. js.jpg), an error is given.
  4. The extension is checked against the array of allowed extensions. If there is no match, an error is given.
  5. The MIME type of the file is compared with what the MIME type should be for the extension (the list of extensions and MIME types is stored in a database table). If it doesn’t match, an error is given.
  6. The final check is that the file size does not exceed what is allowed for the purpose. If it does, an error is given.
  7. If the file passes those checks, it gets copied to the relevant folder and the upload gets logged in the database.
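The steps above can be sketched roughly as a single function. This is not the poster's class; the name validateUpload(), its parameters and the error messages are assumptions for illustration, and the $mimes array stands in for the database table from step 5.

```php
<?php
/**
 * Runs the checks in the order listed above and returns an error message,
 * or null when every check passes. $file has the shape of one $_FILES entry.
 */
function validateUpload(array $file, array $allowedExt, array $mimes, int $maxBytes): ?string
{
    // Step 1: a file must actually have arrived. Real upload handling would
    // call is_uploaded_file() here; is_file() keeps the sketch easy to try.
    $tmp = $file['tmp_name'] ?? '';
    if (($file['error'] ?? UPLOAD_ERR_NO_FILE) !== UPLOAD_ERR_OK || !is_file($tmp)) {
        return 'No file was uploaded.';
    }

    // Steps 2-3: extract the extension; more than one is treated as an error.
    $parts = explode('.', strtolower(basename($file['name'] ?? '')));
    if (count($parts) !== 2) {
        return 'The file name must have exactly one extension.';
    }
    $ext = $parts[1];

    // Step 4: the extension must be allowed for this purpose.
    if (!in_array($ext, $allowedExt, true)) {
        return 'This file extension is not allowed.';
    }

    // Step 5: the MIME type detected from the content must suit the extension.
    $detected = (new finfo(FILEINFO_MIME_TYPE))->file($tmp);
    if (!in_array($detected, (array) ($mimes[$ext] ?? []), true)) {
        return 'The file content does not match its extension.';
    }

    // Step 6: enforce the size limit for this purpose.
    if (filesize($tmp) > $maxBytes) {
        return 'The file is too large.';
    }

    // Step 7 is left to the caller: move the file and log the upload.
    return null;
}
```

A caller would pass one entry of $_FILES, and on a null result move the file with move_uploaded_file() and record the upload.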

And that’s how I handle file uploads. Chances are that there are other checks that I can make on the file (which I’ll look into later and add if needed). When the process does give an error, it abandons the upload attempt.

One thing you will have to decide is whether you are going to store the uploaded files in the database (BLOB fields) or in the file system. I went for storing them in the file system because, in the event my site gets popular enough (probably very unlikely), I don’t want a few thousand 8KB avatars taking up space in the database.

That’s how I handle uploads also. The important thing is to check MIME types on the server instead of trusting the browser to report them. Also, the check for multiple extensions is not necessary: it is technically legal to have a file named file.jpg.txt, and it would mean that it’s a .txt file named file.jpg.

Checking for double extensions in names is probably down to personal preference. The only legitimate double-extension file name I can think of that anyone would allow is .user.js for GreaseMonkey scripts. I personally would not want to risk anyone trying to upload, say, a .js.jpg or .php.jpg and risk having any code executed.
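One possible middle ground between the two positions is to accept names such as file.jpg.txt but reject any name where one of the extensions is a script or executable type. The sketch below assumes a hypothetical blocklist; the constant and function names are invented for the example.

```php
<?php
// Hypothetical blocklist of extensions that should never appear anywhere in a name.
const RISKY_EXTENSIONS = ['php', 'php3', 'php4', 'phtml', 'phps', 'js', 'exe'];

/** Returns every extension in a name, e.g. "a.php.jpg" gives ['php', 'jpg']. */
function allExtensions(string $filename): array
{
    $parts = explode('.', strtolower(basename($filename)));
    array_shift($parts); // drop the base name before the first dot

    return $parts;
}

/** True if any extension, not just the last one, is on the risky list. */
function hasRiskyExtension(string $filename): bool
{
    return count(array_intersect(allExtensions($filename), RISKY_EXTENSIONS)) > 0;
}
```

With this approach file.jpg.txt passes while shell.php.jpg is refused. Whether that is preferable to rejecting every double extension is still a matter of preference.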

The file extension is actually irrelevant and the MIME type is only slightly relevant. Both can lie about what the file is. It is the actual content of the file itself that determines what type of file it is.

Unfortunately, some file types place their marker at the start of the file and others place it at the end, meaning that it is possible to combine two files into one. For example, you can add a zip file onto the end of a jpg image and successfully upload it past any tests for file extension, MIME type and even file examination that checks the jpg marker at the start of the file. When people try to view that file, it will open the zip file instead.


felgall, is there any check that I can add to those checks to detect whether someone is trying to upload two files as one?

If you were to run through a series of tests looking for the actual markers within the file that indicate what sort of file it is then finding two markers in the appropriate locations within the same file would detect that.

The tests you already proposed should take care of stopping someone accidentally uploading a wrong file. You then have to consider how likely it is that someone would deliberately try to upload a different file (presumably one containing a virus or similar) when deciding how much effort it is worth making to perform the additional tests to detect that, particularly considering that it adds an overhead to the uploading of all the legitimate files.

I was just pointing out what is possible so that you would be aware that it would be possible for someone to deliberately upload a file that could get past your tests.

felgall, something like this?

That page does have some info on the subject. The people there who refer to the file content seem to assume that the identifier is always the first few bytes which is true for many file types but not all. With some it is the last few bytes.

For example, JPG files in the common JFIF format all start with ÿØÿà NUL DLE JFIF (where NUL and DLE are codes that each stand for a single character). Looking for “JFIF” in position 7 is probably the best way to check for a JPG file. A GIF file will have GIF as the first three characters.

A zip file has the content index at the end of the file, and an archive without a trailing comment ends with a 22-character record that starts with PK followed by ENQ ACK, for example PK ENQ ACK NUL NUL NUL NUL EOT NUL EOT NUL à NUL NUL NUL ¬‡ NUL NUL NUL NUL (where each of those three-character codes identifies an individual character). So looking for PK in the 22nd and 21st last characters of the file may be a suitable test for that.
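The marker checks described above could be sketched like this, working on the raw bytes of the file (for instance from file_get_contents()). The function names are invented for the example, and the zip test only covers the simple case of an archive with no trailing comment, where the end record is exactly the last 22 bytes.

```php
<?php
/** JFIF files start with FF D8 and carry "JFIF" at byte offset 6 (the 7th byte). */
function looksLikeJfif(string $bytes): bool
{
    return substr($bytes, 0, 2) === "\xFF\xD8" && substr($bytes, 6, 4) === 'JFIF';
}

/** GIF files start with the three characters "GIF". */
function looksLikeGif(string $bytes): bool
{
    return substr($bytes, 0, 3) === 'GIF';
}

/**
 * A zip archive with no trailing comment ends with a 22-byte record
 * that begins with "PK" followed by ENQ (0x05) and ACK (0x06).
 */
function endsWithZipDirectory(string $bytes): bool
{
    return strlen($bytes) >= 22 && substr($bytes, -22, 4) === "PK\x05\x06";
}

/** Flags the combined-file case: an image marker at the start plus a zip record at the end. */
function isImageWithZipAppended(string $bytes): bool
{
    return (looksLikeJfif($bytes) || looksLikeGif($bytes)) && endsWithZipDirectory($bytes);
}
```

Finding both markers in one file is the sign of two files combined into one. A more thorough check would also search backwards for the zip record when an archive comment is present, which this sketch does not attempt.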

Opening a number of files of the same type in Notepad++ is probably the easiest way to see for yourself how the files are actually structured as it displays all the non-printable characters as two and three character codes on a black background.

funny

I mean, you can’t just trust the file extension alone, always check the mime type.

Let’s do some testing.
I just renamed an exe file to jpg and uploaded it using different browsers:
Chrome: image/jpeg
Opera: image/jpeg
Firefox: image/jpeg
CURL: admin/has-no-clue
And only IE said application/octet-stream
Of course, MIME type is the thing we ought to trust.

consider a case when someone renames a .exe file to .jpg and uploads it.
In such a case the server hangs when it tries to make a thumbnail from it.

Oh, really?
Mine didn’t hang. What am I doing wrong?

to prevent a user from uploading some executable file by simply naming it image.jpg

What’s the problem with this executable?

And you guys forgot those scary tales about executable PHP code that can easily be embedded in valid JPEG files.