PHP and MetaData (File Properties)

Hi

I need help with accessing and editing the metadata of a file through PHP. The files are of types like MP3, AVI, DOC, Excel, etc.

I want to read and write the file properties, i.e. description etc.
Please guide.

Metadata is stored differently for different file types, so there’s not one simple routine to get the metadata for every file type.

For media files, you can start with getID3().

For Excel you can use libraries like PHPExcel, but as sk89q has said, every file type is different and there is no single solution.

Great thanks a lot !

Yes, I had in mind that each file type would need a different method. These are the file types for which I want to read/write the metadata:

  1. AVI
  2. MP3
  3. PDF
  4. Excel
  5. Word

Please tell me whether getID3 is useful for AVI too.
Looking forward to further guidance!

Zeeshan

Have you thought about extending the SPL file info object?

Yes, I can extend it, but the thing is, I need to know how to get the metadata of the file types I mentioned.

Something you can do is find documentation such as RFCs and work out how these files are structured. Then you can write your own classes in PHP to read out the data you want. This can take a while, and documentation isn’t always available for every file type, but you’ll learn a lot from doing this - knowledge which may come in quite handy later on.

getID3 can parse AVI too. Just check the big list that is on the getID3 page.

Great, thanks,

but what about the rest of the file types?

PDF, DOC, RTF, XLS, etc.

:wink:

Can you please help me find the RFCs?

What have you found so far?

Search for:

“<format> specification”
or
“<format> format”
Example:
“pdf specification”

If PHP is the only language you know, then this may be very challenging. Some formats are also very complicated, so this also may prove to be very time consuming. Just a warning.

I tried finding some RFCs through Google, like this:

PHP PDF RFC, PDF RFC, DOC RFC, but didn’t find anything good so far. It would be great if you could provide some help!

And yes, unfortunately, I only know PHP and JavaScript. What I had in mind is that there might be some ready-made PHP class to access the metadata of DOC, Excel and other documents. I didn’t know that I was asking for something that is out of this world! Lolz

Try searching for “specification” or “format.” Most formats don’t start with an RFC (request for comments) that you can find.

It’s not out of this world, but you’re going to have to work with low-level data types like 16-bit unsigned integers, null-padded strings, etc. If you’re not familiar with them, then it might be hard for you to wrap your head around it. Since PHP is high level, it’s also not particularly easy to work with them. In C, you could just write a struct, but in PHP you will have to work with pack(), unpack() (http://php.net/unpack), and any functions you might need to transform the bytes into a high-level PHP data type.
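
As a rough illustration of the unpack() approach, here is a minimal sketch that reads an ID3v1 tag, the fixed 128-byte block that some MP3 files carry at the very end. The function name readId3v1 is made up for this example, and it ignores ID3v2 tags and text encodings, so treat it as a starting point rather than a full parser.

```php
<?php
// Hypothetical helper: returns the ID3v1 fields of an MP3 file, or null if no tag is found.
function readId3v1(string $path): ?array
{
    $size = @filesize($path);
    if ($size === false || $size < 128) {
        return null; // too small to hold a 128-byte tag
    }

    $fh = fopen($path, 'rb');
    if ($fh === false) {
        return null;
    }
    fseek($fh, -128, SEEK_END);   // the tag sits in the last 128 bytes
    $block = fread($fh, 128);
    fclose($fh);

    if ($block === false || strlen($block) !== 128 || substr($block, 0, 3) !== 'TAG') {
        return null; // no ID3v1 marker
    }

    // Each "a<N>name" reads N bytes as a string; "C" reads one unsigned byte.
    $fields = unpack('a3tag/a30title/a30artist/a30album/a4year/a30comment/Cgenre', $block);

    // Strings are padded with NUL bytes (or spaces), so trim them.
    foreach (['title', 'artist', 'album', 'year', 'comment'] as $key) {
        $fields[$key] = rtrim($fields[$key], "\0 ");
    }
    unset($fields['tag']);

    return $fields;
}
```

Writing goes the other way round: build the 128 bytes with pack() using the same layout and overwrite the end of the file.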

okay thanks a lot for your great help !

Try this site: http://filext.com/

I have used it to look up some of the markers in images in order to detect the real file type, because an image could be a JPG but be named, for example, filename.ext.

Oh, and there’s also a cool function in PHP called unpack(), which decodes binary data into PHP values according to a format string.
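
To show what checking markers can look like, here is a small sketch that guesses the type from the first few bytes instead of the file name. The function name sniffFileType and the short signature list are only assumptions for the example; a real check would need many more signatures.

```php
<?php
// Hypothetical helper: guess a file's type from its leading bytes ("magic numbers").
function sniffFileType(string $path): string
{
    $fh = fopen($path, 'rb');
    if ($fh === false) {
        return 'unknown';
    }
    $head = (string) fread($fh, 12); // 12 bytes are enough for the checks below
    fclose($fh);

    if (strncmp($head, "\xFF\xD8\xFF", 3) === 0) {
        return 'jpeg';
    }
    if (strncmp($head, '%PDF-', 5) === 0) {
        return 'pdf';
    }
    if (strncmp($head, "PK\x03\x04", 4) === 0) {
        return 'zip'; // also matches .docx/.xlsx, which are ZIP packages
    }
    if (strncmp($head, "\xD0\xCF\x11\xE0", 4) === 0) {
        return 'ole2'; // container used by legacy .doc/.xls files
    }
    if (substr($head, 0, 4) === 'RIFF' && substr($head, 8, 4) === 'AVI ') {
        return 'avi';
    }
    if (strncmp($head, 'ID3', 3) === 0) {
        return 'mp3'; // only MP3s that start with an ID3v2 tag
    }

    return 'unknown';
}
```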

When I first got into understanding, interrogating and dismantling file structures, it was a real buzz. It was a whole new world of discovery. Anyway, here are some links…

PDF
http://www.adobe.com/devnet/pdf/pdf_reference.html

AVI - Look under external links at the bottom.

MP3 - Same tip as above.
http://en.wikipedia.org/wiki/MP3

Word and Excel
…are a bit trickier, as the formats keep changing, and last I checked, Microsoft hadn’t been helpful about making the specifications public. However, as of Office 2007, the default file format for the most common Office applications, such as Word and Excel, is Office Open XML (OOXML). Refer to the Wikipedia article for more information: http://en.wikipedia.org/wiki/Office_Open_XML. So if you can, don’t offer support for Word and Excel documents prior to 2007.
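
For the OOXML case, .docx and .xlsx files are ZIP packages, and the core properties (title, creator, description and so on) normally live in a part named docProps/core.xml. Below is a minimal read-only sketch, assuming the zip and SimpleXML extensions are enabled. The function name readOoxmlCoreProperties is made up for the example.

```php
<?php
// Hypothetical helper: read the core properties of a .docx/.xlsx file as name => value pairs.
function readOoxmlCoreProperties(string $path): array
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        return []; // not a readable ZIP package
    }
    $xml = $zip->getFromName('docProps/core.xml');
    $zip->close();
    if ($xml === false) {
        return []; // package has no core properties part
    }

    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        return [];
    }

    // The properties are spread over a few XML namespaces.
    $namespaces = [
        'http://purl.org/dc/elements/1.1/', // dc: title, creator, description...
        'http://purl.org/dc/terms/',        // dcterms: created, modified
        'http://schemas.openxmlformats.org/package/2006/metadata/core-properties', // cp: keywords...
    ];

    $props = [];
    foreach ($namespaces as $ns) {
        foreach ($doc->children($ns) as $name => $value) {
            $props[$name] = (string) $value;
        }
    }

    return $props;
}
```

Writing would mean changing that XML and putting it back into the archive, which this sketch does not attempt.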

Common misconception. MS have made the details of their proprietary formats public: visit http://msdn.microsoft.com/en-us/library/cc313118.aspx to find downloadable copies - they make dry reading, but are very detailed.

Or for XLS/XLSX metadata, see my response #2

More info about the JPEG format @ http://computer.forensikblog.de/en/2006/10/reading_the_jpeg_quantization_table.html#more. The 010 Editor that is mentioned in this post also has some templates for AVI, MP3, …

The guy behind JPEGsnoop also has a lot of info on JPEG.

There is also the PHP JPEG Metadata Toolkit. You can check the code to see how it’s done.