I am planning to upload all my original presentations (PPT) and assignments (PDF) to my educational website. The content is original, as I created it myself. What would be the best way to put them online, given that I want to increase the amount of content on my site? The files have no security restrictions, and copying and pasting from them is allowed. Should I attach each file to a post with a little accompanying content, or paste the content in as it is without caring about the user experience (of course, I don’t want to do that)?
Is there a Drupal module available for this purpose?
Are PDFs that are viewable online, with copy-pasting enabled, indexed by Google?
Also, does Google apply some sort of OCR to documents scanned as images? If so, does it apply to attachments?
Google does index PDF and MS Office documents. If they are suitably tagged with proper formatting, styles and alt text for images, that will help. But just as with web pages, they will only index text content - they won’t OCR any scanned text. Why would you include scanned text in a PDF anyway? It will look awful, massively increase the file size and will make the file far less usable and accessible.
If the PDFs and presentations are linked to, then they can get spidered. Do a search for sample PowerPoint presentations to see that .ppt files are indexed. You have probably come across indexed PDFs on your travels.
With PDFs you lose the markup (<h1>s etc. that you can use on a web page), and I suspect that .ppt files will be the same. The text does get indexed though. If you want to spend the time, you could create HTML versions of the PDFs and .ppt files with good markup and exclude the search engines from spidering the PDFs. You could link to the PDFs from the web page. It depends how you want to set it up.
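As a rough sketch of the "exclude the search engines" part: one common approach is to keep the downloadable files in their own directory and disallow that directory in robots.txt. The /downloads/ path, the example.com URLs and the is_crawlable helper below are assumptions made up for illustration, not anything from your site. The snippet uses Python's standard urllib.robotparser to check that the rule behaves as intended. Note that robots.txt only asks well-behaved crawlers not to fetch the files.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout: PDFs and .ppt files live under /downloads/,
# while the HTML versions live elsewhere on the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /downloads/
"""


def is_crawlable(url: str, robots_txt: str = ROBOTS_TXT, agent: str = "*") -> bool:
    """Return True if the given robots.txt rules let `agent` fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # HTML version of a lecture: should be open to crawlers.
    print(is_crawlable("https://example.com/notes/lecture1.html"))
    # PDF attachment in the excluded directory: should be blocked.
    print(is_crawlable("https://example.com/downloads/lecture1.pdf"))
```

The HTML page can still link to the file under /downloads/ so that visitors are able to download it.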
It seems that search engines like Google are getting better and better at crawling and indexing newer file types, but the good old recommendation still holds: text is always preferred. If the documents are truly important to you and you plan to receive web traffic from people finding them while browsing the net, why not shift towards good old HTML pages? You could see strong traffic results after a while through link building.
I have a lot of research papers and question papers which I scan and put online. That’s where I was expecting OCR.
Though they are scanned as images, I do put in proper metadata when creating the PDFs, like author name and keywords. So is that any better?