There is a company that sells a system (hardware and software) that does all that plus more but I forget what it is called. That system includes OCR and custom processing to convert documents into text and arrange the data based on the original format. You should at least search for existing software. You should look for something that can process the images using OCR and then pick out the data from the specific locations on the form and arrange it into a database.
If you develop one yourself then the scanning of the images into a disk file would be highly independent of the other processing.
Are you making the data available over the internet for everyone to see, or for anyone with login credentials? Probably not, so PHP is not relevant. People often use spreadsheets because they are familiar with them, but databases are often a better choice.
You can’t get a PDF without first scanning in an image.
I don’t want to use OCR as I have never found that to be reliable.
Instead I just want to scan in old receipts, invoices, statements, etc so that I don’t have to keep all of that paper around.
My point about a spreadsheet versus PHP was that I am trying to build an application where I can manipulate all of this purchasing info so I know where my money goes.
In other words, I might want to simply click on a hyperlink in a spreadsheet to bring up a scanned copy of an old receipt, or maybe I might get fancy and build myself a private website where I could bring up the receipt in a web browser. Who knows, maybe even make it so the receipt could come up on a mobile device.
I would think an image is probably better for a web application than a PDF, yet PDF documents are so ubiquitous that they seem like an equally good choice.
What I do know is that after I scan all of this paperwork and throw it out, I can’t easily convert a PDF to a PNG or vice versa, so I better make sure that I scan things into the correct format the first time!!
There are hundreds of possibilities for satisfying the requirement to manipulate all of this purchasing info so I know where my money goes; spreadsheets and PHP are just two.
You can do the same using a database. Applications can be developed that work in both Android and Windows. It does not need to be a website application.
That depends on what the PDF actually is. It can be a container of images, in which case PDF adds little value. If it is more than just images then you need to either enter all the data yourself or process the data using OCR.
That is a question in the original post; one of three questions. If that was not intended to be a question then you should have made that clear.
As I said, you must scan the forms to make them PDFs. So the question does not make sense. And you were not clear about what the PDF would be; the most likely format of the PDF would require OCR.
If you want to be able to access the images from multiple devices then a good solution would be to put the images in a cloud, such as Google Drive, Microsoft OneDrive or DropBox. Then you could create a database in a cloud, such as Microsoft Azure or Amazon Web Services. The database could store the identification of the images.
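To make the idea of a database that stores the identification of the images a bit more concrete, here is a minimal sketch. It assumes Python and SQLite, neither of which has been mentioned in this thread, and the table name, column names and helper functions are all made up for illustration. The `image_ref` column would hold whatever identifies the scan, such as a file path or a cloud storage ID.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the receipts database. Schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS receipts (
               id           INTEGER PRIMARY KEY,
               purchased_on TEXT    NOT NULL,  -- ISO date, e.g. 2014-03-15
               vendor       TEXT    NOT NULL,
               amount_cents INTEGER NOT NULL,  -- whole cents avoid rounding errors
               image_ref    TEXT    NOT NULL   -- path or cloud ID of the scan
           )"""
    )
    return conn


def add_receipt(conn, purchased_on, vendor, amount_cents, image_ref):
    """Store one receipt's details together with a reference to its scan."""
    cur = conn.execute(
        "INSERT INTO receipts (purchased_on, vendor, amount_cents, image_ref) "
        "VALUES (?, ?, ?, ?)",
        (purchased_on, vendor, amount_cents, image_ref),
    )
    conn.commit()
    return cur.lastrowid


def total_by_vendor(conn):
    """Return {vendor: total spent in cents}."""
    rows = conn.execute(
        "SELECT vendor, SUM(amount_cents) FROM receipts "
        "GROUP BY vendor ORDER BY vendor"
    ).fetchall()
    return dict(rows)
```

The database only stores a reference to each scan, so it does not care whether the scan itself is a JPEG, a PNG or a PDF.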
The mini scanner I bought will scan things in as a JPEG or a PDF. But it sounds like you are saying that if I scan in a paper receipt as a “PDF” it is really just an image that gets plopped into a PDF file?
If so, then I guess there is no benefit of scanning things in as a PDF right?
My point was that if I scanned receipts in as images (e.g. JPEG, PNG, BMP) then you could maybe more easily view them on a mobile device?
My hope is to build a web application where I can enter the receipt details into a web form and they will be stored in a database, and in addition I will have the scanned receipts, which you can view by clicking a link in the web app. Make sense?
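For what it is worth, the "click a link to view the scan" part is only a few lines whichever file format is used. Below is a rough sketch in Python (an assumption; this thread has only mentioned PHP and spreadsheets). The function name, the dictionary keys and the `/scans/` URL prefix are all hypothetical.

```python
from html import escape
from urllib.parse import quote


def receipt_rows_html(receipts, scan_prefix="/scans/"):
    """Render receipts as an HTML table with a link to each scanned file.

    `receipts` is a list of dicts with the hypothetical keys
    "vendor", "amount" and "file".
    """
    lines = ["<table>"]
    for r in receipts:
        # Percent-encode the file name so spaces etc. are safe in a URL.
        href = scan_prefix + quote(r["file"])
        lines.append(
            '<tr><td>{}</td><td>{}</td><td><a href="{}">view scan</a></td></tr>'.format(
                escape(r["vendor"]), escape(r["amount"]), escape(href, quote=True)
            )
        )
    lines.append("</table>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(receipt_rows_html([
        {"vendor": "Hardware Store", "amount": "12.50", "file": "0001.jpg"},
        {"vendor": "Grocer", "amount": "3.00", "file": "0002.pdf"},
    ]))
```

The link works the same way for a `.jpg` and a `.pdf`; what happens when it is clicked depends on the browser or device, not on this code.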
Thanks for the development tips, but what I am trying to figure out here is which format is safer when scanning in paperwork?
In my mind I have this idea that a PDF file format is better quality and more portable than scanning things in as images, but maybe I am wrong?
I guess I am hesitant to scan in a receipt as an image (e.g. JPEG) because maybe the quality won’t be good enough or it will be too pixelated to read the fine details of the receipt? (JPEG is better for photographs, right?)
I figured scanning things in as a PDF would be better. However, I am not using OCR software; I just need a “snapshot” of the original receipt that is good enough quality that a year from now I can open it up directly, or via this web app I hope to build, and the quality will be as good as looking at the original paper receipt, so nothing gets lost other than an annoying piece of paper!!
A PDF file is like an HTML file except PDF files often have images stored within the file whereas HTML files rarely do. HTML files typically link to images that are separate files; PDF files rarely do, and I do not know if they can do that. PDF files typically are not used for entering and modifying data by the user. You can think of PDF files as word-processing files that are typically not modified.
Since a PDF file can have embedded images it is not clear what scanning a document in as a PDF would accomplish. As I said, if the document is not processed by OCR then the document in the PDF would just be images.
Maybe, but I do not understand why. You would likely have to scroll around to see the contents. If the data was converted to text then it could be formatted for more convenient viewing but that would be much work.
I have been programming for nearly half a century. Obviously the internet did not exist when I first learned programming. When I first learned programming, computer memory consisted of magnets. I say that to make it clear that I can imagine applications that are not web applications. You are stuck on the idea of using a web application. I am trying to tell you that you do not need to make a web application. You will benefit from the use of a cloud but it does not need to be a web application.
I never said anything about entering data into a PDF.
And actually, YES, you can create PDF forms that handle data entry, but I never brought that up…
So maybe my perception that a PDF is better quality is wrong.
I just think of image formats as being for photos, and PDFs for text. But if, when I scan in a receipt, my scanner software simply takes a snapshot as an image and then sticks that in a PDF, then there is no real benefit of having a PDF, other than if you were to email it or something it might be easier for others to open and read, but that doesn’t apply here.
Isn’t it easier to open up an image in a mobile device versus a PDF, or is there no difference?
Yes, I agree that either way you’d likely have to scroll.
I am not building this for mobile, but just trying to cover my bases in case in 5 years computers disappear and all of these scanned images/PDFs need to work on a tablet or even mobile phone. Follow me?
If I was on Windows, I would do this in MS Access, but I am not.
So it seems to me the easiest way to get an application to work like I want is to build a web application that runs locally on my computer.
The Cloud has absolutely nothing to do with any of this.
Current smartphones are as powerful as computers that were called supercomputers a couple of decades ago. The main limitation of smartphones is their tiny display surface. Other than that, they are quite capable and the programming tools exist to support sophisticated applications. There are many applications available for viewing images in smartphones; even videos. VLC is available for Android. And within every Android system we have Linux underneath. I believe that the Apple iOS is built on a variation of Unix.