How should I scan in documents?

#1

I have a bunch of travel receipts that I am sick and tired of looking at, and I would like to build some little system where I can pull up the receipts on my computer.

For starters this will likely be a spreadsheet, but later on maybe I can build something in PHP?

Would it make more sense to scan things in as an image (.png/.jpeg/.bmp) or as a PDF?

And why do you say that?

I would hate to scan in thousands of receipts, and then after they have been shredded, discover that I should have scanned them in some other way?! :face_with_hand_over_mouth:

#2

There is a company that sells a system (hardware and software) that does all that plus more but I forget what it is called. That system includes OCR and custom processing to convert documents into text and arrange the data based on the original format. You should at least search for existing software. You should look for something that can process the images using OCR and then pick out the data from the specific locations on the form and arrange it into a database.

If you develop one yourself then the scanning of the images into a disk file would be highly independent of the other processing.

Are you making the data available over the internet for everyone to see, or anyone with login credentials? Probably not so PHP is not relevant. People often use spreadsheets because they are familiar with them but often databases are a better choice.

You can’t get a PDF without first scanning in an image.

#3

@SamuelCalifornia,

Thanks for the reply.

I don’t want to use OCR as I have never found that to be reliable.

Instead I just want to scan in old receipts, invoices, statements, etc so that I don’t have to keep all of that paper around.

My point about a spreadsheet versus PHP was that I am trying to build an application where I can manipulate all of this purchasing info so I know where my money goes.

In other words, I might want to simply click on a hyperlink in a spreadsheet to bring up a scanned copy of an old receipt, or maybe I might get fancy and build myself a private website where I could bring up the receipt in a web browser. Who knows, maybe even make it so the receipt could come up on a mobile device.
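For the spreadsheet idea, here is a rough sketch of how an index with one link per scan could be generated. It assumes all scans sit in a single folder; `build_index` and the column names are made up for illustration, and whether a given spreadsheet program makes a `file://` address clickable depends on that program.

```python
import csv
from pathlib import Path

# Assumption: these are the file types the scanner produces.
SCAN_SUFFIXES = {".jpg", ".jpeg", ".png", ".pdf"}


def build_index(scan_dir, index_path):
    """Write a CSV with one row per scanned file and a file:// link to it."""
    scan_dir = Path(scan_dir).resolve()  # as_uri() needs an absolute path
    rows = []
    for path in sorted(scan_dir.iterdir()):
        if path.is_file() and path.suffix.lower() in SCAN_SUFFIXES:
            rows.append({"file": path.name, "link": path.as_uri()})

    with open(index_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["file", "link"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

The same index works whether the scans are images or PDFs, so this part of the plan does not force the format decision either way.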

I would think an image is probably better for a web application than a PDF, yet PDF documents are so ubiquitous that they seem like an equally good choice.

Follow me?

What I do know is that after I scan all of this paperwork and throw it out, I can’t easily convert a PDF to a PNG or vice versa, so I had better make sure that I scan things into the correct format the first time!!

#4

There are hundreds of possibilities for satisfying the requirement to manipulate all of this purchasing info so I know where my money goes, spreadsheets and PHP are just two.

You can do the same using a database. Applications can be developed that work in both Android and Windows. It does not need to be a website application.

That depends on what the PDF actually is. It can be a container of images, in which case the PDF adds little value. If it is more than just images then you need to either enter all the data yourself or process the data using OCR.

#5

This thread seems to be getting off track…

I am merely asking about the pros and cons of scanning paper receipts/statements/invoices in as an image versus as a PDF.

Above, I gave examples of how I might use these scanned images, but I am NOT asking for help on designing or coding anything.

Thanks.

#6

That is a question in the original post; one of three questions. If that was not intended to be a question then you should have made that clear.

As I said, you must scan the forms to make them PDFs. So the question does not make sense. And you were not clear about what the PDF would be; the most likely format of the PDF would require OCR.

If you want to be able to access the images from multiple devices then a good solution would be to put the images in a cloud, such as Google Drive, Microsoft OneDrive or DropBox. Then you could create a database in a cloud, such as Microsoft Azure or Amazon Web Services. The database could store the identification of the images.

#7

I would not use a spreadsheet because, in my limited experience, I find they are limited to a page at a time.

Instead I would use efficient jpg images of the document and learn how to upload the jpg to your site. This is not straightforward and there are numerous forum topics on the subject requesting help.

After the image upload is perfected, learn how to create a simple database table with half a dozen fields for image file name, receipt date, group, category, etc.

The next step is to learn how to use the form to upload images and store them to the table.

The final step is to create searches to find and display images either separately or by date, category, etc.

As previously mentioned, this is a basic CRUD system (Create, Read, Update and Delete), and a search should reveal numerous free options.
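The table and search steps above can be sketched with Python's built-in `sqlite3` module. This is only an illustration under assumptions: the table name, the columns and the helpers `open_db`, `add_receipt` and `find_receipts` are invented here, and the upload form is left out entirely.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS receipts (
            id           INTEGER PRIMARY KEY,
            file_name    TEXT NOT NULL UNIQUE,  -- scanned file, e.g. 0001.jpg
            receipt_date TEXT NOT NULL,         -- ISO 8601, e.g. 2023-04-15
            category     TEXT NOT NULL,
            amount_cents INTEGER NOT NULL       -- whole cents avoid float rounding
        )
        """
    )
    return conn


def add_receipt(conn, file_name, receipt_date, category, amount_cents):
    """Create: store the details that belong to one scanned file."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO receipts (file_name, receipt_date, category, amount_cents)"
            " VALUES (?, ?, ?, ?)",
            (file_name, receipt_date, category, amount_cents),
        )


def find_receipts(conn, category=None, start=None, end=None):
    """Read: search by category and/or an inclusive date range."""
    query = (
        "SELECT file_name, receipt_date, category, amount_cents"
        " FROM receipts WHERE 1 = 1"
    )
    params = []
    if category is not None:
        query += " AND category = ?"
        params.append(category)
    if start is not None:
        query += " AND receipt_date >= ?"
        params.append(start)
    if end is not None:
        query += " AND receipt_date <= ?"
        params.append(end)
    query += " ORDER BY receipt_date"
    return conn.execute(query, params).fetchall()
```

Storing dates as ISO 8601 text means a plain string comparison sorts and filters them correctly. The table only holds the file name, so it does not care whether the scan itself is a JPEG, a PNG or a PDF.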

#8

Maybe I don’t understand what a PDF is…

The mini scanner I bought will scan things in as a JPEG or a PDF. But it sounds like you are saying that if I scan in a paper receipt as a “PDF” it is really just an image that gets plopped into a PDF file?

If so, then I guess there is no benefit of scanning things in as a PDF right?

My point was that if I scanned receipts in as images (e.g. JPEG, PNG, BMP) then you could maybe more easily view them on a mobile device?

My hope is to build a web application where I can enter the receipt details into a web form and they will be stored in a database, and in addition I will have the scanned receipts which you can view by clicking a link in the web app. Make sense?

#9

@John_Betong,

Thanks for the development tips, but what I am trying to figure out here is which format is safer when scanning in paperwork?

In my mind I have this idea that a PDF file format is better quality and more portable than scanning things in as images, but maybe I am wrong?

I guess I am hesitant to scan in a receipt as an image (e.g. JPEG) because maybe the quality won’t be good enough or it will be too pixelated to read the fine details of the receipt? (JPEG is better for photographs, right?)

I figured scanning things in as a PDF would be better. However, I am not using OCR software; I just need a “snapshot” of the original receipt that is good enough quality that a year from now I can open it up directly, or via this web app I hope to build, and the quality will be as good as looking at the original paper receipt, so nothing gets lost other than an annoying piece of paper!!

Follow me?

Follow?

#10

A PDF file is like an HTML file except PDF files often have images stored within the file whereas HTML files rarely do. HTML files typically link to images that are separate files and PDF files rarely do and I do not know if they can do that. PDF files typically are not used for entering and modifying data by the user. You can think of PDF files as word-processing files that are typically not modified.

Since a PDF file can have embedded images it is not clear what scanning a document in as a PDF would accomplish. As I said, if the document is not processed by OCR then the document in the PDF would just be images.
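One way to check what a particular scanner actually writes is to look inside a sample PDF for image objects and fonts. The sketch below is a rough heuristic, not a real PDF parser: it only searches the raw bytes for markers, so it can miss things in encrypted files or in files that store their fonts in compressed object streams. The name `describe_pdf` is made up for illustration.

```python
import re

# Stream filters that PDF uses for image data, with a readable label.
IMAGE_FILTERS = {
    b"DCTDecode": "JPEG",
    b"JPXDecode": "JPEG 2000",
    b"CCITTFaxDecode": "CCITT fax (black and white)",
    b"JBIG2Decode": "JBIG2 (black and white)",
}


def describe_pdf(data):
    """Count image and font markers in the raw bytes of a PDF file."""
    if not data.startswith(b"%PDF-"):
        raise ValueError("not a PDF file")

    images = len(re.findall(rb"/Subtype\s*/Image\b", data))
    fonts = len(re.findall(rb"/Type\s*/Font\b", data))
    filters = sorted(
        label for name, label in IMAGE_FILTERS.items() if b"/" + name in data
    )
    return {"images": images, "fonts": fonts, "image_filters": filters}
```

To try it on a real scan, read the file with `open(path, "rb").read()` and pass the bytes in. A result with one image per page, no fonts and a JPEG filter would suggest the scanner simply wrapped a JPEG in a PDF, which is the case discussed above.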

Maybe, but I do not understand why. You would likely have to scroll around to see the contents. If the data was converted to text then it could be formatted for more convenient viewing but that would be much work.

I have been programming for nearly half a century. Obviously the internet did not exist when I first learned programming. When I first learned programming, computer memory consisted of magnets. I say that to make it clear that I can imagine applications that are not web applications. You are stuck on the idea of using a web application. I am trying to tell you that you do not need to make a web application. You will benefit from the use of a cloud but it does not need to be a web application.

#11

@SamuelCalifornia,

You seem to misread everything I say… :slight_smile:

I never said anything about entering data into a PDF.

And actually, YES, you can create PDF forms that handle data entry, but I never brought that up…

So maybe my perception that a PDF is better quality is wrong.

I just think of image formats as being for photos, and PDFs for text. But if, when I scan in a receipt, my scanner software simply takes a snapshot as an image and then sticks that in a PDF, then there is no real benefit to having a PDF, other than if you were to email it or something it might be easier for others to open and read, but that doesn’t apply here.

Isn’t it easier to open up an image in a mobile device versus a PDF, or is there no difference?

Yes, I agree that either way you’d likely have to scroll.

I am not building this for mobile, but just trying to cover my bases in case in 5 years computers disappear and all of these scanned images/PDFs need to work on a tablet or even mobile phone. Follow me?

If I was on Windows, I would do this in MS Access, but I am not.

So it seems to me the easiest way to get an application to work like I want is to build a web application that runs locally on my computer.

The Cloud has absolutely nothing to do with any of this.

#12

Yes you definitely asked about scanning data into a PDF.

Actually no. Do you have experience writing both desktop applications and web applications? I do. And I wish that web applications were as easy to develop as desktop applications.

I think the cloud has the advantages that you want a web application for. You just do not understand how a cloud can help.

#13

Sort of.

I do not know the quality of the devices used to scan or photograph so cannot comment on which would be better.

Have you tried making samples and comparing results?

Perhaps post samples and ask others for their comments.

I do believe a lot of programming work is involved to view the images on a mobile because mobiles are mainly used to view either web pages hosted on a server or a mobile app.

#14

Current smartphones are as powerful as computers that were called supercomputers a couple of decades ago. The main limitation of smartphones is their tiny display surface. Other than that, they are quite capable and the programming tools exist to support sophisticated applications. There are many applications available for viewing images in smartphones; even videos. VLC is available for Android. And within every Android system we have Linux underneath. I believe that the Apple iOS is built on a variation of Unix.