IronOcr coordinates

So I’m trying to use IronOcr in .NET 5, and I’m having trouble understanding the coordinates it gives me.

Context: IronOcr is a Tesseract-based OCR library. I’m using it to scan an image that is 688x688 pixels. I’m using the following code:

using System;
using System.IO;
using System.Linq;
using System.Collections.Generic;
using System.Drawing;
using IronOcr;
using System.Text.RegularExpressions;

namespace ScratchConsole
{
    class Program
    {
        static void Main()
        {
            string file = @"a_valid_file_path.jpg";
            using Image saveme = Image.FromFile(file);
            using Graphics graphic = Graphics.FromImage(saveme); // For later manipulation of the original picture.
            var tess = new IronTesseract();
            using var Input = new OcrInput();
            // Restrict OCR to the region of the image that contains the text.
            var ContentArea = new Rectangle() { X = 300, Y = 200, Height = 300, Width = 388 };
            Input.AddImage(saveme, ContentArea);
            tess.Configuration.TesseractVariables["classify_font_name"] = "Arial";
            var res = tess.Read(Input);
            string text = res.Text;
            Console.WriteLine(text);
        }
    }
}

Note: I am limiting the OCR scan area with the rectangle, as shown in their example.
When I break at the Console.WriteLine and inspect res, I can find the block containing the line of text (Blocks[1], the second block). The Location of that block is:

res.Blocks[1].Location
{X = 1162 Y = 817 Width = 336 Height = 36}
    Bottom: 853
    Height: 36
    IsEmpty: false
    Left: 1162
    Location: {X = 1162 Y = 817}
    Right: 1498
    Size: {Width = 336 Height = 36}
    Top: 817
    Width: 336
    X: 1162
    Y: 817
    height: 36
    width: 336
    x: 1162
    y: 817

The dimensions of this don’t look right, and neither do the X/Y coordinates. The original image is only 688x688, so how is IronOcr finding a bounding box starting at 1162,817? The original text, if I bound it in MS Paint, is roughly 150x20, with its top-left corner somewhere around 495,347…

Am I missing something obvious? Is there some scaling factor somewhere that I can use to translate back to my original image? All I’m trying to do is create a bounding box around a regex-defined phrase (well, either one of a pair of phrases) in order to erase it from the original image…

What is the DPI of the image? Can you set the image to be 300 DPI, recalculate your expected values and see if they then are more accurately reported in your code?

Sounds like it may be scaling the image up internally to increase accuracy. I noticed that the ratio between 495 and 1162 is almost identical to the ratio between 347 and 817… roughly 0.425, or 42-43%. I would have expected the same ratio to apply to the detected width and height, but that appears to be slightly off (it makes your original bounding box about 143x15 when you said it was 150x20). So I am not sure. I think adjusting the DPI may cause the library to alter the image scaling a bit and might make the numbers more accurate. This is because the accuracy of the library differs depending on the size and quality of the image, from what I gather.

Unfortunately the tool that’s rendering these images is doing so at 96 DPI. They’re some basic CD labels: text only, no colors, simple fonts (everything’s in Arial, either bold or normal, at 10-12pt).

Aha!

We were both suspecting a scaling system, and you were right, in a sort of way. The OcrInput has a TargetDPI property that, when inspected, was set to 225. 96/225 = 42 2/3%.

Adding Input.TargetDPI = 96 forced the system to keep the coordinate system intact.
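For anyone who would rather leave TargetDPI at its default, the reported boxes could presumably be scaled back by hand instead. Below is a minimal sketch of that, assuming the scaling is uniform and equal to source DPI / target DPI (which matches the 96/225 ratio observed above, but is not something I have confirmed in the IronOcr docs). OcrCoordinates and ToSourceSpace are hypothetical names of my own, not part of IronOcr. The box is rounded outward so that it fully covers the text when used for erasing.

```csharp
using System;
using System.Drawing;

// Hypothetical helper (not part of IronOcr): maps a rectangle reported in the
// upscaled OCR coordinate space back to the source image's pixel space.
public static class OcrCoordinates
{
    public static Rectangle ToSourceSpace(Rectangle ocrRect, double sourceDpi, double targetDpi)
    {
        if (sourceDpi <= 0)
            throw new ArgumentOutOfRangeException(nameof(sourceDpi), "DPI must be positive.");
        if (targetDpi <= 0)
            throw new ArgumentOutOfRangeException(nameof(targetDpi), "DPI must be positive.");

        // Assumption: the image was scaled uniformly by targetDpi / sourceDpi,
        // so the inverse factor maps OCR coordinates back to the source.
        double scale = sourceDpi / targetDpi;

        // Round outward (floor the top-left, ceil the bottom-right) so the
        // resulting box never clips the detected text.
        int left = (int)Math.Floor(ocrRect.Left * scale);
        int top = (int)Math.Floor(ocrRect.Top * scale);
        int right = (int)Math.Ceiling(ocrRect.Right * scale);
        int bottom = (int)Math.Ceiling(ocrRect.Bottom * scale);

        return Rectangle.FromLTRB(left, top, right, bottom);
    }
}
```

Calling OcrCoordinates.ToSourceSpace(new Rectangle(1162, 817, 336, 36), 96, 225) with the block location from the question gives {X=495, Y=348, Width=145, Height=16}, which lines up with the roughly 150x20 box at 495,347 measured in MS Paint.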

Noice! I will say that, looking at the documentation, they did say that the library is tuned to use 225 DPI by default and that changing it to something else may cause a bit of a performance reduction. I am going to guess that will only really matter for very large images, but it’s something you should probably take note of going forward.

Glad it worked out for you in the end. 🙂
