I have some long lines of data, and in the middle of each of these lines are occasional pseudo-fixed-length numbers. By that I mean that they will always occupy a fixed number of characters (e.g. 5 chars), but they may or may not be padded on the left with spaces (e.g. 2 spaces and 3 digits). For example, I might have some lines of data like this (the pound signs represent other alphanumeric data; I'm just highlighting the portion that I'm referring to):
Code:
#####12345#####
#####  123#####
##### 1234#####
I need a regex that will create a consistent backreference to just the number part of that field and exclude the spaces. My first thought was, of course, to use something like this:
If that worked, the number would be put into backreference \1. However, it doesn't always work, since the pound signs represent other bits of alphanumeric data. I run into a problem when the data just beyond this number is also numeric: the expression above can't tell where my number ends and the next piece of data begins. Example:
I would want to just match 123, but my expression above would spill into the next piece of data and give me 1234.
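To make the spill-over concrete, here is a minimal Python sketch. The pattern in it is only an assumed stand-in for a first attempt (optional spaces followed by a greedy run of digits), and the sample line and the name `naive_number` are made up for illustration, with letters standing in for the pound signs.

```python
import re

# Assumed first attempt (hypothetical): optional leading whitespace,
# then a greedy run of digits captured in group 1.
NAIVE = re.compile(r"\s*(\d+)")

def naive_number(line):
    """Return group 1 of the first match, or None if nothing matches."""
    m = NAIVE.search(line)
    return m.group(1) if m else None

# The field is "  123" (2 spaces + 3 digits); the "4" that follows
# belongs to the next piece of data.
print(naive_number("abcde  1234wxyz"))  # prints 1234, not the wanted 123
```

Nothing in the greedy pattern knows the field is only 5 characters wide, so the digits from the neighbouring data get swallowed.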
So what I really want to do is something more like this (this is only a pseudo-regular expression):
Code:
/\s*(\d{(5 - number of spaces matched)})/
Except I don't know if anything like that can be done. I even thought of compiling a grouping of a lot of different possibilities OR'ed together, but I'm not sure how to consistently retrieve the backreference there either. Something like this:
Code:
/(?:(\d{5})|\s(\d{4})|\s{2}(\d{3})|\s{3}(\d{2})|\s{4}(\d))/
That would match perfectly every time, but it also creates a new problem: the number would be stored in \1, \2, \3, \4, or \5 depending on how many digits it has. I would like it to always be in the same place so that I can actually do something with it.
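Here is a small Python sketch of that alternation, just to show the capture moving between groups. The sample lines and the name `which_group` are made up for illustration, with letters standing in for the pound signs. Python's `Match.lastindex` reports which group actually took part in the match, which is one way to see the inconsistency (and a possible workaround in Python, though not the single fixed backreference I'm after).

```python
import re

# The alternation from above: each branch allows a different amount of
# left padding, and each branch has its own capturing group.
FIELD = re.compile(r"(?:(\d{5})|\s(\d{4})|\s{2}(\d{3})|\s{3}(\d{2})|\s{4}(\d))")

def which_group(line):
    """Return (group index, captured digits) for the first match, or None."""
    m = FIELD.search(line)
    if m is None:
        return None
    # Only one branch can match, so lastindex is the group that captured.
    return m.lastindex, m.group(m.lastindex)

for line in ("abcde12345fghij", "abcde 1234fghij", "abcde  123fghij", "abcde  1234ghij"):
    print(repr(line), which_group(line))
```

The last sample line is the troublesome case where the following data starts with a digit: the capture is still just 123, but it sits in group 3 rather than in one predictable place.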
Let me know if any of this is unclear. Thanks in advance!