Back Reference

Hi

I’m working on getting a back reference working to pull all the monetary values from a string like this:

<tr><td colspan="2">
                            <strong>
                        Charity Revenue
                            </strong>
                        </td><td align="right">
                                <strong>
                            18,904,000&nbsp;&nbsp;
                                </strong>
                            </td><td align="right">

                                <strong>
                            14,807,000&nbsp;&nbsp;
                                </strong>
                            </td><td align="right">
                                <strong>
                            13,174,044&nbsp;&nbsp;
                                </strong>
                            </td></tr>

The back reference I have so far to pull these values is:

([0-9]*,?)*

And it does pull all three value strings but I’m getting a mental block on how to restrict it so it only pulls out such patterns between the opening text:

Charity Revenue

and the final:

</tr>

Can you help?

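I’m not sure what backreferences have to do with your code. A backreference (\1, \2 and so on) refers back to text that a group earlier in the same pattern has already captured; what you have is just a capturing group. I’m also not sure what the outer * is for. In fact, your regex happily matches the empty string at the start of any string that begins with something other than a digit or a comma, because the earliest match trumps greediness: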

"hello 5000" =~ /(\d*)/;
print $1;

--output:--
<nothing>
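
Since the thread doesn’t say which language is in use, here is the same effect as a minimal sketch in Python. The second pattern, which requires at least one digit, and the third, which allows comma-separated groups of three digits, are only illustrations I made up for this example, not something taken from the original post.

```python
import re

text = "hello 5000"

# \d* is allowed to match zero digits, so the engine succeeds immediately
# at position 0 with an empty match and never reaches "5000".
empty = re.search(r"(\d*)", text)
print(repr(empty.group(1)), empty.start())

# Requiring at least one digit (\d+) forces the engine to keep scanning
# until it finds a position where a digit is actually present.
digits = re.search(r"(\d+)", text)
print(digits.group(1))

# Illustrative pattern (an assumption): digits, then optional ",ddd" groups.
amount_re = re.compile(r"\d+(?:,\d{3})*")
amounts = amount_re.findall("18,904,000&nbsp;&nbsp; 14,807,000")
print(amounts)
```

The first search reports an empty match at position 0, while the second one finds the number.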

You really shouldn’t use regexes to parse HTML unless it’s something simple. Use an HTML parser instead. Then the question becomes: how do I select the <tr> I want? Once you have the <tr>, you can grab all the <strong> elements in an array. Then you can skip the first <strong> element and print out the values of the other ones.
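
As a rough sketch of that approach, again assuming Python because no language was given, the standard library’s html.parser can collect the <strong> texts row by row. The names StrongCollector and values_for_row are hypothetical helpers invented for this example, and the sample markup is a condensed copy of the row in the question.

```python
from html.parser import HTMLParser


class StrongCollector(HTMLParser):
    """Collect the text of every <strong> element, grouped by <tr>."""

    def __init__(self):
        super().__init__()  # by default &nbsp; is converted to "\xa0"
        self.rows = []      # one list of <strong> texts per <tr>
        self._in_strong = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "strong":
            self._in_strong = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "strong" and self._in_strong:
            self._in_strong = False
            # strip() removes ordinary whitespace and "\xa0" alike
            text = "".join(self._buffer).strip()
            if self.rows:
                self.rows[-1].append(text)

    def handle_data(self, data):
        if self._in_strong:
            self._buffer.append(data)


def values_for_row(html, label):
    """Return the <strong> texts that follow `label` in the matching row."""
    parser = StrongCollector()
    parser.feed(html)
    parser.close()
    for cells in parser.rows:
        if cells and cells[0] == label:
            return cells[1:]  # skip the first <strong>, which is the label
    return []


SAMPLE = """<tr><td colspan="2"><strong>
    Charity Revenue
    </strong></td><td align="right"><strong>
    18,904,000&nbsp;&nbsp;
    </strong></td><td align="right"><strong>
    14,807,000&nbsp;&nbsp;
    </strong></td><td align="right"><strong>
    13,174,044&nbsp;&nbsp;
    </strong></td></tr>"""

print(values_for_row(SAMPLE, "Charity Revenue"))
```

Selecting the row by the text of its first <strong> element is just one way to pick the <tr>; if the real page has an id or class on the table, that would probably be a sturdier hook.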

Finally, if you are interested in getting an answer in a specific programming language, you need to state what language that is.