  1. #1
    SitePoint Member
    Join Date
    Jul 2007
    Posts
    3

    IE + regex + split() != fun

    Hi,

    I'm putting together something for a property selling web site to list the particulars of each property. I've loaded all the properties from the DB (*cough* large text file) into a huge 2D array and one column is a freeform(ish) field called "Remarks" that will contain the blurb.

    I've managed to convince the people that write the details to stick to a fairly rigid format to make it easier to parse out to the screen. So the general format of the field is:

    HEADING: description. HEADING: description leading to... HEADING: description. and so on

    where HEADING is all in capitals, immediately followed by a colon, and may contain spaces, hyphens, slashes or numbers. The 'description' can be pretty much any characters, or may be empty if there's a double heading such as FIRST FLOOR: BEDROOM 1:

    So I want to split this mammoth text field into its component "HEADING" and "description" parts such that I can iterate over them and apply formatting by shoving each bit into divs or table cells to separate them.

    The issue is slightly murkier because the HEADING is normally a room of the house and thus could be simple like "BEDROOM 3" or could be "W.C." or perhaps "SPARE/UTILITY ROOM". They also sometimes start a paragraph in the description field with "N.B:", which is infuriating but I guess I'll just have to treat it as a HEADING.

    The problem is JScript/IE (6, not sure about 7) and its handling of split and regexes. Firefox works a charm with this [horrible] expression:

    Code:
    regex = /(W\.C|N\.B|[A-Z1-9\s*\/-]+:)+(.*?)/g;
    remarkItems = properties[currPropIdx].remarks.split(regex);
    IE loses everything before each ':' and I just get the descriptions returned in remarkItems. From research on forums I think it's to do with the parentheses not being handled properly but I've hacked around with it in various forms for ages and cannot make it behave. I even tried exec() and failed.

    Can anyone with a bigger brain than me suggest a way of making this work cross-browser, or point me in a new direction, perhaps using some other means of splitting this string up? It's probably monumentally easy and I'm just being stupid, but staring at the code, making adjustments and proclaiming "why!?" a lot hasn't got me very far.

    Thanks in advance for any pointers.

    btw, here's some sample output of the field in question:

    THE ACCOMMODATION PROVIDES: Door to… ENTRANCE HALL: Carpet, stairs to first floor, radiator panel, door to… W.C: Low level w.c. LOUNGE: Approx 4.26m x 4.23m (14` x 13`9`). Carpet, gas fire as fitted, power points, radiator panel. KITCHEN: Approx 3.01m x 2.74m (9`9` x 9`). Wall and floor cupboards as fitted, stainless steel sink unit, `potterton` gas boiler for hot water and central heating, space for appliance, larder cupboard, power points. DINING ROOM: Carpet, radiator panel, power points, coved and artex ceiling. FIRST FLOOR: LANDING: Carpet, airing cupboard containing hot water cylinder, hatch to loft, door to… BEDROOM 1: Approx 4.26m into bay reducing to 3.74m in to bay x 3.44m (14` into bay reducing 12`3` into bay x 11`3`). Carpet, radiator panel, coved ceiling, recess cupboard, power points. BEDROOM 2: Approx 3.74m x 3.04m (12`3` x 10`). Carpet, radiator panel, recess cupboard, power points. BEDROOM 3: Approx 2.74m x 2.74m (9` x 9`). Carpet, radiator panel, fitted cupboards, power points. BATHROOM: Enamel bath and shower attachment, vanity wash hand basin, low level w.c, radiator panel, carpet. OUTSIDE: Lawn to front with off street parking for 2 cars, rear garden with lawn, brick sheds, shrubs, green house, fence borders. COUNCIL TAX: Band `C` VIEWING: By appointment with the Agents.

  2. #2
    Raffles
    Join Date
    Sep 2005
    Location
    Tanzania
    Posts
    4,662
    I can point you in a new direction to split this up: do it on the server side, with PHP or ASP or something else. It is NOT a good idea to be doing this with javascript. By the way, where are you getting this enormous lump of text from? Are you getting it from an Ajax request? Ideally you should use a database, then the people who will use this system will not have to stick to any format and it'll make everyone's lives easier. At the moment your system is a recipe for disaster.

  3. #3
    SitePoint Member
    Join Date
    Jul 2007
    Posts
    3
    Thanks for the pointer Raffles.

    Quote Originally Posted by Raffles View Post
    It is NOT a good idea to be doing this with javascript...
    Hehehe, I hear ya. I've found that out the hard way

    Quote Originally Posted by Raffles View Post
    where are you getting this enormous lump of text from?
    A stupid text file of all things which I treat with the old-fashioned MICROSOFT JET OLE driver. The thing is, I'm doing this as a favour. The lump of inedible text has been spat out by a database in the client premises, uploaded via FTP to a 3rd party site as a CSV text file (it's the only output the system can manage as far as I've been told), and also uploaded to their ISP.

    This simple ISP offer web design services but without any interactivity because they have no skills in it and have no server side DB solution. I know, I know, don't ask... it wasn't my decision!

    Anyway, they write the rest of the site in HTML/CSS/ASP and leave the property page as a blank template. I was asked by the client to magic up a solution for taking the text file and making it into a clickable list of properties and show the relevant details on request. Foolishly I said it was possible, despite never coding ASP/VBSCRIPT before; I'm more comfortable with PHP. And never again shall I touch ASP, btw. Nasty taste in the mouth that took me back to the days of coding my Sinclair Spectrum

    So the ASP code grabs the text file as quickly as possible, does the JET thing on it and releases it so that someone else can read it (messy). From then on I populate the huge JS array with all the details of every property, show a short list of properties and then onclick, manipulate the DOM and drag the requested details from the array. If I could go back to the server for the data I would, but I can't query the text file very often because the ISP run M$ and only one person can read the file at a time... sheesh, talk about obstacles!

    As you can tell by now, this really is making the most of a truly awful, inflexible and error-prone system and trying to make some semblance of functionality out of it. I'm convinced the property industry in this country is still in the 1970s when it comes to IT.

    Having said all that, you may be onto something; if I can somehow coerce ASP to split the string up as it comes out of the file then I might be able to store it in the JS array already pre-split. But as I don't know how many HEADING/descriptions there are in each wodge of text it'll need to be the last set of indices in the array so I get a "ragged-right" array that I then parse from a known index until the end. Hmmmmm. Plenty of scope for 'index out of bounds' errors, methinks...

    In the meantime, ugly as it is, if anyone can come up with anything that would avoid me dirtying my hands in ASP and recoding the existing array format to shuffle things around on the page while trying to get ASP to split reliably, I'm all eyes.

    Many thanks.

    --
    Afterthought: or maybe I can split it in ASP and reconstitute it as a single string but with a more useful delimiter, like ||. That gets round the ragged-right array but am I any better off?
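The delimiter idea in the afterthought can also be tried on the client. The sketch below is only an illustration and is not code from the thread: it uses replace() with a captured group to wrap each heading in ||, then splits on the plain string "||" so that no regex is passed to split(). The names splitOnHeadings, HEADING and trimText are hypothetical, the heading pattern is an assumption based on the format described in the first post, and it assumes the remarks never contain "||" themselves.

```javascript
// Hypothetical helper; IE6 has no String.prototype.trim.
function trimText(s) {
  return s.replace(/^\s+|\s+$/g, "");
}

// Assumed heading shape: capitals, digits, spaces, dots, slashes or
// hyphens, immediately followed by a colon.
var HEADING = /([A-Z0-9][A-Z0-9 .\/-]*):/g;

function splitOnHeadings(remarks) {
  // Mark each heading, e.g. "LOUNGE: Carpet." -> "||LOUNGE|| Carpet."
  var marked = remarks.replace(HEADING, "||$1||");

  // A plain string separator, so no capturing group is involved in split().
  var parts = marked.split("||");

  // parts[0] is any text before the first heading; after that the
  // elements alternate heading, description, heading, description...
  var items = [];
  for (var i = 1; i < parts.length; i += 2) {
    items.push({
      heading: trimText(parts[i]),
      description: trimText(parts[i + 1] || "")
    });
  }
  return items;
}
```

A double heading such as FIRST FLOOR: LANDING: comes out as a FIRST FLOOR item with an empty description. A description that happens to contain capitals or digits followed directly by a colon (a time such as 10:30, say) would be mistaken for a heading, so the pattern would need tightening against real data.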
    Last edited by MrBloke; Jul 6, 2007 at 04:36. Reason: Afterthought

  4. #4
    Raffles
    Join Date
    Sep 2005
    Location
    Tanzania
    Posts
    4,662
    Wow. And you're doing this for FREE? Crazy.

    Anyway, your javascript problem arises from the fact that split() in IE doesn't capture the stuff in grouping parentheses, i.e. it doesn't allow you to keep the splitting delimiter, like you've found. I think you'll just have to resort to a more complex parsing system. Perhaps try splitting only with a colon and then analysing each piece returned. For instance, if the piece is in all capitals (use regex for that) then you know it's a heading. If it's all spaces, or two adjacent pieces are all capitals, you know you've got a case of an empty description. If the piece contains lowercase letters, you know it's a description for the heading in the previous piece.
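A rough sketch of the colon-splitting idea, under some assumptions. In the run-on sample text, each piece between colons holds a description followed by the next heading, so this version peels the trailing run of capitals, digits, spaces, dots, slashes and hyphens off the end of each piece and treats it as the next heading. The names parseRemarks, HEADING_AT_END and trimText are made up for illustration, and the pattern is a guess at the format described in the first post rather than a tested rule.

```javascript
// Hypothetical helper; IE6 has no String.prototype.trim.
function trimText(s) {
  return s.replace(/^\s+|\s+$/g, "");
}

// Assumed heading shape: a run of capitals, digits, spaces, dots,
// slashes or hyphens sitting at the very end of a piece.
var HEADING_AT_END = /[A-Z0-9][A-Z0-9 .\/-]*$/;

function parseRemarks(remarks) {
  var pieces = remarks.split(":"); // plain string separator, no regex
  var items = [];
  var current = null; // the heading currently being filled in
  var pending = "";   // description text collected so far

  // Store the collected description against the current heading.
  function flush() {
    var text = trimText(pending);
    if (current) {
      current.description = text;
      items.push(current);
    } else if (text !== "") {
      // Text that appears before any heading is kept with no heading.
      items.push({ heading: "", description: text });
    }
    pending = "";
  }

  for (var i = 0; i < pieces.length; i++) {
    var piece = pieces[i];
    var isLast = (i === pieces.length - 1);
    // Every piece except the last was followed by a colon.
    var m = isLast ? null : HEADING_AT_END.exec(piece);
    if (m) {
      pending += piece.substring(0, m.index);
      flush();
      current = { heading: trimText(m[0]), description: "" };
    } else if (!isLast) {
      // No heading before this colon, so the colon belongs to the description.
      pending += piece + ":";
    } else {
      pending += piece;
    }
  }
  flush();
  return items;
}
```

Each returned item has a heading and a description, and a double heading such as FIRST FLOOR: LANDING: yields an empty description for the first of the pair. A description that ends in a capital letter or a digit just before the next heading would be swallowed into that heading, so this needs checking against real data.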

    I think this would be a lot easier if you could do this line by line. If the people you're doing this for could put each heading and its description on a line of its own, you could just split() based on newlines and then loop through these bits and split each one on a colon. You then know that the first of these bits is the heading. The remaining bit(s) is the description and if there's more than one bit you know the description contains a colon, so you join() these bits.
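The line-by-line version could look something like the sketch below. It assumes the newlines survive into the text, with one heading and its description per line. The names parseRemarkLines and trimText are hypothetical, and treating a line with no colon as a continuation of the previous description is my own assumption.

```javascript
// Hypothetical helper; IE6 has no String.prototype.trim.
function trimText(s) {
  return s.replace(/^\s+|\s+$/g, "");
}

function parseRemarkLines(remarks) {
  var lines = remarks.split("\n");
  var items = [];
  for (var i = 0; i < lines.length; i++) {
    var line = trimText(lines[i]); // also removes a trailing \r
    if (line === "") {
      continue; // skip blank lines
    }
    var bits = line.split(":");
    if (bits.length < 2) {
      // Assumption: a line without a colon continues the previous description.
      if (items.length > 0) {
        var last = items[items.length - 1];
        last.description = trimText(last.description + " " + line);
      }
      continue;
    }
    items.push({
      heading: trimText(bits[0]),
      // Rejoin the rest in case the description itself contains colons.
      description: trimText(bits.slice(1).join(":"))
    });
  }
  return items;
}
```

Because only plain string separators are passed to split(), nothing here depends on how a browser treats capturing groups in a split() regex.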

  5. #5
    SitePoint Member
    Join Date
    Jul 2007
    Posts
    3
    Quote Originally Posted by Raffles View Post
    Perhaps try splitting only with a colon and then analysing each piece returned.
    Hmmm, interesting. Doing it piece by piece might work. Maybe I was just trying to be too clever doing it all in one regex.

    Quote Originally Posted by Raffles View Post
    If the people you're doing this could put each heading and its description one on each line, you could just split() based on newlines
    *sigh* That would be text-based parser heaven: a useful delimiter! From what I can gather from the pidgin tech talk I have with the guy who uses the client database, it's super primitive. They type stuff into a field on a screen, including the newlines, and hit save, but somewhere between being inserted into the database and being output to the CSV file, the newlines get stripped out.

    Thanks for your help. I'll probably go with the piece-by-piece approach if it's not too slow, but if anyone has any other suggestions, all things are considered.

