RegEx: Two of (Newline + Whitespace) + alphanumeric. Can't get it!

I thought one of these would work, but none of them work. Starting with my favorite:

// 2 or more ( newline + 0 or more (spaces or tabs) )
// + any alphanumeric
const gRecordDelim = /(\n[ \t]*){2,}(?=\w)/g;

// const gRecordDelim = /(\n[ \t]*\n)/g;
// const gRecordDelim = /(\n[ \t]*){2,}(?=[A-Za-z0-9])/g;
// const gRecordDelim = /(\n[ \t]*){2,}\n/g;
// const gRecordDelim = /\n{2,}/g;
// const gRecordDelim = /([ \t]*\n[ \t]*){2,}/g;

It’s a split pattern in JavaScript:

const recs = text.split(gRecordDelim);
recs.forEach((rec) => console.log("RECORD: " + rec))

Here’s my data.

const text = `
Orblie Rapitulnik
orbliek.jpg
orbliek.com
There are many variations of passages of Lorem Ipsum available, but the majority have suffered alteration in some form, by injected humour, or randomised words which don't look even slightly believable.

    Qang Le Toenthal
    Qang.jpg
    Qangle.io
    Contrary to popular belief, Lorem Ipsum is not simply random text.
  


		
		Sivan Gomez
		sivan-gomez-crop.jpg
		sigcolors.com
		It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout.`;

I’m getting the splits at “Qang” and “Sivan” as expected, and multiple blank lines are counted as one blank line, as expected, but it’s also splitting at the empty lines, which I don’t want.

This reads as:
The Pattern: a Newline, followed by any number of spaces or tabs.
The pattern repeated two or more times in a row.
Lookahead Assert: Any word character.

First of all, if your items are separated by an empty line… split the string on the newline character occurring twice.

Second of all, your records from your data follow a regular pattern: A blank line, followed by 4 lines of data.

Split the string on the line breaks, and parse by number:

RecordNum*5 : Blank line
RecordNum*5+1 : Title
RecordNum*5+2 : Image
RecordNum*5+3 : URL
RecordNum*5+4 : Description
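
A minimal sketch of that index-based parse, assuming every record really is exactly one blank line followed by four data lines (it will misalign if the number of blank lines varies). The function name parseFixedRecords, the field names, and the sample string are made up for illustration:

```javascript
// Hypothetical helper: assumes each record occupies exactly 5 lines
// (1 blank line + title, image, URL, description).
function parseFixedRecords(text) {
  const lines = text.split("\n");
  const records = [];
  // Record n occupies lines n*5 .. n*5+4; stop when a full record no longer fits.
  for (let n = 0; n * 5 + 4 < lines.length; n++) {
    records.push({
      title: lines[n * 5 + 1].trim(),
      image: lines[n * 5 + 2].trim(),
      url: lines[n * 5 + 3].trim(),
      description: lines[n * 5 + 4].trim(),
    });
  }
  return records;
}

// Made-up sample data in the assumed layout.
const sample = "\nA\na.jpg\na.com\nAbout A\n\nB\nb.jpg\nb.com\nAbout B";
const records = parseFixedRecords(sample);
console.log(records.length);   // 2
console.log(records[1].title); // "B"
```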

Yes! That’s what I meant it to be, and I believe that should split my data as desired. But it’s not working.

Thanks for your suggestion, but your solution doesn’t seem to handle arbitrary white space after the newline. My question was meant to show arbitrary white space. I edited it.

The blank lines may be 1 blank, 2 blank, or anything higher.

Late here, so sorry if I have got the wrong end of the stick. Are you saying you don’t want the blank line included in the resulting split array, e.g. just two items, not three?

The text matched by a capturing group will be included in the split result, which can be useful.

How about using a non capturing group instead?

/(?:\n[ \t]*){2,}(?=\w)/g
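
A small sketch of the difference, using a made-up sample string rather than the original data. When the split pattern contains a capturing group, String.prototype.split inserts the captured text into the result array between the pieces; with (?: ... ) the group only groups and nothing extra is inserted:

```javascript
// Made-up sample: three "records" separated by blank / whitespace-only lines.
const sample = "A\n\n  B\n \n\nC";

// Capturing group: the last repetition of the group lands in the output array.
const capturing = sample.split(/(\n[ \t]*){2,}(?=\w)/);
console.log(capturing);    // [ 'A', '\n  ', 'B', '\n', 'C' ]

// Non-capturing group: only the text between delimiters is returned.
const nonCapturing = sample.split(/(?:\n[ \t]*){2,}(?=\w)/);
console.log(nonCapturing); // [ 'A', 'B', 'C' ]
```

The whitespace-only entries in the first array are what show up as the unwanted "empty" records.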

@rpg_digital, your solution works perfectly. Great!

Meaning? The matched text will be included in the output? But here we’re talking about a split. I thought that means everything in the pattern will be treated as a delimiter, which you would expect to be removed in a split. But you mean we have to say so explicitly?

Your regex appears identical to mine, except for the extra ?:
What’s that?

I plugged your regex into this:

const gRecordDelim =  /(?:\n[ \t]*){2,}(?=\w)/g
const recs = text.split(gRecordDelim);
recs.forEach((item) => console.log("RECORD: " + item))

I got this:

RECORD: 
Orblie Rapitulnik
orbliek-crop-2.jpg
orbliek.com
Contrary to popular belief, Lorem Ipsum is not simply random text. 
RECORD: Qang Le Schoenthal
    Qang.jpg
    Qangle.webflow.io
    There are many variations of passages of Lorem Ipsum available, but the majority have suffered alteration in some form, by injected humour, or randomised words which don't look even slightly believable. 
RECORD: Sivan Gomez
		sivan-gomez-crop.jpg
		sigcolors.com
		It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout.