RegEx: Two of (Newline + Whitespace) + alphanumeric. Can't get it!

johnywhy · December 3, 2022, 11:44pm

I thought one of these would work, but none of them work. Starting with my favorite:

// 2 or more ( newline + 0 or more (spaces or tabs) )
// + any alphanumeric
const gRecordDelim = /(\n[ \t]*){2,}(?=\w)/g;

// const gRecordDelim = /(\n[ \t]*\n)/g;
// const gRecordDelim = /(\n[ \t]*){2,}(?=[A-Za-z0-9])/g;
// const gRecordDelim = /(\n[ \t]*){2,}\n/g;
// const gRecordDelim = /\n{2,}/g;
// const gRecordDelim = /([ \t]*\n[ \t]*){2,}/g;

It’s a split pattern in javascript:

const recs = text.split(gRecordDelim);
recs.forEach((rec) => console.log("RECORD: " + rec))

Here’s my data.

const text = `
Orblie Rapitulnik
orbliek.jpg
orbliek.com
There are many variations of passages of Lorem Ipsum available, but the majority have suffered alteration in some form, by injected humour, or randomised words which don't look even slightly believable.

    Qang Le Toenthal
    Qang.jpg
    Qangle.io
    Contrary to popular belief, Lorem Ipsum is not simply random text.
  


		
		Sivan Gomez
		sivan-gomez-crop.jpg
		sigcolors.com
		It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout.`;

I’m getting the splits at “Qang” and “Sivan” as expected, and multiple blank lines are counted as one, as expected. But the blank lines also show up as their own entries in the result, which I don’t want.

m_hutley · December 4, 2022, 12:46am

Your regex reads as:
The pattern: a newline, followed by any number of spaces or tabs.
The pattern repeated two or more times in a row.
Lookahead assertion: the next character must be a word character (it is checked but not consumed).

First of all, if your items are separated by an empty line, split the string on the newline character occurring twice.

Second of all, your records from your data follow a regular pattern: A blank line, followed by 4 lines of data.

Split the string on the line breaks, and parse by line number:

RecordNum*5: Blank line
RecordNum*5+1: Title
RecordNum*5+2: Image
RecordNum*5+3: URL
RecordNum*5+4: Description
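
As a rough sketch of that index arithmetic: the helper name `parseFixedRecords`, the constant `LINES_PER_RECORD`, and the shortened sample descriptions are made up for illustration, and it assumes every record is exactly one blank line followed by four data lines.

```javascript
// Hypothetical sample: each record is one blank line followed by four data lines.
const sampleText = [
  "",
  "Orblie Rapitulnik",
  "orbliek.jpg",
  "orbliek.com",
  "First description.",
  "",
  "Qang Le Toenthal",
  "Qang.jpg",
  "Qangle.io",
  "Second description.",
].join("\n");

const LINES_PER_RECORD = 5; // 1 blank line + 4 data lines (assumption)

function parseFixedRecords(text) {
  const lines = text.split("\n");
  const records = [];
  // Only read complete groups of five lines.
  for (let n = 0; (n + 1) * LINES_PER_RECORD <= lines.length; n++) {
    const base = n * LINES_PER_RECORD; // lines[base] is the blank line
    records.push({
      title: lines[base + 1].trim(),
      image: lines[base + 2].trim(),
      url: lines[base + 3].trim(),
      description: lines[base + 4].trim(),
    });
  }
  return records;
}

const parsed = parseFixedRecords(sampleText);
console.log(parsed);
```

This only holds while the layout is strictly regular. If the number of blank lines between records varies, the offsets drift and the fields end up in the wrong slots.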

johnywhy · December 4, 2022, 1:42am

Yes! That’s what i meant it to be. And i believe that should split my data as desired. But it’s not working.

Thanks for your suggestion, but your solution doesn’t seem to handle arbitrary white space after the newline. My question was meant to show arbitrary white space. I edited it.

The blank lines may be 1 blank, 2 blank, or anything higher.

rpg_digital · December 4, 2022, 2:01am

Late here, so sorry if I have got the wrong end of the stick. Are you saying you don’t want the blank line included in the resulting split array, e.g. just two items, not three?

Text matched by a capturing group is included in the array that split returns, which can be useful.

How about using a non-capturing group instead?

/(?:\n[ \t]*){2,}(?=\w)/g
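
To illustrate the difference, here is a minimal sketch on a made-up two-record string (the `sample` value and the variable names are only for illustration, not taken from the data above). With a capturing group, `split` adds the group's last captured text to the result; with `(?:...)` it does not.

```javascript
// Made-up sample: two records separated by one whitespace-only line.
const sample = "A\na.jpg\n  \n  B\n  b.jpg";

// Capturing group: split() also inserts what the group last captured.
const withCapture = sample.split(/(\n[ \t]*){2,}(?=\w)/);

// Non-capturing group: only the text between delimiters is returned.
const withoutCapture = sample.split(/(?:\n[ \t]*){2,}(?=\w)/);

console.log(JSON.stringify(withCapture));
// ["A\na.jpg","\n  ","B\n  b.jpg"]
console.log(JSON.stringify(withoutCapture));
// ["A\na.jpg","B\n  b.jpg"]
```

The extra `"\n  "` entry in the first result is the last repetition of the group, which is where the unwanted blank item comes from.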

johnywhy · December 4, 2022, 2:49am

@rpg_digital , Your solution works perfectly. Great!

Meaning? The matched text will be included in the output? But here we’re talking about a split. I thought that means everything in the pattern will be treated as a delimiter, which you would expect to be removed in a split. But you mean we have to say so explicitly?

Your regex appears identical to mine, except for the extra ?:
What’s that?

i plugged your regex into this:

const gRecordDelim =  /(?:\n[ \t]*){2,}(?=\w)/g
const recs = text.split(gRecordDelim);
recs.forEach((item) => console.log("RECORD: " + item))

i got this:

RECORD: 
Orblie Rapitulnik
orbliek-crop-2.jpg
orbliek.com
Contrary to popular belief, Lorem Ipsum is not simply random text. 
RECORD: Qang Le Schoenthal
    Qang.jpg
    Qangle.webflow.io
    There are many variations of passages of Lorem Ipsum available, but the majority have suffered alteration in some form, by injected humour, or randomised words which don't look even slightly believable. 
RECORD: Sivan Gomez
		sivan-gomez-crop.jpg
		sigcolors.com
		It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout.

system · March 5, 2023, 10:28am

This topic was automatically closed 91 days after the last reply. New replies are no longer allowed.
