Nothing personal, but that regex looks a bit scary to me.
The goal of a good regex is to not only match what you want it to match, but to also not match what you want it to not match.
For me it often helps to indent groupings and “translate” when I’m analyzing them, e.g.
/
(\/                 a forward slash
  ([\w-]+)          one or more hyphen or word characters
)?                  zero or one of the previous
(
  (\/               a forward slash
    ([a-z]+         one or more alpha characters
      (\.           a dot
        ([a-z]+)    one or more alpha characters
      )
    )?              zero or one of the previous
  )?                zero or one of the previous
)?                  zero or one of the previous
$/gi
All those "zero or one"s don’t make me feel comfortable.
A key to crafting a good regex is a thorough understanding of the patterns you will be working with. Not every data set has enough similarity between its members to allow for an easy regex pattern. Indeed, for some, no pattern at all may be possible.
Maybe some people can read regex directly, but just as I “translate” when analyzing a pattern, I often do the same when crafting one.
My first step is to gather what are hopefully all the possible examples that might occur, then roughly list out the “must have”, “always has”, “will never have”, etc.
For example, your dataset as posted is
/jquery.js
/login
/login/file.css
/login/
login/random/path/to/file.css
- may or may not begin with a slash
- always followed by alpha chars
- may be followed by either a dot or a slash
- always ending with either a slash or an alpha char
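Observations like these can be sanity-checked in code before any pattern is written. This is only a rough sketch in JavaScript (I'm assuming JavaScript because of the /…/gi literal above). The names samples and segmentsOf are made up for illustration.

```javascript
// The five example paths from the post.
const samples = [
  "/jquery.js",
  "/login",
  "/login/file.css",
  "/login/",
  "login/random/path/to/file.css",
];

// Hypothetical helper: split on slashes and drop the empty pieces that a
// leading or trailing slash produces.
const segmentsOf = (path) => path.split("/").filter((s) => s !== "");

// Only lowercase letters, dots and slashes occur anywhere.
const onlyExpectedChars = samples.every((p) => /^[a-z.\/]+$/.test(p));

// Every path ends with either a slash or a letter.
const endsAsExpected = samples.every((p) => /[a-z\/]$/.test(p));

// A dot, when present, only ever shows up in the last segment.
const dotOnlyInLastSegment = samples.every((p) =>
  segmentsOf(p).slice(0, -1).every((s) => !s.includes("."))
);

console.log({ onlyExpectedChars, endsAsExpected, dotOnlyInLastSegment });
```

If any of these come back false for your real data, the “always” and “never” lists need revisiting before the regex does.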
Then I look for patterns.
“between” slashes are always only alpha chars. That can be a character set [a-z]
How many? I’m guessing that at least one would always be there, so this would probably be OK unless you need to be more precise. [a-z]+
The strings with file extensions are the only ones that have a dot, and the dot is always in the ending group. That group is always alpha chars followed by a dot followed by alpha chars, so it can be grouped like ([a-z]+\.[a-z]+)
and because they are always only last ([a-z]+\.[a-z]+)$
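A quick way to convince yourself that this ending group picks out only the file-like strings is to run it over the posted examples. A minimal sketch, with fileAtEnd and withFile as illustrative names of my own:

```javascript
// The "alphas, dot, alphas at the very end" group from the text.
const fileAtEnd = /([a-z]+\.[a-z]+)$/;

const samples = [
  "/jquery.js",
  "/login",
  "/login/file.css",
  "/login/",
  "login/random/path/to/file.css",
];

// Keep only the samples whose tail looks like name.ext
const withFile = samples.filter((p) => fileAtEnd.test(p));
console.log(withFile);
// → [ '/jquery.js', '/login/file.css', 'login/random/path/to/file.css' ]

// The capture group holds just the file name part.
console.log(fileAtEnd.exec("/login/file.css")[1]); // → file.css
```

Note that this only looks at the end of the string. It says nothing yet about what comes before the file name.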
When something may or may not be there, the choices are “zero or one” (a ?) and “zero or more” (a *)
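To see the difference between the two quantifiers, here is a small sketch applied to an optional trailing slash. The ^ and $ anchors are there only to make the difference visible, and "login//" is a made-up input that is not in the posted data.

```javascript
const zeroOrOne = /^login\/?$/;  // ? allows no slash or exactly one
const zeroOrMore = /^login\/*$/; // * allows no slash, one, or many

const inputs = ["login", "login/", "login//"];

const comparison = inputs.map((s) => ({
  input: s,
  zeroOrOne: zeroOrOne.test(s),
  zeroOrMore: zeroOrMore.test(s),
}));

console.log(comparison);
// "login" and "login/" pass both; "login//" passes only the * version.
```

Which one you want depends on whether doubled-up slashes can actually occur in your data.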
So a preliminary (but not yet good enough) pattern might be
/(\/)?([a-z]+|[a-z]+\.[a-z]+)$/
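Tested against the whole string (that is, with a ^ added at the front so a match can't start partway in), this would match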
/jquery.js
/login
but not match
/login/file.css
/login/
login/random/path/to/file.css
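If you'd rather check this than take my word for it, a tiny harness does the job. This sketch assumes the pattern is meant to cover the whole string, so I've added a ^ at the front. That anchor is my assumption and is not part of the pattern as written above. Without it, a regex ending in $ is free to match just the tail of a longer path.

```javascript
// Preliminary pattern, with ^ added (an assumption) to force a whole-string match.
const preliminary = /^(\/)?([a-z]+|[a-z]+\.[a-z]+)$/;

const samples = [
  "/jquery.js",
  "/login",
  "/login/file.css",
  "/login/",
  "login/random/path/to/file.css",
];

// Pair each sample with whether the pattern accepts it.
const results = samples.map((p) => [p, preliminary.test(p)]);

for (const [path, ok] of results) {
  console.log(ok ? "match   " : "no match", path);
}
```

Only the first two samples should come back as matches here.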
The possible ending slash, which is never preceded by a string containing a dot, can be matched by changing the pattern to
/(\/)?([a-z]+(\/)?|[a-z]+\.[a-z]+)$/
So the pattern will now match
/jquery.js
/login
/login/
but not match
/login/file.css
login/random/path/to/file.css
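When tweaking a pattern step by step, it can help to compare the old and new versions directly and see what the change actually bought you. A sketch, again with a ^ added to both patterns as my own assumption so that they are tested against the whole string. The names before, after, newlyMatched and stillUnmatched are illustrative.

```javascript
const before = /^(\/)?([a-z]+|[a-z]+\.[a-z]+)$/;      // preliminary pattern
const after = /^(\/)?([a-z]+(\/)?|[a-z]+\.[a-z]+)$/;  // with the optional ending slash

const samples = [
  "/jquery.js",
  "/login",
  "/login/file.css",
  "/login/",
  "login/random/path/to/file.css",
];

// What did the change gain, and what is still left over?
const newlyMatched = samples.filter((p) => !before.test(p) && after.test(p));
const stillUnmatched = samples.filter((p) => !after.test(p));

console.log(newlyMatched);   // → [ '/login/' ]
console.log(stillUnmatched); // → [ '/login/file.css', 'login/random/path/to/file.css' ]
```

Keeping a list like stillUnmatched around makes it obvious what the next step has to deal with.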
The ones that are left needing to be matched are one or more sub-folders. When they are there, they are always only alpha chars enclosed by slashes.
Matching the alpha chars is straightforward, but what to do about the slashes?
If you match the slash after the alpha chars, what about the one before them, and vice versa?
Perhaps the easiest would be to modify this part of the pattern
[a-z]+(\/)?
Making that “zero or more” should do the trick. So
/(\/)?(([a-z]+(\/)?)*|[a-z]+\.[a-z]+)$/
Analyzing, this translates to
/
(\/)?               beginning with zero or one slash
(
  ([a-z]+           one or more alphas
    (\/)?           zero or one slash
  )*                zero or more of the previous
  |                 or
  [a-z]+\.[a-z]+    one or more alphas followed by a dot followed by one or more alphas
)
$/
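JavaScript regex literals have no free-spacing mode, so one way to keep a “translation” like this next to the code is to build the pattern from named pieces. The sketch below does that and then prints which part of each sample the pattern actually grabbed, which is handy for experimenting. The names leadingSlash, folders, file and report are mine, and I've deliberately left out the expected output.

```javascript
// Named pieces, mirroring the translation above.
const leadingSlash = String.raw`(\/)?`;      // zero or one slash
const folders = String.raw`([a-z]+(\/)?)*`;  // zero or more of: alphas plus an optional slash
const file = String.raw`[a-z]+\.[a-z]+`;     // alphas, a dot, alphas

// Same pattern as the literal in the text, assembled from the pieces.
const pattern = new RegExp(`${leadingSlash}(${folders}|${file})$`);

const samples = [
  "/jquery.js",
  "/login",
  "/login/file.css",
  "/login/",
  "login/random/path/to/file.css",
];

// For each sample, record the exact substring that was matched (or null).
const report = samples.map((p) => {
  const m = pattern.exec(p);
  return { input: p, matched: m ? m[0] : null };
});

console.table(report);
```

Comparing the matched column with the input column is a good way to start hunting for what still needs fixing.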
NOTE
Still not “done” but should be enough to give you the idea of how you could craft a regex. If you can’t figure out what’s wrong with it after a few tries, post back.