I wasn’t aware of hostname; that is obviously preferable. I see it isn’t supported in Opera though.
To Liagapi555: the negated square bracket notation isn’t a group match pattern.
So [^abc]
means match a single character that is not any of those, i.e. not a ‘c’, an ‘a’ or a ‘b’. It doesn’t mean do not match the sequence ‘abc’.
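A quick way to see the difference, as a minimal sketch (the test strings are just made up for illustration):

```javascript
// [^abc] matches any ONE character that is not a, b or c.
// It says nothing about the sequence "abc".
const notAbc = /[^abc]/;

const results = {
  a: notAbc.test('a'),     // only an excluded character, so no match
  abc: notAbc.test('abc'), // every character is excluded, so no match
  cab: notAbc.test('cab'), // order makes no difference, still no match
  abd: notAbc.test('abd'), // 'd' is outside the set, so this matches
};

console.log(results);
```

Only the last string matches, because it is the only one containing a character outside the set.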
The http/https pattern isn’t needed twice with an or ‘|’ between them.
https? would do the trick for that bit, with the question mark making the ‘s’ optional (0 or 1 times).
The reason deleting that chunk from your regex still works is that you haven’t anchored the pattern to the start of the string with ^
e.g. ^https?:\/{2}
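Here is a small sketch of both points, the optional ‘s’ and the effect of the ^ anchor. The sample strings are my own, not from your code:

```javascript
// Same pattern with and without the start-of-string anchor.
const unanchored = /https?:\/{2}/;
const anchored = /^https?:\/{2}/;

const samples = [
  'http://example.com',      // no 's'
  'https://example.com',     // with 's'
  'see https://example.com', // protocol is not at the start
];

const report = samples.map((s) => ({
  input: s,
  unanchored: unanchored.test(s),
  anchored: anchored.test(s),
}));

console.log(report);
```

The first two strings match either way. The third only matches the unanchored version, which is why a pattern without ^ can appear to work even when part of it is wrong or missing.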
If you want to experiment with regular expressions, regex101.com is very handy. It will even tell you the number of steps taken in your matches
This is a good site for learning about regexes
https://www.regular-expressions.info/tutorial.html
Just playing around in regex, this was my attempt. Note I may well have not considered all edge cases, so it could possibly fail.
^https?:\/{2}(?:w{3}\.)?([^\/]+)\/?([^\/]+)?
Breakdown:
Start with http or https followed by 2 forward slashes. {2} is the number of times the preceding token (here the forward slash) must repeat
^https?:\/{2}
Then an optional group matching ‘www.’; (?: ) is a non-capturing group, and the question mark after it again makes it optional
(?:w{3}\.)?
Then match everything up to an optional forward slash. This time in a capturing group using a negative character set [^\/]
(anything that is not a forward slash)
([^\/]+)\/?
Lastly an optional negative character set inside a second capturing group, that will match anything that isn’t a forward slash (i.e. up to a possible next forward slash)
([^\/]+)?
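Putting the pieces together, a minimal sketch of how the whole pattern behaves. The parse helper and the example.com URLs are hypothetical, just there to show the capture groups:

```javascript
// The full pattern from the breakdown above.
const urlPattern = /^https?:\/{2}(?:w{3}\.)?([^\/]+)\/?([^\/]+)?/;

// Hypothetical helper: returns the two capture groups, or null on no match.
function parse(url) {
  const m = urlPattern.exec(url);
  if (m === null) return null;
  return { host: m[1], firstSegment: m[2] };
}

console.log(parse('https://www.example.com/test')); // host 'example.com', firstSegment 'test'
console.log(parse('http://example.com'));           // host 'example.com', firstSegment undefined
console.log(parse('ftp://example.com'));            // null, protocol is not http(s)
```

Note that the optional ‘www.’ is consumed by the non-capturing group, so it never ends up in the first capture.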
As I say, this may well fail. What if the path is /test.php?firstname=name instead of /test? The second group would then capture all of test.php?firstname=name
We could change the last bit to the following to exclude dots as well
([^\/.]+)?
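A quick before and after comparison, again only a sketch using a made-up example.com URL:

```javascript
// Original second group versus the variant that also excludes dots.
const original = /^https?:\/{2}(?:w{3}\.)?([^\/]+)\/?([^\/]+)?/;
const noDots = /^https?:\/{2}(?:w{3}\.)?([^\/]+)\/?([^\/.]+)?/;

const sample = 'https://www.example.com/test.php?firstname=name';

const before = original.exec(sample)[2]; // 'test.php?firstname=name'
const after = noDots.exec(sample)[2];    // 'test'

console.log({ before, after });
```

The catch is that the new version stops at any dot, so a segment that legitimately contains one (say my.page) would be cut short as well.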
I think regexes are great, but the last bit does illustrate what Paul says: if there is a native tool like hostname, that is probably the better route.
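For comparison, a minimal sketch of the native route. I’m assuming here that the hostname Paul mentions is the property exposed by the standard URL object (window.location has a property of the same name); the example URL is made up:

```javascript
// Let the built-in URL parser do the work instead of a hand-rolled regex.
const url = new URL('https://www.example.com/test.php?firstname=name');

const parts = {
  hostname: url.hostname,                        // 'www.example.com'
  pathname: url.pathname,                        // '/test.php'
  firstname: url.searchParams.get('firstname'),  // 'name'
  // Unlike the regex above, hostname keeps the 'www.' prefix,
  // so strip it separately if it isn't wanted.
  bareHost: url.hostname.replace(/^www\./, ''),  // 'example.com'
};

console.log(parts);
```

The query string and file extension that tripped up the regex are handled by the parser, with no edge cases to think through by hand.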