I know this may not be the best subforum for a regular expression question, but I don't see anywhere else it would fit, so I'm posting it here. I want to write several regular expressions to match the following five possible types of URL routes on my site:
Here the area can be the main site, the admin control panel, or the clan/group control panel. The controller, action, params and page numbers should be self-explanatory if you are familiar with MVC and pretty URLs. The page number pattern should match whenever it sees the keyword "page-", in which case the URL always matches the last type.
The question is, how do I write regular expressions that will match exactly such URL routes? I am still at a beginner level with regular expressions, and the syntax confuses me a lot. Thanks.
Are you writing your own router? Is it one that comes as part of a framework? I can recommend FastRoute to save you a lot of trouble, and you won't build one faster. Or is this just an exercise in RegExp?
Why not adopt the PHP framework approach and rewrite everything to index.php?
.htaccess
<IfModule mod_rewrite.c>
RewriteEngine On
# !IMPORTANT! Set your RewriteBase here and don't forget trailing and leading
# slashes.
# If your page resides at
# http://www.example.com/mypage/test1
# then use
# RewriteBase /mypage/test1/
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?/$1 [L]
</IfModule>
<IfModule !mod_rewrite.c>
# If we don't have mod_rewrite installed, all 404's
# can be sent to index.php, and everything works as normal.
# Submitted by: ElliotHaughin
ErrorDocument 404 /index.php
</IfModule>
Yes, I can give you an example for the five possible routes:
/site, /admin, / (in this case no area is provided, so it defaults to /site)
/site/account, /admin/user
/site/pm/create, /admin/user/create
/site/pm/read/1, /admin/user/edit/2, /site/vm/view/1/2 (in the last case, it has two parameters)
/site/pm/page-5, /admin/user/page-6
In these cases, site and admin are "areas"; account, user, pm and vm are controllers; create, read and edit are actions; the numbers 1 and 2 are parameters (note that the last example in the fourth route has 2 parameters; it's not common but it can happen); page-5 and page-6 are page numbers (identified by the keyword or prefix page-).
The routes are listed by increasing priority. If the page- keyword is found, it matches the fifth route. Otherwise, it will try the fourth route, third route, second route and first route, in that order.
Areas can only be certain words like "site", "admin", "mod", "clan" and "install". Controllers and actions must be alphanumeric strings, while parameters and page numbers must be integers. Those are the only restrictions.
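To make the priority order concrete, here is a minimal sketch of what the five patterns could look like in PHP. It is only an illustration under some assumptions: routePatterns() and matchRoute() are hypothetical names, the area is treated as required for routes two to five (only a bare / falls back to site), and the page route is assumed to allow an optional action before page-N.

```php
<?php
// Illustrative sketch only: function names and route names are assumptions
// made for this example, not part of any existing router.

function routePatterns(): array
{
    $area = '/(?<area>site|admin|mod|clan|install)';
    $word = '[A-Za-z0-9]+';
    $ctrl = '/(?<controller>' . $word . ')';
    $act  = '/(?<action>' . $word . ')';

    // Highest priority first: the page route is tried before all others.
    return [
        'page'       => '#^' . $area . $ctrl . '(?:' . $act . ')?/page-(?<page>\d+)$#',
        'params'     => '#^' . $area . $ctrl . $act . '(?<params>(?:/\d+)+)$#',
        'action'     => '#^' . $area . $ctrl . $act . '$#',
        'controller' => '#^' . $area . $ctrl . '$#',
        'area'       => '#^(?:' . $area . ')?/?$#',
    ];
}

function matchRoute(string $path): ?array
{
    foreach (routePatterns() as $name => $pattern) {
        if (preg_match($pattern, $path, $m) === 1) {
            // Unmatched groups are either missing or empty strings; treat both as null.
            $get = fn(string $key) => (isset($m[$key]) && $m[$key] !== '') ? $m[$key] : null;
            $rawParams = $get('params');

            return [
                'route'      => $name,
                'area'       => $get('area') ?? 'site', // assumed default area
                'controller' => $get('controller'),
                'action'     => $get('action'),
                'params'     => $rawParams === null
                    ? []
                    : array_map('intval', explode('/', ltrim($rawParams, '/'))),
                'page'       => $get('page') === null ? null : (int) $get('page'),
            ];
        }
    }

    return null; // no route matched
}
```

With this sketch, matchRoute('/site/vm/view/1/2') falls through the page pattern and is picked up by the params pattern, while matchRoute('/site/pm/page-5') is caught by the page pattern first. Because controllers and actions are limited to alphanumeric characters, a segment like page-5 can never be mistaken for an action.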
Yes, I am writing my own router, and I am doing this for a reason. The objective is not to build a site quickly; I am creating my own framework as practice. So I won't be using third-party libraries, but I may look into the FastRoute library to see what advice and tips I can get from it. Thanks.
It seems you don't understand what I am doing at all; this is not about URL rewriting. Of course everything is being redirected to index.php, and I have a .htaccess file that does this. I have a front controller inside the index.php file, which handles all requests and routes them to different app/page controllers and actions based on the URL provided. I have already accomplished this, and now I am trying to build routers and routes that will match browser URLs. You are talking about what happens before routing, not routing itself.
I see, thanks for the article, Antnee. Can someone please help me by giving an example of what the regular expression should look like? Maybe the 4th or 5th route? I think I can work out how to write the others if I have 1-2 examples, thanks.
I'm not sure this will directly answer your question @Hall_of_Famer, but this article from Hugo Giraudel went up yesterday. Regex isn't something I do anything with myself, but it looks like a good primer on the subject.
If you know exactly what each segment of the URL should mean, wouldn't it be easier to just split them?
Something like this (sample code, haven't tested):
Well, the issue is that the URL can be more flexible: sometimes actions and params do not exist, sometimes there are two params, and sometimes there are page numbers that need to be identified for pagination. There is a reason why I am using regular expressions for this.
I'm not a regex ninja, but how's this for you? /\/(?<area>[\w\d\-_]+)?(\/(?<controller>[\w\d\-_]+))?(\/(?<action>[\w\d\-_]+))?(\/(?<param1>[\w\d\-_]+))?(\/(?<param2>[\w\d\-_]+))?/
I tried it against all of these examples and it looks like it's working to me:
/
/site
/admin
/site/account
/admin/user
/site/pm/create
/admin/user/create
/site/pm/read/1
/admin/user/edit/2
/site/vm/view/1/2
/site/pm/page-5
/admin/user/page-6
If you do preg_match($pattern, $route, $matches); you should find that you have a $matches array that has named groups, so you would be able to check for $matches['area'], $matches['controller'], $matches['action'], $matches['param1'] and $matches['param2']
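For reference, here is a small sketch of that preg_match() call using the pattern exactly as posted. parseRoute() is a hypothetical wrapper added for illustration; it just normalises unmatched groups to null. Note that the pattern is not anchored and allows hyphens in every segment, so page-5 lands in the action group rather than being recognised as a page number.

```php
<?php
// The pattern from the post above, unchanged.
$pattern = '/\/(?<area>[\w\d\-_]+)?(\/(?<controller>[\w\d\-_]+))?(\/(?<action>[\w\d\-_]+))?(\/(?<param1>[\w\d\-_]+))?(\/(?<param2>[\w\d\-_]+))?/';

// Hypothetical helper: returns the five named groups, with null for any
// group that did not take part in the match.
function parseRoute(string $pattern, string $route): array
{
    $matches = [];
    preg_match($pattern, $route, $matches);

    $result = [];
    foreach (['area', 'controller', 'action', 'param1', 'param2'] as $key) {
        $result[$key] = (isset($matches[$key]) && $matches[$key] !== '')
            ? $matches[$key]
            : null;
    }

    return $result;
}

$route = parseRoute($pattern, '/site/vm/view/1/2');
// $route['area'] is 'site', $route['controller'] is 'vm',
// $route['action'] is 'view', $route['param1'] is '1', $route['param2'] is '2'
```

For /site/pm/page-5 the same call gives 'page-5' as the action, so any page handling still has to happen after the match.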
I think there will be a very slight speed gain from using regular expressions, but it will be far more complicated to administer any changes. The script is only called once, and far more time will be spent debugging.
@megazoid's approach is sleek, and not only effective but also caters for the complete range of URIs. (I wrote a script which was far more verbose and also rigid.)
So how is a script meant to identify which is which? Whether you use Regex or not, you need rules that define your structure.
Let me take a stab, and see if you agree.
The first element is always the site.
If there are 2 or more elements, element 2 is always the controller.
If there are 3 or more elements, element 3 is always the action.
If there are 4 or more elements, all elements except the last are parameters; the last element is a parameter unless it begins with the word "page" followed by a hyphen.
Note: Assuming this ruleset is correct, megazoid's code is perfectly functional, except that he needs to add a check for the "page" option.
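The ruleset above, including the missing "page" check, could be sketched roughly like this. applyRules() is a hypothetical helper that takes the URL segments as an array. One assumption differs from the rules as written: the page check is applied to the last segment regardless of its position, because the examples in the thread (/site/pm/page-5) put it in third place.

```php
<?php
// Hypothetical helper, not code from the thread. $segments is a plain list
// such as ['site', 'pm', 'read', '1'].
function applyRules(array $segments): array
{
    $page = null;

    // Page check: strip a trailing "page-N" segment before assigning roles.
    $last = end($segments);
    if ($last !== false && preg_match('/^page-(\d+)$/', $last, $m) === 1) {
        $page = (int) $m[1];
        array_pop($segments);
    }

    return [
        'area'       => $segments[0] ?? 'site', // assumed default when no area is given
        'controller' => $segments[1] ?? null,
        'action'     => $segments[2] ?? null,
        'params'     => array_slice($segments, 3), // everything after the action
        'page'       => $page,
    ];
}
```

Called with ['site', 'pm', 'page-5'], this yields controller pm, no action, and page 5; with ['site', 'vm', 'view', '1', '2'] both trailing numbers end up in params.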
I think allowing the page number to be a standalone part of the URI scheme doesn't make sense at all.
Params already exist for all action options, including page numbers.
Rather than having each page that is going to do page-numbers individually run the logic for finding the page number, I would put it in the generalized router.
OK, maybe it depends on how exactly actions will be implemented. I assumed that an action is a function with input arguments, where each argument represents a parameter from the URL, for example:
URL: /site/shop/category/25/5
//controller
class shop {
//action
function category($id, $page){
//$id = 25, $page = 5
}
}
In such a case there is no need to have "logic for finding the page number"; it's just passed like a regular parameter.
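As a rough sketch of that idea, the router could hand the URL parameters to the action as positional arguments. The dispatch() function is hypothetical, and the Shop class is modelled on the example above with a return value and a default page added purely for illustration.

```php
<?php
// Controller modelled on the example above; the return value and the
// default for $page are additions for this sketch.
class Shop
{
    public function category(int $id, int $page = 1): string
    {
        return "category {$id}, page {$page}";
    }
}

// Hypothetical dispatcher: passes each URL parameter as one argument.
function dispatch(object $controller, string $action, array $params)
{
    if (!is_callable([$controller, $action])) {
        throw new InvalidArgumentException("Unknown action: {$action}");
    }

    // Parameters arrive as strings from the URL; the thread restricts them
    // to integers, so they are cast here.
    return $controller->$action(...array_map('intval', $params));
}

// URL: /site/shop/category/25/5
echo dispatch(new Shop(), 'category', ['25', '5']); // prints "category 25, page 5"
```

The trade-off is that the meaning of each parameter depends purely on its position, which is what the page- prefix in the other approach is meant to avoid.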
Check out my response. The Regex is a mess, but it works for all of the examples.
Personally, I'd use parse_url() to get info about the full URL, and then just explode() the path and get the segments out, because it's just easier to follow what's happening. But as an exercise in regex, my solution will work.
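A minimal sketch of the parse_url() and explode() approach might look like the following. pathSegments() is a hypothetical name, and the example URL is made up for illustration.

```php
<?php
// Hypothetical helper: extracts the non-empty path segments from a URL.
function pathSegments(string $url): array
{
    // PHP_URL_PATH returns only the path, without query string or fragment.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return []; // no path component, or a URL that could not be parsed
    }

    // explode() leaves empty strings for leading, trailing or doubled
    // slashes, so filter them out and reindex the result.
    return array_values(array_filter(
        explode('/', $path),
        fn(string $segment) => $segment !== ''
    ));
}

$segments = pathSegments('http://www.example.com/site/pm/read/1?sort=asc');
// $segments is ['site', 'pm', 'read', '1']
```

From there, the segments can be assigned to area, controller, action and parameters by position.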