URL routing and regular expressions

[quote=“StarLion, post:20, topic:199059”]
Your solution will work, as long as there are exactly 0, 1, or 2 parameters
[/quote]
And yet it matches every example provided, so that’s requirements :heavy_check_mark:

My point was just that it’s a horribly complicated Regex that is difficult to maintain. I was not condoning using it!

It’s a very nice regex pattern, thanks a lot. The parameters should be numbers only though, so I guess their subpattern should look somewhat different from the one for area, controller and action. In this case I think this pattern should not match the URLs with page numbers, since ‘page-5’ or ‘page-6’ are alphanumeric, while the parameters should be numbers only.

Just as StarLion mentioned, the URLs with and without page numbers are two separate classes of patterns, and I plan to have different patterns for them. In general, the URL is matched against the pattern with a page number first, and if that fails it is checked against the one without a page number. But it’s a very good example, thanks a lot.

Well, in general your assumptions are correct, although I am also thinking that element 3 can be a subcontroller, in cases like /admin/user/manage/edit/1. In this case, manage is a subcontroller inside the actioncontroller user, and edit is the action. This makes element 4 the action, and elements 5 onwards the parameters.

Also, element 1 may be omitted if the area is ‘site’; in this case the new element 1 is the controller, element 2 is the action, and elements 3 onwards are parameters. I do not need regexes for these since I can write my own once I get two examples (one for route 4, and another for route 5 in my original post). But this is a major reason why I don’t want to use megazoid’s function: with customized routes I would have to write a series of if…else branches in this function, which makes it less flexible. Not to mention that I am writing a framework, so client application developers may want to define their own customized routes. I explain in my own response to his post why it will be less flexible.

Yeah, it handles that, but in a way that isn’t very flexible. The idea is that I am building a framework in which there are several different kinds of Routes stored as a collection in the Router object. The Router will loop through each route (a subclass of the abstract Route class) in order and attempt to match the URL against the pattern of the route. For instance, ParameterizedRoute matches the fourth route in my example, and PaginatedRoute matches the fifth route in my example. Each route implements the abstract method interpret($url) from the parent Route class to generate values such as Controller, Action, Parameters etc. Since the page number may appear in the place of parameters, before parameters or after parameters, it’s very easy and flexible.
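To make the design described above concrete, here is a minimal sketch. Only Route, Router, ParameterizedRoute and interpret($url) are named in the post; the add() and resolve() methods, the returned array keys and the regex itself are assumptions made up for illustration, not the actual framework code.

```php
<?php
// Hypothetical sketch of the Route/Router design described in the post.
abstract class Route
{
    /** Return the interpreted route values, or null if the URL does not match. */
    abstract public function interpret(string $url): ?array;
}

class ParameterizedRoute extends Route
{
    // Assumed shape: /area/controller/action followed by zero or more numeric parameters.
    private const PATTERN = '#^/([a-z][a-z0-9]*)/([a-z][a-z0-9]*)/([a-z][a-z0-9]*)((?:/\d+)*)$#i';

    public function interpret(string $url): ?array
    {
        if (preg_match(self::PATTERN, $url, $m) !== 1) {
            return null;
        }
        $tail = $m[4] ?? '';
        $params = $tail === '' ? [] : array_map('intval', explode('/', ltrim($tail, '/')));

        return ['area' => $m[1], 'controller' => $m[2], 'action' => $m[3], 'parameters' => $params];
    }
}

class Router
{
    /** @var Route[] */
    private array $routes = [];

    public function add(Route $route): void
    {
        $this->routes[] = $route;
    }

    /** Try each route in the order it was added; the first one that matches wins. */
    public function resolve(string $url): ?array
    {
        foreach ($this->routes as $route) {
            $result = $route->interpret($url);
            if ($result !== null) {
                return $result;
            }
        }
        return null;
    }
}
```

A client application would then register its own Route subclasses with add(), ahead of or after the built-in ones, instead of editing a central function.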

All I need to do is subclass Route and create a concrete, specific route class. It also opens the door for more complex routes in the future, or even for changing the delimiter from the forward slash ‘/’ to anything else, like ‘-’, ‘&’, etc. Since client users may want URL schemes totally different from the set I give them, they will find it very easy to add their new custom routes. With your function, this becomes a lot more difficult to extend and maintain. The function can become very big with cascades of if statements; also, application developers are not encouraged to touch the core framework codebase.

So you need a variation and a second one for if this one doesn’t match. I was trying to give you a single solution :slight_smile: Easily done then

Yeah, that’s about right. The Parameterized-Route and Paginated-Route will have different regex patterns that match different URLs. Thank you so much.

Unless you can say that

Will ALWAYS be true, there is no way to programmatically determine the structure. (Is item 5 a parameter or an action? You can’t know.)
IE:
If a non-numeric parameter is ever used,
/aplace/controller1/subdo/doathing/1/3

Is ambiguous from
/aplace/controller1/doathing/imaparam/3

And this is where routers in most frameworks (including the aforementioned FastRoute) are great; you declare them like

$router->get('/site', ['controller\\site', 'indexAction']);
$router->get('/site/account', ['controller\\site', 'accountAction']);
$router->get('/site/pm/read/{:id}', ['controller\\site\\pm', 'readAction']);
$router->post('/site/account', ['controller\\site', 'createAccountAction']);

In the third example you’d probably have something like this in your code:

<?php
namespace app\controller\site;

class pm {
    public function readAction($id)
    {
        // $id comes from {:id} in the route and is automatically passed to this method
    }
}

It’s much easier to manage and to understand what’s going on, without some magical do-everything Regex. I would definitely recommend a solution like this over just a simple regex match.

Also, just to note, there are usually options in these routers for despatch. In my example I provide an array of the namespaced class and then the method to run, but other acceptable options include:

$router->get('/site', [$app, 'indexAction']);
$router->get('/site/pm/read/{:id}', function($id){ /* Do something with $id */ });

Well, it’s not ambiguous: as I stated above, the parameters can only be integers, while the action must be alphanumeric (and in fact must start with a letter, not a number). Using a regular expression, it is possible to distinguish the two.

Well, I would still prefer using regular expressions. I don’t want to have to declare and add several routes to the router whenever I create a new controller; with the FastRoute example I would have to do this for every controller and every action. So yeah, I will stick to regular expressions. If possible, can I have an example of what the paginated route should look like as a regular expression? Thank you so much.
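For illustration only, a paginated pattern could look roughly like the sketch below. It assumes the page segment is written as page-N and comes last, after any numeric parameters; the thread does not fix that position, and the function name interpretPaginated and the array keys are made up here.

```php
<?php
// Hypothetical sketch. Assumed URL shape:
//   /area/controller/action[/numeric params...]/page-N
function interpretPaginated(string $url): ?array
{
    $pattern = '#^/([a-z][a-z0-9]*)/([a-z][a-z0-9]*)/([a-z][a-z0-9]*)((?:/\d+)*)/page-(\d+)$#i';

    if (preg_match($pattern, $url, $m) !== 1) {
        return null; // not a paginated URL; let the next route try
    }

    // Group 4 holds the raw "/1/2" tail, or an empty string when there are no parameters.
    $params = $m[4] === '' ? [] : array_map('intval', explode('/', ltrim($m[4], '/')));

    return [
        'area'       => $m[1],
        'controller' => $m[2],
        'action'     => $m[3],
        'parameters' => $params,
        'page'       => (int) $m[5],
    ];
}
```

Because a URL without a page segment returns null, a non-paginated pattern can be tried afterwards, which matches the "paginated first, then plain" order mentioned earlier in the thread.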

Well, there is a problem with your regular expression, as I have tested: it’s way too generous. I even tried invalid URLs such as /1 or /-, which would define the area as ‘1’, but the area must start with a letter (not a digit or any other symbol). Yet it still matches the URL, as preg_match returns 1. So I wonder if it’s possible to make sure that the subpatterns for area, controller and action are alphanumeric beginning with a letter, while the parameters must be numbers. Thanks.
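One way to get that strictness, sketched under the rules stated in the thread (names start with a letter, parameters are digits only), is to anchor the pattern with ^ and $ and use a letter-first character class for each name segment. The constant and function names below are hypothetical. Note that preg_match returns 1 on a match, 0 on no match and false on error, so comparing with === 1 is the safe check.

```php
<?php
// Assumed segment rules, taken from the posts above.
const NAME_SEGMENT  = '[a-zA-Z][a-zA-Z0-9]*'; // area, controller, action
const PARAM_SEGMENT = '\d+';                  // numeric parameters only

function isValidRoute(string $url): bool
{
    // Controller, action and parameters are optional here; the anchors force
    // the whole URL to fit the pattern rather than just a prefix of it.
    $pattern = '#^/(' . NAME_SEGMENT . ')'
        . '(?:/(' . NAME_SEGMENT . ')'
        . '(?:/(' . NAME_SEGMENT . ')((?:/' . PARAM_SEGMENT . ')*))?)?$#';

    return preg_match($pattern, $url) === 1;
}

foreach (['/admin/user/edit/1', '/admin', '/1', '/-', '/admin/user/edit/abc'] as $url) {
    printf("%-22s %s\n", $url, isValidRoute($url) ? 'match' : 'no match');
}
```

With these rules /1 and /- are rejected because the first character of the area is not a letter, and /admin/user/edit/abc is rejected because a parameter must be numeric.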

I think there is a reason why a lot of “routing” requires “convention”

It is precisely the limited flexibility that mandatory adherence to following convention requires that bothers me. But following this discussion I’m beginning to think it may be a necessary evil.

Interesting view. This is how I see it. Since a URL is a URI, it is an identifier. And, as an identifier, it must have some form of convention. It must be standardized, and more importantly, logical for the application (and the developers and users) to understand it, in order to return the proper response. So the convention is most definitely a must and totally unavoidable and that is actually a good thing, because it means we developers can enforce the convention. :smile:

The interesting thing about this discussion is, we are mainly talking about how to handle URLs with a GET method. The fun starts, when handling URLs with other methods. For instance, Antnee noted this route.

$router->post('/site/account', ['controller\site', 'createAccountAction']);

How do you handle deletes and updates, as an HTML form only knows the post method? Do you then just call AccountAction instead of the createAccountAction and do some logic with an extra parameter sent by the form? Or do you do a totally different URL scheme?

$router->post('/site/account/update', ['controller\site', 'updateAccountAction']);

which would make the previous route

$router->post('/site/account/create', ['controller\site', 'createAccountAction']);

The problem with all this “convention” is the fact the developer must make the decision or come up with a system, where some of the decision making is left up to the software user (Do you want Ids before or after the object’s name?). And still, even then, there are rules (cough…SEO) out there in userland, which help make the decisions for convention easier too. So, it isn’t all too bad. :smile: Well, all is ok, as long as Google doesn’t decide the URLs need to be different or follow a set standard. LOL! (Oh, and obviously CUD operations don’t need to follow any SEO rules, so we are back to square one with the decision making of which convention to use. Decisions, decisions… LOL! :smiley: )

Scott

I can see routing methods are a hot topic more often than I expected, because this seemed to me a pretty simple thing - but maybe it isn’t? I think I am a rare one here because most often I keep all my routes in a database table and I don’t have any regexes or other complicated mechanisms - I just look up the table for the URI or action and get the result. If a route does not exist then, when run on a development machine, the system scans the controllers and adds the missing route automatically based on a convention - after that I am free to change the URL scheme as I wish.

I don’t see much difference between GET and POST for routing. I prefer to keep separate actions for what they do so I’d have createAccountAction, updateAccountAction, deleteAccountAction, and so on.

But, that is the easy part. Should the URL for a POST entail the actual parameter for the CUD action i.e. /site/account/create or /site/account/update or does the actual method/action need to be a hidden input field? And we are talking RPC, what if you also wanted to support ROA? Decisions…decisions… :smiley:

I like @megazoid’s getRoute method, because it can be used with a standard logic (determined by the convention), to match the request to the behavior needed from the application. I am personally not certain why there is a need for regex at all, if you have a clear convention set up. If you use Hall_of_Famer’s convention and split up the REQUEST_URI into an array, then the first index is the area, the second index is the controller, the third index is the action and so on and so forth. The URI actually can determine what your application should do directly. No need for a routing table and any hardcore matching algorithm actually, if a general convention is followed. And, the convention must be there to begin with anyway. Right? :blush:
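As a rough sketch of that positional convention (this is not megazoid’s actual getRoute, which is not shown here; the function name splitRoute and the array keys are assumptions), splitting the request URI could look like this:

```php
<?php
// Hypothetical helper: segment 1 is the area, 2 the controller, 3 the action,
// and everything after that is treated as a parameter.
function splitRoute(string $requestUri): array
{
    // Drop the query string; parse_url() can return null or false, so fall back to ''.
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path)) {
        $path = '';
    }

    // Remove empty segments caused by leading, trailing or doubled slashes.
    $segments = array_values(array_filter(explode('/', $path), 'strlen'));

    return [
        'area'       => $segments[0] ?? null,
        'controller' => $segments[1] ?? null,
        'action'     => $segments[2] ?? null,
        'parameters' => array_slice($segments, 3),
    ];
}

// Typical call in a front controller (REQUEST_URI is set by the web server):
// $route = splitRoute($_SERVER['REQUEST_URI']);
```

A purely positional split like this cannot tell an omitted area or a subcontroller from an ordinary segment, which is the trade-off the rest of the thread argues about.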

Scott

Normally I would say yes, just for the sake of consistency. If the action parameter is passed via URL for GET then doing the same for POST seems logical because the purpose of sending the action parameter doesn’t change - it’s not as if you are sending it via POST to save it in the database or send it anywhere - it’s still used for routing just like in GET.

There are cases where this is hard to avoid. For example, there is a listing of items with a checkbox next to each one and below you have buttons “delete”, “show”, “hide”. You can only have one form which must invoke three different actions depending on the button. Then the action part needs to be sent via POST to a common controller and then dispatched further accordingly. Or, the router needs to be configured to handle such cases, which is indeed an interesting issue! If js is taken for granted this can be avoided because we can change the form’s action dynamically depending on the button.
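A minimal sketch of the "common controller" approach, assuming each submit button is named action and carries the value delete, show or hide (the browser submits only the name/value pair of the button that was clicked). The function name and the target action names are hypothetical.

```php
<?php
// Hypothetical dispatcher for one form with several submit buttons, e.g.
//   <button type="submit" name="action" value="delete">delete</button>
function resolveBulkAction(array $post): ?string
{
    // Whitelist: button value => controller method to run.
    $allowed = [
        'delete' => 'deleteItemsAction',
        'show'   => 'showItemsAction',
        'hide'   => 'hideItemsAction',
    ];

    $action = $post['action'] ?? null;

    // Anything not in the whitelist is rejected rather than dispatched.
    return is_string($action) && isset($allowed[$action]) ? $allowed[$action] : null;
}

// In the common controller:
// $method = resolveBulkAction($_POST);
```

The whitelist keeps user-supplied input from choosing arbitrary method names.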


Good point!

Scott
