Quick Regex Help

While I know how to tell if a string matches a pattern, what I want is to get certain portions of the string back in an array. For example:

pattern: ‘catalog/[0-9]+/entry/[0-9]+/edit’
string: ‘catalog/12/entry/45/edit’

What I need to get is the values of each variable block in order to use them:

array(12, 45)

How would I go about doing this? preg_match just returns the entire string. Sorry I am such a dunderhead when it comes to regex. =/

Maybe one of these can help.

Still trying to figure them out myself…

$pattern = '#([0-9]{1,2})#';  // change 2 to 3 for numbers up to 999
$string =  'catalog/12/entry/45/edit';

preg_match_all($pattern, $string, $matches);

var_dump( $matches[0]);

//array
//  0 => string '12' (length=2)
//  1 => string '45' (length=2)

Thanks, that was close enough that I could compose an answer. However, I need the static parts to be validated as part of the match, so I had to use “#Admin/Author/([0-9]+)/Book/([0-9]+)/Edit#” as the pattern. The reason is that the pattern “#Admin/Publisher/([0-9]+)/Book/([0-9]+)/Edit#” means something completely different, even though it has the same two variable blocks. But it works, so I’m good now. Thanks.
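For anyone landing here with the original question, here is a minimal sketch of the capture-group approach applied to the catalog example from the first post. The ^ and $ anchors are my own addition (an assumption that the whole string must match), and $values is just an illustrative name.

```php
<?php
// Anchored pattern: the static parts are validated, the digits are captured.
$pattern = '#^catalog/([0-9]+)/entry/([0-9]+)/edit$#';
$string  = 'catalog/12/entry/45/edit';

$values = array();
if (preg_match($pattern, $string, $matches)) {
    // $matches[0] is the whole match; the capture groups start at index 1.
    $values = array_slice($matches, 1);
}

var_dump($values);
```

Note that the captured values come back as strings, so cast them with (int) if you need integers.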

Oh, right, I had removed all of that stuff to make it more generic …

On a side note you could name those pattern matches like so…

preg_match('#Admin/Author/(?P<author_id>[0-9]+)/Book/(?P<book_id>[0-9]+)/Edit#', $string, $matches);

Which would allow you to access the two matches using a key rather than int:

var_dump($matches);

// for $string = 'Admin/Author/1/Book/2/Edit':
//array
// 0 => 'Admin/Author/1/Book/2/Edit'
// 'author_id' => '1'
// 1 => '1'
// 'book_id' => '2'
// 2 => '2'

I had long forgotten that, thanks for the reminder Lobsterdore! Nice one.

Wow, thanks, that’s even better than what I had! =)

Now, is there a preg function to reverse this? For example, take the pattern, and an assoc array of keys/values and get a full url?

$vars = array(
    'authorid' => 132,
    'bookid' => 5
);

$pattern = "Admin/Author/(?P<authorid>[0-9]+)/Book/(?P<bookid>[0-9]+)/Edit";

// call some method and get

$result = "Admin/Author/132/Book/5/Edit";

Thanks in advance.

$result = "Admin/Author/{$vars['authorid']}/Book/{$vars['bookid']}/Edit";

You don’t use Regex for that.

Um no. It has to be a routine that will work with any pattern/var-array. I won’t know ahead of time what the var names in the pattern are. What I want is, foreach var as key => val, if key_is_in_pattern then replace…see what I mean? Sure, if it were just this one known pattern it’d be easy.

Let me elaborate on what I’m trying to do…


class Route
{

	private $_pattern;
	private $_controller;
	private $_action;
	private $_vars;

	public function __construct($pattern, $controller, $action)
	{
		$this->_pattern = $pattern;
		$this->_controller = $controller;
		$this->_action = $action;
	}

	public function IsMatch($url)
	{
		return preg_match('#' . $this->_pattern . '#', $url, $this->_vars);
	}

	public function BuildUrl(array $vars)
	{
		// dis be havin no code mon, ya mind slapping a bit a mojo here?
		// $vars be an assoc array, which should be matchin da pattern vars
		// exception out if dey not match...
	}

	public function GetVar($key)
	{
		return $this->_vars[$key];
	}

	// ...

}

If they’re always integer values, you could easily use str_replace, since the pattern is pre-determined.

foreach ($vars as $key => $value) {
    $pattern = str_replace('(?P<' . $key . '>[0-9]+)', $value, $pattern);
}

Or am I not understanding the issue?
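If the variables are not always [0-9]+, a rough generalisation of the same idea is to let a regex find each named group and swap in the value. This is only a sketch: buildUrlFromPattern is a made-up name, and it assumes the named groups contain no nested parentheses.

```php
<?php
// Hypothetical helper (not from the thread): fills each named group in
// $pattern with the matching value from $vars.
// Assumption: the groups contain no nested parentheses.
function buildUrlFromPattern($pattern, array $vars)
{
    return preg_replace_callback(
        '#\(\?P<(\w+)>[^()]*\)#',
        function ($m) use ($vars) {
            // $m[1] is the group name; fail loudly if no value was supplied.
            if (!array_key_exists($m[1], $vars)) {
                throw new InvalidArgumentException("Missing value for '{$m[1]}'");
            }
            return (string) $vars[$m[1]];
        },
        $pattern
    );
}

$pattern = 'Admin/Author/(?P<author_id>[0-9]+)/Book/(?P<book_id>[0-9]+)/Edit';
echo buildUrlFromPattern($pattern, array('author_id' => 132, 'book_id' => 5)), "\n";
```

It does not validate the values against each group's regex; that would be an extra preg_match per variable inside the callback.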

You got it. But what I think I may do is something more like this, so that any regex block can be used for a given variable:


$routeCollection->Add(
    "Admin_Book_Edit", // route name
    new Route(
        "Admin/Author/{authorid}/Book/{bookid}/Edit", // the pattern with var stand-ins
        array( // the regex needed to validate the pattern as well as values incoming to the BuildUrl method
            'authorid' => '[0-9]+',
            'bookid' => '[0-9]+'
        ),
        "Book", // controller to use
        "Edit", // action to use
        "Admin" // area to use
    )
);

This decouples the validation from the actual variable in a way that would allow me to build a proper regex for url comparison and for url building.
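To make that concrete, here is one way the template plus the requirements array could be turned into a regex for url comparison. It is a sketch under my own assumptions: compileTemplate is a hypothetical name, variable names are limited to \w characters, and a variable with no requirement falls back to "anything but a slash".

```php
<?php
// Hypothetical helper: turns a template such as "Book/{bookid}/Edit" plus a
// map of per-variable regex fragments into one anchored regex.
function compileTemplate($template, array $requirements)
{
    // Split on {name} tokens, keeping the names (odd indexes) in the result.
    $parts = preg_split('#\{(\w+)\}#', $template, -1, PREG_SPLIT_DELIM_CAPTURE);
    $regex = '';
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            $regex .= preg_quote($part, '#'); // static text, escaped
        } else {
            // Assumed fallback when no requirement is given: anything but "/".
            $fragment = isset($requirements[$part]) ? $requirements[$part] : '[^/]+';
            $regex .= '(?P<' . $part . '>' . $fragment . ')';
        }
    }
    return '#^' . $regex . '$#';
}

$regex = compileTemplate(
    'Admin/Author/{authorid}/Book/{bookid}/Edit',
    array('authorid' => '[0-9]+', 'bookid' => '[0-9]+')
);
preg_match($regex, 'Admin/Author/132/Book/5/Edit', $m);
echo $m['authorid'], ' ', $m['bookid'], "\n";
```

A requirement fragment that itself contains a # would break the delimiter, so a real version would need to guard against that.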

Thanks.

I have not tested this… I don’t have PHP installed at the moment.
Made use of preg_quote to protect against rogue special characters.


<?php

class Route
{
  protected
    $sourcePattern,
    $parsedPattern,
    $defaults,
    $options;

  public function __construct ( $pattern, array $defaults = [], array $requirements = [], array $options = [] )
  {
    // Process $pattern
    $this->sourcePattern = $pattern;
    $pattern = preg_replace_callback( '~\\\\\\{([-\\w\\d_]+)\\\\\\}~',
      function ( $match ) use ( $requirements )
        {
          if ( isset( $requirements[ $match[1] ] ) )
            return sprintf( '(?P<%s>%s)', $match[1], $requirements[ $match[1] ] );

          return sprintf( '(?P<%s>[^/\\\\\\\\]+)', $match[1] );
        }, preg_quote( $pattern, '~' ) );
    $this->parsedPattern = "~$pattern~";

    $this->defaults = $defaults;
    $this->options = $options;
  }

  public function buildURI ( array $params )
  {
    $output = preg_replace_callback( '~\\{([-\\w\\d_]+)\\}~',
      function ( $match ) use ( $params )
        {
          if ( isset( $params[ $match[1] ] ) )
            return $params[ $match[1] ];

          return $match[0]; // No match
        },  $this->sourcePattern );

    return $output;
  }
}

# PHP 5.4 array syntax, swap for array() on versions earlier than 5.4

$r = new Route(
  'Admin/Author/{authorid}/Book/{bookid}/Edit',
  [ 'controller' => 'Book', 'action' => 'Edit', 'area' => 'Admin' ],
  [ 'authorid' => '\\d+', 'bookid' => '\\d+' ],
  []
);

$u = $r->buildURI( [ 'authorid' => 20, 'bookid' => 100 ] );

Thanks. I’ll take a look at that tomorrow, but just so you know, the controller and action values will never be variables in my system. I prefer a much more granularly defined set of routes, leaving only true variables. These variables will usually (but not always) be ids of some sort, for which a reasonable default value cannot be set, so I don’t entertain the concept of default values either.

Well sure…that is handled by the pattern you use. As long as there is no {controller} within the pattern, it’s static and does exactly what you said you want. I just didn’t want to add dedicated method parameters for those, so I used an array instead. It does the same thing you want, but is also overloadable.

Could also add a second form of syntax to your patterns, one that does not define variables by static input “controller”, “action”, “area”. Then you won’t have to repeat those in a method call, already defined in the pattern. DRY etc.

Hey logic_earth, thanks for actually discussing this and not just tossing out code. It’s been a while since I’ve actually had a proper discussion. If you will, allow me to explain my stance on this.

Take the following urls:

Author/Retire/42
Book/Publish/42

While I could use a generic pattern like “{controller}/{action}/{id}” there are problems with this approach. For starters, the pattern will resolve as a match for Book/Retire/42, which doesn’t make sense. While true that post resolution error checking will catch this, I find it a rather unacceptable scenario.

Consider how these would be registered:


$routes->Add(
	'route_name',
	'{controller}/{action}/{id}',
	null, // no reasonable defaults
	array(
		'controller' => 'author|book',
		'action' => 'list|add|edit|remove|publish|retire|retitle|rename'
	)
);

As you can see, there is no consideration for which action values are paired with which controller. The only solution is to make two routes:


$routes->Add(
	'author',
	'Author/{action}/{id}',
	null, // no reasonable defaults
	array(
		'controller' => 'author',
		'action' => 'list|add|edit|remove|retire|rename'
	)
);

$routes->Add(
	'book',
	'Book/{action}/{id}',
	null, // no reasonable defaults
	array(
		'controller' => 'book',
		'action' => 'list|add|edit|remove|publish|retitle'
	)
);

Now, things make a bit more sense. But as you can see, the idea of a more granular style of routing is beginning to creep in. The next issue is that not all of the given actions will require an id. While most commercial systems account for this, in coding my own, I found it difficult to make considerations for the trailing ‘/’ separator. Assuming I can’t justify the extra time and coding required, we need more routes:


$routes->Add(
	'author_with_id',
	'Author/{action}/{id}',
	null, // no reasonable defaults
	array(
		'controller' => 'author',
		'action' => 'edit|remove|retire|rename'
	)
);

$routes->Add(
	'author_without_id',
	'Author/{action}',
	null, // no reasonable defaults
	array(
		'controller' => 'author',
		'action' => 'list|add'
	)
);

$routes->Add(
	'book_with_id',
	'Book/{action}/{id}',
	null, // no reasonable defaults
	array(
		'controller' => 'book',
		'action' => 'edit|remove|publish|retitle'
	)
);

$routes->Add(
	'book_without_id',
	'Book/{action}',
	null, // no reasonable defaults
	array(
		'controller' => 'book',
		'action' => 'list|add'
	)
);
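For comparison, the optional trailing id mentioned above can be written as a single regex by putting the whole "/id" segment inside an optional non-capturing group. This is only a sketch with a shortened action list, and it assumes numeric ids. It does not check whether a given action actually needs an id, so it is no substitute for the with_id/without_id split.

```php
<?php
// Assumption: ids are numeric. The whole "/id" segment sits in an optional
// non-capturing group, so no dangling "/" is accepted when the id is absent.
$pattern = '#^Author/(?P<action>list|add|edit|remove)(?:/(?P<id>\d+))?$#';

foreach (array('Author/list', 'Author/edit/42', 'Author/edit/') as $url) {
    if (preg_match($pattern, $url, $m)) {
        // The 'id' key is missing from $m when the optional group did not match.
        $id = isset($m['id']) ? $m['id'] : '(none)';
        echo $url, ' => action=', $m['action'], ', id=', $id, "\n";
    } else {
        echo $url, " => no match\n";
    }
}
```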

We’ve now become much more granular than when we started out. But there is one last issue that really stands out. When integrating legacy code, we often come across controller and action names that do not follow proper conventions. We could spend some time refactoring this code, but depending on how large the code base is, it would probably be a lot easier to simply remap things so that our urls can be used without changing anything except the mapping code.

In the above examples, the controller builder would look for AuthorController and BookController respectively. In order to decouple a given url from the intended controller and action, we must also remove these dependencies from the pattern. Doing so forces us to create a route for each controller/action combination we have in the system. This is not entirely unreasonable. If you think about how we handle dependency injection, mapping a particular interface to a particular class, then it makes sense to map a particular url to a particular controller and action. While this increases the number of routes we must define, it greatly reduces the complexity of the code internal to the router itself.


// simple mapping with one variable
$routes->Add(
	'book_publish',
	'Book/{bookId}/Publish',
	array('bookId' => '\\d+'),
	'Book',
	'Publish'
);

// remapping a controller and action
$routes->Add(
	'customer_remove',
	'Customer/{customerId}/Remove',
	array('customerId' => '\\d+'),
	'Cust', // using legacy controller name
	'Delete' // using legacy action name
);
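A stripped-down sketch of what such a one-route-per-mapping lookup might look like is below. The RouteCollection here is hypothetical and simplified: its Add takes a ready-made regex instead of a template plus requirements, and Resolve is a name I made up.

```php
<?php
// Hypothetical, stripped-down collection: each route name maps one anchored
// regex to a fixed controller/action pair, independent of the URL text.
class RouteCollection
{
    private $routes = array();

    public function Add($name, $regex, $controller, $action)
    {
        $this->routes[$name] = array($regex, $controller, $action);
    }

    // Returns array(controller, action, vars) for the first match, else null.
    public function Resolve($url)
    {
        foreach ($this->routes as $route) {
            list($regex, $controller, $action) = $route;
            if (preg_match($regex, $url, $m)) {
                // Keep only the named captures.
                $vars = array();
                foreach ($m as $key => $value) {
                    if (is_string($key)) {
                        $vars[$key] = $value;
                    }
                }
                return array($controller, $action, $vars);
            }
        }
        return null;
    }
}

$routes = new RouteCollection();
$routes->Add('customer_remove', '#^Customer/(?P<customerId>\d+)/Remove$#', 'Cust', 'Delete');
print_r($routes->Resolve('Customer/7/Remove'));
```

Because the controller and action are plain strings attached to the route, the legacy names never have to appear in the url.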

One final note is that most commonly available systems are written with general development in mind. They simply cannot afford to make any assumptions about coding standards, methodologies, and conventions. However, these systems are laden with code and conventions that may never be used in a professional environment, as most companies will define their own standards and code around them.

Hopefully, with all this in mind, you can now understand why I see no reason to entertain the notion of default values for pattern variables, or including controller and action as variables in a pattern.

Sorry to bother you guys again, but I need more help. Yeah, I know, I’m hopeless at this…

Given the pattern: foo/{fid}/bar/{bid}/woot
I want to get an array that merely includes the text in the {} in a non-assoc array.

preg_match_all(‘#{([a-z][A-Z])+}#’, $pattern, $matches) just doesn’t work. =/

I have NO CLUE what I’m doing here…

The purpose is to build a list of expected substitutions.

$matches should contain ‘fid’ and ‘bid’, without the {}'s.
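A possible fix, as a sketch: the group ([a-z][A-Z])+ asks for one lowercase letter followed by one uppercase letter, repeated, so it never matches "fid". A single character class with the + inside the capture group is probably what was intended. I'm assuming here that variable names are letters only; $names is just an illustrative variable.

```php
<?php
$pattern = 'foo/{fid}/bar/{bid}/woot';

// \{ and \} match literal braces; the group captures one or more letters.
preg_match_all('#\{([A-Za-z]+)\}#', $pattern, $matches);

// $matches[0] holds the full "{name}" tokens, $matches[1] the names alone.
$names = $matches[1];
var_dump($names);
```

If names may also contain digits or underscores, widening the class to [A-Za-z0-9_] (or \w) should cover that.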