Extracting Text with Regex

I am trying to extract the ASIN from an Amazon URL. I spent all day writing four functions with regular expressions (the others are for other extractions) and was feeling pretty confident until I loaded WAMP this morning and ran them. Everything seems to work OK, except that the output prints “Array” for each record instead of the text it should be outputting. This happens in all four functions.

I could have sworn I used a similar function for this before. I am sure it’s something silly I missed. I am still a nub, but here is my code. Can anyone see what I am doing wrong?


<?php

function get_asin($url)
{
  preg_match('/\\/dp\\/(.*)\\/ref=tag/', $url, $matches[1]);
  return $matches;
}
?>

<?php

$url = "http://www.amazon.com/Vulli-Sophie-the-Giraffe-Teether/dp/B000IDSLOG/ref=tag_rso_rs_edpp_url";
echo get_asin($url);
?>


Am I even using the right function? Regex isn’t one of my strong points.

(.*) is only to be used as a very last resort, when you don’t know exactly what you’ll need to match. In your case it seems you do: letters and digits.
So instead of (.*) I’d go for ([a-zA-Z0-9]+). The plus also ensures that there needs to be at least one character to match. If you know the number of characters that need to be matched, you can replace the + with {n}, where n is that number.
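To illustrate the {n} quantifier, here is a minimal sketch. It assumes the ID is always exactly 10 uppercase letters or digits, which is what the sample URL shows (B000IDSLOG is 10 characters long); the helper name get_fixed_length_id is made up for this example.

```php
<?php
// Hypothetical helper: assumes the ID is exactly 10 uppercase letters or digits,
// as in the sample URL. Adjust the class and the count if your data differs.
function get_fixed_length_id($url)
{
    // {10} requires exactly ten characters from the class, followed by a slash.
    if (preg_match('/\/dp\/([A-Z0-9]{10})\//', $url, $matches)) {
        return $matches[1];
    }
    return null; // nothing matched
}

$url = "http://www.amazon.com/Vulli-Sophie-the-Giraffe-Teether/dp/B000IDSLOG/ref=tag_rso_rs_edpp_url";
echo get_fixed_length_id($url), "\n"; // B000IDSLOG
```

A segment that is too short, such as /dp/B000/, would not match at all and the helper would return null.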

As for the return value, you need to pass $matches to preg_match and then return $matches[1], not the other way around :wink: Your version returns the whole array, and echo prints an array as “Array”.

BTW, if you use ~ as the delimiter instead of /, you don’t have to escape slashes in the regex, which makes it much more readable IMHO.

So:


function get_asin($url)
{
  preg_match('~/dp/([a-zA-Z0-9]+)/ref=tag~', $url, $matches);
  return $matches[1];
}

:slight_smile:
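One thing to watch with the function above: if the URL doesn’t match, $matches[1] is never set, so returning it raises a notice or warning (depending on the PHP version). A minimal sketch of a guarded variant follows; the name get_asin_or_null and the choice to return null are just assumptions for illustration.

```php
<?php
// Hypothetical variant of get_asin() that checks whether the pattern matched.
function get_asin_or_null($url)
{
    // preg_match() returns 1 on a match, 0 on no match, and false on error.
    if (preg_match('~/dp/([a-zA-Z0-9]+)/ref=tag~', $url, $matches) === 1) {
        return $matches[1];
    }
    return null; // no ASIN found in this URL
}

var_dump(get_asin_or_null("http://www.amazon.com/Vulli-Sophie-the-Giraffe-Teether/dp/B000IDSLOG/ref=tag_rso_rs_edpp_url")); // string(10) "B000IDSLOG"
var_dump(get_asin_or_null("http://www.amazon.com/")); // NULL
```

The caller can then test for null before printing, instead of echoing an empty value.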

About the array:
$matches[0] will return /dp/B000IDSLOG/ref=tag, which is the text matched by the full pattern,
and $matches[1] will return B000IDSLOG, which is only the text captured by the parentheses.
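A quick way to see this for yourself is to dump the array. This is only a small sketch that uses the sample URL and the pattern from this thread:

```php
<?php
$url = "http://www.amazon.com/Vulli-Sophie-the-Giraffe-Teether/dp/B000IDSLOG/ref=tag_rso_rs_edpp_url";
preg_match('~/dp/([a-zA-Z0-9]+)/ref=tag~', $url, $matches);

print_r($matches);      // shows both elements of the array
echo $matches[0], "\n"; // text matched by the whole pattern: /dp/B000IDSLOG/ref=tag
echo $matches[1], "\n"; // text captured by the first (...) group: B000IDSLOG
```

print_r (or var_dump) is handy whenever echo gives you “Array”, because it shows what is actually inside.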

Thanks guys, I did log in yesterday morning to read what you said and then played with it for a bit before work. Didn’t get to the PC last night, but I did manage to get everything going and working properly.

I think the biggest problem is that I wasn’t being strict enough with my search parameters. I thought if I typed in / it would match the first one, not the last one. That being said,

ScallioXTX

I actually had my original expression (IDK what it’s actually called) search for only the 10-character string of uppercase letters and numbers, because Amazon actually has different /dp/ variables for different countries, but the way you have shown would probably be stricter and safer?

And the ~ does make it easier to read, thanks bro.

Yes, because it will only match if the characters that are there actually adhere to the pattern you’re looking for, whereas (.*) will match all kinds of garbage, giving the impression you found something when you didn’t; you found a string of garbage where a string of 10 letters and digits should have been :slight_smile:
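Here is a small sketch of that difference. The URL is made up for the demonstration (it has extra path segments between the ID and ref=tag), so treat it as an assumption and not as a real Amazon URL format.

```php
<?php
// Made-up URL with extra path segments between the ID and "ref=tag".
$url = "http://www.example.com/dp/B000IDSLOG/extra/stuff/ref=tag_abc";

// Greedy (.*) swallows everything up to the last "/ref=tag".
preg_match('~/dp/(.*)/ref=tag~', $url, $loose);
echo $loose[1], "\n"; // B000IDSLOG/extra/stuff

// The strict class cannot cross a "/", so this pattern does not match at all.
$found = preg_match('~/dp/([a-zA-Z0-9]+)/ref=tag~', $url, $strict);
var_dump($found); // int(0)
```

With the loose pattern you get a value that looks like a result but isn’t an ID; with the strict one you get a clear “no match” that you can handle.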