PHP parse street address with abbreviations from text input for search

I need to parse a street address in PHP from a string that might contain abbreviations.
This string comes from a text input.
The fields I need to search are:

  • street (alphanumeric - might have
  • building (alphanumeric - might have
  • number (alphanumeric - might have
  • area (numeric from 1 to 5)
  • other (unknown field & used to search in all the above fields in the database)

For example, a user submits one of these texts:

  1. street Main Road Building H7 Number 5 Area 1
  2. st Main Road bldg H7 Nr 5 Ar 5
  3. stMain bldgh7
  4. ar5 unknown other search parameter
  5. street Main Road h7 2b
  6. street main street str main road

The outcome I would like to see, as an array:

  1. [street]=>Main Road [building]=>h7 [number]=>5 [area]=>1
  2. [street]=>Main Road [building]=>h7 [number]=>5 [area]=>5
  3. [street]=>Main [building]=>h7
  4. [area]=>5 [other]=>unknown other search parameter
  5. [street]=>Main Road [other]=>h7 2b
  6. [street]=>Main Street&&Main Road

My code so far, which doesn’t work with all the examples:

<?php
//posted address
$address = "str main one bldg 5b other param area 1";
//to replace
$replace = ['street'=>['st','str'],
            'building'=>['bldg','bld'],
            'number'=>['nr','numb','nmbr']];
//replace
foreach($replace as $field=>$abbrs)
    foreach($abbrs as $abbr)
        $address = str_replace($abbr.' ',$field.' ',$address);
//fields
$fields = array_keys($replace);
//match
if(preg_match_all('/('.implode('|',$fields).')\s+([^\s]+)/si', $address, $matches)) {
    //matches
    $search = array_combine($matches[1], $matches[2]);
    //other
    $search['other'] = str_replace($matches[0],"",$address);
}else{
    //search in all the fields
    $search['other'] = $address;    
}
//search
print_r($search);

Code tester: http://ideone.com/j3q4YI

I think one of the issues with using str_replace() and your list of abbreviations is that it will replace all instances wherever they appear in the string. So for example if your street is called “Main Street”, isn’t that going to end up being called “Main Streeteetreet”?

start - Main Street
check for ‘st’, replace with ‘street’ - Main streetreet
check for ‘str’, replace with ‘street’ - Main streeteetreet
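The trace above can be reproduced directly. The second half of this sketch shows one possible way around it, replacing an abbreviation only when it stands as a whole word. The helper name expandAbbreviations() is made up for illustration, and this does not help with input such as "stMain", where the abbreviation is glued to the next word.

```php
<?php
// Reproduce the problem: str_replace() also rewrites "st" and "str" inside longer words.
$naive = 'main street';
foreach (['st', 'str'] as $abbr) {
    $naive = str_replace($abbr, 'street', $naive);
}
echo $naive, "\n"; // main streeteetreet

// Hypothetical helper: expand an abbreviation only when it is a whole word.
function expandAbbreviations(string $address, array $map): string
{
    foreach ($map as $field => $abbrs) {
        $quoted = array_map(function ($abbr) {
            return preg_quote($abbr, '/');
        }, $abbrs);
        // \b marks a word boundary; the i flag ignores case.
        $address = preg_replace('/\b(?:' . implode('|', $quoted) . ')\b/i', $field, $address);
    }
    return $address;
}

$map = [
    'street'   => ['st', 'str'],
    'building' => ['bldg', 'bld'],
    'number'   => ['nr', 'numb', 'nmbr'],
];
echo expandAbbreviations('st Main Street bldg H7 nr 5', $map), "\n";
// street Main Street building H7 number 5
```

Note that str_replace() is case-sensitive, so the runaway replacement only shows up when the input is lower case; the regex version ignores case but still leaves "Street" alone because "St" is not a whole word there.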

  • new after your reply -

It’s a pain having to deal with abbreviations, or more to the point, dealing with the string not being split into words. Being able to explode the string by spaces and pick off one at a time would make it much easier to deal with. Perhaps you could step through a character at a time until you get a meaningful keyword:

word = ""
count = 0
while (count < length of address) {
  word = word + character of address at position count
  if (word is a keyword or abbreviation) {
    // deal with keyword, then start a new word
  }
  count = count + 1
}

Once you’ve got the minimum possible abbreviation for a keyword, you can then look at the next few characters to see how far the keyword is abbreviated, then split that out of the address string. Continue looking through the address string until you see another keyword, and the bit between will be the data for the prior keyword.
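As a rough sketch of that idea in PHP, the scan for keywords can be delegated to preg_split(): split the address at every keyword that starts a word, and treat the text up to the next keyword as that keyword's value. Everything here is an assumption for illustration: the KEYWORDS table, the parseAddress() name, the "&&" join for repeated fields, and the rule that an area is a single digit from 1 to 5 with anything after it going to "other".

```php
<?php
// Assumed keyword table (canonical field => accepted spellings); extend as needed.
const KEYWORDS = [
    'street'   => ['street', 'str', 'st'],
    'building' => ['building', 'bldg', 'bld'],
    'number'   => ['number', 'numb', 'nmbr', 'nr'],
    'area'     => ['area', 'ar'],
];

function parseAddress(string $address): array
{
    // Map each spelling to its field.
    $lookup = [];
    foreach (KEYWORDS as $field => $spellings) {
        foreach ($spellings as $spelling) {
            $lookup[$spelling] = $field;
        }
    }
    // Try longer spellings first so "street" is preferred over "str" and "st".
    $spellings = array_keys($lookup);
    usort($spellings, function ($a, $b) {
        return strlen($b) - strlen($a);
    });

    // A keyword must start a word but need not be followed by a space ("stMain").
    $pattern = '/(?<!\S)(' . implode('|', $spellings) . ')/i';
    $parts = preg_split($pattern, $address, -1, PREG_SPLIT_DELIM_CAPTURE);

    // $parts[0] is the text before the first keyword; after that the entries
    // alternate between a keyword and the text that follows it.
    $result = [];
    $other = [trim($parts[0])];
    for ($i = 1; $i < count($parts); $i += 2) {
        $field = $lookup[strtolower($parts[$i])];
        $value = trim($parts[$i + 1]);
        // Area is a single digit from 1 to 5; anything after it is free text.
        if ($field === 'area' && preg_match('/^([1-5])(?!\S)\s*(.*)$/s', $value, $m)) {
            $value = $m[1];
            $other[] = $m[2];
        }
        if ($value !== '') {
            $result[$field] = isset($result[$field]) ? $result[$field] . '&&' . $value : $value;
        }
    }
    $other = trim(implode(' ', array_filter($other, 'strlen')));
    if ($other !== '') {
        $result['other'] = $other;
    }
    return $result;
}

print_r(parseAddress('stMain bldgh7'));
```

This copes with samples 1 to 4 (values keep the case the user typed), but not 5 and 6: nothing in "street Main Road h7 2b" says where the street name ends, and in sample 6 the second "street" is read as a keyword rather than as part of the name. A street beginning with a keyword, such as "Station Road", would also be split after "St", which is the kind of input a user will find quickly.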

It would be easier to have some rules, and I suspect this might be a bit of code that you’ll get working for all the samples that you can think of, and a user will break in about ten seconds.

Oh, cross-edit, added loads after your post. Adding spaces makes it better, but your sample strings don’t seem to need spaces after keywords so it won’t always work. Then again, I can’t think of a way that you can code for every eventuality.
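To make that concrete, here is a small check of the trailing-space version using the same abbreviation list as the posted code. The function name expandWithTrailingSpace() is invented for this sketch.

```php
<?php
// The updated approach: replace each abbreviation only when a space follows it.
function expandWithTrailingSpace(string $address): string
{
    $replace = [
        'street'   => ['st', 'str'],
        'building' => ['bldg', 'bld'],
        'number'   => ['nr', 'numb', 'nmbr'],
    ];
    foreach ($replace as $field => $abbrs) {
        foreach ($abbrs as $abbr) {
            $address = str_replace($abbr . ' ', $field . ' ', $address);
        }
    }
    return $address;
}

// Unchanged: there is no space after "st" or "bldg".
echo expandWithTrailingSpace('stMain bldgh7'), "\n";  // stMain bldgh7

// The "st " at the end of "west" is rewritten as well.
echo expandWithTrailingSpace('st west road'), "\n";   // street westreet road
```

So the trailing space stops replacements at the start of a longer word, but not at the end of one, and it does nothing for keywords that run straight into their value.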

Thank you! I just updated the code and added a white space after the replaced word. It is still not working for cases 3 to 6.
