How to filter data?

Hello,

I have some data like the following:

Name: Some Name
Address: Line 1
Line 2
Line 3
City: Some City
State: Some State
Post Code: 12345
Phone: 123457980
Mobile: 123457980
E-Mail: some@someone.com
Comments: Some comments line 1
Some comments line 2
Some comments line 3

The above is sample test data and it is contained within a lot of other data which I don’t need. So I have managed to separate this data from the other data.

But the problem I am facing is how to parse the multiline values. I was earlier using the explode function with chr(13) to separate the lines, and then explode again with ": " to separate names and values. That worked fine for single-line fields, but the multiline values are causing issues.

How do I resolve this? Please help.

Thanks.

Hmm…anyone ???

So I have managed to separate this data from the other data.

Maybe you could explain how you did that. Did it involve a regex?

If you have managed to do this, and no other elements are involved, would the structure always be the same?

i.e. Wouldn’t the comments always begin at index (10) and end at the maximum index?


<?php
$segment = array(
  0   => 'Name: Some Name',
  1   => 'Address: Line 1',
  2   => 'Line 2',
  3   => 'Line 3',
  4   => 'City: Some City',
  5   => 'State: Some State',
  6   => 'Post Code: 12345',
  7   => 'Phone: 123457980',
  8   => 'Mobile: 123457980',
  9   => 'E-Mail: some@someone.com',
  10  => 'Comments: Some comments line 1',
  11  => 'Some comments line 2',
  12  => 'Some comments line 3'
);

Hi,

Earlier, as each field was a single line, I was able to separate the record by looking for "Name: " as the start and "Comments: " as the end. Then, using the substr function, I would retrieve the data. But now that the comments are multiline, I can no longer detect the ending.

And yes, the data may not stay the same: because the address field is multiline, it can sometimes be 1, 2 or 3 lines, which would change the index of the comments and the other fields.

Thanks

I was wondering about how you went about splitting this text from the original as you seem to intimate - which I presume, perhaps wrongly, contains multiple such text records.

I just want to avoid you doing one kind of split, then having to do another just to get rid of the offending line ends.

If your original split used a regex, then perhaps that can be modified rather than creating a second or third one - or, maybe we can treat the whole thing as lines to array elements and work through them.

How often does this have to be done? eg once a day or once a second?

Actually it’s an email that comes to me in different formats. I have piped that email to my script, which filters out the required data and adds it to the database. Several people are sending me emails in several different formats, hence I am creating different filters for different formats. I will post my code:

<?php
if (isset($_POST["btnTest"]))
{
	//$fields = array ("Your Name:", "Your Address:", "Telephone:", "Telephone:", "Email:", "Friend's Name:", "Friend's Address:", "Post Code:", chr(13), chr(13));
	$fields = array("TITLE:", "NAME:", "EMAIL:", "DESIRED_LOCATION:", "CAPITAL_TO_INVEST:", "TELEPHONE_NUMBER:", "ALT_TELEPHONE_NUMBER:", "BEST_TIME_TO_CALL:", "PREFERRED_METHOD_OF_CONTACT:", "ADDRESS_LINE_1:", "ADDRESS_LINE_2:", "ADDRESS_CITY:", "ADDRESS_REGION:", "ADDRESS_POSTCODE:", "ADDRESS_COUNTRY:", "WHEN_WANTS_TO_START_NEW_BUSINESS:", "COMMENTS:", "SOURCE:", "DATE:", Chr(13), Chr(13));

	// Array keys must be quoted strings; bare name / tmp_name are treated as undefined constants
	if (!empty($_FILES["filemail"]["name"]))
	{
		$email = file_get_contents($_FILES["filemail"]["tmp_name"]);
	}
	else { $error[] = "Invalid E-Mail Message File"; }

	if (empty($error))
	{
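		$find_pos = 0; // start searching from the beginning of the message
		for ($i = 0; $i <= count($fields) - 1; $i++)
		{
			// strpos() returns false when the label is missing, so compare strictly
			$found = strpos($email, $fields[$i], $find_pos);
			if ($found === false) { $pos[] = 0; } else { $pos[] = $found; $find_pos = $found + strlen($fields[$i]); }
		}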
		
		$find = array (chr(13), "<br />", "<br>", "<br/>");
		$rplc = array ("", "", "", "");
		
		// Stop one short of the end: each slice runs from one position to the next
		for ($i = 0; $i < count($pos) - 1; $i++)
		{
			$data[] = str_replace ($find, $rplc, substr ($email, $pos[$i], $pos[$i + 1] - $pos[$i]));
		}

		for ($i = 0; $i <= count($data) - 1; $i++)
		{
			$exp2 = explode (": ", $data[$i]);
			if (trim($exp2[0]) <> "") { echo "Field: " . str_replace ($find, $rplc, strtolower($exp2[0])) . " - " . "Value: " . $exp2[1] . "<br>\n"; }
		}
	}
}
?>
<h3 align="center">E-Mail Filter System</h3>
<?php if (!empty($error)) { require_once ("error.php"); } ?>
<form method="post" enctype="multipart/form-data">
<table border="0" cellpadding="5" cellspacing="0" align="center">
<tr>
	<td align="right">E-Mail:</td>
	<td><input type="file" name="filemail" /></td>
</tr>
<tr>
	<td align="right">Debug Mode:</td>
	<td>
		<input type="radio" value="0" checked name="debug" /> No
		<input type="radio" value="1" name="debug" /> Yes
	</td>
</tr>
<tr>
	<td colspan="2">&nbsp;</td>
</tr>
<tr>
	<td colspan="2" align="center"><input type="submit" name="btnTest" value="Test" /></td>
</tr>
</table>
</form>

UPDATE: I have modified the code a bit, it now gets the address fine, but the last comments are still an issue…

What I have done now is specify the list of fields that I need. The script finds the position of each one, then uses those positions to extract and process the data. It seems to work, but I still think it’s not achieving the goal.

Please check.

Thanks.
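
For reference, the position-based idea described above can be sketched roughly as follows. This is only an illustration under assumptions: `extract_by_labels()` is a hypothetical helper, the label list and sample text are made up, and each label is assumed to occur once. Note that the last field simply runs to the end of the text, which is exactly the limitation being discussed.

```php
<?php
// Hypothetical helper: slice a message into fields using the positions of known labels.
function extract_by_labels($text, array $labels)
{
    $found = array();
    foreach ($labels as $label) {
        $pos = strpos($text, $label);
        if ($pos !== false) {          // 0 is a valid position, so compare strictly
            $found[$pos] = $label;
        }
    }
    ksort($found);                     // order the labels by where they appear

    $positions = array_keys($found);
    $result = array();
    foreach ($positions as $i => $start) {
        $label = $found[$start];
        $valueStart = $start + strlen($label);
        // Each value ends where the next label begins; the last one runs to the end of the text.
        $end = isset($positions[$i + 1]) ? $positions[$i + 1] : strlen($text);
        $value = substr($text, $valueStart, $end - $valueStart);
        // Collapse the lines of a multi-line value into one comma-separated string
        $lines = array_filter(array_map('trim', preg_split('/\R/', $value)), 'strlen');
        $result[rtrim($label, ':')] = implode(', ', $lines);
    }
    return $result;
}

// Made-up sample in the style of the $fields list
$text = "NAME: Joe Bloggs\r\nADDRESS_LINE_1: 1 Big Street\r\nMyhamlet\r\nCOMMENTS: Line 1\r\nLine 2";
print_r(extract_by_labels($text, array('NAME:', 'ADDRESS_LINE_1:', 'COMMENTS:')));
```

Sorting by position means the order of the label list no longer matters, and a missing label is skipped instead of being recorded as position 0.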

So essentially, the challenge is to take 3 example strings such as:


$email[0]  = "Your Name: Joe Bloggs
Your Address: 1 Big Street
Myhamlet
Mytown";

$email[1]  = "Your Name: Joe Bloggs<br>
Your Address: 1 Big Street<br>
Myhamlet<br>
Mytown<br>";

$email[2]  = "Your Name: Joe Bloggs<br />
Your Address: 1 Big Street<br />
Myhamlet<br />
Mytown<br />";

And output that as 3 array elements exactly the same as this:


$insert[0]  = array(
"Name" => "Joe Bloggs"  ,
"Address" => "1 Big Street, Myhamlet, Mytown" ,
);
$insert[1]  = array(
"Name" => "Joe Bloggs"  ,
"Address" => "1 Big Street, Myhamlet, Mytown" ,
);
$insert[2]  = array(
"Name" => "Joe Bloggs"  ,
"Address" => "1 Big Street, Myhamlet, Mytown" ,
);

Have I grasped this? Otherwise edit it so that it is more lifelike.
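
One way to make the three variants look identical before any parsing is to normalise the line breaks first. The sketch below is an assumption about how that could be done, not something from the original script; `normalise_lines()` is a hypothetical name.

```php
<?php
// Hypothetical helper: turn "<br>", "<br/>", "<br />" and any newline style into a clean list of lines.
function normalise_lines($raw)
{
    // Replace every <br> variant (any case, optional slash) with a newline
    $text = preg_replace('~<br\s*/?>~i', "\n", $raw);
    // \R matches \r\n, \n and \r
    $lines = array_map('trim', preg_split('/\R/', $text));
    // Drop the empty lines left behind by "<br>" followed by a real newline
    return array_values(array_filter($lines, 'strlen'));
}

$plain = "Your Name: Joe Bloggs\nYour Address: 1 Big Street\nMyhamlet\nMytown";
$html  = "Your Name: Joe Bloggs<br />\nYour Address: 1 Big Street<br />\nMyhamlet<br />\nMytown<br />";

var_dump(normalise_lines($plain) === normalise_lines($html)); // bool(true)
```

Once every format yields the same list of lines, a single line-based parser can handle all of them. The trade-off is that blank lines are discarded, so this would not suit a format where a blank line carries meaning.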

Hi,

Yes you got it.

Thanks.

Given this is incoming:


$email[0]  = "Your Name: Joe Bloggs
Your Address: 1 Big Street
Myhamlet
Mytown";

$email[1]  = "Your Name: Jean Bloggs<br>
Your Address: 2 Big Street<br>
Myhamlet<br>
Mytown<br>";

$email[2]  = "Your Name: Jim Bloggs<br />
Your Address: 3 Big Street<br />
Myhamlet<br />
Mytown<br />"; 

Here’s a quick and dirty way to get the records without line ends.


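$elements = array();

foreach ($email as $record => $e) {
    // strip_tags() removes the <br> variants; \R splits on any line-ending style
    $lines = preg_split('/\R/', strip_tags($e));
    $elements[$record] = array();

    foreach ($lines as $val) {
        $last = count($elements[$record]) - 1;

        if (strpos($val, ':') === false && $last >= 0) {
            // No colon: treat the line as a continuation of the previous field
            $elements[$record][$last] .= ', ' . $val;
        } else {
            // A colon marks the start of a new "Label: value" field
            $elements[$record][] = $val;
        }
    }
}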


Which then leaves the task of replacing the numerical key 0 with the string key “Name”, there must be several ways of doing that…


var_dump( $elements); // gives
array
  0 => 
    array
      0 => string 'Your Name: Joe Bloggs' (length=21)
      1 => string 'Your Address: 1 Big Street, Myhamlet, Mytown' (length=44)
  1 => 
    array
      0 => string 'Your Name: Jean Bloggs' (length=22)
      1 => string 'Your Address: 2 Big Street, Myhamlet, Mytown' (length=44)
  2 => 
    array
      0 => string 'Your Name: Jim Bloggs' (length=21)
      1 => string 'Your Address: 3 Big Street, Myhamlet, Mytown' (length=44)

Maybe there is an alternative idea there?
Perhaps as you iterate through the multi-array to insert them into the db you can detect “Your Name:” - remove it and insert it into the field name?

Hi,

Is there a limit on the size of string that strpos can find? I mean, can it find long strings like “Preferred Method of receiving information:”?

Thanks.
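
A quick check suggests the length of the search string is not the problem; as far as I know strpos has no practical limit on the needle. When a long label is "not found", the usual suspects are a difference in case or spacing, or confusing position 0 with false. The sample text below is made up for illustration.

```php
<?php
$haystack = "Your Name: Joe Bloggs\nPreferred Method of receiving information: Carrier Pidgeon\n";
$needle   = "Preferred Method of receiving information:";

// A long needle is found like any other string
var_dump(strpos($haystack, $needle));               // int(22)

// A match at the very start returns 0, which is "falsy", so always compare with ===
$atStart = strpos($haystack, "Your Name:");
var_dump($atStart === false);                       // bool(false): found, at position 0
var_dump($atStart == false);                        // bool(true): the loose comparison wrongly reports "not found"

// strpos() is case-sensitive; stripos() is the case-insensitive variant
var_dump(strpos($haystack, "preferred method"));    // bool(false)
var_dump(stripos($haystack, "preferred method"));   // int(22)
```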

Well, here is a single record. Taking it as a single array from $elements, as in the code I posted previously, you can then remap the keys to shorter names like this:


$email[0]  = "Your Name: Joe Bloggs
Your Address: 1 Big Street
Myhamlet
Mytown
Preferred Method of receiving information: Carrier Pidgeon
";


// a dictionary of keys, existing  to preferred shorter terms
$mykeys = array(
'Your Address' => "Address",
"Your Name" => "Name",
"Preferred Method of receiving information" => "PrefMethod",
);

// This is what could be at the heart of a loop going through
// $elements from the previous code I posted

foreach( $elements[0] as  $el ){
// Limit the split to 2 parts so a colon inside the value is kept
list($key, $value) = explode(":", $el, 2);
$result[$mykeys[$key]] = $value;
}


To give you this result:


// var_dump( $result );
array
  'Name' => string ' Joe Bloggs' (length=11)
  'Address' => string ' 1 Big Street, Myhamlet, Mytown' (length=31)
  'PrefMethod' => string ' Carrier Pidgeon, ' (length=18)

I am not sure if this is any better than the method you already have in place, which you admit you are happy with, but somewhere in there is a means whereby you can get rid of the extra line ends that you don’t want.

You could leave your existing code and just loop again through your results, or try and do it in one hit - as you read the email upload.

Edit:

I cannot escape the suspicion that a really well crafted regex would eliminate some of this complexity, but it is just not my strong point - sorry - maybe someone will come to our rescue?

I think you can, yes. I was using strpos only to discover whether the line in question contained a potential key or not.

If someone enters a colon into Line 2 of their address, then this falls down badly.
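
One possible way around the stray-colon problem is to start a new field only when the text before the first colon is a label you already expect. This is a sketch under that assumption; `parse_with_known_labels()` and the label list are hypothetical and would need adapting per email format.

```php
<?php
// Hypothetical list of labels this particular email format uses
$knownLabels = array('Name', 'Address', 'City', 'Comments');

function parse_with_known_labels(array $lines, array $knownLabels)
{
    $result = array();
    $current = null;
    foreach ($lines as $line) {
        $parts = explode(':', $line, 2);   // split on the first colon only
        $label = trim($parts[0]);
        if (count($parts) === 2 && in_array($label, $knownLabels, true)) {
            // A recognised label starts a new field
            $current = $label;
            $result[$current] = trim($parts[1]);
        } elseif ($current !== null) {
            // Anything else, colon or not, continues the current field
            $result[$current] .= ', ' . trim($line);
        }
    }
    return $result;
}

$lines = array(
    'Name: Some Name',
    'Address: Line 1',
    'Flat: 2B',                    // contains a colon but is not a known label
    'City: Some City',
    'Comments: call after 5:30',   // colon inside the value
);
print_r(parse_with_known_labels($lines, $knownLabels));
```

Here "Flat: 2B" ends up appended to Address, and the time in Comments survives intact. It still breaks if a continuation line happens to begin with one of the known labels followed by a colon.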

Hi cups,

Thanks for your code. It works great, but the issue is that when I was dealing with single lines only, it was easy to find the end of the data. Now that multiline values have come in, I am not able to determine where the data ends.

Can you please tell me how to extract the desired data from all of the data?

eg:

lorem ipsum

Your Name: Joe Bloggs
Your Address: 1 Big Street
Myhamlet
Mytown
Preferred Method of receiving information: Carrier Pidgeon
Comments: Lines 1
Lines 2
Lines 3

lorem ipsum dolor sit amet

How would your code work for this kind of data?

Thanks.

I did ask earlier that you give me a more realistic data sample :expressionless:

Give it a try.

Hi,

I tried your code and it has the same issue as mine: it can’t find where the last field ends, so it includes the trailing line as well. Finding the ending is important.

Thanks.
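
If the record is always followed by a blank line, as in the sample a few posts up, one option is to cut the record out first and only then parse it. The sketch below relies on two assumptions that may not hold for every format: the record starts at a known first label, and neither the address nor the comments contain a blank line. `slice_record()` is a hypothetical name.

```php
<?php
// Hypothetical helper: return only the lines that belong to the record.
// Assumes the record starts at $firstLabel and ends at the first blank line after it.
function slice_record($text, $firstLabel)
{
    $record = array();
    $inside = false;
    foreach (preg_split('/\R/', $text) as $line) {
        if (!$inside) {
            // Skip everything until the first label shows up at the start of a line
            if (strpos($line, $firstLabel) === 0) {
                $inside = true;
                $record[] = $line;
            }
            continue;
        }
        if (trim($line) === '') {
            break;                 // blank line: treat it as the end of the record
        }
        $record[] = $line;
    }
    return $record;
}

$text = "lorem ipsum\n\nYour Name: Joe Bloggs\nYour Address: 1 Big Street\nMyhamlet\nMytown\n"
      . "Comments: Lines 1\nLines 2\nLines 3\n\nlorem ipsum dolor sit amet\n";
print_r(slice_record($text, 'Your Name:'));
```

The returned lines run from "Your Name: Joe Bloggs" through "Lines 3", with the lorem ipsum text on both sides left out. If a format has no blank line after the comments, some other end marker (a signature line, a fixed footer) would have to play the same role.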

You’re going to need to take into account comments which contain the colon character too.

This doesn’t address my previous comment, in fact, you could just use a carefully crafted RegEx to split these values up but…


<?php
error_reporting(-1);
ini_set('display_errors', true);

$segments = array( 
  0   => 'Name: Some Name', 
  1   => 'Address: Line 1', 
  2   => 'Line 2', 
  3   => 'Line 3', 
  4   => 'City: Some City', 
  5   => 'State: Some State', 
  6   => 'Post Code: 12345', 
  7   => 'Phone: 123457980', 
  8   => 'Mobile: 123457980', 
  9   => 'E-Mail: some@someone.com', 
  10  => 'Comments: Some comments line 1', 
  11  => 'Some comments line 2', 
  12  => 'Some comments line 3' 
);

$sorted = array();

foreach($segments as $segment){
  
  $isNewSegment = preg_match('~^([A-Z][^:]+):([^$]+)~', $segment, $matches);
  
  if($isNewSegment){
    
    list($key, $value) = array($matches[1], $matches[2]);
    
    $sorted[$key] = $value;

    continue;
  }
  
  if(isset($key)){
    $sorted[ $key ] .= sprintf(' %s', $value);
  }
  
}

print_r(
  $sorted
);

/*
  Array
  (
      [Name] =>  Some Name
      [Address] =>  Line 1  Line 1  Line 1
      [City] =>  Some City
      [State] =>  Some State
      [Post Code] =>  12345
      [Phone] =>  123457980
      [Mobile] =>  123457980
      [E-Mail] =>  some@someone.com
      [Comments] =>  Some comments line 1  Some comments line 1  Some comments line 1
  )
*/

[ot]Ha! Just spotted Cups OT.

I cannot escape the suspicion that a really well crafted regex would eliminate some of this complexity, but it is just not my strong point - sorry - maybe someone will come to our rescue?
[/ot]

Hi,

Your code has issues too: it shows Address and Comments wrongly. It should show: Line 1 Line 2 Line 3, but it shows Line 1 Line 1 Line 1.

I am just not able to find the length of the last comments line… the rest I am able to do.

Thanks.

Yeah, a small typo. :slight_smile:


<?php
error_reporting(-1);
ini_set('display_errors', true);

$segments = array( 
  0   => 'Name: Some Name', 
  1   => 'Address: Line 1', 
  2   => 'Line 2', 
  3   => 'Line 3', 
  4   => 'City: Some City', 
  5   => 'State: Some State', 
  6   => 'Post Code: 12345', 
  7   => 'Phone: 123457980', 
  8   => 'Mobile: 123457980', 
  9   => 'E-Mail: some@someone.com', 
  10  => 'Comments: Some comments line 1', 
  11  => 'Some comments line 2', 
  12  => 'Some comments line 3' 
);

$sorted = array();

foreach($segments as $segment){
  
  $isNewSegment = preg_match('~^([A-Z][^:]+):([^$]+)~', $segment, $matches);
  
  if($isNewSegment){
    list($key, $value) = array($matches[1], $matches[2]);
    $sorted[$key] = $value;
    continue;
  }
  
  if(isset($key)){
    $sorted[$key] .= sprintf(' %s', $segment);
  }
}

print_r(
  $sorted
);

/*
  Array
  (
      [Name] =>  Some Name
      [Address] =>  Line 1 Line 2 Line 3
      [City] =>  Some City
      [State] =>  Some State
      [Post Code] =>  12345
      [Phone] =>  123457980
      [Mobile] =>  123457980
      [E-Mail] =>  some@someone.com
      [Comments] =>  Some comments line 1 Some comments line 2 Some comments line 3
  )
*/
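
On the regex suspicion raised earlier in the thread: if the labels are known in advance, a single pattern can capture each value up to the next label, multi-line or not. This is a rough sketch rather than a drop-in replacement; the label list is an assumption, and the last field still runs to the end of the input, so any trailing text would need to be cut off beforehand.

```php
<?php
// Hypothetical label list; preg_quote() keeps characters such as "-" literal
$labels = array('Name', 'Address', 'City', 'E-Mail', 'Comments');
$alt = implode('|', array_map(function ($l) { return preg_quote($l, '~'); }, $labels));

// m: ^ matches at the start of every line; s: "." also matches newlines.
// The lazy (.*?) stops at the next known label at a line start, or at the end of the text.
$pattern = '~^(' . $alt . '):[ \t]*(.*?)(?=^(?:' . $alt . '):|\z)~ms';

$text = "Name: Some Name\nAddress: Line 1\nLine 2\nLine 3\nCity: Some City\n"
      . "E-Mail: some@someone.com\nComments: Some comments line 1\nSome comments line 2\n";

$sorted = array();
preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);
foreach ($matches as $m) {
    // Join the lines of a multi-line value with single spaces
    $sorted[$m[1]] = preg_replace('/\s*\R\s*/', ' ', trim($m[2]));
}
print_r($sorted);
```

Because a label only counts when it sits at the start of a line, a colon elsewhere in a value (a time, a URL) does not start a new field.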