preg_match_all ignore words

Hello,

I am trying to create a regex that captures emails ending in .com/.info and containing neither aaa nor bbb.

Is this the correct syntax?

// search email ending in .com/.info containing no aaa/bbb
preg_match_all('#^(?!.*aaa)(?!.*bbb).*@.*\\.(?:com|info)$#im', $html, $emails);

To get this:

caaac@ccc.com = no
ccc@ccbbb.com = no
cccc@cccc.com = good (address syntax correct + term absent before or after the @)

I tested the code and it seems to work (see here).

But when I test it on a string like the one below, it no longer works.

$string = "blah blah blah [email]email@address.com[/email] blah blah blah [email]email2@aaaaess.com[/email] blah blah  [email]email3@address.com[/email] [email]embbbil@adress.com[/email]";
preg_match_all("#^(?!.*dns)(?!.*host).*@.*\\.(?:com|info)$#im", $string, $matches);
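
A likely explanation, sketched below with made-up sample strings: with the m flag, ^ and $ match at line boundaries, so the pattern can only match when a whole line is a single address, and the lookaheads scan the entire line. The variable names are placeholders, not from the original post.

```php
<?php
// Pattern from the question: with the "m" flag, ^ and $ match at line boundaries.
$pattern = '#^(?!.*aaa)(?!.*bbb).*@.*\.(?:com|info)$#im';

// One address per line: each line is tested on its own.
$perLine = "caaac@ccc.com\nccc@ccbbb.com\ncccc@cccc.com";
preg_match_all($pattern, $perLine, $m1);
print_r($m1[0]); // only cccc@cccc.com

// Several addresses on one line: (?!.*aaa) sees "aaa" somewhere on the line
// and rejects the whole line, including the valid first address.
$oneLine = "cccc@cccc.com blah caaac@ccc.com";
preg_match_all($pattern, $oneLine, $m2);
print_r($m2[0]); // empty

// No forbidden word on the line: the whole line comes back as one match.
$cleanLine = "cccc@cccc.com blah dddd@dddd.com";
preg_match_all($pattern, $cleanLine, $m3);
print_r($m3[0]); // the full line, not two addresses
```

So the pattern itself is fine for one candidate per line; it is the line-level anchoring that breaks on free text.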

Thank you in advance for your help. Cordially

So you want everything that ends in .com or .info;
Contains a @.
Does not contain aaa or bbb.

My first impulse is ‘you need 3 steps; identify addresses; identify invalid strings; array_diff.’

So you want everything that ends in .com or .info;
Contains a @.
Does not contain aaa or bbb.

Yes, exactly!

I have already done it in 2 stages:

  1. preg_match_all to extract the e-mails
  2. filter out the words aaa / bbb with if (! preg_match (…))

I would love to simplify the code with a single preg_match_all …

I apologize for my English …

Thank you for your help, kindly

I’m not a PREG guru, so I can’t give you a single preg_match that’ll do what you want… here’s how it works in my head.


$emails = preg_grep('/.*@.*(\\.com|\\.info)/i',explode(' ',$html));
$emails = preg_grep('/(aaa|bbb)/i',$emails,PREG_GREP_INVERT);

Note: Untested.
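
A runnable sketch of that two-pass idea, under a few assumptions that are not in the post above: the input is split on any whitespace with preg_split rather than explode(' '), the first pattern is anchored so it describes a whole token, and the sample string is a placeholder.

```php
<?php
// Hypothetical sample input; the addresses are placeholders.
$html = "email1@address.com blah email2@aaaaess.com blah email3@address.info embbbil4@adress.com";

// Split on any run of whitespace (an assumption; explode(' ') only handles single spaces).
$tokens = preg_split('/\s+/', $html, -1, PREG_SPLIT_NO_EMPTY);

// Pass 1: keep tokens that look like an address ending in .com or .info.
$emails = preg_grep('/^[^@\s]+@[^@\s]+\.(?:com|info)$/i', $tokens);

// Pass 2: drop tokens that contain a forbidden word.
$emails = preg_grep('/aaa|bbb/i', $emails, PREG_GREP_INVERT);

// preg_grep preserves the original keys, so reindex for a clean list.
$emails = array_values($emails);
print_r($emails);
```

On this sample the result should be email1@address.com and email3@address.info.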

Thank you for your reply.

I can do it in several steps (extraction, then filtering), but I would really like to combine all the processing into a single regex to simplify the code and optimize it for resource usage and execution time …

This syntax works fine (see here), except on a string that includes spaces.

e.g:

$string = "email1@address.com blah email2@aaaaess.com blah email3@address.info embbbil4@adress.com";
preg_match_all("#^(?!.*aaa)(?!.*bbb).*@.*\\.(?:com|info)$#im", $string, $matches);

The spaces before or after each address are the cause of the problem, but I do not know how to solve it.

Cordially

Keep things as simple as possible; you’ll thank yourself later when you need to modify this, trust me.

Well, if your code works on individual strings (I don’t understand why it manages to work on test@myaaatest.com, but that’s a gap in my knowledge), why not $matches = preg_grep(<YourPatternHere>, explode(' ', $string)); ?
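
A minimal sketch of that suggestion, reusing the pattern from the question unchanged and assuming the addresses are separated by single spaces. It also shows why test@myaaatest.com is rejected: the negative lookahead is evaluated at the start of the subject, and its .* can skip over any prefix before looking for aaa.

```php
<?php
// Pattern from the question, unchanged.
$pattern = '#^(?!.*aaa)(?!.*bbb).*@.*\.(?:com|info)$#im';

$string = "email1@address.com blah email2@aaaaess.com blah email3@address.info embbbil4@adress.com";

// Each token is now a whole subject, so ^ and $ wrap a single candidate.
$matches = array_values(preg_grep($pattern, explode(' ', $string)));
print_r($matches);

// (?!.*aaa) is checked at the start of the subject; .* may skip any prefix,
// so "aaa" is found wherever it sits, including after the @.
$rejected = preg_match($pattern, 'test@myaaatest.com');
var_dump($rejected); // int(0)
```

This keeps the single pattern the original poster wanted, at the cost of one explode call.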

Here’s the solution:

#(?<=^|\\s)(?![\\w@]*(?:aaa|bbb|(?:[0-9].*){3,}))[a-z0-9-_.]*@[a-z0-9-_.]*\\.(?:com|net|org|info|biz)(?=\\s|$)#im

Function:


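function get_emails($str){
  // The negative lookahead rejects candidates containing aaa or bbb, or followed by
  // three or more digits (the .* lets the digit count run to the end of the line).
  preg_match_all('#(?<=^|\\s)(?![\\w@]*(?:aaa|bbb|(?:[0-9].*){3,}))[a-z0-9-_.]*@[a-z0-9-_.]*\\.(?:com|net|org|info|biz)(?=\\s|$)#im', $str, $output);
  // Always return an array (empty when nothing matched), de-duplicated and reindexed.
  return array_values(array_unique($output[0]));
}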
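
One behaviour worth checking before relying on this: the .* inside the digit clause is not limited to the current address, so digits later on the same line can count against an earlier one. Tracing the pattern by hand on the sample string from earlier in the thread suggests that email1@address.com is rejected because of the digits in the addresses that follow it. The sketch below is a hypothetical variant (get_emails_bounded is not from the thread) that swaps .* for \S*, on the assumption that the intent was to reject a single address containing three or more digits.

```php
<?php
// Hypothetical variant of get_emails(): \S* keeps the digit count inside one
// whitespace-delimited token. Everything else in the pattern is unchanged.
function get_emails_bounded(string $str): array
{
    $pattern = '#(?<=^|\s)(?![\w@]*(?:aaa|bbb|(?:[0-9]\S*){3,}))'
             . '[a-z0-9-_.]*@[a-z0-9-_.]*\.(?:com|net|org|info|biz)(?=\s|$)#im';
    preg_match_all($pattern, $str, $output);
    // array_values() reindexes after array_unique() drops duplicates.
    return array_values(array_unique($output[0]));
}

$string = "email1@address.com blah email2@aaaaess.com blah email3@address.info embbbil4@adress.com";
print_r(get_emails_bounded($string));

// An address with three digits of its own is still rejected.
print_r(get_emails_bounded("abc123@x.com"));
```

On the sample string this variant should return email1@address.com and email3@address.info, and an empty array for abc123@x.com.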

Simple, optimized code, just the way I like it!

Cordially