  1. #1
    SitePoint Enthusiast

    Find similar patterns and extract data

    I'm trying to extract "three" and/or "(3)" from all of these strings. The strings are never the same length and the words are sometimes misread, so I can't use a simple preg_match pattern on them.

    There are a lot of similar words that precede the value, though. A perfect sentence is: "PRIMARY TERM: This lease shall remain in force for a primary term of three (3) years from the effective date hereof, and as long thereafter..."
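    For contrast, a single exact pattern does work on the perfect sentence; it is the OCR damage that breaks it. This is a minimal sketch, and the regex and the function name `exactTerm` are my own illustration rather than anything from this thread:

```php
<?php
// Sketch: an exact pattern that only works on clean text.
// The regex and the function name are illustrative assumptions.
function exactTerm(string $text): ?array
{
    if (preg_match('/primary term of (\w+) \((\d+)\) years/i', $text, $m)) {
        return [$m[1], (int) $m[2]]; // the word reading and the digit reading
    }
    return null; // any OCR damage to the fixed words breaks the match
}

$perfect = 'PRIMARY TERM: This lease shall remain in force for a primary term of three (3) years from the effective date hereof, and as long thereafter...';
var_dump(exactTerm($perfect));
var_dump(exactTerm('... for a primary term OL threA (3 years from ...')); // NULL
```

    On a damaged line such as `primary term OL threA (3 years`, the call returns null because "of" and the closing parenthesis are missing. That is exactly why a fuzzier approach is needed.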

    My initial thought is to use similar_text and preg_match, but I haven't come up with a good approach yet. Any ideas on how this could be done?

    PHP Code:
    $String = '.... remain in force for a primy term OL three (3) years from the effective date hereof, and as lo....';
    $String2 = '.... *JE~ in force for a primary torm of three (*B years from the effective date hereof, .......';
    $String3 = '.... remain in farce for a primary term OL threA (3 years from the effective date hereof, and as lo....';
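    Since `similar_text` came up as an idea, here is a related sketch for the word half of the problem. It snaps an OCR token such as `threA` to the nearest English number word using PHP's built-in `levenshtein()`. The function name `closestNumberWord` and the threshold of one edit are assumptions. Apply it only to the token where the number is expected, not to every word, because ordinary words can sit one edit away from a number word (`for` is one insertion away from `four`).

```php
<?php
// Sketch: snap a possibly garbled OCR token to the closest number word.
// Assumption: at most 1 edit (insert/delete/substitute) counts as a match.
function closestNumberWord(string $token, int $maxDistance = 1): ?int
{
    $numberWords = [1 => 'one', 2 => 'two', 3 => 'three', 4 => 'four', 5 => 'five',
                    6 => 'six', 7 => 'seven', 8 => 'eight', 9 => 'nine', 10 => 'ten'];
    $token = strtolower($token);
    $best = null;
    $bestDistance = PHP_INT_MAX;
    foreach ($numberWords as $value => $word) {
        $distance = levenshtein($token, $word);
        if ($distance < $bestDistance) {
            $bestDistance = $distance;
            $best = $value;
        }
    }
    // Reject tokens that are not close to any number word (e.g. "OL").
    return $bestDistance <= $maxDistance ? $best : null;
}

var_dump(closestNumberWord('threA')); // int(3)
var_dump(closestNumberWord('OL'));    // NULL
```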

  2. #2
    SitePoint Guru
    It seems a common element is "(", so look for the word before it.
    Note: no spell checking is done here.

    PHP Code:
    <?php
    $String = '.... remain in force for a primy term OL three (3) years from the effective date hereof, and as lo....';
    $String2 = '.... *JE~ in force for a primary torm of three (*B years from the effective date hereof, .......';
    $String3 = '.... remain in farce for a primary term OL threA (3 years from the effective date hereof, and as lo....';

    // Split on spaces and record the index of the word before each "(".
    $words = explode(" ", $String3);
    $keys = array();
    foreach ($words as $k => $word) {
        // $k > 0 guards against a "(" in the very first word.
        if ($k > 0 && strpos($word, '(') !== false) {
            $keys[] = $k - 1;
        }
    }

    // Print each word that preceded a "(".
    foreach ($keys as $k) {
        echo "{$words[$k]}<br />";
    }
    ?>
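    The same "word before `(`" idea can also be written as a single `preg_match_all()` call. This is a sketch, and the helper name `wordsBeforeParen` is invented for illustration. Like the loop above, it does no spell checking and picks up every `(` in the text.

```php
<?php
// Sketch: capture the run of non-space characters that sits right before
// optional whitespace and a "(".
function wordsBeforeParen(string $text): array
{
    preg_match_all('/(\S+)\s*\(/', $text, $matches);
    return $matches[1]; // capture group 1 for every match, possibly empty
}

$String3 = '.... remain in farce for a primary term OL threA (3 years from the effective date hereof, and as lo....';
echo implode(', ', wordsBeforeParen($String3)), "\n"; // threA
```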

  3. #3
    SitePoint Enthusiast
    Hmm, I should have included a fourth string, since it could also be "three ^*B) yars fr0m". I need to figure out how many years the term is. Odds are that either "three" or "3" will come through intact, and there's a good chance both will. If both do, I'll compare them: if "three" matches 3, it's definitely 3 years. If only one of them reads as a number, I'll use that one.
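    The rule described here fits in a small function. This sketch rests on two assumptions. First, each reading has already been turned into an integer, or null when it was unreadable. Second, when the word and the digit disagree, neither is trusted; the post does not say what should happen in that case. The name `reconcileTerm` is hypothetical.

```php
<?php
// Sketch of the rule from the post: matching word and digit readings are
// trusted; if only one reading survived OCR, use that one.
// Assumption: a disagreement is treated as "unknown" (returns null).
function reconcileTerm(?int $fromWord, ?int $fromDigit): ?int
{
    if ($fromWord !== null && $fromDigit !== null) {
        return $fromWord === $fromDigit ? $fromWord : null;
    }
    return $fromWord ?? $fromDigit; // the single surviving reading, or null
}

var_dump(reconcileTerm(3, 3));    // int(3): "three" and "(3)" agree
var_dump(reconcileTerm(null, 3)); // int(3): only the digit was readable
var_dump(reconcileTerm(3, 5));    // NULL: readings conflict
```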

    Also, the string is just part of a huge document, so "(" could appear anywhere else. It's OCR output too, so parentheses come and go unpredictably. My real goal is to match anything similar to "remain in force for a primary term of (*submatch) from". Sorry, I should have been more specific from the beginning.
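    One way to match "something similar to" the anchor phrase works like this. Slide a window with the same number of words as the anchor across the document, score each window with the percentage from PHP's `similar_text()`, and keep the best window above a threshold. Then collect the words after it up to something close to "from". This is a sketch: the function name, the 80% threshold and the 4-word cap are assumptions to tune on real scans. It scores every window, so it may be slow on very large documents.

```php
<?php
// Sketch: find the window of words most similar to $anchor (fuzzy, via
// similar_text), then return the words between it and something like "from".
// The 80% threshold and the 4-word cap are assumed defaults, not tested values.
function extractTermAfterAnchor(string $document, string $anchor, float $minPercent = 80.0, int $maxWords = 4): ?string
{
    $words = preg_split('/\s+/', trim($document), -1, PREG_SPLIT_NO_EMPTY);
    $anchorWords = preg_split('/\s+/', strtolower(trim($anchor)), -1, PREG_SPLIT_NO_EMPTY);
    $target = implode(' ', $anchorWords);
    $n = count($anchorWords);
    $total = count($words);
    $bestPercent = 0.0;
    $bestEnd = null; // index of the first word after the best-matching window

    for ($i = 0; $i + $n <= $total; $i++) {
        $window = strtolower(implode(' ', array_slice($words, $i, $n)));
        similar_text($window, $target, $percent);
        if ($percent > $bestPercent) {
            $bestPercent = $percent;
            $bestEnd = $i + $n;
        }
    }
    if ($bestEnd === null || $bestPercent < $minPercent) {
        return null; // nothing looks enough like the anchor phrase
    }

    // Collect words until one is within one edit of "from" (e.g. "fr0m").
    $term = [];
    for ($j = $bestEnd; $j < $total && count($term) < $maxWords; $j++) {
        if (levenshtein(strtolower($words[$j]), 'from') <= 1) {
            break;
        }
        $term[] = $words[$j];
    }
    return $term ? implode(' ', $term) : null;
}

$anchor = 'remain in force for a primary term of';
$String = '.... remain in force for a primy term OL three (3) years from the effective date hereof, and as lo....';
$String3 = '.... remain in farce for a primary term OL threA (3 years from the effective date hereof, and as lo....';
echo extractTermAfterAnchor($String, $anchor), "\n";  // three (3) years
echo extractTermAfterAnchor($String3, $anchor), "\n"; // threA (3 years
```

    The returned fragment can then be split into its word and digit readings and passed through the comparison described in the previous post.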

  4. #4
    SitePoint Enthusiast
    Well here's how I did it...

    PHP Code:
    // $Document holds the full OCR text.
    $Matches = array(1 => 'one', 2 => 'two', 3 => 'three', 4 => 'four', 5 => 'five', 6 => 'six', 7 => 'seven', 8 => 'eight', 9 => 'nine', 10 => 'ten');
    $Patterns = array('primary term of', 'remain in force for', 'this lease shall');

    // Look for at least one pattern and grab the next 100 characters after it.
    $String = '';
    foreach ($Patterns as $Pattern) {
        $Parts = explode($Pattern, $Document, 2);
        if (isset($Parts[1])) {
            $String = substr($Parts[1], 0, 100);
            break;
        }
    }

    // See which number or number word exists; \b keeps "1" from matching inside "10".
    $Year = '';
    foreach ($Matches as $Int => $Word) {
        if (preg_match('/\b' . $Int . '\b/', $String) || preg_match('/\b' . $Word . '\b/i', $String)) {
            $Year = $Int;
            break;
        }
    }


