Thread: simplify this using regex?
-
May 21, 2006, 15:46 #1
- Join Date
- Jun 2005
- Posts
- 257
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
simplify this using regex?
Hello,
I have this code, which grabs the date that follows several variations of the text "last updated:" so I can store it in MySQL date format in the DB.
The problem is that it's not very efficient when it comes to the dates, since different websites format them differently.
Here is the current code:
PHP Code:$upos = strpos($whois['data'], "updated date:");
if (!($upos === false)) {
    $upddate = substr($whois['data'], $upos+14, 11);
} else {
    $upos2 = strpos($whois['data'], "record last updated on");
    if (!($upos2 === false)) {
        $upddate = substr($whois['data'], $upos2+24, 11);
    } else {
        $upos3 = strpos($whois['data'], "last updated on:");
        if (!($upos3 === false)) {
            $upddate = substr($whois['data'], $upos3+16, 12);
        } else {
            $upos4 = strpos($whois['data'], "last updated on");
            if (!($upos4 === false)) {
                $upddate = substr($whois['data'], $upos4+16, 11);
            } else {
                $upddate = "";
            }
        }
    }
}
The dates are all formatted differently, so 11 characters doesn't always capture the whole date and some of it gets cut off. I run them through strtotime, but that doesn't always work because the date is cut off.
Sometimes the dates are 12-jan-2006 or sat, jan 12, 2006 or jan 12, 2006, and so on.
Is the way I am doing this the best way, or would using multiple regexes be better? If so, can you show me an example of one that would work?
-
May 21, 2006, 16:47 #2
What exactly are you trying to do?
Location: Alicante (Spain)... Hot and Sunny...
Texas Holdem Poker Probability Calculator | DNS test
Avatars | English Spanish Translation | CAPTCHA with audio
Email | PHP scripts | Cruft free domain names | MD5 Cracker
-
May 21, 2006, 16:49 #3
Pull the date out of lines like these:
Last Update: 12-jan-2006
Record Last updated on sat, jan 12, 2006
Last updated on jan 12, 2006
Updated date: 12-jan-2006
These are just examples, but basically it's these 4 phrases followed by a date of some kind, and it's that date that I want.
-
May 21, 2006, 20:07 #4
Anybody have any ideas or suggestions on this?
-
May 22, 2006, 00:04 #5
The following works for your examples:
PHP Code:$timestamp = strtotime(
preg_replace(
'/^.*(\d?\d-[a-z]{3}-\d{4}|[a-z]{3} \d?\d, \d{4}).*$/is',
'$1',
$whois['data']
)
);
-
May 22, 2006, 08:20 #6
It is very, very close, but these are some of the results I came up with:
Code:Updated date: 12-jan-2006 :: 2-jan-2006
Code:Last Update: 12-jan-2006 :: 2-jan-2006
I am not sure if this searches for just any date, but if it does, I need it to include "last updated:" and the different forms of that, because there are many dates on the page and I am looking only for the one that deals with updates.
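The missing digit in those results can be traced to the greedy `^.*` at the start of the pattern from post #5: it consumes as much text as possible, and because the day is written `\d?\d`, the engine is content to leave only one digit for the date. A minimal sketch of the effect on a single-line input (the variable names `$line`, `$date`, `$greedy`, and `$lazy` are illustrative and not from the thread):

```php
<?php
$line = 'Updated date: 12-jan-2006';

// The two date alternatives from post #5, factored out so both runs share them.
$date = '(\d?\d-[a-z]{3}-\d{4}|[a-z]{3} \d?\d, \d{4})';

// Greedy prefix: ".*" backs off only as far as it must, so the optional
// first digit of the day is swallowed by the prefix.
$greedy = preg_replace('/^.*' . $date . '.*$/is', '$1', $line);

// Lazy prefix: ".*?" grows one character at a time and stops at the
// earliest position where a date can start.
$lazy = preg_replace('/^.*?' . $date . '.*$/is', '$1', $line);

echo $greedy, "\n"; // 2-jan-2006
echo $lazy, "\n";   // 12-jan-2006
```

The lazy version keeps both digits, but it still returns the first date found anywhere in the text, so it does not tie the match to an "updated" label.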
-
May 22, 2006, 09:02 #7
this will work for the date formats you posted:
PHP Code:$pattern = '#update.*?((?:\d{1,2}-.*?-\d{4})|(?:[a-z]{3},\s+.*?\d{4})|(?:[a-z]{3}\s+\d{1,2},\s+\d{4}))#i';
preg_match_all($pattern, $whois['data'], $matches);
// matches will be in the $matches[1] array
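For readers new to regex, the same pattern can be written with the `x` flag so that each alternative sits on its own line with a comment. This is only a re-spelling of the pattern above under that assumption (the `~` delimiter replaces `#` because `#` starts a comment in extended mode; `$samples` and `$found` are illustrative names):

```php
<?php
$pattern = '~
    update .*?                                  # the label, then the shortest possible gap
    (
        (?: \d{1,2} - .*? - \d{4} )             # 12-jan-2006
      | (?: [a-z]{3} , \s+ .*? \d{4} )          # sat, jan 12, 2006
      | (?: [a-z]{3} \s+ \d{1,2} , \s+ \d{4} )  # jan 12, 2006
    )
~ix';

// The four sample lines from post #3.
$samples = array(
    'Last Update: 12-jan-2006',
    'Record Last updated on sat, jan 12, 2006',
    'Last updated on jan 12, 2006',
    'Updated date: 12-jan-2006',
);

$found = array();
foreach ($samples as $sample) {
    // preg_match stops at the first hit, which is enough for one date per line.
    if (preg_match($pattern, $sample, $m)) {
        $found[] = $m[1];
    }
}

print_r($found);
```

Because the match must begin at the word "update", dates that appear before the label are skipped.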
-
May 22, 2006, 09:05 #8
OK, I got it this far, but I am going to need help finishing it:
PHP Code:$test[] = "asdfasdfasdfadsasdf last update: 12-jan-2006 asdfasdfasdfasdfasdfasdf";
$test[] = "fsdfasdfasdfasdfasdfa record last updated on sat, jan 12, 2006 vavsdfasdfasdfasdfasdfasdfasdf";
$test[] = "dafasdfasdfadf last updated on jan 12, 2006 asdfasdfasdfasdfasdfasdf";
$test[] = "<table><tr><td>Other Junk 15-jan-2006</td><td>Updated date: 12-jan-2006 </td></tr></table>";
$test[] = "<TR><TD width='30%' valign='top'>Creation Date:</TD><TD width='70%'>Sep 18 2004 </TD></TR>";
foreach ($test as $searchstr)
{
    $search = array('~updated date: (.*?)~si',
                    '~last updated on (.*?)~si',
                    '~last update: (.*?)~si',
                    '~Creation Date:</TD><TD width=\'70%\'>(.*?)~si');
    $replace = '$1';
    $text = preg_replace($search, $replace, $searchstr);
    echo "Search: ".htmlentities($searchstr)."<br />Result: ";
    print_r(htmlentities($text));
    echo "<br /><br />";
}
It only removes the words in the array; it doesn't remove everything except the date.
I need to keep the 12 characters matched by the (.*?), starting from that location.
So the end result should be the date, in whatever format it is in, every time.
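The behaviour described here follows from how a lazy group works: a `(.*?)` at the very end of a pattern has nothing after it that forces it to grow, so it matches the empty string and only the label is replaced. A minimal sketch, assuming one of the test lines above in shortened form (`$line`, `$stripped`, and `$m` are illustrative names):

```php
<?php
$line = 'dafasdf last updated on jan 12, 2006 asdfasdf';

// Trailing lazy group: it captures nothing, so the replacement only
// deletes the label and leaves the rest of the line in place.
$stripped = preg_replace('~last updated on (.*?)~si', '$1', $line);
echo $stripped, "\n"; // dafasdf jan 12, 2006 asdfasdf

// Capturing instead of replacing: take up to 12 characters after the label.
if (preg_match('~last updated on (.{1,12})~si', $line, $m)) {
    echo $m[1], "\n"; // jan 12, 2006
}
```

A fixed width of 12 has the same weakness as the original `substr()` calls, since longer dates are cut off and shorter ones pick up trailing junk. Matching the shape of the date itself avoids that.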
-
May 22, 2006, 09:07 #9
Thank you, aamonkey. Let me test that out real quick with the example I posted.
-
May 22, 2006, 09:11 #10
Awesome, I tried this code:
PHP Code:$test[] = "asdfasdfasdfadsasdf last update: 12-jan-2006 asdfasdfasdfasdfasdfasdf";
$test[] = "fsdfasdfasdfasdfasdfa record last updated on sat, jan 12, 2006 vavsdfasdfasdfasdfasdfasdfasdf";
$test[] = "dafasdfasdfadf last updated on jan 12, 2006 asdfasdfasdfasdfasdfasdf";
$test[] = "<table><tr><td>Other Junk 15-jan-2006</td><td>Updated date: 12-jan-2006 </td></tr></table>";
$test[] = "<TR><TD width='30%' valign='top'>Updated Date:</TD><TD width='70%'>Sep 18 2004 </TD></TR>";
$pattern = '#update.*?((?:\d{1,2}-.*?-\d{4})|(?:[a-z]{3},\s+.*?\d{4})|(?:[a-z]{3}\s+\d{1,2},\s+\d{4}))#i';
foreach ($test as $searchstr)
{
    preg_match_all($pattern, $searchstr, $matches);
    // matches will be in the $matches[1] array
    echo "Search: ".htmlentities($searchstr)."<br />Result: ";
    print_r($matches[1]);
    echo "<br /><br />";
}
Code:Search: asdfasdfasdfadsasdf last update: 12-jan-2006 asdfasdfasdfasdfasdfasdf
Result: Array ( [0] => 12-jan-2006 )

Search: fsdfasdfasdfasdfasdfa record last updated on sat, jan 12, 2006 vavsdfasdfasdfasdfasdfasdfasdf
Result: Array ( [0] => sat, jan 12, 2006 )

Search: dafasdfasdfadf last updated on jan 12, 2006 asdfasdfasdfasdfasdfasdf
Result: Array ( [0] => jan 12, 2006 )

Search: <table><tr><td>Other Junk 15-jan-2006</td><td>Updated date: 12-jan-2006 </td></tr></table>
Result: Array ( [0] => 12-jan-2006 )

Search: <TR><TD width='30%' valign='top'>Updated Date:</TD><TD width='70%'>Sep 18 2004 </TD></TR>
Result: Array ( )
-
May 22, 2006, 09:21 #11
sure, just change the pattern to this:
PHP Code:$pattern = '#update.*?((?:\d{1,2}-.*?-\d{4})|(?:[a-z]{3},?\s+.*?\d{4})|(?:[a-z]{3}\s+\d{1,2},\s+\d{4}))#i';
-
May 22, 2006, 09:24 #12
I have never seen more than 50 - 60 characters between them; should that be OK?
That pattern worked perfectly, by the way. Thank you so much!
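If the label and the date are never more than about 60 characters apart, one option is to cap the gap so that a stray "update" far from any date cannot pair with an unrelated one. The sketch below is an assumption layered on the pattern from post #11, not something proposed in the thread: it swaps the open-ended `.*?` for `.{0,60}?` (`$dates`, `$bounded`, `$near`, and `$far` are illustrative names).

```php
<?php
// The three date alternatives from post #11, unchanged.
$dates = '((?:\d{1,2}-.*?-\d{4})|(?:[a-z]{3},?\s+.*?\d{4})|(?:[a-z]{3}\s+\d{1,2},\s+\d{4}))';

// Assumption: at most 60 characters may sit between "update" and the date.
$bounded = '#update.{0,60}?' . $dates . '#i';

// Label and date separated by a little markup (28 characters).
$near = "<TR><TD width='30%' valign='top'>Updated Date:</TD><TD width='70%'>Sep 18 2004 </TD></TR>";

// Label and date separated by 81 characters of filler.
$far = 'update' . str_repeat('.', 80) . ' 12-jan-2006';

$nearHit = preg_match($bounded, $near, $m) ? $m[1] : null;
$farHit  = preg_match($bounded, $far, $m2) ? $m2[1] : null;

var_dump($nearHit); // string(11) "Sep 18 2004"
var_dump($farHit);  // NULL
```

The limit of 60 is taken from the estimate in the post above and would need adjusting if real pages put more markup between the label and the date.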
-
May 22, 2006, 09:35 #13
Originally Posted by Xiosen
-
May 22, 2006, 09:38 #14
Because I don't understand regex or what's needed for it. You asked me and I replied with 4 examples; I said there are 4 phrases and it's the date AFTER them that I want. I never said that I wanted every date.
I appreciate you taking the time, and I'm sorry that we had a misunderstanding, but I can't pretend it's what I wanted when it will not work.
-
May 22, 2006, 11:00 #15
- Join Date
- May 2004
- Location
- Braga, Portugal
- Posts
- 596
- Mentioned
- 0 Post(s)
- Tagged
- 0 Thread(s)
If you have already made the jump to PHP 5 (5.1 or later), you can use strptime(), which will give you an array you can use however you wish.
~ Daniel Macedo
-
May 22, 2006, 11:25 #16
I wish I had made the jump lol, sounds a lot better.
aamonkey, or anyone else reading this thread: it looks like I need one more modification. For example, look at the current output:
Code:Search: asdkflaskdfasldflk last updated on sept 18 2008 asdfjaklsdfjalk;sfd
Result: Array ( [0] => ept 18 2008 )

Search: kadlkfajlkdfjlds last updated on september 17, 2008 asdfaksjdflaskd
Result: Array ( [0] => ber 17, 2008 )
I guess the question is: what's the best way to detect the whole month, whether it's written in full or abbreviated?
-
May 22, 2006, 11:29 #17
Check the PEAR PHP_Compat to see if it's already there...
~ Daniel Macedo
-
May 22, 2006, 11:31 #18
Code:Sorry, but we didn't find anything that matches "strptime"
-
May 22, 2006, 11:31 #19
$pattern = '#update.*?((?:\d{1,2}-.+?-\d{4})|(?:[a-z]{3,},?\s+.*?\d{4})|(?:[a-z]{3,}\s+\d{1,2},\s+\d{4}))#i';
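The change that matters in this version is `[a-z]{3}` becoming `[a-z]{3,}`. With exactly three letters required, the lazy gap slides forward until the last three letters of the month are all that is left before the space, which is where "ept" and "ber" came from. A minimal sketch comparing the two patterns on shortened versions of the lines from post #16 (`$old`, `$new`, and `$lines` are illustrative names):

```php
<?php
// Pattern from post #11: month names limited to exactly three letters.
$old = '#update.*?((?:\d{1,2}-.*?-\d{4})|(?:[a-z]{3},?\s+.*?\d{4})|(?:[a-z]{3}\s+\d{1,2},\s+\d{4}))#i';

// Pattern from the post above: three or more letters.
$new = '#update.*?((?:\d{1,2}-.+?-\d{4})|(?:[a-z]{3,},?\s+.*?\d{4})|(?:[a-z]{3,}\s+\d{1,2},\s+\d{4}))#i';

$lines = array(
    'last updated on sept 18 2008 asdf',
    'last updated on september 17, 2008 asdf',
);

$before = array();
$after  = array();
foreach ($lines as $line) {
    preg_match($old, $line, $m);
    $before[] = $m[1];
    preg_match($new, $line, $m);
    $after[] = $m[1];
}

print_r($before); // ept 18 2008, ber 17, 2008
print_r($after);  // sept 18 2008, september 17, 2008
```

The word "on" is not mistaken for a month because it has only two letters. A three-letter word between the label and the date, followed by whitespace, would be picked up by the second alternative.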
-
May 22, 2006, 12:03 #20
Your code worked perfectly for that last part!!
Now I've got a new problem (I think it's the last one, haha): what do I do if it's already in YYYY-MM-DD format, such as:
Code:Search: asdfaksdfjlaskdf updates: 2008-07-27 00:04:28 alsdjfalksdjflaksjfdkasdf Result: Array ( )
EDIT: Oops, aamonkey, I didn't receive a notification of your last post. Let me try that out!! Thank you so much!
-
May 22, 2006, 12:09 #21
do you want the HH:MM:SS to be grabbed, too?
-
May 22, 2006, 12:10 #22
No thank you, just the date if you could! You're a lifesaver!
-
May 22, 2006, 12:13 #23
$pattern = '#update.*?((?:\d{1,2}-.+?-\d{4})|(?:[a-z]{3,},?\s+.*?\d{4})|(?:[a-z]{3,}\s+\d{1,2},\s+\d{4})|(?:\d{4}-\d{2}-\d{2}))#i';
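To connect this back to the original goal of storing a MySQL-style date, the pattern above can be wrapped in a small helper that returns `YYYY-MM-DD` or an empty string. This is a sketch under stated assumptions, not code from the thread: `extractUpdatedDate()` is a hypothetical name, the UTC time zone is an arbitrary choice, and the leading weekday is dropped before `strtotime()` because the sample "sat, jan 12, 2006" names a weekday that does not match the calendar (12 January 2006 was a Thursday) and `strtotime()` may shift the date to fit a weekday name.

```php
<?php
// Assumption: any fixed zone works, since only the date part is kept.
date_default_timezone_set('UTC');

// Hypothetical helper: find the date after an "update..." label and
// return it as YYYY-MM-DD, or '' when nothing usable is found.
function extractUpdatedDate($text)
{
    // Pattern from post #23, unchanged.
    $pattern = '#update.*?((?:\d{1,2}-.+?-\d{4})|(?:[a-z]{3,},?\s+.*?\d{4})|(?:[a-z]{3,}\s+\d{1,2},\s+\d{4})|(?:\d{4}-\d{2}-\d{2}))#i';

    if (!preg_match($pattern, $text, $m)) {
        return '';
    }

    // Drop a leading weekday such as "sat, " so it cannot move the date.
    $raw = preg_replace('/^[a-z]{3,},\s*/i', '', $m[1]);

    $timestamp = strtotime($raw);
    return $timestamp === false ? '' : date('Y-m-d', $timestamp);
}

echo extractUpdatedDate('updates: 2008-07-27 00:04:28 alsdjf'), "\n"; // 2008-07-27
echo extractUpdatedDate('Updated date: 12-jan-2006'), "\n";           // 2006-01-12
echo extractUpdatedDate('no label here 15-jan-2006'), "\n";           // (empty line)
```

Returning an empty string mirrors the `$upddate = ""` fallback in the original `strpos()` code, so the caller can keep treating "no date found" the same way.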
-
May 22, 2006, 12:14 #24
Awesome! I owe ya big time! You're like a regex god.