Hi,
In the article ‘How to Avoid 404s and Redirect Old URLs in PHP’ in step three it says:
We’ll place our redirection code in another file named redirect.php, to keep the functionality separate from the 404 content.
Add the following code at the top of your 404.php file just after the
<?php
include('redirect.php');
Now create redirect.php in the website root and add the following code:
<?php
// current address
$oldurl = strtolower($_SERVER['REQUEST_URI']);
// new redirect address
$newurl = '';
This code gets the URL that the user tried but Apache/IIS could not find. Apache/IIS then loads the 404.php page and passes the original URL request via the $_SERVER['REQUEST_URI'] parameter. If you have done as the article suggested and created a redirect.php, then you would start with $oldurl = strtolower($_SERVER['REQUEST_URI']); in the first line.
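As a minimal sketch of what the 404 page receives, the path can be pulled out of the request string like this. The helper name requested_path() and the sample request are hypothetical, not part of the article's code:

```php
<?php
// Hypothetical helper (not from the article): lower-cases the requested
// URI and returns only its path, dropping any query string.
function requested_path($request_uri) {
    $path = parse_url(strtolower($request_uri), PHP_URL_PATH);
    // parse_url() returns null when there is no path and false on malformed input
    return is_string($path) ? $path : '/';
}

// In 404.php this would be called as requested_path($_SERVER['REQUEST_URI']);
// a sample string is used here so the snippet runs on its own.
echo requested_path('/Old/Contact_Us.htm?ref=mail'), "\n"; // prints /old/contact_us.htm
```

Dropping the query string first keeps later comparisons focused on the directory and page names.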
I then wrote some code that you might like to use, as it will automatically assign the best match rather than you having to do it via an associative array. All it needs is a list of valid links on your newest site.
I have modified the code to be less rigid; it no longer needs an exact match. It will do an exact match or a regular expression search on the text that makes up the path of the URL. So try doing this with the code I provide:
- Create 404.php and configure your web server to send not-found URLs to this page
- Ensure that redirect.php is the first include (the first thing) in the 404.php file, and that in redirect.php you set $old_url to strtolower(htmlentities($_POST['requested_url']));
- Then list your correct URLs, the ones you want old links to redirect to, in the $new_urls array. Give each one in full, including the 'http://' or 'https://', the domain, the directory path and the file name with its extension, like: http://www.mysite.com/contact_us.php
- Then make sure all the code I have written is in redirect.php; the code below includes steps 2 and 3, so you can simply copy it in its entirety into redirect.php
The redirect.php code:
<?php
$old_url = strtolower(htmlentities($_POST['requested_url']));
/* Change these to your site urls that you want to redirect
*to if an appropriate match is found
*/
$new_urls = array(
'http://www.mysite.com/blog.php'
, 'http://www.mysite.com/contact.php'
, 'http://www.mysite.com/article1/story.php'
, 'http://www.el.net/sign_in.php'
, 'http://www.mysite.com/article2/story.php'
, 'http://www.mysite.com/article1/story.php'
, 'http://www.mysite.com/blog/article1/story.php'
);
$o_Url = new GetMatchedUrl($old_url, $new_urls);
$redirection_url = $o_Url->run();
/* Change this to the default page you want
* to redirect to if no redirect is set
*/
$default_redirect = 'http://www.mysite.com';
$o_Redirector = new Redirector($default_redirect);
$o_Redirector->setRedirect($redirection_url);
$o_Redirector->go();
/*******************************/
/********** CLASSES *************/
/*******************************/
Class Redirector {
protected $redirect;
protected $o_Session;
protected $default_redirect;
function __construct($default_redirect) {
$this->default_redirect = $default_redirect;
}
public function setRedirect($redirect) {
$this->redirect = $redirect;
}
public function go() {
$this->redirect();
}
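protected function redirect() {
// fall back to the default page when no matching url was set
$target = !empty($this->redirect) ? $this->redirect : $this->default_redirect;
header('Location: '.$target);
exit();
}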
public function getRedirector(array $urls = null){
$default_urls = array(
'http://www.google.ca'
,'http://ca.yahoo.com/'
,'http://www.bing.com/?cc=ca'
);
if($urls != null){
shuffle($urls);
return $urls[0];
} else {
shuffle($default_urls);
return $default_urls[0];
}
}
}
Class GetMatchedUrl {
protected $old_url;
protected $new_urls;
protected $host;
protected $path_segments;
protected $hit_list;
public function __construct($old_url, $new_urls){
$this->old_url = $old_url;
$this->new_urls = $new_urls;
$this->host = '';
$this->path_segments = array();
$this->hit_list = array();
}
public function run(){
// stays null when the old url has no path segments to search on
$match = null;
$result = $this->parseUrl();
if($result == 'Search Terms Found'){
$match = $this->setClosestMatch();
}
return $match;
}
protected function parseUrl(){
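$old_url_parts = parse_url($this->old_url);
// host and path can be missing, e.g. for a relative or empty url
$this->host = isset($old_url_parts['host']) ? $old_url_parts['host'] : '';
$path = isset($old_url_parts['path']) ? $old_url_parts['path'] : '';
$this->path_segments = explode('/', $path);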
/* Trim extensions from the end
* of url for easier comparison
*/
$last_part_num = count( $this->path_segments) - 1;
/* This approach won't work if you
* have file names with extra dots, like http://www.mysite.com/first.part.php
*/
$last_part = explode('.', $this->path_segments[$last_part_num]);
foreach($last_part as $part){
switch($part){
case 'php':;
case 'htm' :;
case 'html':;
break;
default:
// reset the last part to the page name minus the .php
$this->path_segments[$last_part_num] = $part;
}
}
if($last_part_num > 0){
// path has search terms
return 'Search Terms Found';
} else {
// path does not have search terms
return 'No Search Terms';
}
}
protected function setClosestMatch(){
$hit_list = array();
foreach($this->new_urls as $url){
$hit_list[$url] = 0; //set the current url's hit list to empty (no matches yet)
$url_parts = parse_url($url);
$path_segments = explode('/', $url_parts['path']);
$last_part_num = count($path_segments) - 1;
$last_part = explode('.', $path_segments[$last_part_num]);
foreach($last_part as $part){
switch($part){
case 'php':;
case 'htm' :;
case 'html':;
case 'shtml':;
break;
default:
// reset the last part to the page name minus the .php
$path_segments[$last_part_num] = $part;
}
}
/*
* Check which item has the most hits
*/
foreach($this->path_segments as $old_part){
if(in_array($old_part, $path_segments)){
$hit_list[$url]++;
} elseif(preg_grep('/'.preg_quote($old_part, '/').'/', $path_segments)){
// The simple regex matches an old segment that appears inside a new segment
// i.e. old url www.oldsite.com/contact.htm will be
// matched to www.newsite.com/contact_us.php
// or it will match www.same_domain_but_changed_page_names.net/contact.shtml
$hit_list[$url]++;
}
}
}
// sort array to get the highest number of hits to the end of the array
asort($hit_list);
// push the array pointer to the last array item
end($hit_list);
// return the web address of the new site which best matches
return key($hit_list);
}
}
?>
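To see the matching idea on its own, here is a minimal sketch that is separate from the classes above. The function names url_segments() and best_match() and the sample URLs are made up for illustration. It counts how many segments of the old path appear, exactly or as a substring, in each candidate's path. Unlike the class above, it returns null when nothing matches at all.

```php
<?php
// Hypothetical helper: split a url path into segments, dropping empty
// segments and the final file extension (story.php -> story).
function url_segments($url) {
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return array();
    }
    $segments = array();
    foreach (explode('/', $path) as $segment) {
        $name = pathinfo($segment, PATHINFO_FILENAME);
        if ($name !== '') {
            $segments[] = $name;
        }
    }
    return $segments;
}

// Hypothetical helper: return the candidate sharing the most segments
// with the old url, or null when no segment matches.
function best_match($old_url, array $candidates) {
    $old_segments = url_segments(strtolower($old_url));
    $best = null;
    $best_hits = 0;
    foreach ($candidates as $candidate) {
        $new_segments = url_segments(strtolower($candidate));
        $hits = 0;
        foreach ($old_segments as $old) {
            foreach ($new_segments as $new) {
                // counts an exact match or an old segment contained in a new one
                if (strpos($new, $old) !== false) {
                    $hits++;
                    break;
                }
            }
        }
        if ($hits > $best_hits) {
            $best = $candidate;
            $best_hits = $hits;
        }
    }
    return $best;
}

$candidates = array(
    'http://www.mysite.com/blog.php',
    'http://www.mysite.com/article1/story.php',
    'http://www.mysite.com/article2/story.php',
);
// prints http://www.mysite.com/article1/story.php
echo best_match('http://www.oldsite.com/article1/story.htm', $candidates), "\n";
```

Returning null for "no match" lets the caller decide whether to redirect to a default page or simply show the 404 content.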
I configured my test Apache server with a custom 404.php page and included the redirect.php file at the top of it. I ran the same code as shown above, only with my own domain's URLs, and then tested a bunch of changed domain names. It successfully matched most of the time. When it didn't, it gave me the custom 404.php error page, which lets visitors click a link to the main site. It worked nicely.
Hope this helps.
Steve