Using index.php to hide your programming language

Hi there, I’m using the book “Build your own database driven website using PHP and MySQL”. It suggests that by making an individual folder, creating a file index.php, which will be called as the default file, you can mask which programming language you’re using.

So for example:

mywebsite/mycode/ will appear

instead of
mywebsite/mycode/register.php

Because index.php is the default file, it won’t show in the URL. You can therefore use includes to bring code into index.php, keeping your functionality but removing the filename extension that reveals your language.

Ok so that all makes sense to me (hopefully to you too at this stage). But my question is, if I use this technique, isn’t there a possibility that I could lose some SEO value? I’m a complete noob, so I’m just thinking out loud really and seeing if anyone has an input.

Can you explain? Hiding your extension will not make your code automatically more secure. All PHP, unless there is a bug, gets interpreted by the server before the page is sent to the browser. Nobody can view your source and see table names. They can use SQL injection in your input boxes to see table names and whatnot. They can also see your code if they hack the server/domain, at which point you are SOL.

I think what you’re saying is that the book suggests that you use the include() statement to use code from other files? I can’t really see any benefit to doing this. A better way to hide your file extensions would be to use the .htaccess file to rewrite your URLs to exclude them.

include() lets you pull the contents of another file into a page without retyping them. On my site, you’ll see I have a (YUI) menu bar. View the source and you’ll see a big unordered list making up my menu. Surely I don’t type or copy/paste that into every new file.

I simply do:

include("path/file.html");

which spits out the menu when you load the page. If you open menu.html up, it contains only the menu code, not <html> and <body> tags.
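For anyone who wants to try it, here is a rough sketch of that idea. The partial name `menu_example.html`, its markup, and the helper `render_page()` are made up for illustration; the file is written to the temp directory only so the example runs on its own.

```php
<?php
// Hypothetical partial: on a real site this would be an existing file such as menu.html.
$menuFile = sys_get_temp_dir() . '/menu_example.html';
file_put_contents($menuFile, "<ul><li>Home</li><li>About</li></ul>\n");

// Builds a whole page and returns it as a string so the result can be inspected.
// A real page would normally just output the markup directly.
function render_page(string $menuFile, string $title): string
{
    ob_start();
    echo "<html><body>\n";
    include $menuFile; // the partial's markup is emitted at exactly this point
    echo "<h1>" . htmlspecialchars($title) . "</h1>\n";
    echo "</body></html>\n";
    return (string) ob_get_clean();
}

$html = render_page($menuFile, 'About');
```

The browser (and a search engine) only ever receives the combined string in `$html`; nothing in it reveals that the menu came from a separate file.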

Does this help?

I see the benefits of includes. That’s not the issue.

I see the benefits of not telling the world which programming language you are using. That’s not the issue.

I understand that this is not a failsafe way of protecting my site.

There is not one single failsafe way to do this, but this is a basic way to implement first level security, and a good habit, they suggest.

I’m not really deep enough into it to start thinking about how to tie up security so that nobody can hack my “hello world”-esque scripts. I’m sure hackers would find them utterly uninteresting.

I guess my question isn’t clear even to myself. I was probably saying, should I make a new page for each page, or should I use index.php for all say 5 pages on a basic website.

I suppose my own answer is to make separate pages so that they can be bookmarked. And then the obvious answer is that if I still want to use the technique shown, I make folders called contact, about, etc., and each can then contain an index.php applying the principles I’m being taught.

My apologies for the ramble it appears I untangled my own brain.

Thanks for helping anyway, I know this is probably very unclear to you all. Mind dump!

and before you say it, yes I understand that you wouldn’t use a folder structure like that, and since you’re probably not going to in the real world, it then makes the htaccess solution a more obvious solution. :slight_smile:

Correct, making directories would be overkill. You could have contact.php but have it redirect to yoursite.com/contact/ if you wished. I think you then want to know the SEO pitfalls of doing that, which I don’t know. Maybe we can move this to the SEO subforum.

I think initially my thoughts were

index.php

if home
includes (somecontent)

if about
includes (somecontent)

thereby having only one page with various bits of content. That’s when I thought, how does this affect SEO?

I think I’ve got my head around it now, but then again if anyone has an input I’ll happily listen as I’m an ubernoob. If you want to move it to a new forum that’s cool by me, I didn’t mean to place it incorrectly :slight_smile:

Search engines see what the browser sees, i.e. the output the server sends, not the source files we see as developers. See post #4. When the server runs the script, include takes the contents of file x and puts them into file y, just as if you had copied the contents of file x and pasted them into file y. To everybody except the server, the result appears as one file.

I think this is fine in PHP. It seems as though you don’t understand how include() works, vs seo. Have you read the manual’s include info?

I guess I was just confused, because index.php would surely be what the SE cached, and it could potentially be different depending on different circumstances, i.e. it could be set to show different content dependent on differing variables. I know you would normally use a DB for this, but I’m just trying to understand a bit at this stage.

If it’s possible to have two different bits of content, then which does the SE see? I guess the default one, therefore not including the other in its indexing, right?

Search engines use the URL to identify the page. To get different content served the URL must vary in some way, otherwise how would your script know what content to send?

So the fact that the same physical file on the server – index.php – is executed doesn’t matter, as long as the URLs are different for different pages (i.e. there is a rewrite rule directing them to index.php)
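To make that concrete, here is a minimal sketch of how a single index.php could pick content from the URL, assuming the server already sends every request to it. The page map, the file names, and the function `page_for_url()` are hypothetical.

```php
<?php
// Hypothetical page map: URL path => content file. The names are illustrative only.
const PAGES = [
    ''        => 'home.php',
    'about'   => 'about.php',
    'contact' => 'contact.php',
];

function page_for_url(string $requestUri): string
{
    // Keep only the path part and drop surrounding slashes: "/about/?x=1" becomes "about".
    $path = trim((string) parse_url($requestUri, PHP_URL_PATH), '/');

    // Unknown paths fall back to a not-found page instead of including arbitrary files.
    return PAGES[$path] ?? '404.php';
}

// In a real index.php the result would be used roughly like this:
// include __DIR__ . '/' . page_for_url($_SERVER['REQUEST_URI']);
```

Each distinct URL maps to distinct content, which is all a search engine cares about; it never learns that one file handled every request.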

Having said that Matt Cutts has said in the past there can be some benefit to having a .html or other extension to look more like a “page” rather than a folder. It’s not that SEs won’t understand, it’s more that when users are looking at SERP they feel more confident seeing a .html extension. They know what they’ll get.

Hi,
I don’t know if this will help, but I think I understand what you want. I use this on htaccess:

(you can name this file anything you want, it doesn’t have to be index)


<Files index>
ForceType application/x-httpd-php
AcceptPathInfo On
</Files>

Take the index.php page and, using an editor or FTP, remove the .php extension…

The code above will show /index/ as a directory even though it’s really a page.

Then on the /index/ page, split the URL into an array and use if/elseif to pick the specific document you need…

like this:


<?php

    // PATH_INFO looks like "/home/", so explode() gives ["", "home", ""]:
    // [0] is always empty and the document name is [1], then 2, 3.. and so on..

    $url_array = explode("/", $_SERVER['PATH_INFO'] ?? '');

    $doc_title = $url_array[1] ?? '';

// Compare with ===; a single = would assign and always take the first branch.
if ($doc_title === "home") {
include("/docroot/to/home.php");
} elseif ($doc_title === "about") {
include("/docroot/to/about.php");
} else {
include("/docroot/to/default.php");
}

?>


Or you can use case switch and achieve the same thing.


<?php

    // Fall back to an empty string when there is no extra path after /index
    $url_array = explode("/", $_SERVER['PATH_INFO'] ?? '');

    $doc_title = $url_array[1] ?? '';

switch ($doc_title) {
    case 'home':
        include("/docroot/to/home.php");
        break;
    case 'about':
        include("/docroot/to/about.php");
        break;
    default:
        include("/docroot/to/default.php");
}
?>

Then in your browser address bar you see yoursite.com/index/home/

and the output will be the home.php page.

or

yoursite.com/index/about/ will show your about.php page… if you type in just yoursite.com/index/ the default.php page will show.

I don’t pretend to know everything, but it’s what I used before and it worked well for me.

Thanks,
Kevin

I just thought about what cranial-bore wrote and maybe this will work better for people to see.

have yoursite.com/index/about/index.html

The /about/ will still show about.php but the index.html will be url_array[2] which means nothing. So the person looking at your url on the search engine page will think it’s an .html file even though it’s .php.
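If the array positions are confusing, this small sketch shows what explode() produces for that URL. The helper name `path_segments()` is made up, and the string passed in stands in for what `$_SERVER['PATH_INFO']` would hold after /index.

```php
<?php
// Split a PATH_INFO-style string into segments the same way the snippets above do.
function path_segments(string $pathInfo): array
{
    return explode('/', $pathInfo);
}

$segments = path_segments('/about/index.html');
// $segments[0] is ''           (the text before the leading slash)
// $segments[1] is 'about'      (the document the script looks at)
// $segments[2] is 'index.html' (ignored by the script, only there for show)
```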

It’s really not being totally dishonest, but I use it and don’t find problems with it as long as the program isn’t doing anything that will harm.

My 2 cents, please correct me if wrong :).

Thanks,
Kevin

It depends on which URL we are talking about. By default a search engine will index http://www.example.com/ faster than http://www.example.com/directory/file.[html|php|asp], unless there are a ton of inbound links to that page. If there are a lot of links to that page, then it will gain more weight than your root.

A search engine treats http://www.example.com/ the same as http://www.example.com/index.[html|php|asp] (as well as default.asp). The search engine knows that most domains route http://www.example.com/ through an index.[html|php|asp] (or default.asp for some asp servers).

I personally think it is a bad idea to hide/shove stuff in one file and pull stuff up by variable. If you have an about and contact section, make those different pages versus magically appearing.

Think about this: if you have your contact info hidden and only shown on click nobody can bookmark it:(

Hard to say without seeing code.

I am not sure what I think about Kevin’s posts. Might be a pain to link, add a directory.

I’d suggest throwing the book away - author’s an idiot for making that suggestion. Reasons

  1. mod_rewrite can be used just as effectively as 3 dozen index.php files to “hide” server technology. More importantly, it allows you to determine paths dynamically, something multiple indexes will not do.
  2. Making an index.php to “hide” is worthless anyway – “/path/to/dir” resolves, but so does “/path/to/dir/index.php” which betrays your server tech.
  3. Unless you override it in php.ini and the webserver’s own settings, the webserver will announce the parsing engine(s) in use.
  4. Multiple points of code entry are less secure than one landing file.
  5. Masking the web parsing language is not the be-all and end-all of securing any site. Frankly, as concerns go it’s pretty low on the list, especially for PHP: it is so widespread that if you do hide your parsing language, PHP is going to be their first guess anyway.

Hi,
Some great tips here.
I would like to clarify. I don’t use this method to hide the programming language. I do it for search engine friendly URLs. Instead of:
index.php?this=that&those=these&for=nothing
You have /index/that/these/nothing… and so on.
You all are awesome and I enjoy coming here for some valuable information!
Thanks,
Kevin

mod_rewrite is still a better solution since it allows you to arbitrarily determine the path to the site pages without having to make an actual file to serve them. The search engine cannot know the difference.

Lots of great info guys, thanks so much. I’m not sure it all is exactly relating to my initial hazy enquiry, but it has all certainly added to my knowledge and is much appreciated. :slight_smile:

I’m guessing that people who type a .html extension, or feel more comfortable with it, are people who are computer literate, and I’m thinking that in terms of usability it would likely be best not to use it (shorter URLs, easier-to-read links).

Therefore, it would be best to have some type of auto redirect i.e. if I type sitepoint.com/about/hallelujah.html it redirects to sitepoint.com/about/hallelujah to correct the error of someone deciding all pages are .html etc.
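A rough sketch of that redirect check, assuming it runs at the top of whatever script handles the request. The function name `redirect_target()` is hypothetical, and whether to strip other extensions as well is left open.

```php
<?php
// Returns the extension-free path to redirect to, or null if no redirect is needed.
function redirect_target(string $path): ?string
{
    // Strip one trailing ".html" or ".htm" (case-insensitive); leave other paths alone.
    $clean = preg_replace('/\.html?$/i', '', $path);

    return ($clean !== null && $clean !== $path) ? $clean : null;
}

// In a real script the result would be used roughly like this:
// $target = redirect_target(parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH));
// if ($target !== null) { header('Location: ' . $target, true, 301); exit; }
```

A permanent (301) redirect is the usual choice here, since it tells search engines that the extension-free address is the one to keep.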

I think it was probably more relevant a few years ago when most people had to type the extension. Still, you should always take account of people’s needs/habits to ensure maximum usability :slight_smile:

It’s soooo funny, but… right after posting what I posted… google ripped me a new one and now almost all of my sites aren’t even in the top 100 !!! LOL :rofl:
Most were consistently in the top 20 for almost a year and now … pfft… LOL

Sometimes, Google…sometimes! :x:nono:

Just to add my views, I always go through one main index.php file - for everything. My modules are loaded using module=<name> and every action within each module is selected using a switch.

Also in my main index.php I use a switch for any actions which aren’t in modules - eg login, logout etc.
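For illustration only, this is roughly how such a dispatcher might look. The module names, file paths, and the function `dispatch()` are assumptions made for the sketch, not the poster's actual code.

```php
<?php
// Hypothetical whitelist of modules; the names are made up for illustration.
const MODULES = ['news', 'gallery'];

// Decide which file to load from the query parameters (e.g. $_GET).
function dispatch(array $query): string
{
    $module = $query['module'] ?? '';
    $action = $query['action'] ?? '';

    // Only load modules that are explicitly listed, never a raw user-supplied path.
    if (in_array($module, MODULES, true)) {
        return "modules/$module.php";
    }

    // Actions that live outside any module.
    switch ($action) {
        case 'login':
            return 'login.php';
        case 'logout':
            return 'logout.php';
        default:
            return 'home.php';
    }
}
```

Checking the module name against a fixed list is the important part: it keeps a crafted `module=` value from pulling in files you never meant to expose.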

I gave up trying to obfuscate using the index.php hidden under a directory name… I kept getting my file paths confused :smiley: