  1. #1
    SitePoint Enthusiast cajebo's Avatar
    Join Date
    Aug 2003
    Location
    Dayton Ohio
    Posts
    62
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)

    Urls without extensions

    Greetings all, and Happy Valentine's Day.

    I've been through quite a number of places on the interwebs this morning searching for the .htaccess rules, or set-up, that will allow stripping the .php or .html from the URL displayed in the browser's address bar.

    I've struck out so far.

    Has anyone a suggestion or definitive set of .htaccess rules for invoking this? Even the ones from an article here from years ago failed me.

    The domain will be php based, static files, and have but 10 to 15 pages. So, all I'm after really is having/forcing something like
    Code:
    xxxxxxxx.com/all-about-company.php
    to be displayed as
    Code:
    xxxxxxxx.com/all-about-company/
    in the address bar when loaded.

    I'm on a Linux server.

    Thanks in advance for any crumbs.

    Michael
    Let me be confident enough to be humble.

  2. #2
    Hosting Team Leader silver trophybronze trophy
    cpradio's Avatar
    Join Date
    Jun 2002
    Location
    Ohio
    Posts
    5,143
    Mentioned
    152 Post(s)
    Tagged
    0 Thread(s)
    First off, you are approaching it backwards.

    Think of it this way: you want to go to /all-about-company/ and have it serve up all-about-company.php.

    For example:
    Code:
    RewriteEngine on
    # if a directory or a file exists, use it directly
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    
    # take the url and append .php to it
    RewriteRule ^([a-z0-9-]+)$ $1.php

  3. #3
    cajebo
    Thanks cpradio for the quick reply. And I see we're 'neighbors', as I'm punching keys here from Miamisburg.

    Will this addition to the htaccess file then 'strip' the extension from the page loaded?


    My naivety is the result of being hand-held in these matters by WP for a number of years, and of not really worrying about it in the development of smaller sites.
    Let me be confident enough to be humble.

  4. #4
    SitePoint Wizard bronze trophy Jeff Mott's Avatar
    Join Date
    Jul 2009
    Posts
    1,275
    Mentioned
    18 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by cpradio View Post
    RewriteRule ^([a-z0-9-]+)$ $1.php
    I'd like to add something real quick, because I'm only now discovering that it's been a recurring topic on these forums for a while.

    For rewrites such as this, it's actually standard -- and recommended in the Apache documentation -- to match on (.*) rather than ([a-z0-9-]+). The former is simpler, is more accurate (match any URL), and has no drawbacks.
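
    A minimal sketch of that variant, for comparison. The third condition and the [L] flag are additions here, not something taken from the posts above. The extra condition assumes you only want to rewrite when a matching .php file really exists, since an unguarded (.*) can keep matching its own output (page.php, then page.php.php, and so on) when the file is missing.
    Code:
    RewriteEngine on
    # skip requests that already point at a real file or directory
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    # assumption: only rewrite when <requested path>.php exists on disk
    RewriteCond %{REQUEST_FILENAME}.php -f
    # match any path and append .php to it
    RewriteRule ^(.*)$ $1.php [L]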
    "First make it work. Then make it better."

  5. #5
    cajebo
    I smell the grok that may be coming, but still a bit wobbly. The snippet above does allow the following to resolve correctly in the browser address bar:

    Code:
    <!doctype html>
    <html>
    <head>
    <meta charset="UTF-8">
    <title>Hello world</title>
    </head>
    
    
    <body>
        <h2>Hello world</h2>
    
    <p>These are two links to check the stripping of the url's extension</p>
    
    <ul>
        <li><a href="test">Test page</a></li>
        <li><a href="test2">Test page 2</a></li>
    </ul>
    </body>
    </html>
    where I have two files named "test.php" and "test2.php".

    Is that the magic for which I seek, simply stripping the extension off the 'link' and let the snippet resolve the files with .php and then load them, but do so with the link url, without the extension?

    If so, I can see the validity of cpradio's comment that I was looking at it 'backwards'.

    Is the other method un-doable, though? Just curious.
    That is, given
    Code:
    <li><a href="test.php">Test page</a></li>
    is there not a way in which that link is loaded, but with the .php stripped in the address bar? Again, just curious.

    Thanks again, to both cpradio and Jeff


    Michael
    Let me be confident enough to be humble.

  6. #6
    Jeff Mott
    Quote Originally Posted by cajebo View Post
    Is that the magic for which I seek, simply stripping the extension off the 'link' and let the snippet resolve the files with .php and then load them, but do so with the link url, without the extension?
    That is indeed the magic.

    Quote Originally Posted by cajebo View Post
    is there not a way in which that link is loaded, but with the .php stripped in the address bar? Again, just curious.
    Sort of, yes. You can have Apache send a redirect response. That will cause the browser to re-fetch the page at the new URL, thereby changing the URL in the address bar. But there's a problem in this situation. Using both this redirect rule along with the earlier rewrite rule is likely to create an infinite loop. A .php URL would redirect to a bare URL, then the bare URL is rewritten to a .php URL, but .php URLs redirect to the bare URL, and so on. I suspect there's some trick to get around that, but until someone here can figure out what that trick is, you may have to skip this feature for now.
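
    One commonly used way around the loop, sketched here under assumptions rather than as a fix tested on this site: make the redirect depend on %{THE_REQUEST}, the original request line sent by the browser, which an internal rewrite does not change. The 301 status and the exact patterns below are illustrative choices, not something from this thread.
    Code:
    RewriteEngine on

    # 1. External redirect, only when the browser itself asked for a .php URL.
    #    THE_REQUEST looks like "GET /test.php HTTP/1.1" and stays the same
    #    after rule 2 has rewritten the request internally.
    RewriteCond %{THE_REQUEST} \s/+([^?\s]+)\.php[?\s]
    RewriteRule ^ /%1 [R=301,L]

    # 2. Internal rewrite: serve the .php file for the bare URL.
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^([a-z0-9-]+)$ $1.php [L]
    With rules along these lines, a link to test.php should end up showing /test in the address bar, while /test is still served by test.php.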
    "First make it work. Then make it better."

  7. #7
    cpradio
    Quote Originally Posted by Jeff Mott View Post
    has no drawbacks.
    XSS? No drawbacks? I hardly think so. Try doing an XSS attack with my implementation. Granted, this one isn't appending the match to a URL parameter, so the deed is less likely, but I digress: better to give examples that won't open up such attacks if used differently than to provide ones that could permit them.

  8. #8
    cpradio
    Off Topic:

    My Rant
    Sorry, but this irks me. Please note that ALL uses of (.*) shown at http://httpd.apache.org/docs/2.2/rewrite/intro.html (assuming this is the reference you mean) are for PATHS and FILENAMES, never for VARIABLES.

    Granted, we are talking file names here, so I will concede you can use it. However, I still never recommend it, as you are allowing more than what may be intended. You may intend to consider only the root directory, but (.*) permits ALL directories, which could very well be a security issue if you have admin pages or files in folders that you did not want to be part of this RewriteRule.
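
    To illustrate the difference in scope, here is a sketch with made-up example URLs showing which requests each pattern accepts. The (.*) rule is left commented out so the snippet stays a single working rule set.
    Code:
    RewriteEngine on
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    # ^([a-z0-9-]+)$ : lowercase letters, digits and hyphens only
    #   about-us        -> matches, rewritten to about-us.php
    #   admin/settings  -> no match (contains "/"), left alone
    #   <script>        -> no match (contains "<" and ">"), left alone
    RewriteRule ^([a-z0-9-]+)$ $1.php [L]

    # ^(.*)$ : any path at all, including sub-directories
    #   about-us        -> matches, rewritten to about-us.php
    #   admin/settings  -> matches, rewritten to admin/settings.php
    # RewriteRule ^(.*)$ $1.php [L]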


    Back to the topic at hand: Jeff is correct, you will end up in an infinite loop if you try to redirect *.php to its non-extension form while also having a rule that internally rewrites the non-extension form to .php. You will want to write the links to all of your pages without the extension, so people browsing will never see it.

  9. #9
    Jeff Mott
    Off Topic:

    Quote Originally Posted by cpradio View Post
    XSS?
    Can you give an example how (.*) would permit XSS?

    Quote Originally Posted by cpradio View Post
    It may be intended that you only consider the root directory, using (.*) permits ALL directories
    Indeed. If it's not your intention to match any URL, then that's a case where you shouldn't match on any character. BUT if you do want to match any URL, then matching any character is exactly what you need.
    "First make it work. Then make it better."

  10. #10
    cajebo

    Thanks again, to both cpradio and Jeff

    All's well.

    Thanks again for the assistance.

    To cpradio for the snippet, and to pointing out the control of 'not thinking backwards'

    And to Jeff for clarification and explanation.

    the toy site is here for now: http://ourperfectnight.com/testing-rewrite/


    Cheers from Southwest Ohio,


    Michael
    Let me be confident enough to be humble.

  11. #11
    cpradio
    Quote Originally Posted by Jeff Mott View Post
    Can you give an example how (.*) would permit XSS?
    Code:
    RewriteEngine on
    # if a directory or a file exists, use it directly
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    
    # pass the requested path to index.php as the "name" parameter
    RewriteRule ^(.*)$ index.php?name=$1
    Sample URL (if field is not properly protected, it will output a script tag that loads an external JavaScript file):
    Code:
    mydomain.com/%3Cscript+type%3D%22text%2Fjavascript%22+src%3D%22myotherdomain.com%2Fmyscript.js%22%3E%3C%2Fscript%3E

  12. #12
    cpradio
    Quote Originally Posted by cajebo View Post
    All's well.

    Thanks again for the assistance.

    To cpradio for the snippet, and to pointing out the control of 'not thinking backwards'

    And to Jeff for clarification and explanation.

    the toy site is here for now: http://ourperfectnight.com/testing-rewrite/


    Cheers from Southwest Ohio,


    Michael
    Glad it worked. It is a common mistake that everyone makes when getting started with RewriteRules. We all tend to think it takes *.php and redirects to /*/. Not sure why that is, but I know I made that mistake early on too.

  14. #14
    Jeff Mott
    Quote Originally Posted by cpradio View Post
    Code:
    RewriteEngine on
    # if a directory or a file exists, use it directly
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    
    # pass the requested path to index.php as the "name" parameter
    RewriteRule ^(.*)$ index.php?name=$1
    Sample URL (if field is not properly protected, it will output a script tag that loads an external JavaScript file):
    Code:
    mydomain.com/%3Cscript+type%3D%22text%2Fjavascript%22+src%3D%22myotherdomain.com%2Fmyscript.js%22%3E%3C%2Fscript%3E
    The part I bolded above ("if field is not properly protected") is crucial. Your scenario assumes that we failed to use htmlspecialchars, and that is what would really allow XSS. The URL is always going to be provided by the user, and therefore tainted, whether we rewrite it or not.
    "First make it work. Then make it better."

  15. #15
    Jeff Mott
    Quote Originally Posted by cajebo View Post
    All's well.

    Thanks again for the assistance.

    To cpradio for the snippet, and to pointing out the control of 'not thinking backwards'

    And to Jeff for clarification and explanation.

    the toy site is here for now: http://ourperfectnight.com/testing-rewrite/


    Cheers from Southwest Ohio,


    Michael
    I apologize for the tangent in the conversation. I'm still toying with ways to solve your second request, to remove .php from the address bar. I don't have a solution yet, but here's a quick preview of what I'm toying with.

    Code:
        # If the request is not a subrequest (that is, not a rewritten URL)
        <If "%{IS_SUBREQ} == 'false'">
    
            # Then redirect to the bare URL
            RedirectMatch ^(.*)\.php$ $1
    
        </If>
    "First make it work. Then make it better."

  16. #16
    cpradio
    Quote Originally Posted by Jeff Mott View Post
    The part I bolded above is crucial. Your scenario assumes that we failed to use htmlspecialchars, and that is what would really allow XSS. The URL is always going to be provided by the user, and therefore tainted, whether we rewrite it or not.
    Yes it is, but I'd still argue it is better to assume the worst and provide something that matches plenty but excludes <, >, etc. to harden their code, rather than provide a match-everything pattern that leaves another path of entry to a vulnerability.

    You could always use [a-z0-9/\s-]+, and that will in most cases be everything you need to support from a file name standpoint. Why capture < and >?

  17. #17
    Jeff Mott
    Quote Originally Posted by cpradio View Post
    I'd still argue it is better to assume the worst and provide something similar that matches plenty but excludes <, >, etc to harden their code
    I'd argue that you're trying to plug a hole... but missing the hole. .* does not allow XSS, nor does avoiding .* prevent XSS.
    "First make it work. Then make it better."

  18. #18
    cpradio
    Quote Originally Posted by Jeff Mott View Post
    I'd argue that you're trying to plug a hole... but missing the hole. .* does not allow XSS, nor does avoiding .* prevent XSS.
    Yes, I can agree with that, but it still teaches a good paradigm, validating your input or only taking what you need. It is one more layer of validation, stripping out what isn't necessary (granted, it only strips out the invalid if it runs your rewriterule, so I concede that argument). However, I can't think of any valid attempts of naming a file with < or > in it, so I still think it stands to reason to eliminate those characters. I don't know, maybe it was education that taught me to only capture what you need, not everything just to parse it out.

    I've yet to run into a situation where I needed to capture everything. I've always known at least one restriction or character I could eliminate. Just my opinion, @cajebo: you are more than welcome to capture everything, I just strongly recommend against it unless you know for a 100% fact you are not opening yourself to an XSS attack of some sort (and if you run any third party software, you can't make that guarantee). Granted you could still have one, but at least you could argue security through obscurity (no one knows where your data is being redirected to -- but that isn't a very fair answer either).

  19. #19
    Jeff Mott
    Quote Originally Posted by cpradio View Post
    I just strongly recommend against it unless you know for 100% fact you are not opening yourself to an XSS attack of some sort
    I really wanted to leave the discussion as it was, but this kept nagging at me. It's worth reiterating that if you're vulnerable to an XSS attack, and you avoid using .*, then you're still vulnerable to an XSS attack. .* is neither the cause, nor is avoiding it the cure.
    "First make it work. Then make it better."

  20. #20
    Certified Ethical Hacker silver trophybronze trophy dklynn's Avatar
    Join Date
    Feb 2002
    Location
    Auckland
    Posts
    14,653
    Mentioned
    19 Post(s)
    Tagged
    3 Thread(s)
    Folks,

    IMHO, Jeff Mott is unaware of the security issues as well as poor logic (newbies end up with loopy code for lack of understanding exactly what (.*) does and how it does it).

    [rant #1]
    The use of "lazy regex," specifically the EVERYTHING atom, (.*), and its close relatives, is the NUMBER ONE coding error of newbies BECAUSE it is "greedy." Unless you provide an "exit" from your redirection, you will ALWAYS end up in a loop!
    [/rant #1]

    Arguing is not going to resolve the significant difference of opinion and only provides a platform for spreading confusion. Please allow Jeff his own opinion and continue to create mod_rewrite code correctly, i.e., with the best specification possible in your regex because it will avoid the silly problems which will get dumped on your receiving scripts.

    I've said my piece and will continue my rants against the inappropriate use of (.*) but will not engage in silly discussions of why everyone should use it indiscriminately. I grant that there are those times when (.*) is the best regex to use but only if you know what you're doing with it and, as evidenced above, that is not always the case.

    'Nuf said.

    Regards,

    DK
    David K. Lynn - Data Koncepts is a long-time WebHostingBuzz (US/UK)
    Client and (unpaid) WHB Ambassador
    mod_rewrite Tutorial Article (setup, config, test & write
    mod_rewrite regex w/sample code) and Code Generator

  21. #21
    Jeff Mott
    Quote Originally Posted by dklynn View Post
    IMHO, Jeff Mott is unaware of the security issues as well as poor logic (newbies end up with loopy code for lack of understanding exactly what (.*) does and how it does it).
    You've also just accused the authors of Apache itself of being unaware of security issues and unaware of what .* does, because they use it in exactly the same way that you rant against.

    Whether there's a security issue is not a matter of personal opinion. If a security issue exists, then you should be able to demonstrate it.

    Quote Originally Posted by dklynn View Post
    ... continue to create mod_rewrite code correctly, i.e., with the best specification possible in your regex
    You once again insinuated that the authors of Apache itself have been doing it incorrectly. Seems to me that the burden of proof is on you to show that Apache has been doing it wrong all this time.
    "First make it work. Then make it better."

  22. #22
    cpradio
    I won't proceed with the opinionated arguments any further. I'll just agree that we won't see eye to eye on this. However, I will point out that (.*) will not work in this scenario for any URL that ends in a /:

    test/
    testing/
    my-page/

    Will all report a 404 as it tries to load test/.php, etc.

  23. #23
    Jeff Mott
    Quote Originally Posted by cpradio View Post
    however, I will point out (.*) will not work in this scenario for any URL that ends in a /

    test/
    testing/
    my-page/

    Will all report a 404 as it tries to load test/.php, etc.
    That kind of URL would fail the rewrite conditions (not a real file or directory), so neither of our rewrites would execute.
    "First make it work. Then make it better."

  24. #24
    cpradio
    Quote Originally Posted by Jeff Mott View Post
    That kind of URL would fail the rewrite conditions (not a real file or directory), so neither of our rewrites would execute.
    Mine would (with a small change)... as that directory doesn't exist and I won't match the / (forgot I put the $ on it ... doh!)

    Code:
    RewriteEngine on
    # if a directory or a file exists, use it directly
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    
    RewriteRule ^([a-z0-9-]+) $1.php
    Edit:

    Or

    Code:
    RewriteEngine on
    # if a directory or a file exists, use it directly
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    
    RewriteRule ^([a-z0-9-]+)/?$ $1.php
    Last edited by cpradio; Feb 15, 2013 at 12:11. Reason: added another possible solution that still uses $

  25. #25
    cpradio
    Whoa! Interesting thought

    Does (.*) produce a directory traversal attack?

    Such as mydomain.com/../test.php

