  1. #1
    SitePoint Member
    Join Date
    May 2006
    Posts
    10
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Rewrite encoded URL

    Hello.
    For some unknown reason, some sites link to mine with an encoded URL (according to Google Webmaster Tools).
    Something like http://domain.tld/file.php%3Fid%3DUSERid%26cat%3Dsmth instead of http://domain.tld/file.php?id=USERid&cat=smth
    I've tried several things (conditions, rules) in .htaccess and httpd.conf to rewrite the encoded URLs.
    One approach is RewriteRule ^file.php(.*)$ index.php?qs=$1 and then handling $_GET['qs'] in index.php.
    Is it possible to rewrite the encoded URL? Or at least to match %3F and the others?
    Thanks

  2. #2
    dklynn (Certified Ethical Hacker)
    Join Date
    Feb 2002
    Location
    Auckland
    Posts
    14,650
    Mentioned
    19 Post(s)
    Tagged
    3 Thread(s)
    Marcianos,

    First, WELCOME to SitePoint!

    Second, yes, but not the way you're doing it because a RewriteRule can only examine the {REQUEST_URI} string (i.e., NOT the query string).

    Third, why does it matter whether the query string is encoded like that? Apache can handle that and so can PHP. IMHO, just create the proper URIs for your website and let Google worry about storing them on their server.

    If you want to continue, please have a look through the tutorial article linked in my signature and know that encoded characters must be contained (in their natural form) in a character range definition (except the ? which is ONLY permitted as the demarcation between a URI and query string). If you still have questions, come back here (please PM me, too, as I'm not here as often as I was when on staff).

    Regards,

    DK
    David K. Lynn - Data Koncepts is a long-time WebHostingBuzz (US/UK)
    Client and (unpaid) WHB Ambassador
    mod_rewrite Tutorial Article (setup, config, test & write
    mod_rewrite regex w/sample code) and Code Generator
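
    A minimal sketch of the distinction made in this reply, using the file.php and index.php names from the first post (the id parameter and the qs target are only illustrative assumptions): the RewriteRule pattern is tested against the URL path, so anything in a real query string has to be tested in a RewriteCond against %{QUERY_STRING}.

    ```apache
    RewriteEngine On

    # The rule pattern below never sees "?id=...", only the path,
    # so the query string is examined in a RewriteCond instead.
    RewriteCond %{QUERY_STRING} (^|&)id=([^&]+)
    # %2 is the second group captured by the RewriteCond (the id value).
    # Because the target carries its own query string, the original
    # one is replaced rather than appended (no QSA flag).
    RewriteRule ^/?file\.php$ /index.php?qs=%2 [L]
    ```

    The optional leading slash in ^/?file\.php$ is there so that the same pattern can be tried in either httpd.conf or .htaccess.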

  3. #3
    SitePoint Member

    Quote Originally Posted by dklynn
    First, WELCOME to SitePoint!
    Thank you, David!

    Quote Originally Posted by dklynn
    Second, yes, but not the way you're doing it because a RewriteRule can only examine the {REQUEST_URI} string (i.e., NOT the query string).
    Maybe I was not clear. Google Webmaster Tools reports some 404 errors and says the referrers are third-party sites.
    I never understand when a string is encoded, then displayed decoded by a web browser, mail client or whatever, and whether it is then sent to the server as I see it or as it was received.
    The fact is that GWT shows the address http://domain.tld/file.php%3Fid%3DUSERid%26cat%3Dsmth getting a 404 error from my server. I cannot fix the origin of the problem (the external sites), so to avoid losing backlinks I want to rewrite that string to a good URL.


    Quote Originally Posted by dklynn
    Third, why does it matter whether the query string is encoded like that? Apache can handle that and so can PHP. IMHO, just create the proper URIs for your website and let Google worry about storing them on their server.
    I cannot see why my Apache does not handle that. My own URIs are OK. As Google says, the 'bad' addresses come from other sites.
    Last edited by TechnoBear; May 23, 2012 at 09:02. Reason: Example URL delinkified

  4. #4
    SitePoint Member
    David, I found something that may be the key.
    In my httpd.conf:
    RewriteCond %{HTTP_HOST} ^domain\.com
    RewriteRule ^(.*)$ http://www.domain.com$1 [R=permanent,L]


    More precisely, the problematic URI is http://www.domain.tld/file.php%3Fid%3DUSERid%26cat%3Dsmth (with www), which gives a 404.
    When I removed 'www', the returned URL was OK.
    What do you suggest?
    Thank you
    Last edited by TechnoBear; May 23, 2012 at 09:03.

  5. #5
    SitePoint Member
    Just looking around, I see the same issue at ubuntuforums.org, but the other way round.
    Their regular URIs are without 'www'.
    In FF, if I replace ubuntuforums.org/showthread.php?p=1195677 with ubuntuforums.org/showthread.php%3Fp%3D1195677, the thread is not displayed, only the main page (its '404' page).
    If I add www., the URI is redirected to the first form and the thread is displayed.

  6. #6
    dklynn
    Marcianos,

    Quote Originally Posted by Marcianos
    Hello.
    For some unknown reason, some sites link to mine with an encoded URL (according to Google Webmaster Tools).
    Something like http://domain.tld/file.php%3Fid%3DUSERid%26cat%3Dsmth instead of http://domain.tld/file.php?id=USERid&cat=smth
    I've tried several things (conditions, rules) in .htaccess and httpd.conf to rewrite the encoded URLs.
    One approach is RewriteRule ^file.php(.*)$ index.php?qs=$1 and then handling $_GET['qs'] in index.php.
    Is it possible to rewrite the encoded URL? Or at least to match %3F and the others?
    Thanks
    The point of my first post coding comment was that the (.*) above CANNOT access the query string. Fortunately, you've dropped that.

    Quote Originally Posted by Marcianos
    David, I found something that may be the key.
    In my httpd.conf:
    RewriteCond %{HTTP_HOST} ^domain\.com
    RewriteRule ^(.*)$ http://www.domain.com/$1 [R=permanent,L]


    More precisely, the problematic URI is http://www.domain.tld/file.php%3Fid%3DUSERid%26cat%3Dsmth (with www), which gives a 404.
    When I removed 'www', the returned URL was OK.
    What do you suggest?
    Thank you
    Add the / as above OR avoid the question of the / between the domain and path with:

    Code:
    ...
    RewriteRule .? http://www.domain.com%{REQUEST_URI} [R=301,L]

    After all, your (.*) is already available as the {REQUEST_URI} variable.

    Quote Originally Posted by Marcianos
    Just looking around, I see the same issue at ubuntuforums.org, but the other way round.
    Their regular URIs are without 'www'.
    In FF, if I replace ubuntuforums.org/showthread.php?p=1195677 with ubuntuforums.org/showthread.php%3Fp%3D1195677, the thread is not displayed, only the main page (its '404' page).
    If I add www., the URI is redirected to the first form and the thread is displayed.
    Most hosts have configured their DNS servers to include both the www and non-www version of their client domains - yours has apparently not. Therefore, check my signature's tutorial Example Code section for code to force either the www or non-www requests.

    Regards,

    DK

  7. #7
    SitePoint Member
    RewriteRule ^file.php(.*)$ index.php?qs=$1
    Quote Originally Posted by dklynn
    The point of my first post coding comment was that the (.*) above CANNOT access the query string. Fortunately, you've dropped that.
    In this particular case there is no query string because Apache does not see a '?', only a %3F. So (.*) captures whatever follows file.php and sends it on as a query string (qs). It is working fine now as a partial solution.

    RewriteRule ^(.*)$ http://www.domain.com/$1 [R=permanent,L]
    Actually I already have '...com$1', but it has to be '...com/$1' if the rule is placed in .htaccess instead of httpd.conf.

    I replaced both of those httpd.conf lines with the output of your Code Generator (nice!):
    RewriteCond %{HTTP_HOST} !www\. [NC]
    RewriteRule .? http://www.%{HTTP_HOST}%{REQUEST_URI} [R=301,L]


    Some other (not all) directives in httpd.conf:

    ServerAlias www.domain.com ...
    # Handle TRACE request
    RewriteCond %{REQUEST_METHOD} ^TRACE
    RewriteRule .? - [F]

    # retain query string
    RewriteCond %{QUERY_STARING} !''
    RewriteRule .? %{REQUEST_URI}? [QSA,L]

    <Directory /path/to/www>
    Options Indexes IncludesNOEXEC FollowSymLinks +ExecCGI
    allow from all
    AllowOverride All
    ...
    </Directory>
    ...
    There are also some directives in .htaccess (waiting to be moved) such as IndexIgnore, Limit GET POST, Limit PUT DELETE, GZIP, Expires headers, cache headers, and rewrite rules from the old site to the new one plus shortcuts.

    Also, both domain.com and www.domain.com have 'A' records pointing to the same IP.

    The problem persists.
    Thank you!
    Last edited by ScallioXTX; May 25, 2012 at 09:13. Reason: delinkified example urls
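
    On the '...com$1' versus '...com/$1' point in this post, a minimal sketch of the difference, assuming the same domain.com and www.domain.com names used in the rules above: in httpd.conf (server or virtual host context) the matched path keeps its leading slash, whereas in .htaccess the directory prefix, including that slash, is stripped before the pattern is applied.

    ```apache
    # Variant 1: httpd.conf, inside the virtual host.
    # $1 is "/file.php", so no extra slash is needed in the target.
    RewriteEngine On
    RewriteCond %{HTTP_HOST} ^domain\.com$ [NC]
    RewriteRule ^(.*)$ http://www.domain.com$1 [R=301,L]

    # Variant 2: .htaccess in the document root.
    # $1 is "file.php", so the target needs the slash.
    # (Use one variant or the other, not both.)
    RewriteEngine On
    RewriteCond %{HTTP_HOST} ^domain\.com$ [NC]
    RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]
    ```

    Using %{REQUEST_URI} in the target, as suggested earlier in the thread, sidesteps the question because that variable always starts with a slash.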

  8. #8
    dklynn
    Marcianos,

    Quote Originally Posted by Marcianos
    In this particular case there is no query string because Apache does not see a '?', only a %3F. So (.*) captures whatever follows file.php and sends it on as a query string (qs). It is working fine now as a partial solution.

    WRONG! Apache knows that %3F is the ? character and will treat everything after it as a query string. Because the query string is NOT available to the regex in a RewriteRule (only available in a RewriteCond statement and only when specified), your (.*) will NEVER match anything (unless you've enabled MultiViews which, IMHO, is a dumb thing to do). Test it and look at the value (null every time) that $1 returns.

    Quote Originally Posted by Marcianos
    Actually I already have '...com$1', but it has to be '...com/$1' if the rule is placed in .htaccess instead of httpd.conf.


    I replaced both of those httpd.conf lines with the output of your Code Generator (nice!):
    RewriteCond %{HTTP_HOST} !www\. [NC]
    RewriteRule .? http://www.%{HTTP_HOST}%{REQUEST_URI} [R=301,L]


    Some other (not all) directives in httpd.conf:

    ServerAlias www.domain.com ...
    # Handle TRACE request
    RewriteCond %{REQUEST_METHOD} ^TRACE
    RewriteRule .? - [F]

    # retain query string
    RewriteCond %{QUERY_STARING} !'' # string not empty
    RewriteRule .? %{REQUEST_URI}? [QSA,L]

    # that accomplishes exactly nothing!
    <Directory /path/to/www>
    Options Indexes IncludesNOEXEC FollowSymLinks +ExecCGI
    allow from all
    AllowOverride All
    ...
    </Directory>
    ...
    There are also some directives in .htaccess (waiting to be moved) such as IndexIgnore, Limit GET POST, Limit PUT DELETE, GZIP, Expires headers, cache headers, and rewrite rules from the old site to the new one plus shortcuts.

    Also, both domain.com and www.domain.com have 'A' records pointing to the same IP.

    The problem persists.
    Thank you!

  9. #9
    SitePoint Member
    David,

    I copied and pasted "staring" from http://datakoncepts.com/mrg.php :-)
    Anyway, although I don't understand what this rule is for, I included it just to see whether the encoded URL was decoded by Apache. No luck.
    I don't know what else to try; maybe creating test.domain.com and starting from almost no rules.
    Thank you,
    Marcianos
    Last edited by ScallioXTX; May 26, 2012 at 03:26.

  10. #10
    SitePoint Member
    I have a subdomain for testing.
    Its httpd.conf is the default generated by the server admin control panel. There is no .htaccess file.

    SuexecUserGroup "#567" "#524"
    ServerName test.domain.com
    DocumentRoot /path/to/www/dir
    ErrorLog /path/to...
    CustomLog /path/to... combined
    ScriptAlias /cgi-bin/ /path/to.../cgi-bin/
    DirectoryIndex index.html index.htm index.php index.php4 index.php5
    <Directory /path/to/www/dir>
    Options -Indexes +IncludesNOEXEC +FollowSymLinks +ExecCGI
    allow from all
    AllowOverride All
    AddHandler fcgid-script .php
    AddHandler fcgid-script .php5
    FCGIWrapper /path/to.../fcgi-bin/php5.fcgi .php
    FCGIWrapper /path/to.../fcgi-bin/php5.fcgi .php5
    </Directory>
    <Directory /path/to.../cgi-bin>
    allow from all
    </Directory>

    RewriteEngine on
    RemoveHandler .php
    RemoveHandler .php5
    IPCCommTimeout 46


    I've uploaded test.php containing
    <? if(isset($_GET['qstring']))echo $_GET['qstring']; ?>

    From FF: http://test.domain.com/test.php?qstring=12345
    FF displays '12345'.

    From FF: http://test.domain.com/test.php%3Fqstring%3D12345
    FF displays 'The requested URL /test.php?qstring=12345 was not found on this server'
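
    The error message above hints at what is happening: the request contains no literal '?', so everything after the host name is treated as the path, and that path is percent-decoded before Apache looks for a file literally named 'test.php?qstring=12345'. A minimal sketch of a rule for this case, assuming it sits in the virtual host section of httpd.conf for test.domain.com (so the path starts with '/') and that redirecting to the properly formed URL is the desired outcome:

    ```apache
    RewriteEngine On

    # The pattern is matched against the percent-decoded path, so an
    # encoded %3F shows up here as a literal "?" inside the path.
    # $1 is the file name, $2 is everything after the decoded "?".
    RewriteRule ^/(test\.php)\?(.*)$ /$1?$2 [R=301,L]
    ```

    With this in place, a request for /test.php%3Fqstring%3D12345 should be answered with a redirect to /test.php?qstring=12345. This reflects the behaviour of the 2.2-era servers current when the thread was written; as far as I know, httpd 2.4.60 and later refuse such a rewrite unless the UnsafeAllow3F flag is added, so the flag list may need adjusting for the server version in use.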

  11. #11
    SitePoint Member
    Quote Originally Posted by Marcianos
    In this particular case there is no query string because Apache does not see a '?', only a %3F. So (.*) captures whatever follows file.php and sends it on as a query string (qs). It is working fine now as a partial solution.

    Quote Originally Posted by dklynn
    WRONG! Apache knows that %3F is the ? character and will treat everything after it as a query string. Because the query string is NOT available to the regex in a RewriteRule (only available in a RewriteCond statement and only when specified), your (.*) will NEVER match anything (unless you've enabled MultiViews which, IMHO, is a dumb thing to do). Test it and look at the value (null every time) that $1 returns.
    Well, sorry, I guess I misspoke.
    These are the facts:

    If I point FF to www.domain.com/dir/file.php%3Fid%3Dbob
    I get a 404 error.

    In .htaccess I add
    RewriteRule ^dir/file\.php(.*) index.php?qs=$1

    I retry www.domain.com/dir/file.php%3Fid%3Dbob from the web browser:
    index.php is displayed.
    FF URI: www.domain.com/index.php?qs=%3fid=bob (yes, %3f and '=')

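    I add these lines to index.php to handle qs:
    PHP Code:
    if (isset($_SERVER['QUERY_STRING'])) {
        // Drop the "qs=%3f" prefix added by the rewrite, then decode the rest
        $qs = urldecode(str_replace("qs=%3f", "", $_SERVER['QUERY_STRING']));
        // Redirect only when the remaining string mentions "id"
        if (strstr($qs, "id")) {
            header("Location: file.php?$qs");
            exit;
        }
    }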

    That acts like asking for www.domain.com/dir/file.php?id=bob
    Last edited by ScallioXTX; May 26, 2012 at 03:26.

  12. #12
    dklynn
    Marcianos,

    I confess to a lack of objectivity with regard to altering links you have no control over. I don't consider this to be much of a problem at all because:

    1. You can't change the incoming links (at their source)

    2. Apache can (and DOES) "translate" encoded characters like the ones you've shown to deal with them

    Therefore, your efforts are like a tempest in a teacup!

    That's not to say that they MUST be ignored, though, as I thought I'd explained that encoded characters can be readily altered by matching within regular expressions by including the ACTUAL character within a character range definition. Similar to the 'change a character' or 'change an extension' sample codes within my signature's tutorial, it would look like this:
    Code:
    # you cannot match %3f as that's ? and does not exist in
    # either the {REQUEST_URI} or {QUERY_STRING}
    # ? is the marker used to separate the {REQUEST_URI} from the {QUERY_STRING}
    
    # replace %3d with =
    RewriteCond %{QUERY_STRING} (.*)[=](.*)
    RewriteRule .? %{REQUEST_URI}?%1=%2 [L]
    
    # replace %26 with &
    RewriteCond %{QUERY_STRING} (.*)[&](.*)
    RewriteRule .? %{REQUEST_URI}?%1&%2 [L]
    Beyond that (which is probably not thoroughly explained in the tutorial), EVERYTHING you need is there.

    Regards,

    DK

  13. #13
    SitePoint Member
    Quote Originally Posted by dklynn
    1. You can't change the incoming links (at their source)
    I agree.

    Quote Originally Posted by dklynn
    Therefore, your efforts are like a tempest in a teacup!
    I also agree, but you know, at this point it is something like what we call in Spanish 'to remove the sting'.

    Quote Originally Posted by dklynn
    Code:
    # replace %3d with =
    RewriteCond %{QUERY_STRING} (.*)[=](.*)
    RewriteRule .? %{REQUEST_URI}?%1=%2 [L]
    
    # replace %26 with &
    RewriteCond %{QUERY_STRING} (.*)[&](.*)
    RewriteRule .? %{REQUEST_URI}?%1&%2 [L]
    does not work as expected in FF, Chromium or Safari. Nothing changes (still a 404).
    I also tried replacing each URL-encoded character (? = &) individually. None of them was substituted by its decoded value (nor was the page displayed as if the substitution had been applied without being shown in the address bar).

    I understand that '?' is the separator between REQUEST_URI and QUERY_STRING and so is not 'detected'.

    From my example (the partial solution) it seems Apache does not interpret %3F as '?' (nor as a string to match in a regular expression):
    RewriteRule ^dir/file\.php(.*) index.php?qs=$1 applied to www.domain.com/dir/file.php%3Fid%3Dbob
    sends the query string qs=%3fid=bob to the user's browser.
    I take it that the 404 error is displayed because Apache does not find a file named dir/file.php%3F...
    BTW, it is hard for me to keep track of what is received encoded and then displayed or processed decoded (or vice versa) on both the client and the server.
    Thank you,
    M
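
    On the question of what arrives encoded and what is processed decoded, a minimal sketch using the dir/file.php example from this post (the redirect target is an assumption about the desired outcome): mod_rewrite exposes the raw, still-encoded request line in %{THE_REQUEST}, while the RewriteRule pattern and %{REQUEST_URI} work on the percent-decoded path.

    ```apache
    RewriteEngine On

    # THE_REQUEST holds the raw request line, e.g.
    #   GET /dir/file.php%3Fid%3Dbob HTTP/1.1
    # It is not percent-decoded, so "%3F" can be matched literally here.
    RewriteCond %{THE_REQUEST} ^[A-Z]+\s+/dir/file\.php%3F [NC]
    # The rule pattern sees the decoded path "dir/file.php?id=bob"
    # (with a leading "/" in httpd.conf), so the "?" is matched as an
    # ordinary character and $1 captures what follows it.
    RewriteRule ^/?dir/file\.php\?(.*)$ /dir/file.php?$1 [R=301,L]
    ```

    The RewriteCond is not strictly needed, since the rule pattern alone only matches when the path contains a decoded '?'; it is included to show where the encoded form is still visible. Newer httpd releases may require additional flags (such as UnsafeAllow3F) for this kind of rewrite, so treat it as a sketch for the server generation discussed in the thread.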

  14. #14
    dklynn
    M,

    Did I really do that? Sorry, change the [L]'s to [R=301,L] so that you'll see the redirections.

    Regards,

    DK

  15. #15
    SitePoint Member
    Quote Originally Posted by dklynn
    M,

    Did I really do that? Sorry, change the [L]'s to [R=301,L] so that you'll see the redirections.

    Regards,

    DK
    David,

    I had already changed those flags. I added the rules at the bottom of the httpd.conf of test.domain.com (post #10).
    test.domain.com%3Fqstring=12345 -> "Not Found test.domain.com?qstring=12345"

    Thanks,
    M.

