  1. #1
    SitePoint Enthusiast
    Join Date
    Aug 2006
    Posts
    79
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Problem with non-ascii characters in a link

    Hi,
    I have a link like this:

    http://uzman-bilgisayar.simpg.net/Güvenlikeri-p42.html

    Because of the u with dots ... it comes out like this:

    http://uzman-bilgisayar.simpg.net/G%C3%BCvenlikeri-p42.html


    This should get directed to page 42 of the website by my .htaccess
    but it doesn't.

    I have tried to remove the "%" with :


    PHP Code:
    $mn_link2 = str_replace('%', '', $mn_link[2]);
    echo "<li><a href='$mn_link2'>$mn_name[2]</a></li>";
    But still I can not get a clean link

    Anyone know why ?

    PS
    I tried the same code to take out 'e' from same link
    And that works.

    i.e.

    PHP Code:
    $mn_link2 = str_replace('e', '', $mn_link[2]);
    echo "<li><a href='$mn_link2'>$mn_name[2]</a></li>";
    I tried to take out u with dots : ü

    PHP Code:
    $mn_link2 = str_replace('ü', '', $mn_link[2]);
    echo "<li><a href='$mn_link2'>$mn_name[2]</a></li>";
    But that didn't work


    I also tried:

    PHP Code:
    $mn_link2 = str_replace('%C3%BC', '', $mn_link[2]);
    echo "<li><a href='$mn_link2'>$mn_name[2]</a></li>";
    But again ... that didn't work


    Does anyone know how I can clean up this URL data before
    using it as a link?

    BTW - I want the Turkish spelling in the link text - just not in the
    link url itself ... and it can be replaced with anything as it is the "-p42.html"
    that is important for the redirect.

    Thanks.



    Last edited by Mittineague; Nov 23, 2013 at 19:07. Reason: delinking example URLs

  2. #2
    Hosting Team Leader silver trophybronze trophy
    cpradio's Avatar
    Join Date
    Jun 2002
    Location
    Ohio
    Posts
    5,134
    Mentioned
    152 Post(s)
    Tagged
    0 Thread(s)
    Post your .htaccess rewrite rule too so we can compare the whole process.

  3. #3
    SitePoint Enthusiast
    OK

    Here is my .htaccess

    Options +SymLinksifOwnerMatch
    RewriteEngine On

    # BELOW IS STUFF TO BLOCK SPAMMING ATTACKS
    ######################################################
    # Block out any script trying to set a mosConfig value through the URL
    RewriteCond %{QUERY_STRING} mosConfig_[a-zA-Z_]{1,21}(=|%3D) [OR]

    # Block out any script trying to base64_encode crap to send via URL
    RewriteCond %{QUERY_STRING} base64_encode.*(.*) [OR]

    # Block out any script that includes a <script> tag in URL
    RewriteCond %{QUERY_STRING} (<|%3C).*script.*(>|%3E) [NC,OR]

    # Block out any script trying to set a PHP GLOBALS variable via URL
    RewriteCond %{QUERY_STRING} GLOBALS(=|[|%[0-9A-Z]{0,2}) [OR]

    # Block out any script trying to modify a _REQUEST variable via URL
    RewriteCond %{QUERY_STRING} _REQUEST(=|[|%[0-9A-Z]{0,2})

    # Send all blocked request to homepage with 403 Forbidden error!
    RewriteRule ^(.*)$ index.php [NC,L]
    #
    ######################################################

    # GETTING RSS FILE BY PAGE NUMBER
    # http://villarentfethiye.simpg.net/rss_feed-5.xml

    RewriteCond %{HTTP_HOST} ^(.+).simpg.net$ [NC]
    RewriteCond %{HTTP_HOST} !^www.simpg.net$ [NC]
    RewriteRule ^[\.0-9,:\/-a-z]+rss_feed-([0-9]+)\.xml$ http://simpg.net/rss_feed.php?rss=$1 [NC,QSA,L]

    # GETTING SUPPORTING PAGES BY PAGE NO
    # http://some-name.mobi6.net/greatest-gadget-p13.html

    RewriteCond %{HTTP_HOST} ^(.+).simpg.net$ [NC]
    RewriteCond %{HTTP_HOST} !^www.simpg.net$ [NC]
    RewriteRule ^[\.0-9,:\/-a-z]+-p([0-9]+)\.html$ http://simpg.net/info.php?p=$1 [NC,QSA,L]

    # GETTING MAIN PAGE BY URL NAME
    RewriteCond %{HTTP_HOST} ^(.+).simpg.net$ [NC]
    RewriteCond %{HTTP_HOST} !^www.simpg.net$ [NC]
    RewriteRule ^.*$ http://simpg.net/info.php?a=%1 [NC,QSA,L]

    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^(.*)$ 404.php?url=$1 [L]


    Hope that helps.


    PS - The Feed re-direct is not working.
    That is a subject of a different thread.



  4. #4
    Hosting Team Leader silver trophybronze trophy
    cpradio's Avatar
    Okay, if I had to venture a guess, the a-z won't encompass the non-ascii characters.

    I was able to get it to work using:
    Code:
    RewriteRule ^(.*?)-p([0-9]+)\.html$ info.php?p=$2 [NC,QSA,L]
    , but I imagine there may be a better solution...
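
    One possibly tidier variant, as a sketch only: instead of the lazy (.*?), a negated class such as [^/]* accepts any character except a slash, so the decoded bytes of the ü pass through and only the page number needs capturing. It assumes the part before "-p42.html" never contains a slash and that the two RewriteCond %{HTTP_HOST} lines stay in front of the rule unchanged. It is untested against this site.

    ```apache
    # Sketch (assumption: no "/" before "-p<number>.html").
    # [^/]* matches any bytes except "/", so non-ASCII letters are accepted.
    # Only the page number is captured, so it is $1 rather than $2.
    RewriteRule ^[^/]*-p([0-9]+)\.html$ info.php?p=$1 [QSA,L]
    ```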

  5. #5
    SitePoint Enthusiast
    OK - that fixes it.

    Thanks.


    I thought it might also fix the rss_feed re-direct problem

    So I changed that rule to:

    Code:
    # GETTING RSS FILE BY PAGE NUMBER 
    # http://villarentfethiye.simpg.net/rss_feed-5.xml
    
    RewriteCond %{HTTP_HOST} ^(.+).simpg.net$ [NC]
    RewriteCond %{HTTP_HOST} !^www.simpg.net$ [NC]
    RewriteRule ^(.*?)rss_feed-([0-9]+)\.xml$ rss_feed.php?rss=$2 [NC,QSA,L]
    But when I click on my RSS image I get taken to this address:
    http://simpg.net/info.php?a=villarentfethiye&rss=5

    I am testing it on this page: Mysite

    That &rss=5 is the correct page number - so it is nearly working !!


    The complete .htaccess file is this:

    Options +SymLinksifOwnerMatch
    RewriteEngine On

    # BELOW IS STUFF TO BLOCK SPAMMING ATTACKS
    ######################################################
    # Block out any script trying to set a mosConfig value through the URL
    RewriteCond %{QUERY_STRING} mosConfig_[a-zA-Z_]{1,21}(=|%3D) [OR]

    # Block out any script trying to base64_encode crap to send via URL
    RewriteCond %{QUERY_STRING} base64_encode.*(.*) [OR]

    # Block out any script that includes a <script> tag in URL
    RewriteCond %{QUERY_STRING} (<|%3C).*script.*(>|%3E) [NC,OR]

    # Block out any script trying to set a PHP GLOBALS variable via URL
    RewriteCond %{QUERY_STRING} GLOBALS(=|[|%[0-9A-Z]{0,2}) [OR]

    # Block out any script trying to modify a _REQUEST variable via URL
    RewriteCond %{QUERY_STRING} _REQUEST(=|[|%[0-9A-Z]{0,2})

    # Send all blocked request to homepage with 403 Forbidden error!
    RewriteRule ^(.*)$ index.php [NC,L]
    #
    ######################################################

    # Redirect old file path to new file path
    # Redirect vacationvillasfethiyerental.villarentfethiye.simpg.net http://example.com/newdirectory/newfile.html
    #
    # To block an IP address:
    # RewriteCond %{REMOTE_ADDR} ^(A\.B\.C\.D)$
    # RewriteRule ^/* http://www.domain.com/sorry.html [L]

    # Re-direct for broken images
    # RewriteCond %{REQUEST_FILENAME} !-f
    # RewriteRule ^images/.*\.jpg$ /images/default.jpg [L]

    # GETTING RSS FILE BY PAGE NUMBER
    # http://villarentfethiye.simpg.net/rss_feed-5.xml

    RewriteCond %{HTTP_HOST} ^(.+).simpg.net$ [NC]
    RewriteCond %{HTTP_HOST} !^www.simpg.net$ [NC]
    RewriteRule ^(.*?)rss_feed-([0-9]+)\.xml$ rss_feed.php?rss=$2 [NC,QSA,L]

    # GETTING SUPPORTING PAGES BY PAGE NO
    # http://some-name.mobi6.net/greatest-gadget-p13.html

    RewriteCond %{HTTP_HOST} ^(.+).simpg.net$ [NC]
    RewriteCond %{HTTP_HOST} !^www.simpg.net$ [NC]
    RewriteRule ^(.*?)-p([0-9]+)\.html$ info.php?p=$2 [NC,QSA,L]

    # GETTING MAIN PAGE BY URL NAME
    RewriteCond %{HTTP_HOST} ^(.+).simpg.net$ [NC]
    RewriteCond %{HTTP_HOST} !^www.simpg.net$ [NC]
    RewriteRule ^.*$ http://simpg.net/info.php?a=%1 [NC,QSA,L]

    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^(.*)$ 404.php?url=$1 [L]


    Can you see what I have done wrong ??


    Thanks again.



  6. #6
    Hosting Team Leader silver trophybronze trophy
    cpradio's Avatar
    Yeah, but I'm struggling to figure out how I'd fix it.

    In short, here is what you have happening:

    Initial URL: villarentfethiye.simpg.net/rss_feed-5.xml

    Gets caught by
    Code:
    RewriteCond %{HTTP_HOST} ^(.+).simpg.net$ [NC]
    RewriteCond %{HTTP_HOST} !^www.simpg.net$ [NC]
    RewriteRule ^(.*?)rss_feed-([0-9]+)\.xml$ rss_feed.php?rss=$2 [NC,QSA,L]
    Which produces the following path: villarentfethiye.simpg.net/rss_feed.php?rss=5

    And that gets caught by
    Code:
    RewriteCond %{HTTP_HOST} ^(.+).simpg.net$ [NC]
    RewriteCond %{HTTP_HOST} !^www.simpg.net$ [NC]
    RewriteRule ^.*$ http://simpg.net/info.php?a=%1 [NC,QSA,L]
    And produces the final result of: simpg.net/info.php?a=villarentfethiye&rss=5

    We need to prevent that final Rewrite Rule from executing against your RSS feed. @dklynn ; Got any advice?
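
    One way to sidestep the second pass, sketched here rather than tested: give the RSS rule an absolute target again, as the original rules had. This assumes an absolute http://simpg.net/... target triggers an external redirect from the subdomain, the same way the catch-all rule evidently does (the browser ends up on simpg.net). The follow-up request then arrives with the host simpg.net, which does not match ^(.+).simpg.net$, so the catch-all is skipped.

    ```apache
    RewriteCond %{HTTP_HOST} ^(.+).simpg.net$ [NC]
    RewriteCond %{HTTP_HOST} !^www.simpg.net$ [NC]
    # Absolute target: the browser is sent to simpg.net, where the
    # HTTP_HOST conditions of the catch-all rule no longer match.
    RewriteRule ^(.*?)rss_feed-([0-9]+)\.xml$ http://simpg.net/rss_feed.php?rss=$2 [NC,QSA,L]
    ```

    The (.*?) matters here too: inside .htaccess the leading slash is stripped before matching, so for /rss_feed-5.xml nothing precedes "rss_feed", and a class followed by + (one or more characters) cannot match.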

  7. #7
    Hosting Team Leader silver trophybronze trophy
    cpradio's Avatar
    Looking at this one more time, the following thought crossed my mind. Try changing:
    Code:
    RewriteCond %{HTTP_HOST} ^(.+).simpg.net$ [NC]
    RewriteCond %{HTTP_HOST} !^www.simpg.net$ [NC]
    RewriteRule ^.*$ http://simpg.net/info.php?a=%1 [NC,QSA,L]
    To:
    Code:
    RewriteCond %{HTTP_HOST} ^(.+).simpg.net$ [NC]
    RewriteCond %{HTTP_HOST} !^www.simpg.net$ [NC]
    RewriteCond %{QUERY_STRING} ^rss= [NC, OR]
    RewriteRule ^.*$ http://simpg.net/info.php?a=%1 [NC,QSA,L]
    Or: (this may be a better solution)
    Code:
    RewriteCond %{REQUEST_FILENAME} !-f #do not run this rule if the requested file actually exists
    RewriteCond %{HTTP_HOST} ^(.+).simpg.net$ [NC]
    RewriteCond %{HTTP_HOST} !^www.simpg.net$ [NC]
    RewriteRule ^.*$ http://simpg.net/info.php?a=%1 [NC,QSA,L]

  8. #8
    SitePoint Enthusiast
    Hi.

    I thought that the "L" meant LAST command, so it should not
    execute any more redirect anyway ??

    Maybe it is just Last line in that bunch of commands,
    meaning redirect now.

    Anyway, I tried both the suggestions and unfortunately I get 500
    Internal Server Errors on both.
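
    A likely cause of the 500 errors, offered as a guess since the server's error log is not shown: Apache splits directive arguments on whitespace, so the space in [NC, OR] turns the flags into two arguments, and a comment is not allowed on the same line as a directive. Either one stops the .htaccess from parsing. The first suggestion also appears to need a negated test and no trailing OR. A corrected sketch of both variants, untested against this site:

    ```apache
    # Use ONE of the two variants, not both.

    # Variant 1: skip the catch-all when the query string already starts with rss=
    RewriteCond %{HTTP_HOST} ^(.+).simpg.net$ [NC]
    RewriteCond %{HTTP_HOST} !^www.simpg.net$ [NC]
    RewriteCond %{QUERY_STRING} !^rss= [NC]
    RewriteRule ^.*$ http://simpg.net/info.php?a=%1 [NC,QSA,L]

    # Variant 2: do not run the catch-all if the requested file actually exists
    # (the comment sits on its own line, never after a directive)
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{HTTP_HOST} ^(.+).simpg.net$ [NC]
    RewriteCond %{HTTP_HOST} !^www.simpg.net$ [NC]
    RewriteRule ^.*$ http://simpg.net/info.php?a=%1 [NC,QSA,L]
    ```

    Both variants assume rss_feed.php can be served from the subdomain's document root. Note that variant 2 also leaves info.php alone once an earlier rule has rewritten to it.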

  9. #9
    Certified Ethical Hacker silver trophybronze trophy dklynn's Avatar
    Join Date
    Feb 2002
    Location
    Auckland
    Posts
    14,653
    Mentioned
    19 Post(s)
    Tagged
    3 Thread(s)
    jekko,

    I used to believe that the Last flag meant "}", i.e., end of the current RewriteRule (with its RewriteCond's) statement. NOT SO. It tells Apache to immediately update the %{REQUEST_URI} and begin another pass through the .htaccess (from the one in the DocumentRoot).
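
    Because [L] only ends the current pass, rules that must not see an already-rewritten request need their own guard. Two common idioms, as a sketch: test the REDIRECT_STATUS environment variable, which is empty on the first pass and set after an internal redirect, or use the [END] flag, which stops rewriting altogether but exists only in Apache 2.3.9 and later. The thread does not say which Apache version the site runs, so the [END] variant in particular is an assumption.

    ```apache
    # Idiom 1: placed before the other rules, this ends processing on any
    # pass that follows an internal rewrite ("-" means: leave the URL alone).
    RewriteCond %{ENV:REDIRECT_STATUS} !^$
    RewriteRule ^ - [L]

    # Idiom 2 (Apache 2.3.9+ only): END instead of L, so no further pass runs.
    RewriteRule ^(.*?)rss_feed-([0-9]+)\.xml$ rss_feed.php?rss=$2 [NC,QSA,END]
    ```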

    As for your original question, the Internet used to be "ASCII-centric." Apache certainly took that to heart and looks at encoded characters in a different way.

    From my experience answering similar questions (space in the URI, etc), I've discovered that you can use accented/encoded characters within a character range definition (be sure to escape a space with a \). I've not tested the series of accented characters but I'd be willing to bet that they can be defined in a range just as easily as a-z. Don't worry if characters are encoded in the URI, Apache knows what they look like when they're decoded.
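
    As a small illustration of the escaped space, here is a hypothetical variant of the page rule whose class allows ASCII letters, digits, a space and a hyphen. The space has to be written as "\ " because Apache otherwise treats it as the end of the pattern argument. Accented letters are deliberately left out of the class, since that part is untested.

    ```apache
    # Sketch: "\ " is a literal space; the trailing "-" in the class is a literal hyphen.
    # A request for /my%20page-p7.html is matched in its decoded form "my page-p7.html".
    RewriteRule ^[a-z0-9\ -]+-p([0-9]+)\.html$ info.php?p=$1 [NC,QSA,L]
    ```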

    For more information on URIs, have a look at the geeky Uniform Resource Identifiers (URI): Generic Syntax. It's well worth the effort to read.

    Regards,

    DK
    David K. Lynn - Data Koncepts is a long-time WebHostingBuzz (US/UK)
    Client and (unpaid) WHB Ambassador
    mod_rewrite Tutorial Article (setup, config, test & write
    mod_rewrite regex w/sample code) and Code Generator

