  1. #1
    SitePoint Zealot
    Join Date
    Jan 2009
    Posts
    144
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    mod_rewrite and special characters (%)

    I have a site in an "exotic" language. For SEO purposes, the URLs contain special characters. I don't want to change the URLs because then I would lose the SEO benefit of having the title in the URL.

    Example:
    www.imedomene.com/ποδόσφαιρο
    =>
    http://www.imedomene.com/%CF%80%CE%B...B9%CF%81%CE%BF
    which is actually
    http://www.imedomene.com/article.php...B9%CF%81%CE%BF
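
The percent signs in the example come from the browser sending the non-ASCII title as percent-encoded UTF-8 bytes. Per the Apache documentation, a RewriteRule pattern is matched against the %-decoded URL path, so the rule itself should not need to match the % characters. A minimal Python sketch of the encoding round trip, using the Greek word from the example (the variable names are only illustrative):

```python
from urllib.parse import quote, unquote

# The Greek path segment from the example, written with escapes so the
# exact code points are unambiguous (it spells the word shown above).
title = "\u03c0\u03bf\u03b4\u03cc\u03c3\u03c6\u03b1\u03b9\u03c1\u03bf"

# Non-ASCII path characters travel as percent-encoded UTF-8 bytes.
encoded = quote(title)

# Decoding the percent-escapes gives back the original text.
decoded = unquote(encoded)

print(encoded)
print(decoded == title)
```

The encoded form starts with %CF%80%CE%B and ends with B9%CF%81%CE%BF, which agrees with the visible parts of the shortened URL above.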

    I would like to write a RewriteRule like ^/?(/anything)$ /article.php?url=$1 [L]

    The problem is that everything in the URL is percent-encoded (%).
    I tried RewriteRule ^/?(.*)$ /article.php?url=$1 [L]. I have googled for half a day and checked hundreds of sites and still haven't found a solution. Is it even possible to solve this with .htaccess, without any server configuration (I don't have access because I use shared hosting)?

    I hope somebody can help me!

  2. #2
    Utopia, Inc. silver trophy
    ScallioXTX's Avatar
    Join Date
    Aug 2008
    Location
    The Netherlands
    Posts
    9,095
    Mentioned
    153 Post(s)
    Tagged
    2 Thread(s)
    That RewriteRule looks OK to me. Normally I'm not too fond of .* because it's kinda evil, but when you need to handle a lot of exotic chars it certainly is simpler than listing all of those special characters.

    Are you sure the server has mod_rewrite, and did you put RewriteEngine On in your .htaccess before the RewriteRule?
    Rémon - Hosting Advisor

  3. #3
    Certified Ethical Hacker silver trophybronze trophy dklynn's Avatar
    Join Date
    Feb 2002
    Location
    Auckland
    Posts
    14,672
    Mentioned
    19 Post(s)
    Tagged
    3 Thread(s)
    meee,

    Non-Latin languages are difficult to work with in mod_rewrite (obviously). HOWEVER, IF Apache is configured to recognize your language (as it appears to be), then you can use the same technique as [a-z] by using your language's equivalent, i.e., [a-zφ-ρ] (please don't laugh, I have no idea the order of the characters in your alphabet - just use your first and last letters separated by a hyphen).

    Regards,

    DK
    David K. Lynn - Data Koncepts is a long-time WebHostingBuzz (US/UK)
    Client and (unpaid) WHB Ambassador
    mod_rewrite Tutorial Article (setup, config, test & write
    mod_rewrite regex w/sample code) and Code Generator

  4. #4
    SitePoint Zealot
    Join Date
    Jan 2009
    Posts
    144
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    dklynn, the problem is that my hosting is in a country where Latin characters are used, so I don't believe Apache is configured to recognize my website's language.

    I have now tried
    RewriteRule ^/?(.*)/([-a-zA-Z_&0-9&,!]+)$ /article.php?url=$1 [L]
    www.domainname.com/الشباكيةالمغربية/articleId (articleid as number)
    instead of
    www.domainname.com/الشباكيةمربية
    This is a bit closer and works, but I then need to include the article ID in the URL.

    Then I also tried
    www.domainname.com/الشباكيةالمغربية/ (no need to include articleId, just /)
    RewriteRule ^/?(.*)/(.*)$ /article.php?url=$1 [L]
    which is the nearest solution. This also displays the page without an error, except for the CSS (I don't know why). Anyway, I am not familiar enough with regex to replace the second .* with a rule that matches nothing but the /. I hope somebody can help with this.
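
A likely cause of the missing CSS (the thread does not confirm it) is that a relative stylesheet link is resolved against the new "directory" created by the trailing slash, and the resulting path is then caught by the same catch-all rule. A rough Python sketch of that resolution; the page URL and the style.css name are hypothetical:

```python
import re
from urllib.parse import urljoin, urlsplit

# Hypothetical page URL and stylesheet reference, for illustration only.
page_url = "http://www.domainname.com/some-title/"
css_href = "style.css"

# A browser resolves a relative href against the page URL.
css_url = urljoin(page_url, css_href)

# Path roughly as a per-directory RewriteRule would see it (no leading slash).
css_path = urlsplit(css_url).path.lstrip("/")

# The catch-all pattern from the post above.
catch_all = re.compile(r"^/?(.*)/(.*)$")
caught = catch_all.match(css_path) is not None

print(css_url)
print(caught)
```

If this is what happens, the request for the stylesheet is answered by article.php instead of the CSS file. A root-relative link such as /style.css would resolve outside the title "directory" and would not contain the slash this pattern requires.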

    Thanks a lot!

  5. #5
    SitePoint Zealot
    Join Date
    Jan 2009
    Posts
    144
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I am also wondering why RewriteRule ^/?(.*)/([-a-zA-Z_&0-9&,!]+)$ /article.php?url=$1 [L] works and ^/?(.*)$ /article.php?url=$1 [L] does not. Why does it start working if I include /articleId?

  6. #6
    SitePoint Zealot
    Join Date
    Jan 2009
    Posts
    144
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I found a solution. If anybody else needs it, this is what I did:
    RewriteRule ^/?([^/.]+)$ /article.php?url=$1 [L]
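
One plausible explanation for why this rule works where ^/?(.*)$ did not (again, not confirmed in the thread): in .htaccess context the rules are run again on the rewritten path, and .* also matches article.php itself, so url would end up as "article.php". Excluding the dot keeps the rule away from article.php and from files such as stylesheets. A small Python comparison of the two patterns; the sample paths other than article.php are made up:

```python
import re

catch_all = re.compile(r"^/?(.*)$")    # the first attempt
no_dot = re.compile(r"^/?([^/.]+)$")   # the rule that worked

# The Greek title from the first post, plus some hypothetical file paths.
title = "\u03c0\u03bf\u03b4\u03cc\u03c3\u03c6\u03b1\u03b9\u03c1\u03bf"
paths = [title, "article.php", "style.css", "images/logo.png"]

for path in paths:
    # Show which pattern would fire for each requested path.
    print(path, bool(catch_all.match(path)), bool(no_dot.match(path)))
```

Only the title matches the second pattern; the first pattern matches every path in the list, including the rewrite target.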

  7. #7
    Certified Ethical Hacker silver trophybronze trophy dklynn's Avatar
    Join Date
    Feb 2002
    Location
    Auckland
    Posts
    14,672
    Mentioned
    19 Post(s)
    Tagged
    3 Thread(s)
    meee,

    Sorry for the delay, I lost my ADSL modem to a fault.

    The language is not determined by where the server is located but by the languages selected. While I'm not using languages on my test server, its (default) httpd-languages.conf file is:
    Quote Originally Posted by httpd-languages.conf
    #
    # Settings for hosting different languages.
    #
    # Required modules: mod_mime, mod_negotiation

    # DefaultLanguage and AddLanguage allows you to specify the language of
    # a document. You can then use content negotiation to give a browser a
    # file in a language the user can understand.
    #
    # Specify a default language. This means that all data
    # going out without a specific language tag (see below) will
    # be marked with this one. You probably do NOT want to set
    # this unless you are sure it is correct for all cases.
    #
    # * It is generally better to not mark a page as
    # * being a certain language than marking it with the wrong
    # * language!
    #
    # DefaultLanguage nl
    #
    # Note 1: The suffix does not have to be the same as the language
    # keyword --- those with documents in Polish (whose net-standard
    # language code is pl) may wish to use "AddLanguage pl .po" to
    # avoid the ambiguity with the common suffix for perl scripts.
    #
    # Note 2: The example entries below illustrate that in some cases
    # the two character 'Language' abbreviation is not identical to
    # the two character 'Country' code for its country,
    # E.g. 'Danmark/dk' versus 'Danish/da'.
    #
    # Note 3: In the case of 'ltz' we violate the RFC by using a three char
    # specifier. There is 'work in progress' to fix this and get
    # the reference data for rfc1766 cleaned up.
    #
    # Catalan (ca) - Croatian (hr) - Czech (cs) - Danish (da) - Dutch (nl)
    # English (en) - Esperanto (eo) - Estonian (et) - French (fr) - German (de)
    # Greek-Modern (el) - Hebrew (he) - Italian (it) - Japanese (ja)
    # Korean (ko) - Luxembourgeois* (ltz) - Norwegian Nynorsk (nn)
    # Norwegian (no) - Polish (pl) - Portugese (pt)
    # Brazilian Portuguese (pt-BR) - Russian (ru) - Swedish (sv)
    # Turkish (tr) - Simplified Chinese (zh-CN) - Spanish (es)
    # Traditional Chinese (zh-TW)
    #
    AddLanguage ca .ca
    AddLanguage cs .cz .cs
    AddLanguage da .dk
    AddLanguage de .de
    AddLanguage el .el
    AddLanguage en .en
    AddLanguage eo .eo
    AddLanguage es .es
    AddLanguage et .et
    AddLanguage fr .fr
    AddLanguage he .he
    AddLanguage hr .hr
    AddLanguage it .it
    AddLanguage ja .ja
    AddLanguage ko .ko
    AddLanguage ltz .ltz
    AddLanguage nl .nl
    AddLanguage nn .nn
    AddLanguage no .no
    AddLanguage pl .po
    AddLanguage pt .pt
    AddLanguage pt-BR .pt-br
    AddLanguage ru .ru
    AddLanguage sv .sv
    AddLanguage tr .tr
    AddLanguage zh-CN .zh-cn
    AddLanguage zh-TW .zh-tw

    # LanguagePriority allows you to give precedence to some languages
    # in case of a tie during content negotiation.
    #
    # Just list the languages in decreasing order of preference. We have
    # more or less alphabetized them here. You probably want to change this.
    #
    LanguagePriority en ca cs da de el eo es et fr he hr it ja ko ltz nl nn no pl pt pt-BR ru sv tr zh-CN zh-TW

    #
    # ForceLanguagePriority allows you to serve a result page rather than
    # MULTIPLE CHOICES (Prefer) [in case of a tie] or NOT ACCEPTABLE (Fallback)
    # [in case no accepted languages matched the available variants]
    #
    ForceLanguagePriority Prefer Fallback

    #
    # Commonly used filename extensions to character sets. You probably
    # want to avoid clashes with the language extensions, unless you
    # are good at carefully testing your setup after each change.
    # See http://www.iana.org/assignments/character-sets for the
    # official list of charset names and their respective RFCs.
    #
    AddCharset us-ascii.ascii .us-ascii
    AddCharset ISO-8859-1 .iso8859-1 .latin1
    AddCharset ISO-8859-2 .iso8859-2 .latin2 .cen
    AddCharset ISO-8859-3 .iso8859-3 .latin3
    AddCharset ISO-8859-4 .iso8859-4 .latin4
    AddCharset ISO-8859-5 .iso8859-5 .cyr .iso-ru
    AddCharset ISO-8859-6 .iso8859-6 .arb .arabic
    AddCharset ISO-8859-7 .iso8859-7 .grk .greek
    AddCharset ISO-8859-8 .iso8859-8 .heb .hebrew
    AddCharset ISO-8859-9 .iso8859-9 .latin5 .trk
    AddCharset ISO-8859-10 .iso8859-10 .latin6
    AddCharset ISO-8859-13 .iso8859-13
    AddCharset ISO-8859-14 .iso8859-14 .latin8
    AddCharset ISO-8859-15 .iso8859-15 .latin9
    AddCharset ISO-8859-16 .iso8859-16 .latin10
    AddCharset ISO-2022-JP .iso2022-jp .jis
    AddCharset ISO-2022-KR .iso2022-kr .kis
    AddCharset ISO-2022-CN .iso2022-cn .cis
    AddCharset Big5.Big5 .big5 .b5
    AddCharset cn-Big5 .cn-big5
    # For russian, more than one charset is used (depends on client, mostly):
    AddCharset WINDOWS-1251 .cp-1251 .win-1251
    AddCharset CP866 .cp866
    AddCharset KOI8 .koi8
    AddCharset KOI8-E .koi8-e
    AddCharset KOI8-r .koi8-r .koi8-ru
    AddCharset KOI8-U .koi8-u
    AddCharset KOI8-ru .koi8-uk .ua
    AddCharset ISO-10646-UCS-2 .ucs2
    AddCharset ISO-10646-UCS-4 .ucs4
    AddCharset UTF-7 .utf7
    AddCharset UTF-8 .utf8
    AddCharset UTF-16 .utf16
    AddCharset UTF-16BE .utf16be
    AddCharset UTF-16LE .utf16le
    AddCharset UTF-32 .utf32
    AddCharset UTF-32BE .utf32be
    AddCharset UTF-32LE .utf32le
    AddCharset euc-cn .euc-cn
    AddCharset euc-gb .euc-gb
    AddCharset euc-jp .euc-jp
    AddCharset euc-kr .euc-kr
    #Not sure how euc-tw got in - IANA doesn't list it???
    AddCharset EUC-TW .euc-tw
    AddCharset gb2312 .gb2312 .gb
    AddCharset iso-10646-ucs-2 .ucs-2 .iso-10646-ucs-2
    AddCharset iso-10646-ucs-4 .ucs-4 .iso-10646-ucs-4
    AddCharset shift_jis .shift_jis .sjis
    If your language isn't in there, it's not of this planet!


    Quote Originally Posted by meee
    dklynn the problem is that I have hosting in a country where Latin characters are used, I don't believe Apache is configured to recognize my website language.

    Gudonya! Your method of using the articleId is a great way around the problem of using non-Latin characters for the article titles. I PREFER to use the title - even in other languages - but you need (a) to have your language available on the server and (b) to be able to specify the character range of your characters (use the characters' hex values to determine the start and end characters but USE THOSE CHARACTERS in your character range definition!).
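
As a sketch of the "use the hex values to find the start and end characters" step, the following Python snippet lists the code points in a sample title and builds a character range from the lowest and highest one. The Greek word from the first post is used as sample data. Note the assumption: Python's re module compares whole code points, and Apache's regex engine may treat multi-byte UTF-8 characters differently, so this only shows how to look the values up.

```python
import re

# Sample title (the Greek word from the first post), written with escapes.
title = "\u03c0\u03bf\u03b4\u03cc\u03c3\u03c6\u03b1\u03b9\u03c1\u03bf"

# Hex code point of each distinct character in the title.
code_points = sorted({ord(ch) for ch in title})
first, last = chr(code_points[0]), chr(code_points[-1])
print([f"U+{cp:04X}" for cp in code_points])

# Character class built from the lowest and highest characters found.
greek_range = re.compile(f"^[{first}-{last}]+$")
print(bool(greek_range.match(title)))
```

For this word the range runs from U+03B1 to U+03CC; a real rule would need the full alphabet of the site's language, not just the letters of one title.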

    Regards,

    DK