  1. #1
    SitePoint Enthusiast andygout's Avatar
    Join Date
    Jun 2012
    Location
    London, United Kingdom, United Kingdom
    Posts
    68
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    .htaccess RewriteRule clean URL (remove punctuation; replace spaces with hyphens, etc.)

    Hello,

    I am trying to create a clean URL using a RewriteRule in an .htaccess file (using Apache 2.2).

    Using a hypothetical example, I would like this:
    Ripley’s Believe It Or Not – Piccadilly Circus (London, England)

    To appear like this:
    attraction/ripleys-believe-it-or-not-piccadilly-circus-london-england

    i.e. Remove all punctuation, replace spaces with hyphens, and make upper case letters lower case. The number of spaces will vary from entry to entry and could be even more than the eight here, so I expect the [N] suffix may well be required.
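
    A minimal sketch of that transformation done in PHP rather than in the RewriteRule itself. The helper name make_slug is hypothetical (it is not the GenerateUrl function mentioned later in the thread), and the sketch assumes UTF-8 input and ASCII-only output, so accented letters would be turned into hyphens rather than transliterated:

```php
<?php
// Hypothetical helper (not the thread's GenerateUrl): builds a slug from a display name.
function make_slug($name)
{
    // Drop apostrophes (straight and typographic) so "Ripley’s" becomes "Ripleys".
    $name = str_replace(array("'", "’"), '', trim($name));
    // Collapse every run of other non-alphanumeric bytes into a single hyphen.
    $name = preg_replace('~[^A-Za-z0-9]+~', '-', $name);
    // Trim stray hyphens at either end and lowercase the result.
    return strtolower(trim($name, '-'));
}

echo make_slug("Ripley’s Believe It Or Not – Piccadilly Circus (London, England)"), "\n";
// ripleys-believe-it-or-not-piccadilly-circus-london-england
echo make_slug('Cafe 1001'), "\n";
// cafe-1001
```

    Because runs of punctuation and spaces collapse into one hyphen, the number of spaces in the name does not matter.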

    I am currently using the ‘id’ (below) rather than the ‘attraction_name’, which is obviously far simpler, but does not create a very useful or attractive URL:
    Code:
    Options +FollowSymLinks
    RewriteEngine on
    RewriteRule ^attraction/([0-9]*)$ attraction/?id=$1 [L,NC,QSA]
    I have also used a PHP custom function (‘GenerateUrl’) to generate the URL I need from within the link (below), but with this method (found at this site) the variable is not passed to the next page in its original state and therefore cannot then be used to select corresponding data.
    PHP Code:
    <a href = "/attraction/<?php echo GenerateUrl($attraction['id']); ?>"><?php echo html($attraction['attraction_name']); ?></a>
    I don’t want to use the ‘RewriteMap myquery’ method as once my site goes live I don’t expect I’ll have access to the server’s httpd.conf or virtualhost configuration files, which that would require.

    I’ve considered using the custom function to create a URL that can be saved in the ‘attraction’ table and therefore be used to select corresponding data thereafter, but would rather not given I’m pretty sure it’s avoidable.

    I just can’t figure out what the RewriteRule should be – can anybody help me out?

    <snip><merged from hijacked thread><edited>
    DK or ScallioXTX, seeing as you each seem to have expertise on RewriteRules, I hope that you might be able to help.
    </snip></merged></edited>

    Thanks in advance,

    Andy
    Last edited by ServerStorm; Nov 20, 2012 at 16:20.

  2. #2
    SitePoint Enthusiast andygout's Avatar
    And it should allow for numbers in case there is such an attraction to be listed, i.e. 'Cafe 1001' (attraction/cafe-1001).

  3. #3
    SitePoint Enthusiast andygout's Avatar

    Remove apostrophes from URLs

    I've decided that the best way is to create a field in which to store URLs for their respective entries.

    I am using the GenerateUrl function (found here):-

    PHP Code:
    function GenerateUrl ($s) { //Convert accented characters, and remove parentheses and apostrophes $from = explode (',', ",,,,,,,,,,,,,,,,,,,,,,,,,e,i,,u,(,),[,],'"); $to = explode (',', 'c,ae,oe,a,e,i,o,u,a,e,i,o,u,a,e,i,o,u,y,a,e,i,o,u,a,e,i,o,u,,,,,,'); //Do the replacements, and convert all other non-alphanumeric characters to spaces $s = preg_replace ('~[^\w\d]+~', '-', str_replace ($from, $to, trim ($s))); //Remove a - at the beginning or end and make lowercase return strtolower (preg_replace ('/^-/', '', preg_replace ('/-$/', '', $s))); } 
    It works great for the most part, although I am having problems with apostrophes.

    Used as quotation marks (i.e. only touching another character on one side) they work fine:-
    'Eiffel Tower (Paris)' becomes eiffel-tower-paris

    But used as actual apostrophes (being sandwiched between two characters), not so well:-
    St Paul's Cathedral (London) becomes st-paul-s-cathedral-london

    I'm using PHP 5.4.3 and have code to undo the modifications of magic quotes (should this be the cause of the problem).

    Thanks,

    Andy

  4. #4
    SitePoint Enthusiast andygout's Avatar
    Sorry, code should read better here:-

    PHP Code:
    function GenerateUrl ($s) {
    //Convert accented characters, and remove parentheses and apostrophes
    $from = explode (',', ",,,,,,,,,,,,,,,,,,,,,,,,,e,i,,u,(,),[,],'");
    $to = explode (',', 'c,ae,oe,a,e,i,o,u,a,e,i,o,u,a,e,i,o,u,y,a,e,i,o,u,a,e,i,o,u,,,,,,');
    //Do the replacements, and convert all other non-alphanumeric characters to hyphens
    $s = preg_replace ('~[^\w\d]+~', '-', str_replace ($from, $to, trim ($s)));
    //Remove a - at the beginning or end and make lowercase
    return strtolower (preg_replace ('/^-/', '', preg_replace ('/-$/', '', $s)));
    }

  5. #5
    SitePoint Enthusiast andygout's Avatar
    I should mention that the desired URL would be:
    st-pauls-cathedral-london
    I've just figured out that the original function code I gave DOES work. However, it only seems to work if I apply the function to the name live on the page, i.e.

    PHP Code:
    <?php echo generateurl($attraction['attraction_name']); ?>
    But what I am currently doing is applying the function within the index.php file when data is entered into the website. I suspect the problem is coming from the fact that I am applying the function to a value which has already had the below function applied to it (to deal with magic quotes):-

    PHP Code:
    $attraction_name = mysqli_real_escape_string($link, $_POST['attraction_name']);
    $attraction_url = generateurl($attraction_name);
    I reckon I've got to shift some coding around to generate the URL from the attraction_name before it is affected by mysqli_real_escape_string. I'll let you know how I get on... (nobody else has yet joined this discussion but I figure if I solve it then it could prove useful to somebody in the future).
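
    A small sketch of why the order matters. It assumes a simplified, hypothetical stand-in for GenerateUrl (slugify_sketch), and uses addslashes() in place of mysqli_real_escape_string(), which needs a live connection; both typically put a backslash in front of the apostrophe. The apostrophe is then removed, but the backslash is not in the removal list, so it gets turned into a hyphen:

```php
<?php
// Hypothetical simplified stand-in for the thread's GenerateUrl():
// strips brackets and apostrophes, then hyphenates everything else.
function slugify_sketch($s)
{
    $s = str_replace(array('(', ')', '[', ']', "'"), '', trim($s));
    $s = preg_replace('~[^A-Za-z0-9]+~', '-', $s);
    return strtolower(trim($s, '-'));
}

$raw = "St Paul's Cathedral (London)";
// addslashes() is only a stand-in here for the database escaping step.
$escaped = addslashes($raw);

echo slugify_sketch($raw), "\n";     // st-pauls-cathedral-london
echo slugify_sketch($escaped), "\n"; // st-paul-s-cathedral-london
```

    So the URL should be generated from the raw value, and the escaping applied only to what goes into the query.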

  6. #6
    SitePoint Enthusiast andygout's Avatar
    Yes, it turns out that mysqli_real_escape_string was the cause of the problem. A bit of reordering of the code seems to have sorted it:-

    PHP Code:
    $attraction_url = generateurl($_POST['attraction_name']);
    $attraction_name = mysqli_real_escape_string($link, $_POST['attraction_name']);
    Thanks!

    Andy

  7. #7
    Utopia, Inc. silver trophy
    ScallioXTX's Avatar
    Join Date
    Aug 2008
    Location
    The Netherlands
    Posts
    9,067
    Mentioned
    153 Post(s)
    Tagged
    2 Thread(s)
    The easiest way to get it to be st-pauls-cathedral-london, is to replace 's with just an s before you do anything else.

    PHP Code:
    <?php
    $str = str_replace("'s", "s", $str);

  8. #8
    SitePoint Enthusiast andygout's Avatar
    Hi,

    The original function code did work; the problem I was having was caused by the 'mysqli_real_escape_string' function (see above).

    I now want to remove the word 'The' (or 'the') from the start of any URLs that I create (and 'A' and 'An' as well once I've got that working). Surely it should be a simple change made to the bottom line of the function (see below), but it does not appear to be working - any ideas?

    PHP Code:
    return strtolower (preg_replace ('/^-/', '', preg_replace ('/-$/', '', preg_replace ('/^the /i', '', $s))));
    Thanks,

    Andy

  9. #9
    SitePoint Enthusiast
    Join Date
    Nov 2012
    Posts
    24
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    you could do

    $urlstring = strtolower(preg_replace('/[^A-Za-z0-9_-]/', '-', $urlstring));


    that should replace everything that's not A to Z or a to z or 0 to 9 or - or _ with a dash (-) and then make it all lowercase

    you could then check if that's the current url and use a header() call to automatically 301 to the correct place.

    that's what I do for all my projects
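
    A rough sketch of that redirect check. The function names and the /attraction/ path prefix are assumptions for illustration; the only built-in relied on is header() with its response-code argument:

```php
<?php
// Hypothetical helper: returns the canonical path to redirect to,
// or null when the requested slug is already the canonical one.
function canonical_redirect_target($requestedSlug, $canonicalSlug)
{
    if ($requestedSlug === $canonicalSlug) {
        return null;
    }
    return '/attraction/' . $canonicalSlug;
}

// Hypothetical wrapper around header(); call it before any output is sent.
function redirect_if_not_canonical($requestedSlug, $canonicalSlug)
{
    $target = canonical_redirect_target($requestedSlug, $canonicalSlug);
    if ($target !== null) {
        header('Location: ' . $target, true, 301);
        exit;
    }
}

var_dump(canonical_redirect_target('cafe-1001', 'cafe-1001'));  // NULL
echo canonical_redirect_target('Cafe_1001', 'cafe-1001'), "\n"; // /attraction/cafe-1001
```

    Keeping the comparison in its own function makes it easy to check without actually sending headers.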

  10. #10
    SitePoint Enthusiast andygout's Avatar
    This is the solution I've used to remove 'the', 'a', and 'an' from start of any URL:-

    Replace the bottom line of the above GenerateUrl function code with the below (obviously this line still also converts everything to lower case letters and removes any opening or closing hyphens):-

    PHP Code:
    return strtolower (preg_replace ('/^-/', '', preg_replace ('/-$/', '', preg_replace ('/\b(^the|^a|^an)\b/i', '', $s))));
    Some useful advice on this subject from Stack Overflow.

    And a good article on using \b for word boundaries in regular expressions from Regex Tutorial.
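
    A short sketch of why the \b version works where the earlier '/^the /i' attempt did not: by the time the return line runs, the spaces in $s have already been replaced with hyphens, so a pattern that expects a space after "the" never matches. The strip_leading_article helper is a hypothetical rewrite with the anchor written once, not something from the thread:

```php
<?php
// $s as it looks when GenerateUrl() reaches its return line: spaces are already hyphens.
$s = 'The-Tower-of-London';

// Pattern from post #8 expects a space after "the", so nothing is removed.
echo preg_replace('/^the /i', '', $s), "\n";             // The-Tower-of-London

// Pattern from post #10: \b matches between "The" and "-".
echo preg_replace('/\b(^the|^a|^an)\b/i', '', $s), "\n"; // -Tower-of-London

// Hypothetical equivalent with the anchor written once.
function strip_leading_article($s)
{
    return preg_replace('/^(?:the|an|a)\b/i', '', $s);
}

echo strip_leading_article('An-Abbey'), "\n";   // -Abbey
echo strip_leading_article('Abbey-Road'), "\n"; // Abbey-Road
```

    The leftover leading hyphen is then removed by the existing '/^-/' replacement, and the word boundary keeps names that merely start with the letter "a" (such as Abbey) intact.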

  11. #11
    Certified Ethical Hacker silver trophybronze trophy dklynn's Avatar
    Join Date
    Feb 2002
    Location
    Auckland
    Posts
    14,653
    Mentioned
    19 Post(s)
    Tagged
    3 Thread(s)
    Don't forget to check the ALLOWED/RESERVED/PROHIBITED lists of characters at http://www.ietf.org/rfc/rfc2396.txt before you get into all this changing of URI characters.

    I've got a client set-up to do exactly this BUT have prohibited any offending characters from his article titles. PM me if you would like to see the code.
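
    As a rough illustration of that kind of check (a sketch only, not DK's code): RFC 2396 lists letters, digits and a handful of marks such as - and _ as "unreserved", so a slug restricted to lowercase letters, digits and single hyphens stays well inside the allowed set. The function name is hypothetical:

```php
<?php
// Hypothetical check: true only for slugs made of a-z, 0-9 and single,
// non-leading, non-trailing hyphens.
function is_safe_slug($slug)
{
    return preg_match('/^[a-z0-9]+(?:-[a-z0-9]+)*$/', $slug) === 1;
}

var_dump(is_safe_slug('cafe-1001'));           // bool(true)
var_dump(is_safe_slug("st-paul's-cathedral")); // bool(false)
```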

    Regards

    DK

