  1. #1
    SitePoint Zealot
    Join Date
    May 2003
    Posts
    164
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    WGET and escaping URL with query string

    Hi

    A partner website provides an RSS feed that I display on the website I am working on. The website displays information from the feed every time a user accesses it. The feed changes almost every day.

    For bandwidth and speed reasons, I would like the server to download the feed once, using a cron job (my website is in a Linux shared hosting environment).

    The problem exists with the URL structure, which I have no control over.

    Here is the URL:

    Code:
    http://www.website.com/feeds/FastRSS?fql=and(meta.collection:or(zsses),zssc:zw,zssc:1,zssarchived:not(\'1\'),string("",+mode%3D"simpleall",annotation_class%3D"user"))&view=rwallsppublished&hits=25&offset=0&qtf_lemmatize=1&sortby=zsspubdate-rwpubdatedisplay&sortdirection=descending&collapseon=batvuigeneric1
    This would be my wget command, which downloads, names and saves the feed as an XML file:

    Code:
    wget http://www.website.com/feeds/FastRSS?fql=and(meta.collection:or(zsses),zssc:zw,zssc:1,zssarchived:not(\'1\'),string("",+mode%3D"simpleall",annotation_class%3D"user"))&view=rwallsppublished&hits=25&offset=0&qtf_lemmatize=1&sortby=zsspubdate-rwpubdatedisplay&sortdirection=descending&collapseon=batvuigeneric1  -O /home/mywebsite/public_html/rssfeed.xml
    I am aware that there are characters that need escaping, and this is where I am getting my errors.

    I am also aware that I can avoid escaping by enclosing the URL in single or double quotes. You will notice that the URL contains BOTH single and double quotes, so it's not that simple (unless I escape those, or concatenate quoted pieces; how would I do that?). I have done a lot of reading on wget, but I can't seem to diagnose the problem. Can anyone help me structure the URL in a wget-friendly way? I have seen a few similar enquiries and solutions. The closest to my problem suggests using a URL shortening service (http://www.webhostingtalk.com/archiv.../t-551248.html), but I still think there is a smarter and more efficient way.
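
On the quoting question, here is a minimal sketch, assuming a POSIX shell. Inside single quotes nothing is special to the shell, so parentheses, double quotes and ampersands pass through untouched. An embedded single quote is written by closing the quoted string, adding an escaped quote, and reopening it ('\''). The URL below is a shortened stand-in for the real feed URL, and `feed_url` is a hypothetical variable name.

```shell
#!/bin/sh
# Hypothetical, shortened stand-in for the feed URL. It still contains
# parentheses, single quotes, double quotes and ampersands.
# Each '\'' sequence produces one literal single quote.
feed_url='http://www.website.com/feeds/FastRSS?fql=and(zssarchived:not('\''1'\''),string("",+mode%3D"simpleall"))&hits=25&offset=0'

# Print the URL exactly as wget would receive it.
printf '%s\n' "$feed_url"

# The download itself would then be (commented out so no network is needed):
# wget "$feed_url" -O /home/mywebsite/public_html/rssfeed.xml
```

Double-quoting the variable when it is used keeps the shell from splitting the URL at `&` or trying to interpret the parentheses.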

    Sorry for the lengthy post

    Thanks
    Elgg Customisation & Theme development
    Modx Custom Development
    PHP programming

  2. #2
    Utopia, Inc. silver trophy
    ScallioXTX's Avatar
    Join Date
    Aug 2008
    Location
    The Netherlands
    Posts
    9,097
    Mentioned
    153 Post(s)
    Tagged
    2 Thread(s)
    Try

    Code:
    http://www.website.com/feeds/FastRSS?%3Ffql%3Dand%28meta.collection%3Aor%28zsses%29%2Czssc%3Azw%2Czssc%3A1%2Czssarchived%3Anot%28%271%27%29%2Cstring%28%22%22%2C%2Bmode%253D%22simpleall%22%2Cannotation_class%253D%22user%22%29%29%26view%3Drwallsppublished%26hits%3D25%26offset%3D0%26qtf_lemmatize%3D1%26sortby%3Dzsspubdate-rwpubdatedisplay%26sortdirection%3Ddescending%26collapseon%3Dbatvuigeneric1
    When I feed that to wget it says:

    Code:
     => `FastRSS?fql=and(meta.collection:or(zsses),zssc:zw,zssc:1,zssarchived:not('1'),string("",+mode%3D"simpleall",annotation_class%3D"user"))&view=rwallsppublished&hits=25&offset=0&qtf_lemmatize=1&sortby=zsspubdate-rwpubdatedisplay&sortdirection=descending&collapseon=batvuigeneric1'
    Which is what you want, right?

    BTW, to get from your URL to my URL I used the following PHP snippet:
    PHP Code:
    // urlencode() already turns ( into %28 and ) into %29;
    // the str_replace() is only a safeguard.
    echo str_replace(
      array('(', ')'), array('%28', '%29'),
      urlencode('?fql=and(meta.collection:or(zsses),zssc:zw,zssc:1,zssarchived:not(\'1\'),string("",+mode%3D"simpleall",annotation_class%3D"user"))&view=rwallsppublished&hits=25&offset=0&qtf_lemmatize=1&sortby=zsspubdate-rwpubdatedisplay&sortdirection=descending&collapseon=batvuigeneric1')
    );
    So, URL-encode, and replace ( with %28 and ) with %29

    The only thing is that not(\'1\') is now not('1') in the URL, but I'm not sure if that matters...

    Rémon - Hosting Advisor


  3. #3
    SitePoint Zealot
    Join Date
    May 2003
    Posts
    164
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thank you for responding, ScallioXTX.

    URL encoding does not seem to work. I just tried your solution, and the request fails at the remote server with a 404 error... unless I misunderstood it.

    I thought I could somehow do it through a bash script without having to write it in PHP.

    If all else fails, I may use file_put_contents in PHP.

    Thanks

  4. #4
    SitePoint Member
    Join Date
    Jul 2010
    Posts
    7
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Why not try one of those free URL redirection services like tinyURL?

    Then all you need to do is type wget http://redirectionservice.com/myURL

    Problem solved

  5. #5
    SitePoint Zealot
    Join Date
    May 2003
    Posts
    164
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    That was an option I considered, as I mentioned, but I get the heebie-jeebies relying on another third party. I may consider it as one of my last options. In the meantime, I will try a Linux forum and update you if there is a solution.

  6. #6
    SitePoint Zealot
    Join Date
    May 2003
    Posts
    164
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I meant to share my solution to the problem.

    Wget has a switch (-i) that reads URLs from a file, so you don't have to write the whole string on the command line.

    There's no need to escape the strings :-)

    All you have to do is save the link(s) in a file, then tell Wget to read them from that file.

    Here's an example:
    I pasted all my links in feedlinks.txt
    The feed is then saved in rssfeed.xml

    Code:
    wget -i /home/mywebsite/cronjobs/feedlinks.txt -O /home/mywebsite/public_html/rssfeed.xml
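
For anyone reproducing this, here is a minimal sketch of how the links file could be created from a shell script, assuming a POSIX shell. The temporary directory, the `links_file` variable and the shortened URL are stand-ins chosen for illustration, and the cron schedule in the final comment is an assumption as well.

```shell
#!/bin/sh
# Hypothetical working directory so the sketch runs anywhere; the post
# itself uses /home/mywebsite/cronjobs/feedlinks.txt.
workdir=$(mktemp -d)
links_file="$workdir/feedlinks.txt"

# A quoted heredoc delimiter ('EOF') turns off all shell expansion, so the
# URL is written to the file byte for byte. The URL is a shortened stand-in.
cat > "$links_file" <<'EOF'
http://www.website.com/feeds/FastRSS?fql=and(zssarchived:not('1'),string("",+mode%3D"simpleall"))&hits=25&offset=0
EOF

# Show what wget -i would read: one URL per line.
cat "$links_file"

# The download, as in the post (commented out so the sketch needs no network):
# wget -i "$links_file" -O /home/mywebsite/public_html/rssfeed.xml

# A crontab entry to run it daily at 03:00 might look like this
# (the schedule is an assumption):
# 0 3 * * * wget -q -i /home/mywebsite/cronjobs/feedlinks.txt -O /home/mywebsite/public_html/rssfeed.xml
```

Two points to keep in mind. When the file lists more than one URL, -O writes all of the downloads into the same output file one after another, so a single feed per output file is the safer setup. Keeping the URL in a file also keeps its % characters out of the crontab line, where cron would otherwise treat an unescaped % specially.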

