WGET and escaping URL with query string

Hi

A partner website provides an RSS feed that is displayed on the site I am working on. The site fetches information from the feed every time a user visits, and the feed changes almost every day.

To save bandwidth and speed things up, I would like the server to download the feed once a day via a crontab job (my website is on a Linux shared hosting environment).

The problem is the URL structure, which I have no control over.

Here is the URL:


http://www.website.com/feeds/FastRSS?fql=and(meta.collection:or(zsses),zssc:zw,zssc:1,zssarchived:not(\\'1\\'),string("",+mode%3D"simpleall",annotation_class%3D"user"))&view=rwallsppublished&hits=25&offset=0&qtf_lemmatize=1&sortby=zsspubdate-rwpubdatedisplay&sortdirection=descending&collapseon=batvuigeneric1

This would be my wget command, which downloads, names, and saves the feed as an XML file:


wget http://www.website.com/feeds/FastRSS?fql=and(meta.collection:or(zsses),zssc:zw,zssc:1,zssarchived:not(\\'1\\'),string("",+mode%3D"simpleall",annotation_class%3D"user"))&view=rwallsppublished&hits=25&offset=0&qtf_lemmatize=1&sortby=zsspubdate-rwpubdatedisplay&sortdirection=descending&collapseon=batvuigeneric1  -O /home/mywebsite/public_html/rssfeed.xml

I am aware that there are characters that need escaping, and this is where I am getting my errors.

I am also aware that I can avoid escaping by enclosing the URL in single or double quotes. But you will notice the URL contains BOTH single and double quotes, so it's not that simple (unless I escape those too? Or can I concatenate? How would I do that?). I have done a lot of reading on wget, but I can't seem to diagnose the problem. Any advice on how to structure the URL in a wget-friendly way? I have seen a few similar enquiries and solutions; the closest to my problem suggests using a URL-shortening service (http://www.webhostingtalk.com/archive/index.php/t-551248.html), but I still think there is a smarter and more efficient way.
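For the record, both ideas floated above (escaping the inner quotes, and concatenating quoted chunks) do work in the shell. Here is a minimal sketch using a hypothetical example.com URL that mixes single quotes, double quotes, & and parentheses like the real one:

```shell
#!/bin/sh
# Sketch only: example.com stands in for the real feed URL.

# Option 1: double-quote the whole URL and backslash-escape only the inner
# double quotes; single quotes, & and ( ) are all literal inside double quotes.
URL="http://example.com/feeds/FastRSS?fql=not('1'),string(\"\",mode%3D\"simpleall\")&hits=25"

# Option 2: concatenation -- the shell joins adjacent quoted chunks into one
# word, so single-quoted and double-quoted pieces can alternate freely.
URL2='http://example.com/feeds/FastRSS?fql=not('"'"'1'"'"'),string("",mode%3D"simpleall")'"&hits=25"

echo "$URL"
```

Either way, the fetch itself would then be quoted as a whole, e.g. wget "$URL" -O rssfeed.xml, so the shell never sees the metacharacters.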

Sorry for the lengthy post

Thanks

Try


http://www.website.com/feeds/FastRSS?%3Ffql%3Dand%28meta.collection%3Aor%28zsses%29%2Czssc%3Azw%2Czssc%3A1%2Czssarchived%3Anot%28%271%27%29%2Cstring%28%22%22%2C%2Bmode%253D%22simpleall%22%2Cannotation_class%253D%22user%22%29%29%26view%3Drwallsppublished%26hits%3D25%26offset%3D0%26qtf_lemmatize%3D1%26sortby%3Dzsspubdate-rwpubdatedisplay%26sortdirection%3Ddescending%26collapseon%3Dbatvuigeneric1

When I feed that to wget it says:


 => `FastRSS?fql=and(meta.collection:or(zsses),zssc:zw,zssc:1,zssarchived:not('1'),string("",+mode%3D"simpleall",annotation_class%3D"user"))&view=rwallsppublished&hits=25&offset=0&qtf_lemmatize=1&sortby=zsspubdate-rwpubdatedisplay&sortdirection=descending&collapseon=batvuigeneric1'

Which is what you want right?

BTW, to get from your URL to my URL I used the following PHP snippet:


echo str_replace(
  array('(',')'), array('%40', '%41'), 
  urlencode('?fql=and(meta.collection:or(zsses),zssc:zw,zssc:1,zssarchived:not(\\'1\\'),string("",+mode%3D"simpleall",annotation_class%3D"user"))&view=rwallsppublished&hits=25&offset=0&qtf_lemmatize=1&sortby=zsspubdate-rwpubdatedisplay&sortdirection=descending&collapseon=batvuigeneric1')
);

So: urlencode(), and change ( to %40 and ) to %41.

The only thing is that not(\'1\') is now not('1') in the URL, but I'm not sure if that matters …

:slight_smile:

Thank you for responding, ScallioXTX.

URL encoding does not seem to work. I just tried your solution; it fails at the remote server with a 404 error … unless I misunderstood it.

I thought I could somehow do it through a bash script without having to write it in PHP.

If all else fails I may use file_put_contents in PHP.

Thanks

Why not try one of those free URL redirection services like tinyURL?

Then all you need to do is type wget http://redirectionservice.com/myURL

Problem solved :wink:

That was an option I considered, as I mentioned, but I get the heebie-jeebies relying on yet another third party. :slight_smile: I may keep it as one of my last options. In the meantime, I will try a Linux forum and update you if there is a solution.

I meant to share my solution to the problem.

Wget has a switch that lets you read URLs from a file instead of writing the whole string on the command line.

There’s no need to escape the strings :slight_smile:

All you have to do is save the link(s) in a file, then tell wget to read them from that file.

Here’s an example:
I pasted all my links into feedlinks.txt, and the feed is then saved as rssfeed.xml:


wget -i /home/mywebsite/cronjobs/feedlinks.txt -O /home/mywebsite/public_html/rssfeed.xml
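To put the whole setup together, here is a sketch (paths and the abridged URL are illustrative, not the real ones). A quoted here-doc (<<'EOF') writes the URL into the link file byte-for-byte, so the mixed quotes never touch the shell's own quoting at all:

```shell
#!/bin/sh
# Sketch with hypothetical paths and an abridged URL. The quoted here-doc
# copies the line literally -- quotes, & and ( ) need no escaping.
cat > /tmp/feedlinks.txt <<'EOF'
http://www.website.com/feeds/FastRSS?fql=and(zssarchived:not('1'),string("",+mode%3D"simpleall"))&hits=25&offset=0
EOF

# The fetch itself (run manually, or from the crontab line below):
#   wget -q -i /tmp/feedlinks.txt -O /home/mywebsite/public_html/rssfeed.xml
#
# Schedule it once a day at 06:00 via `crontab -e`:
#   0 6 * * * wget -q -i /home/mywebsite/cronjobs/feedlinks.txt -O /home/mywebsite/public_html/rssfeed.xml
```

One thing to note: when feedlinks.txt contains more than one URL, -O concatenates all the downloads into that single output file, so keep one link per output file if the feeds must stay separate.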