  1. #1
    SitePoint Member
    Join Date
    Oct 2005
    Posts
    3

    A general question on screen scraping and custom css files for a website

    I hope this is the correct forum to post this. Basically, what I want to do is create a small portal to a popular forum. This would be a mobile version of the site accessible through the iPhone and various other cellphones.

    The easiest way I can think of to do this is to somehow apply a custom css file to the site - basically I would load the site and not use their css, instead applying my own css file for easier mobile viewing.

    Another method would be to screen-scrape to collect the data and then present it in a more viewable format.

    I'm not sure what tool/script I should use to either load my own css (and overwrite the default) or to do screen-scraping, so I'd appreciate any help.

    To make it clear, I am not an administrator in the site and don't have access to any of the back-end.

    Thanks in advance.

  2. #2
    Jake Arkinstall, Theoretical Physics Student
    Join Date
    May 2006
    Location
    Lancaster University, UK
    Posts
    7,062
    Unfortunately (for you), without the site's permission you legally have no right to scrape their site; besides, mobile technology has advanced quite a bit since the year 2000. A lot of phones can show the sites almost identically to desktop browsers.

    To summarise, this is neither ethical nor even useful, so you may have a lot of trouble trying to accomplish it.
    Jake Arkinstall
    "Sometimes you don't need to reinvent the wheel;
    Sometimes it's enough to make that wheel more rounded"-Molona

  3. #3
    SitePoint Member
    Join Date
    Jun 2009
    Posts
    10
    arkinstall: then why is Google allowed to do it?

    solaroid, you can load a different CSS file through JavaScript; a quick Google search will find what you're looking for. However, if you're loading their website externally, it gets more complicated.

  4. #4
    Jake Arkinstall, Theoretical Physics Student
    Join Date
    May 2006
    Location
    Lancaster University, UK
    Posts
    7,062
    Google crawls pages in a way that benefits those sites: without engines like Google, those sites probably wouldn't exist, and even if they did, you wouldn't be able to find them.

    As for JavaScript: if we're talking about phones which can't show basic design easily, JavaScript won't exactly be viable, will it?

    You have two methods. One is to scrape the site and take certain bits of content, putting them into your own code template and attaching your own CSS file.
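A minimal sketch of the first method, using only Python's standard library. It assumes, hypothetically, that the forum marks each thread link with `class="threadtitle"` and that your own stylesheet is served at `/mobile.css`; a real forum's markup will differ, and fetching the page itself is left out.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical: assumes the forum marks each thread link with class="threadtitle".
TITLE_CLASS = "threadtitle"


class ThreadTitleParser(HTMLParser):
    """Collects (href, text) pairs for every <a> whose class list contains TITLE_CLASS."""

    def __init__(self):
        super().__init__()
        self.threads = []
        self._current = None  # [href, text parts] while inside a matching <a>

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        if TITLE_CLASS in (attrs.get("class") or "").split():
            self._current = [attrs.get("href") or "", []]

    def handle_data(self, data):
        if self._current is not None:
            self._current[1].append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            href, parts = self._current
            self.threads.append((href, "".join(parts).strip()))
            self._current = None


def render_mobile_page(page_html, css_href="/mobile.css"):
    """Extract the thread titles and place them in our own minimal template."""
    parser = ThreadTitleParser()
    parser.feed(page_html)
    parser.close()
    items = "\n".join(
        f'<li><a href="{escape(href)}">{escape(text)}</a></li>'
        for href, text in parser.threads
    )
    return (
        "<!DOCTYPE html>\n<html><head>"
        '<meta name="viewport" content="width=device-width">'
        f'<link rel="stylesheet" href="{escape(css_href)}">'
        f"</head><body><ul>\n{items}\n</ul></body></html>"
    )
```

Only the pieces you pick out survive, so the site's own stylesheets never reach the phone; the trade-off is that every change to the forum's markup can break the extraction.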

    Your second method is much more productive for you and the community. Write a plugin for the forum's software (not a scraper but an alternate output) which detects mobile viewers and shows a more suitable version of the site for them. That way you can offer to sell it to the site in question, getting an instant return. You can then find other popular forums and sell it to them.

    You now have yourself a product which is useful to the sites.
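The "detects mobile viewers" part of such a plugin usually comes down to inspecting the request's `User-Agent` header. A rough sketch in Python; the keyword list and template names are hypothetical, and a real plugin would hook into whatever template system the forum software provides.

```python
# Hypothetical keyword list: User-Agent sniffing is only a heuristic, and real
# mobile User-Agent strings vary widely between devices and browser versions.
MOBILE_KEYWORDS = ("iphone", "ipod", "android", "blackberry", "opera mini",
                   "opera mobi", "windows ce", "symbian", "mobile")


def is_mobile(user_agent):
    """Return True if the User-Agent header contains a known mobile keyword."""
    ua = (user_agent or "").lower()
    return any(keyword in ua for keyword in MOBILE_KEYWORDS)


def choose_template(user_agent, force=None):
    """Pick a (hypothetical) template name; `force` lets a visitor override
    detection, e.g. from a "full site" / "mobile site" link."""
    mobile = is_mobile(user_agent) if force is None else force
    return "thread_mobile.html" if mobile else "thread.html"
```

Offering the override matters because keyword matching will misclassify some devices in both directions.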

  5. #5
    SitePoint Member
    Join Date
    Oct 2005
    Posts
    3
    arkinstall, I have permission from the site, so ethically there is no problem. Also, I understand that there have been great leaps in mobile web browsers (I have an iPhone), but it's still a pain to browse, which is why I would like to change the formatting.

    Unfortunately a plugin won't be viable, but I was hoping to somehow load the site in an iframe and then impose my own stylesheet through a script or some other method.

    I think I'll drop the screen scraping idea, CSS seems to be the way to go. I'm trying to do this as a service to the site and for my own convenience. Is there a way to do this?

  6. #6
    SitePoint Wizard
    Join Date
    Jul 2008
    Posts
    5,757
    Yes, it involves making all traffic go directly through your website so that you can modify the HTML. You will be writing a proxy server. It's quite a project.

    There are already websites which do this:
    http://mowser.com/
    http://www.google.com/m
    http://www.google.com/gwt/n
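To give a sense of what such a proxy involves, here is a bare-bones sketch using Python's standard library. `UPSTREAM`, `/mobile.css` and the CSS rules are placeholders; it only handles GET requests, ignores cookies and logins, and any links on the forum that use absolute URLs will take the browser away from the proxy.

```python
import re
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical placeholders: the forum being re-styled and our own stylesheet.
UPSTREAM = "http://forum.example.com"
MOBILE_CSS_PATH = "/mobile.css"
MOBILE_CSS = b"body { font-size: 16px; margin: 4px; } img { max-width: 100%; }"

# Matches <link ... rel="stylesheet" ...> in any attribute order, and whole <style> blocks.
_STYLESHEET_LINK = re.compile(r"<link\b[^>]*\brel=[\"']?stylesheet[\"']?[^>]*>", re.IGNORECASE)
_STYLE_BLOCK = re.compile(r"<style\b.*?</style\s*>", re.IGNORECASE | re.DOTALL)
_HEAD_CLOSE = re.compile(r"</head\s*>", re.IGNORECASE)


def restyle(html):
    """Remove the site's own stylesheets and link ours just before </head>."""
    html = _STYLESHEET_LINK.sub("", html)
    html = _STYLE_BLOCK.sub("", html)
    our_link = f'<link rel="stylesheet" href="{MOBILE_CSS_PATH}">'
    return _HEAD_CLOSE.sub(lambda m: our_link + m.group(0), html, count=1)


class RestyleProxy(BaseHTTPRequestHandler):
    def _send(self, status, content_type, body):
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        if self.path == MOBILE_CSS_PATH:
            self._send(200, "text/css", MOBILE_CSS)
            return
        # Only forward relative paths, so the proxy cannot be pointed at other hosts.
        path = self.path if self.path.startswith("/") else "/"
        try:
            with urllib.request.urlopen(UPSTREAM + path, timeout=10) as upstream:
                body = upstream.read()
                content_type = upstream.headers.get("Content-Type", "application/octet-stream")
                charset = upstream.headers.get_content_charset() or "utf-8"
        except urllib.error.URLError:
            self._send(502, "text/plain", b"Upstream request failed")
            return
        if content_type.startswith("text/html"):
            body = restyle(body.decode(charset, errors="replace")).encode(charset, errors="replace")
        self._send(200, content_type, body)


def run(port=8080):
    """Serve the proxy on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), RestyleProxy).serve_forever()
```

Calling `run()` and browsing to `http://127.0.0.1:8080/` would show the forum with only the replacement stylesheet applied. This is also why the iframe idea runs into trouble: a page on your domain cannot script or restyle the contents of a cross-origin iframe, so the HTML has to pass through your server instead.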

  7. #7
    Hammer65, SitePoint Wizard
    Join Date
    Nov 2004
    Location
    Lincoln Nebraska
    Posts
    1,161
    I agree with the concerns over permission to use the site content. If that permission is secured, however, there may be other ways to provide the content, such as RSS feeds derived from the db data and whatnot.
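If the site owner can expose rows from the database, generating a feed is straightforward. A sketch in Python that builds an RSS 2.0 document; the row keys (`title`, `url`, `body`, `posted_at`) are hypothetical stand-ins for whatever the forum's schema provides, and treating naive timestamps as UTC is an assumption.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def _as_utc(dt: datetime) -> datetime:
    """Treat naive datetimes as UTC (an assumption about how the db stores times)."""
    return dt.replace(tzinfo=timezone.utc) if dt.tzinfo is None else dt


def build_rss(channel_title, channel_link, posts):
    """Build an RSS 2.0 document; `posts` is an iterable of dicts with the
    hypothetical keys 'title', 'url', 'body' and 'posted_at' (a datetime)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = f"Latest posts from {channel_title}"
    for post in posts:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = post["title"]
        ET.SubElement(item, "link").text = post["url"]
        ET.SubElement(item, "description").text = post["body"]  # escaped by ElementTree
        # RSS 2.0 expects RFC 822 dates, e.g. "Mon, 01 Jun 2009 12:00:00 +0000".
        ET.SubElement(item, "pubDate").text = format_datetime(_as_utc(post["posted_at"]))
    return ET.tostring(rss, encoding="unicode")
```

Because a feed carries no layout at all, the mobile reader or browser decides how to display it, which sidesteps the CSS question entirely.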

    Having said that, if I had to whip something up in an hour or two, I would first check whether all styling was done through external sheets or style tags in the head. If so, I would use a regex to pull just the body markup and add my own HTML head content with linked stylesheets. Processing time would be minimal. Normally, the most complex issue in scraping is consistently extracting information from a published page as raw data. Since you aren't doing that, this would probably be a relatively easy operation, although nowhere near as easy as getting permission to place a script on their server that pulls the info directly from the db, say using a db account with select privileges only.
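A minimal sketch of the regex approach in Python. As the post says, it relies on the site's styling living in the head; inline `style` attributes inside the body would still survive. The `/mobile.css` path and page title are placeholders, and fetching the page is left out.

```python
import html
import re

# Captures everything between <body ...> and </body>, across newlines, in any case.
BODY_RE = re.compile(r"<body\b[^>]*>(.*?)</body\s*>", re.IGNORECASE | re.DOTALL)


def rewrap_body(page_html, stylesheet_href="/mobile.css", title="Mobile view"):
    """Keep only the page's <body> contents and wrap them in our own <head>."""
    match = BODY_RE.search(page_html)
    if match is None:
        raise ValueError("no <body> element found")
    body = match.group(1)
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n"
        f"<title>{html.escape(title)}</title>\n"
        '<meta name="viewport" content="width=device-width">\n'
        f'<link rel="stylesheet" href="{html.escape(stylesheet_href)}">\n'
        "</head>\n<body>" + body + "</body>\n</html>\n"
    )
```

Since the body markup is passed through untouched, your stylesheet can target the forum's existing ids and classes directly.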
    Visit my blog
    PHP && Life
    for technology articles and musings.

