  1. #1
    SitePoint Addict
    Join Date
    Jul 2006
    Location
    Fionnphort, Isle of Mull, Scotland
    Posts
    363
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)

    Web site page lister

    I'm taking over a web site with (I think) about 150 pages, but the retiring web-master can't/won't provide a CD. He suggests I download it page by page, following the links manually. This means, of course, that I'll have a folder full of pages and a folder for every page containing the supporting files (needlessly duplicated). Awful prospect, deadly tedious and wide open to error. I don't have FTP access to the server.

    Can anyone recommend a program that will crawl a web site and compile a report for me listing all the unique pages by URL?

    I've found a program called 'Web2Disk' which will download the site so that it can be browsed, but it's not very good at putting supporting files (images, CSS, etc.) into the correct folder. Does anyone know of something better?

    I've also tried 'PowerMapper' to good effect. Great map, but many pages are duplicated if they can be reached by more than one link sequence. 'SiteSort' would do the job, but not in its evaluation mode, and I'm not yet in a position to commit to purchase.

    Ramasaig
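
For anyone who would rather script the report than install a tool, here is a minimal sketch of a crawler that lists every unique page URL on one host. It is not how any of the programs named above work; it is just one way to avoid the duplicates that appear when a page can be reached by more than one link sequence. The names `LinkParser`, `fetch_html` and `list_unique_pages` are made up for illustration, and the sketch assumes plain static HTML with ordinary `<a href>` links (no JavaScript navigation, no robots.txt handling, no rate limiting).

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch_html(url):
    """Default fetcher: page text, or "" for non-HTML or failed requests."""
    try:
        with urlopen(url, timeout=10) as resp:
            if "html" not in resp.headers.get("Content-Type", ""):
                return ""
            return resp.read().decode("utf-8", errors="replace")
    except OSError:
        return ""


def list_unique_pages(start_url, fetch=fetch_html):
    """Breadth-first crawl of one host; returns each URL once, sorted."""
    host = urlparse(start_url).netloc
    start = urldefrag(start_url).url
    seen = {start}          # every URL is recorded here exactly once
    queue = deque([start])
    while queue:
        page = queue.popleft()
        parser = LinkParser()
        parser.feed(fetch(page))
        for href in parser.hrefs:
            # Resolve relative links and drop "#fragment" so one page is one URL.
            url = urldefrag(urljoin(page, href)).url
            parts = urlparse(url)
            if parts.scheme in ("http", "https") and parts.netloc == host and url not in seen:
                seen.add(url)
                queue.append(url)
    return sorted(seen)
```

Calling `list_unique_pages("http://example.test/")` (a placeholder address) returns a sorted list that can be printed one URL per line. Same-host links to non-HTML files, such as a PDF linked from a page, will also show up in the list, so it may need a little filtering by extension.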

  2. #2
    Programming Team silver trophybronze trophy
    Mittineague's Avatar
    Join Date
    Jul 2005
    Location
    West Springfield, Massachusetts
    Posts
    17,290
    Mentioned
    198 Post(s)
    Tagged
    3 Thread(s)
    You don't need a CD of the files from him. You need FTP access.
    If the site has any dynamic pages, saving them over HTTP won't do any good. Since you are taking over the site, ask for FTP access and copy the site's folders/files to your local computer (then put them on a CD yourself if you really want one). Also make sure you get a copy of the database SQL while you're at it. If you can't get those, then you are being asked to take over the site with your "hands tied". IMHO, if they won't see reason, walk away now. Without FTP access you won't be able to do any maintenance on the site, so why bother?

  3. #3
    SitePoint Zealot sherl0ck's Avatar
    Join Date
    Aug 2008
    Posts
    120
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I use wget to make a mirror.
    I have only used it on Linux, though Windows builds of wget also exist.
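
A mirror of this kind keeps supporting files in the right place by saving each URL at a local path derived from the URL itself, so an image or stylesheet shared by many pages is stored once. The sketch below shows that mapping only; `local_path_for` and the `mirror` folder name are hypothetical, it is not wget's actual code, and it does not guard against `..` segments or query strings.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def local_path_for(url, root="mirror"):
    """Map a URL to a relative path under root, mirroring the site's folders."""
    parsed = urlparse(url)
    path = parsed.path
    if path == "" or path.endswith("/"):
        path += "index.html"  # directory-style URLs get a default file name
    # Same URL always gives the same path, so shared files are saved once.
    return str(PurePosixPath(root, parsed.netloc, path.lstrip("/")))
```

Two pages that both reference `/images/logo.png` map to the same local file, which is what avoids the one-folder-per-page duplication you get from saving pages by hand in a browser.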

  4. #4
    SitePoint Addict
    Join Date
    Jul 2006
    Location
    Fionnphort, Isle of Mull, Scotland
    Posts
    363
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    Thank you both. Yes, I know I should get FTP access, I'm working on it. I suspect the site's a real mess, with loads of obsolete pages on the server, and the retiring web-master doesn't want anyone knowing. Fortunately there are no dynamic pages, it's all in HTML with tables and in-line mark-up (with just a little CSS), so I can save it page by page and extract what I need. It's just so tedious.

    Walking away would be sensible, but the client is the local tourism marketing group, of which I'm a member, so this is going to be something of a labour of love with pay that doesn't truly reflect the effort involved.
    Tim Dawson
    Isle of Mull, Scotland

  5. #5
    SitePoint Enthusiast SitemapGenerator's Avatar
    Join Date
    Nov 2007
    Posts
    90
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    You might want to try A1 Website Download. However, Flash, ActiveX and the like can cause problems. Feel free to PM or email me if you run into problems.

  6. #6
    SitePoint Zealot
    Join Date
    Jun 2008
    Location
    Australia
    Posts
    164
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Is your website built in ASP.NET or PHP? If it is PHP, you may want to use wget to make a mirror of the site. It works well with a Linux server and may be useful for HTML pages as well.

  7. #7
    SitePoint Wizard
    Join Date
    Oct 2001
    Location
    Lancaster, PA
    Posts
    3,019
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    I've used www.httrack.com for stuff like this with good results before.

    Steve

  8. #8
    SitePoint Member
    Join Date
    Nov 2008
    Location
    Michigan
    Posts
    11
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    HTTrack, maybe.

    There are several website copiers out there.
    What about using FTP to pull down the content, or am I missing something?

