  1. #1
    SitePoint Enthusiast
    Join Date
    Feb 2007
    Posts
    90
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Extracting all data from a website: text, video, images, etc.

    There is a website, www.howstuffworks.com. I want to extract all of its data, including text, audio, video, Flash files and everything else, and put it into my own site. Can you recommend any website that extracts data?

  2. #2
    It's all Geek to me silver trophybronze trophy
    ralph.m's Avatar
    Join Date
    Mar 2009
    Location
    Melbourne, AU
    Posts
    24,177
    Mentioned
    454 Post(s)
    Tagged
    8 Thread(s)
    Quote Originally Posted by selicon.valley View Post
    There is a website, www.howstuffworks.com. I want to extract all of its data, including text, audio, video, Flash files and everything else, and put it into my own site. Can you recommend any website that extracts data?
    It sounds like you are talking about stealing that site's content. Is that so, or is this your own site, or ...?

  3. #3
    SitePoint Enthusiast
    Join Date
    Feb 2007
    Posts
    90
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    No, some sites allow you to copy their material.

  4. #4
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,251
    Mentioned
    59 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by selicon.valley View Post
    No, some sites allow you to copy their material.
    however, howstuffworks is very clear that you may not re-publish any of it

    you may ~not~ copy their content except for your very own personal use
    The materials available through the Discovery Sites are the property of Discovery or its licensors, and are protected by copyright, trademark and other intellectual property laws. You are free to display and print for your personal, non-commercial use information you receive through the Discovery Sites. But you may not otherwise reproduce any of the materials without the prior written consent of the owner. You may not distribute copies of materials found on the Discovery Sites in any form (including by e-mail or other electronic means), without prior written permission from the owner.
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  5. #5
    SitePoint Zealot Sogo7's Avatar
    Join Date
    May 2011
    Posts
    129
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    You cannot use content from howstuffworks.com to build your own website without prior written permission from the site's admin staff.
    It actually says so in the Terms & Conditions, but you have to read almost the entire page to find out.

    However you could build a website that contains only links with short descriptions to howstuffworks.com and similar websites with tutorial videos like YouTube. That falls within the 'Fair Use' rules because what appears on your site would not be a duplicate copy and does not contain all the information the user would need. If the site was actually powered by something like the Bing or Google custom search API then the 'Safe Harbor' agreement they have gives you immunity to any claims of copyright infringement. This is because your website would be just a pocket size version of the larger search engine set up to provide specific search information.

    Sadly, websites built like this do not perform very well in my experience. Having a forum as well, so that users can generate content, helps, but it will take a while to build visitor traffic.
    Lovelogic.net Personal Projects Pit - Spammers welcome

  6. #6
    Word Painter silver trophy Shyflower's Avatar
    Join Date
    Oct 2003
    Location
    Winona, MN USA
    Posts
    10,053
    Mentioned
    142 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by Sogo7 View Post
    You cannot use content from howstuffworks.com to build your own website without prior written permission from the site's admin staff.
    It actually says so in the Terms & Conditions, but you have to read almost the entire page to find out.

    However you could build a website that contains only links with short descriptions to howstuffworks.com and similar websites with tutorial videos like YouTube. That falls within the 'Fair Use' rules because what appears on your site would not be a duplicate copy and does not contain all the information the user would need. If the site was actually powered by something like the Bing or Google custom search API then the 'Safe Harbor' agreement they have gives you immunity to any claims of copyright infringement. This is because your website would be just a pocket size version of the larger search engine set up to provide specific search information.

    Sadly, websites built like this do not perform very well in my experience. Having a forum as well, so that users can generate content, helps, but it will take a while to build visitor traffic.
    Unless you are an attorney, please stop referencing the Fair Use Rule because you have it all wrong.
    Linda Jenkinson
    "Say what you mean. Mean what you say. But don't say it mean." ~Unknown

  7. #7
    SitePoint Enthusiast
    Join Date
    Aug 2012
    Posts
    45
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Don't do it. How would you feel if you worked very hard on a site and then somebody just took your content? That is stealing.

  8. #8
    SitePoint Zealot Sogo7's Avatar
    Join Date
    May 2011
    Posts
    129
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Shyflower View Post
    Unless you are an attorney, please stop referencing the Fair Use Rule because you have it all wrong.
    M'lud now I am confused..lol!
    So are you saying that if I were an attorney, my statement would have been correct or acceptable to you?

  9. #9
    Word Painter silver trophy Shyflower's Avatar
    Join Date
    Oct 2003
    Location
    Winona, MN USA
    Posts
    10,053
    Mentioned
    142 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by Sogo7 View Post
    M'lud now I am confused..lol!
    So are you saying that if I were an attorney, my statement would have been correct or acceptable to you?
    No, I'm saying if you were an attorney you would have given the appropriate information to begin with. Fair use is a very touchy subject and is decided case by case by a judge in a court of law. However, one of the major aspects of fair use is that it is for non-profit and educational or journalistic use only. That, in itself, is a pretty basic summary.

    However, spouting copyright law on a public forum with no references given to its credibility is just, IMO, bad judgement and we don't recommend that here. Law is built on fact, not opinion.

    This particular conversation has run its course. If you really want to discuss legal issues, please visit the business and legal forum to do so, and while you are there, you might do a search on fair use, as it is a topic that has been covered several times in that forum. This forum and this discussion are about white-hat practices in acquiring and adding legitimate content to a website.

  10. #10
    SitePoint Member XcriptXource's Avatar
    Join Date
    Sep 2012
    Location
    CA
    Posts
    3
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Could that actually be possible?
    To copy a whole website's videos, pictures, content, etc. for your own personal use?
    If so, how so?

  11. #11
    Word Painter silver trophy Shyflower's Avatar
    Join Date
    Oct 2003
    Location
    Winona, MN USA
    Posts
    10,053
    Mentioned
    142 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by XcriptXource View Post
    Could that actually be possible?
    To copy a whole website's videos, pictures, content, etc. for your own personal use?
    If so, how so?
    Copy/paste... page by page, image by image, video by video. But why on earth would anyone want to do that when it is much easier just to bookmark a site in a browser?

  12. #12
    SitePoint Member XcriptXource's Avatar
    Join Date
    Sep 2012
    Location
    CA
    Posts
    3
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Shyflower View Post
    Copy/paste... page by page, image by image, video by video. But why on earth would anyone want to do that when it is much easier just to bookmark a site in a browser?
    You know the answer to your question. Use it for personal reference, when you have no internet and only a computer. Just plug in your handy dandy flash drive, then boom!
    Voilà... you have yourself a professionally working website that doesn't use the internet. Lol, I was just kidding with all this nonsense. But isn't there another way than just copying images etc. one by one?

  13. #13
    Life is not a malfunction gold trophysilver trophybronze trophy
    TechnoBear's Avatar
    Join Date
    Jun 2011
    Location
    Argyll, Scotland
    Posts
    6,177
    Mentioned
    264 Post(s)
    Tagged
    5 Thread(s)
    You can use the "Save page as" (or similar) option, under the "File" menu in your browser to save a complete page. (I'm using Firefox, but I presume other browsers are the same.) I don't know if it will save scripts - I've never tried with a page that uses one. I don't know any way to save an entire site.

  14. #14
    Word Painter silver trophy Shyflower's Avatar
    Join Date
    Oct 2003
    Location
    Winona, MN USA
    Posts
    10,053
    Mentioned
    142 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by TechnoBear View Post
    You can use the "Save page as" (or similar) option, under the "File" menu in your browser to save a complete page. (I'm using Firefox, but I presume other browsers are the same.) I don't know if it will save scripts - I've never tried with a page that uses one. I don't know any way to save an entire site.
    You certainly can do that, but it hotlinks everything from the site, so if something changes, the page will as well. For instance, removed images will show up as placeholders instead of the images. Additionally, hot-linking is pretty poor web etiquette. And finally, trying to do that on a large site such as How Stuff Works would be a nightmare.

    I can think of absolutely nothing you would accomplish by downloading someone else's whole site for your own use when all you have to do is open your browser and click on a bookmark to revisit it.

  15. #15
    It's all Geek to me silver trophybronze trophy
    ralph.m's Avatar
    Join Date
    Mar 2009
    Location
    Melbourne, AU
    Posts
    24,177
    Mentioned
    454 Post(s)
    Tagged
    8 Thread(s)
    When I first started web design (which wasn't so long ago), there were programs that purported to download a whole site for you. But I honestly don't see the point—not for any legitimate reasons, anyhow. In the future, HTML5 sites may offer a download facility that allows you to view the content offline, but that's a choice the site owner has to make.

  16. #16
    SitePoint Enthusiast
    Join Date
    Nov 2008
    Posts
    27
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    What's the point of scraping that site? Duplicate content penalties plus possible lawsuit = not a happy situation. One possible legal use is not to republish it or syndicate it but to review structure or make an archival copy for personal use. I'm not sure about the latter though.

  17. #17
    Life is not a malfunction gold trophysilver trophybronze trophy
    TechnoBear's Avatar
    Join Date
    Jun 2011
    Location
    Argyll, Scotland
    Posts
    6,177
    Mentioned
    264 Post(s)
    Tagged
    5 Thread(s)
    Quote Originally Posted by Shyflower View Post
    You certainly can do that, but it hotlinks everything from the site, so if something changes, the page will as well. For instance, removed images will show up as placeholders instead of the images. Additionally, hot-linking is pretty poor web etiquette.
    Firefox does save the images, but not background images - they don't appear at all. I've only really used it where I've wanted to print something - generally a knitting pattern - and it's not been set up in a way that will print well, e.g. the pattern is a narrow column that prints over umpteen pages. Then I sometimes save the page and edit my local version to print in a more useful manner. I don't need to keep coming back to it.

  18. #18
    Foozle Reducer ServerStorm's Avatar
    Join Date
    Feb 2005
    Location
    Burlington, Canada
    Posts
    2,699
    Mentioned
    89 Post(s)
    Tagged
    6 Thread(s)
    Unless you are doing something like what TechnoBear has mentioned, web scraping and web copying are just bad and, in most cases, illegal. As Shyflower indicated, hot-linking is also bad because it steals the bandwidth that someone else pays for to host the images and content, not to mention that taking this information can in many cases also be illegal. Companies are increasingly prosecuting sites that have scraped content, as scrape-checking bots are becoming very good at finding stolen content. You don't want to get mixed up in this unless it is for personal use, or unless you are VERY clear that the site owner allows it and that they haven't stolen their content from somewhere. How do you know or guarantee this? I don't think you can!
    ictus==""

  19. #19
    SitePoint Member
    Join Date
    Sep 2012
    Posts
    8
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    You want to use HTTrack, but as other posters mentioned, this might not be the best idea.

  20. #20
    SitePoint Member kurti's Avatar
    Join Date
    Sep 2012
    Posts
    14
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I use Web Content Extractor. It simply downloads all the information into tables, so that you can reuse it anywhere easily.
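
    For anyone curious what "information in tables" can look like in practice, here is a minimal sketch in Python using only the standard library's html.parser. It is not how Web Content Extractor works internally (that is unknown to me); the TableExtractor class, the extract_rows helper and the SAMPLE markup are all made up for illustration. It only parses an HTML string you already have, and it should only be pointed at content you own or have permission to reuse.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Hypothetical helper: collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read, or None
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html):
    """Return all table rows found in the given HTML string."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample markup, standing in for a page you are allowed to process.
SAMPLE = """
<table>
  <tr><th>Topic</th><th>Format</th></tr>
  <tr><td>Engines</td><td>text</td></tr>
  <tr><td>Batteries</td><td>video</td></tr>
</table>
"""

rows = extract_rows(SAMPLE)
for row in rows:
    print(row)
```

    Each row comes back as a plain list of strings, so it can be written out to CSV or a database afterwards. Nested tables and malformed markup are not handled in this sketch.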

