  1. #1
    SitePoint Wizard | Join Date: Jun 2005 | Posts: 1,428

    Making text hard to copy?

    OK, first of all, I'm fully aware that if I put text into HTML, someone could easily reuse it on their own website. Now, suppose, just suppose, that you've spent many, many years researching something and then published the results on your website. If you wanted to make that text as hard as possible to copy, BUT still have it spiderable by bots, how would you do it, please?

    Any help appreciated.

    Dez.

  2. #2
    SitePoint Addict | Join Date: Mar 2005 | Posts: 246
    Display all the text as images, but with 'alt' text?


    - Vince

  3. #3
    Mittineague | Programming Team
    Join Date: Jul 2005 | Location: West Springfield, Massachusetts | Posts: 17,038
    IMHO any effort you expend on this will be wasted. If spider bots can get the text, so can scraper bots. Once anything is online, it's "out there" forever.

    So what would I do?
    Hire a lawyer and sue anyone who uses it illegally.
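    To make the "if spiders can read it, scrapers can read it" point concrete, here is a minimal sketch using Python's standard-library `html.parser`. It is not how any particular search engine works; the class and function names (`TextScraper`, `scrape`) are hypothetical. It collects visible text plus `<img>` alt text, which is also why images with alt text would not stop a copier.

```python
from html.parser import HTMLParser
from typing import List


class TextScraper(HTMLParser):
    """Collect visible text nodes plus <img alt="..."> text from a page."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []
        self._skip = 0  # nesting depth inside <script>/<style>, which are not content

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "img":
            # Alt text is exactly what a crawler indexes for an image,
            # so a scraper gets it just as easily.
            alt = dict(attrs).get("alt")
            if alt:
                self.chunks.append(alt)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def scrape(html_doc: str) -> str:
    """Return the page's readable text as one space-separated string."""
    parser = TextScraper()
    parser.feed(html_doc)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    page = '<p>Intro</p><img src="r.png" alt="Finding one"><script>var x=1;</script>'
    print(scrape(page))
```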

  4. #4
    SitePoint Wizard | Join Date: Jun 2005 | Posts: 1,428

    Thanks Vince and Mittineague, it's appreciated. The image alt text would be tricky with so much text.

    How about PDFs? Are they spiderable?

  5. #5
    Mittineague | Programming Team
    Join Date: Jul 2005 | Location: West Springfield, Massachusetts | Posts: 17,038
    AFAIK, PDF files can be crawled, but only the "text" portion: the equivalent of reading the text as rendered in HTML, i.e. no tag attribute values.

  6. #6
    bluedreamer | SitePoint Wizard
    Join Date: Jul 2005 | Location: Middle England | Posts: 3,349
    I think this goes back to the old saying: "if you don't want it copied, don't publish it". Not what you wanted to hear, but it's well nigh impossible protecting any web content.

  7. #7
    Non-Member | Join Date: Nov 2009 | Location: Keene, NH | Posts: 3,760
    Quote Originally Posted by bluedreamer
    but it's well nigh impossible protecting any web content.
    In fact, it defeats the POINT of publishing on the Internet.

  8. #8
    Non-Member | Join Date: Jun 2010 | Location: 47°27′35″N 26°18′0″E | Posts: 1,789
    image = OCR to text = text
    PDF text = edit-copy protection removed = text
    PDF image = OCR to text = text

    what's left: Flash, Silverlight, video, all of which can be transcribed/transformed if needed.

    published content = shared content = IP is the only way to protect it

  9. #9
    SitePoint Wizard | Join Date: Jun 2005 | Posts: 1,428

    Thanks all, those answers were expected. So, yep, all content can be transformed to text, but some formats make that harder for other people than others do. Which would be the hardest to transform while still being spiderable, please?

    One other thing: what does the bit below mean?

    "published content = shared content = IP is the only way to protect it"

  10. #10
    Non-Member | Join Date: Nov 2009 | Location: Keene, NH | Posts: 3,760
    Protecting it will generally break search engines/spiders, just as it usually breaks accessibility. HTML is just not designed for this, and attempting to do so is a total waste of time and effort.

    "IP is the only way to protect it" means intellectual property rights - copyright it (which is instant/free for web publishing now) and sue people that copy it.

    ANY other approach and you are just spinning your wheels over nothing.

  11. #11
    SitePoint Addict | Join Date: Mar 2005 | Posts: 246
    Here's another idea for you which may be a little 'out-of-the-box' but would still achieve what you need, and may in fact be a better option.

    Create an eBook from your research content, and put it onto Amazon.

    This way you can still have all the keyword terms for spiders within the description, and let Amazon cover the security / copy protection side of things.

    Amazon predicts that it will sell more e-books than paperbacks by the end of next year, so you could gain a few pennies in the process.

    Hope that helps,

    - Vincent

  12. #12
    Non-Member | Join Date: Jun 2010 | Location: 47°27′35″N 26°18′0″E | Posts: 1,789
    research usually means discoveries. if it's an innovation, then a patent will keep you safe.

    if you're selling a solution, take a cue from those promoting "10 ways to get rich". they blab about it w/o saying anything tangible, but they manage to slip in all the keywords. then, based on a subscription, you buy or read the methods.

    same for you: blab about it in a normal web page, slipping in all the keywords for the bots to find, but keep the essential part out of it. don't bother with PDFs or images. build a subscription or purchase mechanism for those wanting the essential part.

    or just put it out in the open, under a licence.

  13. #13
    Stevie D | Mouse catcher
    Join Date: Mar 2006 | Location: Yorkshire, UK | Posts: 5,881
    To be honest, the harder you make it to copy, the more likely people are to copy and re-publish it, purely out of spite. There are many legitimate reasons for people wanting to highlight and copy text that don't involve plagiarism, and if you try to interfere with that in any way then you will become very unpopular, and your site will become very unpopular. And the people who have copied your content and made it accessible on their websites will get all of your traffic.

    If people want to copy it, they will find a way to copy it. If you're going to publish it, you can't stop them, and it isn't worth trying, particularly if you want it spiderable. Any method you use will make it more difficult for people to read and interact with the site in the way that they want to.

    Sure, some people may copy what you've written. Is that the end of the world?

  14. #14
    SitePoint Wizard | Join Date: Jul 2006 | Location: Augusta, Georgia, United States | Posts: 4,139
    Taking the idea of using an image further, you *could* program a page to generate the text as an image, then serve that based on the user agent. That would surely stop the idiots.

    If I had to do this, I would write a script that takes several arguments and builds an image of the text via an external script, similar to how you would serve images stored in the database or behind the site root. Useful arguments might be the font, width, etc., so that I can control the layout of the text and make it fit the design. Taking it even further, you could integrate the concept of columns into the mix. I would then cache each unique generated image, so that only the first request pays for the intensive process of building it.
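    A rough sketch of the caching half of that idea, in Python. Assumptions not in the post: the class name `TextImageCache` is hypothetical, the expensive drawing step is left as a hypothetical `render` callback (a real site would probably use an imaging library and write PNG files to disk rather than keep bytes in a dictionary), and `textwrap` stands in for width-based layout, which only matches the rendered width for a monospace font.

```python
import hashlib
import textwrap
from typing import Callable, Dict, List

# Hypothetical renderer: takes the wrapped lines and a font name and returns
# image bytes. In practice an imaging library would draw the PNG here.
Renderer = Callable[[List[str], str], bytes]


class TextImageCache:
    """Render each unique (text, font, width) combination only once."""

    def __init__(self, render: Renderer) -> None:
        self._render = render
        self._store: Dict[str, bytes] = {}
        self.renders = 0  # how many times the expensive render step actually ran

    @staticmethod
    def key(text: str, font: str, width: int) -> str:
        # Hash every argument that changes the output, so different layouts
        # never share an entry and identical requests always hit the same one.
        raw = f"{font}\0{width}\0{text}".encode("utf-8")
        return hashlib.sha256(raw).hexdigest()

    def get(self, text: str, font: str = "monospace", width: int = 40) -> bytes:
        k = self.key(text, font, width)
        if k not in self._store:
            # Only the first request for this combination pays for rendering.
            lines = textwrap.wrap(text, width=width)
            self._store[k] = self._render(lines, font)
            self.renders += 1
        return self._store[k]
```

    Usage: with a stand-in renderer such as `lambda lines, font: "\n".join(lines).encode()`, two calls to `get()` with the same arguments trigger one render, while changing `width` triggers a second. Hashing the arguments keeps the key short enough to use as a file name if the cache later moves to disk.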

  15. #15
    SpacePhoenix | From space with love
    Join Date: May 2007 | Location: Poole, UK | Posts: 5,000
    One technique I read about a couple of years ago (I think it may have been in a thread somewhere in the SitePoint forums) used two layers or something like that: each layer held alternate letters, and when the layers were combined the whole text was viewable. It could probably be bypassed via screen grabs and OCR, though.

    Possibly felgall or AlexDawson might be able to remember what the technique was called.
    Community Team Advisor
    Forum Guidelines: Posting FAQ Signatures FAQ Self Promotion FAQ
    Help the Mods: What's Fluff? Report Fluff/Spam to a Moderator

  16. #16
    Stevie D | Mouse catcher
    Join Date: Mar 2006 | Location: Yorkshire, UK | Posts: 5,881
    Quote Originally Posted by SpacePhoenix
    One technique I read about a couple of years ago (I think it may have been in a thread somewhere in the SitePoint forums) used two layers or something like that: each layer held alternate letters, and when the layers were combined the whole text was viewable. It could probably be bypassed via screen grabs and OCR, though.

    Possibly felgall or AlexDawson might be able to remember what the technique was called.
    I remember that technique being discussed; the unanimous conclusion was that if you thought it was a good idea to inflict that on people, you probably shouldn't be let out on your own!

  17. #17
    SpacePhoenix | From space with love
    Join Date: May 2007 | Location: Poole, UK | Posts: 5,000
    Quote Originally Posted by Stevie D
    I remember that technique being discussed; the unanimous conclusion was that if you thought it was a good idea to inflict that on people, you probably shouldn't be let out on your own!
    Can you remember what the title of that thread was, or which forum it was in? I tried to find it via Google but haven't had much luck so far.

  18. #18
    Mittineague | Programming Team
    Join Date: Jul 2005 | Location: West Springfield, Massachusetts | Posts: 17,038
    I am unfamiliar with that thread. I like the idea, but only as an exercise in programming.

    I imagine it would be fairly easy to do if the text was monospace, otherwise the layers would look a mess.

    And you could throw accessibility out the window. Imagine a screen reader user getting something like:

    Code:
    <div id="layer1">W l o e t   y w b i e   e l f e   o r a , b t d n t y u d r   o y a y h n !</div>
    <div id="layer2"> e c m   o m   e s t . F e   r e t   e d   u   o '   o   a e c p   n t i g </div>
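    For the curious, the two layers in the example above can be produced mechanically. This minimal sketch (the function names `split_layers` and `merge_layers` are hypothetical) puts the even-indexed characters in one layer and the odd-indexed ones in the other, padding with spaces. It also shows the reverse step: anything that reads both `div`s can rebuild the original text just by interleaving them, no OCR required.

```python
from typing import Tuple


def split_layers(text: str, pad: str = " ") -> Tuple[str, str]:
    """Split text into two layers of alternating characters.

    Layer 1 keeps the even-indexed characters and layer 2 the odd-indexed
    ones; every other position is filled with `pad`, so with a monospace
    font the two layers line up when stacked on top of each other.
    """
    layer1 = "".join(c if i % 2 == 0 else pad for i, c in enumerate(text))
    layer2 = "".join(c if i % 2 == 1 else pad for i, c in enumerate(text))
    return layer1, layer2


def merge_layers(layer1: str, layer2: str) -> str:
    """Undo the split, as a scraper reading both <div>s could."""
    return "".join(a if i % 2 == 0 else b
                   for i, (a, b) in enumerate(zip(layer1, layer2)))


if __name__ == "__main__":
    original = "Welcome to my website."
    top, bottom = split_layers(original)
    print(repr(top))
    print(repr(bottom))
    print(merge_layers(top, bottom) == original)
```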

  19. #19
    felgall | Programming Since 1978
    Join Date: Sep 2005 | Location: Sydney, NSW, Australia | Posts: 16,788
    The most effective way to achieve what the OP is asking for is to use a PDF with copy protection turned on. Breaking that copy protection may be possible, but it is still far more effective than anything that can be done in HTML, apart from suggestions like those already made, which would make the page totally unusable to a large number of potential visitors while still being as easy to bypass as the protection in a PDF. For example, the alternate-letter approach could easily be defeated by taking a screen print and feeding it through a decent OCR program.
    Stephen J Chapman

    javascriptexample.net, Book Reviews, follow me on Twitter
    HTML Help, CSS Help, JavaScript Help, PHP/mySQL Help, blog
    <input name="html5" type="text" required pattern="^$">

  20. #20
    Stevie D | Mouse catcher
    Join Date: Mar 2006 | Location: Yorkshire, UK | Posts: 5,881
    Quote Originally Posted by Mittineague
    I am unfamiliar with that thread. I like the idea, but only as an exercise in programming.

    I imagine it would be fairly easy to do if the text was monospace, otherwise the layers would look a mess.

    And you could throw accessibility out the window. Imagine a screen reader user getting something like:

    Code:
    <div id="layer1">W l o e t   y w b i e   e l f e   o r a , b t d n t y u d r   o y a y h n !</div>
    <div id="layer2"> e c m   o m   e s t . F e   r e t   e d   u   o '   o   a e c p   n t i g </div>
    Yes, that's pretty much exactly how it went. And yes, you would have to use monospace fonts. I think you would also have to use a non-breaking space for every replaced letter, to make sure that lines break nicely between words.
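    Stevie D's refinements can be sketched as a small HTML generator. This is a minimal sketch under assumptions not in the thread: the function name `layered_html`, the wrapper `div`, and the inline styles are hypothetical; only the `layer1`/`layer2` ids come from the example above. Each replaced letter becomes `&nbsp;`, while the text's real spaces are kept in both layers, so both layers offer identical break points and, in a monospace font, wrap at the same word boundaries.

```python
import html

NBSP = "\u00a0"


def layered_html(text: str) -> str:
    """Return two stacked monospace <div> layers holding alternate characters."""
    layers = []
    for parity in (0, 1):
        # Keep this layer's characters plus every real space (so both layers
        # share the same line-break points); blank out the rest with NBSP.
        chars = [c if i % 2 == parity or c == " " else NBSP
                 for i, c in enumerate(text)]
        # Escape first, then spell NBSP as an entity, so "&nbsp;" isn't escaped.
        layers.append(html.escape("".join(chars)).replace(NBSP, "&nbsp;"))
    font = "font-family:monospace"
    return (
        '<div style="position:relative">'
        f'<div id="layer1" style="{font}">{layers[0]}</div>'
        f'<div id="layer2" style="{font};position:absolute;top:0;left:0;right:0">'
        f"{layers[1]}</div>"
        "</div>"
    )


if __name__ == "__main__":
    print(layered_html("Hi yo"))
```

    Layer 1 stays in normal flow so the wrapper gets a height, and layer 2 is positioned exactly on top of it; the screen-reader problem Mittineague describes applies unchanged.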

  21. #21
    SitePoint Wizard | Join Date: Jun 2005 | Posts: 1,428
    Quote Originally Posted by deathshadow60
    "IP is the only way to protect it" means intellectual property rights - copyright it (which is instant/free for web publishing now) and sue people that copy it.
    Thanks, deathshadow60. What are the recommended ways of copyrighting it for free, as you suggest above?

  22. #22
    SitePoint Wizard | Join Date: Jun 2005 | Posts: 1,428
    Quote Originally Posted by felgall
    The most effective way to achieve what the OP is asking for is to use a PDF with copy protection turned on.
    Thanks Stephen, it's appreciated. Do all PDF programmes come with a copy-protection facility? I can't seem to find it in my Nitro PDF Professional.

    Also, is text in PDFs easily found by the Googlebots?

  23. #23
    Ed Seedhouse | SitePoint Evangelist
    Join Date: Aug 2006 | Location: Victoria, B.C., Canada | Posts: 592
    Quote Originally Posted by Dez
    What are the recommended ways of copyrighting it for free, as you suggest above?
    If your country is a party to the international agreement on copyright (the Berne Convention), then any original work is automatically copyrighted to its originator.

    Of course, you need some way of later proving that you are the originator in order to enforce your copyright, but the copyright itself is automatic in most countries, now including the USA.
    Ed Seedhouse

  24. #24
    Iceman90 | SitePoint Addict
    Join Date: Mar 2006 | Location: Calgary, Alberta, Canada | Posts: 391
    If someone really wants it, they'll find a way to steal your content.

  25. #25
    Iceman90 | SitePoint Addict
    Join Date: Mar 2006 | Location: Calgary, Alberta, Canada | Posts: 391
    Quote Originally Posted by Dez
    Also, is text in PDFs easily found by the Googlebots?
    I've seen several PDFs archived by Google. But you can always use robots.txt to tell robots not to index your PDFs.
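    The robots.txt approach can be sketched and checked with Python's standard-library `urllib.robotparser`. This is a minimal example assuming the PDFs live under a hypothetical `/research-pdfs/` directory on a placeholder `example.com` site. Google also documents wildcard rules such as `Disallow: /*.pdf$`, but the stdlib parser does not implement wildcards, so the sketch uses a plain directory prefix. Note that robots.txt is only a request: well-behaved crawlers honour it, scrapers need not.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep all crawlers out of the directory holding
# the PDFs, while leaving the ordinary HTML pages crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /research-pdfs/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Mark the rules as loaded; can_fetch() refuses everything until then.
    parser.modified()
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for url in ("https://example.com/research-pdfs/findings.pdf",
                "https://example.com/research.html"):
        print(url, "->", "allowed" if rp.can_fetch("Googlebot", url) else "blocked")
```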

