  1. #1
    SitePoint Enthusiast
    Join Date
    Aug 2008
    Posts
    96

    Strip Unwanted Formatting from Pasted Content

    I have a WYSIWYG editor that visitors use to create blog posts. They often copy content from other sources, such as MS Word or another web page, and paste it into the editor.

    The pasted content brings a whole mess of additional formatting with it, which skews the layout of the post.

    I could strip all formatting from the posted content on the server side, but this would defeat the purpose of having a WYSIWYG.

    The best option I can think of is to use JavaScript/jQuery to strip the formatting before the post is submitted. I would likely use keydown() and keyup() handlers for this.

    Step 1: Save the cursor position on keydown().

    Step 2: Save the cursor position on keyup().

    Step 3: Use a regex to strip formatting from everything between the two saved positions.

    This would allow me to operate exclusively on the freshly pasted content while keeping the formatting the user has previously created via the WYSIWYG.
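
    A minimal sketch of the "only touch the freshly pasted content" idea, written against plain strings rather than a live editor. It assumes you can read the editor's HTML before and after the paste; findInsertedRange, cleanPastedRange and stripTags are hypothetical helper names, not part of jQuery or TinyMCE.

```javascript
// Hypothetical helpers for illustration; none of these names come from a library.

// Locate the part of `after` that is not shared with `before` by trimming
// the common prefix and the common suffix of the two strings.
function findInsertedRange(before, after) {
  var start = 0;
  var maxPrefix = Math.min(before.length, after.length);
  while (start < maxPrefix && before.charAt(start) === after.charAt(start)) {
    start++;
  }
  var endBefore = before.length;
  var endAfter = after.length;
  while (endBefore > start && endAfter > start &&
         before.charAt(endBefore - 1) === after.charAt(endAfter - 1)) {
    endBefore--;
    endAfter--;
  }
  return { start: start, end: endAfter };
}

// Apply `clean` only to the inserted range and leave the rest untouched.
function cleanPastedRange(before, after, clean) {
  var range = findInsertedRange(before, after);
  return after.slice(0, range.start) +
         clean(after.slice(range.start, range.end)) +
         after.slice(range.end);
}

// Deliberately crude cleaner: drops every tag and keeps the text.
function stripTags(html) {
  return html.replace(/<[^>]*>/g, '');
}

var before = '<b>Hello</b> world';
var after = '<b>Hello</b> <span class="MsoNormal">big</span> world';
console.log(cleanPastedRange(before, after, stripTags));
// prints "<b>Hello</b> big world"
```

    Because this compares raw strings, the detected range can start or end in the middle of a tag when the pasted markup begins or ends with the same characters as its surroundings. A real editor would more likely work with DOM ranges, so treat this as an illustration of the approach only.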

    I have a few questions. Most importantly, does this sound like the most practical solution? I'm not even sure yet whether keyup() will register the cursor position as being before or after the pasted content.

    Also, which jQuery/JavaScript function captures the cursor position?

    Finally, this is for a WordPress site. If anyone knows of a plugin that already addresses this problem, please let me know.

  2. #2
    Force Flow
    Join Date
    Jul 2003
    Location
    Northeastern USA
    Posts
    4,615
    Are you building your own WYSIWYG editor, or are you using a pre-built one? Most of the pre-built ones have a feature where you can allow or deny specific HTML tags.

    Take a look at TinyMCE.
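
    To illustrate what allowing or denying specific tags amounts to, here is a rough sketch in plain JavaScript. TinyMCE handles this through its own configuration (the valid_elements option), so the function below is not how TinyMCE does it internally; keepAllowedTags is a hypothetical name used only for this example.

```javascript
// Hypothetical helper: remove every tag whose name is not in `allowed`,
// keeping whatever text sits between the tags.
function keepAllowedTags(html, allowed) {
  return html.replace(/<\/?([a-zA-Z][a-zA-Z0-9:]*)[^>]*>/g, function (tag, name) {
    return allowed.indexOf(name.toLowerCase()) !== -1 ? tag : '';
  });
}

var pasted = '<p>Hi <font color="red"><b>there</b></font></p>';
console.log(keepAllowedTags(pasted, ['p', 'b']));
// prints "<p>Hi <b>there</b></p>"
```

    A regex like this is fine for tidying up formatting, but it is not a security filter. Anything that must be safe to display still needs proper sanitizing on the server.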

  3. #3
    SitePoint Enthusiast
    Join Date
    Aug 2008
    Posts
    96
    I'm using TinyMCE. I'd thought about allowing only the basics, as you suggested, but it's actually those pesky basic tags, like <br />, that are messing things up. I'm fairly well on my way to getting this issue fixed. I found this bit of code earlier, which records the caret position:

    Code:
    // `editor` is expected to be a jQuery object wrapping an input or textarea
    var getCursorPosition = function(editor) {
        var input = editor.get(0);
        if (!input) return; // No (input) element found
        if ('selectionStart' in input) {
            // Standard-compliant browsers
            return input.selectionStart;
        } else if (document.selection) {
            // IE
            input.focus();
            var sel = document.selection.createRange();
            var selLen = sel.text.length; // length of the current selection
            // Extend the range back to the start of the field, then subtract
            // the selection length to get the caret offset
            sel.moveStart('character', -input.value.length);
            return sel.text.length - selLen;
        }
    };
    I should have the plugin knocked out over the weekend, and hopefully included in WP's public repository sometime next week.

  4. #4
    markbrown4
    Join Date
    Jul 2006
    Location
    Victoria, Australia
    Posts
    4,115
    Hi,

    I've built a rich text editor before; here is the function I used to clean up the HTML. It should give you a few ideas.
    I didn't bother capturing the pasted content and stripping only that. I just parsed the whole content after a paste.
    Code javascript:
    // removes MS Office generated guff
    cleanHTML: function() {
      var input = this.textarea.value;
      // 1. remove line breaks / Mso classes
      var stringStripper = /(\n|\r| class=(")?Mso[a-zA-Z]+(")?)/g;
      var output = input.replace(stringStripper, '');
      // 2. strip Word generated HTML comments
      var commentStripper = new RegExp('<!--(.*?)-->', 'g');
      output = output.replace(commentStripper, '');
      // 3. remove tags, leave content if any
      var tagStripper = new RegExp('<(/)*(meta|link|span|\\?xml:|st1:|o:|font)(.*?)>', 'gi');
      output = output.replace(tagStripper, '');
      // 4. remove everything in between and including tags, e.g. '<style ...>...</style>'
      var badTags = ['style', 'script', 'applet', 'embed', 'noframes', 'noscript'];
      for (var i = 0; i < badTags.length; i++) {
        tagStripper = new RegExp('<' + badTags[i] + '.*?' + badTags[i] + '(.*?)>', 'gi');
        output = output.replace(tagStripper, '');
      }
      // 5. remove attributes ' style="..."'
      var badAttributes = ['style', 'start'];
      for (i = 0; i < badAttributes.length; i++) {
        var attributeStripper = new RegExp(' ' + badAttributes[i] + '="(.*?)"', 'gi');
        output = output.replace(attributeStripper, '');
      }
      this.textarea.value = output;
    }
    IE has an onpaste handler you can hook into. For the other browsers, I just checked for the Ctrl + V combo.
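
    A rough sketch of that detection, assuming a DOM element that supports addEventListener. isPasteShortcut and watchForPaste are hypothetical names, and the onPaste callback stands in for whatever cleanup you run (cleanHTML, for instance). Current browsers also fire a paste event, so the keyboard check is only the fallback here.

```javascript
// Hypothetical helper: true when a keydown event looks like Ctrl+V
// (or Cmd+V on a Mac). 86 is the key code for "V".
function isPasteShortcut(event) {
  return Boolean(event.ctrlKey || event.metaKey) && event.keyCode === 86;
}

// Hypothetical wiring: call `onPaste` shortly after a paste into `element`.
function watchForPaste(element, onPaste) {
  // Defer the callback so the pasted content is already in the element.
  function later() {
    setTimeout(onPaste, 0);
  }
  if ('onpaste' in element) {
    // The element exposes a paste event, so use it directly.
    element.addEventListener('paste', later);
  } else {
    // Fallback: watch for the keyboard shortcut instead.
    element.addEventListener('keydown', function (event) {
      if (isPasteShortcut(event)) {
        later();
      }
    });
  }
}
```

    Note that the keyboard fallback misses pastes made through the context menu or the browser's Edit menu, which is one reason to prefer the paste event wherever it exists.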

  5. #5
    markbrown4
    Join Date
    Jul 2006
    Location
    Victoria, Australia
    Posts
    4,115
    Finally, this is for a WordPress site. If anyone knows of a plugin that already addresses this problem, please let me know.
    WordPress already has a "Paste from Word" option in the visual toolbar.

