Apr 9, 2013, 10:55 #1
HTML parsing to remove MS Word formatting from an XML-RPC request
What I need to do is in WordPress, but all the WordPress-related parts work as expected, so my issue is PHP-related.
Basically, when you post to WordPress from Word (using XML-RPC), Word wraps each paragraph in a <span> tag with a font-family (usually Times New Roman) and a font-size (e.g. 12pt). I have written a function/plugin that intercepts the content before it is saved to the database and removes these tags.
The current code that does this is as follows:
$content = preg_replace('/<span\sstyle="font-family\s?:\s?([^;]*)\s?;\s?font-size\s?:\s?([^;]*);\s?">(.*?)<\/span>/is','$3',stripslashes($content));
Here is a sample of the content as it arrives from Word:
<p><span style=\"font-family:Times New Roman; font-size:12pt\">16/08/2009 </span></p><p><span style=\"font-family:Times New Roman; font-size:12pt\"><strong>Votum:</strong> Ps.121:1 </span></p>
It does not currently work: the code works when I re-save the post/page from within WordPress, but not when the content comes in directly through the XML-RPC request.
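Judging from the sample above, one likely reason the pattern fails is that it requires a `;` after the font-size value (`([^;]*);\s?">`), while Word sends `font-size:12pt"` with no trailing semicolon. That would also fit the observation that re-saving from WordPress works, if the editor rewrites the style attribute with a trailing semicolon (an assumption, not verified here). Below is a minimal sketch of a more tolerant pattern; the function name `strip_word_font_spans` is hypothetical.

```php
<?php
/**
 * Remove the <span style="font-family:...; font-size:..."> wrappers that Word
 * adds, keeping the content inside them.
 *
 * Differences from the original pattern:
 *  - the ";" after the font-size value is optional (Word omits it);
 *  - [^;"]* stops at the closing quote instead of running into later markup;
 *  - \s* instead of \s? tolerates any amount of whitespace.
 */
function strip_word_font_spans($content)
{
    $pattern = '/<span\s+style="\s*font-family\s*:\s*[^;"]*;\s*font-size\s*:\s*[^;"]*;?\s*">(.*?)<\/span>/is';
    return preg_replace($pattern, '$1', $content);
}

// The sample arrives with escaped quotes (\"), so unslash it first, as the original code does.
$raw = '<p><span style=\"font-family:Times New Roman; font-size:12pt\">16/08/2009 </span></p>'
     . '<p><span style=\"font-family:Times New Roman; font-size:12pt\"><strong>Votum:</strong> Ps.121:1 </span></p>';

echo strip_word_font_spans(stripslashes($raw));
// Prints: <p>16/08/2009 </p><p><strong>Votum:</strong> Ps.121:1 </p>
```

This still assumes Word always writes font-family before font-size and never nests spans; if that does not hold, a DOM-based approach is more robust.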
I have also considered modifying it so that it first strips out:
I understand that this will require parsing the content as an HTML DOM (something I have not done before). How would I go about doing it, or is there a script I can simply incorporate?
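For the DOM route, PHP 5.4 ships with the DOM extension (`DOMDocument`, usually enabled by default), so a third-party script is not strictly needed. A minimal sketch, assuming the goal is to unwrap any `<span>` whose `style` mentions `font-family` or `font-size` while keeping its contents; the function name `unwrap_word_spans` is hypothetical, and `saveHTML($node)` needs PHP 5.3.6 or later.

```php
<?php
/**
 * Unwrap every <span> whose style attribute sets font-family or font-size,
 * keeping its children (text, <strong>, etc.) in place. Other spans are left alone.
 */
function unwrap_word_spans($html)
{
    $doc = new DOMDocument();

    // Silence libxml warnings about imperfect markup, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a full document with a UTF-8 meta tag so that
    // libxml does not fall back to ISO-8859-1 when decoding it.
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
        . '<body>' . $html . '</body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // getElementsByTagName() returns a live list, so copy it before changing the tree.
    $spans = array();
    foreach ($doc->getElementsByTagName('span') as $span) {
        $spans[] = $span;
    }

    foreach ($spans as $span) {
        $style = strtolower($span->getAttribute('style'));
        if (strpos($style, 'font-family') === false && strpos($style, 'font-size') === false) {
            continue; // not a Word formatting span, keep it
        }
        // Move each child out in front of the span, then drop the now-empty span.
        $parent = $span->parentNode;
        while ($span->firstChild) {
            $parent->insertBefore($span->firstChild, $span);
        }
        $parent->removeChild($span);
    }

    // Serialize only the contents of <body>, not the wrapper added above.
    $body = $doc->getElementsByTagName('body')->item(0);
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$sample = '<p><span style="font-family:Times New Roman; font-size:12pt">16/08/2009 </span></p>'
        . '<p><span style="font-family:Times New Roman; font-size:12pt"><strong>Votum:</strong> Ps.121:1 </span></p>';
echo unwrap_word_spans($sample);
```

Unlike the regex, this does not depend on attribute order, spacing, or trailing semicolons, and it copes with nested spans. Content that still carries slashes (`\"`) should be unslashed before it is passed in, otherwise the `style` attribute will not parse as intended.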
If I get this to work, I will publish it as a simple-to-use WordPress plugin.
The website runs on PHP 5.4.
All help will be greatly appreciated.