Thread: Parsing HTML
Apr 25, 2006, 09:34 #1
I just noticed that the Firefox bookmarks page is over 500KB, while it displays only 22KB of text on the screen. I carry this file with me on my disk-on-key, so it is important for it to be small.
I want to parse entries like this to make them smaller:
id="rdf:#$FOQot">Knowledge Base: What Can I Do With My Axim?</a>
I want to parse it to remove the attributes add_date, last_charset, id, and others that are in other entries. One of the attributes is a representation of the favicon, which is too big to post here! The regexes that I was trying were leaving me with things like: <a></a>, <>, <a href="http://address" ="something" ="something else">blahblah</a>. Also, I was unable to parse the WHOLE file at once: my test code would only work on one entry at a time, which is impractical.
Any ideas? Thanks.
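For illustration, here is a rough sketch of one way this could be approached. The post does not name a language, so Python is an assumption, and strip_attributes is a hypothetical helper name. The idea is to match each whole opening <a ...> tag anywhere in the file, pull out just its href, and rebuild the tag from that, instead of trying to delete the unwanted attributes one by one (which is what tends to leave fragments like ="something" behind). It assumes href values are double-quoted and that no attribute value contains a ">" character.

```python
import re

# Matches a whole opening anchor tag, e.g. <A HREF="..." ADD_DATE="...">.
# [^>] also matches newlines, so tags that span lines are handled too.
A_TAG = re.compile(r'<(a)\b[^>]*>', re.IGNORECASE)

# Matches the href attribute and its double-quoted value (assumption:
# the bookmarks file always double-quotes attribute values).
HREF = re.compile(r'\bhref\s*=\s*"([^"]*)"', re.IGNORECASE)


def strip_attributes(html):
    """Rebuild every opening <a ...> tag so only its href remains."""
    def rebuild(match):
        href = HREF.search(match.group(0))
        if href is None:
            # No href found: leave the tag untouched rather than guess.
            return match.group(0)
        # Keep the tag name in its original case (A or a).
        return '<%s href="%s">' % (match.group(1), href.group(1))

    # sub() scans the whole string, so every entry is handled in one pass.
    return A_TAG.sub(rebuild, html)


# Made-up sample entries for demonstration only.
sample = (
    '<DT><A HREF="http://example.com/" ADD_DATE="1145950000" '
    'LAST_CHARSET="UTF-8" ID="rdf:#$FOQot">Knowledge Base</A>\n'
    '<DT><A HREF="http://example.org/" '
    'ICON="data:image/x-icon;base64,AAAB">Other</A>'
)
print(strip_attributes(sample))
```

To run it on the real file, the whole contents can be read into one string with open(...).read(), passed through strip_attributes, and written back out, which avoids the one-entry-at-a-time problem. A proper HTML parser would be more robust than regexes if the file turns out to contain unusual markup.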