Thread: Parsing HTML
Apr 25, 2006, 09:34 #1
I just noticed that the Firefox bookmarks file is over 500KB, while it displays only 22KB of text on the screen. I carry this file with me on my disk-on-key, so it is important for it to be small.
I want to parse entries like this to make them smaller:
id="rdf:#$FOQot">Knowledge Base: What Can I Do With My Axim?</a>
I want to parse it to remove the attributes add_date, last_charset, id, and others that appear in other entries. One of the attributes is a representation of the favicon, which is too big to post here! The regexes that I was trying were leaving me with things like: <a></a>, <>, <a href="http://address" ="something" ="something else">blahblah</a>. Also, I was unable to parse the WHOLE file at once; my test code would only work on one entry at a time, which is impractical.
Any ideas? Thanks.
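For illustration, here is a rough Python sketch of the kind of whole-file attribute stripping described above. It is only one possible approach, not a tested tool for real bookmarks files. The attribute name ICON for the favicon is an assumption, the function name strip_attributes is made up for this example, and it assumes every attribute value is double-quoted and contains no ">" character. The idea is to remove each unwanted attribute as a complete name="value" pair, which avoids leftovers such as ="something".

```python
import re

# Attributes to drop. add_date, last_charset and id come from the post;
# "icon" is an ASSUMED name for the favicon attribute.
DROP = ("add_date", "last_charset", "id", "icon")

# One whole attribute: leading whitespace, name, "=", double-quoted value.
# Removing name and value together avoids orphans like ="something".
ATTR_RE = re.compile(
    r'\s+(?:%s)\s*=\s*"[^"]*"' % "|".join(DROP),
    re.IGNORECASE,
)

# An opening <a ...> tag (assumes no ">" inside attribute values).
TAG_RE = re.compile(r"<a\b[^>]*>", re.IGNORECASE)


def strip_attributes(html):
    """Return html with the DROP attributes removed from every <a> tag."""
    # re.sub walks the entire string, so the whole file is handled in one
    # call; the inner substitution only touches text inside each <a> tag.
    return TAG_RE.sub(lambda m: ATTR_RE.sub("", m.group(0)), html)
```

To try it, read the whole file into one string, pass it to strip_attributes, and write the result to a new file. Keep the original as a backup, since Firefox may rely on some of the removed attributes.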
Apr 25, 2006, 11:56 #2
First of all, the bookmarks file for Firefox is XML, not HTML. Second, I'm not sure what attributes Firefox requires and which are optional. Might want to do some research on the structure of bookmarks.xml before you start hacking out stuff that looks "optional".
Apr 25, 2006, 12:45 #3