Hi,
I’ve been scraping some pages. When the data comes back as JSON or as global JavaScript variables, I have no problem reading it with PhantomJS. But today I hit a wall when I found all the data passed as an argument inside a function, like this:
<script type = "text/javascript">
(function($){ var store_locator = new TTCore.App.Modules.StoreLocator({"id":"store_locator_553d2364c96bc","name":"Store Locator","ajax_url":"http:\/\/www.somerul.com/ajax/search","map_type_control":"ROADMAP","map_info_location":"map","marker_base_url":"http:\/\/www.someurl.com\/skin\/frontend\/Sainsburys\/default\/tt\/google\/images\/markers","default_search_text":"Enter Postcode (or Town)","lat":53.083592,"long":-2.417904,"zoom_level":6,"location_count":5,"show_limit":10,"pan_control":1,"draggable":1,"scroll_wheel":0,"street_view_control":0,"zoom_control":1,"zoom_to_closest":1,"show_unused_locations":1,"show_default_on_load":0,"locations":[{"index":1,"id":1,"lat":51.42919336,"long":0.331847238,"name":"Jack And George","address":" //it just goes on like this for 600 KB ] });
</script>
In the original page source the quotation marks are escaped as well, meaning there is a backslash before each one.
Any idea how I can get into that self-invoking function and read these fields?
long
lat
name
I tried doing it with a regex, but the text is a complicated 600 KB blob full of backslashes and quotation marks, so it’s not an easy task for me.
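For reference, here is a minimal sketch of the non-regex approach I have in mind. It assumes the constructor argument is valid JSON once it has been located, and that the quotation marks appear unescaped as in the snippet above. The helper name `extractJsonArgument` and the sample string are made up for illustration; in PhantomJS the input would presumably come from `page.content`. The idea is to find the constructor call, walk forward to the matching closing brace while skipping over string contents, and hand that slice to `JSON.parse`:

```javascript
// Hypothetical helper: finds the call that starts with `marker`, then returns
// the object literal passed to it, parsed as JSON (or null if not found).
function extractJsonArgument(source, marker) {
  var start = source.indexOf(marker);
  if (start === -1) { return null; }
  var open = source.indexOf('{', start + marker.length);
  if (open === -1) { return null; }

  var depth = 0;
  var inString = false;
  for (var i = open; i < source.length; i++) {
    var ch = source.charAt(i);
    if (inString) {
      if (ch === '\\') {
        i++;                       // skip the character after a backslash
      } else if (ch === '"') {
        inString = false;          // end of the string literal
      }
    } else if (ch === '"') {
      inString = true;
    } else if (ch === '{') {
      depth++;
    } else if (ch === '}') {
      depth--;
      if (depth === 0) {
        // Braces are balanced again: this slice is the whole argument.
        return JSON.parse(source.slice(open, i + 1));
      }
    }
  }
  return null;                     // braces never balanced
}

// Made-up sample with the same shape as the page (not the real data).
var sample = '(function($){ var store_locator = new TTCore.App.Modules.StoreLocator(' +
  '{"id":"x","ajax_url":"http:\\/\\/www.example.com\\/ajax","locations":[' +
  '{"index":1,"lat":51.4,"long":0.33,"name":"Jack And George"},' +
  '{"index":2,"lat":52.1,"long":-1.5,"name":"Curly {Brace} Shop"}]}); })(jQuery);';

var config = extractJsonArgument(sample, 'new TTCore.App.Modules.StoreLocator(');
var stores = config.locations.map(function (loc) {
  return { name: loc.name, lat: loc.lat, long: loc.long };
});
console.log(stores);
```

If the real page has a backslash before every quotation mark, this would need an unescaping step first, which I have not worked out.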
Any ideas?