Problems with scraping in Python and getting tag data

Requirements:

Parse Nmap output to generate a list of hosts that provide web services.
Use that list to get the landing page for each protocol.
From the output for each protocol, produce lines of the form IP of the host: URL of the host.
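As a rough illustration of these three requirements, here is a minimal sketch that reads Nmap's grepable output (the file written by `nmap -oG`) and prints one "IP: URL" line per open web port. The function names, the `WEB_PORTS` set, and the sample addresses are assumptions made for the example, not part of the original setup.

```python
# Sketch: turn grepable Nmap output into "IP: URL" lines.
# WEB_PORTS and all function names here are illustrative assumptions.

WEB_PORTS = {80, 443, 8080, 8443}


def parse_gnmap(text):
    """Return [(ip, port), ...] for open web ports found in -oG output."""
    results = []
    for line in text.splitlines():
        # Only "Host: ... Ports: ..." lines carry port information.
        if not line.startswith("Host:") or "Ports:" not in line:
            continue
        ip = line.split()[1]
        # Keep the Ports field only; later fields are tab-separated.
        ports_field = line.split("Ports:", 1)[1].split("\t", 1)[0]
        for entry in ports_field.split(","):
            parts = entry.strip().split("/")
            if len(parts) >= 2 and parts[0].isdigit() and parts[1] == "open":
                port = int(parts[0])
                if port in WEB_PORTS:
                    results.append((ip, port))
    return results


def to_url(ip, port):
    """Build a URL, guessing the scheme from the port number."""
    scheme = "https" if port in (443, 8443) else "http"
    is_default = (scheme, port) in (("http", 80), ("https", 443))
    return f"{scheme}://{ip}/" if is_default else f"{scheme}://{ip}:{port}/"


def format_lines(pairs):
    """Format (ip, port) pairs as 'IP: URL' strings."""
    return [f"{ip}: {to_url(ip, port)}" for ip, port in pairs]


SAMPLE = (
    "Host: 192.0.2.10 ()\tPorts: 22/open/tcp//ssh///, 80/open/tcp//http///\n"
    "Host: 192.0.2.11 (web.example)\tPorts: 443/open/tcp//https///\n"
)

for line in format_lines(parse_gnmap(SAMPLE)):
    print(line)
```

Guessing the scheme from the port number is only a heuristic; a service on 8080 may well speak HTTPS, so the result should be treated as a starting point.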

Previously Accomplished Tasks:

I have generated a list of hosts by simple parsing and by listening to traffic. I have also identified the port numbers of the individual hosts.

What I am Doing and the Problem I am Facing:

I am trying to achieve the following (a logical step):
Use the list with the IP as the identifier, and query the HTTP listener on port 80. Entering the IP address should return the website's default landing page (index.html).

I have also used cURL and Wget and worked with Scrapy, but I am not able to get the landing page (index.html) in scripted form. I need to get the landing page using a single IP and one port. After that, I will write a larger function with multiple parameters that runs the script on all the ports (which will be simpler, since it only means calling the function repeatedly with different parameters).
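For the single IP and single port case, one possible sketch uses only the Python standard library. The names `build_url` and `fetch_landing_page`, the User-Agent string, and the example address are assumptions for illustration.

```python
# Sketch: fetch the default landing page for one IP and one port.
# Function names and the User-Agent value are illustrative assumptions.
import http.client
import urllib.error
import urllib.request


def build_url(ip, port=80, scheme="http"):
    """Build the URL of the default landing page for a host and port."""
    return f"{scheme}://{ip}:{port}/"


def fetch_landing_page(ip, port=80, scheme="http", timeout=5):
    """Return the landing page as text, or None if the request fails."""
    url = build_url(ip, port, scheme)
    request = urllib.request.Request(
        url, headers={"User-Agent": "landing-page-check/0.1"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except (urllib.error.URLError, http.client.HTTPException, OSError) as exc:
        print(f"{url}: request failed ({exc})")
        return None


if __name__ == "__main__":
    # 192.0.2.10 is a documentation address; replace it with a real host.
    html = fetch_landing_page("192.0.2.10", 80, timeout=3)
    if html is not None:
        print(html[:200])
```

Running this over every host and port is then a loop that calls `fetch_landing_page` with different arguments. Note that a server hosting several sites may return a different page for a bare IP than for its hostname, because the `Host` header then carries the IP.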

Solution I have tried:

I have tried the Nmap Scripting Engine (NSE). It retrieves a few properties (or tags) from the page, but I still can't get what I need.

Background:

I am working with a military NGO. Right now, we are helping soldiers who have been transferred to a new base find a home near the base (the information is provided free of charge as a welfare service). We are currently working in collaboration with the militarybases administration. Once this is done, we will go on to provide other important information (such as schools, utility stores, and health centers near the bases).

Another Issue/Simple Example:

I can't get multiple tags when parsing the directory pages of the militarybases (MB) website.
So far, I have used Nmap, cURL, Wget and Soap to get data from the tags. I am not able to fully parse the MB directory pages, which are the basis of our overall project (gathering basic information about the bases).
Example:
I am trying to parse the page http://militarybases.co/directory/fort-meade-army-base-in-odenton-md/, but I can't get the information from the 'Get directions' section, which has the following HTML:
[google-map-v3 shortcodeid="TO_BE_GENERATED" width="600" height="400" zoom="8" maptype="hybrid" mapalign="left".... bubbleautopan="true" distanceunits="miles" showbike="false" showtraffic="false" showpanoramio="true"]

Is this because the page is broken, or is there a problem with my implementation?
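One thing worth checking: the bracketed snippet above looks like plain text in the page rather than an HTML element, so a tag-based parser would have nothing to select. If that is the case, a regular expression over the raw page text may be enough. The sketch below assumes the text appears in the page exactly in the bracketed `key="value"` form shown above; the function name is hypothetical.

```python
# Sketch: pull key="value" attributes out of bracketed google-map-v3 text.
# Assumes the snippet appears literally in the page source as shown above.
import re

SHORTCODE_RE = re.compile(r"\[google-map-v3\b([^\]]*)\]")
ATTR_RE = re.compile(r'([\w-]+)="([^"]*)"')


def extract_map_shortcodes(page_text):
    """Return one dict of attributes per google-map-v3 block found."""
    return [
        dict(ATTR_RE.findall(match.group(1)))
        for match in SHORTCODE_RE.finditer(page_text)
    ]


sample = '<p>[google-map-v3 width="600" height="400" zoom="8" maptype="hybrid"]</p>'
for attrs in extract_map_shortcodes(sample):
    print(attrs["width"], attrs["height"], attrs["maptype"])
```

The `page_text` argument can be the string returned by whatever tool downloads the page; values containing a closing bracket would need a more careful pattern.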

Contact Information:
Dovie Caminiti - Assistant Manager at HelpMilitary Organization
191 E Upper Wacker Dr #550, Chicago, IL 60601, United States
Contact Number: 312-212-3570 | Fax: 312-212-3569
Contact Email: Dovie@helpmilitarydotorg
