webscraping
October 27, 2016
Webscraping using Python + Chrome
On macOS, install chromedriver with Homebrew:
brew install chromedriver
Install selenium via pip:
pip install -U selenium
If it needs to be installed globally:
sudo pip install selenium
To test the installation, open a terminal, start python, and try the following (a Chrome window should open and load the page):
Python 2.7.12 (default, Oct 11 2016, 05:20:59)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from selenium import webdriver
>>> driver = webdriver.Chrome()
>>> driver.get("https://nbari.com")
>>>
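The same check can also be run from a script, in which case the browser should be closed once the script is done. The sketch below is one way to do that, not the only one: `browser` and `factory` are hypothetical names introduced here, not part of selenium, while `driver.quit()` is the WebDriver call that ends the session.

```python
from contextlib import contextmanager


@contextmanager
def browser(factory):
    """Create a driver with factory() and always quit it on exit."""
    driver = factory()
    try:
        yield driver
    finally:
        # quit() ends the WebDriver session, even if the block raised.
        driver.quit()


# Usage (assumes selenium and chromedriver are installed):
#   from selenium import webdriver
#   with browser(webdriver.Chrome) as driver:
#       driver.get("https://nbari.com")
#       print(driver.title)
```

Passing the driver class in as `factory` keeps the helper independent of the browser, so the same code would work with another WebDriver class.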
Scraping
The following example will:
- Fill in and submit a search form.
- Parse the links in the results.
- Open each link.
- Get the text data from a div.
- Go back to the results page and repeat from the second step until all links are parsed.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()
driver.get("https://search.cro.ie/company/CompanySearch.aspx")

# Fill in the company name and submit the form.
inputElement = driver.find_element_by_id("ctl00_ContentPlaceHolder1_textCompanyName")
inputElement.send_keys("limo")
inputElement.send_keys(Keys.ENTER)

links_xpath = "//*[@id='ctl00_ContentPlaceHolder1_GridView1']/tbody/tr/td/a"
links = driver.find_elements_by_xpath(links_xpath)
data = {}
for i in xrange(len(links)):
    # Find the links again: the old elements go stale after navigating back.
    links = driver.find_elements_by_xpath(links_xpath)
    idx_name = links[i].text
    links[i].click()
    div = driver.find_element_by_id("companyDetails")
    data[idx_name] = div.text
    driver.execute_script("window.history.go(-1)")

# Print the collected details sorted by company name.
for key, value in sorted(data.iteritems()):
    print "-" * 64
    print "%s\n%s" % (key, value)
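The loop in the script above can also be wrapped in functions so that the collecting and the printing are separate steps. This is only a sketch under a few assumptions: `collect_details` and `format_report` are hypothetical names, the generic `find_elements` / `find_element` calls stand in for the `find_elements_by_xpath` / `find_element_by_id` shortcuts, and `driver.back()` stands in for the `window.history.go(-1)` script call. It is written to run on both Python 2.7 and Python 3.

```python
from __future__ import print_function  # keeps print() working on Python 2.7

LINKS_XPATH = "//*[@id='ctl00_ContentPlaceHolder1_GridView1']/tbody/tr/td/a"


def collect_details(driver, links_xpath=LINKS_XPATH, details_id="companyDetails"):
    """Visit every result link and return {link text: details text}."""
    data = {}
    # "xpath" and "id" are the string values of By.XPATH and By.ID.
    total = len(driver.find_elements("xpath", links_xpath))
    for i in range(total):
        # Re-locate the links on every pass: elements found before a
        # navigation are stale once the results page is loaded again.
        links = driver.find_elements("xpath", links_xpath)
        name = links[i].text
        links[i].click()
        data[name] = driver.find_element("id", details_id).text
        driver.back()
    return data


def format_report(data, width=64):
    """Return the entries sorted by name, each preceded by a dashed line."""
    blocks = []
    for name, details in sorted(data.items()):
        blocks.append("-" * width)
        blocks.append("%s\n%s" % (name, details))
    return "\n".join(blocks)


# Usage, after the search form has been submitted as in the script above:
#   print(format_report(collect_details(driver)))
```

Because `collect_details` only relies on the `driver` object it is given, it can be exercised with a stand-in object before pointing it at a real browser.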