Web scraping using Python + Chrome

On macOS (formerly OS X), install ChromeDriver with Homebrew:

brew install chromedriver
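Selenium looks for the chromedriver executable on your PATH. To confirm that the Homebrew install is visible, a quick check with the standard library's shutil.which is enough. This is a minimal sketch; chromedriver_path is a hypothetical helper name, not part of Selenium.

```python
import shutil
import subprocess


def chromedriver_path():
    """Return the full path of the chromedriver executable on PATH, or None."""
    return shutil.which("chromedriver")


if __name__ == "__main__":
    path = chromedriver_path()
    if path is None:
        print("chromedriver not found on PATH")
    else:
        # `chromedriver --version` prints the driver version, handy for
        # comparing it against the installed Chrome version.
        result = subprocess.run([path, "--version"], capture_output=True,
                                text=True, timeout=10)
        print(path, result.stdout.strip())
```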

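Then install Selenium for Python 3 via pip:

pip3 install -U selenium

If you need to install it globally (system-wide), run the same command with sudo: sudo pip3 install -U selenium

Selenium 4.6 and later also include Selenium Manager, which can locate or download a matching chromedriver automatically, so the Homebrew step is optional with recent versions.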
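Because pip install -U pulls the newest release, it helps to know which Selenium version you ended up with: Selenium 4.3 removed the old find_element_by_* helpers in favour of find_element(By.ID, ...) and its siblings. A small check using only the standard library (importlib.metadata needs Python 3.8+); selenium_version and needs_by_locators are hypothetical helper names.

```python
from importlib.metadata import PackageNotFoundError, version


def selenium_version():
    """Return the installed Selenium version string, or None if not installed."""
    try:
        return version("selenium")
    except PackageNotFoundError:
        return None


def needs_by_locators(ver):
    """True if this release no longer has find_element_by_* (removed in 4.3)."""
    major, minor = (int(part) for part in ver.split(".")[:2])
    return (major, minor) >= (4, 3)


if __name__ == "__main__":
    ver = selenium_version()
    if ver is None:
        print("selenium is not installed")
    elif needs_by_locators(ver):
        print("selenium %s: use driver.find_element(By.ID, ...)" % ver)
    else:
        print("selenium %s: find_element_by_id(...) is still available" % ver)
```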

To test, open a terminal, start the Python 3 interpreter by typing python3, and try:

>>> from selenium import webdriver
>>> driver = webdriver.Chrome()
>>> driver.get("https://nbari.com")
>>>

A Chrome window should open and load the page. Call driver.quit() when you are done to close the browser and stop chromedriver.

Scraping

The following example will:

  1. Fill in a search form.
  2. Parse the links in the results.
  3. Open each link.
  4. Get the text data from a div.
  5. Go back to step 2 until all links have been processed.
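
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# XPath of the company links in the results table
LINKS_XPATH = "//*[@id='ctl00_ContentPlaceHolder1_GridView1']/tbody/tr/td/a"

driver = webdriver.Chrome()
# wait up to 10 seconds for elements to appear after each page load
driver.implicitly_wait(10)

# 1. fill in the search form and submit it
driver.get("https://search.cro.ie/company/CompanySearch.aspx")
input_element = driver.find_element(By.ID, "ctl00_ContentPlaceHolder1_textCompanyName")
input_element.send_keys("limo")
input_element.send_keys(Keys.ENTER)

# 2. parse the links of the results
links = driver.find_elements(By.XPATH, LINKS_XPATH)

data = {}
for i in range(len(links)):
    # re-find the links on every pass: elements from an earlier page load
    # become stale after navigating away and back
    links = driver.find_elements(By.XPATH, LINKS_XPATH)
    idx_name = links[i].text
    # 3. open the link
    links[i].click()
    # 4. get the text of the details div
    div = driver.find_element(By.ID, "companyDetails")
    data[idx_name] = div.text
    # 5. go back to the results page
    driver.back()

driver.quit()

for key, value in sorted(data.items()):
    print("-" * 64)
    print("%s\n%s" % (key, value))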
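Pages loaded by the form submit, a click, or a back navigation may not be ready when the next find call runs. Selenium handles this with waits: WebDriverWait polls a condition until it returns something truthy or a timeout expires. The sketch below reproduces that polling idea in plain Python so the mechanics are visible; wait_until and fake_find_links are hypothetical names, and the fake function merely stands in for a driver lookup that finds nothing until the page has loaded.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f seconds" % timeout)
        time.sleep(poll)


# Stand-in for driver.find_elements(...): empty for the first two calls,
# as if the results page were still loading.
attempts = {"n": 0}


def fake_find_links():
    attempts["n"] += 1
    return ["link-a", "link-b"] if attempts["n"] >= 3 else []


links = wait_until(fake_find_links, timeout=5, poll=0.01)
print(links)  # ['link-a', 'link-b']
```

With Selenium itself you would write something like WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, xpath))), importing WebDriverWait from selenium.webdriver.support.ui and expected_conditions as EC from selenium.webdriver.support.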
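If you prefer to parse the results outside the browser, the HTML of the current page is available as driver.page_source, and the standard library's html.parser can pull the link texts and hrefs out of the results table. This is an alternative technique, not what the example above does; it is a minimal sketch that assumes the table id from the example, and TableLinkParser, SAMPLE_HTML and the detail.aspx URLs are hypothetical, made up for illustration.

```python
from html.parser import HTMLParser


class TableLinkParser(HTMLParser):
    """Collect (text, href) pairs for <a> tags inside the table with a given id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.table_depth = 0   # > 0 while inside the target table (counts nested tables)
        self.in_link = False   # True between <a> and </a> inside the table
        self.href = None
        self.text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "table" and (self.table_depth or attrs.get("id") == self.table_id):
            self.table_depth += 1
        elif tag == "a" and self.table_depth:
            self.in_link, self.href, self.text = True, attrs.get("href"), []

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth:
            self.table_depth -= 1
        elif tag == "a" and self.in_link:
            self.links.append(("".join(self.text).strip(), self.href))
            self.in_link = False

    def handle_data(self, data):
        if self.in_link:
            self.text.append(data)


# Hypothetical results page; in practice you would feed driver.page_source.
SAMPLE_HTML = """
<table id="ctl00_ContentPlaceHolder1_GridView1">
  <tr><td><a href="detail.aspx?id=1">LIMO ONE LTD</a></td></tr>
  <tr><td><a href="detail.aspx?id=2">LIMO TWO LTD</a></td></tr>
</table>
<a href="/help">Help</a>
"""

parser = TableLinkParser("ctl00_ContentPlaceHolder1_GridView1")
parser.feed(SAMPLE_HTML)
print(parser.links)
# [('LIMO ONE LTD', 'detail.aspx?id=1'), ('LIMO TWO LTD', 'detail.aspx?id=2')]
```

The link outside the table is ignored. If the hrefs on the real page turn out to be plain URLs, you could open each one with driver.get() instead of clicking and going back; if they are not, clicking as in the example remains the safer route.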