What is Selenium?

By its formal definition, "Selenium is a suite of tools for automating web browsers", and that is really what it is. Some people may mistake it for a pure testing tool, as I did at first, but I've learned to see Selenium as a general-purpose browser automation tool that can also be used for testing. In fact, if you find yourself doing any repetitive web-based task, you should make Selenium do it for you!

For example, I have created a project called Readminder that automates searching for the latest chapters of my favourite web comics (and/or the chapters where I left off) and sends me an email reminder for them. Check it out here!

Getting Started

Before we get started, there are three important steps in setting up Selenium:

1) Install the Selenium library for your desired programming language

Selenium currently supports 6 programming languages: Java, Python, C#, Ruby, JavaScript, and Kotlin. I've chosen Python here as it is well known for automation and scripting, and it is undeniably my favourite language :)

pip install selenium

2) Download the driver version that matches your browser

I'm choosing Chrome here since it's the most popular web browser. First, make sure you download the driver for the right version by checking your browser version at chrome://version

[Screenshot: the chrome://version page showing the installed Chrome version]

Here, you can see that the version I have is 97.0.4692.71; yours will probably be different. Don't worry if there's no exactly matching driver version listed. Choose the version closest to it. Most likely only the last 2 digits will differ, and that is fine.
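The "closest version" rule can be sketched in a few lines of Python. This is only an illustration: the helper closest_driver_version and the list of available driver versions are made up for the example, and it assumes version strings made of four dot-separated numbers like the one above.

```python
def parse_version(version):
    """Turn '97.0.4692.71' into the tuple (97, 0, 4692, 71)."""
    return tuple(int(part) for part in version.split("."))


def closest_driver_version(browser_version, available):
    """Pick the driver version closest to the browser version.

    Only drivers sharing the first three numbers (major.minor.build) are
    considered; among those, the one with the nearest last number wins.
    Returns None when no driver shares the first three numbers.
    """
    target = parse_version(browser_version)
    candidates = [v for v in available if parse_version(v)[:3] == target[:3]]
    if not candidates:
        return None
    return min(candidates, key=lambda v: abs(parse_version(v)[3] - target[3]))


# Hypothetical list of downloadable driver versions, for illustration only.
available = ["96.0.4664.45", "97.0.4692.36", "97.0.4692.71", "98.0.4758.48"]
print(closest_driver_version("97.0.4692.99", available))
```

If nothing shares the first three numbers, the function returns None instead of guessing, which is a sign that you probably need a different driver release altogether.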

3) Set up the necessary path for the drivers

How you add the location of your drivers to your PATH environment variable differs slightly depending on your system.

Bash

echo 'export PATH=$PATH:/path/to/driver' >> ~/.bashrc
source ~/.bashrc

Zsh

echo 'export PATH=$PATH:/path/to/driver' >> ~/.zshrc
source ~/.zshrc

Windows

setx PATH "%PATH%;C:\path\to\driver"

Once the above steps are done, you are now ready to start automating your browser!
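If you want to double-check the PATH step, a small sketch like the one below can help. It assumes the Chrome driver executable is named chromedriver; find_driver is just a hypothetical helper around Python's standard shutil.which.

```python
import shutil


def find_driver(name="chromedriver", search_path=None):
    """Return the full path of the driver executable, or None if not found.

    search_path defaults to the PATH environment variable.
    """
    return shutil.which(name, path=search_path)


location = find_driver()
if location:
    print(f"Driver found at {location}")
else:
    print("Driver not found on PATH; revisit the PATH setup step")
```

Remember to open a new terminal (or source your shell config) before running it, otherwise the updated PATH may not be visible yet.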

Automating your Browsing Experience

Let's have a look at the 8 fundamental steps or actions you can take when automating the browser with Selenium.

Step 1: Start the session

from selenium import webdriver
driver = webdriver.Chrome()

Step 2: Take action on the browser

driver.get("http://www.google.com")

The most common web browser action you can take is navigating to a web page. In the above example, we are opening a web page with the URL http://www.google.com. Some other useful web browser actions include:

Other Browser Navigations

driver.back()    # Clicks the browser’s back button
driver.forward()    # Clicks the browser’s forward button
driver.refresh()    # Refreshes the current browser's page

Cookies are useful for storing and loading user information. To add a cookie to the current browsing context, we can use the add_cookie function, which accepts a dictionary describing the cookie. Note that the browser must already be on a page of the domain the cookie belongs to.

driver.add_cookie({"name": "key", "value": "value"})

The table below describes the fields of a serialized cookie and whether each one is optional when passed to add_cookie.

[Screenshot: table of serialized cookie fields]
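To make the required and optional fields a bit more concrete, here is a minimal sketch of building a cookie dictionary. The optional field names (path, domain, secure, httpOnly, expiry, sameSite) are the ones I understand add_cookie to accept; make_cookie itself is a hypothetical helper, not part of Selenium.

```python
# Optional cookie fields; "name" and "value" are always required.
OPTIONAL_FIELDS = {"path", "domain", "secure", "httpOnly", "expiry", "sameSite"}


def make_cookie(name, value, **optional):
    """Build a cookie dictionary suitable for driver.add_cookie().

    Raises ValueError for any field that is not a known cookie field.
    """
    unknown = set(optional) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"Unknown cookie field(s): {sorted(unknown)}")
    return {"name": name, "value": value, **optional}


cookie = make_cookie("key", "value", path="/", secure=True)
print(cookie)
# With a live session on the cookie's domain: driver.add_cookie(cookie)
```

Catching a misspelled field name early is the main point here; the dictionary it returns is otherwise the same kind you would write by hand.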

Working with Windows and Tabs

Every window and tab has what is called a window handle, which uniquely identifies it. To get the window handle of the current window or tab, we can use the following property:

driver.current_window_handle

When we create a new tab or window, the focus shifts to the new tab or window:

driver.switch_to.new_window('tab')    # Switch to a new tab
driver.switch_to.new_window('window')    # Switch to a new window

To switch back to our original window, we can store its window handle in a variable before switching and then switch back to it.

original_window = driver.current_window_handle
# some window/tab switching logic...
driver.switch_to.window(original_window)

Finally, to close the current window or tab, we can call:

driver.close()
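The store, switch, close, and switch-back pattern above can be wrapped up so it is hard to forget a step. Below is a minimal sketch using a context manager; temporary_tab is a name I made up, and it only relies on the driver calls already shown in this section.

```python
from contextlib import contextmanager


@contextmanager
def temporary_tab(driver):
    """Open a new tab, yield control, then close it and restore focus."""
    original_window = driver.current_window_handle
    driver.switch_to.new_window("tab")
    try:
        yield driver
    finally:
        driver.close()                            # closes the new tab
        driver.switch_to.window(original_window)  # focus back on the original


# Usage with a live session (the URL is a placeholder):
# with temporary_tab(driver):
#     driver.get("https://www.selenium.dev")
```

Because the cleanup sits in a finally block, the original window gets the focus back even if the code inside the with block raises an error.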

Step 3: Request browser information

driver.title          # get title
driver.current_url    # get current URL

Besides the title and current URL, you can request several other kinds of information about the browser, including the following:

Cookies

If we had added a cookie before, we can get its details by its name through the get_cookie function like so:

driver.get_cookie("foo")    # Get details of a cookie with the name 'foo'

Window Size

size = driver.get_window_size()
width = size.get("width")
height = size.get("height")

Window Position

position = driver.get_window_position()
x1 = position.get('x')
y1 = position.get('y')
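As a small example of putting size and position together, the sketch below computes where the centre of the window sits on screen. window_center is a hypothetical helper; it only uses the two calls shown above.

```python
def window_center(driver):
    """Return the (x, y) screen coordinates of the window's center."""
    size = driver.get_window_size()
    position = driver.get_window_position()
    return (position["x"] + size["width"] // 2,
            position["y"] + size["height"] // 2)


# Usage with a live session:
# center_x, center_y = window_center(driver)
```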

Step 4: Establish Waiting Strategy

Generally, we want to establish a waiting strategy so that we only interact with an element once it has loaded and is ready to be interacted with. Let's look at the 2 types of waiting strategy.

Implicit Wait

driver.implicitly_wait(10)

The simplest strategy is the implicit wait. It tells the WebDriver to poll (repeatedly check) the DOM for up to a specified amount of time when trying to find an element or elements. If they are found sooner, execution continues without waiting out the full duration (10 seconds in our example above). This strategy is mostly used to give elements that are not immediately available time to appear.

Explicit Wait

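from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
wait = WebDriverWait(driver, timeout=10)
element = wait.until(EC.element_to_be_clickable((By.ID, 'someid')))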

The explicit wait strategy comes in when you need more flexibility in defining exactly what the WebDriver should wait for. It halts program execution (blocking the thread) until the condition you specify resolves: the WebDriver keeps checking the condition until it returns a truthy value, and raises a TimeoutException if that has not happened before the timeout (10 seconds in our example above) elapses.
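To get a feel for what "keep trying until the condition is true" means, here is a simplified polling loop in plain Python. It is only a sketch of the idea, not Selenium's actual implementation: wait_until is a made-up function, the half-second poll interval is an assumption, and it raises Python's built-in TimeoutError rather than Selenium's own exception.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Example: a condition that only becomes true on the third check.
attempts = []


def ready_on_third_try():
    attempts.append(1)
    return "ready" if len(attempts) >= 3 else None


print(wait_until(ready_on_third_try, timeout=2, poll_interval=0.1))
```

The loop returns as soon as the condition succeeds, so a generous timeout costs nothing when the page is fast.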

Note that using implicit and explicit waits together can cause unpredictable wait times, because each check made by the explicit wait can itself block for up to the implicit wait. For example, setting an implicit wait of 10 seconds and an explicit wait of 15 seconds could cause a timeout to occur after about 20 seconds instead of 15.
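The arithmetic behind that warning can be simulated without a browser. The sketch below uses a deliberately simplified model that I am assuming for illustration: the element never appears, every check of the condition blocks for the full implicit wait, and the explicit wait pauses for half a second between checks.

```python
def simulated_timeout(implicit_wait, explicit_timeout, poll_interval=0.5):
    """Return the simulated time (in seconds) at which an explicit wait
    gives up when the element never appears.

    Simplified model: every check of the condition blocks for the full
    implicit wait, then the loop sleeps for poll_interval before testing
    whether the explicit timeout has passed.
    """
    clock = 0.0
    while True:
        clock += implicit_wait      # the lookup blocks for the implicit wait
        clock += poll_interval      # pause between checks
        if clock > explicit_timeout:
            return clock


print(simulated_timeout(implicit_wait=0, explicit_timeout=15))   # explicit only
print(simulated_timeout(implicit_wait=10, explicit_timeout=15))  # mixed
```

In this model the explicit-only wait gives up right after its 15 second timeout, while the mixed case overshoots by several seconds because the second check is still blocked on the implicit wait when the explicit timeout passes.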

Step 5: Find an element

from selenium.webdriver.common.by import By
search_box = driver.find_element(By.NAME, "q")
search_button = driver.find_element(By.NAME, "btnK")

Before interacting with an element, we first have to find it. We do this by specifying a locator strategy and the value to match against. In the above example, we locate elements by their name attribute. The full set of locator strategies available in the By class is as follows:

ID = "id"
XPATH = "xpath"
LINK_TEXT = "link text"
PARTIAL_LINK_TEXT = "partial link text"
NAME = "name"
TAG_NAME = "tag name"
CLASS_NAME = "class name"
CSS_SELECTOR = "css selector"

There is also the find_elements() method if you want to find more than one element by the specified attribute. This method will return the list of elements found.
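One handy difference between the two methods: find_element raises an error when nothing matches, while find_elements simply returns an empty list. A minimal sketch that takes advantage of this is shown below; first_or_none is a hypothetical helper name.

```python
def first_or_none(driver, by, value):
    """Return the first element matching the locator, or None if none match."""
    matches = driver.find_elements(by, value)
    return matches[0] if matches else None


# Usage with a live session:
# from selenium.webdriver.common.by import By
# search_box = first_or_none(driver, By.NAME, "q")
# if search_box is None:
#     print("No search box on this page")
```

This is useful when an element is optional on the page and you would rather check for None than handle an exception.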

Step 6: Take action on element

from selenium.webdriver.common.keys import Keys
search_box.clear()
search_box.send_keys("Selenium")
search_button.click()    # alternatively, submit with search_box.send_keys(Keys.ENTER)

There are 4 basic types of commands that can be executed on an element.

send_keys()

Types the provided keys into an editable element.

click()

A click command is executed on the center of the element.

clear()

Resets the content of an element.

select

There is a Select class that makes acting on <select> elements easier. First, you create a Select object from a WebElement that references a <select> element.

from selenium.webdriver.support.select import Select
select_element = driver.find_element(By.ID, 'selectElementID')
select_object = Select(select_element)

Then use one of the following ways to select an option:

1. Select an `<option>` based upon the `<select>` element's internal index

select_object.select_by_index(1)

2. Select an `<option>` based upon its value attribute

select_object.select_by_value('value')

3. Select an `<option>` based upon its text

select_object.select_by_visible_text('text')

Or if you're just interested in what <option> elements the <select> element contains and which <option> elements are selected:

all_available_options = select_object.options
all_selected_options = select_object.all_selected_options
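The three ways of selecting an option can be combined into one small function if you find yourself switching between them. This is just a sketch: choose_option is a made-up helper that forwards to the Select methods shown above and returns the text of whatever ends up selected.

```python
def choose_option(select_object, *, index=None, value=None, text=None):
    """Select one option using exactly one of index, value, or visible text.

    Returns the text of all options that are selected afterwards.
    """
    given = [arg for arg in (index, value, text) if arg is not None]
    if len(given) != 1:
        raise ValueError("Pass exactly one of index, value, or text")
    if index is not None:
        select_object.select_by_index(index)
    elif value is not None:
        select_object.select_by_value(value)
    else:
        select_object.select_by_visible_text(text)
    return [option.text for option in select_object.all_selected_options]


# Usage with a live session:
# selected = choose_option(select_object, text="text")
```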

Step 7: Request element information

driver.find_element(By.NAME, "q").get_attribute("value")

Elements contain information that we can extract. Notice in the above example how we locate the element and request its value attribute in a single expression. Oftentimes, we need to relocate the element before requesting information from it, as the DOM may have changed since we first located it. There are 4 common categories of element information we can query:

1. Attributes and Properties

Using get_attribute and get_property methods, we can fetch our desired attributes and properties from an element.

# link_text, img and search_input are WebElements located earlier with find_element
link_text.get_attribute("href")
img.get_attribute("src")
search_input.get_property("value")

2. Text Content

text = driver.find_element(By.CSS_SELECTOR, "h1").text

3. CSS Value

cssValue = driver.find_element(By.LINK_TEXT, "See More").value_of_css_property('color')

4. States

isDisplayed = driver.find_element(By.ID, 'textLbl').is_displayed()

is_displayed checks if the element is visible or not.

isSelected = driver.find_element(By.CSS_SELECTOR, "input[type='checkbox']").is_selected()

is_selected checks if the element is selected or not.

isEnabled = driver.find_element(By.NAME, 'btnK').is_enabled()

is_enabled checks if the element is enabled or disabled. Each of these methods returns a boolean value: True if the check passes and False otherwise.
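These checks are often combined before acting on an element. A minimal sketch, with is_interactable being a hypothetical helper name:

```python
def is_interactable(element):
    """True when the element is both visible and enabled."""
    return element.is_displayed() and element.is_enabled()


# Usage with a live session:
# button = driver.find_element(By.NAME, "btnK")
# if is_interactable(button):
#     button.click()
```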

Step 8: End the session

driver.quit()

This quits the browser, closing all of its windows and tabs, and ends the automation session.
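It is easy to leave a browser running if your script crashes before it reaches quit(). One way to guard against that is a small context manager like the sketch below; browser_session is a name I made up, and create_driver stands for any function that starts a session, such as webdriver.Chrome.

```python
from contextlib import contextmanager


@contextmanager
def browser_session(create_driver):
    """Start a session with create_driver() and always quit it afterwards."""
    driver = create_driver()
    try:
        yield driver
    finally:
        driver.quit()   # runs even if the automation code raises


# Usage with Selenium installed and a driver on PATH:
# from selenium import webdriver
# with browser_session(webdriver.Chrome) as driver:
#     driver.get("http://www.google.com")
#     print(driver.title)
```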

And that's it! Using these 8 simple steps, you should be able to automate just about anything on your browser with Selenium in Python!

Hope this guide has been helpful and as always, thanks for reading!

Beginner's Guide to Selenium with Python