Watir-Webdriver: Control the Browser

Key Takeaways

Watir-WebDriver is a Ruby gem that enables browser automation, simulating real user actions like clicking and typing, which is essential for comprehensive web application testing.
Unlike its predecessor Watir, which only supports Internet Explorer, Watir-WebDriver extends support to major browsers including Chrome, Firefox, and Safari, leveraging WebDriver for more realistic user interaction emulation.
The syntax of Watir-WebDriver is designed to be intuitive and straightforward, mirroring human behavior on web pages by interacting with HTML elements rather than relying on screen coordinates.
Installation and usage are straightforward: install the gem, write simple Ruby scripts to control browser actions, and run these scripts to automate web interactions.
Watir-WebDriver integrates seamlessly with Selenium WebDriver, enhancing its capabilities with a more object-oriented and consistent syntax, although knowledge of Selenium is beneficial for advanced usage.
The tool’s flexibility is demonstrated through comprehensive support for various web elements and actions, from filling out forms and handling pop-ups to dealing with dynamic content and taking screenshots.

Watir-WebDriver (Watir is short for Web Application Testing in Ruby) is a Ruby gem which allows you to automate your browser (make it click a button, submit a form, wait for some text to appear before continuing, and so on). With its help, a real user can be simulated, allowing you to automate the full stack testing of your web application.

Watir-WebDriver syntax is very clean and inspired by similar frameworks in other languages (Watij for Java and WatiN for C#). It is also a well-maintained gem with over 68 releases.

Be aware that Watir (classic) and Watir-WebDriver (both separate Ruby gems) are not the same thing! Watir only supports Internet Explorer, while Watir-WebDriver supports Chrome, Firefox and Safari as well. Think of Watir-WebDriver as Watir 2.0, or as Watir (classic) + WebDriver + some additional features. WebDriver was started by Google to allow browser automation tools to get closer to simulating real user behavior. Even better, all major browser automation frameworks have implemented it.

Selenium 2.0, which is a major part of Watir-WebDriver, also describes itself with this simple formula: Selenium 1.0 + WebDriver = Selenium 2.0.

Let’s (Automatically) Fill Out This Ruby Survey!

Your job is to go to this web page and fill it out automatically with Watir-WebDriver. (You can also download it from here, unzip it, and replace the ‘browser.goto’ argument with the local path to the form.html file.)

To get started, install Watir-WebDriver by running gem install watir-webdriver. Also, make sure you have Firefox installed (we’ll be using it for this example).

Save the following into a new watir_script.rb file and run it:


require 'watir-webdriver'
browser = Watir::Browser.new :firefox # should open a new Firefox window
browser.goto 'http://nitrowriters.com/form/form.html' # or type the local path to your downloaded copy
browser.text_field(:id => 'my_text_field').set 'Yes!'
browser.textarea(:class => 'element textarea medium').set 'It was a long time ago, I do not remember'
browser.radio(:name => 'familiar_rails', :value => '1').click # yes, I'm very familiar
sleep 2 # puts the entire program to sleep for 2 seconds, so you can see the change
browser.radio(:name => 'familiar_rails', :value => '3').click # actually, just a bit...
browser.text_field(:name => 'favorite_1').set 'Yukihiro' # the creator of Ruby
browser.text_field(:id => 'favorite_2').set 'Matsumoto' # is my favorite Ruby person!
browser.checkbox(:index => 1).click # I like the TDD culture
browser.checkbox(:index => 2).click # And Matz!
sleep 2 # puts the entire program to sleep for 2 seconds, so you can see the change
browser.checkbox(:index => 1).click # Oh well, I like only Matz..
browser.select_list(:id => 'usage').select 'Less than a year'
browser.select_list(:id => 'usage').select_value '2' # Changed my mind
# Here I entered C:/watir.txt because I had such a file inside my C: directory. Please be sure
# to enter a valid path to a file, or your script will report 'No such file or directory' error
browser.file_field.set 'C:/watir.txt' # Change this path to any path to a local file on your computer
puts browser.p(:id => 'my_description').text

I highly recommend you don’t continue with this article until you’ve run this script successfully. There’s nothing more motivating than imagining the possibilities of how you can automate your browser after seeing your first Watir-WebDriver script in action. Read on after you’re done.

How a Real User Interacts With a Web Page

Go and open that web page again. Fill it in manually. Put any values you want in it. Finished? Good. Now, ask yourself: “How did I interact with this page? What did I do?” If you observe yourself carefully, you do 2 things when trying to interact with this or any other web page on the Internet:

a) Visually identify the part of the page you want to interact with (a text box, a link, the title of the page, and so on).

b) Perform some action on that part of the page. Most of the time, you click on that part, but sometimes you also enter text. These 2 things are what you do 99% of the time: click (with your mouse) and type (with your keyboard).

You cannot do b) without a). If you don’t visually identify what element you want to click on or type something in, then it’s pointless.

Previously, we’ve mentioned that Watir-WebDriver’s primary job is to simulate a real user. We know how a real web user interacts with a real web page (by observing ourselves filling out the form manually), so let’s take a look at how the above Watir-WebDriver script works.

Clean & Predictable Syntax

Do a quick analysis of the first 2 lines of the code:


require 'watir-webdriver'
browser = Watir::Browser.new :firefox

By running just the first 2 lines, a new browser window is launched. The second line is the equivalent of a real user opening a Firefox window.

Line 3 starts giving orders to that newly launched browser window. The script tells it to go to a specific web page, which is the equivalent of a real web user typing something into the browser address bar and hitting ‘Enter’:


browser.goto 'http://nitrowriters.com/form/form.html'
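One thing the script does not do is close the window it opens, and if any line raises an error the window is left behind as well. Below is a minimal sketch of one way to guarantee cleanup. It assumes only that the browser object responds to close, which Watir::Browser does. The name with_browser is a hypothetical helper, not part of Watir, and the browser is created by a lambda passed in so the helper itself does not depend on the gem.

```ruby
# Hypothetical helper (not part of Watir): yields a browser to the block
# and always closes it afterwards, even if the block raises an error.
def with_browser(factory)
  browser = factory.call
  yield browser
ensure
  # browser is nil if the factory itself failed, so guard the call.
  browser.close if browser
end

# Intended usage with the gem installed (left as a comment so the sketch
# can be loaded without launching a real browser):
#
#   require 'watir-webdriver'
#   with_browser(-> { Watir::Browser.new :firefox }) do |browser|
#     browser.goto 'http://nitrowriters.com/form/form.html'
#   end
```

The rest of this article keeps the plain top-level script for readability; a wrapper like this mostly pays off once scripts get longer.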

Things should start becoming clear starting on line 4:


browser.text_field(:id => 'my_text_field').set 'Yes!'

Remember a) and b) on how a real user interacts with a web page? Here’s where the Watir-WebDriver syntax shines. A real surfer would first identify the ‘part’ of a specific page to interact with, followed by performing some action on that ‘part’. How do we tell Watir-WebDriver, in computer terms, where that part is located on the web page? Some browser automation tools use screen coordinates (we’ll later see why this is a very unreliable approach). Watir-WebDriver, however, uses HTML elements, relying on a basic truism:

An HTML web page, whether it’s the SitePoint homepage or the form page we’re working with, is essentially just a collection of HTML elements.

What any user, including Watir-WebDriver, does is interact with those elements. Click on buttons (which are HTML elements), enter text in a text field (which is also an HTML element), click on links (links are HTML elements too), and so on. Whatever action you perform on a page, it’s done on some HTML element which is part of a collection we call a web page. The creators of Watir-WebDriver, knowing this, decided to create a fairly intuitive syntax to allow you to simulate almost anything a real user would do on a particular webpage. The syntax structure is:


[browser-instance].[html-element-tag-name](with specific attributes).[action]

You can see this starting at line 4:


browser.text_field(:id => 'my_text_field').set 'Yes!'
# Dear Watir, please find the text field HTML element with the ID attribute of 'my_text_field' and type 'Yes!' into it

browser is the name of the browser instance variable created on line 2. This variable can be named anything:


happy_browser = Watir::Browser.new :firefox

as long as you propagate the rename to each subsequent line:


happy_browser.element_name(attributes).action

text_field(:id => 'my_text_field') is the syntax Watir-WebDriver uses to identify text fields (you can see the syntax Watir-WebDriver uses for various HTML elements here). Inside the parentheses, you have key-value pairs where the key is a particular attribute and the value is the value of that attribute.

You can specify more than 1 key-value pair:


browser.radio(:name => 'familiar_rails', :value => '1').click

Finally, set is the action to perform on that HTML element. In the text_field case, set means to set the text to ‘Yes!’.
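To make the key-value idea concrete, here is a toy model in plain Ruby. It is not how Watir-WebDriver is implemented (the real gem queries the live page through WebDriver); PAGE, find_all and find_first are made-up names used only for illustration. Each element is represented as a hash of its attributes, and a locator matches an element when every key-value pair agrees.

```ruby
# Toy stand-in for a page: each element is a hash of its attributes.
PAGE = [
  { tag: 'input', type: 'radio', name: 'familiar_rails', value: '1' },
  { tag: 'input', type: 'radio', name: 'familiar_rails', value: '2' },
  { tag: 'input', type: 'radio', name: 'familiar_rails', value: '3' }
].freeze

# Return every element whose attributes include all the given pairs.
def find_all(page, attributes)
  page.select do |element|
    attributes.all? { |key, value| element[key] == value }
  end
end

# Return the first match, or nil when nothing matches.
def find_first(page, attributes)
  find_all(page, attributes).first
end

# One pair is ambiguous here: all three radio buttons share the name.
puts find_all(PAGE, { name: 'familiar_rails' }).length

# Two pairs narrow the match down to a single element.
puts find_first(PAGE, { name: 'familiar_rails', value: '3' })[:value]
```

This is why the radio button line passes both :name and :value: the name alone matches several elements, and the extra pair makes the match unique.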

Your New Best Friend: Right-Click Inspect

In order to fundamentally understand how the script works, it’s a good exercise to go through the thought process of how it was made in the first place.

By now, you should understand what the first 3 lines do: open a new browser window and point it at our form, ready to take our commands!

The next line is:


browser.text_field(:id => 'my_text_field').set 'Yes!'

browser is just the name of the variable representing the browser. What about the second .text_field bit?

I right-clicked on the element I wanted to interact with (the text box under “Are you a big Ruby fan?”) and then clicked on Inspect (depending on the browser, this could be called ‘Inspect element’ or just ‘Inspect’). The resulting inspection window shows:


  <input id="my_text_field" name="my_text_field" class="element text medium" maxlength="255" value="" type="text">

The <input type="text"/> element can be found in this table mapped to the text_field method in Watir-WebDriver. Cool, but what if there were more than 1 text_field element on the page? If you just wrote browser.text_field.set 'Yes!' and there were 3 text fields, then Watir-WebDriver would automatically select the first such element. Often, this is not what we want. We need a way to distinguish HTML tags so Watir-WebDriver can always select the right one. This is where HTML attributes come in.

If the element you want to work with has an ID attribute, you’re lucky! According to the W3C, the value of an element ID must be unique within the HTML document. In our case, the <input type="text"/> tag has an ID, so I’ve used that. Finally, I call the set method on that element, which performs an action on it: typing text into it (we’ll talk more about actions later).

Your final job is to find a unique way to identify a particular HTML element on a particular page. You don’t want Watir-WebDriver to select the wrong element or not find the element at all (in which case, your entire program would crash).

Let’s go to the next line:


browser.textarea(:class => 'element textarea medium').set 'It was a long time ago, I do not remember'

Same concept. The difference between <input type="text"/> and <textarea></textarea> is that the latter supports multiple lines. You could replace textarea with text_field in this line and the code would work, but Watir-WebDriver would issue this warning: "Locating textareas with '#text_field' is deprecated. Please, use '#textarea' method instead.".

Radio Buttons


browser.radio(:name => 'familiar_rails', :value => '1').click
sleep 2
browser.radio(:name => 'familiar_rails', :value => '3').click

Here, I’ve used 2 attributes to distinguish the radio buttons. Instead of using click as an action method, you can also replace click with set which sounds more user-friendly but does the same thing. You could also use the set? method which would return true or false depending on whether that radio button has been selected:


browser.radio(:name => 'familiar_rails', :value => '1').click
browser.radio(:name => 'familiar_rails', :value => '1').set? #=> true
browser.radio(:name => 'familiar_rails', :value => '3').set # same as click
browser.radio(:name => 'familiar_rails', :value => '1').set? #=> false

Checkboxes

Here, the script interacts with some checkboxes:


browser.checkbox(:index => 1).click
browser.checkbox(:index => 2).click
sleep 2 # puts the entire program to sleep for 2 seconds
browser.checkbox(:index => 1).click

If, say, there are 10 checkbox elements on the page with the same class attribute of my_checkbox, doing something like browser.checkbox(:class => 'my_checkbox').click would select the first one it encounters.

What if we want to select the second one? Use the :index attribute.

The first line in the above snippet, browser.checkbox(:index => 1).click, will click on the second checkbox element on the page. The :index attribute accepts integers, where 0 is the first element. The second line in the above snippet, for example, will click on the third checkbox element it finds.

What does the fourth line do? It unchecks the second element (remember, in the first line we clicked on the same element, meaning we “checked” it). There’s a clearer way to do this in Watir-WebDriver:


browser.checkbox(:index => 1).set
browser.checkbox(:index => 2).set
sleep 2 # puts the entire program to sleep for 2 seconds
browser.checkbox(:index => 1).clear

Like with radio buttons, set checks the box. Unlike click, which always toggles, set leaves an already checked box alone. clear is the reverse: it checks whether the checkbox is “checked” and unchecks it if needed.

Also, you can use set? to see if the checkbox is already checked.
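The difference between toggling and stating intent can be sketched with a toy checkbox in plain Ruby. ToyCheckbox is a made-up class for illustration, not Watir’s own. It only models the idea that click flips the state, while set and clear leave the box in a known state no matter where it started.

```ruby
# Toy model of a checkbox (illustration only, not Watir's class).
class ToyCheckbox
  def initialize(checked = false)
    @checked = checked
  end

  # Mirrors a mouse click: flips whatever the current state is.
  def click
    @checked = !@checked
  end

  # Checks the box only if it is not already checked.
  def set
    click unless set?
  end

  # Unchecks the box only if it is currently checked.
  def clear
    click if set?
  end

  # Reports whether the box is currently checked.
  def set?
    @checked
  end
end

box = ToyCheckbox.new
box.set
box.set          # second call changes nothing
puts box.set?    # => true
box.click
box.click        # two clicks cancel each other out
puts box.set?    # => true
box.clear
puts box.set?    # => false
```

In a test script this matters when you cannot be sure of the starting state: set and clear say what you want the result to be, while click depends on what was there before.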

Select Lists & Files

Let’s take a look at the following part of our code:


browser.select_list(:id => 'usage').select 'Less than a year'
browser.select_list(:id => 'usage').select_value '2'

select_list corresponds to a <select></select> tag and its “actions” are a bit different from those of radio buttons or checkboxes. Each select tag contains option tags, which are the options in the list. Each option tag usually has a “value” attribute and text containing the name of the option. The above code shows 2 ways to select a specific option: by the visible option text (select) or by the “value” attribute (select_value).

There is also a file upload box in our form. The HTML for it is:


<input id="give_me_a_file" name="give_me_a_file" class="element file" type="file"/>

Working with upload boxes is dead simple with Watir-WebDriver. All you do is this:


browser.file_field.set 'C:/watir.txt'

Warning: Enter a valid path in the set argument or your program will crash.
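One way to make that failure easier to diagnose is to check the path yourself before handing it to the file field. The sketch below assumes only that browser.file_field.set accepts a path string, as in the line above. attach_file is a hypothetical helper name, not part of Watir.

```ruby
# Hypothetical helper (not part of Watir): verifies the path before
# passing it to the file field, so a typo fails with a clear message.
def attach_file(browser, path)
  full_path = File.expand_path(path)
  raise ArgumentError, "No such file: #{full_path}" unless File.file?(full_path)

  browser.file_field.set(full_path)
  full_path
end

# Intended usage, assuming `browser` is a Watir::Browser on the form page:
#   attach_file(browser, 'C:/watir.txt')
```

Expanding the path first also means a relative path is resolved against the directory the script runs from before it reaches the browser.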

Actions

As previously mentioned, real users mainly perform 2 actions on a web page: clicking and entering text.

Common sense helps a lot here. You can’t type text on a button element, for example, but you can click it. We’ve seen how you can use set and clear with checkboxes and radio buttons. If you want to simply click on a textarea you could do:


browser.textarea(:class => 'element textarea medium').click

But a more user-friendly way is:


browser.textarea(:class => 'element textarea medium').focus

Quick tip: When using only 1 attribute to select an element, you’ll often see this when analyzing other people’s code:


browser.textarea(:class, 'element textarea medium').focus

Don’t use this, however, as one of the Watir-WebDriver maintainers announced they could remove it soon.

Types of Actions

Performing actions on HTML elements isn’t all about interacting with the element. In object oriented programming, you have setters and getters. Getters get something from an object while setters change the object.

Imagine every HTML element as an object and every action you can do on that HTML element as either a getter or a setter method. You can set the text for a text field element, for example. You can also get the class attribute value for that object as well.

The last line in our form code uses a “getter” method .text:


puts browser.p(:id => 'my_description').text

Unlike the previous examples, here we tell the browser to get us the text inside the p tag with the id of “my_description”. In this case, the output is “For Watir demonstration purposes only.”

You can get any attribute for a particular element. The syntax is:


[browser-instance].[HTML-element](:with-some => attributes).[the attribute name of the element] #=> the attribute value

Take the same p element from above and, this time, get the value of the id attribute:


puts browser.p(:text => 'For Watir demonstration purposes only.').id #=> output: 'my_description'
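Several getters can be combined when you want a quick look at an element while debugging a script. The sketch below relies on the id and text getters used above plus attribute_value, which reads an attribute by name. element_summary is a hypothetical helper name, not part of Watir.

```ruby
# Hypothetical helper (not part of Watir): gathers a few "getter"
# results for one element into a hash.
def element_summary(element)
  {
    :id    => element.id,
    :text  => element.text,
    :class => element.attribute_value('class')
  }
end

# Intended usage, assuming `browser` is a Watir::Browser on the form page:
#   puts element_summary(browser.p(:id => 'my_description'))
```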

So far, we’ve explored “getter” action methods. As for “setter” actions, let’s use a bit of common sense: What is a universal “action” you can do on every single HTML element?

It’s click. You can do browser.textarea.click, browser.div.click, basically browser.[every-HTML-element](:with => every attribute).click.

Each HTML element also has its own specific actions. Go here and enter any HTML element name, “radio” for example. Click on that, and under “Instance Method Summary” you should see the particular methods or actions it can perform. In this example, they are set and set?. click is also found lower in the list, inherited from the Element class, confirming the ability to use click on any HTML element.

Other Browsers

Unlike Firefox, if you want to use Watir-WebDriver with Chrome, Internet Explorer, or Safari, you’ll have to download the “WebDriver” editions of these browsers (as well as having the actual browsers installed on your machine). The “WebDriver edition” of a browser is a single file you need to place in your operating system load path. I put the file in my [ruby-installation-folder]/bin folder, which is already in the operating system path. I encourage you to run our form example at the beginning of this article with Chrome and IE/Safari if you’re on Windows/Mac.
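To rerun the same script against different browsers without editing it each time, one option is to read the browser name from the environment. This is a sketch under assumptions: the BROWSER variable name is just a convention chosen here, and browser_choice and SUPPORTED_BROWSERS are made-up names. The symbols are the ones Watir::Browser.new accepts for the browsers discussed in this article.

```ruby
# Browsers mentioned in this article, as symbols for Watir::Browser.new.
SUPPORTED_BROWSERS = %i[firefox chrome ie safari].freeze

# Hypothetical helper: picks the browser from a BROWSER environment
# variable (an assumed convention), falling back to Firefox.
def browser_choice(env = ENV)
  name = env.fetch('BROWSER', 'firefox').strip.downcase.to_sym
  unless SUPPORTED_BROWSERS.include?(name)
    raise ArgumentError, "Unsupported browser: #{name}"
  end
  name
end

# Intended usage with the gem installed:
#   browser = Watir::Browser.new browser_choice
# and then, from the shell:
#   BROWSER=chrome ruby watir_script.rb
```

Rejecting unknown names early gives a clearer error than letting a misspelled browser name reach the driver.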

In case you’re confused how this “web-driver” file fits in the big picture, think of it this way: You have all kinds of hardware on your computer. Take your graphics card. Your graphics card will be useless unless you have a way to “connect” to it by installing a driver. Think of the browser as the graphics card and the web driver file as the driver file allowing Watir-WebDriver to connect to the actual browser and “drive” it.

Vs Record and Playback Tools

There are many “Record and Playback” testing tools where you basically “record” what you’re doing on a webpage and save your actions. Later on, those tools will “play back” what you’ve done. The problem with this type of software is that it often relies on screen coordinates and proprietary, limited scripting languages. If there’s even a slight change in the design of the page, the whole playback will (likely) cease to work and you’re forced to re-record everything.

Watir-WebDriver doesn’t work this way. It uses Selenium-WebDriver on its back-end and, just like Selenium, it works with HTML elements rather than screen coordinates. Before doing anything, you must tell Watir-WebDriver the exact HTML element you want to work on.

As for Watir-WebDriver vs. Selenium WebDriver, you’ll notice that Watir-WebDriver has a more object-oriented and consistent syntax than Selenium. You need to know Selenium, though, in order to fully master Watir-WebDriver. After all, Selenium-WebDriver is what is powering Watir-WebDriver in the background. If you know Selenium, you can also perform some advanced configurations.

Only the Beginning! Isn’t it Exciting?

Once you understand the foundation of Watir-WebDriver, it’s easy to continue learning it. Things like interacting with AJAX elements shouldn’t be hard to understand once you grasp the basic syntax and the philosophy behind Watir-WebDriver.

If you’re looking for more material, 2 great books that helped me are

WatirWays (free)
Watir Recipes ($9.99)

There’s also Web Application Testing in Ruby (free) by one of the Watir-WebDriver main contributors, containing great information such as how to set up the WebDriver editions of each browser. The more you learn about Watir-WebDriver, the more tempted you’ll be to explore its beautiful syntax and elegance.

Frequently Asked Questions (FAQs) about Watir WebDriver

What are the main differences between Watir and Selenium WebDriver?

Watir and Selenium WebDriver are both popular tools for automating web browsers. However, they have some key differences. Watir is written in Ruby and is designed to be easy to use, with a simple and intuitive API. It supports multiple browsers, including Internet Explorer, Firefox, Chrome, and Safari. Selenium WebDriver, on the other hand, supports multiple programming languages, including Java, C#, Python, Ruby, and JavaScript. It also supports multiple browsers, but its API can be more complex and less intuitive than Watir’s.

How can I handle pop-ups and alerts using Watir?

Watir provides several methods to handle pop-ups and alerts. For example, you can use the browser.alert.ok method to click the OK button on an alert, or the browser.alert.close method to close an alert. You can also use the browser.alert.text method to get the text of an alert. For pop-ups, you can use the browser.window method to switch to the pop-up window, and then interact with it as you would with any other browser window.

Can I use Watir to test mobile applications?

While Watir is primarily designed for automating web browsers, it can also be used to test mobile applications using a tool like Appium. Appium is a mobile application automation framework that supports both Android and iOS, and it can be used with Watir to automate mobile applications. However, keep in mind that this requires additional setup and configuration, and may not provide the same level of functionality as dedicated mobile testing tools.

How can I handle dynamic elements with Watir?

Watir provides several ways to handle dynamic elements. One of the most common ways is to use the wait_until method, which waits until a certain condition is met before proceeding. For example, you can use browser.button(:id => 'submit').wait_until(&:present?) to wait until a button with the ID ‘submit’ is present on the page. You can also use the when_present method to perform an action when an element becomes present.
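The idea behind waiting for dynamic content is polling: check a condition repeatedly until it holds or a deadline passes. Here is a minimal sketch of that general technique in plain Ruby. It is not Watir’s own implementation, and poll_until and WaitTimeout are made-up names for illustration.

```ruby
# Raised when the condition never became true in time.
class WaitTimeout < StandardError; end

# Calls the block repeatedly until it returns a truthy value, sleeping
# `interval` seconds between attempts; gives up after `timeout` seconds.
def poll_until(timeout: 5, interval: 0.1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result

    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise WaitTimeout, "condition not met within #{timeout} seconds"
    end
    sleep interval
  end
end

# Intended usage, assuming `browser` is a Watir::Browser:
#   poll_until(timeout: 10) { browser.button(:id => 'submit').present? }
```

Polling with a deadline is generally preferable to a fixed sleep, because the script continues as soon as the element shows up and still fails with a clear error if it never does.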

Can I use Watir with headless browsers?

Yes, Watir supports headless browsing, which allows you to run your tests without displaying a graphical user interface. This can be useful for running tests in environments where a display is not available, or for speeding up your tests. You can enable headless browsing by using the :headless option when creating a new browser instance, like so: browser = Watir::Browser.new :chrome, headless: true.

How can I take screenshots with Watir?

Watir provides a screenshot method that allows you to take screenshots of the current browser window. You can use browser.screenshot.save 'screenshot.png' to save a screenshot to a file named ‘screenshot.png’. The screenshot will be saved in the current working directory, unless you specify a different path.
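When a script takes several screenshots, a fixed file name means each one overwrites the last. A small sketch of one way around that is to build a timestamped name first. screenshot_path is a hypothetical helper, and the directory and label values are placeholders.

```ruby
# Hypothetical helper: builds a file name such as
# "screenshots/form-20240131-093000.png" from a label and a time.
def screenshot_path(directory, label, time = Time.now)
  File.join(directory, "#{label}-#{time.strftime('%Y%m%d-%H%M%S')}.png")
end

# Intended usage, assuming `browser` is a Watir::Browser and the
# "screenshots" directory already exists:
#   browser.screenshot.save screenshot_path('screenshots', 'form')
```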

How can I handle cookies with Watir?

Watir provides several methods for handling cookies. You can use the cookies method to get a list of all cookies, the add method to add a new cookie, the delete method to delete a cookie, and the clear method to delete all cookies. For example, browser.cookies.add('name', 'value') adds a new cookie with the name ‘name’ and the value ‘value’.

Can I use Watir with Cucumber for BDD?

Yes, Watir can be used with Cucumber, a popular tool for Behavior-Driven Development (BDD). Cucumber allows you to write your tests in a natural language format that is easy to read and understand, and it can be used with Watir to automate the execution of these tests. This can make your tests more maintainable and easier to understand, especially for non-technical stakeholders.

How can I handle frames and iframes with Watir?

Watir provides the frame and iframe methods to handle frames and iframes. You can use these methods to switch to a frame or iframe, and then interact with it as you would with any other browser window. For example, browser.frame(:id => 'myframe').text_field(:id => 'mytextfield').set 'Hello, world!' sets the value of a text field inside a frame with the ID ‘myframe’.

How can I handle dropdown menus with Watir?

Watir provides the select_list and option methods to handle dropdown menus. You can use these methods to select an option from a dropdown menu, like so: browser.select_list(:id => 'mydropdown').option(:value => 'myoption').select. This selects the option with the value ‘myoption’ from a dropdown menu with the ID ‘mydropdown’.