
Testing on the Toilet: What Makes a Good End-to-End Test?

by Adam Bender

This article was adapted from a Google Testing on the Toilet (TotT) episode. You can download a printer-friendly version of this TotT episode and post it in your office.

An end-to-end test exercises your entire system from one end to the other, treating everything in between as a black box. End-to-end tests can catch bugs that manifest across your entire system. Alongside unit and integration tests, they are a critical part of a balanced testing diet, providing confidence about the health of your system in a near-production state. Unfortunately, end-to-end tests are slower, flakier, and more expensive to maintain than unit or integration tests. Consider carefully whether an end-to-end test is warranted, and if so, how best to write one.

Let's consider how an end-to-end test might work for the following "login flow":

[Figure: login flow diagram]

To be cost-effective, an end-to-end test should focus on aspects of your system that cannot be reliably evaluated with smaller tests, such as resource allocation, concurrency issues, and API compatibility. More specifically:
  • For each important use case, there should be one corresponding end-to-end test. This should include one test for each important class of error. The goal is to keep your total end-to-end test count low.
  • Be prepared to allocate at least one week a quarter per test to keep your end-to-end tests stable in the face of issues like slow and flaky dependencies or minor UI changes.
  • Focus your efforts on verifying overall system behavior instead of specific implementation details; for example, when testing login behavior, verify that the process succeeds independent of the exact messages or visual layouts, which may change frequently.
  • Make your end-to-end test easy to debug by providing an overview-level log file, documenting common test failure modes, and preserving all relevant system state information (e.g., screenshots and database snapshots).
End-to-end tests also come with some important caveats:
  • System components that are owned by other teams may change unexpectedly and break your tests. This increases overall maintenance cost, but can highlight incompatible changes.
  • It may be more difficult to make an end-to-end test fully hermetic; leftover test data may alter future tests and/or production systems. Where possible, keep your test data ephemeral.
  • An end-to-end test often necessitates multiple test doubles (fakes or stubs) for underlying dependencies; these doubles can, however, carry a high maintenance burden as they drift from the real implementations over time.
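
As a rough illustration of the guidelines above, the sketch below shows the shape such a login test might take. Everything in it is hypothetical: `LoginSystem` is a tiny in-memory stand-in so the example runs on its own (a real end-to-end test would drive the deployed system), and the method names `createUser`, `login`, and `deleteUser` are assumptions rather than any real API. The tests check only the outcome (whether a session was established), not the wording of any message, and they remove their test accounts afterward so the data stays ephemeral.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

// Hypothetical stand-in for the system under test, so the sketch is runnable.
class LoginSystem {
    private final Map<String, String> passwords = new HashMap<>();

    void createUser(String user, String password) { passwords.put(user, password); }
    void deleteUser(String user) { passwords.remove(user); }
    int userCount() { return passwords.size(); }

    // Returns a session token on success, or empty on failure.
    Optional<String> login(String user, String password) {
        return password.equals(passwords.get(user))
                ? Optional.of(UUID.randomUUID().toString())
                : Optional.empty();
    }
}

class LoginEndToEndSketch {
    // One test for the important use case: a valid user can log in.
    static boolean validUserCanLogIn(LoginSystem system) {
        String user = "e2e-user-" + UUID.randomUUID(); // unique, ephemeral account
        system.createUser(user, "correct-password");
        try {
            // Verify behavior (a session exists), not exact messages or layout.
            return system.login(user, "correct-password").isPresent();
        } finally {
            system.deleteUser(user); // leave no data behind for later tests
        }
    }

    // One test for an important class of error: a wrong password is rejected.
    static boolean wrongPasswordIsRejected(LoginSystem system) {
        String user = "e2e-user-" + UUID.randomUUID();
        system.createUser(user, "correct-password");
        try {
            return !system.login(user, "wrong-password").isPresent();
        } finally {
            system.deleteUser(user);
        }
    }

    public static void main(String[] args) {
        LoginSystem system = new LoginSystem();
        System.out.println("valid login: " + validUserCanLogIn(system));
        System.out.println("wrong password rejected: " + wrongPasswordIsRejected(system));
    }
}
```

Creating a uniquely named account per test and deleting it in a `finally` block is one simple way to keep runs from interfering with each other; a real system may need a more robust cleanup mechanism.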

The First Annual Testing on the Toilet Awards

by Andrew Trenk

The Testing on the Toilet (TotT) series was created in 2006 as a way to spread unit-testing knowledge across Google by posting flyers in bathroom stalls. It quickly became a part of Google culture and is still going strong today, with new episodes published every week and read in hundreds of bathrooms by thousands of engineers in Google offices across the world. Initially focused on content related to testing, TotT now covers a variety of technical topics, such as tips on writing cleaner code and ways to prevent security bugs.

While TotT episodes often have a big impact on many engineers across Google, until now we never did anything to formally thank authors for their contributions. To fix that, we decided to honor the most popular TotT episodes of 2014 by establishing the Testing on the Toilet Awards. The winners were chosen through a vote that was open to all Google engineers. The Google Testing Blog is proud to present the winners that were posted on this blog (there were two additional winners that weren’t posted on this blog since we only post testing-related TotT episodes).

And the winners are ...

Erik Kuefler: "Test Behaviors, Not Methods" and "Don't Put Logic in Tests"
Alex Eagle: "Change-Detector Tests Considered Harmful"

The authors of these episodes received their very own Flushy trophy, which they can proudly display on their desks.

[Figure: the Flushy trophy]

(The logo on the trophy is the same one we put on the printed version of each TotT episode, which you can see by looking for the “printer-friendly version” link in the TotT blog posts).

Congratulations to the winners!

Testing on the Toilet: Change-Detector Tests Considered Harmful

by Alex Eagle



You have just finished refactoring some code without modifying its behavior. Then you run the tests before committing and… a bunch of unit tests are failing. While fixing the tests, you get a sense that you are wasting time by mechanically applying the same transformation to many tests. Maybe you introduced a parameter in a method, and now must update 100 callers of that method in tests to pass an empty string.

What does it look like to write tests mechanically? Here is an absurd but obvious way:
// Production code:
def abs(i: Int)
  return (i < 0) ? i * -1 : i

// Test code:
for (line: String in File(prod_source).read_lines())
  switch (line.number)
    1: assert line.content equals "def abs(i: Int)"
    2: assert line.content equals "return (i < 0) ? i * -1 : i"

That test is clearly not useful: it contains an exact copy of the code under test and acts like a checksum. A correct or incorrect program is equally likely to pass a test that is a derivative of the code under test. No one is really writing tests like that, but how different is it from this next example?
// Production code:
def process(w: Work)
  firstPart.process(w)
  secondPart.process(w)

// Test code:
part1 = mock(FirstPart)
part2 = mock(SecondPart)
w = Work()
Processor(part1, part2).process(w)
verify_in_order
  was_called part1.process(w)
  was_called part2.process(w)

It is tempting to write a test like this because it requires little thought and will run quickly. This is a change-detector test—it is a transformation of the same information in the code under test—and it breaks in response to any change to the production code, without verifying correct behavior of either the original or modified production code.

Change detectors provide negative value, since the tests do not catch any defects, and the added maintenance cost slows down development. These tests should be rewritten or deleted.
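
One way to rewrite such a test is to assert on the observable result of `process` rather than on the calls it makes. The sketch below is only an illustration: the article does not say what `FirstPart` and `SecondPart` do, so the trimming and upper-casing behavior, the `Work.text` field, and the class bodies are all assumptions invented for the example.

```java
import java.util.Locale;

// Hypothetical work item; the `text` field is an assumption for this sketch.
class Work {
    String text;
    Work(String text) { this.text = text; }
}

// Assumed behaviors, invented so the example has something observable to check.
class FirstPart {
    void process(Work w) { w.text = w.text.trim(); }
}

class SecondPart {
    void process(Work w) { w.text = w.text.toUpperCase(Locale.ROOT); }
}

class Processor {
    private final FirstPart firstPart;
    private final SecondPart secondPart;

    Processor(FirstPart firstPart, SecondPart secondPart) {
        this.firstPart = firstPart;
        this.secondPart = secondPart;
    }

    void process(Work w) {
        firstPart.process(w);
        secondPart.process(w);
    }
}

class ProcessorBehaviorTest {
    // States a known input and a known output. It keeps passing if the
    // internals of Processor are refactored without changing behavior.
    static void processTrimsAndUpperCasesText() {
        Work w = new Work("  hello ");
        new Processor(new FirstPart(), new SecondPart()).process(w);
        if (!"HELLO".equals(w.text)) {
            throw new AssertionError("expected HELLO but was: " + w.text);
        }
    }

    public static void main(String[] args) {
        processTrimsAndUpperCasesText();
        System.out.println("ok");
    }
}
```

Unlike the mock-based version, this test fails only when the result for a given input changes, which is the kind of failure that signals a real defect.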

Testing on the Toilet: Prefer Testing Public APIs Over Implementation-Detail Classes

by Andrew Trenk



Does this class need to have tests?
class UserInfoValidator {
  public void validate(UserInfo info) {
    if (info.getDateOfBirth().isInFuture()) {
      throw new ValidationException();
    }
  }
}
Its method has some logic, so it may be a good idea to test it. But what if its only user looks like this?
public class UserInfoService {
  private UserInfoValidator validator;

  public void save(UserInfo info) {
    validator.validate(info); // Throws an exception if the value is invalid.
    writeToDatabase(info);
  }
}
The answer is: it probably doesn’t need tests, since all paths can be tested through UserInfoService. The key distinction is that the class is an implementation detail, not a public API.
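
Here is a minimal sketch of what testing all paths through `UserInfoService` could look like. To keep it self-contained, the `UserInfo`, `ValidationException`, and in-memory "database" list below are simplified assumptions (the article does not show them), and the service constructs its validator directly.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Simplified, assumed versions of the types used in the article's sample.
class ValidationException extends RuntimeException {}

class UserInfo {
    final LocalDate dateOfBirth;
    UserInfo(LocalDate dateOfBirth) { this.dateOfBirth = dateOfBirth; }
}

// Implementation detail: only used by UserInfoService, so it has no tests of its own.
class UserInfoValidator {
    void validate(UserInfo info) {
        if (info.dateOfBirth.isAfter(LocalDate.now())) {
            throw new ValidationException();
        }
    }
}

// Plays the role of the public API: this is the class the tests exercise.
class UserInfoService {
    private final UserInfoValidator validator = new UserInfoValidator();
    final List<UserInfo> database = new ArrayList<>(); // stand-in for a real database

    void save(UserInfo info) {
        validator.validate(info); // Throws an exception if the value is invalid.
        database.add(info);
    }
}

class UserInfoServiceTest {
    static void save_futureDateOfBirth_throwsAndWritesNothing() {
        UserInfoService service = new UserInfoService();
        UserInfo invalid = new UserInfo(LocalDate.now().plusDays(1));
        try {
            service.save(invalid);
            throw new AssertionError("expected ValidationException");
        } catch (ValidationException expected) {
            // The validator's failure branch was reached through the service.
        }
        if (!service.database.isEmpty()) {
            throw new AssertionError("invalid user must not be written");
        }
    }

    static void save_pastDateOfBirth_writesToDatabase() {
        UserInfoService service = new UserInfoService();
        service.save(new UserInfo(LocalDate.of(1990, 1, 1)));
        if (service.database.size() != 1) {
            throw new AssertionError("valid user should be written");
        }
    }
}
```

Both branches of the validator are covered, yet neither test mentions `UserInfoValidator`, so the validator could be renamed, inlined, or given a different signature without touching the tests.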

A public API can be called by any number of users, who can pass in any possible combination of inputs to its methods. You want to make sure these are well-tested, which ensures users won’t see issues when they use the API. Examples of public APIs include classes that are used in a different part of a codebase (e.g., a server-side class that’s used by the client-side) and common utility classes that are used throughout a codebase.

An implementation-detail class exists only to support public APIs and is called by a very limited number of users (often only one). These classes can sometimes be tested indirectly by testing the public APIs that use them.

Testing implementation-detail classes is still useful in many cases, such as if the class is complex or if the tests would be difficult to write for the public API class. When you do test them, they often don’t need to be tested in as much depth as a public API, since some inputs may never be passed into their methods (in the above code sample, if UserInfoService ensured that UserInfo were never null, then it wouldn’t be useful to test what happens when null is passed as an argument to UserInfoValidator.validate, since it would never happen).

Implementation-detail classes can sometimes be thought of as private methods that happen to be in a separate class, since you typically don’t want to test private methods directly either. You should also try to restrict the visibility of implementation-detail classes, such as by making them package-private in Java.

Testing implementation-detail classes too often leads to a couple of problems:

- Code is harder to maintain since you need to update tests more often, such as when changing a method signature of an implementation-detail class or even when doing a refactoring. If testing is done only through public APIs, these changes wouldn’t affect the tests at all.

- If you test a behavior only through an implementation-detail class, you may get false confidence in your code, since the same code path may not work properly when exercised through the public API. You also have to be more careful when refactoring, since it can be harder to ensure that all the behavior of the public API will be preserved if not all paths are tested through the public API.

Testing on the Toilet: Truth: a fluent assertion framework

by Dori Reuveni and Kurt Alfred Kluever



As engineers, we spend most of our time reading existing code, rather than writing new code. Therefore, we must make sure we always write clean, readable code. The same goes for our tests; we need a way to clearly express our test assertions.

Truth is an open source, fluent testing framework for Java designed to make your test assertions and failure messages more readable. The fluent API makes reading (and writing) test assertions much more natural, prose-like, and discoverable in your IDE via autocomplete. For example, compare how the following assertion reads with JUnit vs. Truth:
assertEquals("March", monthMap.get(3));          // JUnit
assertThat(monthMap).containsEntry(3, "March"); // Truth
Both statements are asserting the same thing, but the assertion written with Truth can be easily read from left to right, while the JUnit example requires "mental backtracking".

Another benefit of Truth over JUnit is the addition of useful default failure messages. For example:
ImmutableSet<String> colors = ImmutableSet.of("red", "green", "blue", "yellow");
assertTrue(colors.contains("orange")); // JUnit
assertThat(colors).contains("orange"); // Truth
In this example, both assertions will fail, but JUnit will not provide a useful failure message. However, Truth will provide a clear and concise failure message:

AssertionError: <[red, green, blue, yellow]> should have contained <orange>

Truth already supports specialized assertions for most of the common JDK types (Objects, primitives, arrays, Strings, Classes, Comparables, Iterables, Collections, Lists, Sets, Maps, etc.), as well as some Guava types (Optionals). Additional support for other popular types is planned as well (Throwables, Iterators, Multimaps, UnsignedIntegers, UnsignedLongs, etc.).

Truth is also user-extensible: you can easily write a Truth subject to make fluent assertions about your own custom types. By creating your own custom subject, both your assertion API and your failure messages can be domain-specific.

Truth's goal is not to replace JUnit assertions, but to improve the readability of complex assertions and their failure messages. JUnit assertions and Truth assertions can (and often do) live side by side in tests.

To get started with Truth, check out http://google.github.io/truth/

Testing on the Toilet: Writing Descriptive Test Names

by Andrew Trenk


How long does it take you to figure out what behavior is being tested in the following code?

@Test public void isUserLockedOut_invalidLogin() {
  authenticator.authenticate(username, invalidPassword);
  assertFalse(authenticator.isUserLockedOut(username));

  authenticator.authenticate(username, invalidPassword);
  assertFalse(authenticator.isUserLockedOut(username));

  authenticator.authenticate(username, invalidPassword);
  assertTrue(authenticator.isUserLockedOut(username));
}

You probably had to read through every line of code (maybe more than once) and understand what each line is doing. But how long would it take you to figure out what behavior is being tested if the test had this name?

isUserLockedOut_lockOutUserAfterThreeInvalidLoginAttempts

You should now be able to understand what behavior is being tested by reading just the test name, and you don’t even need to read through the test body. The test name in the above code sample hints at the scenario being tested (“invalidLogin”), but it doesn’t actually say what the expected outcome is supposed to be, so you had to read through the code to figure it out.

Putting both the scenario and the expected outcome in the test name has several other benefits:

- If you want to know all the possible behaviors a class has, all you need to do is read through the test names in its test class, compared to spending minutes or hours digging through the test code or even the class itself trying to figure out its behavior. This can also be useful during code reviews since you can quickly tell if the tests cover all expected cases.

- Giving tests more explicit names forces you to split the testing of different behaviors into separate tests. Otherwise you may be tempted to dump assertions for different behaviors into one test, which over time can lead to tests that keep growing and become difficult to understand and maintain.

- The exact behavior being tested might not always be clear from the test code. If the test name isn’t explicit about this, sometimes you might have to guess what the test is actually testing.

- You can easily tell if some functionality isn’t being tested. If you don’t see a test name that describes the behavior you’re looking for, then you know the test doesn’t exist.

- When a test fails, you can immediately see what functionality is broken without looking at the test’s source code.

There are several common patterns for structuring the name of a test (one example is to name tests like an English sentence with “should” in the name, e.g., shouldLockOutUserAfterThreeInvalidLoginAttempts). Whichever pattern you use, the same advice still applies: Make sure test names contain both the scenario being tested and the expected outcome.
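
To make the advice concrete, here is a sketch of the lockout test split into two tests whose names each state a scenario and an expected outcome. The `Authenticator` below is a hypothetical in-memory implementation written only so the example runs; the three-attempt limit comes from the test name above, and everything else about the class (including the `register` method) is an assumption. Plain static methods stand in for JUnit `@Test` methods to avoid external dependencies.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical authenticator, invented for this sketch.
class Authenticator {
    private static final int MAX_INVALID_ATTEMPTS = 3;
    private final Map<String, String> passwords = new HashMap<>();
    private final Map<String, Integer> invalidAttempts = new HashMap<>();

    void register(String username, String password) {
        passwords.put(username, password);
    }

    boolean authenticate(String username, String password) {
        if (password.equals(passwords.get(username))) {
            return true;
        }
        invalidAttempts.merge(username, 1, Integer::sum); // count the failed attempt
        return false;
    }

    boolean isUserLockedOut(String username) {
        return invalidAttempts.getOrDefault(username, 0) >= MAX_INVALID_ATTEMPTS;
    }
}

class AuthenticatorTest {
    private static Authenticator newAuthenticatorWithUser() {
        Authenticator authenticator = new Authenticator();
        authenticator.register("alice", "secret");
        return authenticator;
    }

    static void isUserLockedOut_doesNotLockOutUserAfterTwoInvalidLoginAttempts() {
        Authenticator authenticator = newAuthenticatorWithUser();
        authenticator.authenticate("alice", "wrong");
        authenticator.authenticate("alice", "wrong");
        check(!authenticator.isUserLockedOut("alice"));
    }

    static void isUserLockedOut_lockOutUserAfterThreeInvalidLoginAttempts() {
        Authenticator authenticator = newAuthenticatorWithUser();
        authenticator.authenticate("alice", "wrong");
        authenticator.authenticate("alice", "wrong");
        authenticator.authenticate("alice", "wrong");
        check(authenticator.isUserLockedOut("alice"));
    }

    private static void check(boolean condition) {
        if (!condition) {
            throw new AssertionError();
        }
    }
}
```

Each test now covers one behavior, and a failure report that lists only the method name already says which rule was broken.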

Sometimes just specifying the name of the method under test may be enough, especially if the method is simple and has only a single behavior that is obvious from its name.

Testing on the Toilet: Web Testing Made Easier: Debug IDs

by Ruslan Khamitov 


Adding ID attributes to elements can make it much easier to write tests that interact with the DOM (e.g., WebDriver tests). Consider the following DOM with two buttons that differ only by inner text:
<div class="button">Save</div>
<div class="button">Edit</div>

How would you tell WebDriver to interact with the “Save” button in this case? You have several options. One option is to interact with the button using a CSS selector:
div.button

However, this approach is not sufficient to identify a particular button, and there is no mechanism to filter by text in CSS. Another option would be to write an XPath, which is generally fragile and discouraged:
//div[@class='button' and text()='Save']

Your best option is to add unique hierarchical IDs where each widget is passed a base ID that it prepends to the ID of each of its children. The IDs for each button will be:
contact-form.save-button
contact-form.edit-button

In GWT you can accomplish this by overriding onEnsureDebugId() on your widgets. Doing so allows you to create custom logic for applying debug IDs to the sub-elements that make up a custom widget:
@Override protected void onEnsureDebugId(String baseId) {
  super.onEnsureDebugId(baseId);
  saveButton.ensureDebugId(baseId + ".save-button");
  editButton.ensureDebugId(baseId + ".edit-button");
}
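
The prepending scheme itself can be sketched independently of any UI framework. The `Widget` class below is hypothetical (it is not a GWT or WebDriver type, even though its method name echoes GWT's); it only shows how a base ID handed to a parent flows down to its children to produce IDs like the two listed earlier.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical widget type, used only to illustrate hierarchical debug IDs.
class Widget {
    private final String localName;
    private final List<Widget> children = new ArrayList<>();
    private String debugId;

    Widget(String localName) { this.localName = localName; }

    Widget add(Widget child) {
        children.add(child);
        return this;
    }

    // Assigns this widget's ID, then passes it down as the base ID for each child.
    void ensureDebugId(String id) {
        debugId = id;
        for (Widget child : children) {
            child.ensureDebugId(id + "." + child.localName);
        }
    }

    String getDebugId() { return debugId; }
}

class DebugIdDemo {
    public static void main(String[] args) {
        Widget save = new Widget("save-button");
        Widget edit = new Widget("edit-button");
        Widget form = new Widget("contact-form").add(save).add(edit);
        form.ensureDebugId("contact-form");
        System.out.println(save.getDebugId()); // contact-form.save-button
        System.out.println(edit.getDebugId()); // contact-form.edit-button
    }
}
```

Because each child's ID is derived from its parent's, placing the same form under a different base ID yields a distinct, predictable set of IDs without any change to the form's own code.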

Consider another example. Let’s set IDs for repeated UI elements in Angular using ng-repeat. Setting an index can help differentiate between repeated instances of each element:
<tr id="feedback-{{$index}}" class="feedback" ng-repeat="feedback in ctrl.feedbacks" >

In GWT you can do this with ensureDebugId(). Let’s set an ID for each of the table cells:
@UiField FlexTable table;
UIObject.ensureDebugId(
    table.getCellFormatter().getElement(rowIndex, columnIndex),
    baseID + columnIndex + "-" + rowIndex);

Take-away: Debug IDs are easy to set and make a huge difference for testing. Please add them early.

Testing on the Toilet: Don’t Put Logic in Tests

by Erik Kuefler


Programming languages give us a lot of expressive power. Concepts like operators and conditionals are important tools that allow us to write programs that handle a wide range of inputs. But this flexibility comes at the cost of increased complexity, which makes our programs harder to understand.

In tests, unlike in production code, simplicity is more important than flexibility. Most unit tests verify that a single, known input produces a single, known output. Tests can avoid complexity by stating their inputs and outputs directly rather than computing them. Otherwise it's easy for tests to develop their own bugs.

Let's take a look at a simple example. Does this test look correct to you?

@Test public void shouldNavigateToPhotosPage() {
  String baseUrl = "http://plus.google.com/";
  Navigator nav = new Navigator(baseUrl);
  nav.goToPhotosPage();
  assertEquals(baseUrl + "/u/0/photos", nav.getCurrentUrl());
}

The author is trying to avoid duplication by storing a shared prefix in a variable. Performing a single string concatenation doesn't seem too bad, but what happens if we simplify the test by inlining the variable?

@Test public void shouldNavigateToPhotosPage() {
  Navigator nav = new Navigator("http://plus.google.com/");
  nav.goToPhotosPage();
  assertEquals("http://plus.google.com//u/0/photos", nav.getCurrentUrl()); // Oops!
}

After eliminating the unnecessary computation from the test, the bug is obvious—we're expecting two slashes in the URL! This test will either fail or (even worse) incorrectly pass if the production code has the same bug. We never would have written this if we stated our inputs and outputs directly instead of trying to compute them. And this is a very simple example—when a test adds more operators or includes loops and conditionals, it becomes increasingly difficult to be confident that it is correct.

Another way of saying this is that, whereas production code describes a general strategy for computing outputs given inputs, tests are concrete examples of input/output pairs (where output might include side effects like verifying interactions with other classes). It's usually easy to tell whether an input/output pair is correct or not, even if the logic required to compute it is very complex. For instance, it's hard to picture the exact DOM that would be created by a JavaScript function for a given server response. So the ideal test for such a function would just compare against a string containing the expected output HTML.

When tests do need their own logic, such logic should often be moved out of the test bodies and into utilities and helper functions. Since such helpers can get quite complex, it's usually a good idea for any nontrivial test utility to have its own tests.
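
For example, if many tests need to build URLs, the joining logic could live in one small helper that has its own test, with every expectation still written as a literal. `TestUrls.join` is a hypothetical helper invented for this sketch, not part of any library.

```java
// Hypothetical test utility: joins a base URL and a path with exactly one slash.
class TestUrls {
    static String join(String baseUrl, String path) {
        String base = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        String rest = path.startsWith("/") ? path.substring(1) : path;
        return base + "/" + rest;
    }
}

// The helper contains logic, so it gets its own tests with literal expectations.
class TestUrlsTest {
    static void join_slashOnBothSides_producesSingleSlash() {
        String actual = TestUrls.join("http://plus.google.com/", "/u/0/photos");
        check("http://plus.google.com/u/0/photos".equals(actual), actual);
    }

    static void join_noSlashOnEitherSide_insertsSlash() {
        String actual = TestUrls.join("http://plus.google.com", "u/0/photos");
        check("http://plus.google.com/u/0/photos".equals(actual), actual);
    }

    private static void check(boolean condition, String actual) {
        if (!condition) {
            throw new AssertionError("unexpected URL: " + actual);
        }
    }
}
```

The double-slash mistake from the earlier example now has exactly one place to hide, and that place is covered by a test whose expected value can be checked by eye.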