
Writing tests

Vladimir Turov edited this page Mar 15, 2022 · 5 revisions

Table of contents

  1. General testing process
  2. Essential classes
    1. StageTest
    2. CheckResult
    3. WrongAnswer, TestPassed
    4. DynamicTest
      1. Declaration
      2. Ordering
      3. Time limit
      4. Data parameterization
      5. Repeating
      6. Files
    5. TestedProgram
      1. Initializing
      2. Starting tested program
      3. Executing tested program
      4. Working with background programs
      5. Working with output more effectively
  3. Complex example
  4. Dynamic input

General testing process

You can see the abstract testing algorithm in the code block below.

methods = get_dynamic_methods()
for method in methods:
    result = invoke(method)
    if not result.correct:
        break

Each method is invoked in turn and returns a result. The first failed test stops the whole testing process. Let's see how to implement these methods.
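To make the loop concrete, here is a self-contained toy model of it in plain Python. It is not hs-test's implementation: Result, run_all and the three sample methods are hypothetical names, and each "test method" is just a function that returns a result.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Result:
    """Hypothetical stand-in for a test result (not hs-test's CheckResult)."""
    correct: bool
    feedback: str = ""


def run_all(methods: List[Callable[[], Result]]) -> List[Result]:
    """Invoke test methods in order and stop at the first failed one."""
    results = []
    for method in methods:
        result = method()
        results.append(result)
        if not result.correct:
            break  # the first failed test stops the whole testing
    return results


# Usage: the third method is never invoked because the second one fails
calls = []

def first() -> Result:
    calls.append("first")
    return Result(True)

def second() -> Result:
    calls.append("second")
    return Result(False, "The number of people is wrong")

def third() -> Result:
    calls.append("third")
    return Result(True)

results = run_all([first, second, third])
```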

Essential classes

You can import every mentioned class from the hstest module.
For example: from hstest import StageTest, TestedProgram

StageTest

Every hs-test test should extend the StageTest class, either directly or through another class that extends it.

Example:

from hstest import StageTest

class TicTacToeTest(StageTest): pass

CheckResult

A CheckResult object represents the result of a test. If the test fails, the user is shown a message that the test author specifies in the tests. This message is called feedback.

If the user's program fails a test, the feedback should help the user resolve the problem or find the mistake. Please do not write feedback like Wrong answer or Try again: such feedback is useless for the user. For example, instead of Your answer is wrong, you can help the user with clearer feedback: The number of people is wrong, should be "45", but your program printed "44". See the difference: now it is obvious to the user that they may have somehow missed one iteration of the loop. Remember that Hyperskill is a learning platform, and test feedback must help the user spot the mistake. This is quite difficult because the tests know nothing about the user's program, only its output for the provided input. That's why tests should not blindly compare two huge strings (the program's output and the correct one): you cannot provide useful feedback that way. It's better to check the user's program output piece by piece, so that every likely mistake is caught and gets specialized feedback for that specific error.

Examples:

Via constructor (feedback is required):

from hstest import CheckResult
...
return CheckResult(count == 42, "Count should be 42, found " + str(count))

Via static methods:

return CheckResult.wrong("Count should be 42, found " + str(count))
return CheckResult.correct()

Via imported functions:

from hstest import wrong, correct
...
return wrong("Count should be 42, found " + str(count))
return correct()
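The advice about checking output piece by piece can be sketched as a small helper that inspects one aspect of the output and produces specialized feedback. This is a hypothetical example: the output format "People: 45" and the name people_count_feedback are assumptions, and the helper returns a plain feedback string (or None) so it runs without hs-test. In a real test you would turn that string into CheckResult.wrong(...) or a WrongAnswer.

```python
import re
from typing import Optional


def people_count_feedback(output: str, expected: int) -> Optional[str]:
    """Return specific feedback, or None if this part of the output is correct.

    Hypothetical stage requirement: the program prints a line like "People: 45".
    """
    match = re.search(r"people:\s*(-?\d+)", output, re.IGNORECASE)
    if match is None:
        return 'Your program should print the number of people, e.g. "People: 45"'
    found = int(match.group(1))
    if found != expected:
        return (f'The number of people is wrong, should be "{expected}", '
                f'but your program printed "{found}"')
    return None  # this aspect of the output is fine; check the next one
```

Each such helper checks one aspect, so a failing test can say exactly what went wrong instead of reporting that two long strings differ.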

WrongAnswer, TestPassed

These exceptions let you finish a test from deep inside a chain of method calls, without passing a result back through every caller. This is useful when you are several methods deep and don't want to continue checking everything else. A typical case is parsing the user's output: parsing should be done in a separate method because several tests call it, and that method can raise an exception as soon as it finds a problem.

Raising a WrongAnswer exception is treated like returning a CheckResult.wrong(...) object. It is used often; in fact, it is recommended to raise WrongAnswer everywhere instead of returning CheckResult.wrong(...).

Raising a TestPassed exception is treated like returning a CheckResult.correct() object. It is used rarely.

Examples:

from hstest import WrongAnswer, TestPassed
...
raise WrongAnswer("Count should be 42, found " + str(count))
raise TestPassed()
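The following sketch shows why raising from a shared parsing method is convenient: the helper stops the test from wherever it detects a problem. So that it runs without hs-test installed, it defines a local stand-in WrongAnswer class that only mimics hs-test's exception; parse_grid and run_check are hypothetical names, and the tic-tac-toe grid format is an assumption.

```python
from typing import List


class WrongAnswer(Exception):
    """Local stand-in for hstest's WrongAnswer so this sketch runs on its own."""

    def __init__(self, feedback: str) -> None:
        super().__init__(feedback)
        self.feedback = feedback


def parse_grid(output: str) -> List[List[str]]:
    """Hypothetical helper shared by several tests: expects 3 lines of 3 cells."""
    lines = [line for line in output.strip().splitlines() if line.strip()]
    if len(lines) != 3:
        raise WrongAnswer(f"Expected 3 lines of the grid, found {len(lines)}")
    grid = []
    for number, line in enumerate(lines, start=1):
        cells = line.split()
        if len(cells) != 3:
            # Stops the whole test right here, however deep the call is
            raise WrongAnswer(f"Line {number} of the grid should contain "
                              f"3 cells, found {len(cells)}")
        grid.append(cells)
    return grid


def run_check(output: str) -> str:
    """Mimics a test runner turning a raised WrongAnswer into feedback."""
    try:
        parse_grid(output)
    except WrongAnswer as error:
        return "FAILED: " + error.feedback
    return "OK"
```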

DynamicTest

Declaration

The most important part for you is the @dynamic_test decorator: it marks a method as a test for the hs-test library. The method should return a CheckResult object.

All these marked methods should be located inside a class that extends StageTest.

Examples:

Method declaration:

from hstest import StageTest, CheckResult, dynamic_test

class Test(StageTest):
    @dynamic_test
    def test(self):
        return CheckResult.correct()

Ordering

Tests are ordered by their location in the class: methods written earlier in the source code are executed before the ones written later.

You can change the ordering by passing an integer order argument to the @dynamic_test decorator. The default value is 0; you can set it lower or higher. Tests with a lower order value run before tests with a higher one. If two tests have the same order, they are ordered by their location in the source code.

Examples:

@dynamic_test(order=-5)
@dynamic_test(order=3)
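The ordering rule can be modeled in a few lines: sort by order value and, for equal values, keep the source-code position. Python's sorted() is stable, so sorting only by the order value already preserves source order among ties. This is a toy model of the rule described above, not hs-test's code; execution_order and the method names are hypothetical.

```python
from typing import List, Tuple


def execution_order(tests: List[Tuple[str, int]]) -> List[str]:
    """tests: (method name, order) pairs listed in source-code order."""
    # sorted() is stable: tests with equal order keep their source order
    return [name for name, order in sorted(tests, key=lambda test: test[1])]


# test_b has order=-5, test_c has order=3, the others use the default 0
order = execution_order([("test_a", 0), ("test_b", -5), ("test_c", 3), ("test_d", 0)])
```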

Time limit

You can set a time limit for a test by passing an integer time_limit argument to the @dynamic_test decorator. The value is in milliseconds. The default value is 15000 (exactly 15 seconds). Set it to 0 or a negative value to disable the time limit completely.

Examples:

@dynamic_test(time_limit=10000)
@dynamic_test(time_limit=0)

Data parameterization

You can parameterize your tests by passing a data argument to the @dynamic_test decorator. The value should be a list: the test runs once for every item, and the item is passed to the test method as its arguments.

Examples:

The test method test will be run 5 times, once for every value of the list:

test_data = [1, 2, 3, 4, 5]

@dynamic_test(data=test_data)
def test(self, x):
    ...

A more formal way of writing the same test:

test_data = [
    [1], [2], [3], [4], [5]
]

@dynamic_test(data=test_data)
def test(self, x):
    ...

Passing 2 arguments

test_data = [
    [1, "Hello"],
    [2, "World"],
    [3, "!"],
]

@dynamic_test(data=test_data)
def test(self, x, message):
    ...

Passing an array

test_data = [
    [[1, 2, 3]],
    [[2, 3, 4]],
    [[3, 4, 5]],
]

@dynamic_test(data=test_data)
def test(self, arr):
    ...

Using a function call to pass data

def generate_data():
    return [
        [1, "Hello"],
        [2, "World"],
        [3, "!"],
    ]


class TestParametrizedData(StageTest):
    @dynamic_test(data=generate_data())
    def test(self, a, b):
        ...
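The examples above suggest a simple rule for how each data item becomes the test's arguments: a list item is unpacked into positional arguments, and any other item is passed as the single argument. That is why [1, 2, 3] and [[1], [2], [3]] behave the same, and why a list argument has to be wrapped once more. The toy function below models that rule only; run_parameterized is a hypothetical name, and how hs-test treats other item types (for example tuples) is not covered here.

```python
from typing import Any, Callable, List


def run_parameterized(test: Callable[..., Any], data: List[Any]) -> List[Any]:
    """Toy model: call `test` once per data item and collect the results."""
    results = []
    for item in data:
        # A list item is unpacked into positional arguments; anything else
        # becomes the single argument (so [1, 2] behaves like [[1], [2]])
        args = item if isinstance(item, list) else [item]
        results.append(test(*args))
    return results
```

For instance, run_parameterized(lambda arr: sum(arr), [[[1, 2, 3]], [[2, 3, 4]]]) calls the function with the lists [1, 2, 3] and [2, 3, 4].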

Repeating

You can run the same test multiple times by passing an integer repeat argument to the @dynamic_test decorator. By default, this value is 1. If the data argument is present, the test is repeated for every parameterization value. You can disable the test by setting this argument to 0.

Examples:

Repeat 5 times:

@dynamic_test(repeat=5)

Disable the test

@dynamic_test(repeat=0)

Generate 15 tests, repeat 5 times for every data value

test_data = [
    [1, "Hello"],
    [2, "World"],
    [3, "!"],
]

@dynamic_test(repeat=5, data=test_data)
def test(self, x, message):
    ...
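The arithmetic of repeat combined with data can be checked with a toy expansion: every data item produces repeat runs, so 3 items with repeat=5 give 15 runs, and repeat=0 gives none. This models only the counts described above; expand_runs is a hypothetical name, and the order in which hs-test interleaves the repeated runs is an assumption marked in the code.

```python
from typing import Any, List, Optional


def expand_runs(data: Optional[List[Any]] = None, repeat: int = 1) -> List[List[Any]]:
    """Toy model: the argument lists of every run a decorated test would get."""
    items = data if data is not None else [[]]  # no data: one run without arguments
    runs = []
    for item in items:
        args = item if isinstance(item, list) else [item]
        # Assumption: all repeats of one item run before the next item
        runs.extend([list(args) for _ in range(repeat)])
    return runs
```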

Files

You can also create external files for testing. Before the test case starts, these files are created on the file system so that the user's program can read them. After the test finishes, they are deleted from the file system.

To do so, pass a dict to the files argument of the @dynamic_test decorator: its keys are file names (str) and its values are the contents of these files (str).

An example of how to set up external files for test cases can be seen below:

files_dict = {
    "file1.txt": "content of file1.txt",
    "file2.txt": "content of file2.txt"
}

@dynamic_test(files=files_dict)
def test(self):
    ...

Using a function call to pass files:

def get_files():
    return {
        "file1.txt": "content of file1.txt",
        "file2.txt": "content of file2.txt"
    }

@dynamic_test(files=get_files())
def test(self):
    ...
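The create-before, delete-after lifecycle of these files can be sketched with a context manager. This is an illustration of the behavior described above, not hs-test's code: temporary_files is a hypothetical name, and the directory is passed explicitly so the sketch does not touch the current working directory.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def temporary_files(files: Dict[str, str], directory: str) -> Iterator[None]:
    """Create `files` in `directory` before the body runs; delete them afterwards."""
    created = []
    try:
        for name, content in files.items():
            path = os.path.join(directory, name)
            with open(path, "w", encoding="utf-8") as file:
                file.write(content)
            created.append(path)
        yield  # the test (and the user's program) runs here
    finally:
        # Clean up even if the test failed with an exception
        for path in created:
            if os.path.exists(path):
                os.remove(path)
```

Usage: with temporary_files({"file1.txt": "content of file1.txt"}, some_dir): ... makes the file readable only inside the with block.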

TestedProgram

Initializing

The TestedProgram class stores the user's program and executes it. You should create instances of this class only inside a test. You don't need to provide a path to the main file of the user's program: it is found automatically. The hs-test library searches all the files the user has written, following these rules:

  1. If there is a single file, it will be executed.
  2. If there are multiple files, the library will choose the files that aren't imported by the user's other files. Usually, there is a single such file.
  3. If there are multiple such files, the library will choose the ones that contain __name__ and __main__, suggesting that they are meant to be run.
  4. If there are still multiple such files, the testing will fail and suggest that the user keep a single file that contains __name__ and __main__ so the test can run it.

@dynamic_test
def test(self):
    pr = TestedProgram()
    ...
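The four selection rules can be modeled with Python's ast module: collect the modules each file imports, keep the files nobody imports, and fall back to the __name__/__main__ check. This is a simplified sketch of the rules as stated, not hs-test's actual search: it assumes a flat layout of .py files (no packages), and imported_modules and choose_main_file are hypothetical names.

```python
import ast
import os
from typing import Dict, Set


def imported_modules(source: str) -> Set[str]:
    """Names of the modules imported by a piece of Python source code."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module)
    return names


def choose_main_file(files: Dict[str, str]) -> str:
    """Toy version of the rules above; `files` maps "name.py" to its source."""
    if len(files) == 1:                                   # rule 1
        return next(iter(files))
    imported = set()
    for source in files.values():
        imported |= imported_modules(source)
    # Rule 2: keep files that no other file imports (flat layout assumed)
    candidates = [name for name in files
                  if os.path.splitext(name)[0] not in imported]
    if len(candidates) > 1:                               # rule 3
        candidates = [name for name in candidates
                      if "__name__" in files[name] and "__main__" in files[name]]
    if len(candidates) != 1:                              # rule 4
        raise ValueError("Cannot determine the main file: keep a single file "
                         "that contains __name__ and __main__")
    return candidates[0]
```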

You can also pass the constructor a string naming the module in which to search for the main file. The search is recursive, so nested modules are considered too.

@dynamic_test
def test(self):
    server = TestedProgram("chat.server")
    ...

Starting tested program

Creating a TestedProgram instance does not start the user's program.

To start the tested program, you need to invoke either the .start(...) or the .start_in_background(...) method. The first one is used almost all the time; the second one is used rarely. You can pass command-line arguments to the tested program as parameters of these methods. At this point, the execution of the test pauses and the user's program is launched. It runs until it requests input; then it is paused and the execution of the test resumes. The .start(...) method returns the output the user's program printed before requesting input.

Example:

User's program:

import sys

print(sys.argv[1] + " " + sys.argv[2])
line = input()
...

Test

@dynamic_test
def test(self):
    pr = TestedProgram()
    output = pr.start("Hello", "World!")
    ...

Let's discuss the code above step by step.

  1. The TestedProgram() part only searches for the main file of the user's program; nothing is executed yet.
  2. The pr.start("Hello", "World!") call starts the user's program so that its sys.argv contains three elements: "path/to/user/program.py", "Hello" and "World!". Then the execution of the test pauses. Note that, by Python convention, the first element of sys.argv is always the path to the user's program.
  3. The user's program starts to execute. The print(sys.argv[1] + " " + sys.argv[2]) line prints the text "Hello World!\n" to sys.stdout. hs-test saves this text.
  4. The input() call requests input that has not been provided yet, so the execution of the user's program pauses and the execution of the test resumes.
  5. The pr.start("Hello", "World!") call returns the output collected while the program was running, so the output variable equals "Hello World!\n".
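The sys.argv convention from step 2 can be observed outside hs-test with a plain subprocess call. This sketch does not use hs-test at all and does not pause on input; run_with_args and the file name program.py are hypothetical, chosen only to show that the script path comes first and the passed arguments follow.

```python
import os
import subprocess
import sys
import tempfile

USER_PROGRAM = """\
import os
import sys

print(sys.argv[1] + " " + sys.argv[2])
print(len(sys.argv), os.path.basename(sys.argv[0]))
"""


def run_with_args(source: str, *args: str) -> str:
    """Save `source` as program.py, run it with `args`, return its stdout."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "program.py")
        with open(path, "w", encoding="utf-8") as file:
            file.write(source)
        completed = subprocess.run([sys.executable, path, *args],
                                   capture_output=True, text=True, check=True)
    return completed.stdout
```

Running run_with_args(USER_PROGRAM, "Hello", "World!") prints "Hello World!" followed by "3 program.py": three elements in sys.argv, the first being the script path.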

Example 2: Test

@dynamic_test
def test(self):
    pr = TestedProgram()
    pr.start_in_background("Hello", "World!")
    ...

The only difference from the previous example is that we use the .start_in_background(...) method instead of .start(...). The differences in execution are:

  1. It returns to the test execution immediately and the tested program will be executed in parallel (note that in the previous example everything is executed consecutively).
  2. The method will return None.

We will consider ways to extract output from a program running in the background later.

Executing tested program

After returning from the .start(...) method, you need to generate input for the user's program and pass it to the program using the .execute(...) method. Like .start(...), the .execute(...) method pauses the test and resumes the user's program until the program has consumed all the input and requests more. Then the user's program is paused and the execution of the test resumes. The .execute(...) method returns the output the user's program printed during this period.

The .start(...) and .execute(...) methods return output not only when the user's program requests additional input, but also when the program finishes. If you then call .execute(...) on this tested program again, the test immediately fails and the user sees the feedback Program is finished too early. Additional input request was expected.

Use .is_finished() method to check if the program is finished or not. You can also call .stop() to terminate the user's program but it's not required to do so: every TestedProgram instance that was started will be stopped anyway after the end of the test.

Example:

User's program

import sys

print(sys.argv[1] + " " + sys.argv[2])
line = input()

if line == "42":
    print("Life")
else:
    print("Not life")

line = input()
...

Test

@dynamic_test
def test(self):
    pr = TestedProgram()
    output = pr.start("Hello", "World!")
    
    if "Hello" not in output or "World!" not in output:
        raise WrongAnswer("Your output should contain " + 
            "\"Hello\" and \"World!\", but it doesn't")

    output = pr.execute("42").lower().strip()

    if output != "life":
        raise WrongAnswer("Your output should be equal to " + 
            "\"Life\" after inputting \"42\"")

    output = pr.execute("Line1\nLine2\nline3\n")
    ...

Let's discuss the code above step by step, continuing from step #5 of the example in the previous section.

  1. Between invocations of .start(...) and .execute(...), or between two invocations of .execute(...), you need to check the user's output. Checking these partial outputs is much more convenient than checking the whole output at once: this way you can check different aspects of the output and generate feedback that is useful for the user. Thus, the test checks that the user's output contains "Hello" and "World!" (suppose the stage description requires it).
  2. If the check fails, raise WrongAnswer(...) throws an exception and stops the test. After that, the hs-test library stops the user's program automatically.
  3. The pr.execute("42") call passes the string "42" as input to the user's program and waits until the program consumes all of it and requests additional input.
  4. The user's program resumes and outputs "Life". After that, it requests additional input, which is not determined yet. Thus, the user's program pauses and the execution of the test resumes.
  5. The pr.execute("42") call returns "Life\n", which is processed into "life" using the .lower().strip() methods. It is a good idea to check lowercased and stripped output because it lets more formally correct solutions pass. Tests should not check the output too strictly; they should allow some degree of freedom, especially in parts that are not important.
  6. The pr.execute("Line1\nLine2\nline3\n") call passes 3 lines to the user's program and returns only when the program consumes all 3 lines (or exits).
  7. Further checking continues the same way.
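The start/execute handshake can be imitated with a Python generator, where `line = yield` plays the role of input(): the program runs until it asks for a line, control returns to the test together with the collected output, and the test resumes it by sending input. This is a toy model for intuition only, not how hs-test runs programs (it uses real processes); ToyTestedProgram, user_program and the one-line-per-send rule are assumptions of this sketch.

```python
from typing import Callable, Generator, List, Optional

UserProgram = Callable[[Callable[[str], None]], Generator[None, Optional[str], None]]


class ToyTestedProgram:
    """Toy model of .start()/.execute()/.is_finished(); not hs-test's code."""

    def __init__(self, program: UserProgram) -> None:
        self._program = program
        self._output: List[str] = []
        self._gen = None
        self._finished = False

    def _print(self, text: str) -> None:
        self._output.append(text + "\n")

    def _advance(self, value: Optional[str]) -> None:
        # Run the program until its next input request (or until it exits)
        try:
            self._gen.send(value)
        except StopIteration:
            self._finished = True

    def _take_output(self) -> str:
        text = "".join(self._output)
        self._output.clear()
        return text

    def start(self) -> str:
        self._gen = self._program(self._print)
        self._advance(None)  # a fresh generator must be started with None
        return self._take_output()

    def execute(self, text: str) -> str:
        if self._finished:
            raise RuntimeError("Program is finished too early. "
                               "Additional input request was expected.")
        # Assumption: one trailing newline is optional; each line is fed separately
        lines = text[:-1].split("\n") if text.endswith("\n") else text.split("\n")
        for line in lines:
            self._advance(line)
            if self._finished:
                break
        return self._take_output()

    def is_finished(self) -> bool:
        return self._finished


def user_program(print_: Callable[[str], None]):
    # Mirrors the user's program above; `line = yield` stands for input()
    print_("Hello World!")
    line = yield
    print_("Life" if line == "42" else "Not life")
    line = yield
```

With pr = ToyTestedProgram(user_program), pr.start() returns "Hello World!\n" and pr.execute("42") returns "Life\n", just like the walkthrough.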

Working with background programs

You can create a background program by launching it with the .start_in_background(...) method. You can also move an already started tested program to the background by calling the .go_background() method. Upon calling this method:

  1. It immediately returns to the test method.
  2. The user's program continues its execution in parallel. It still waits for input if it was waiting before this call. The difference is that the following .execute(...) call won't wait for the user's program and returns immediately.

The opposite method is .stop_background(). Upon calling this method:

  1. It immediately returns if the program is not in the background.
  2. Otherwise, it waits for the first input request from the user's program and then returns; from that moment on, the program is no longer in the background.

Use .is_in_background() to check if the program is executing in the background or not.

If the tested program executes in the background, the .start_in_background(...) call and any .execute(...) call return immediately without returning the program's output (.execute(...) returns an empty string). The only way to retrieve the output of the user's program is to call the .get_output() method, which returns the output collected since the previous .get_output() call.

You can also call .get_output() if the program is not in the background. It returns the output that was printed since the last .execute(...) call. This output might be non-empty because the user's program can print text from a separate thread; it is always empty if the user's program is single-threaded.

If the user's program is running in the background and is not waiting for input, any .execute(...) call fails the testing, and the hs-test library blames the test author for such an outcome. To make sure that the user's program is waiting for input, use the .is_waiting_input() method. Alternatively, you can call .stop_background() and then .go_background().
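The "output since the previous call" behavior of .get_output() resembles a thread-safe buffer that is drained on every read. The sketch below models only that behavior: OutputBuffer and background_user_program are hypothetical names, and the worker thread stands in for a user's program printing from a background thread.

```python
import threading
from typing import List


class OutputBuffer:
    """Toy model of .get_output(): returns what was printed since the last call."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._chunks: List[str] = []

    def write(self, text: str) -> None:
        # May be called from the user's background thread
        with self._lock:
            self._chunks.append(text)

    def get_output(self) -> str:
        # Drain the buffer so each printed character is returned exactly once
        with self._lock:
            text = "".join(self._chunks)
            self._chunks.clear()
        return text


def background_user_program(out: OutputBuffer) -> None:
    # Hypothetical user code that prints from a separate thread
    for i in range(3):
        out.write(f"tick {i}\n")
```

Usage: start threading.Thread(target=background_user_program, args=(buffer,)), join it, and buffer.get_output() returns all three lines once; a second call returns an empty string.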

Working with output more effectively

Sometimes you need to check the output once, after several .execute(...) calls. The only way to achieve that is to collect the output after every .execute(...) call, which can be inconvenient and error-prone. It could be done in the following way:

output = pr.execute(...)
output += pr.execute(...)
output += pr.execute(...)
...
if ...:  # some condition
    output += pr.get_output()
self.check(output)

Also, sometimes the tested program prints some information from background threads, and it just so happens that the program requested input right before such output was printed. In that case, the .execute(...) call won't return this output, although the tests expect it to. Sometimes this output is returned by the .execute(...) call, and sometimes you have to call .get_output() to get the output that was printed after the last .execute(...) call.

The hs-test library guarantees that concatenating the results of all .start(...), .execute(...) and .get_output() calls gives the program's whole output with no duplicated lines or characters: every printed character appears in exactly one of these calls. But as demonstrated above, sometimes it is really hard to determine which call contains the output you need to check.

Upon calling the .set_return_output_after_execution(False) method:

  1. Any .start(...) and .execute(...) calls will return an empty string (but they will still wait for the input request if they are not in the background mode).
  2. The only way to retrieve the user's program output is to call .get_output(). This call will return all the output since the last .get_output() call.

This way, the hs-test library still guarantees that concatenating the results of all .start(...), .execute(...) and .get_output() calls gives the whole output, but now you can be sure that the output you need to check is inside a particular .get_output() call.

You can see the updated code below:

pr.set_return_output_after_execution(False)
...
pr.execute(...)
pr.execute(...)
pr.execute(...)
...
self.check(pr.get_output())
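The guarantee that the pieces returned by .execute(...) and .get_output() concatenate to the whole output, in both modes, can be illustrated with a toy router. It is a model of the described behavior only, not hs-test's implementation: ToyOutputRouting is a hypothetical class, and its `chunks` list simply stands for what the user's program prints in response to each successive .execute(...) call (the input itself is ignored).

```python
from typing import List


class ToyOutputRouting:
    """Toy model of set_return_output_after_execution(); not hs-test's code."""

    def __init__(self, chunks: List[str]) -> None:
        self._chunks = list(chunks)  # what the program prints per execute call
        self._pending = ""           # printed but not yet returned to the test
        self._return_output = True

    def set_return_output_after_execution(self, value: bool) -> None:
        self._return_output = value

    def execute(self, stdin: str) -> str:
        # `stdin` is ignored by this toy; the next chunk is "printed"
        self._pending += self._chunks.pop(0)
        if not self._return_output:
            return ""  # the output stays pending until get_output()
        text, self._pending = self._pending, ""
        return text

    def get_output(self) -> str:
        text, self._pending = self._pending, ""
        return text
```

In both modes, joining every returned string gives "a\nb\nc\n" for chunks ["a\n", "b\n", "c\n"]; only where that text shows up differs.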

Complex example

The user's program:

1   # some code that prints something
2   
3   num = int(input())
4   
5   # some code that processes input and prints something
6   
7   line = input()
8   
9   # some code that processes input and prints something
10  # ...

The tests (see comments for explanations):

from hstest import *

class TestUserProgram(StageTest):
    @dynamic_test
    def test(self):
        main = TestedProgram()
        
        # You can pass command-line args here as parameters
        # This output is from the start of the program
        # execution till the first input (lines 1, 2)
        output = main.start().lower()

        if "hello" not in output:
            # You can return CheckResult object and
            # the execution will stop immediately
            return CheckResult.wrong(
                "Your program should greet the user " +
                    "with the word \"Hello\"")

        if "input" not in output:
            # You can throw WrongAnswer error here also,
            # like in the 'check' method
            raise WrongAnswer(
                "Your program should ask the user " +
                    "to print something (output the word \"input\")")

        # If you want to continue to execute the user's program
        # you should invoke 'execute' method and pass input as a string.
        # You can pass multiple lines, in this case
        # execution of this method continues only when
        # all the lines will be consumed by the user's program
        # and additional input will be requested.
        output2 = main.execute("42")

        # This output is from the first input
        # till the second input (lines 3, 4, 5, 6)
        output2 = output2.lower()

        if "42" not in output2:
            # You may also want to stop execution but indicate
            # that the user's program is correct on this test
            return CheckResult.correct()

        if "4+2" in output2:
            # You can throw TestPassed error here also,
            # like in the 'check' method to indicate
            # that the user's program is correct on this test
            raise TestPassed()

        output3 = main.execute("line1\nline2")

        # Now you can test all the output here and not in 'check' method
        # usually, checking all the output parts is enough, but to fully
        # simulate previous example you can write the following code:
        reply = output + output2 + output3
        if len(reply.strip().split("\n")) != 4:
            return CheckResult.wrong("Your program should print exactly 4 lines")

        if False:
            # You can force to additionally invoke 'check' method
            # and test all the output there by returning None in this method
            # but usually, it's not needed at all since
            # you can fully check all the output parts in this method
            return None

        # You can check whether the main method is finished or not
        # You don't have to call this method yourself:
        # it's automatically checked in every "execute" call
        # and also after finishing this dynamic method
        if not main.is_finished():
            return CheckResult.wrong(
                "Your program should not request more than 3 lines.")

        return CheckResult.correct()

    # Other examples:

    @dynamic_test
    def test2(self):
        # you can use multiple methods marked with this decorator
        return CheckResult.correct()

    @dynamic_test
    def test3(self):
        main = TestedProgram()
        main.start()

        # a big advantage of this approach is that you can use
        # loops with an undetermined number of iterations.
        tries = 0
        while True:
            # test tries to guess a number from 0 to 9
            output = main.execute(str(tries))
            if "you guessed" in output:
                break
            tries += 1

            # but keep in mind that the user's program can mess up
            # and get stuck in a guessing loop infinitely
            if tries == 10:
                raise WrongAnswer(
                    "Test tried all the possible numbers but " +
                    "didn't guess the number")

        # another advantage is that you can use different number
        # of executions in different branches (this is impossible 
        # to simulate using dynamic input functions)
        if True:
            main.execute("")
            main.execute("")
        else:
            main.execute("")

        return CheckResult.correct()

Dynamic input

Dynamic input is an older approach to writing tests with dynamically generated input. Note that the approach described above is called Dynamic method; you should use it to write tests.

Dynamic input is now deprecated. So, don't write tests using the methods described below; use this information only to understand tests that were written this way.

Dynamic input is based on a set of dynamic functions that are called while the user's program executes. When the user's program asks for input and there is no input left, the next dynamic function is called, and it receives the output the program has printed since the previous dynamic function call.

In Java, you add a dynamic function to the set by calling .addInput(..), which takes a function that accepts a String (the output the user's program has printed so far). In Python, you pass a list of dynamic functions as the stdin parameter.

A dynamic function may:

  • Return a String, which is considered to be the next input to be passed to the user's program.
  • Return a CheckResult object. In this case, the execution of the user's program immediately stops, and the result of this test is determined by this object.
  • Throw a TestPassed or a WrongAnswer exception. In this case, the execution of the user's program also immediately stops, and the result of this test is determined by this exception.

By default, each dynamic function is executed only once. If you want it to be triggered multiple times, pass a number along with the function as a (number, function) tuple. If the number is negative, the function is executed infinitely. Instead of a function, you can also pass a String: it's a shortcut for a function that always returns this string as the next input.

Examples of defining dynamic functions

TestCase(
    stdin=[
        "input1",                         # Just returns "input1" as the next input once
        (12, "input2"),                   # Just returns "input2" as the next input 12 times
        # Returns "input3" if the user's program printed "hello" before requesting
        # this input; otherwise the test fails with the feedback "Greet the user!"
        lambda output:
            CheckResult.wrong("Greet the user!")
            if "hello" not in output.lower().strip()
            else "input3",
        (5, lambda out: out + "input4"),  # Returns 'out + "input4"' 5 times
        str.lower,                        # Returns str.lower(out) once (same as out.lower())
        (-1, self.test_func)              # Returns self.test_func(out) infinitely (the following dynamic functions will never be executed)
    ]
)
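The repetition rules for the stdin list (a plain entry runs once, a (number, function) tuple runs that many times, a negative number repeats forever, and a string is a shortcut for a constant function) can be captured in a small generator. This is a toy model of the rules as described, not hs-test's implementation; dynamic_functions and constant are hypothetical names.

```python
from typing import Any, Callable, Iterator, List


def constant(text: str) -> Callable[[str], str]:
    """A plain string in stdin is a shortcut for a function returning it."""
    return lambda output: text


def dynamic_functions(stdin: List[Any]) -> Iterator[Callable[[str], Any]]:
    """Toy model: yield the function to call on each successive input request."""
    for entry in stdin:
        count, func = entry if isinstance(entry, tuple) else (1, entry)
        if isinstance(func, str):
            func = constant(func)
        if count < 0:
            while True:     # a negative count repeats the function forever,
                yield func  # so the following entries are never reached
        for _ in range(count):
            yield func
```

For example, for ["input1", (2, "input2"), (-1, str.lower)] the successive input requests receive "input1", "input2", "input2", and then the lowercased output forever.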

Let's see a more complex example:

User's program in Python:

1   # some code that prints something
2   
3   num = int(input())
4   
5   # some code that processes input and prints something
6   
7   line = input()
8   
9   # some code that processes input and prints something
10  # ...

Test in Python (see comments for explanations):

from typing import Any, List

from hstest import TestPassed, WrongAnswer
from hstest.check_result import CheckResult
from hstest.stage_test import StageTest
from hstest.test_case import TestCase


class TestUserProgram(StageTest):

    def generate(self) -> List[TestCase]:
        return [
            TestCase(stdin=[self.func1, self.func2])
        ]

    def func1(self, output):
        # This output is from the start of the program
        # execution till the first input (lines 1, 2)
        output = output.lower()

        if "hello" not in output:
            # You can return CheckResult object and
            # the execution will stop immediately
            return CheckResult.wrong(
                "Your program should greet the user " +
                "with the word \"Hello\"")
        
        if "input" not in output:
            # You can raise WrongAnswer exception here also,
            # like in the 'check' method 
            raise WrongAnswer(
                "Your program should ask the user " +
                "to print something (output the word \"input\")")

        # If you want to continue to execute the user's program
        # you should return an input as a string.
        # You can return multiple lines, in this case
        # the next dynamic input function will be invoked only when
        # all the lines will be consumed by the user's program
        # and additional input will be requested.
        return "42"

    def func2(self, output):
        # This output is from the first input
        # till the second input (lines 3, 4, 5, 6)
        output = output.lower()

        if '42' in output:
            # You may also want to stop execution but indicate
            # that the user's program is correct on this test
            return CheckResult.correct()

        if '4+2' in output:
            # You can raise TestPassed exception here also
            # like in the 'check' method to indicate
            # that the user's program is correct on this test
            raise TestPassed()

        return "line1\nline2"

    def check(self, reply: str, attach: Any) -> CheckResult:
        # If no CheckResult object was returned or no exception was raised
        # from every invoked dynamic input function then 'check'
        # method is still invoked. "reply" variable contains all the output
        # the user's program printed.

        if len(reply.strip().split("\n")) != 4:
            return CheckResult.wrong(
                "Your program should print exactly 4 lines")

        return CheckResult.correct()

if __name__ == '__main__':
    TestUserProgram().run_tests()