I had a little look into testing. I think there are two basic approaches
XSB: A test is a Prolog file with some main goal. The test driver (a shell script) runs Prolog, loads the file, and runs the main goal. The output this prints is compared against the stored correct output.
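The core of such a diff-based driver can be sketched in a few lines of shell (file names and the commented-out Prolog invocation are hypothetical illustrations, not XSB's actual script):

```shell
# Minimal sketch of a diff-style test driver (hypothetical names; not
# XSB's actual script).  Run the test goal, capture its output, and
# compare it against a stored expected-output file.
expected="expected.out"
actual="actual.out"

printf 'a\nb\n' > "$expected"          # would normally ship with the test

# In a real driver this line would be something like:
#   prolog -e 'consult(test), main, halt.' > "$actual"
printf 'a\nb\n' > "$actual"

if diff -q "$expected" "$actual" >/dev/null; then
    echo "PASS: test"
else
    echo "FAIL: test"
fi
```

This also illustrates the objection above: the comparison is purely textual, so any change in term output (operator tables, spacing, variable naming) breaks the test even when the result is semantically identical.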
The others have some Prolog way to specify the code to run and describe how it should perform. SICStus and SWI-Prolog use SWI-Prolog’s PlUnit (although the driver in SWI-Prolog has been mostly rewritten recently). Logtalk’s lgtunit is inspired by this as well. Ciao fits tests in its assertion language. That looks very different, but the functionality seems comparable. ECLiPSe has .tst files that contain goals and their expected output.
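For flavour, a minimal PlUnit test set looks roughly like this (a sketch of the SWI-Prolog dialect; the tested goals are arbitrary examples):

```prolog
:- use_module(library(plunit)).

:- begin_tests(lists).

% A deterministic test: the goal must succeed.
test(append) :-
    append([a], [b], [a,b]).

% A non-deterministic test: all solutions for X must match the list.
test(member, all(X == [a,b,c])) :-
    member(X, [a,b,c]).

:- end_tests(lists).
```

After consulting the file, `?- run_tests.` runs the test set and reports any failures.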
I dislike (sorry) XSB’s solution for several reasons. Two stand out: textual equality testing of Prolog terms is flaky and the framework depends on Unix tools.
I think we roughly have three options:

1. Adopt one of the existing frameworks and let each Prolog implement its own test driver.
2. Define a new one and, again, let each Prolog implement its own test driver.
3. Adopt Logtalk for running PIP tests.
The main advantage of the first two approaches is that the PIP tests can run natively in the test infrastructure of each system. The main advantage of using Logtalk is that it provides a portable solution to both the tests and support predicates you may need to run the tests.
FYI: SWI-Prolog has tests from ECLiPSe (strings) and XSB (tabling). The ECLiPSe tests use a modified version of @jschimpf’s test_util_iso.pl. The XSB tests basically reimplement the XSB test driver logic in Prolog and use rewrite tricks to turn each .P test file into a SWI-Prolog unit test. That was a lot of work, but then there were a lot of tests to be reused.
Trying to reply, the forum software prevents me from writing posts with more than two links, tells me that I cannot post links to my website, and classifies my posts as spam. Can someone fix the forum settings? Thanks.
Hi @JanWielemaker, I think that to avoid a chicken-and-egg situation PIPs should not impose any testing framework. My vote goes for:
Let the PIP authors decide what is the most effective way to specify tests and specifications (natural language, pseudo-code, reference implementations, etc.)
It could be recommended (but not mandatory) to write a reference (executable) test suite, that will help implementation and checking for conformance.
Allow any existing test framework (as long as its semantics are clear)
A unified test framework or test framework compatibility would be another PIP itself.
Maybe supplementary PIP material, such as reference implementations, can be added as PIP revisions.
First of all, kudos to Jan – I know he did some hard work to translate a large portion of the XSB tests into his own form. I don’t fully agree with his opinion, but his opinion is not based on ignorance.
I do agree with Jan about the difficulty of running the XSB test suite on non-Unix systems (Windows). David, who uses Windows, always has Cygwin installed so that he can run the XSB test suite.
That said, I think the XSB test framework has some nice points.
First, it is lightweight in terms of system state. Many tests change the system state by modifying dynamic code, creating or abolishing tables, and so on. The XSB framework allows one to run as many tests in a test file as needed, while a fresh state is obtained simply by starting a new test file.
This often makes it easy to pinpoint system issues. If we implement some shaky new feature that causes unexpected problems like a core dump, only a few tests are affected.
As a result the framework has proved useful to us. There are about 900 test files, and most of the test files perform multiple tests – so likely several thousand tests. As an aside, the Ergo test suite has about 600 test files, again usually with several tests per file.
Also, the XSB test framework should be simple for other Prologs to run: only a couple of environment variables need to be changed.
But maybe I’m not understanding some of the difficulties that Jan faced when he ported XSB’s tabling tests. I think we should discuss testing in one of our Thursday meetings (after we’ve reviewed Joachim’s PIP proposal). My current opinion is that if we have multiple test frameworks that are easily portable to multiple systems, we should allow their use.
Needless to say, lgtunit doesn’t require any porting and works as-is with all systems participating here (and several more). The Prolog standards compliance suite flags multiple bugs in most of these systems. It also includes test sets for Prolog features such as Unicode (wip) and unbound integer arithmetic. Happy to add tests for other features such as tabling. All together, the current Logtalk distribution includes more than 10k tests. Automation support is provided for both POSIX and Windows systems. Report generation is included.
I don’t see a strong requirement to select a single testing framework. I will, however, be happy if people here actually run those tests and start fixing the exposed bugs, notably those in core features. That would be meaningful progress. It is really simple. On POSIX:
$ cd logtalk
$ logtalk_tester -p eclipse (or ciao, or xsb, or swi, or...)
We’d also like to run logtalk_tester regularly (or other test suites easy to port) for checking compliance too. This will surely catch more errors, wrongly encoded test cases, etc.
Incomplete implementation of the format/2-3 predicates accounts for 282 test failures (the tests are repeated with chars, codes, and atoms). There are also some tests that are not consensual (different Prolog implementers complain about different ones…). But there are also exposed bugs in core built-in predicates such as unify_with_occurs_check/2, and missing de facto standard arithmetic functions (e.g. hyperbolic functions).
My general advice is to fix first the test failures that you recognize as bugs and leave discussions on tests you disagree on to later.