Some thoughts about a test framework for PIPS

I had a little look into testing. I think there are two basic approaches:

  • XSB
    A test is a Prolog file with some main goal. The test driver (a shell script) runs Prolog, loads the file and runs the main goal. That prints output that is compared with the correct output.
  • The others have some Prolog way to specify the code to run and describe how it should perform. SICStus and SWI-Prolog use SWI-Prolog’s PlUnit (although the driver in SWI-Prolog has been mostly rewritten recently). Logtalk’s lgtunit is inspired by this as well. Ciao fits tests in its assertion language. That looks very different, but the functionality seems comparable. ECLiPSe has .tst files that contain goals and their expected output.

I dislike (sorry) XSB’s solution for several reasons. Two stand out: textual equality testing of Prolog terms is flaky and the framework depends on Unix tools.
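To illustrate the first point, here is a minimal sketch, assuming SWI-Prolog (with_output_to/2 and =@=/2 are not ISO; other systems offer similar predicates under other names). The predicates text_matches/2 and variant_matches/2 are hypothetical names used only for illustration. A textual check depends on how the system happens to name fresh variables, while a structural check does not:

```prolog
% Assumption: SWI-Prolog. with_output_to/2 and =@=/2 are not ISO.

% text_matches(+Term, +ExpectedText): compare the written form of Term
% with stored text, as an output-diffing test driver effectively does.
text_matches(Term, ExpectedText) :-
    with_output_to(string(Text), write(Term)),
    Text == ExpectedText.

% variant_matches(+Term, +Expected): compare terms structurally,
% up to consistent renaming of variables.
variant_matches(Term, Expected) :-
    Term =@= Expected.

% ?- length(L, 2), text_matches(L, "[_A,_B]").
%    fails: write/1 emits system-chosen variable names, not _A and _B.
% ?- length(L, 2), variant_matches(L, [_, _]).
%    succeeds regardless of how the variables are named.
```

Stored text would only match if the variable names happened to be identical across runs, versions and systems, which is one way textual comparison becomes flaky.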

I think we roughly have three options:

  • Adopt one of the existing frameworks and let each Prolog implement their own test driver.
  • Define a new one and again, let each Prolog implement their own test driver.
  • Adopt Logtalk for running PIP tests.

The main advantage of the first two approaches is that the PIP tests can run natively in the test infrastructure of each system. The main advantage of using Logtalk is that it provides a portable solution to both the tests and support predicates you may need to run the tests.

FYI: SWI-Prolog has tests from ECLiPSe (strings) and XSB (tabling). The ECLiPSe tests use a modified version of @jschimpf’s test_util_iso.pl. The XSB tests basically reimplement the XSB test driver logic in Prolog and use rewrite tricks to turn each .P test file into a SWI-Prolog unit test. That was a lot of work, but then there were a lot of tests to be reused :-)

For reference on lgtunit:

Summary: Tools - Logtalk
Documentation: lgtunit — The Logtalk Handbook v3.85.0 documentation
Testing automation script: logtalk_tester man page
Test reports automation script: logtalk_allure_report man page

As the lgtunit tool supports multiple test dialects, running existing tests without modifying them should be possible. See a simple example at:

There’s also mature support for property-based testing. See the blog post in the “testing” category.

I am trying to reply, but the forum software prevents me from writing posts with more than two links, tells me that I cannot post links to my website, and classifies my posts as spam. Can someone fix the forum settings? Thanks.

Hi @JanWielemaker, I think that to avoid a chicken-and-egg situation PIPs should not impose any testing framework. My vote goes for:

  • Let the PIP authors decide what is the most effective way to specify tests and specifications (natural language, pseudo-code, reference implementations, etc.)
  • It could be recommended (but not mandatory) to write a reference (executable) test suite, that will help implementation and checking for conformance.
  • Allow any existing test framework (as long as the semantics are clear)

A unified test framework or test framework compatibility would be another PIP itself.
Maybe supplementary PIP material like reference implementations can be added as PIP revisions.

First of all, kudos to Jan. I know he did some hard work to translate a large portion of the XSB tests into his own form. I don’t fully agree with his opinion, but his opinion is not based on ignorance.

I do agree with Jan about the difficulty of running the XSB test suite on non-Unix systems (Windows). David, who uses Windows, always has Cygwin installed so that he can run the XSB test suite.

That said, I think the XSB test framework has some nice points.

First, it is lightweight in terms of system state. Many tests change the system state by modifying dynamic code, creating or abolishing tables and so on. The XSB framework allows one to run as many tests in a test file as needed, but to obtain a fresh state by making a new test file.

This often makes it easy to pinpoint system issues. If we implement some shaky new feature that causes unexpected problems like a core dump, only a few tests are affected.

As a result the framework has proved useful to us. There are about 900 test files, and most of the test files perform multiple tests, so likely several thousand tests. As an aside, the Ergo test suite has about 600 test files, again usually with several tests per file.

Also, the XSB test framework should be simple for other Prologs to run: only a couple of environment variables need to be changed.

But maybe I’m not understanding some of the difficulties that Jan faced when he ported XSB’s tabling tests. I think we should discuss testing in one of our Thursday meetings (after we’ve reviewed Joachim’s PIP proposal). My current opinion is that if we have multiple test frameworks that are easily portable to multiple systems, we should allow their use.

As mentioned by @JanWielemaker, there is a simple pure ISO test suite (harness and 950 tests) I did in 2013.

The test harness is just 200 lines of plain ISO Prolog. Test patterns are quick to write and look like

functor(foo(a), fo, 1)	    should_fail.
functor(foo(a, b, c), X, Y)	should_give X==foo, Y==3.
functor(1, X, Y)		    should_give X==1, Y==0.
functor([_|_], '.', 2)	    should_give true.
functor(X, Y, 3)		    should_throw error(instantiation_error, _).
functor(X, foo, a)		    should_throw error(type_error(integer, a), _).

Output is simply

...
----- Finished tests from file iso.tst
953 tests found.
895 tests succeeded.
51 tests failed.
7 tests skipped.
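As an illustration of how patterns of this shape can be interpreted, here is a minimal sketch. It is not the actual harness: the operator priorities and types, and the predicate names outcome/2 and check/2, are assumptions made for this example. It relies on subsumes_term/2 from ISO Cor.2 to match the thrown ball against the expected one.

```prolog
% Assumed operator declarations; the real harness may declare them differently.
:- op(1110, xfx, should_give).
:- op(1110, xfx, should_throw).
:- op(1110, xf,  should_fail).

% outcome(+Goal, -Result): Result is true, false, or ball(Ball).
% Bindings made by Goal are kept when it succeeds.
outcome(Goal, Result) :-
    catch(( call(Goal) -> Result = true ; Result = false ),
          Ball,
          Result = ball(Ball)).

% check(+Pattern, -Verdict): Verdict is passed or failed.
check((Goal should_fail), Verdict) :-
    outcome(Goal, R),
    ( R == false -> Verdict = passed ; Verdict = failed ).
check((Goal should_give Check), Verdict) :-
    outcome(Goal, R),
    (   R == true, outcome(Check, C), C == true
    ->  Verdict = passed
    ;   Verdict = failed
    ).
check((Goal should_throw Expected), Verdict) :-
    outcome(Goal, R),
    (   R = ball(Ball), subsumes_term(Expected, Ball)
    ->  Verdict = passed
    ;   Verdict = failed
    ).
```

For example, `?- check((functor(foo(a, b, c), X, Y) should_give X==foo, Y==3), V).` would bind V to passed on a conforming system. A real harness would additionally read the patterns from a file, count the verdicts and support skipping tests.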

Needless to say, lgtunit doesn’t require any porting and works as-is with all systems participating here (and several more). The Prolog standards compliance suite flags multiple bugs in most of these systems. It also includes test sets for Prolog features such as Unicode (wip) and unbounded integer arithmetic. Happy to add tests for other features such as tabling. All together, the current Logtalk distribution includes more than 10k tests. Automation support is provided for both POSIX and Windows systems. Report generation is included.

I don’t see a strong requirement to select a single testing framework. I will, however, be happy if people here actually run those tests and start fixing the exposed bugs, notably those in core features. That would be meaningful progress. It is really simple. On POSIX:

$ cd logtalk
$ logtalk_tester -p eclipse   # or ciao, or xsb, or swi, or...

On Windows:

PS > cd Logtalk
PS > logtalk_tester.ps1 -p eclipse   # or ...

Do you want a nice report?

$ logtalk_tester -p eclipse -f xunit
$ logtalk_allure_report
$ allure open

Similar on Windows. Just add the .ps1 extension.

Just to contribute to this discussion, the Ciao test suite framework based on assertions was also used to encode ISO Prolog conformance tests: GitHub - ciao-lang/iso_tests: Tests for ISO Prolog conformance (1047 test cases).

We’d also like to run logtalk_tester regularly (or other test suites easy to port) for checking compliance too. This will surely catch more errors, wrongly encoded test cases, etc.

Example with ECLiPSe by running just the Prolog standards compliance suite (note the number of tests):

$ cd ~/logtalk/tests/prolog 
$ logtalk_tester -p eclipse -w
% Batch testing started @ 2024-11-23 22:17:59
%         Logtalk version: 3.86.0-b01
%         ECLiPSe version: 7.0.57
% ...
% 192 test sets: 179 completed, 13 skipped, 0 broken, 0 timedout, 0 crashed
% 3509 tests: 156 skipped, 2868 passed, 485 failed (0 flaky)
%
% Batch testing ended @ 2024-11-23 22:29:42

Incomplete implementation of the format/2-3 predicates accounts for 282 test failures (tests are repeated with chars, codes, and atom). There are also some tests that are not consensual (with different Prolog implementers complaining about different ones…). But there are also exposed bugs in core built-in predicates such as unify_with_occurs_check/2 and missing de facto standard arithmetic functions (e.g. hyperbolic functions).

My general advice is to fix first the test failures that you recognize as bugs and leave discussions on tests you disagree on to later.


Btw, Logtalk 3.99.0 added lgtunit support for exporting code coverage stats in the Cobertura XML and LCOV formats, plus support for exporting test results in the Subunit and CTRF (Common Test Report Format) formats. These are industry standards with multiple de facto standard and open source viewers and CI/CD integration support, which facilitates integration with enterprise testing and reporting software stacks.

But Logtalk cannot help in profiling the Prolog systems it is
layered on. Further, you will only find a few Prolog systems that
open their own Prolog execution to profiling. And then they might
only open the part implemented 100% in Prolog, but not the
natively implemented parts. This is why, for example, missing
test cases for PIP-0110 such as the following don’t get caught:

/* Trealla Prolog 2.94.12 */
?- format("~3R", [7625597484987]), nl.
%%% crash

Might be a coverage problem, in that big int together with upper case
in format/3 was not tested. So some big int paths in the code would
show up in the Cobertura ( = Coverage ) report as unused.

Actually it crashes on Windows 11 with 0xc0000005, so it might
also be a matrix testing problem. A Prolog system known for its
profiler is SWI-Prolog. But a Cobertura ( = Coverage ) report
would show more than only Byrd box ports.

A Cobertura ( = Coverage ) report usually goes down to
source code line numbers, or even finer. SWI-Prolog
seems to offer something coarser, since it has both
library(prolog_profile) and library(prolog_coverage), the
latter on clause level, it seems. But Logtalk is totally at the
mercy of the underlying Prolog system to deliver more than
only a profiling of the test harness, namely the profiling of
the implementation of predicates. Profiling of the implementations
would be needed for enterprise-grade coverage reports.

Ideally, test dialects are orthogonal to testing frameworks. This allows writing tests in the simplest possible way for a particular case while taking advantage of a testing framework’s features. A dedicated test dialect will arguably always be better than more general test dialects (think DSLs). Logtalk’s lgtunit supports user-defined test dialects for this exact purpose. Just for fun, I committed the following example of running tests expressed in the ECLiPSe test dialect using lgtunit:

The tests.tst file is taken from the ECLiPSe documentation. We can run the tests manually using, e.g., GNU Prolog:

$ gplgt --quiet
| ?- logtalk_load(eclipse_tests_dsl(tester)).
% 
% tests started at 2026-05-04, 19:31:06
% 
% running tests from object tests
% file: /Users/pmoura/logtalk/examples/eclipse_tests_dsl/tests.tst
% 
% t1: success (in 0.000000000/0.000000000 cpu/wall seconds)
% t2: success (in 0.000000000/0.000000000 cpu/wall seconds)
% t3: success (in 0.000000000/0.000000000 cpu/wall seconds)
% t4: success (in 0.000000000/0.000000000 cpu/wall seconds)
% 
% 4 tests: 0 skipped, 4 passed, 0 failed (0 flaky)
% runtime: 0.000000000/0.000000000 cpu/wall seconds
% completed tests from object tests
% 
% no code coverage information collected
% tests ended at 2026-05-04, 19:31:06
% 

(633 ms) yes

Or using the logtalk_tester automation script with, e.g., the Trealla Prolog backend:

$ cd logtalk/examples/eclipse_tests_dsl/
$ logtalk_tester -p trealla
% Batch testing started @ 2026-05-04 19:43:44
%         Logtalk version: 3.100.0-b01
%         Trealla Prolog version: 2.94.20
%         OS version: Darwin, x86_64, 23.6.0
%
% logtalk/examples/eclipse_tests_dsl
%         4 tests: 0 skipped, 4 passed, 0 failed (0 flaky)
%         completed tests from object tests in 0.000 seconds
%         clause coverage n/a
%
% 1 test sets: 1 completed, 0 skipped, 0 broken, 0 timedout, 0 crashed
% 4 tests: 0 skipped, 4 passed, 0 failed (0 flaky)
%
% Batch testing ended @ 2026-05-04 19:43:49
% Batch run took 0h:00m:05s

This solution provides an interoperability path between different systems, avoiding the need to rewrite test sets.
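The general idea of keeping a test dialect separate from the framework that runs it can be sketched as a thin translation layer. The sketch below does not use or describe lgtunit’s actual dialect hooks. It uses the should_fail/should_give/should_throw pattern shape quoted earlier in this thread as the example dialect, and the operator declarations, the predicate names dialect_test/2 and translate/2, and the neutral test/2 representation are all assumptions made for illustration:

```prolog
% Assumed operator declarations for the example dialect.
:- op(1110, xfx, should_give).
:- op(1110, xfx, should_throw).
:- op(1110, xf,  should_fail).

% dialect_test(+Term, -Test): map one dialect term to a neutral
% test(Goal, Expectation) form that any runner could consume.
dialect_test(Term, _) :-
    var(Term), !, fail.
dialect_test((Goal should_fail),           test(Goal, fails)).
dialect_test((Goal should_give Check),     test(Goal, succeeds(Check))).
dialect_test((Goal should_throw Expected), test(Goal, throws(Expected))).

% translate(+Terms, -Tests): translate a list of terms read from a
% test file, skipping terms that are not part of the dialect.
translate([], []).
translate([Term|Terms], Tests) :-
    (   dialect_test(Term, Test)
    ->  Tests = [Test|Rest]
    ;   Tests = Rest
    ),
    translate(Terms, Rest).
```

With this split, supporting another dialect means adding another translation predicate, while the code that runs and reports on test/2 terms stays unchanged.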

Still, this interoperability is not the same as cross-platform testing.
Its goal is completely different from having a petting zoo of Prolog
systems on a single operating system and a single machine. It is more
like having a large enclosure of wild animals roaming a real-world
savanna of Prolog systems, running on various platforms. There
the problem is mainly to convey the message that:

By ensuring your app performs consistently everywhere,
you build reliability and confidence with your users.

Now I find that Trealla Prolog ships an .exe, but the creator of
Trealla Prolog cannot test it, since he doesn’t have a Windows
machine. And if I don’t build it locally, I don’t have a symbol table,
and I cannot decipher a stack trace from today 2026-05-05:

From this call:

/* Trealla Prolog 2.95.5 */
?- format("~3R", [7625597484987]), nl.
%%% crash

Sometimes applications deliver cross stack trace dump analysis:
you can basically submit a stack trace from anywhere, and
they can analyze what’s going on. Maybe I have overlooked something,
but I was already going low level with WinDbg. But it is also
a case where the error might or might not be in the Prolog code itself;
it might also be a built-in implementation error, or even an error
in a 3rd party native component. So I have to turn from user to
developer and build it locally, so that I see the symbol table.
Having the cost of a development environment? I don’t think
that Logtalk can relieve me from this old-fashioned workflow.

BTW: Now that SWI-Prolog and IntelliJ have their headquarters in the
same greater Amsterdam area, I wonder when they will team up?

Analyze external stack traces
Last modified: 08 January 2026
https://www.jetbrains.com/help/idea/analyzing-external-stacktraces.html

Actually the above solution is a little bit over the top, since it
is for obfuscated applications, i.e. it uses cryptographic methods.