I want to lift this discussion to the automation issue:
Some thoughts about a test framework for PIPS
https://discourse.prolog-lang.org/t/some-thoughts-about-a-test-framework-for-pips/108
In the past, the gold standard of a testing solution was
to be happy with a cursory test of multiple Prolog systems,
by manually inspecting some logs on your local machine.
The challenges in the current world can be much more complex.
The problem is usually multiple platforms: not only Linux, Mac and
Windows, but also different CPU architectures such as Intel, AMD and ARM.
This gives huge testing matrices, with distributed results. A
cockpit might then create a Single Pane of Glass (SPOG), giving
you that feel of control, like Commander Spock looking into his scope.
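
To get a feeling for how quickly such a matrix grows, here is a minimal sketch in Python. The axis values are placeholders picked for illustration, not a list of systems anybody actually tests:

```python
from itertools import product

# Hypothetical axes, chosen only to illustrate the growth of the matrix.
PLATFORMS = ["linux", "macos", "windows"]
ARCHS = ["x86_64", "arm64"]
SYSTEMS = ["system_a", "system_b", "system_c"]

def build_matrix(platforms, archs, systems):
    """Return every (platform, arch, system) cell of the testing matrix."""
    return list(product(platforms, archs, systems))

cells = build_matrix(PLATFORMS, ARCHS, SYSTEMS)
print(len(cells))  # 3 * 2 * 3 = 18 cells
```

Every additional axis, for example the version of each Prolog system, multiplies the number of cells again, which is why manual log inspection stops scaling.
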
Logtalk has some Allure based tooling. But there is a gap
between offering some tooling and practicing the tooling itself:
I don’t find any published reports. Also, other Prolog systems
might prefer more lightweight solutions that don’t depend
on 3rd party tooling. For example, this proposal creates two
dependencies, namely Logtalk and Allure:
But it lacks the galactic matrix automation! How do you scale it, so
that the testing gets fully automated? There is no suggestion
what scripting to use when platforms are mixed, like Linux,
Mac and Windows! Rather, it makes the repelling suggestion, via the
.ps1 extension, to duplicate scripting for different platforms, since
the .ps1 extension refers to PowerShell. Jan W. has repeatedly
demonstrated reporting with graphical output. I recently tried the
same, with an SVG generator written 100% in Prolog. What is missing
is a bar chart color legend. It’s actually a benchmark, not a compliance test:
But under the hood there is also an automation framework
based on Java Ant tasks, which has the advantage that it is
portable across Linux, Mac and Windows and only requires
an installed JVM. But it might not be to everybody’s taste.
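
The idea of a single driver script that runs unchanged on Linux, Mac and Windows can be sketched as follows. This is not the Ant based setup, just a Python stand-in for the same idea. The name `run_suite`, the record layout, and the `COMMANDS` table are assumptions made for illustration. The Python interpreter itself plays the role of a Prolog system here, since the real command line of each system would have to be filled in:

```python
import platform
import subprocess
import sys

def run_suite(name, argv, timeout=60):
    """Run one test command and return a record describing the outcome."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True,
                              timeout=timeout)
        status = "passed" if proc.returncode == 0 else "failed"
        output = proc.stdout
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Command not found, not executable, or it ran too long.
        status = "error"
        output = str(exc)
    return {
        "system": name,
        "os": platform.system(),
        "machine": platform.machine(),
        "status": status,
        "output": output,
    }

# Stand-in command (assumption): replace with the real invocation
# of each Prolog system under test.
COMMANDS = {
    "stand_in": [sys.executable, "-c", "print('all tests ok')"],
}

results = [run_suite(name, argv) for name, argv in COMMANDS.items()]
for r in results:
    print(r["system"], r["os"], r["machine"], r["status"])
```

The same file runs on all three platforms, so there is no need for one .sh and one .ps1 variant. Each record already carries the platform and the CPU, which is what a cockpit would need later to place it in the matrix.
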
From seeing Ulrich Neumerkel’s test cases, I also remember
some cross Prolog system testing. But the issue usually starts
with the question of the test result format. Allure suggests JSON based
test result files. But you can also use a Prolog database format,
just the usual Prolog facts inside a plain text file, as the test result.
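
A minimal sketch of such a Prolog facts result file, written from a Python driver. The predicate `result/3` and its argument order are hypothetical, not an existing convention. All arguments are written as quoted atoms, so test names with spaces or slashes stay readable by a Prolog system:

```python
def quote_atom(text):
    """Write text as a quoted Prolog atom, escaping backslash, quote, newline."""
    escaped = (text.replace("\\", "\\\\")
                   .replace("'", "\\'")
                   .replace("\n", "\\n"))
    return "'" + escaped + "'"

def result_fact(system, test, status):
    """Format one test outcome as a fact of the hypothetical result/3."""
    return "result({}, {}, {}).".format(
        quote_atom(system), quote_atom(test), quote_atom(status))

def write_results(path, rows):
    """Write one fact per line into a plain text file."""
    with open(path, "w", encoding="utf-8") as out:
        for system, test, status in rows:
            out.write(result_fact(system, test, status) + "\n")

print(result_fact("system_a", "atom_length/2 #1", "passed"))
# → result('system_a', 'atom_length/2 #1', 'passed').
```

Such a file can be loaded with consult/1 and then queried like any other Prolog database, so the reporting side needs no JSON parser.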

