About the radix option (and a little about float write options)

JanWielemaker · November 15, 2024, 11:07am

I had a look at the radix(+Radix) option of XSB. I also had a closer look at SICStus float_format(+Format). Actually implementing this stuff makes you more aware of the issues.

Dealing with radix looked straightforward. In the current proposal I think binary is missing from the values (ISO: e.g., 0b10110). hex seems a bit out of style compared to the other option names. Possibly hexadecimal is better? Anyway, I added the named options as well as support for integers. I didn’t read carefully though and now insert the prefix, so

?- write_term(100, [radix(hex)]).
0x64

As the TBD says there should possibly be an option for this, I leave this for now. Doing this made me realize that several systems allow reading grouped integers as 1_000_000 and/or 1 000 000. Writing integers this way is an attractive option to have as well, as it allows (for example) configuring the toplevel to use this notation.

Assuming we probably agree that all these are useful, we have

The radix
Whether or not to include the radix prefix
Grouping, possibly with some more options such as the grouping character and wrapping long lines.
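For orientation, this is roughly how far plain format/2 already gets on these three points. It is a minimal sketch assuming SWI-Prolog's format/2 (the ~Nr and ~D directives); the predicate names show_radix/1 and show_grouped/1 are made up for illustration, and the prefix is simply written by hand.

```prolog
% Sketch only: show_radix/1 and show_grouped/1 are hypothetical names.

% Print a non-negative integer in hex, without and with the ISO 0x prefix.
show_radix(Int) :-
    format("~16r~n", [Int]),      % bare digits
    format("0x~16r~n", [Int]).    % prefix added by hand

% Print an integer in groups of three digits.
% ~D separates groups with a comma, so the result is not Prolog syntax.
show_grouped(Int) :-
    format("~D~n", [Int]).
```

On SWI-Prolog, show_radix(100) prints 64 and 0x64 on separate lines, and show_grouped(1000000) prints 1,000,000. Neither directive covers a configurable group separator, which is one reason dedicated write_term options look attractive.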

For floats we have similar issues

Precision
Width. I fail to understand what this does, @Theresa_Swift , can you give an example?
Specifier (gGfF, but I assume also eE?)
Use locale or Prolog, i.e., write Prolog 3.5 as 3,5 as used in several European languages.

Given all these options, and possibly more, I’m getting more and more tempted to go for SICStus float_format(+Format) and similarly (new) integer_format(+Format).

The main question is what the format specifiers should be. We could support multiple, distinguished by the first character, e.g., using ~... for format/2 based and %... for C printf based?

jschimpf · November 16, 2024, 11:40am

In C printf (of which Prolog format/2,3 is more or less a subset) we have

%[-+ 0][Minwidth][.Precision][eEfFgG]

where

Minwidth is the minimum(!) field width, possibly padded left or right
Precision is the number of digits after the dot

and both of these can be digits or *, in which case the number is taken from the argument list.
The flags mean

- align left within Minwidth
+ always print a sign (+ or -)
<spc> print a space if there is no sign
0 pad left with 0 instead of spaces
: SWI(?) extension indicating use of locale

If we support float_format(+Format), then we have to specify how to pass the *-parameters, so probably float_format(+Format,+Params), although that gets a bit unwieldy.

But in the interest of a minimal feature set, I’d argue that for the purposes of write_term:

all the width and padding-related features can be omitted because they only make sense if you want to print a standalone float (but then you can use format/2,3!). If you have a complex term in Prolog syntax with several embedded floats, padding around individual floats is nonsense.
options that create Prolog-incompatible syntax (such as different locale) can be omitted, as they make no sense for a float embedded in a term in Prolog syntax.

That leaves us with Precision, eEfFgG and possibly +.

For these reasons, I’ve come full circle and would now recommend just XSB’s

float_precision(+Precision)
float_specifier(+eEfFgG)

possibly combined into something like

float_style(+eEfFgG,+Precision)

As for minwidth/maxwidth/padding/wrapping I think they should be reconsidered for the whole Prolog term, not specifically for floats.
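To make the reduced feature set concrete, here is a minimal sketch of what a precision plus specifier pair amounts to when mapped onto format/2. It assumes SWI-Prolog, where `*` takes the numeric argument of a directive from the argument list; float_text/4 is a hypothetical helper, not an existing predicate, and only the f and e styles are covered.

```prolog
% Sketch only: float_text/4 is a hypothetical helper, not an existing API.
% Style is f or e; Precision is the number of digits after the dot.
float_text(f, Precision, Float, Atom) :-
    format(atom(Atom), "~*f", [Precision, Float]).
float_text(e, Precision, Float, Atom) :-
    format(atom(Atom), "~*e", [Precision, Float]).
```

For example, float_text(f, 4, 356.356437, A) should give A = '356.3564'. With style e and precision 2 the same number is written with one digit before the dot and two after it, followed by the exponent. Because Precision is an ordinary argument, it can be computed at run time without building a format string.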

JanWielemaker · November 16, 2024, 2:57pm

Agree. Note that format/2 does not support these either, as spacing and padding are controlled using ~t, etc. I see that XSB float_width indeed does this:

| ?- write(=),write_term(356.35, [float_width(10)]), write(=).
=    356.35=
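For comparison, the same right-aligned layout can be produced with format/2 column stops instead of a width option. This is a minimal sketch assuming SWI-Prolog's ~t (fill) and ~N| (column stop) directives; padded_float/3 is a hypothetical helper name.

```prolog
% Sketch only: padded_float/3 is a hypothetical helper.
% Right-align the write/1 text of Float in a field of Width characters.
padded_float(Float, Width, Atom) :-
    % ~t inserts the fill, ~w writes Float, ~*| sets the column stop at Width.
    format(atom(Atom), "~t~w~*|", [Float, Width]).
```

With this, padded_float(356.35, 10, A) should give A = '    356.35', i.e. four spaces of padding as in the XSB output above.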

That is unclear to me. Note that write/1 also writes something that cannot be read back. The XSB radix option example below also produces invalid Prolog syntax.

| ?- write_term(3653643, [radix(hex)]).
37c00b
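The read-back point can be illustrated with a small round trip: once the radix prefix is included, the text is an ordinary Prolog integer literal again. This is a minimal sketch assuming SWI-Prolog's format/2 and term_to_atom/2; hex_round_trip/2 is a hypothetical name and only non-negative integers are handled.

```prolog
% Sketch only: hex_round_trip/2 is a hypothetical name.
% Write Int as 0x-prefixed hex text, then parse that text as a Prolog term.
hex_round_trip(Int, Back) :-
    integer(Int), Int >= 0,
    format(atom(Text), "0x~16r", [Int]),   % prefix plus lower-case hex digits
    term_to_atom(Back, Text).              % read the text back as a term
```

Here hex_round_trip(3653643, X) goes through the text 0x37c00b and should yield X = 3653643, whereas the bare digits 37c00b do not form a valid Prolog term.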

SICStus provides (in format/2) the h and H specifiers, which are interesting. They act as gG in the sense that they choose between e and f format, but they always print enough digits to read back the same number (as write/1 does). The optional number specifies how eagerly exponential notation is used: it is chosen if either the left or right side in f notation has more than N leading/trailing zeros. Default is 3. I have added this to SWI-Prolog.

The XSB float precision seems to differ from what printf() does. E.g.

| ?- write_term(356.356437, [float_precision(4)]).
356.4

Whereas f specifies digits after the dot (with an arbitrary number of digits before the dot) and e also specifies digits after the dot, but with exactly one digit before. XSB does something else, which looks useful to me, but is not supported by any format specifier I’m aware of.

Or just omit them? In rare cases you can dynamically construct the format, no?

I am against this as multi-argument options make it hard to define a generic option processing library. float(style(.., ...)) would work for me.

jschimpf · November 17, 2024, 9:23am

I was deliberately imprecise. I am merely trying to identify guidelines that can help us sort through the big pile of conceivable features, and crystallise a subset of core functionality for the PIP.
The idea that write_term should print something that is still recognisable as a Prolog term seems helpful to me. I would not want to duplicate all functionality of format/2,3 – if you want to print data tables for your accountant, use format/printf, not write_term.

That would be a shame. One of the advantages of an option-list (compared to a format-string language) is that runtime parameters can be passed directly; I would not want to lose that.

That had not occurred to me – why is that such a problem?

JanWielemaker · November 17, 2024, 9:50am

Part seems SWI-Prolog specific. For historical reasons, option processing allows for [name=Value, ...] as well as [name(Value),...]. This has been deprecated for a long time, so possibly we can remove that … More recent, and IMO a worthwhile direction, is that option processing allows for SWI-Prolog’s dynamic dicts, e.g.

write_term(T, #{max_depth:20, numbervars:true}).

This representation is easy to manipulate and (much) faster to process.

Finally, library(option) provides e.g.

?- option(max_depth(Depth), Options, 10).

This is very frequently used, where the last argument provides the default.

Of course, we can find ways around this. The general direction I took though is to get rid of multi-valued options and use an option term instead. Note that (AFAIK) ISO neither defines nor mentions multi-valued options.
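As an illustration of why single-argument option terms compose well with library(option), here is a minimal sketch. It assumes one possible shape, a single float(...) option whose argument is itself an option list; float_settings/3 is a hypothetical helper and the defaults (style f, precision 6) are assumptions made for the example only.

```prolog
:- use_module(library(option)).

% Sketch only: float_settings/3 is a hypothetical helper; the defaults
% (style f, precision 6) are assumptions for illustration.
float_settings(Options, Style, Precision) :-
    option(float(Sub), Options, []),        % one value, so option/3 just works
    option(style(Style), Sub, f),           % nested lookups reuse option/3
    option(precision(Precision), Sub, 6).
```

For example, float_settings([float([precision(4)])], S, P) gives S = f and P = 4, while an option list without float(...) falls back to both defaults. A two-argument option such as float_style(Style, Precision) has no single default value that option/3 could supply.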

jschimpf · November 17, 2024, 11:34pm

Revised proposal for integers, floats and strings:

Subterm-type-specific options

float(+SubOptions): how to print floats, with a list of sub-options in
- precision(+Precision): number of digits after the decimal point. TBD: default, and symbolic values for “as many as accurate” and “as many as needed for reading back”.
- style(+Style): select a C-printf-like style, one of f (default), e or g.
- upper: use upper case E for the exponent indicator instead of e.
integer(+SubOptions): how to print integers, with a list of sub-options in
- base(+Base): print integers in the given base. Base is either one of the atoms dec (default, without prefix), bin (with prefix 0b), oct (with prefix 0o), hex (with prefix 0x), or an integer in the range 2…36 (with prefix Base').
- bare: suppress the base indicator prefix.
- grouping(+Size): if 0 (default) no grouping. Otherwise size of digit groups.
- separator(+Atom): string used to separate digit groups, default '_'. TBD
- upper: use upper case letters instead of the default lower case for printing in bases greater than 10.
atom(+SubOptions), string(+SubOptions), text(+SubOptions): how to print atoms or strings, with a list of sub-options in
- max(+Length): truncate text after Length characters. Don’t truncate if 0 (default).
- quote(+When): whether to print quotes. When is one of never (default), when_needed or always.
- escape(+What): whether to print nonprintable characters as escape sequences in quoted text. What is one of all (default), most (all but newlines and tabs) or none.
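To check that the integer sub-options fit together, here is a minimal sketch of how they might be interpreted. It assumes SWI-Prolog (library(option), and format/2 with ~r and ~R taking the radix via `*`); format_integer/3 and its helpers are hypothetical names, only non-negative integers are handled, and nothing here is part of the proposal itself.

```prolog
:- use_module(library(option)).
:- use_module(library(lists)).

% Sketch only: format_integer/3 and its helpers are hypothetical names.
% Handles non-negative integers; SubOptions is the list given to integer(...).
format_integer(Int, SubOptions, Atom) :-
    integer(Int), Int >= 0,
    option(base(Base0), SubOptions, dec),
    option(grouping(Size), SubOptions, 0),
    option(separator(Sep), SubOptions, '_'),
    once(base_prefix(Base0, Base, Prefix0)),
    (   memberchk(bare, SubOptions) -> Prefix = '' ; Prefix = Prefix0 ),
    (   memberchk(upper, SubOptions)
    ->  format(atom(Digits), "~*R", [Base, Int])   % upper-case digits
    ;   format(atom(Digits), "~*r", [Base, Int])   % lower-case digits
    ),
    group_digits(Digits, Size, Sep, Grouped),
    atom_concat(Prefix, Grouped, Atom).

% base_prefix(+BaseSpec, -Radix, -Prefix)
base_prefix(dec, 10, '').
base_prefix(bin,  2, '0b').
base_prefix(oct,  8, '0o').
base_prefix(hex, 16, '0x').
base_prefix(Base, Base, Prefix) :-
    integer(Base), between(2, 36, Base),
    format(atom(Prefix), "~d'", [Base]).           % Base' notation

% group_digits(+Digits, +Size, +Sep, -Grouped): group from the right.
group_digits(Digits, 0, _, Digits) :- !.
group_digits(Digits, Size, Sep, Grouped) :-
    atom_chars(Digits, Chars),
    reverse(Chars, Rev),
    chunks(Rev, Size, RevChunks),          % chunks of the reversed digits
    maplist(reverse, RevChunks, Chunks0),  % restore order inside each chunk
    reverse(Chunks0, Chunks),              % restore order of the chunks
    maplist(atom_chars, Atoms, Chunks),
    atomic_list_concat(Atoms, Sep, Grouped).

% chunks(+List, +Size, -Chunks): split into sublists of at most Size elements.
chunks([], _, []) :- !.
chunks(List, Size, [Chunk|Chunks]) :-
    length(Chunk, Size),
    append(Chunk, Rest, List), !,
    chunks(Rest, Size, Chunks).
chunks(List, _, [List]).
```

With this sketch, format_integer(100, [base(hex)], A) gives A = '0x64', format_integer(1000000, [grouping(3)], A) gives A = '1_000_000', and format_integer(255, [base(hex), bare, upper], A) gives A = 'FF'. Whether grouping should apply to the digits only, as done here, or interact with the prefix is one of the details the proposal would still need to settle.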