Floating-point parsing is locale-dependent
I did some research to compare behavior with other shells.
The table below shows whether string conversion of floating-point numbers is locale-dependent (D) or locale-independent (I) in different situations:
| Situation | POSIX.1-2008 | yash 2.40 | zsh 5.1.1 | ksh 93u+ 2012-08-01 |
|---|---|---|---|---|
| Parsing literals in arithmetic expansions | N/A | I | I | D |
| Parsing variable values in arithmetic expansions | N/A | D | I | D |
| Printing results of arithmetic expansions | N/A | D | I | D |
| Parsing operands of the printf built-in | D | D | both | D |
| Printing results of the printf built-in | D | D | D | D |
Locale-dependency of floating-point conversion was once considered in #19731. Since then, floating-point literals that appear directly in an arithmetic expansion have been interpreted locale-independently. However, floating-point values stored in a variable are still interpreted locale-dependently to maintain interoperability with the printf built-in.
Consider passing the result of an arithmetic expansion as an operand to the printf built-in. As required by POSIX.1-2008, the built-in interprets the operand locale-dependently, so the result of the arithmetic expansion must be formatted locale-dependently. Now consider assigning the result of an arithmetic expansion to a variable and using the variable in another arithmetic expansion. Since a locale-dependently formatted value is assigned to the variable, the shell needs to interpret the variable value locale-dependently when evaluating the arithmetic expression.
That said, I have to admit that this does not work if you change the locale during execution of a shell script. One possible solution might be to provide a safe way to convert floating-point values between the locale-independent and locale-dependent formats.
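The round trip argued for above, and the way a mid-script locale change breaks it, can be modelled in a few lines of Python. This is a minimal sketch, not yash's implementation: it assumes a locale can be reduced to its decimal-point character, and `NumericLocale`, `format_number`, `parse_number` and `convert` are hypothetical names. `convert` stands for the kind of explicit conversion suggested in the previous paragraph.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class NumericLocale:
    """Hypothetical stand-in for LC_NUMERIC: only the decimal-point character."""
    decimal_point: str


C = NumericLocale(".")
COMMA = NumericLocale(",")  # assumption: e.g. a French or Dutch locale


def format_number(x: float, loc: NumericLocale) -> str:
    """Write x in C format (shortest round-trip repr), then localize the point."""
    return repr(float(x)).replace(".", loc.decimal_point)


def parse_number(text: str, loc: NumericLocale) -> float:
    """Accept text only if the whole string is a number in loc's format."""
    dp = re.escape(loc.decimal_point)
    pattern = rf"[+-]?(?:\d+(?:{dp}\d*)?|{dp}\d+)(?:[eE][+-]?\d+)?"
    if not re.fullmatch(pattern, text):
        raise ValueError(f"not a number in this locale: {text!r}")
    return float(text.replace(loc.decimal_point, "."))


def convert(text: str, src: NumericLocale, dst: NumericLocale) -> str:
    """Re-format a number written under src so that dst can parse it."""
    return format_number(parse_number(text, src), dst)


# Same locale for writing and reading: the value survives.
stored = format_number(2.5, COMMA)  # "2,5", e.g. assigned to a variable
assert parse_number(stored, COMMA) == 2.5

# Locale changed between writing and reading: parsing fails.
try:
    parse_number(stored, C)
except ValueError as err:
    print(err)  # not a number in this locale: '2,5'

# Explicit conversion carries the value across the change.
print(parse_number(convert(stored, COMMA, C), C))  # 2.5
```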
I haven't yet come up with a good way to offer conversion from the locale-dependent format to the locale-independent format. AFAIK no other shell supports such a conversion. Should yash have a special conversion method that is incompatible with other shells?
Here are my thoughts on the matter.
I think conversion from the locale-dependent to the locale-independent format is fraught with problems, and I don't think you should try it.
I understand your argument about interoperability with the printf built-in, but I don't think it's worth the price either. Currently, if a yash script run in France or the Netherlands produces floating-point output that another instance of the same script, run in Japan or America, has to parse, parsing will fail unless the locale settings are changed.
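Which environment variable decides the numeric format follows the POSIX precedence: a non-empty LC_ALL wins, then a non-empty LC_NUMERIC, then a non-empty LANG, and otherwise an implementation-defined default. The hypothetical helper below sketches that lookup in Python (treating the default as POSIX is an assumption), which is why unsetting LC_ALL and setting LC_NUMERIC=POSIX changes only the number format.

```python
from typing import Mapping


def effective_lc_numeric(env: Mapping[str, str]) -> str:
    """Pick the locale name used for LC_NUMERIC, following POSIX precedence."""
    for name in ("LC_ALL", "LC_NUMERIC", "LANG"):
        value = env.get(name)
        if value:  # defined and not null (an empty string does not count)
            return value
    return "POSIX"  # assumption: the implementation-defined default


user = {"LANG": "nl_NL.UTF-8", "LC_ALL": "fr_FR.UTF-8"}
print(effective_lc_numeric(user))  # fr_FR.UTF-8

# Workaround: drop LC_ALL and pin only the numeric category.
fixed = {k: v for k, v in user.items() if k != "LC_ALL"}
fixed["LC_NUMERIC"] = "POSIX"
print(effective_lc_numeric(fixed))  # POSIX
```

In the second environment, other categories such as LC_MESSAGES still resolve to LANG, so messages stay localized while numbers use the POSIX format.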
Instead, I think yash should act like zsh; in my opinion, it strikes the best balance. Internally, all floating-point processing, including formatting the results of arithmetic expansion, is done as if the POSIX/C locale were active. Only printf produces locale-dependent output. If printf output needs to be read back by the shell, that's easy: printf should simply be invoked as LC_ALL=C printf. Locale-dependent output should not be used as input.
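A sketch of the zsh-like policy proposed above, in Python and under the same simplification of a locale to its decimal point (all names are hypothetical): arithmetic reads and writes C-format text without consulting the locale, and only the printf path localizes its output. Passing the C locale to that path plays the role of `LC_ALL=C printf`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumericLocale:
    """Hypothetical stand-in for LC_NUMERIC: only the decimal-point character."""
    decimal_point: str


C = NumericLocale(".")
COMMA = NumericLocale(",")  # assumption: e.g. a French or Dutch locale


def arith_value(text: str) -> float:
    """Arithmetic reads variable values in C format; no locale is consulted."""
    return float(text)  # Python's float() always uses '.' as the decimal point


def arith_result(x: float) -> str:
    """Arithmetic writes results in C format; no locale is consulted."""
    return repr(x)


def printf_f(x: float, loc: NumericLocale, precision: int = 2) -> str:
    """Model of printf's %.Nf conversion: the only locale-dependent step."""
    return f"{x:.{precision}f}".replace(".", loc.decimal_point)


# $((...)) results stay machine-readable for every user.
v = arith_result(arith_value("1.5") * 3)
print(v)  # 4.5

# printf localizes for humans; the C locale mimics LC_ALL=C printf.
print(printf_f(arith_value(v), COMMA))  # 4,50
print(printf_f(arith_value(v), C))      # 4.50
```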
I wrote the preceding anonymous comment. Forgot to log in, sorry.
> Instead, I think yash should act like zsh
Under that plan, we need to consider the possibility that the printf built-in misinterprets floating-point arguments. Zsh's printf built-in first interprets an argument locale-dependently and, only if that fails, evaluates it as an arithmetic expression. If the C-locale representation of a number can be read as a different number in the current locale, the printf built-in prints the wrong number.
Specifically, in some locales such as da_DK, periods are used to group digits into 3-digit groups. For example, 12.000 is interpreted not as twelve but as twelve thousand. This clashes with the C-locale representation of numbers, where the period is the decimal point.
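The hazard can be sketched in Python with a hypothetical locale-aware parser that accepts 3-digit grouping, tried before a C-format fallback in the order described above for zsh's printf. The da_DK-like settings (',' as the decimal point, '.' as the thousands separator) and all names are assumptions; whether a particular C library's number parser accepts grouping is not something this sketch settles, and the C-format fallback is a simplified stand-in for full arithmetic evaluation.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class NumericLocale:
    """Hypothetical stand-in for LC_NUMERIC."""
    decimal_point: str
    thousands_sep: str  # "" means the locale does not group digits


C = NumericLocale(".", "")
DA_DK = NumericLocale(",", ".")  # assumed da_DK-like settings


def parse_localized(text: str, loc: NumericLocale) -> float:
    """Parse text in loc's format, accepting 3-digit grouping if loc has it."""
    dp = re.escape(loc.decimal_point)
    int_part = r"\d+"
    if loc.thousands_sep:
        sep = re.escape(loc.thousands_sep)
        int_part = rf"(?:\d{{1,3}}(?:{sep}\d{{3}})+|\d+)"
    if not re.fullmatch(rf"[+-]?{int_part}(?:{dp}\d*)?", text):
        raise ValueError(f"not a number in this locale: {text!r}")
    if loc.thousands_sep:
        text = text.replace(loc.thousands_sep, "")
    return float(text.replace(loc.decimal_point, "."))


def printf_arg(text: str, loc: NumericLocale) -> float:
    """zsh-like order: locale-dependent parse first, C format only on failure."""
    try:
        return parse_localized(text, loc)
    except ValueError:
        return parse_localized(text, C)  # simplified stand-in for arithmetic


value = "12.000"  # C-format text meaning twelve
print(printf_arg(value, C))      # 12.0
print(printf_arg(value, DA_DK))  # 12000.0: the period is read as grouping
print(printf_arg("1.5", DA_DK))  # 1.5: ".5" is not a valid group, so fallback
```

With these settings, text like 1.5 survives only because .5 is not a valid 3-digit group; 1.250 and 12.000 are silently read as 1250 and 12000.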
Closing for now. For further discussion, please open an issue or discussion on GitHub. https://github.com/magicant/yash
The parsing of floating-point numbers in yash arithmetic depends on the user's locale: only the current locale's decimal-point character is accepted. That means the arithmetic grammar of a script using floating point depends on the end user's locale settings, which kills portability and seems clearly undesirable. The only way to make such a script portable is to unset LC_ALL and set LC_NUMERIC to POSIX.
Output: