Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Thousand separator (space) not working when parsing #193

Closed
orousseil opened this issue May 18, 2018 · 10 comments · Fixed by #278
Closed

Thousand separator (space) not working when parsing #193

orousseil opened this issue May 18, 2018 · 10 comments · Fixed by #278

Comments

@orousseil
Copy link

orousseil commented May 18, 2018

Hello
The french thousand separator is the space character. When you parse a money amount with currency from a string it doesn't work !!

See this test case for example :

@Test
public void testParseKO() {
    MonetaryAmountFormat format = MonetaryFormats
            .getAmountFormat(AmountFormatQueryBuilder.of(Locale.FRANCE)
                    .set(CurrencyStyle.CODE)
                    .build());
    MonetaryAmount amountOk = format.parse("123,01 EUR");
    MonetaryAmount amountKo = format.parse("14 000,12 EUR");
    assertThat(amountOk.getNumber().doubleValueExact()).isEqualTo(123.01); // OK
    assertThat(amountKo.getNumber().doubleValueExact()).isEqualTo(14000.12); // KO
}

result is:

May 18, 2018 5:36:30 PM org.javamoney.moneta.DefaultMonetaryContextFactory createMonetaryContextNonNullConfig
INFO: Using custom MathContext: precision=256, roundingMode=HALF_EVEN

org.junit.ComparisonFailure: 
Expected :14000.12
Actual   :0.12
@keilw
Copy link
Member

keilw commented Dec 10, 2018

Also check against the BP for consistent behavior.

@keilw
Copy link
Member

keilw commented Dec 17, 2018

According to https://docs.oracle.com/cd/E19455-01/806-0169/overview-9/index.html even German locales may use spaces to separate digits of very large amounts.

@keilw keilw pinned this issue Dec 17, 2018
@keilw keilw changed the title French thousand separator (space) not working when parsing Thousand separator (space) not working when parsing Dec 24, 2018
@keilw
Copy link
Member

keilw commented Dec 29, 2018

@stokito, @marschall and others, do you have an idea (and time) to address this? I already mentioned it to @atsticks, @otaviojava, but they may be on vacation right now. Especially ParseContext seems wrong/broken, because parsing properly formatted string won't work especially if blanks are included. MonetaryFormatsTest highlights the major problems (tests disabled, they should be fixed and enabled again)

@stokito
Copy link
Member

stokito commented Dec 29, 2018

I don't have any idea ATM but tomorrow will try to fix that

@keilw
Copy link
Member

keilw commented Dec 29, 2018

Thanks, keep in mind, this is a showstopper even for the MR. All the other stuff including benchmarks are nice to have, but we can keep that till a new JSR in the course of 2019.

@marschall
Copy link
Member

marschall commented Dec 30, 2018

org.javamoney.moneta.format.MonetaryFormatsTest.testFormatFR() currently fails because U+00A0 (NBSP) is used instead of U+0020 (SP). I believe NBSP is correct as we do not want line breaks in a number.

I'm a bit confused by the fix for #151 in org.javamoney.moneta.internal.format.AmountNumberToken#removeNBSP(String)

@marschall
Copy link
Member

I believe parsing with of spaces as a grouping separator is in general broken because of org.javamoney.moneta.internal.format.ParseContext#lookupNextToken() which considers white space a token separator.

@stokito
Copy link
Member

stokito commented Dec 31, 2018

WIP here https://github.com/stokito/jsr354-ri/tree/issue-193
will continue in next year :)

@stokito
Copy link
Member

stokito commented Jan 14, 2019

I fixed the issue in the https://github.com/stokito/jsr354-ri/tree/issue-193
But the commit history is very dirty so during next week I'll try to recommit everything and only then will send a PR.
Also the bad news is that there is other issues found during a testing with negative amounts

@stokito
Copy link
Member

stokito commented Jan 27, 2019

Ok guys, let's discuss the issue with Non-breaking space NBSP here.
Some locales (mostly European) uses a space as a Grouping (thousands) Separator and as a separator between the currency code. But the Unicode consortium decided that it will be better to use NBSP between amount and currency code and latter French consortium decided to use the NBSP even inside amount for thousands separation.

That's why parsing of amounts with a usual space is failing and that's why after formatting we have strange failed tests like expected [12,50 CHF] but found [12,50 CHF] - here the actual string contains NBSP between amount and currency code.
In fact the same problem happens in other languages and in JDK itself especially after JDK9 where the default locales provider was changed to CLDR (Unicode's official Common Locale Data Repository) which is uses NBSP and this caused a lot of migration issues https://bugs.openjdk.java.net/browse/JDK-4510618
There is mentioned other link to JDK9 release note that CLDR now is used as a default locales provider http://www.oracle.com/technetwork/java/javase/9-relnote-issues-3704069.html#JDK-8008577
This may explain why we didn't faced the problem before.

JDK8 uses CLDR release 21, JDK9 uses their CLDR release 29, JDK 11 uses CLDR release 33. In your IDE you can search for a class sun.text.resources.ext.FormatData_fr from JAVA_HOME/jmods/jdk.localedata.jmod.
In the FormatData_fr you can find this lines:

            { "NumberElements",
                new String[] {
                    ",", // decimal separator
                    "\u00a0", // group (thousands) separator
                    ";", // list separator
                    "%", // percent sign
                    "0", // native 0 digit
                    "#", // pattern digit
                    "-", // minus sign
                    "E", // exponential
                    "\u2030", // per mille
                    "\u221e", // infinity
                    "\ufffd" // NaN
                }
            },

You may note the \u00a0 i.e. NBSP.

So let's see what is inside the latest CLDR.
Go to http://cldr.unicode.org/ click on CLDR v34 released link, then in the grid on top click on the link to Charts34 then By-Type / Numbers/ Symbols and find Symbols: group to see the grouping symbols are used for different locales.
Or if you lazy click on this link :) https://www.unicode.org/cldr/charts/34/by_type/numbers.symbols.html#a1ef41eaeb6982d

You may see that fr locale has some space and you can copy it and search this symbol by unicode table https://unicode-table.com/en/search/?q=%E2%80%AF and see that this is Narrow No-Break Space (NNBSP) U+202F. Wait, this not NBSP this is something different!
That's right: in latest CLDR release they changed NBSP as thousands separator to NNBSP. And this change now applies to all French locales like Canadian fr_CA. They even had some political battle and drama inside:

And even more: they have some intentions to change this for other locales:

So in fact to be prepared for the next JDK releases we should handle both NBSP and NNBSP.

IMHO the Unicode guys thinking only about typography and presentation but from what I saw in their discussions they never even thinked about parsing or how to input this symbols for users. They changing the formats without even thinking about how many bugs this will produce.
I'm pretty sure that this NBSP/NNBSP are just not needed.
What is a purpose of the NBSP?
If you want that your browser always showed the amount and currency together then you can do that with some <span style="white-space:nowrap;">12,50 CHF</span>. And that's all. Even more: you should do that anyway because decimal digits are Arabic and are written from right to left so you should always align amount by a right side especially in tables i.e. <span style="white-space:nowrap; text-align: right">12,50 CHF</span>.
So this should be always solved on the container level.

Why to use Narrow-NBSP? Because this can make visually easier to see that those digits are close together. But again, not all fonts supports this. In my opinion this should be fixed be replacing the space symbol with underscore _ but none of formats are using this.

But the main point against this spaces as it was already mentioned and they are not ASCII symbols and this may be unexpected for developers especially on storing or when sending to external systems.

So we have several options:

  1. Use the NPSB as grouping and currency separator i.e. the same behavior as in JDK.
    • Pros we'll have the same behavior as JDK9 which is good for interoperability.
    • Cons It will be difficult to implement for us because now we always expect a space and we need to change all parsing logic and overcomplicate it.
  • Cons user's will receive the same unexpected problems, report here the issues and in 99% of cases they will try to replace NBSP with space before parsing and after formatting.
  1. Neglect and use only space on both parsing and formatting.
    • Pros this simplifies a little bit a parsing for us and parsing can be faster.
    • Cons but users who already have an amount with NBSP should replace it with a space before parsing.
  2. Follow the Postel's law and accept both space and NBSP on parsing but use the usual space on formatting.
  3. Allow to specify the behavior by a pattern or additional property like typographicalMode. For example if you want that the amount was formatted with NBSP you can set some boolean property.

So I think we should use extend third approach for all locales, not only for Bulgarian locale but latter we can add the additional boolean flag to force formatting with NBSP.

stokito added a commit to stokito/jsr354-ri that referenced this issue Jan 27, 2019
…g numbers with thousands for locales like French and Bulgaria which uses space as a separator. Also fix formatting that fails when currency unit is not space-separated from amount.
# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants