Thousand separator (space) not working when parsing #193

orousseil · 2018-05-18T15:34:00Z

Hello
The french thousand separator is the space character. When you parse a money amount with currency from a string it doesn't work !!

See this test case for example :

@Test
public void testParseKO() {
    MonetaryAmountFormat format = MonetaryFormats
            .getAmountFormat(AmountFormatQueryBuilder.of(Locale.FRANCE)
                    .set(CurrencyStyle.CODE)
                    .build());
    MonetaryAmount amountOk = format.parse("123,01 EUR");
    MonetaryAmount amountKo = format.parse("14 000,12 EUR");
    assertThat(amountOk.getNumber().doubleValueExact()).isEqualTo(123.01); // OK
    assertThat(amountKo.getNumber().doubleValueExact()).isEqualTo(14000.12); // KO
}

result is:

May 18, 2018 5:36:30 PM org.javamoney.moneta.DefaultMonetaryContextFactory createMonetaryContextNonNullConfig
INFO: Using custom MathContext: precision=256, roundingMode=HALF_EVEN

org.junit.ComparisonFailure: 
Expected :14000.12
Actual   :0.12

The text was updated successfully, but these errors were encountered:

keilw · 2018-12-10T23:13:03Z

Also check against the BP for consistent behavior.

keilw · 2018-12-17T13:57:40Z

According to https://docs.oracle.com/cd/E19455-01/806-0169/overview-9/index.html even German locales may use spaces to separate digits of very large amounts.

keilw · 2018-12-29T18:04:04Z

@stokito, @marschall and others, do you have an idea (and time) to address this? I already mentioned it to @atsticks, @otaviojava, but they may be on vacation right now. Especially ParseContext seems wrong/broken, because parsing properly formatted string won't work especially if blanks are included. MonetaryFormatsTest highlights the major problems (tests disabled, they should be fixed and enabled again)

stokito · 2018-12-29T21:44:30Z

I don't have any idea ATM but tomorrow will try to fix that

keilw · 2018-12-29T22:22:36Z

Thanks, keep in mind, this is a showstopper even for the MR. All the other stuff including benchmarks are nice to have, but we can keep that till a new JSR in the course of 2019.

marschall · 2018-12-30T10:47:16Z

org.javamoney.moneta.format.MonetaryFormatsTest.testFormatFR() currently fails because U+00A0 (NBSP) is used instead of U+0020 (SP). I believe NBSP is correct as we do not want line breaks in a number.

I'm a bit confused by the fix for #151 in org.javamoney.moneta.internal.format.AmountNumberToken#removeNBSP(String)

marschall · 2018-12-30T11:22:07Z

I believe parsing with of spaces as a grouping separator is in general broken because of org.javamoney.moneta.internal.format.ParseContext#lookupNextToken() which considers white space a token separator.

stokito · 2018-12-31T21:51:56Z

WIP here https://github.com/stokito/jsr354-ri/tree/issue-193
will continue in next year :)

stokito · 2019-01-14T01:01:56Z

I fixed the issue in the https://github.com/stokito/jsr354-ri/tree/issue-193
But the commit history is very dirty so during next week I'll try to recommit everything and only then will send a PR.
Also the bad news is that there is other issues found during a testing with negative amounts

stokito · 2019-01-27T18:22:27Z

Ok guys, let's discuss the issue with Non-breaking space NBSP here.
Some locales (mostly European) uses a space as a Grouping (thousands) Separator and as a separator between the currency code. But the Unicode consortium decided that it will be better to use NBSP between amount and currency code and latter French consortium decided to use the NBSP even inside amount for thousands separation.

That's why parsing of amounts with a usual space is failing and that's why after formatting we have strange failed tests like expected [12,50 CHF] but found [12,50 CHF] - here the actual string contains NBSP between amount and currency code.
In fact the same problem happens in other languages and in JDK itself especially after JDK9 where the default locales provider was changed to CLDR (Unicode's official Common Locale Data Repository) which is uses NBSP and this caused a lot of migration issues https://bugs.openjdk.java.net/browse/JDK-4510618
There is mentioned other link to JDK9 release note that CLDR now is used as a default locales provider http://www.oracle.com/technetwork/java/javase/9-relnote-issues-3704069.html#JDK-8008577
This may explain why we didn't faced the problem before.

JDK8 uses CLDR release 21, JDK9 uses their CLDR release 29, JDK 11 uses CLDR release 33. In your IDE you can search for a class sun.text.resources.ext.FormatData_fr from JAVA_HOME/jmods/jdk.localedata.jmod.
In the FormatData_fr you can find this lines:

            { "NumberElements",
                new String[] {
                    ",", // decimal separator
                    "\u00a0", // group (thousands) separator
                    ";", // list separator
                    "%", // percent sign
                    "0", // native 0 digit
                    "#", // pattern digit
                    "-", // minus sign
                    "E", // exponential
                    "\u2030", // per mille
                    "\u221e", // infinity
                    "\ufffd" // NaN
                }
            },

You may note the \u00a0 i.e. NBSP.

So let's see what is inside the latest CLDR.
Go to http://cldr.unicode.org/ click on CLDR v34 released link, then in the grid on top click on the link to Charts34 then By-Type / Numbers/ Symbols and find Symbols: group to see the grouping symbols are used for different locales.
Or if you lazy click on this link :) https://www.unicode.org/cldr/charts/34/by_type/numbers.symbols.html#a1ef41eaeb6982d

You may see that fr locale has some space and you can copy it and search this symbol by unicode table https://unicode-table.com/en/search/?q=%E2%80%AF and see that this is Narrow No-Break Space (NNBSP) U+202F. Wait, this not NBSP this is something different!
That's right: in latest CLDR release they changed NBSP as thousands separator to NNBSP. And this change now applies to all French locales like Canadian fr_CA. They even had some political battle and drama inside:

And even more: they have some intentions to change this for other locales:

So in fact to be prepared for the next JDK releases we should handle both NBSP and NNBSP.

IMHO the Unicode guys thinking only about typography and presentation but from what I saw in their discussions they never even thinked about parsing or how to input this symbols for users. They changing the formats without even thinking about how many bugs this will produce.
I'm pretty sure that this NBSP/NNBSP are just not needed.
What is a purpose of the NBSP?
If you want that your browser always showed the amount and currency together then you can do that with some <span style="white-space:nowrap;">12,50 CHF</span>. And that's all. Even more: you should do that anyway because decimal digits are Arabic and are written from right to left so you should always align amount by a right side especially in tables i.e. <span style="white-space:nowrap; text-align: right">12,50 CHF</span>.
So this should be always solved on the container level.

Why to use Narrow-NBSP? Because this can make visually easier to see that those digits are close together. But again, not all fonts supports this. In my opinion this should be fixed be replacing the space symbol with underscore _ but none of formats are using this.

But the main point against this spaces as it was already mentioned and they are not ASCII symbols and this may be unexpected for developers especially on storing or when sending to external systems.

So we have several options:

Use the NPSB as grouping and currency separator i.e. the same behavior as in JDK.
- Pros we'll have the same behavior as JDK9 which is good for interoperability.
- Cons It will be difficult to implement for us because now we always expect a space and we need to change all parsing logic and overcomplicate it.

Cons user's will receive the same unexpected problems, report here the issues and in 99% of cases they will try to replace NBSP with space before parsing and after formatting.

Neglect and use only space on both parsing and formatting.
- Pros this simplifies a little bit a parsing for us and parsing can be faster.
- Cons but users who already have an amount with NBSP should replace it with a space before parsing.
Follow the Postel's law and accept both space and NBSP on parsing but use the usual space on formatting.
- Pros it's easy to implement and in fact we already use this approach for BG locale Bulgarian Lev has wrong Currency Format #151 (comment).
- Pros this makes life much easier for developers so less potential bugs which is quite important when you working with money.
Allow to specify the behavior by a pattern or additional property like typographicalMode. For example if you want that the amount was formatted with NBSP you can set some boolean property.

So I think we should use extend third approach for all locales, not only for Bulgarian locale but latter we can add the additional boolean flag to force formatting with NBSP.

…g numbers with thousands for locales like French and Bulgaria which uses space as a separator. Also fix formatting that fails when currency unit is not space-separated from amount.

orousseil mentioned this issue May 22, 2018

Thousand separator (space) not working when parsing JavaMoney/jsr354-ri-bp#46

Closed

keilw added the bug label May 28, 2018

keilw added this to the 1.3 milestone May 28, 2018

keilw modified the milestones: 1.3, 1.4 Jul 15, 2018

keilw pinned this issue Dec 17, 2018

keilw changed the title ~~French thousand separator (space) not working when parsing~~ Thousand separator (space) not working when parsing Dec 24, 2018

keilw added the showstopper label Dec 24, 2018

keilw mentioned this issue Dec 30, 2018

Inconsistent formatting white space #261

Closed

keilw added the Priority: Blocker label Dec 30, 2018

stokito mentioned this issue Jan 12, 2019

DefaultMonetaryAmountFormat fails when currency unit is not space-separated #210

Closed

stokito mentioned this issue Jan 26, 2019

Some format tests failing #202

Closed

stokito mentioned this issue Jan 27, 2019

Fix parsing and formatting amounts with a space as thousand separator #278

Merged

keilw closed this as completed in #278 Jan 27, 2019

This was referenced Jan 29, 2019

Indian number formatting broken #275

Closed

Formatting with a custom pattern of the negative amounts incorrectly puts a minus before currency code #282

Closed

charphi mentioned this issue Jul 1, 2019

NumberFormat produces different code point after Java 13 build 21 jdemetra/jdemetra-core#421

Closed

keilw unpinned this issue Aug 5, 2019

starksm64 mentioned this issue May 20, 2022

NumberFormatCustomizationTest.testNumberFormatField error under Java SE 17 jakartaee/jsonb-api#330

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Thousand separator (space) not working when parsing #193

Thousand separator (space) not working when parsing #193

orousseil commented May 18, 2018 •

edited

Loading

keilw commented Dec 10, 2018

keilw commented Dec 17, 2018

keilw commented Dec 29, 2018

stokito commented Dec 29, 2018

keilw commented Dec 29, 2018 •

edited

Loading

marschall commented Dec 30, 2018 •

edited

Loading

marschall commented Dec 30, 2018

stokito commented Dec 31, 2018

stokito commented Jan 14, 2019 •

edited

Loading

stokito commented Jan 27, 2019 •

edited

Loading

Thousand separator (space) not working when parsing #193

Thousand separator (space) not working when parsing #193

Comments

orousseil commented May 18, 2018 • edited Loading

keilw commented Dec 10, 2018

keilw commented Dec 17, 2018

keilw commented Dec 29, 2018

stokito commented Dec 29, 2018

keilw commented Dec 29, 2018 • edited Loading

marschall commented Dec 30, 2018 • edited Loading

marschall commented Dec 30, 2018

stokito commented Dec 31, 2018

stokito commented Jan 14, 2019 • edited Loading

stokito commented Jan 27, 2019 • edited Loading

orousseil commented May 18, 2018 •

edited

Loading

keilw commented Dec 29, 2018 •

edited

Loading

marschall commented Dec 30, 2018 •

edited

Loading

stokito commented Jan 14, 2019 •

edited

Loading

stokito commented Jan 27, 2019 •

edited

Loading