Simd json encode #120

nielsdos · 2025-02-03T21:16:31Z

TODO:

~~more performant resolver (too much call overhead, use template-style code)~~ SEMI-DONE: squeezing out the last few percent is not worth it because it would create a lot of code bloat.
verifications
bitset stuff: does the widening worsen performance? (A: no)
benchmarks

mvorisek · 2025-02-04T01:02:22Z

I have no C toolchain set up, so here is my idea in human words:

do {
#if simd support
    if (len >= sizeof(__m128i)) {
        calc mask;
        if (mask is "all not-to-be-escaped") {
            pos += sizeof(__m128i);
            continue;
        }
        mask_length = sizeof(__m128i);
    } else {
        set mask to "all maybe to-be-escaped"
        mask_length = len;
    }
    
    for (i = 0; i < mask_length; i++) {
        if (mask[i] == "not-to-be-escaped") {
            pos++; // or, if faster, skip the whole run at once: i += run - 1, pos += run, where run is the "not-to-be-escaped" length calculated from the mask
            continue;
        }
        
        if (pos) {
            add unescaped;
        }

#endif
        handle single "maybe to-be-escaped" character as before;
#if simd support
    }
#endif
} while (len);

#if simd support
if (pos) {
    add unescaped;
}
#endif

The key point is to set and use the mask even in the non-SIMD case (no SIMD support or too-short input), so that the "maybe to-be-escaped" characters are still processed in one place, as before. That should generate shorter code, give optimal performance for "never-to-be-escaped" characters, and hopefully introduce minimal overhead for other characters.
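As a rough illustration of the "calc mask" step in the pseudocode above, the sketch below classifies one 16-byte chunk with SSE2 and falls back to a scalar loop when SSE2 is unavailable. The helper name `escape_mask16` is hypothetical, and the character set is a simplifying assumption (control characters, `"`, `\`, and non-ASCII bytes); the set the real encoder uses depends on the `json_encode` options.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Hypothetical helper: returns a 16-bit mask in which bit i is set when
 * byte i of the chunk is "maybe to-be-escaped". Assumed set: bytes < 0x20,
 * bytes >= 0x80, '"' and '\\'. The caller must provide 16 readable bytes. */
static uint32_t escape_mask16(const unsigned char *p)
{
    assert(p != NULL);
#ifdef __SSE2__
    __m128i chunk = _mm_loadu_si128((const __m128i *)p);
    /* The compare is signed, so bytes >= 0x80 are negative and also
     * satisfy "< 0x20"; one compare covers control and non-ASCII bytes. */
    __m128i ctrl_or_high = _mm_cmplt_epi8(chunk, _mm_set1_epi8(0x20));
    __m128i quote = _mm_cmpeq_epi8(chunk, _mm_set1_epi8('"'));
    __m128i backslash = _mm_cmpeq_epi8(chunk, _mm_set1_epi8('\\'));
    __m128i any = _mm_or_si128(ctrl_or_high, _mm_or_si128(quote, backslash));
    return (uint32_t)_mm_movemask_epi8(any);
#else
    /* Scalar fallback producing the same mask. */
    uint32_t mask = 0;
    for (int i = 0; i < 16; i++) {
        unsigned char c = p[i];
        if (c < 0x20 || c >= 0x80 || c == '"' || c == '\\') {
            mask |= 1u << i;
        }
    }
    return mask;
#endif
}
```

A mask of zero corresponds to the "all not-to-be-escaped" fast path, where the whole chunk can be skipped and appended later in bulk.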

nielsdos · 2025-02-04T06:51:44Z

A couple of points regarding your suggestion:

I already use the mask to know where to put the escape characters. I do this by looping over all set bits, appending in bulk from the input what we didn't append yet, and then escaping the new character. This code is here:

php-src/ext/json/json_encoder.c

Lines 555 to 570 in d7f2562

    
    do {
        /* Note that we shift the input forward, so we have to shift the mask as well,
         * beyond the to-be-escaped character */
        int len = zend_ulong_ntz(mask);
        mask >>= len + 1;
        smart_str_appendl(buf, s, len + pos);
        pos += len;
        us = (unsigned char)s[pos];
        s += pos + 1; /* skip 'us' too */
        pos = 0;
        bool handled = php_json_printable_ascii_escape(buf, us, options);
        ZEND_ASSERT(handled == true);
    } while (mask != 0);

Using the mask even for short inputs would still cause a slowdown. In my testing, the overhead of looping over the bits and performing the check is higher than that of a "dumb" byte-by-byte loop.
The remaining performance "issue" is the case where we have to escape (almost) everything. Performance when nothing needs escaping is pretty good.
My code would have been closer to your pseudocode if we didn't need to cater for non-printable or UTF-8 characters. Those add unavoidable complexity to the control flow of the algorithm.
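The mask-walking loop quoted above can be tried in isolation with a small standalone sketch. Everything here is an assumption made for illustration, not the php-src code: `out_buf` stands in for `smart_str`, `ntz32` stands in for `zend_ulong_ntz`, and `escape_one` only handles characters that are escaped by prefixing a backslash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for smart_str: a fixed-capacity output buffer. */
typedef struct {
    char data[256];
    size_t len;
} out_buf;

static void out_append(out_buf *b, const char *s, size_t n)
{
    assert(b->len + n <= sizeof(b->data));
    memcpy(b->data + b->len, s, n);
    b->len += n;
}

/* Portable count of trailing zero bits; mask must be non-zero. */
static int ntz32(uint32_t mask)
{
    int n = 0;
    while ((mask & 1u) == 0) {
        mask >>= 1;
        n++;
    }
    return n;
}

/* Simplified escape: emits a backslash followed by the character itself. */
static void escape_one(out_buf *b, unsigned char c)
{
    char esc[2] = { '\\', (char)c };
    out_append(b, esc, 2);
}

/* Bit i of mask set means s[i] needs escaping. For each set bit, append the
 * unescaped run before it in bulk, then the escaped character, then shift
 * both the input pointer and the mask past that character. */
static void flush_chunk(out_buf *b, const char *s, size_t chunk_len, uint32_t mask)
{
    assert(chunk_len <= 16 && (mask >> chunk_len) == 0);
    const char *end = s + chunk_len;
    while (mask != 0) {
        int run = ntz32(mask);   /* bytes before the next escape */
        mask >>= run + 1;        /* at most 16 here, so the shift is defined */
        out_append(b, s, (size_t)run);
        escape_one(b, (unsigned char)s[run]);
        s += run + 1;
    }
    out_append(b, s, (size_t)(end - s)); /* trailing unescaped bytes */
}
```

The cost of this loop grows with the number of set bits, which is consistent with the observation that inputs needing (almost) every character escaped are the worst case.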

nielsdos · 2025-02-05T21:02:55Z

VTune shows some DSB stalls, which I have tried to reduce by changing the code layout. I also see some bad speculation (machine clears) and data stalls in php_json_printable_ascii_escape (as expected) that I want to investigate.

nielsdos · 2025-02-06T22:08:04Z

The overhead of the worst case with escapes is now only around 6%, so relatively small.
This is mostly due to tweaking the algorithm and tightening the code layout.

nielsdos added 14 commits February 3, 2025 20:58

wip

a3dd262

shift opt

144b0e1

Get rid of acc

7d485a9

SSE2 guard

58f30ff

use ascii

6a01058

dynamic mask

55a0b0e

comment

2a2008e

wip

7c966a6

wip

65f3b7e

potential solution

124396a

remove some debug

1826161

correct ifdefs, without resolver support

5a2c034

Attempt to use standard bitset stuff

326b982

preliminary resolver support (needs more work)

db54e3f

nielsdos mentioned this pull request Feb 3, 2025

json_encode can use SIMD php/php-src#17672

Open

fix native build

8bcd6bb

nielsdos added 12 commits February 4, 2025 07:55

let ci run without max_shift trick to compare perf

d7f2562

Revert "let ci run without max_shift trick to compare perf"

2b11554

This reverts commit d7f2562.

Reduce overhead of worst case to 1.5x

ef72f33

wip1

e3baa23

cheaper pos compute

3c8b68e

no magic nrs

4d16463

simple heuristic

27a89e0

various small improvements

b071dba

save ci resources

2ae769e

test with always inline

10bd63a

tweak

5df25a4

code layout trick (vtune dsb improvement)

d5c5b9f

nielsdos added 14 commits February 5, 2025 23:32

skip extra check

ceb8443

tweak

81efe6b

abstract away

1d7109d

mark branch

45e91f5

split off

dfd6de0

cs

ff4ef5b

fix mask on sse2 builds

57efb3a

test

df0117e

tweaks

246b413

tweak

847497f

flag

901a957

tighter code layout

8947f09

Remove check

4c41ad3

Tweak

40cd08f

nielsdos added 5 commits February 6, 2025 23:15

Code layout and comment tweak

dfb690d

test with indirect function ptr

bd6e462

Revert "test with indirect function ptr"

8d5a381

This reverts commit bd6e462.

code layout

d4297de

wip

bc48fb8
