More tests #176

Closed
wants to merge 25 commits into from
@jmid jmid commented Sep 11, 2021

This PR

  • adds tuple and bind tests (positive, negative, and statistics).
  • I also added a manual shrinker and test for the IntTree test.
  • I grouped test-names per sub-module as the long list at the end was getting hard to maintain.

Edit: I've now also

  • added shrink-logging tests to compare the total number of shrinking attempts.
  • split the tests into a "test library" and a runner.
    This gives us one runner for the expect tests in the CI, while other runners can reuse (some of) the same tests,
    e.g., for local shrinker benchmarking.
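
To make the difference between "successful shrink steps" and "total shrinking attempts" concrete, here is a minimal sketch using only the OCaml standard library. It is a toy and not QCheck's shrinker: the names `logged`, `shrink_candidates`, and `minimize` are hypothetical, and the candidate list is an assumption chosen to keep the example short. The log lines imitate the fails/holds style of the files under shrink_algo_logs.

```ocaml
(* Toy sketch only: none of these names come from QCheck. *)

(* Wrap a property so each evaluation is counted and logged as
   "holds <input>" or "fails <input>". *)
let logged (log : Buffer.t) (attempts : int ref)
    (show : 'a -> string) (prop : 'a -> bool) : 'a -> bool =
 fun x ->
  incr attempts;
  let ok = prop x in
  Buffer.add_string log
    (Printf.sprintf "%s %s\n" (if ok then "holds" else "fails") (show x));
  ok

(* Hypothetical shrink candidates for a positive int, smallest first. *)
let shrink_candidates (n : int) : int list =
  if n <= 0 then [] else List.sort_uniq compare [ 0; n / 2; n - 1 ]

(* Greedy minimisation: move to the first candidate that still fails the
   property; each such move is one successful shrink step. *)
let rec minimize (prop : int -> bool) (steps : int) (n : int) : int * int =
  match List.find_opt (fun c -> not (prop c)) (shrink_candidates n) with
  | Some smaller -> minimize prop (steps + 1) smaller
  | None -> (n, steps)

let () =
  let log = Buffer.create 256 and attempts = ref 0 in
  let prop = logged log attempts string_of_int (fun n -> n < 5) in
  let minimal, steps = minimize prop 0 100 in
  print_string (Buffer.contents log);
  Printf.printf "minimal: %d, successful steps: %d, attempts: %d\n"
    minimal steps !attempts
```

The attempt counter grows on every property evaluation, whereas the step counter only grows when a candidate still fails, so two strategies can report the same counterexample with very different amounts of work.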

Here's the diff -y of the new test outputs (QCheck on the left, QCheck2 on the right). Note how QCheck2's int shrinking strategy generally
takes fewer successful shrinking steps (again), as has been discussed in, e.g., PR #153 and #173:

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test pairs have different components failed (0 shrink steps):		Test pairs have different components failed (0 shrink steps):

(4, 4)									(4, 4)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test pairs have same components failed (125 shrink steps):	     |	Test pairs have same components failed (63 shrink steps):

(0, 1)									(0, 1)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test pairs have a zero component failed (124 shrink steps):	     |	Test pairs have a zero component failed (122 shrink steps):

(-1, 1)								     |	(1, 1)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test pairs are (0,0) failed (125 shrink steps):			     |	Test pairs are (0,0) failed (63 shrink steps):

(0, 1)									(0, 1)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test pairs are ordered failed (125 shrink steps):		     |	Test pairs are ordered failed (2 shrink steps):

(0, -1)									(0, -1)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test pairs are ordered reversely failed (125 shrink steps):	     |	Test pairs are ordered reversely failed (63 shrink steps):

(0, 1)									(0, 1)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test pairs sum to less than 128 failed (121 shrink steps):	     |	Test pairs sum to less than 128 failed (59 shrink steps):

(0, 128)								(0, 128)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test triples have pair-wise different components failed (7 shrink st |	Test triples have pair-wise different components failed (3 shrink st

(0, 7, 7)							     |	(0, 0, 0)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test triples have same components failed (188 shrink steps):	     |	Test triples have same components failed (64 shrink steps):

(0, -1, 0)							     |	(0, 1, 0)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test triples are ordered failed (188 shrink steps):		     |	Test triples are ordered failed (3 shrink steps):

(0, -1, 0)								(0, -1, 0)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test triples are ordered reversely failed (188 shrink steps):	     |	Test triples are ordered reversely failed (64 shrink steps):

(0, 0, 1)								(0, 0, 1)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test quadruples have pair-wise different components failed (23 shrin |	Test quadruples have pair-wise different components failed (4 shrink

(0, 0, 0, 0)								(0, 0, 0, 0)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test quadruples have same components failed (250 shrink steps):	     |	Test quadruples have same components failed (126 shrink steps):

(0, 1, 0, 1)								(0, 1, 0, 1)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test quadruples are ordered failed (251 shrink steps):		     |	Test quadruples are ordered failed (5 shrink steps):

(0, 0, -1, 0)								(0, 0, -1, 0)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test quadruples are ordered reversely failed (251 shrink steps):     |	Test quadruples are ordered reversely failed (66 shrink steps):

(0, 0, 0, 1)								(0, 0, 0, 1)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test bind ordered pairs failed (123 shrink steps):		     |	Test bind ordered pairs failed (1 shrink steps):

(0, 0)									(0, 0)

--- Failure --------------------------------------------------------	--- Failure --------------------------------------------------------

Test bind list_size constant failed (261 shrink steps):		     |	Test bind list_size constant failed (15 shrink steps):

(4, [0; 0; 0; 0])							(4, [0; 0; 0; 0])

As a bonus, there are also pretty (and completely identical) histograms to be found:

+++ Stats for quad dist ++++++++++++++++++++++++++++++++++++++++++++	+++ Stats for quad dist ++++++++++++++++++++++++++++++++++++++++++++

stats quad sum:								stats quad sum:
  num: 500000, avg: 200.13, stddev: 58.33, median 200, min 5, max 39	  num: 500000, avg: 200.13, stddev: 58.33, median 200, min 5, max 39
    5.. 24:                                                         	    5.. 24:                                                         
   25.. 44:                                                         	   25.. 44:                                                         
   45.. 64: ##                                                      	   45.. 64: ##                                                      
   65.. 84: ######                                                  	   65.. 84: ######                                                  
   85..104: ############                                            	   85..104: ############                                            
  105..124: #####################                                   	  105..124: #####################                                   
  125..144: ###############################                         	  125..144: ###############################                         
  145..164: ##########################################              	  145..164: ##########################################              
  165..184: ##################################################      	  165..184: ##################################################      
  185..204: ####################################################### 	  185..204: ####################################################### 
  205..224: #####################################################   	  205..224: #####################################################   
  225..244: ###############################################         	  225..244: ###############################################         
  245..264: ######################################                  	  245..264: ######################################                  
  265..284: ##########################                              	  265..284: ##########################                              
  285..304: ################                                        	  285..304: ################                                        
  305..324: #########                                               	  305..324: #########                                               
  325..344: ####                                                    	  325..344: ####                                                    
  345..364: #                                                       	  345..364: #                                                       
  365..384:                                                         	  365..384:                                                         
  385..404:                                                         	  385..404:                                                         

+++ Stats for bind dist ++++++++++++++++++++++++++++++++++++++++++++	+++ Stats for bind dist ++++++++++++++++++++++++++++++++++++++++++++

stats ordered pair difference:						stats ordered pair difference:
  num: 1000000, avg: 25.02, stddev: 22.36, median 19, min 0, max 100	  num: 1000000, avg: 25.02, stddev: 22.36, median 19, min 0, max 100
    0..  4: ####################################################### 	    0..  4: ####################################################### 
    5..  9: #####################################                   	    5..  9: #####################################                   
   10.. 14: #############################                           	   10.. 14: #############################                           
   15.. 19: ########################                                	   15.. 19: ########################                                
   20.. 24: #####################                                   	   20.. 24: #####################                                   
   25.. 29: ##################                                      	   25.. 29: ##################                                      
   30.. 34: ################                                        	   30.. 34: ################                                        
   35.. 39: #############                                           	   35.. 39: #############                                           
   40.. 44: ############                                            	   40.. 44: ############                                            
   45.. 49: ##########                                              	   45.. 49: ##########                                              
   50.. 54: #########                                               	   50.. 54: #########                                               
   55.. 59: ########                                                	   55.. 59: ########                                                
   60.. 64: ######                                                  	   60.. 64: ######                                                  
   65.. 69: #####                                                   	   65.. 69: #####                                                   
   70.. 74: ####                                                    	   70.. 74: ####                                                    
   75.. 79: ###                                                     	   75.. 79: ###                                                     
   80.. 84: ##                                                      	   80.. 84: ##                                                      
   85.. 89: ##                                                      	   85.. 89: ##                                                      
   90.. 94: #                                                       	   90.. 94: #                                                       
   95.. 99:                                                         	   95.. 99:                                                         
  100..104:                                                         	  100..104:                                                         

stats ordered pair sum:							stats ordered pair sum:
  num: 1000000, avg: 75.12, stddev: 46.93, median 72, min 0, max 200	  num: 1000000, avg: 75.12, stddev: 46.93, median 72, min 0, max 200
    0..  9: ####################################################### 	    0..  9: ####################################################### 
   10.. 19: #####################################################   	   10.. 19: #####################################################   
   20.. 29: #####################################################   	   20.. 29: #####################################################   
   30.. 39: #####################################################   	   30.. 39: #####################################################   
   40.. 49: #####################################################   	   40.. 49: #####################################################   
   50.. 59: #####################################################   	   50.. 59: #####################################################   
   60.. 69: #####################################################   	   60.. 69: #####################################################   
   70.. 79: #####################################################   	   70.. 79: #####################################################   
   80.. 89: #####################################################   	   80.. 89: #####################################################   
   90.. 99: #####################################################   	   90.. 99: #####################################################   
  100..109: ##################################################      	  100..109: ##################################################      
  110..119: ###########################################             	  110..119: ###########################################             
  120..129: #####################################                   	  120..129: #####################################                   
  130..139: ###############################                         	  130..139: ###############################                         
  140..149: #########################                               	  140..149: #########################                               
  150..159: ####################                                    	  150..159: ####################                                    
  160..169: ###############                                         	  160..169: ###############                                         
  170..179: ###########                                             	  170..179: ###########                                             
  180..189: ######                                                  	  180..189: ######                                                  
  190..199: ##                                                      	  190..199: ##                                                      
  200..209:                                                         	  200..209:                                                         

Edit: This again builds on top of #172 and #174 (merge! merge! 😄)

@jmid jmid requested a review from c-cube September 11, 2021 18:14
jmid commented Sep 12, 2021

A few observations:

  1. I found myself wanting a uniform positive int generator. With QCheck I can just write (pair pos_int pos_int), e.g., in the test pair_ordered. To achieve the same with QCheck2, I have to write Gen.(pair (pint ~origin:0) (pint ~origin:0)). The opaque Gen.t makes the optional parameter mandatory, which is just clunky (also pointed out in issue QCheck2.Gen design considerations #162).

  2. QCheck2's int-shrinker relies on bind, as it first generates a bool to decide the integer's sign. This has a side effect for shrinking: true (meaning "generate a negative int") is reduced to false, thus reducing negative integers to positive ones, which can seem simpler to an end-user. Since we don't use a splittable RNG, the Random.State has moved on since the original state, and therefore the resulting int shrinking strategy will reduce an arbitrary negative integer -1975781842156211842 to an arbitrary positive integer 2696939011544317271 (not necessarily with a smaller absolute value):

    $ head shrink_algo_logs/triple_same_components_qcheck2.expected 
    fails (-1975781842156211842, -4571327332697646483, -3285013039971199785) 
    fails (2696939011544317271, -4571327332697646483, -3285013039971199785)
    fails (0, -4571327332697646483, -3285013039971199785)
    fails (0, 3095744455229849699, -3285013039971199785)
    holds (0, 0, -3285013039971199785)
    ...
    

    This is also how -4571327332697646483 is reduced to the seemingly unrelated 3095744455229849699 above.

  3. The lack of a splittable RNG also makes for an unpredictable strategy when list and pair generators are combined in QCheck2. Thus in shrink_algo_logs/pair_lists_rev_concat_qcheck2.expected we find, e.g.:

    ...
    fails ([3762171117042495591; 4588024816217148396], [2696939011544317271; 1975781842156211841; 1035416029544138122; 1118378519987614091])
    holds ([], [2696939011544317271; 1975781842156211841; 1035416029544138122; 1118378519987614091])
    fails ([2152069955941623198], [2696939011544317271; 1975781842156211841; 1035416029544138122; 1118378519987614091])
    holds ([], [2696939011544317271; 1975781842156211841; 1035416029544138122; 1118378519987614091])
    fails ([0], [2696939011544317271; 1975781842156211841; 1035416029544138122; 1118378519987614091])
    holds ([0], [])
    fails ([0], [1599388225294475516; 3378876932193098527])
    ...
    

    Here a 2-element list in the first component is reduced to a seemingly unrelated 1-element list, and later a 4-element list in the second component is reduced to a seemingly unrelated 2-element list.
    This strategy of "starting from Random.State scratch" affects the result when the generator has been lucky enough to find a Random.State producing a counterexample that requires some relation between the components of a tuple. For example, for Test pairs lists no overlap, QCheck returns ([0], [0]) after 22 successful shrink steps, whereas QCheck2 returns ([0], [0; 0; 0; 0]) after 27 successful shrink steps.
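
The effect described in observations 2 and 3 can be reproduced in miniature with the standard library's Random.State alone. The sketch below is a toy model and not QCheck2's implementation: `flip_sign_demo` and the `outcome` record are hypothetical names, and Random.State.copy is used here only as an assumed stand-in for what a splittable or replayable RNG would provide. It draws a sign, then a magnitude, and then redraws the magnitude twice: once from the live state (which has moved on) and once from a snapshot taken right after the sign was drawn.

```ocaml
(* Toy model only, not QCheck2's implementation. *)
type outcome = {
  original : int;        (* value first generated: sign, then magnitude *)
  rerun_live : int;      (* magnitude redrawn from the advanced state *)
  rerun_snapshot : int;  (* magnitude redrawn from a saved copy *)
}

let flip_sign_demo (seed : int) : outcome =
  let st = Random.State.make [| seed |] in
  (* Step 1: decide the sign, as in a bind-based int generator. *)
  let negative = Random.State.bool st in
  (* Assumption for this sketch: keep a copy of the state at this point. *)
  let snapshot = Random.State.copy st in
  (* Step 2: draw the magnitude; this mutates [st]. *)
  let magnitude = Random.State.bits st in
  let original = if negative then -magnitude else magnitude in
  (* "Shrinking" the sign to non-negative re-runs step 2. On the live
     state the draw comes from further along the random stream ... *)
  let rerun_live = Random.State.bits st in
  (* ... while on the snapshot it replays the original magnitude. *)
  let rerun_snapshot = Random.State.bits snapshot in
  { original; rerun_live; rerun_snapshot }

let () =
  let o = flip_sign_demo 42 in
  Printf.printf "original: %d\nlive state: %d\nsnapshot: %d\n"
    o.original o.rerun_live o.rerun_snapshot
```

In this model only the snapshot rerun is guaranteed to return the absolute value of the original; the live rerun is simply the next draw from the stream, which is the kind of "seemingly unrelated" replacement visible in the logs above.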

Overall:

@jmid jmid mentioned this pull request Nov 3, 2021
This was referenced Apr 2, 2022
jmid commented Apr 16, 2022

This PR is superseded by #234 and #237

@jmid jmid closed this Apr 16, 2022