v0.7.1d: QuadRay, code name "GIzmo+1d": macOS M1, VS2022, Ubuntu 22.04
- add VS2022 support for RooT demo and core test
- use separate domains for RT and PT (best performance for both modes)
- add option to toggle path-tracer from command-line
- implement additional path-tracer controls
- clean up command-line messages in engine apps
- clean up compilation with debug level 2
- switch to malloc for 64-bit pointer/address combo
- use assembler-local labels to build on M1 macOS
- add support for M1 macOS to makefiles
- add VS2022 support for SIMD test
- add notes for building on Windows with VS2022 and M1 macOS
- update documentation and main header (add braces to ASM_INIT)
- add double-precision logic/arithmetic to ARMv7, x86
- add workarounds for POWER8 and POWER9 targets on Ubuntu 22.04
- drop ppc64abi32 targets (since QEMU 5.2.0), also from QEMU build script
- add notes for VS2022, QEMU 6.2.0 and 7.2.0, Ubuntu 23.04
- swap 16-bit and SIMD integer compare test groups (30-37 <-> 38-44)
- update copyright year to 2023
v0.7.0g: QuadRay, code name "GIzmo+g", backports, VS2022, Ubuntu 22.04
- fix issue on Windows when screen scaling is enabled (high DPI displays)
- add VS2022 support for RooT demo and core test
- switch to malloc for 64-bit pointer/address combo
- require both SSE4.1 and SSE4.2 for SSE4 (v4) target slots
- add DAZ support for flush-to-zero mode on x86 (makes it on par with RISCs)
- backport integer SIMD compare subset (min/max/ceq/cne/clt/cle/cgt/cge)
- backport tests for integer SIMD compare (30-36) (signed/unsigned)
- target slots AVX512DQ now include VL backends for 128/256-bit subsets
- optimize SIMD compare and mask-jump instructions for AVX-512
- add 64-bit sign/zero-extend bridges to existing 32/64-bit BASE subsets
- optimize standalone remainder instructions on ARM and POWER
- implement direct ASM section output comparison method (bypass C++ test)
- extended 30-reg 256-bit and 15-reg 512-bit POWER backends are deprecated
- extended POWER backends are still supported with v1.0.0f ASM feature set
- add VS2022 support for SIMD test
- update build scripts with TDM64-GCC 10.3.0-2 compiler reference
- update documentation and main header (add braces to ASM_INIT)
- add double-precision logic/arithmetic to ARMv7, x86
- add workarounds for POWER8 and POWER9 targets on Ubuntu 22.04
- drop ppc64abi32 targets (since QEMU 5.2.0), also from QEMU build script
- add notes for VS2022, QEMU 6.2.0 and 7.2.0, Ubuntu 23.04
- clean up comments in BASE and SIMD headers
- update copyright year to 2023
v0.7.1c: QuadRay, code name "GIzmo+1c": real-time path-tracer
- instructions for path-tracer test scenes are in root/RooT.h
- the RT_TEST_PT flag is now turned on along with SIMD buffers
- direct ray tracing mode is 20-30% slower with SIMD buffers enabled
- path-tracer runs 5-10 times faster with SIMD buffers on (scenes 17/18)
- allow running the path-tracer without pausing (interactive)
- path-tracer is now only enabled with test scenes where it works best
- fix issue on Windows when screen scaling is enabled (high DPI displays)
- add 8-bit (byte) BASE instruction subset, redesign 16-bit BASE
- add 8-bit elements SIMD subset (native on RISCs, mostly emulated on x86)
- add 64-bit sign/zero-extend bridges to existing 32/64-bit BASE subsets
- add 32-bit sign/zero-extend bridges to new 8/16-bit BASE subsets
- add RT_BASE flag to limit addressing granularity, extend range on ARMv8
- add mask-jump (mkj) SIMD instructions for 8/16-bit SIMD subsets
- add DAZ support for flush-to-zero mode on x86 (makes it on par with RISCs)
- add support for AVX-512 fp16 subset to match existing ARMv8.2 + SVE
- AVX-512 fp16 requires separate binary (no target slot or cap check)
- implement direct ASM section output comparison method (bypass C++ test)
- AVX-512 fp16 now provides validation for ARM's fp16 using above method
- target slots AVX512DQ now include VL backends for 128/256-bit subsets
- target slots AVX512DQ now require BW support to facilitate 8/16-bit SIMD
- optimize SIMD compare and mask-jump instructions for AVX-512
- optimize setting flags instructions in 8/16-bit BASE subsets (on RISCs)
- optimize standalone remainder instructions on ARM and POWER
- extended 30-reg 256-bit and 15-reg 512-bit POWER backends are deprecated
- extended POWER backends are still supported with v1.0.0f ASM feature set
- fix 16-bit (half-int) BASE addressing granularity on POWER
- add notes for VS2022, QEMU 6.2.0, Intel SDE 9.0
- clean up comments in BASE and SIMD headers
- update copyright year to 2022
v0.7.1b: QuadRay, code name "GIzmo+1b", second development release
- implement ray buffering optimization to improve SIMD efficiency
- allocate array of SIMD-buffer structures for each node in the hierarchy
- keep SIMD-buffers separate for each thread and ray-bounce level
- run solvers for all nodes in the list before any shading (no overdraw)
- as buffers fill up, trigger shading for full SIMD only (more efficient)
- flush all buffers upon frame rendering completion (process partial SIMD)
- direct ray tracing mode runs slower with buffers due to copying overhead
- path tracing mode benefits from SIMD-buffers the most with 5x speed up
- enabling SIMD-buffers along with PT test scene requires recompilation
- check RT_FEAT_BUFFERS in core/tracer/tracer.cpp for instructions
- implement integer SIMD compare subset (signed/unsigned)
- add integer SIMD compare on MIPS32/64 (min/max/ceq/cne/clt/cle/cgt/cge)
- add integer SIMD compare on ARMv8 (64-bit min/max emulated) and SVE
- add integer SIMD compare on POWER (64-bit emulated on POWER7)
- add integer SIMD compare on x86+SSE2/4 (64-bit emulated)
- add integer SIMD compare on x86+AVX1/2 (emulated for full SIMD AVX1)
- add integer SIMD compare on x86+AVX512
- add integer SIMD compare on original legacy targets (ARMv7, x86, PPC G4)
- add integer SIMD compare for half-int SIMD backends (16-bit elements)
- require both SSE4.1 and SSE4.2 for SSE4 (v4) target slots
- add tests for integer SIMD compare (38-51)
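
The ray buffering items in this release (fill per-node buffers, trigger shading only for full SIMD batches, flush partial batches when the frame completes) can be sketched in scalar C++ roughly as follows. This is an illustrative sketch, not the engine's code: LANES, RayBuffer, and the integer ray ids are hypothetical names chosen for the example, and the real implementation keeps separate buffers per thread and ray-bounce level.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed SIMD width in rays; the real engine selects this per target.
constexpr std::size_t LANES = 4;

// Hypothetical per-node buffer that collects rays until a full batch exists.
struct RayBuffer
{
    std::array<int, LANES> rays{};    // hypothetical ray ids awaiting shading
    std::size_t count = 0;            // number of rays currently buffered
    std::vector<std::size_t> batches; // sizes of batches handed to shading

    // Stand-in for the shading pass: record the batch size, empty the buffer.
    void shade(std::size_t n)
    {
        batches.push_back(n);
        count = 0;
    }

    // Buffer one ray; shade as soon as a full SIMD batch is available.
    void push(int ray_id)
    {
        assert(count < LANES);
        rays[count++] = ray_id;
        if (count == LANES)
        {
            shade(LANES);
        }
    }

    // At the end of the frame, process whatever partial batch remains.
    void flush()
    {
        if (count > 0)
        {
            shade(count);
        }
    }
};
```

Pushing 10 rays and then flushing yields two full batches of 4 followed by one partial batch of 2, which matches the "full SIMD only, flush at frame end" behaviour described above.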
v0.7.0f: QuadRay, code name "GIzmo+f", fixes and tests
- fix displacement encodings on MIPS
- add testing for displacement levels and types
- update makefiles to support ancient HW (SSE2, SSE1 has issue with cvzps)
- update SIMD test framework, add scripts for test automation
- update comments for QEMU 5.2.0 and QEMU 6.0.0 (require ninja-build)
- update CORE test framework, add scripts for test automation
- fix feature flags in tracer, clean up comments
v0.7.1a: QuadRay, code name "GIzmo+1a", first development release
- add asin/acos SIMD meta instructions (asnps/acsps) to tracer
- allow changing inherited pause-mode in path-tracer
- allow path-tracer in all scenes (demo, test) for further development
- optimizing out path-tracer per scene significantly reduces memory footprint
- add flag to allow/prohibit path-tracer per scene (RT_OPTS_PT)
- enable pause-mode in RooT demo automatically when invoking path-tracer (-q)
- add quasi-realistic/quake/quality mode (-q) to CORE tests for path-tracer
- add camera ray sample randomizer with Tent filter for path-tracer
- implement antialiasing in path-tracer, add to RooT demo and CORE tests
- add feature flag for path-tracer in rendering backend, wrap code sections
- make path-tracer SIMD-width agnostic, fix normals for light surfaces
- add flag (RT_TEST_PT) to enable test scenes for path-tracer in RooT demo
- add 24/32/48-bit masked LCG methods to SIMD PRNG in engine and tracer
- add PRNG splitter for ray depth in path-tracer, use PRNG-based masking
- add PRNG splitter for Fresnel in path-tracer, use PRNG-based masking
- compute random diffuse sample over hemisphere
- add power series approximation for SIMD sin/cos
- make path-tracer's PRNG reproducible across SIMD widths (using root mask)
- add light sphere to test scene 18, enable Gamma, Fresnel
- decorrelate path-tracer samples within SIMD (using separate seed-plane)
- add preliminary support for basic path-tracer to RooT demo (-p/-q)
- accumulate path-tracer samples in fp-color planes
- allocate framebuffer's fp-color planes
- add recursive light sampling for path-tracer
- add emission properties to light surfaces
- compute random diffuse sample within halfcube (rough approximation)
- compute orthonormal basis for diffuse sampling in path-tracer
- add SIMD PRNG (16-bit masked from 32-bit LCG) to rendering backend
- add tests for half-int SIMD/BASE ops (run level 30-37)
- drop extended POWER targets from SIMD testing (no half-int support)
- add half-int SIMD arithmetic with saturate (except original SSE1)
- add implementation for half-int BASE ops across modern targets
- add BASE half-int support on legacy ARMv7 and x86
- add SIMD half-int support on legacy ARMv7 and x86
- adjust displacement types for BASE half-int on legacy ARMv7 and x86
- adjust displacement types for BASE half-int on MIPS and POWER
- adjust displacement types for BASE half-int on x86_64
- adjust displacement types for scalar fp16 on ARMv8
- split SIMD half-int subset from fp16 on ARMv8
- add SIMD half-int support on x86_64, enable on ARMv8
- add SIMD half-int support on MIPS and POWER
- add preliminary support for POWER9 fp128 SIMD ops (not tested)
- add preliminary support for ARMv8.2 fp16 SIMD ops (not tested)
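
Two of the path-tracer items in this release, the 16-bit masked output of a 32-bit LCG and the Tent filter for camera ray samples, can be illustrated with a scalar sketch. The names Lcg32 and tent_offset are hypothetical, the LCG constants are the widely used Numerical Recipes values (an assumption, the changelog does not state which constants the engine uses), and taking the upper 16 bits is likewise an assumption about the masking. The tent mapping is the common inverse-CDF form.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical scalar stand-in for one SIMD lane of the PRNG.
struct Lcg32
{
    uint32_t state;

    // Advance the 32-bit LCG and return 16 bits masked from the new state.
    // Constants are the Numerical Recipes multiplier/increment (assumed).
    uint32_t next16()
    {
        state = state * 1664525u + 1013904223u;
        return (state >> 16) & 0xFFFFu;
    }

    // Uniform float in [0, 1) built from the 16-bit output.
    float next_unit()
    {
        return (float)next16() / 65536.0f;
    }
};

// Map a uniform sample u in [0, 1) to a tent-distributed offset in [-1, 1).
// Offsets near 0 (the pixel centre) are the most likely.
float tent_offset(float u)
{
    assert(u >= 0.0f && u < 1.0f);
    float r = 2.0f * u;
    return r < 1.0f ? std::sqrt(r) - 1.0f : 1.0f - std::sqrt(2.0f - r);
}
```

A camera ray sample would then be jittered by tent_offset(rng.next_unit()) along each screen axis. Seeding each lane differently is one way to decorrelate samples within a SIMD vector.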
v0.7.0e: QuadRay, code name "GIzmo+e", 2021 extended support
- clarify instructions for POWER8 server, Raspberry Pi 3/4
- update links and comments in project files
- make comment for compiler swapping on MIPS more generic
- update mappings for byte/char SIMD ops
- update TDM64-GCC compiler reference to version 9.2.0
- update copyright year to 2021
- update comments for remainders and scaled addressing
- optimize remainder ops on POWER9
- add scaled-indexed addressing modes
- use scaled-indexed addressing in tracer
- make thread affinity step configurable on Windows
v0.7.0d: QuadRay, code name "GIzmo+d", documentation edition
- clean up task descriptions in roadmap
- add notes for Ubuntu, QEMU, MIPS cross-compilers
- add Ubuntu (MATE) 20.04 LTS to makefile notes
- update standalone MIPS compiler to 2020.06-01
- change RUN_LEVEL to SUB_TEST for better wording
- clean up comment about displacement values
- add initial documentation for the assembler
- add sin/cos and log/exp math definitions to rtbase
- block antialiasing adjustment when limited by SIMD
- block window resizing when exceeds screen
- use portable definitions in plotters, clean up in engine and tracer
- add new task to roadmap for ray-marching algorithms
- add new task to roadmap for ray-sorting optimization
v0.7.0c: QuadRay, code name "GIzmo++", celebration edition
- celebrating C++ and its various compilers
- add notes for Ubuntu Server on Raspberry Pi 4
- add -mcpu=power8 compiler option to makefiles on POWER
- fix RISC targets with clang after version 6.0
- update copyright year to 2020
v0.7.0b: QuadRay, code name "GIzmo+b", 2020-02-02 archive edition
- all releases after 2020-01-01 have 2nd naming from their baseline: (GIzmo)
- letter from the update (b,c,..) appears concatenated after (+) in the name
- future minor releases (v0.7.Xa) will have digit and letter (+1a, +2a, +3a)
- future major releases (v0.8.Xa) will have the form: (2+, 2+1a, 2+2a, 2+3a)
- clean up and update comments related to recent compiler and QEMU versions
- fix comments for SIMD instructions in 3-operand forms, clarify for SIMD div
- add SIMD fma3 aliases as 3-operand forms: fma**3**
- fix SIMD fma3 emulation with fp32 elements on AVX1
v0.7.0a: QuadRay, code name "GIzmo+", 2020-edition ("GIzmo" + 2019 updates)
- all new releases from now on will use *X.Y.Za(bc..) naming scheme
- all branches start with letter (b), all tags start with letter (v)
- first release (tag) on every new branch will be marked with letter (a)
- all subsequent minor updates will have letters (b,c,..), tags aren't moving
- use (U,O) new keys (update/offscreen), drop F9/F10
- make key-handling more responsive in RooT demo (not skipping next frame)
- fix window size exceeding screen, add flag to make AA-grid regular
- fix -f option in CORE tests, add optimal mode -o (omit unoptimized run)
- fix Fresnel flag for metals and other opaque surfaces in tracer
- fix check for diffuse/specular lighting, add pause-animation mode (-p)
- use integer indices for primary rays update (makes it SIMD-width agnostic)
- clean up STORE_SIMD macro in rendering backend
- add SIMD flag to replace VMX targets with VSX (on)
- add signed BASE ops to combined-arithmetic-jump (arithmetic shift right)
- add setting-flags BASE arithmetic shift right
- make setting-flags BASE ops orthogonal to size/type (cmd**Z**)
- add -mips64r6 compiler option to makefiles on MIPS
- optimize 64-bit SIMD shifts on POWER9, clean up mkj** formatting
- improve ARM/x86 compatibility in SIMD shifts
- add SIMD integer multiply instruction (for 32/64-bit elements)
- update copyright year to 2019
- fix SIMD backend struct load-level in debug mode (backported down)
- fix 32-bit BASE compare-to-mem on 64-bit POWER
- fix usage of non-persistent temp-register on POWER
- update build instructions and makefile notes
- add notes about QEMU 3.1.0 for SVE emulation
- add SIMD flag to replace VMX targets with VSX (off)
- fix and clean up SIMD target selection in headers
- fix/add comments for SIMD/BASE shift count value
- adjust build instructions for older HW compatibility
- adjust Win64 release build script for lower core-count
v0.7.0: QuadRay engine, code name "GIzmo", base for future GI path-tracing work
- makes use of UniSIMD assembler 1.0.0 "ENsed" for ARM-SVE, POWER9, new scheme
- renewed directory structure, move RooT demo files to a separate root folder
- add new fp-compatibility and feature tasks, rename TASKS file to ROADMAP
- add support for 30 SIMD register pairs (2x128) backend on POWER7/8
- add support for 30 SIMD registers (scalar+128+256) backend on Skylake-X
- drop standalone SSE2 target from x64, reuse SSE4 (v4) slot, add compat flag
- add support for 128-bit AVX1+FMA3 (v16) and AVX2+FMA3 (v32) targets for AMD
- compactify POWER7/8 targets into one slot, add new RT_SIMD_COMPAT_PW8 flag
- swap legacy PowerPC G4/POWER6 VMX (now v4) with POWER7/8 VSX1/2 (now v1)
- 64-bit POWER6 now matches 64-bit Nehalem target (both v4), 15x128/8x256-bit
- add support for POWER9 backend (v2) with immediate vector loads/stores
- move 128-bit 30 SIMD registers Skylake-X target from v1 to v2, match POWER9
- reserve 128-bit v1 and 256-bit v4 for 30 SIMD registers emulation on AVX1/2
- implement plain ARM-SVE backend (v4) for 256/512/1K4/2K8-bit vector lengths
- implement paired ARM-SVE backend (v1) for 512/1K4/2K8-bit SIMD target slots
- new scheme: RT_128=4+8, RT_256=1+2, RT_512=1+2, RT_1K4=1+2 are 15 registers
- new scheme: RT_128=1+2, RT_256=4+8, RT_512=4+8, RT_1K4=4+8 are 30 registers
- add elm*x_st instruction to detach scalar subset from vectors (via mem)
- add support for horizontal pairwise/reductive add/mul/min/max instructions
- implement antialiasing as a cycle of pairwise horizontal fp-adds, no macros
- add support for Gamma correction to rendering backend (approx with sqrt)
- add support for Fresnel reflectance on reflective/refractive surfaces
- fix ray normalization for refraction and Fresnel (was close to normal)
- add test scenes 17 and 18 for Fresnel (from RaVi/smallpt, work in progress)
- implement fixed-point for specular pow, plotters for Gamma/Fresnel (-z)
- implement alternating 2x antialiasing, plotters for antialiasing samples
- introduce rt_Platform class as a common instance for multiple rt_Scene
- move tracer backend's SIMD target tracking to rt_Platform
- move thread-pool from rt_Scene to rt_Platform, add support for core-count
- implement multi-group affinity for Windows threading (more than 64 threads)
- add support for fullscreen mode on Windows (-w 0) as a modal window
- merge Win32/64 files into one (RooT_winxx.cpp), drop Win64 pthread support
- add alternative key-mapping to RooT demo (all digit keys plus a few letters)
- patch system allocators to compile on macOS, widen OS support in makefiles
- clean up SIMD tests and engine's CORE to support PIE (also macOS)
- add pthread barriers implementation (RooT_linux.cpp) for macOS (no XShm)
- separate 64-bit Linux from multilib build scripts, add for macOS
- add VMX-compatible scalar SIMD subset on PPC G4 and POWER family of CPUs
- add MSA/scalar compatibility on big-endian MIPS, support for fp32 11-bit DP
- rename sections in target-specific headers to BASE, SIMD, ELEM (for scalar)
- optimize long displacements for BASE, SIMD, ELEM on RISCs where applicable
- implement proper SIMD-scaling for displacement types (as sliding in rtbase)
- move common internal x87 FPU sections to BASE headers on x86
- dedicate rtconf header for configurable instruction subsets on all targets
- allow target-specific headers to redefine common instructions from rtbase
- improve SIMD target reporting in tests, add -c n option to reduce test time
- update notes for MIPS cross-compiler location, add -mnan=2008 to makefiles
- update notes for AArch64 Linux, QEMU 3.0.0, Intel SDE, add ARM IE reference
- add test for SIMD mask-move (mmv), run level 27
- add test for 8/15/30 BASE/SIMD registers, run level 28
- warning-free building with GCC/Clang and MSVC++
- fix BASE shifts with zero immediate arg on legacy ARMv7 (backported down)
- convert all text files with unix2dos
- always reserve maximum space for SIMD register file
- save/restore temp predicate register on AVX512
- fix SIMD registers save/restore for 15x128x2 on POWER7
- fix temporary FPRs save/restore on POWER
- fix scalar SIMD min/max on POWER7
- fix BASE compare immediate encodings on POWER
- fix location for 128/256-bit common SIMD instructions
- fix for scalar SIMD alignment on ARMv7, POWER8
- fix compilation in C++11 mode with RT_DEBUG=2, non-debug build on Windows
- add comment for NaNs handling in floating point pipeline
- clarify comments about SIMD fp round instructions
- fix comment for SIMD shifts with count in memory
- add comment for scalar/vector compatibility
- undef SIMD flags of the same width in corresponding tracer backend files
- fix stride for frame copying in Windows
- fix stride for BMP file format
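
The "Gamma correction ... (approx with sqrt)" item in this release can be read as approximating the usual display gamma of about 2.2 with a gamma of 2.0, which needs only a square root. The sketch below shows that reading. The function names are hypothetical and the exact formula used by the rendering backend is not stated in this changelog.

```cpp
#include <cassert>
#include <cmath>

// Encode a linear color channel with gamma 2.0 (a square root),
// used here as an assumed approximation of the usual gamma of about 2.2.
// Values above 1 are clamped to 1 before encoding.
float gamma_encode_approx(float linear)
{
    assert(linear >= 0.0f);
    float c = linear > 1.0f ? 1.0f : linear;
    return std::sqrt(c);
}

// Inverse of the approximation above: decode back to linear by squaring.
float gamma_decode_approx(float encoded)
{
    return encoded * encoded;
}
```

The appeal of this form for a SIMD backend is that a square root is typically available as a single vector instruction, whereas a general power function is not.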
v0.6.7: Backend improvements, new command-line options, extra SIMD backends
- make new demo scene 3 (former test scene 14) default, add camera animator
- add two new cameras (on F3) to demo scene 3 to stress test CPU (benchmark)
- place SIMD-size-factor to top-left corner, tile-width-factor to top-right
- add SIMD-target and FPS-average reporting to console/terminal on switches
- implement standard non-XShm fallback on Linux when XShm fails to allocate
- implement pthreads for Win64 builds using TDM64-GCC, enable 120 threads
- build RooT demo with -pthread option on all Linux and Win64 targets
- add -n n option to override SIMD-native-size (128, 256, 512, or 1,2,4), F8
- add -k n option to override SIMD-size-factor (1, 2, 4 where available), F6
- add -s n option to override SIMD-sub-variant (1,2,4,8 where available), F7
- add -q n option to override SIMD-total-quads (1,2,4,8 where available)
- add -t n option to override thread-pool size (1, .. 1000, default 120)
- add -w n option to override window-rect size (original-dimensions * n)
- add -w 0 option to enable window-less mode (fullscreen) on Linux only
- add -x n option to override original-x-resolution (applies to -w above)
- add -y n option to override original-y-resolution (applies to -w above)
- add -d n option to select default demo scene (1, 2, 3), switch on F11
- add -c n option to select default camera (1, 2, 3, ..), switch on F3
- add -b n option to choose time (ms) when testing/animation begins
- add -e n option to choose time (ms) when testing/animation ends, exit
- add -f n option to specify the number of consecutive frames to render
- add -g n option to specify the time-delta (ms) for consecutive frames
- add -i n option to save resulting image at the end of each run, on F4 also
- add -r n option to override fps-logging update rate, n is interval (ms)
- add -l option to turn off fps-logging updates to console/terminal, F5
- add -h option to toggle screen numbers drawing, core_test/on RooT/off, F12
- add -u n option for serial update/render 1-3/4, 5/6 update/render off, F9
- add -o option to enable offscreen mode in RooT demo, render-to-memory, F10
- add -a option to enable antialiasing, 4x for fp32 / 2x for fp64 pipes, F2
- adjust framebuffer's stride to SIMD width for arbitrary frame dimensions
- adjust tile width to maximal SIMD width from default size of 8x8 pixels
- fix scanline check in tracer for multi-threading (when y_res < thnum)
- clean up tracer files to delegate SIMD target selection to rtarch header
- switch to UniSIMD-assembler version 0.9.1 for additional SIMD backends
- expose 128/256-bit SIMD subsets (cmd[i/j/l]*, cmd[c/d/f]*) simultaneously
- add 3-operand SIMD instructions to all targets, emulate where not present
- implement basic scalar SIMD support (arithmetic + compare-to-mask-elem)
- implement additional paired/quaded 8-register SIMD backends on x86_64
- add 8-register makefile flags RT_256_R8, RT_512_R8, RT_1K4_R8, RT_2K8_R8
- original 15-register makefile flags RT_128, RT_256, RT_512 remain
- add new makefile flag RT_1K4 for 15-register code-bases on paired AVX-512
- expose 30 registers as an extension to common baseline of 15 where present
- each major architecture has at least one SIMD target with 30 registers
- add new RT_SIMD selector flag to remap vector-length-agnostic subsets
- add new RT_REGS selector flag to choose targets within given RT_SIMD width
- rename SIMD target headers to reflect size-factor/sub-variant, move legacy
- add new internal flags RT_128X*, RT_256X*, RT_512X* to match SIMD headers
- new internal flags keep SIMD sub-variant value in format for native width
- implement SIMD flags compatibility layer in rtzero to map makefile flags
- rtarch main header selects appropriate BASE/SIMD target from flags above
- implement SIMD target format converters in rtbase for runtime selection
- change SIMD target reporting to native-size x size-factor v version format
- reserve _RX slots in SIMD target mask for predicated backends (30+8 regs)
- clean up (drop) legacy SSE(1) support from x32 headers/makefiles
- move BASE sub-target selection to rtarch main header (ARM, x86)
- add notes for AArch64 Linux on Raspberry Pi 3 to INSTALL file
- add new TASKS file with description for future tasks
- enforce full ARMv7 instruction set (32-bit words) in makefiles
- fix LLVM's condition evaluation sign on all targets, define M -/+
- fix SIMD registers save/restore for 128-bit AVX targets (backported down)
- fix buffer allocation in SIMD tests (for 64-bit elems)
- fix stack alignment (now 16 bytes) on ARMv8/AArch64 (hardware) targets
- allow external override (from makefiles) for SIMD compatibility modes
- minor fixes in rtarch, accelerate release builds on multi-core machines
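
The item "adjust framebuffer's stride to SIMD width for arbitrary frame dimensions" amounts to rounding each row up to a whole number of SIMD vectors. A minimal sketch, assuming the stride is counted in pixels and simd_lanes is the number of pixels per SIMD vector (both the function name and its parameters are hypothetical):

```cpp
#include <cassert>
#include <cstdint>

// Round x_res up to the next multiple of simd_lanes so that every row
// of the framebuffer can be processed with whole SIMD vectors.
uint32_t aligned_stride(uint32_t x_res, uint32_t simd_lanes)
{
    assert(simd_lanes > 0);
    return (x_res + simd_lanes - 1) / simd_lanes * simd_lanes;
}
```

For example, a frame 801 pixels wide with 8 pixels per vector gets a stride of 808, while a width of 800 is already aligned and stays at 800.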
v0.6.6: Backend improvements, 256-bit SIMD on RISCs, basic AVX-512 support
- switch to UniSIMD-assembler version 0.9.0 for extended SIMD targets
- adjust root rt_SIMD_INFO struct to contain both 32-bit and 64-bit constants
- add new sign-mask and full-mask general purpose constants to rt_SIMD_INFO
- expose 32/64-bit SIMD-element-size subsets (cmdo*, cmdq*) simultaneously
- element size in existing cmdp* subset remains configurable with RT_ELEMENT
- all three SIMD subsets (cmdo*, cmdp*, cmdq*) are still SIMD-width-agnostic
- expose fixed 64-bit BASE subset cmdz* for 64-bit targets only
- existing address-size cmdx*, element-size cmdy* and 32-bit cmdw* remain
- add BASE move instructions for 64-bit immediates as pairs of 32-bit types
- add new rotate-right and inverse-logic BASE instructions (ror, ann, orn)
- add new BMI1/BMI2 implementations for existing BASE instructions on x86
- implement non-portable x87 ISA subset for x86 targets internally
- implement fused-multiply-accumulate (fma/fms) on all SIMD targets
- add new mask-move SIMD instructions to common SIMD ISA (was x86 only)
- use mask-move (mmv) for STORE_SIMD in tracer on all targets (was x86 only)
- add new fp-negate and inverse-logic SIMD instructions (neg, orn, not)
- add new variable SIMD shifts with per-element count to all targets
- implement 256-bit SIMD support (2x128-bit, 15 regs) on modern RISC targets
- implement 512-bit SIMD support (4x128-bit, 15 regs) on modern POWER targets
- implement 512-bit SIMD support (1x512-bit, 16 regs) on future x86 targets
- AVX1/AVX2 256-bit SIMD for x86 (1x256-bit, 16 regs) remains supported
- 256-bit SIMD with 15 regs becomes new common baseline for modern hardware
- improve test coverage for BASE and SIMD load-op instructions
- add tests for new rotate, logic, shifts, fma/fms instructions, run level 24
- add command line options to CORE test for SIMD width/variant override
- add rtzero header file to clean up assembler definitions after use
- rename instruction parameters to better reflect their use as source/dest
- add formulas for all BASE and SIMD instructions for better clarity
- reserve the whole alphabet for future BASE and SIMD instruction subsets
- add new SIMD compatibility flags for 128-bit AVX1/2, FMA/FMS/FMR, XMM regs
- add wrappers for 64-bit literals to better support legacy 32-bit compilers
- adjust two conditional jumps in tracer to accommodate 512-bit POWER VSX
- fix label_ld/label_st range on ARMv7/AArch64 to be on par with other targets
- fix discrepancy in VMX/VSX vector-loads on POWER (from here backported down)
- fix AVX-version of mmvpx_ld from zeroing to merging on x86
- fix compilation on legacy Visual C++ 6.0 (Windows XP)
v0.6.5: Backend improvements, full 64-bit fp/int SIMD compute elements
- adjust rendering backend/structures to work with 64-bit SIMD elements
- switch to UniSIMD-assembler version 0.8.1 for 64-bit SIMD support
- add element-sized BASE ISA subset to fixed-32-bit and address-sized subsets
- new instruction mnemonics introduced for element-sized BASE subset (cmdy*)
- add new rtarch headers to house element-sized SIMD subset for 64-bit targets
- support for 64-bit SIMD elements currently requires 64-bit addresses as well
- enable full-precision SIMD rcpps/rsqps and rceps/rseps instructions
- add new offset corrections for endianness related to element-sized subset
- add new SIMD width short names for fixed and element-sized SIMD fields
- add new custom-sized integer types (address, element) with printf mods
- make current adjustable fp types follow SIMD element size (RT_ELEMENT)
- adjust math macros and definitions to support double-precision arithmetic
- add build/clean scripts, update makefiles with extra targets, MIPS notes
- remove unnecessary limitation on SIMD masks (add AVX-512/ARM-SVE notes)
- distinguish SIMD NEONv1/v2 vanilla ARM builds (cortex-a8/cortex-a15)
- distinguish SIMD v2/v4 64-bit POWER builds (POWER7+VSX/POWER8+VSX2)
- fix non-setting-flags instructions to not interfere with cmp on MIPS, POWER
- fix logging to files on 64-bit Linux systems (from here backported down)
- fix vertical-strips artifacts on SIMD target switch (F8, F11, F8)
- fix full-precision IEEE-compat divps_ld on ARMv7 targets
v0.6.4: Backend improvements, full 64-bit addressing for BASE and SIMD
- adjust rendering backend/structures to work with 64-bit addresses
- switch to UniSIMD-assembler version 0.8.0 for full 64-bit addressing
- double original 32-bit BASE ISA to fixed-32-bit and address-sized subsets
- original instruction mnemonics follow in-heap/code-segment address size
- new instruction mnemonics introduced for fixed-32-bit subset (cmdw*)
- setting-flags instruction mnemonics remapped from (cmdz*) to (cmd*z)
- add combined-arithmetic-jump wrapper for better API stability/efficiency
- add new rtarch headers to house address-sized subset for 64-bit targets
- move original (now address-sized) mappings to rtbase for 32-bit targets
- add canonical forms for BASE div/rem and shifts (not always efficient)
- add setting-flags versions for BASE orr/xor and unsigned shifts
- remap one-operand instructions from cmd**_rr/mm to rx/mx and xr/xm
- move stack instructions to their own section at the end of rtarch headers
- move sregs instructions to their own section at the end of rtarch headers
- add config flags for full-precision SIMD rcpps/rsqps instructions
- add master flags for SIMD compatibility modes to rtarch main header
- add new offset corrections for endianness (from here backported down)
- add Win64 support via TDM64-GCC toolchain (tdm64-gcc-5.1.0-2.exe)
- add NULL-ptr checks to custom allocators (Linux/mmap, Win64/VirtualAlloc)
- fix setting-flags instructions for 64-bit POWER running 32-bit ISA
- fix non-setting-flags instructions (neg*x) to not set flags on MIPS
v0.6.3: Backend improvements, 64/32-bit hybrid mode for native 64-bit ABI
- use fixed-sized and adjustable integer types across engine files
- switch to UniSIMD-assembler version 0.7.1 for 64/32-bit hybrid mode
- add a64 (AArch64 native ABI) and x64 (x86_64 native ABI) targets/makefiles
- add m64 (MIPS64 native ABI) and p64 (Power64 native ABI) targets/makefiles
- most of the current ISA remains 32-bit for BASE and SIMD with few exceptions
- adjust backend structures to support 64-bit pointer types in select places
- move sys_alloc/sys_free to platform-specific files and sections
- implement custom allocators (mmap) to limit address range to 32-bit (Linux)
- limit address range to 2GB boundary as MIPS64 sign-extends 32-bit mem-loads
- use mutex to protect system allocators from race conditions (Windows XP)
- treat code labels as 64-bit in label_ld/st and jmpxx_mm instructions
- implement 64-bit versions of stack_sa/la instructions on MIPS and POWER
- some addresses become 64-bit, others remain 32-bit to fit integer SIMD path
- relocate bound textures (static data) in heap for 32-bit addresses (MIPS64)
- use existing (in-heap) SIMD info struct for switch0 as stacks become 64-bit
- fix variable SIMD shifts to support little-endian on POWER targets
- fix ASM blocks to only use SIMD registers within VRSAVE segment on POWER
- remove ASM block's zeroing of r15 as unnecessary on x32/x64 targets
- reformat/rework ASM blocks to better respect internal register mapping
- explicitly save/load SIMD registers in ASM blocks across all targets
- drop ASM clobber lists for lack of consistency across targets/SIMD-widths
- fix clang's ASM block l-value errors and other warnings, official support
- adjust natural alignment (in-heap) based on pointer size, 4/8-byte boundary
- wrap quadric debug fields in backend's SIMD info struct with RT_DEBUG flag
- add build instructions to makefiles for Ubuntu 16.04 LTS 64-bit Live CD
- fix divps_ld instruction's encoding on ARM (fixes CORE test 15 with -a)
- use IEEE-compatible div/sqr on legacy ARM and POWER (CORE test 14 with -a)
v0.6.2: Backend improvements, 32-bit MIPS and POWER ISAs + big-endian support
- implement minor adjustments in the code for big-endian support
- switch to UniSIMD-assembler version 0.7 for additional CPU architectures
- add a32 (AArch64:ILP32 ABI) and x32 (x86_64:mx32 ABI) targets/makefiles
- add m32 (MIPS32r5/r6 + MSA) and p32 (POWER + VMX/VSX) targets/makefiles
- add yet another SIMD variant (v4) for x86/SSE4.1 and ARMv8/AArch32
- separate ARMv7/ASIMDv2 (v2) and ARMv8/AArch32 (v4) SIMD variants on ARM
- add ARM builds for Raspberry Pi 2 and 3 in addition to Nokia N900
- use static linking in CORE and SIMD tests for QEMU emulation
- use mmv instruction (blendvps, vmaskmov) on x86/x32 STORE_SIMD macro
- use combined-compare-jumps in tracer for better efficiency (MIPS, POWER)
- remove limitation for BASE instructions to only accept DP offsets
- add new immediate/displacement types, add comment that they are unsigned
- add comments throughout rtarch about instructions' set-flags behavior
- implement full-range 32-bit integer divide on ARMv7 (v1) as 64-bit fp-div
- add widening versions of integer multiply instructions to rtarch definitions
- add remainder wrappers for integer divide instructions to rtarch definitions
- add IEEE-compatible versions of fp div & sqr for ARMv7 and POWER targets
- add "residual correction" to non-IEEE fp div on ARMv7 and POWER targets
- fix "noisy walls" artifacts in CORE test 11 on ARMv7 and POWER targets
- add SIMD tests for fp-to-int round and int-div remainder, run level 18
v0.6.1: Backend improvements, unified quadric solver + runtime SIMD targets
- implement unified quadric solver in rendering backend
- switch to UniSIMD-assembler version 0.6 for additional SIMD targets
- turn off bbox sorting and removal opts for poor scalability
- use built-in scalers in solvers instead of transform matrix
- add new code for scalers/trnode handling in object hierarchy
- add new aliencube and frametable objects
- implement array's bounds highlight for debugging (primary rays only)
- implement scalable (elliptic) bvnodes for better space efficiency
- implement SIMD targets for AVX1, AVX2 and SSE1 in addition to SSE2
- implement SIMD target runtime selection based on CPUID (x86 only)
- switch SIMD target variant and width on key press (F7) and (F8)
- add demo scene 02, implement runtime selection on key press (F11)
- hide all numbers on the screen on key press (F12)
- add CORE test for array's update bounds logic and scalers, run level 16
- add SIMD test for shifts by runtime value & BASE register, run level 16
- reduce code duplication in platform files
- add material's entry points to backend structs (from here backported down)
- implement roots sorting for determinant around zero
- fix quadric roots overdraw on the edges and for mixed quads
- use direct diff in solvers for consistency with the rest of the backend
- propagate local point adjustments and TMASK to secondary rays
- check quadric roots overdraw within XMASK
- implement proper fix for conic singularity
- improve camera's reporting precision for debugging
- improve quadric's debug info formatting
- replace non-standard malloc.h with stdlib.h for malloc/free
- use custom new/delete for scene's internal classes
- allocate rtimag's file objects on the stack, clean up logging
- distinguish separate RT_DEBUG levels (0,1,2)
- enable multi-threading in debug mode, turn off inverse matrix check
v0.6.0: Backend improvements, full set of quadrics + numerical stability
- implement solvers for paracylinder, hypercylinder, hyperparaboloid
- rework bbox adjust to fully support new quadrics
- experiment with linear approximation (refinement) of roots (dropped)
- use Vieta's formulas for better numerical stability in solvers
- renormalize normals to avoid visual artifacts in edge cases
- speed up normals with reciprocal square root
- optimize overdraw of inner and outer roots for all quadrics
- configure runtime optimization flags per scene, adjust scn_test08
- add CORE test for new quadrics, run level 15
- add SIMD test for signed shift, run level 15
- use unfiltered hierarchical list to skip search while inserting element
- handle element removal in ssort, lsort, insert routines
- enable basic hidden-surfaces-removal in bbox_sort
- enable removal while building lists for surface object
- allow array nodes to remove other nodes, extend cases
- adjust spherical early-out to not prevent nodes removal
- enable array's contents removal by bbox for surface's and camera's lists
- make material types mandatory in their names (from here backported down)
- adjust CORE test to ignore isolated pixels, add pixhunt mode
- reduce rounding mode's scope from global to texturing only
- fix visual artifacts in specular highlights
- rename lighting and bvnode feature flags in rendering backend
- add feature flags for colored lights, ambient and diffuse to backend
- fix compilation on 64-bit Linux systems, add 64-bit types (for time args)
- increase divps, sqrps precision for ARM NEON (1 extra step, 2 in total)
- fix divxn in ARM to use signed integers (as the name suggests)
- add margins control to rtgeom, rename base routines, adjust usage in code
- add clip relations handling to rtgeom, new runtime optimization flags
- add future extensions to rtarch, update comments for existing targets
v0.5.9: Performance optimizations, hierarchical lists sorting
- add command line options to CORE and SIMD test frameworks
- add legends with description to all source and header files
- add bounding box sorting routine to rtgeom, add macros for VEC3
- make surface lists hierarchical, enable sorting to reduce overdraw
- redesign rtgeom API to work with nodes, drop dependency on object
- validate configuration values in the engine, fix stride
- adjust backend structs to support wider SIMD (8 and 16 tested)
- add image saving to rtimag, rename and rework routines
- internal engine's state logging (F1) and anti-aliasing (F2)
- cycling through camera list (F3) and screenshots (F4)
- move camera with W, S, A, D and rotate with arrow keys
- separate bounding boxes for trnode and bvnode of the same array
- support bvnode hierarchy of arbitrary depth, allow re-bounding
- redesign update logic in the engine for fine-grained control
- split surface's matrix update into multi-threaded phase
- add imaging mode to CORE test, adjust scn_test12 for better coverage
- fix trnode/bvnode handling in rendering backend, enable bvnodes for shadows
- move texture conversion to CORE test, fix undefined behaviour in scn_test13
- enable hierarchical traversal for surface and shadow lists
- add CORE test for complex hierarchy, run level 14
- add SIMD test for mask helper macro, run level 14
v0.5.8: Performance optimizations, additional CORE and SIMD tests
- add exception handling for worker threads, check for out-of-memory
- refine exception handling, per-frame memory estimates
- extended CORE test coverage, run level 10 to 12
- eliminate division in rtgeom routines to avoid corner cases
- fix bbox adjust optimization, add pixel equality macro to CORE test
- add border margin checks to more rtgeom routines, code clean up
- extended SIMD test coverage, run level 10 to 12
- CORE and SIMD test framework clean up, makefiles adjustments
- add cube root instruction to SIMD test framework
- extended SIMD test coverage, run level 13, move rtarch files to config
- extended CORE test coverage, run level 13, add credits for borrowed code
- adjust bbox faces to have counter-clockwise verts order for front side
v0.5.7: Performance optimizations, CORE test framework
- add runtime flags for respective optimizations
- adjust classes access levels according to actual use
- add new flags for bbox/cbox adjust optimization, multi-threading
- unify runtime optimization flags with conditional compilation
- add CORE test framework, run levels 1 and 2, scene data locking
- fix matrix splitting for scaling fastpath, add new flag
- CORE test framework, run levels 3 to 7
- fix missing condition for boundless surfaces
- CORE test framework, run levels 8 and 9
- fix condition for bbox shadow optimization, adjust run level 6
- handle corner cases properly in rtgeom routines, simplify logic
- rename rtload files to rtimag for image library, update build instructions
v0.5.6: Performance optimizations, bounding volume for array of objects
- calculate bounding sphere for array of objects
- add support for bounding volume arrays (bvnodes) to rendering backend
- add bvnode support to scene format, object classes
- insert bvnodes into surface, shadow lists
- add bvnode properties to scene data
- fix visual artifacts for Linux platform under heavy load (browser + flash)
v0.5.5: Performance optimizations, multi-phase threading and other refinements
- add support for multi-phase (with sync points) threaded update and render
- split part of hierarchical update into a separate multi-threaded phase
- refine bounding and clipping boxes based on custom clippers
- only update changing parts of the object hierarchy and surface data
- move inverse matrix calculation from hierarchical to multi-threaded phase
v0.5.4: Performance optimizations, custom per-side lists
- add generic surface coeffs to object classes
- add clipped surface side visibility routines for given point
- build separate light lists for each side of the surface
- add clipped surface side visibility routines for given bbox
- build separate shadow lists for each side of the surface
- build separate surface lists for each side of the surface
v0.5.3: Performance optimizations, custom shadow lists
- add bounding sphere to object classes
- build custom per-surface per-light shadow lists
- refine shadows optimization with bounding boxes
v0.5.2: Performance optimizations, transform caching and scaling fastpath
- add transform caching (trnodes) to object classes
- insert trnodes into clipper, surface and tile lists
- implement transform caching in rendering backend, optimize for clippers
- refactor transform code in rendering backend for scaling fastpath
- split axis mapping from scale-only transform matrix
- implement scaling fastpath in rendering backend
- add Node class to object classes, rebase Array and Surface
- use trnode's matrix for normals in rendering backend, simplify logic
- add more comments, refine state logging, label names, classes access levels
v0.5.1: Performance optimizations, tiled rendering
- adjust bounding and clipping boxes according to surface shape
- add bounding box geometry, update vertices according to transform
- add bounding box projection routines for tiled rendering
- add tilebuffer support to backend structures, scene manager
- implement tilebuffer in rendering backend, add support for tiles highlight
- clean up state logging macros, refine per-frame allocs estimates
v0.5.0: Performance optimizations, multi-threading support
- add multi-threading framework to scene manager, rendering backend
- add multi-threading implementation for Linux, Win32 platforms
- fix crashes in fullscreen mode for Linux platform, clean up FPS logging
- add support for temporary per-frame heap allocs
- add logging for scene state, support for log redirect
- clean up names for makefile flags, backend flags
v0.4.1: Geometry transform, custom clipping with accumulator
- add clipping accum to backend structures, scene format, object classes
- add sub-array indexing to scene format, object classes
- implement clipping accum in rendering backend
- fix texture load macro, add texture crate01, texture embedding flag
- add box object to scene data, material for crate01, red01, adjust ambient
- add scene namespaces to resolve name collisions
- add clipping accum with boxes, quadric shapes to scene data
v0.4.0: Geometry transform, free-angle rotation and axis scaling
- add transform matrix to backend structures, object classes
- add inverse matrix computation to geometry utils
- implement transform in rendering backend
- add transform to scene data, adjust formatting
v0.3.6: Lighting & effects, full screen anti-aliasing
- add anti-aliasing mode to backend structures, scene manager
- implement anti-aliasing in rendering backend
- add anti-aliasing selection to demo app, refactor key handling
- fix ray vertical positioning for non-aa mode
v0.3.5: Lighting & effects, compute attenuation
- add attenuation properties to backend structures, object classes
- implement attenuation in rendering backend
- add attenuation to scene data
v0.3.4: Lighting & effects, specular highlights for metals and non-metals
- add specular properties to backend structures, object classes
- implement specular highlights in rendering backend
- add specular highlights to scene data, adjust materials naming scheme
v0.3.3: Lighting & effects, compute refractions
- add refraction properties to backend structures, object classes
- implement refractions in rendering backend
- add refractions to scene data
v0.3.2: Lighting & effects, compute reflections
- add surface lists to scene manager
- add reflection properties to backend structures, object classes
- implement reflections in rendering backend
- add reflections to scene data
v0.3.1: Lighting & effects, hard shadows
- add shadows to scene manager
- implement shadows in rendering backend
v0.3.0: Lighting & effects, compute diffuse
- add lighting to backend structures, object classes
- implement lighting in rendering backend
- add lighting to scene data
v0.2.2: Quadric solvers, compute normals
- add normals computation to rendering backend
- add camera animator to scene data
v0.2.1: Quadric solvers, custom clipping by surface
- add custom clipping to backend structures, object classes
- implement custom clipping in rendering backend
- add custom clipping to scene data
v0.2.0: Quadric solvers, multiple of 90 degree rotation
- add quadric surfaces to backend structures, object classes
- implement quadric surfaces in rendering backend
- add cylinders and spheres to scene data
v0.1.2: Plane solver, basic texturing
- add basic texturing to backend structures, object classes
- implement basic texturing in rendering backend
- add simple tile texture to scene data
- implement BMP texture loader and texture conversion for embedding
v0.1.1: Plane solver, axis clipping
- add axis clipping to backend structures, object classes
- implement axis clipping in rendering backend
- adjust scene data for clipped plane to be visible
v0.1.0: Plane solver, first rays out!
- reorganize directory structure for the engine
- add backend structures, object classes, scene manager
- add first scene data, main app window and scene rendering
v0.0.5: Unified SIMD assembler, API freeze for the engine
- instruction naming scheme finalized
- change ARM instructions to set flags
- added framework for internal constants (used by reciprocals)
- fix div in ARM to use signed integers
- increase div, sqr precision for MPE
v0.0.4: SIMD test framework, macro assembler overhaul
- macro expansion reworked for better compiler compatibility
- immediate/displacement parameters handling redesigned
- added reciprocal support for SSE, MPE support refined
v0.0.3: SIMD test framework, run level 9
- tests for integer mul, div, jmp instructions
- SIMD tests for integer add, shl, shr instructions
- SIMD tests for cvt, sqr, rsq instructions
v0.0.2: SIMD test framework, run level 5
- SIMD tests for mul, div, cmp instructions
v0.0.1: SIMD test framework, run level 1
- SIMD tests for add, sub instructions
v0.0.0: Empty project
- initial file set and directory structure