Roadmap / Contributing #1

Open
61 of 76 tasks
Dolu1990 opened this issue Nov 14, 2023 · 12 comments

Comments

@Dolu1990
Member

Dolu1990 commented Nov 14, 2023

If you are interested in contributing to the project, please let me know ^^

Here are the current work-items in completion order

  • Plugin API
  • Pipeline API
  • Basic frontend
  • Decoder
  • Multi issue dispatcher
  • Execute
  • Integer ALU / Shift
  • Basic testbench
  • Writeback
  • Bypass
  • Hazard
  • Branch
  • Multi issue
  • RVLS integration
  • Konata traces
  • Regression framework
  • Load / Store
  • Basic CSR support
  • mul / div / rem
  • Passing riscv-test
  • BTB + RAS predictor
  • Passing riscv-arch-test
  • Passing embench / coremark / dhrystone in dual issue
  • GShare predictor
  • Late ALU support
  • Exception support
  • Interrupt support
  • RV32 + RV64 support
  • Privilege / CSR implemented
  • Run FreeRTOS tests
  • MMU
  • RVA
  • Linux / buildroot / opensbi
  • Multi issue fetch's aligner
  • RVC
  • I$
  • D$
  • Memory coherency / multi-core
  • SoC
  • Floating point
  • Software prefetcher (zicbop -fprefetch-loop-arrays)
  • Hardware prefetcher (l1 I$ D$)
  • PMP
  • CFU
  • SMT support

FMax :

  • F2I writeback is too much

Decoupled todo :

  • Small iterative shift plugin
  • Small iterative mul/div plugin
  • Bit manipulation extension
  • Crypto extension
  • Adding bridges from ibus dbus toward bus standards (Wishbone, Tilelink, AXI, AHB, Avalon, ...)
  • ...

Improvements

  • flush signal propagate from upstream to the fetch down into the aligner, instead of being only used late in the aligner ( very bad timings when RVC is used )
  • When the FPU is enabled, the DispatchPlugin can't handle the late RS uses (val skip), as the assumption of a short pipeline no longer holds. Needs a fix
  • Check that out of pipe / fpu do not write X0
  • Check FPU access to io region
  • maybe lsuL1 store should freeze cpu when refill is already using the bank write interface
  • LsuL1 area increases too much with way count
  • The RVC decompressor could be optimized
  • Another pipeline would be needed to support serializing multiple uop from one instruction
  • The current AlignerPlugin can easily support 48 bits / 64 bits instruction
  • prefetcher can access the mmu while the mmu is being refilled (dangerous ?)
  • Coherency bridge to tilelink without fifo
  • LsuPlugin fence missing; this is needed, especially now that there is a write buffer
  • LsuL1 last stage really needs to be stable and not sensitive to any concurrent task progress
  • LsuL1 write buffer
  • LsuL1 doesn't need bank/way arbitration when there is no prefetch / coherency / multi threading
  • LsuCacheless bus cmd persistence needs to be implemented
  • DispatchPlugin DONT_FLUSH_FROM_LANES is too pessimistic and reduces perf on lsuL1 (ex : branch -> lw)
  • RamSyncMwXor isn't good, as it uses async read; need to implement RamSyncMwMux for sync regfile instead
  • BranchPlugin used in late ALU timings could be improved by precalculating the target PC (when the target PC doesn't come from registers)
  • BranchPlugin used in late ALU could reuse some of the early BranchPlugin results.
  • DispatchPlugin could buffer some instructions (ex 1 for dual issue), allowing it to avoid half-full dispatch (36% on dhrystone)
  • Add memory regions and prevent access to them via trap
  • Avoid TrapPlugin trap request directly halting fetch, as it creates a long combinatorial path
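
The RamSyncMwXor item above does not describe how that memory works. As a rough illustration only, here is a behavioural Python model of the generic XOR-based multi-write-port scheme. It assumes RamSyncMwXor follows that general idea, which this thread does not confirm; the class and method names are hypothetical and this is not the project's RTL. The point of the sketch is that each write has to read the other banks first, which may be why an asynchronous read is convenient in such a design.

```python
# Behavioural model of a generic XOR-based multi-write-port memory.
# Assumption: textbook scheme for illustration, not VexiiRiscv's actual RTL.
from functools import reduce
from operator import xor


class XorMultiWriteRam:
    def __init__(self, depth, write_ports):
        # One bank per write port; every bank starts zeroed.
        self.banks = [[0] * depth for _ in range(write_ports)]

    def read(self, addr):
        # The logical value is the XOR of the word at `addr` in every bank.
        return reduce(xor, (bank[addr] for bank in self.banks), 0)

    def write_cycle(self, writes):
        """Apply one cycle of writes, given as {port: (addr, data)}.

        Ports are assumed to target distinct addresses within a cycle.
        """
        updates = []
        for port, (addr, data) in writes.items():
            # Each port first READS the other banks at its write address,
            # so a read sits on the write path.
            others = reduce(
                xor,
                (bank[addr] for i, bank in enumerate(self.banks) if i != port),
                0,
            )
            updates.append((port, addr, data ^ others))
        # Commit all ports together, as a clock edge would.
        for port, addr, encoded in updates:
            self.banks[port][addr] = encoded
```

A mux-based alternative would instead track which bank holds the latest value per address and select it on read, avoiding the read on the write path at the cost of extra bookkeeping.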

Redesign

  • Having the cache out of the pipeline would allow sharing it with probe requests to save area
@Dolu1990 Dolu1990 changed the title Roadmap Roadmap / Contributing Nov 21, 2023
@Dolu1990 Dolu1990 pinned this issue Nov 22, 2023
@andreasWallner
Collaborator

just to avoid duplicate work: iterative shift is on its way, just like iterative mul

@bitpasta

bitpasta commented Jan 1, 2024

just to let you know - i'm working on a radix2 divider here:
https://github.com/bitpasta/VexiiRiscv/tree/dev_divradix2

@Dolu1990
Member Author

lolololol

[Progress] Start VexiiRiscv test simulation with seed 2

OpenSBI v0.8
   ____                    _____ ____ _____
  / __ \                  / ____|  _ \_   _|
 | |  | |_ __   ___ _ __ | (___ | |_) || |
 | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
 | |__| | |_) |  __/ | | |____) | |_) || |_
  \____/| .__/ \___|_| |_|_____/|____/_____|
        | |
        |_|

Platform Name       : NaxRiscv
Platform Features   : timer,mfdeleg
Platform HART Count : 1
Boot HART ID        : 0
Boot HART ISA       : rv32imasu
BOOT HART Features  : scounteren,mcounteren
BOOT HART PMP Count : 0
Firmware Base       : 0x80000000
Firmware Size       : 64 KB
Runtime SBI Version : 0.2

MIDELEG : 0x00000222
MEDELEG : 0x0000b109
[    0.000000] Linux version 5.10.1 (rawrr@rawrr) (riscv32-buildroot-linux-gnu-gcc.br_real (Buildroot 2020.11-rc3-8-g9ef54b7d0b) 10.2.0, GNU ld (GNU Binutils) 2.34) #2 SMP Wed Jan 26 14:18:17 CET 2022
[    0.000000] earlycon: sbi0 at I/O port 0x0 (options '')
[    0.000000] printk: bootconsole [sbi0] enabled
[    0.000000] Initial ramdisk at: 0x(ptrval) (8388608 bytes)
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000080400000-0x000000008fffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000080400000-0x000000008fffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000080400000-0x000000008fffffff]
[    0.000000] SBI specification v0.2 detected
[    0.000000] SBI implementation ID=0x1 Version=0x8
[    0.000000] SBI v0.2 TIME extension detected
[    0.000000] SBI v0.2 IPI extension detected
[    0.000000] SBI v0.2 RFENCE extension detected
[    0.000000] SBI v0.2 HSM extension detected
[    0.000000] riscv: ISA extensions aim
[    0.000000] riscv: ELF capabilities aim
[    0.000000] percpu: Embedded 10 pages/cpu s18700 r0 d22260 u40960
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 64008
[    0.000000] Kernel command line: rootwait console=hvc0 earlycon=sbi root=/dev/ram0 init=/sbin/init
[    0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072 bytes, linear)
[    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536 bytes, linear)
[    0.000000] Sorting __ex_table...
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Memory: 241280K/258048K available (4717K kernel code, 553K rwdata, 632K rodata, 166K init, 213K bss, 16768K reserved, 0K cma-reserved)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] rcu: Hierarchical RCU implementation.
[    0.000000] rcu: 	RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=1.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[    0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
[    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[    0.000000] riscv-intc: 32 local interrupts mapped
[    0.000000] random: get_random_bytes called from start_kernel+0x35c/0x4dc with crng_init=0
[    0.000000] riscv_timer_init_dt: Registering clocksource cpuid [0] hartid [0]
[    0.000000] clocksource: riscv_clocksource: mask: 0xffffffffffffffff max_cycles: 0x171024e7e0, max_idle_ns: 440795205315 ns
[    0.000053] sched_clock: 64 bits at 100MHz, resolution 10ns, wraps every 4398046511100ns
[    0.001054] Console: colour dummy device 80x25
[    0.001405] printk: console [hvc0] enabled
[    0.001405] printk: console [hvc0] enabled
[    0.001949] printk: bootconsole [sbi0] disabled
[    0.001949] printk: bootconsole [sbi0] disabled
[    0.002617] Calibrating delay loop (skipped), value calculated using timer frequency.. 200.00 BogoMIPS (lpj=400000)
[    0.003334] pid_max: default: 32768 minimum: 301
[    0.004330] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.004931] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[    0.011492] rcu: Hierarchical SRCU implementation.
[    0.013344] smp: Bringing up secondary CPUs ...
[    0.013686] smp: Brought up 1 node, 1 CPU
[    0.015071] devtmpfs: initialized
[    0.018177] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[    0.018841] futex hash table entries: 256 (order: 2, 16384 bytes, linear)
[    0.020091] NET: Registered protocol family 16
[    0.045047] clocksource: Switched to clocksource riscv_clocksource
[    0.076845] NET: Registered protocol family 2
[    0.079953] tcp_listen_portaddr_hash hash table entries: 512 (order: 0, 6144 bytes, linear)
[    0.080635] TCP established hash table entries: 2048 (order: 1, 8192 bytes, linear)
[    0.081450] TCP bind hash table entries: 2048 (order: 2, 16384 bytes, linear)
[    0.082188] TCP: Hash tables configured (established 2048 bind 2048)
[    0.082859] UDP hash table entries: 256 (order: 1, 8192 bytes, linear)
[    0.083429] UDP-Lite hash table entries: 256 (order: 1, 8192 bytes, linear)
[    0.085938] Unpacking initramfs...
[    0.226491] Initramfs unpacking failed: invalid magic at start of compressed archive
[    0.257373] Freeing initrd memory: 8192K
[    0.259617] workingset: timestamp_bits=30 max_order=16 bucket_order=0
[    0.298254] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 254)
[    0.298755] io scheduler mq-deadline registered
[    0.299089] io scheduler kyber registered
[    0.496443] NET: Registered protocol family 10
[    0.499721] Segment Routing with IPv6
[    0.500313] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
[    0.503312] NET: Registered protocol family 17
[    0.505957] Freeing unused kernel memory: 164K
[    0.506284] Kernel memory protection not selected by kernel config.
[    0.506742] Run /init as init process
Starting syslogd: OK
Starting klogd: OK
Running sysctl: OK
Saving random seed: [    0.861828] random: dd: uninitialized urandom read (512 bytes read)
OK
Starting network: OK

Welcome to Buildroot
buildroot login: root
root
           _  _                     ___      _
    o O O | \| |   __ _    __ __   | _ \    (_)     ___     __     __ __
   o      | .` |  / _` |   \ \ /   |   /    | |    (_-<    / _|    \ V /
  TS__[O] |_|\_|  \__,_|   /_\_\   |_|_\   _|_|_   /__/_   \__|_   _\_/_
 {======|_|"""""|_|"""""|_|"""""|_|"""""|_|"""""|_|"""""|_|"""""|_|"""""|
./o--000'"`-0-0-'"`-0-0-'"`-0-0-'"`-0-0-'"`-0-0-'"`-0-0-'"`-0-0-'"`-0-0-'

root@buildroot:~# cat /proc/cpuinfo
cat /proc/cpuinfo
processor	: 0
hart		: 0
isa		: rv32ima
mmu		: sv32

root@buildroot:~# echo 1+2+3*4 | bc
echo 1+2+3*4 | bc
15
root@buildroot:~# micropython
micropython
MicroPython v1.13 on 2022-01-26; linux version
Use Ctrl-D to exit, Ctrl-E for paste mode
>>> import math

>>> math.sin(math.pi/4)

0.7071067811865475
>>> from sys import exit

>>> exit()

root@buildroot:~# ls /
ls /
bin      init     linuxrc  opt      run      tmp
dev      lib      media    proc     sbin     usr
etc      lib32    mnt      root     sys      var
root@buildroot:~# 

@bitpasta

Wow nice! Congratulations!

@djsftree

Congrats indeed!

@Dolu1990
Member Author

Dolu1990 commented Apr 8, 2024

Brawww

root@buildroot:~# cat /proc/cpuinfo 
processor	: 0
hart		: 0
isa		: rv32ima
mmu		: sv32

processor	: 1
hart		: 1
isa		: rv32ima
mmu		: sv32
#################
chocolate-doom -1 -timedemo demo1.lmp &
...
timed 5026 gametics in 2724 realtics (64.577827 fps)
#################
chocolate-doom -1 -timedemo demo1.lmp &
chocolate-doom -1 -timedemo demo1.lmp &
...
timed 5026 gametics in 2897 realtics (60.721436 fps)
timed 5026 gametics in 2918 realtics (60.284443 fps)

Brawwwwwww

For reference :

python3 -m litex_boards.targets.digilent_nexys_video --soc-json build/digilent_nexys_video/csr.json --cpu-type=vexiiriscv  --vexii-args="--allow-bypass-from=0 --debug-privileged --with-mul --with-div --div-ipc --with-rva --with-supervisor --performance-counters 0 --fetch-l1 --fetch-l1-ways=4 --lsu-l1 --lsu-l1-ways=4 --fetch-l1-mem-data-width-min=64 --lsu-l1-mem-data-width-min=64  --with-btb --with-ras --with-gshare --relaxed-branch --regfile-async --lsu-l1-refill-count 2 --lsu-l1-writeback-count 2 --with-lsu-bypass --decoders=2 --lanes=2 --lsu-l1-store-buffer-slots=4 --lsu-l1-store-buffer-ops=32" --cpu-count=2 --with-jtag-tap  --with-video-framebuffer --with-sdcard --with-ethernet --with-coherent-dma --l2-bytes=131072

With the chocolate doom patch from litex-hub/linux-on-litex-vexriscv#290, which avoids the X11 layer.
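
The one-line command above is hard to scan, so here is a small Python sketch that rebuilds it from grouped pieces. Every flag is copied from the command as posted; the group labels (`isa_and_debug`, `l1_caches`, and so on) are only my own reading aid, not project terminology. The script only prints the command and does not run it.

```python
import shlex

# VexiiRiscv flags, in the same order as the posted command.
# The dictionary keys are illustrative labels, not official names.
VEXII_ARGS = {
    "isa_and_debug": "--allow-bypass-from=0 --debug-privileged --with-mul --with-div "
                     "--div-ipc --with-rva --with-supervisor --performance-counters 0",
    "l1_caches": "--fetch-l1 --fetch-l1-ways=4 --lsu-l1 --lsu-l1-ways=4 "
                 "--fetch-l1-mem-data-width-min=64 --lsu-l1-mem-data-width-min=64",
    "branch_prediction": "--with-btb --with-ras --with-gshare --relaxed-branch",
    "regfile": "--regfile-async",
    "lsu_tuning": "--lsu-l1-refill-count 2 --lsu-l1-writeback-count 2 --with-lsu-bypass",
    "issue_width": "--decoders=2 --lanes=2",
    "store_buffer": "--lsu-l1-store-buffer-slots=4 --lsu-l1-store-buffer-ops=32",
}

vexii_args = " ".join(VEXII_ARGS.values())

# SoC-level options, also copied from the posted command.
cmd = [
    "python3", "-m", "litex_boards.targets.digilent_nexys_video",
    "--soc-json", "build/digilent_nexys_video/csr.json",
    "--cpu-type=vexiiriscv",
    f"--vexii-args={vexii_args}",
    "--cpu-count=2",
    "--with-jtag-tap",
    "--with-video-framebuffer",
    "--with-sdcard",
    "--with-ethernet",
    "--with-coherent-dma",
    "--l2-bytes=131072",
]

# shlex.join quotes the --vexii-args value so the shell keeps it as one argument.
print(shlex.join(cmd))
```

Changing `--cpu-count` or one of the grouped strings and re-printing gives a variant of the same build command without hand-editing the long line.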

@bitpasta

bitpasta commented Apr 9, 2024

Nice! :) How does that compare to Vex and Nax?

@Dolu1990
Member Author

Dolu1990 commented Apr 9, 2024

It compares quite well; also, not all options are turned on.

Vex   : timed 5026 gametics in 4866 realtics (36.150841 fps) ( no l2)
Vexii : timed 5026 gametics in 2724 realtics (64.577827 fps) (128KB-l2)
Nax   : timed 5026 gametics in 2375 realtics (74.067368 fps) (128KB-l2)

I just tried quad core and octo core, and they work as well.
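
For anyone comparing these numbers, the reported fps values are consistent with fps = 35 × gametics / realtics, 35 being Doom's tic rate. The short Python sketch below recomputes them from the realtics posted above and adds the ratio against Vex. The function name and the speedup column are my own additions for illustration.

```python
TICRATE = 35  # Doom game tics per second (assumed constant behind the fps figures)


def timedemo_fps(gametics, realtics):
    """Recompute the fps figure printed by -timedemo."""
    return TICRATE * gametics / realtics


GAMETICS = 5026
# realtics taken from the results posted above
realtics_by_cpu = {"Vex": 4866, "Vexii": 2724, "Nax": 2375}

for name, realtics in realtics_by_cpu.items():
    fps = timedemo_fps(GAMETICS, realtics)
    speedup = realtics_by_cpu["Vex"] / realtics
    print(f"{name:6s}: {fps:.6f} fps, {speedup:.2f}x vs Vex")
```

Note that the Vex run had no L2 while the other two used a 128KB L2, so the ratio is not a like-for-like core comparison.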

@Dolu1990
Member Author

root@nexys:~# cat /etc/*-release
PRETTY_NAME="Debian GNU/Linux trixie/sid"
NAME="Debian GNU/Linux"
VERSION_CODENAME=trixie
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
root@nexys:~# cat /proc/cpuinfo 
processor	: 0
hart		: 0
isa		: rv64imafdc
mmu		: sv39
mvendorid	: 0x0
marchid		: 0x5
mimpid		: 0x0

processor	: 1
hart		: 1
isa		: rv64imafdc
mmu		: sv39
mvendorid	: 0x0
marchid		: 0x5
mimpid		: 0x0

root@nexys:~# 

@Dolu1990
Member Author

Dolu1990 commented May 3, 2024

root@nexys:~# neofetch 
       _,met$$$$$gg.          root@nexys 
    ,g$$$$$$$$$$$$$$$P.       ---------- 
  ,g$$P"     """Y$$.".        OS: Debian GNU/Linux trixie/sid riscv64 
 ,$$P'              `$$$.     Kernel: 6.1.0-rc2+ 
',$$P       ,ggs.     `$$b:   Uptime: 17 hours, 47 mins 
`d$$'     ,$P"'   .    $$$    Packages: 1698 (dpkg) 
 $$P      d$'     ,    $$P    Shell: bash 5.2.15 
 $$:      $$.   -    ,d$$'    Resolution: 800x600 
 $$;      Y$b._   _,d$P'      WM: wmaker 
 Y$$.    `.`"Y$$$$P"'         Theme: Adwaita [GTK3] 
 `$$b      "-.__              Icons: Adwaita [GTK3] 
  `Y$$                        Terminal: /dev/pts/2 
   `Y$$.                      CPU: (2) 
     `$$b.                    Memory: 97MiB / 472MiB 
       `Y$$b.
          `"Y$b._                                     
              `"""                                    

@jahagirdar

Is CFU on the roadmap?

@Dolu1990
Member Author

Dolu1990 commented Sep 6, 2024

@jahagirdar Hi,

Yes it is, a few people need it :)
Also, it shouldn't be a difficult one to add.


5 participants