Newlib retargetable lock init fails on qemu_xtensa #38234

Closed
stephanosio opened this issue Sep 2, 2021 · 4 comments · Fixed by #38261
Assignees: stephanosio
Labels: area: newlib, area: Xtensa, bug, priority: high, Regression
Milestone: v2.7.0

Comments

@stephanosio
Member

stephanosio commented Sep 2, 2021

Describe the bug
__retarget_lock_init fails on the qemu_xtensa board.

To Reproduce
Build tests/lib/newlib/thread_safety and run it on qemu_xtensa.

Expected behavior
The malloc call in __retarget_lock_init succeeds.

Impact
Any component that uses newlib is unusable on Xtensa.

Logs and console output

*** Booting Zephyr OS build zephyr-v2.6.0-2927-ga77e7566a8bb  ***
Running test suite newlib_thread_safety_locks
===================================================================
START - test_retargetable_lock_sem
ASSERTION FAIL [*lock != ((void *)0)] @ WEST_TOPDIR/zephyr/lib/libc/newlib/libc-hooks.c:362
        non-recursive lock allocation failed
@ WEST_TOPDIR/zephyr/lib/os/assert.c:45
E: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0
E: Current thread: 0x60008db0 (test_retargetable_lock_sem)
E: Halting system

Environment (please complete the following information):

  • OS: Ubuntu 20.04
  • Toolchain: Zephyr SDK 0.13.0
  • Commit SHA: a77e756

Additional context
For some unknown reason, a77e756 causes newlib's malloc to fail and return NULL on Xtensa.

After a full twister run of tests/lib/newlib/thread_safety, only qemu_xtensa fails; all other platforms pass.

stephanosio added the bug label on Sep 2, 2021
stephanosio self-assigned this on Sep 2, 2021
stephanosio added this to the v2.7.0 milestone on Sep 2, 2021
stephanosio added the priority: high, area: newlib, area: Xtensa, and Regression labels on Sep 2, 2021
@stephanosio
Member Author

cc @dcpleung

@stephanosio
Member Author

stephanosio commented Sep 2, 2021

This is very bizarre. The failure is observed only when the size of sram0_seg is 36816 bytes (more generally, 36816 + 4096*n bytes). See #38234 (comment)

sram0_seg:       36816 B        64 MB      0.05%

This goes away when the size of sram0_seg is below or above 36816 bytes ...

@dcpleung any ideas?

@galak
Collaborator

galak commented Sep 2, 2021

wonder if this has something to do with bss size change?

@stephanosio
Member Author

So I did some investigation, and it happens only for the first malloc call, when its first sbrk call returns a pointer that falls on a 0x1000 page boundary (at least, what newlib's malloc thinks is a page boundary -- yes, we should fix these).

*** Booting Zephyr OS build zephyr-v2.6.0-2927-ga77e7566a8bb  ***
sbrk: 0x60009fd0, 48
sbrk: 0x6000a000, 0
0, (nil)
sbrk: 0x6000a000, 4096
1, 0x60009fe0
2, 0x6000a000
3, 0x6000a020
...

This only happens on Xtensa -- I have reproduced a similar case on other archs, but they all work as expected:

ARM (qemu_cortex_m3)

*** Booting Zephyr OS build zephyr-v2.6.0-2927-ga77e7566a8bb  ***
sbrk: 0x20001fd0, 48
sbrk: 0x20002000, 0
0, 0x20001fd8
sbrk: 0x20002000, 4096
1, 0x20001ff8
2, 0x20002018
3, 0x20002038
...

SPARC (qemu_leon3)

*** Booting Zephyr OS build zephyr-v2.6.0-2927-ga77e7566a8bb  ***
sbrk: 0x4000efd0, 48
sbrk: 0x4000f000, 0
0, 0x4000efd8
sbrk: 0x4000f000, 4096
1, 0x4000eff8
2, 0x4000f018
3, 0x4000f038
...

It looks like a newlib bug. What is special about Xtensa, compared to other archs, is that its MALLOC_ALIGNMENT is 16 rather than 8:
https://github.com/zephyrproject-rtos/newlib-cygwin/blob/4f5997d3c0f9011135e9627dad700c9d64be4a4b/newlib/libc/stdlib/mallocr.c#L1399-L1401
https://github.com/zephyrproject-rtos/newlib-cygwin/blob/b1fe4401fdbd0860e0b91227219b15d2e0142b78/newlib/libc/include/sys/config.h#L198
