Skip to content

newlib: first malloc call may fail on Xtensa depending on image size #38258

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Closed
stephanosio opened this issue Sep 2, 2021 · 20 comments · Fixed by #38733
Closed

newlib: first malloc call may fail on Xtensa depending on image size #38258

stephanosio opened this issue Sep 2, 2021 · 20 comments · Fixed by #38733
Assignees
Labels
area: newlib Newlib C Standard Library area: Xtensa Xtensa Architecture bug The issue is a bug, or the PR is fixing a bug priority: high High impact/importance bug
Milestone

Comments

@stephanosio
Copy link
Member

stephanosio commented Sep 2, 2021

Describe the bug
The first malloc call may fail on the Xtensa platforms (e.g. qemu_xtensa) if the requested block size is such that the malloc function invokes the sbrk function with the heap expansion size that causes the resulting heap address to fall on the 0x1000 boundary (what newlib thinks is "page boundary").

For example, if the initial heap address is 0x60009fd0 and malloc is invoked to allocate a 16-byte block, the malloc attempts to increase the heap size by 48 by calling sbrk, and this causes sbrk to return 0x6000a000, which is a 0x1000-aligned address.

To Reproduce

  1. Check out stephanosio@296cecd (it is critical that you check this out as is and do not cherry-pick, since doing so may change the image size).
  2. Run samples/hello_world on qemu_xtensa and observe the first malloc fail.

Expected behavior
malloc always succeeds.

Logs and console output

*** Booting Zephyr OS build zephyr-v2.6.0-2927-ga77e7566a8bb  ***
_sbrk: 0x60004f60, 160
_sbrk: 0x60005000, 0
000: (nil)
_sbrk: 0x60005000, 4096
001: 0x60004f70
002: 0x60005000
003: 0x60005090
004: 0x60005120
005: 0x600051b0
006: 0x60005240
007: 0x600052d0
008: 0x60005360
009: 0x600053f0
010: 0x60005480
011: 0x60005510
012: 0x600055a0
013: 0x60005630
014: 0x600056c0
015: 0x60005750
016: 0x600057e0
017: 0x60005870
018: 0x60005900
019: 0x60005990
020: 0x60005a20
021: 0x60005ab0
022: 0x60005b40
023: 0x60005bd0
024: 0x60005c60
025: 0x60005cf0
026: 0x60005d80
027: 0x60005e10
028: 0x60005ea0
029: 0x60005f30
_sbrk: 0x60006000, 4096
030: 0x60005fc0
031: 0x60006050
032: 0x600060e0
033: 0x60006170

Environment (please complete the following information):

Additional context
This only happens on Xtensa -- I have reproduced a similar case on other archs, but they all work as expected:

Xtensa (qemu_xtensa)

*** Booting Zephyr OS build zephyr-v2.6.0-2927-ga77e7566a8bb  ***
sbrk: 0x60009fd0, 48
sbrk: 0x6000a000, 0
0, (nil)
sbrk: 0x6000a000, 4096
1, 0x60009fe0
2, 0x6000a000
3, 0x6000a020
...

ARM (qemu_cortex_m3)

*** Booting Zephyr OS build zephyr-v2.6.0-2927-ga77e7566a8bb  ***
sbrk: 0x20001fd0, 48
sbrk: 0x20002000, 0
0, 0x20001fd8
sbrk: 0x20002000, 4096
1, 0x20001ff8
2, 0x20002018
3, 0x20002038
...

SPARC (qemu_leon3)

*** Booting Zephyr OS build zephyr-v2.6.0-2927-ga77e7566a8bb  ***
sbrk: 0x4000efd0, 48
sbrk: 0x4000f000, 0
0, 0x4000efd8
sbrk: 0x4000f000, 4096
1, 0x4000eff8
2, 0x4000f018
3, 0x4000f038
...

This is likely a newlib bug.

What is special about Xtensa, compared to other archs, is that its MALLOC_ALIGNMENT is 16 rather than 8, and this is causing the newlib malloc algorithm to produce a mishap not observed on the archs with the MALLOC_ALIGNMENT of 8.
https://github.com/zephyrproject-rtos/newlib-cygwin/blob/4f5997d3c0f9011135e9627dad700c9d64be4a4b/newlib/libc/stdlib/mallocr.c#L1399-L1401
https://github.com/zephyrproject-rtos/newlib-cygwin/blob/b1fe4401fdbd0860e0b91227219b15d2e0142b78/newlib/libc/include/sys/config.h#L198

@stephanosio stephanosio added bug The issue is a bug, or the PR is fixing a bug area: newlib Newlib C Standard Library labels Sep 2, 2021
@stephanosio
Copy link
Member Author

NOTE: This issue exists to keep track of this rather serious bug with the newlib on Xtensa. It needs to be fixed in the Zephyr SDK, and also in the upstream newlib if it has not already been patched in the latest releases.

stephanosio added a commit to stephanosio/zephyr that referenced this issue Sep 2, 2021
For the Xtensa platforms (e.g. qemu_xtensa), the first `malloc` call
may fail if the newlib heap base address is such that the first `sbrk`
call returns a 4096-byte aligned address.

Here we add a workaround for Xtensa that allocates and immediately
frees a 16-byte memory block during initialisation so that all
subsequent `malloc` calls succeed.

This commit needs to be reverted once the issue zephyrproject-rtos#38258 is fixed.

Signed-off-by: Stephanos Ioannidis <root@stephanos.io>
@stephanosio stephanosio added area: Xtensa Xtensa Architecture priority: low Low impact/importance bug labels Sep 2, 2021
stephanosio added a commit to stephanosio/zephyr that referenced this issue Sep 2, 2021
This commit adds test code to reproduce the Xtensa malloc failure issue
reported in zephyrproject-rtos#38258.

Run the `hello_world` sample in `qemu_xtensa` to see the first malloc
fail as described in the aforementioned issue.

Tested with Zephyr SDK 0.13.0, which uses the newlib 3.3.0.
galak pushed a commit that referenced this issue Sep 2, 2021
For the Xtensa platforms (e.g. qemu_xtensa), the first `malloc` call
may fail if the newlib heap base address is such that the first `sbrk`
call returns a 4096-byte aligned address.

Here we add a workaround for Xtensa that allocates and immediately
frees a 16-byte memory block during initialisation so that all
subsequent `malloc` calls succeed.

This commit needs to be reverted once the issue #38258 is fixed.

Signed-off-by: Stephanos Ioannidis <root@stephanos.io>
@galak galak added priority: high High impact/importance bug and removed priority: low Low impact/importance bug labels Sep 2, 2021
@galak
Copy link
Collaborator

galak commented Sep 2, 2021

@sylvioalves do you know if you guys newlib version has this issue or a fix for it?

@stephanosio
Copy link
Member Author

cc @dcpleung @andyross

@stephanosio
Copy link
Member Author

zephyrproject-rtos/newlib-cygwin@b67ccc2#diff-f056d0212a247fc9bf5137a683fa14d9f1f80f23f76eedb68cf81710207fdb27R198

To all Xtensa maintainers: how important is it that we enforce MALLOC_ALIGNMENT = 16? Can't we live with MALLOC_ALIGNMENT = 8 as do the rest of the archs we have?

My hunch is that this issue will go away if we just remove that line. Of course, the newlib bug persists, but at least Xtensa should no longer be affected by it.

@stephanosio
Copy link
Member Author

Also cc @jcmvbkbc, if you are around.

@jcmvbkbc
Copy link
Contributor

jcmvbkbc commented Sep 2, 2021

how important is it that we enforce MALLOC_ALIGNMENT = 16?

I believe that 16 is not required for the base xtensa architecture with or without standard FP coprocessors. But 16 or even higher alignment may be required in the presence of TIE extensions, so it should really not be hardcoded but taken from the xtensa core configuration.

@stephanosio
Copy link
Member Author

I believe that 16 is not required for the base xtensa architecture with or without standard FP coprocessors. But 16 or even higher alignment may be required in the presence of TIE extensions, so it should really not be hardcoded but taken from the xtensa core configuration.

@jcmvbkbc Thanks for clarifying.

@stephanosio
Copy link
Member Author

zephyrproject-rtos/newlib-cygwin@b67ccc2#diff-f056d0212a247fc9bf5137a683fa14d9f1f80f23f76eedb68cf81710207fdb27R198
I locally built and tested newlib with the above line removed so that MALLOC_ALIGNMENT defaults to 8 instead of 16. As I thought, this issue is no longer present and malloc works.

On qemu_xtensa with patched newlib:

*** Booting Zephyr OS build zephyr-v2.6.0-2928-g296cecd30943  ***
_sbrk: 0x60004fd0, 48
_sbrk: 0x60005000, 0
000: 0x60004fd8
_sbrk: 0x60005000, 4096
001: 0x60004ff8
002: 0x60005018
003: 0x60005038
004: 0x60005058
...

I will post a Zephyr SDK patch for this.

stephanosio added a commit to zephyrproject-rtos/newlib-cygwin that referenced this issue Sep 6, 2021
The current malloc implementation has a bug in which the first malloc
call may fail if MALLOC_ALIGNMENT is 16 and the requested block size is
such that malloc invokes sbrk with the heap expansion size that causes
the resulting heap address to fall on a page boundary.

sys/config.h was defining the minimum MALLOC_ALIGNMENT for the Xtensa
architecture as 16, and this caused the first malloc call to fail on
the Xtensa platforms in some rare cases that satisfy the conditions
described above (see zephyrproject-rtos/zephyr#38258).

Since the Xtensa architecture in itself does not require the malloc
alignment of 16, remove MALLOC_ALIGNMENT definition and use the default
alignment of 8 for all Xtensa platforms.

Signed-off-by: Stephanos Ioannidis <root@stephanos.io>
stephanosio added a commit to stephanosio/zephyr-sdk-ng that referenced this issue Sep 6, 2021
Pull in the newlib workaround for the Xtensa malloc failure issue
reported in zephyrproject-rtos/zephyr#38258.

Signed-off-by: Stephanos Ioannidis <root@stephanos.io>
stephanosio added a commit to stephanosio/zephyr-sdk-ng that referenced this issue Sep 6, 2021
Pull in the newlib workaround for the Xtensa malloc failure issue
reported in zephyrproject-rtos/zephyr#38258.

Signed-off-by: Stephanos Ioannidis <root@stephanos.io>
@stephanosio
Copy link
Member Author

zephyrproject-rtos/sdk-ng#394 will fix this issue.

@sylvioalves
Copy link
Collaborator

@sylvioalves do you know if you guys newlib version has this issue or a fix for it?

I'll check and get back soon.

galak pushed a commit to zephyrproject-rtos/sdk-ng that referenced this issue Sep 10, 2021
Pull in the newlib workaround for the Xtensa malloc failure issue
reported in zephyrproject-rtos/zephyr#38258.

Signed-off-by: Stephanos Ioannidis <root@stephanos.io>
@stephanosio
Copy link
Member Author

zephyrproject-rtos/sdk-ng#394 has been merged.

Waiting for the Zephyr SDK 0.13.1 to be released and mainlined in the CI.

@gerekon
Copy link

gerekon commented Sep 13, 2021

@stephanosio

or does ESP32-S3's "TIE" not support unaligned accesses at all?

Not sure about this. Need to check.

@cfriedt cfriedt added priority: medium Medium impact/importance bug and removed priority: high High impact/importance bug labels Sep 14, 2021
@cfriedt
Copy link
Member

cfriedt commented Sep 14, 2021

#38261 is a workaround

@cfriedt cfriedt added priority: low Low impact/importance bug and removed priority: medium Medium impact/importance bug labels Sep 14, 2021
@cfriedt
Copy link
Member

cfriedt commented Sep 14, 2021

@iuliana-prodan @dbaluta

@cfriedt cfriedt added this to the v3.0.0 milestone Sep 14, 2021
@stephanosio
Copy link
Member Author

stephanosio commented Sep 14, 2021

As per the discussion in the bug triage meeting today:

We will be keeping this issue open until the new SDK is released so that we can keep track of this problem, and also that we do not forget to remove the workaround added in #38261.

SDK 0.13.1 will not be released for and used by the Zephyr 2.7.0 release because it contains other unrelated fixes that can potentially break other things and we cannot afford that at this stage.

@stephanosio stephanosio modified the milestones: v3.0.0, v2.7.0 Sep 20, 2021
@stephanosio
Copy link
Member Author

As per the Toolchain WG meeting today:

SDK 0.13.1 will be released for and used by the Zephyr 2.7.0 release since it contains a critical ARM-related security fix.

stephanosio added a commit to stephanosio/zephyr that referenced this issue Sep 22, 2021
The commit 9bd1483 was added as a
workaround for the Xtensa initial malloc failure bug.

This bug has been fixed in the Zephyr SDK 0.13.1 release and therefore
this workaround is no longer needed.

Signed-off-by: Stephanos Ioannidis <root@stephanos.io>
stephanosio added a commit to stephanosio/zephyr that referenced this issue Sep 22, 2021
This commit sets the minimum required Zephyr SDK version to 0.13.1,
which fixes the following critical issues:

1. Xtensa initial malloc failure (GitHub issue zephyrproject-rtos#38258)
2. ARMv8-M security extension vulnerability (CVE-2021-35465)

Signed-off-by: Stephanos Ioannidis <root@stephanos.io>
@galak galak added priority: high High impact/importance bug has-pr and removed priority: low Low impact/importance bug labels Sep 22, 2021
@galak
Copy link
Collaborator

galak commented Sep 22, 2021

Bumping to high as we released SDK 0.13.1 with fix, and we should make sure the PR to revert workaround gets merged.

@cfriedt
Copy link
Member

cfriedt commented Sep 23, 2021

This will be merged on Friday AM EST, just so that the message on the devel mailing list has time to be circulated, and so that there are enough cycles for CI prior to cutting rc3.

cfriedt pushed a commit that referenced this issue Sep 24, 2021
The commit 9bd1483 was added as a
workaround for the Xtensa initial malloc failure bug.

This bug has been fixed in the Zephyr SDK 0.13.1 release and therefore
this workaround is no longer needed.

Signed-off-by: Stephanos Ioannidis <root@stephanos.io>
cfriedt pushed a commit that referenced this issue Sep 24, 2021
This commit sets the minimum required Zephyr SDK version to 0.13.1,
which fixes the following critical issues:

1. Xtensa initial malloc failure (GitHub issue #38258)
2. ARMv8-M security extension vulnerability (CVE-2021-35465)

Signed-off-by: Stephanos Ioannidis <root@stephanos.io>
Rushybrook pushed a commit to Rushybrook/zephyr that referenced this issue Oct 21, 2021
The commit 9bd1483 was added as a
workaround for the Xtensa initial malloc failure bug.

This bug has been fixed in the Zephyr SDK 0.13.1 release and therefore
this workaround is no longer needed.

Signed-off-by: Stephanos Ioannidis <root@stephanos.io>
Rushybrook pushed a commit to Rushybrook/zephyr that referenced this issue Oct 21, 2021
This commit sets the minimum required Zephyr SDK version to 0.13.1,
which fixes the following critical issues:

1. Xtensa initial malloc failure (GitHub issue zephyrproject-rtos#38258)
2. ARMv8-M security extension vulnerability (CVE-2021-35465)

Signed-off-by: Stephanos Ioannidis <root@stephanos.io>
antmak pushed a commit to espressif/newlib-esp32 that referenced this issue Mar 31, 2022
The current malloc implementation has a bug in which the first malloc
call may fail if MALLOC_ALIGNMENT is 16 and the requested block size is
such that malloc invokes sbrk with the heap expansion size that causes
the resulting heap address to fall on a page boundary.

sys/config.h was defining the minimum MALLOC_ALIGNMENT for the Xtensa
architecture as 16, and this caused the first malloc call to fail on
the Xtensa platforms in some rare cases that satisfy the conditions
described above (see zephyrproject-rtos/zephyr#38258).

Since the Xtensa architecture in itself does not require the malloc
alignment of 16, remove MALLOC_ALIGNMENT definition and use the default
alignment of 8 for all Xtensa platforms.

Signed-off-by: Stephanos Ioannidis <root@stephanos.io>
antmak pushed a commit to espressif/newlib-esp32 that referenced this issue Apr 8, 2022
The current malloc implementation has a bug in which the first malloc
call may fail if MALLOC_ALIGNMENT is 16 and the requested block size is
such that malloc invokes sbrk with the heap expansion size that causes
the resulting heap address to fall on a page boundary.

sys/config.h was defining the minimum MALLOC_ALIGNMENT for the Xtensa
architecture as 16, and this caused the first malloc call to fail on
the Xtensa platforms in some rare cases that satisfy the conditions
described above (see zephyrproject-rtos/zephyr#38258).

Since the Xtensa architecture in itself does not require the malloc
alignment of 16, remove MALLOC_ALIGNMENT definition and use the default
alignment of 8 for all Xtensa platforms.

Signed-off-by: Stephanos Ioannidis <root@stephanos.io>
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
area: newlib Newlib C Standard Library area: Xtensa Xtensa Architecture bug The issue is a bug, or the PR is fixing a bug priority: high High impact/importance bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants