Eagle Stream FSP (0115.D.05) #115

Open: wants to merge 1 commit into master


@niruiyu niruiyu commented Feb 11, 2025

Preserve the bootloader's paging state.

Signed-off-by: Ray Ni <ray.ni@intel.com>
@65a

65a commented Feb 12, 2025

Tried this with coreboot HEAD and the FSP header hookup patch. On a previously working board, I got a General Protection fault in Silicon Init:

!!!! IA32 Exception Type - 0D(#GP - General Protection)  CPU Apic ID - 00000001 !!!!
ExceptionData - 00000000
EIP  - 76799950, CS  - 00000010, EFLAGS - 00010012
EAX  - 00000006, ECX - 00000123, EDX - 00000006, EBX - 7678A4A8
ESP  - 767CBF24, EBP - 767CBF40, ESI - 00000006, EDI - 7678E00C
DS   - 00000018, ES  - 00000018, FS  - 00000018, GS  - 00000018, SS - 00000018
CR0  - 80000013, CR2 - 00000000, CR3 - 76FF9000, CR4 - 00000660
DR0  - 00000000, DR1 - 00000000, DR2 - 00000000, DR3 - 00000000
DR6  - FFFF0FF0, DR7 - 00000400
GDTR - 6353DEF0 0000083F, IDTR - 76FC90A0 0000009F
LDTR - 00000000, TR - 00000000
FXSAVE_STATE - 767CBC60
!!!! Find image based on IP(0x76799950) k:\intel\Build\EagleStreamFspPkg\RELEASE_VS2015x86\IA32\UefiCpuPkg\CpuFeatures\CpuFeaturesPei\DEBUG\CpuFeaturesPei.pdb (ImageBase=0000000076795180, EntryPoint=0000000076795483) !!!!
!!!! IA32 Exception Type - 0D(#GP - General Protection)  CPU Apic ID - 0000004A !!!!

Going to try setting nothing in the FspsUpd, but I don't think I was setting much (TME never worked, no SGX). CPU is Xeon Max 9480.

@pp3345

pp3345 commented Feb 12, 2025

We are seeing the very same issue as @65a on one of our Eagle Stream boards.

@shuoliu0

@pp3345 @65a Could you please try the coreboot change below?
https://review.coreboot.org/c/coreboot/+/80360

@65a

65a commented Feb 14, 2025

@shuoliu0 Thank you for your hard work on this!

EDIT: unrelated debugging information removed

@65a

65a commented Feb 14, 2025

@shuoliu0 Thanks again for your time!

EDIT: Removed mistaken build results with old FSP and new headers.

@shuoliu0

@shuoliu0 The error above is with the patch applied. I am using vboot, if it matters. If I do not set NO_FSP_TEMP_RAM_EXIT, I get a different crash (null pointer dereference). I can double-check again with both this patch and yours applied, and confirm that I do not have any stale files. Thank you for your hard work on this!

Hi @65a, in our default config NO_FSP_TEMP_RAM_EXIT is not set, so it should be fine to leave it unset.

@shuoliu0

@shuoliu0 Thanks for having me check again. Either a local workaround for LAPIC configuration or a patch-ordering issue was in play; the boot was good. After reverting some local patches, I am somehow getting the wrong DMI type 17 table again. Will investigate further, but this works here. Thanks again for your time!

Glad to know!

@shuoliu0 shuoliu0 left a comment

Thanks @niruiyu, LGTM. Tested with https://review.coreboot.org/c/coreboot/+/80360; boots to LinuxBoot.

@65a

65a commented Feb 14, 2025

I didn't realize that the make script would re-initialize 3rdparty/fsp after I deleted its .git, so that build may have used the old FSP with the new headers. Trying again now, still without the local x2APIC patch; hoping it still works, but the problem may still exist.

@65a

65a commented Feb 14, 2025

Apologies for my mistake: the problem is indeed still present, and the same fault is repeated for each CPU thread. Maybe the PDB provides a clue, if you have access to it? Or the contents of CR0?
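For reference, the CR0 and CR4 values in the #GP dump from the first report can be decoded bit by bit. The sketch below is a minimal Python helper, assuming the standard IA-32 control-register bit layout from the Intel SDM (Vol. 3A); it only names the bits listed in its tables, and decoding the registers does not by itself identify the cause of the fault.

```python
# Bit positions follow the IA-32 control-register layout (Intel SDM Vol. 3A).
CR0_BITS = {
    0: "PE",   # protection enable
    1: "MP",   # monitor coprocessor
    2: "EM",   # x87 emulation
    3: "TS",   # task switched
    4: "ET",   # extension type
    5: "NE",   # numeric error
    16: "WP",  # write protect
    18: "AM",  # alignment mask
    29: "NW",  # not write-through
    30: "CD",  # cache disable
    31: "PG",  # paging enable
}

CR4_BITS = {
    0: "VME", 1: "PVI", 2: "TSD", 3: "DE", 4: "PSE", 5: "PAE",
    6: "MCE", 7: "PGE", 8: "PCE", 9: "OSFXSR", 10: "OSXMMEXCPT",
    11: "UMIP", 12: "LA57", 13: "VMXE", 14: "SMXE", 16: "FSGSBASE",
    17: "PCIDE", 18: "OSXSAVE", 20: "SMEP", 21: "SMAP",
}


def decode(value, table):
    """Return the names of the bits set in `value` that appear in `table`, lowest bit first."""
    return [name for bit, name in sorted(table.items()) if value & (1 << bit)]


if __name__ == "__main__":
    # Register values copied from the #GP dump in the first report.
    print("CR0:", decode(0x80000013, CR0_BITS))
    print("CR4:", decode(0x00000660, CR4_BITS))
```

With the dumped values, CR0 decodes to PE, MP, ET and PG, and CR4 to PAE, MCE, OSFXSR and OSXMMEXCPT, so paging with PAE was enabled when the fault was taken.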

@65a

65a commented Feb 14, 2025

As requested, here is the log without NO_FSP_TEMP_RAM_EXIT; it gets even less far and dies on entry to postcar:

[DEBUG]  Processing 592 relocs. Offset value of 0x616e7000
[INFO ]  Timestamp - end of romstage: 452919996229
[DEBUG]  BS: romstage times (exec / console): total (unknown) / 11727 ms
[ERROR]  Null dereference at eip: 0x636e7350

@shuoliu0

Our test configs are as follows:

coreboot baseline - commit c52ffcede36658e7efd2b8d01d92e7e2eaaa12e7 + https://review.coreboot.org/c/coreboot/+/80360
3rdparty/fsp baseline - commit 15c0f7b + #115
3rdparty/intel-microcode baseline - commit 8ac9378a84879e81c503e09f344560b3dd7f72df

Could you share yours?

P.S. I need to check whether the git submodules downloaded by default satisfy the requirements above; if not, extra fixes are needed.

@shuoliu0

I checked, and coreboot's 3rdparty submodules are already up to date: https://github.com/coreboot/coreboot/tree/main/3rdparty. Could you please check whether your environment matches?

@65a

65a commented Feb 14, 2025

My microcode probably doesn't match; it's a blob from 2022. Can you confirm your 3rdparty/fsp is at 15c0f7b3f723bcd713e5ab11ebc502f30d9084e7? That commit doesn't appear to contain the changes in this PR. It is probably possible to set https://github.com/niruiyu/FSP/tree/spr_fsp as upstream, but I didn't try it; I just copied the files and committed them locally, first in FSP and then in coreboot, to ensure the submodule didn't get reverted (which is how I got burned earlier when I tried to build fresh). I can try the microcode update next if you think that's likely to help. Please confirm whether the sha256 of the Fsp.fd you are testing is 90a105d2ecc801f09f63d21a54a5550e530446694571976137eae2466d355e97 or 776100021418dba4fb62446e7c420ce2bd161e849fc6d19c9f72b80a6b479e77.

@65a

65a commented Feb 14, 2025

To be more clear about Fsp.fd:
New upstream (This PR) sha256: 776100021418dba4fb62446e7c420ce2bd161e849fc6d19c9f72b80a6b479e77
Old upstream (15c0f7b) at head is: 90a105d2ecc801f09f63d21a54a5550e530446694571976137eae2466d355e97
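Since the two builds are told apart only by their Fsp.fd hashes, a small check like the following can confirm which FSP binary actually ended up in a build. This is a minimal Python sketch using only hashlib; the file path is whatever you pass on the command line (no particular location in the coreboot tree is assumed), and the two known hashes are the ones quoted in this comment.

```python
import hashlib
import sys

# sha256 values for Fsp.fd quoted in this thread.
KNOWN_FSP_HASHES = {
    "776100021418dba4fb62446e7c420ce2bd161e849fc6d19c9f72b80a6b479e77": "new upstream (this PR)",
    "90a105d2ecc801f09f63d21a54a5550e530446694571976137eae2466d355e97": "old upstream (15c0f7b)",
}


def sha256_stream(stream, chunk_size=1 << 20):
    """Hash a binary stream in fixed-size chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def identify_fsp(hex_digest):
    """Map a sha256 hex digest to one of the known FSP builds, or 'unknown build'."""
    return KNOWN_FSP_HASHES.get(hex_digest.lower(), "unknown build")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_fsp.py path/to/Fsp.fd
    with open(sys.argv[1], "rb") as image:
        fsp_digest = sha256_stream(image)
    print(fsp_digest, identify_fsp(fsp_digest))
```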

@65a

65a commented Feb 14, 2025

My coreboot tree is rebased on top of HEAD at commit 4985079b161f425137ee9e07ff86ddad1d08727f, it contains your patch from gerrit applied as a diff and committed, my local changes to src/mainboard for this board, and the local commit for the 3rdparty/fsp submodule for these changes.

@shuoliu0

4985079b161f425137ee9e07ff86ddad1d08727f

New upstream (This PR) sha256: 776100021418dba4fb62446e7c420ce2bd161e849fc6d19c9f72b80a6b479e77 -> confirmed


Could you please update your 3rdparty/microcode and try?

@65a

65a commented Feb 14, 2025

Trying that now; the commit matches yours. This may be the last flash for today: after a bad flash, this board requires recovery with a chip clip and an external ISP programmer, so it's painful.

@shuoliu0

I see, thank you for your patience.

@65a

65a commented Feb 14, 2025

Booting now, NO_FSP_TEMP_RAM_EXIT is not set. If that triggers the null issue, I'll try one with it set as well, since that was getting farther.

@65a

65a commented Feb 14, 2025

It does trigger the same null dereference without NO_FSP_TEMP_RAM_EXIT; building with it set and trying now.

@shuoliu0

I checked the error reports. This is most likely caused by out-of-date microcode. After you update the 3rdparty/intel-microcode repo, the effective microcode files, per src/soc/intel/xeon_sp/spr/Makefile.mk, are as follows (file -> sha256):
cpu_microcode_bins += 3rdparty/intel-microcode/intel-ucode/06-8f-08 -> 3a914354e69d78e6f288a00aff48eb7c681cc1bdc46e128960e5b4a281a65f38
cpu_microcode_bins += 3rdparty/intel-microcode/intel-ucode/06-cf-02 -> 8af710fbd24a4e232e06ee2ac74383c70a91d358a565a35b3f1a2e473b4dba92

Could you please confirm this matches your config?
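The intel-ucode file names above follow a family-model-stepping convention in hex (06-8f-08, 06-cf-02). As a minimal sketch, assuming the CPUID leaf 1 EAX decoding rules from the Intel SDM, the Python helper below shows how a processor signature maps to such a file name. The example signatures are hypothetical inputs chosen to reproduce the two names listed; they were not read from any board in this thread.

```python
def ucode_filename(cpuid_eax):
    """Build an intel-ucode style 'FF-MM-SS' name from a CPUID leaf 1 EAX signature."""
    stepping = cpuid_eax & 0xF
    model = (cpuid_eax >> 4) & 0xF
    family = (cpuid_eax >> 8) & 0xF
    ext_model = (cpuid_eax >> 16) & 0xF
    ext_family = (cpuid_eax >> 20) & 0xFF

    # The extended family field is only added when the base family is 0xF.
    display_family = family + ext_family if family == 0xF else family
    # The extended model field is only used for base families 0x6 and 0xF.
    display_model = (ext_model << 4) | model if family in (0x6, 0xF) else model
    return f"{display_family:02x}-{display_model:02x}-{stepping:02x}"


if __name__ == "__main__":
    for signature in (0x000806F8, 0x000C06F2):  # hypothetical example signatures
        print(hex(signature), "->", ucode_filename(signature))
```

On Linux, the same fields appear as `cpu family`, `model` and `stepping` in /proc/cpuinfo, printed in decimal rather than hex.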

@65a

65a commented Feb 14, 2025

Sha256 matches for these microcode files.

I suspect the null dereference could be a failed allocation, but I get a similar issue in postcar with the working FSP when either vboot or X86_64 mode is enabled. NO_FSP_TEMP_RAM_EXIT is probably a hack that happens to let these allocations succeed. Not sure if that has any impact... I may need to see whether I can give FSP more memory space to rule out a side effect. Still waiting for the next flash to finish; it is very slow. Thanks for your patience!

@shuoliu0

shuoliu0 commented Feb 14, 2025


BTW, we are using configs/builder/config.intel.crb.ac as the defconfig. Are you using the same one?

CONFIG_VENDOR_INTEL=y
CONFIG_BOARD_INTEL_ARCHERCITY_CRB=y
CONFIG_HAVE_IFD_BIN=y
CONFIG_LINUX_COMMAND_LINE="loglevel=7 earlyprintk=serial,ttyS0,115200 console=ttyS0,115200"
CONFIG_PAYLOAD_LINUX=y
CONFIG_PAYLOAD_FILE="site-local/archercity/linuxboot_bzImage"
CONFIG_HAVE_ME_BIN=y
CONFIG_DO_NOT_TOUCH_DESCRIPTOR_REGION=y
CONFIG_IFD_BIN_PATH="site-local/archercity/descriptor.bin"
CONFIG_ME_BIN_PATH="site-local/archercity/me.bin"
CONFIG_VALIDATE_INTEL_DESCRIPTOR=y
CONFIG_NO_GFX_INIT=y
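To answer the defconfig question mechanically, two configurations can be compared symbol by symbol. This is a minimal Python sketch, assuming both inputs are Kconfig-style files (`CONFIG_FOO=value` lines and `# CONFIG_FOO is not set` comments); the script and function names are hypothetical. A defconfig lists only non-default symbols, so the comparison is most meaningful between two minimal configs (for example, the output of `make savedefconfig`) rather than a defconfig against a full .config.

```python
import re
import sys

# "CONFIG_FOO=value" and "# CONFIG_FOO is not set" are the two forms Kconfig writes.
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map each CONFIG_ symbol to its raw value; explicitly unset symbols map to 'n'."""
    symbols = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = _SET.match(line)
        if match:
            symbols[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            symbols[match.group(1)] = "n"
    return symbols


def diff_configs(reference_text, local_text):
    """Return {symbol: (reference, local)} for every symbol whose values differ.

    A symbol missing from one side is reported as None on that side.
    """
    ref = parse_config(reference_text)
    loc = parse_config(local_text)
    return {
        name: (ref.get(name), loc.get(name))
        for name in sorted(ref.keys() | loc.keys())
        if ref.get(name) != loc.get(name)
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python diffconfig.py configs/builder/config.intel.crb.ac my.defconfig
    with open(sys.argv[1]) as a, open(sys.argv[2]) as b:
        for symbol, (ref_val, loc_val) in diff_configs(a.read(), b.read()).items():
            print(f"{symbol}: {ref_val} -> {loc_val}")
```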

@65a

65a commented Feb 14, 2025

I have very good news! It was the microcode: at least my previous config boots now, and I verified the sha256 of Fsp.fd this time :) Really, thanks a lot for your testing. It definitely seems critical that this FSP be paired with the latest microcode; hopefully that helps @pp3345 too. I suspect the null issue is unrelated, since it occurred before... another bug for another time. Thanks again @shuoliu0!

@shuoliu0

Glad to know!

@pp3345

pp3345 commented Feb 14, 2025

Indeed we also had an outdated version of the microcode running. Updating fixed the issue for us as well. Everything's fine then with the new FSP from our perspective. Thanks a lot!


4 participants