MSI X99S MPower Blank Screen then Boot Loops #17
From this thread, download ME System Tools v9.1 and run the MEInfo tool from a command prompt. Attach the output here so we can see whether BootGuard is enabled.
```
C:\Users\A\Intel ME System Tools v9.1 r1\MEInfo\Windows64>MEInfoWin64.exe
Intel(R) MEInfo Version: 9.1.20.1020
GBE Region does not exist.
BIOS Version: M.B0
FW Capabilities: 0x40100940
TLS: Disabled

C:\Users\A\Intel ME System Tools v9.1 r1\MEInfo\Windows64>
```
Can you show the MEInfo -verbose output? I don't see BG (BootGuard) at the end where it should be. Maybe BG was not a thing back when X99 launched, I don't recall. Have you searched the BIOS for any security options such as Secure Boot, Boot Guard etc. which might cause such issues? Also, from the System Tools, run the Flash Programming Tool with the command `fptw64 -f me_b310123.bin -me` using the file (ME region only) linked here. Once it is done, shut down the system, remove power (unplug the PSU cord, switch the PSU off and press the power button 1-2 times) and wait for 1 minute. Now check if it boots.
What's so special about that bin file? Is it neutered? Oh, and just to double check: this will modify the BIOS stored in the BIOS chip and not something stored somewhere else? A bad BIOS flash I can recover from; anything else, I'm not sure...
Yes it is, but with an older me_cleaner commit (b310123) which removed all extra $FPT partitions but left the Recovery one (FTPR) completely intact (no LZMA or Huffman FTPR modules removed). If it works, then the problem is with newer me_cleaner versions, as they probably remove something they shouldn't. If it fails like before, then the issue you have is system-specific (BootGuard, TPM, Secure Boot etc. technologies), hence the MEInfo -verbose output and BIOS options I asked for. :)

In case you are not familiar with the Intel SPI chip image structure, it mainly consists of the Flash Descriptor (FD, which among other things controls read/write access to the other regions), GbE, ME and BIOS regions. You already recovered from a bad ME region flash (which is what me_cleaner modifies), so you can indeed reflash the entire SPI chip with whatever method you are using. So yes, you can recover from more than just a bad "BIOS" flash, if that's your question. I suspect your (justified) caution comes from the common misunderstanding that "BIOS" = "SPI image". The "BIOS" region is just one part of the SPI chip/image and "ME" is another.
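The region layout just described can be read straight out of the Flash Descriptor. Below is a minimal Python sketch, assuming the descriptor signature 0x0FF0A55A sits at offset 0x10, the Flash Region Base Address comes from bits 23:16 of FLMAP0, and each 32-bit region register holds base and limit in 4 KiB units (15-bit fields), as commonly documented for Intel PCH flash descriptors. The names `read_regions` and `REGION_NAMES` are hypothetical; this is not how MSI's flasher or Intel's tools work internally.

```python
import struct

FD_SIGNATURE = 0x0FF0A55A
# Assumed order of the first five region registers (FLREG0..FLREG4).
REGION_NAMES = ["Flash Descriptor", "BIOS", "ME", "GbE", "PDR"]


def read_regions(image: bytes) -> dict:
    """Return {region name: (base, limit)} for every region present in an SPI dump."""
    sig, flmap0 = struct.unpack_from("<II", image, 0x10)
    if sig != FD_SIGNATURE:
        raise ValueError("no Flash Descriptor signature at offset 0x10")
    frba = ((flmap0 >> 16) & 0xFF) << 4  # Flash Region Base Address
    regions = {}
    for i, name in enumerate(REGION_NAMES):
        (flreg,) = struct.unpack_from("<I", image, frba + 4 * i)
        base = (flreg & 0x7FFF) << 12
        limit = (((flreg >> 16) & 0x7FFF) << 12) | 0xFFF
        if base <= limit:  # an unused region is encoded with base > limit
            regions[name] = (base, limit)
    return regions
```

Run against a full dump such as E7885IMS.MB0, it should report the ME region as 0x1000-0x7FFFFF (the same range me_cleaner prints further down) and no GbE region, consistent with MEInfo's "GBE Region does not exist".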
Thanks for the explanation. I'm not familiar with the layout. I just figured the dual BIOS chip setup on my MB (that's what the manufacturer's description calls it) would save my behind, so there was no risk in trying. During the BIOS update it says "flashing BIOS and ME". So it looks like the "BIOS update" procedure flashes the whole SPI image, and the "dual BIOS" setup is 2 SPI chips that can be switched. I have classes until 8pm Eastern time so I'll try it when I get back (and post the verbose output). On this MB, Secure Boot and other Windows DRM boot crap is disabled by default.
Verbose MEInfo output:

```
Intel(R) MEInfo Version: 9.1.20.1020
FW Status Register1: 0x1E000255
CurrentState: Normal
Get ME FWU version command...done
Windows OS Version : 6.2.9200 ""
Get ME FWU info command...done
Get ME FWU version command...done
Get ME FWU feature state command...done
Get ME FWU platform type command...done
Get ME FWU feature capability command...done
Get ME FWU OEM Id command...done
BIOS Version: M.B0
FW Capabilities: 0x40100940
TLS: Disabled
Get BIOS flash lockdown status...done
Get flash master region access status...done
Get ME FWU OEM Tag command...done
Get ME FWU Platform Attribute (WLAN ucode) command...done
Get ME FWU Info command...done

C:\Users\A\Intel ME System Tools v9.1 r1\MEInfo\Windows64>
```
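The "FW Status Register1" value in this output can also be decoded by hand. This sketch assumes the HFS bit layout publicly documented for ME firmware of this generation (for example in coreboot's ME headers): current state in bits 3:0, an FPT-bad flag in bit 5, init-complete in bit 9, error code in bits 15:12 and operation mode in bits 19:16. Only a few state values are mapped and anything else is reported as unknown; `decode_hfs1` is a hypothetical helper, not part of MEInfo.

```python
# State names for bits 3:0 (assumed mapping; unmapped values are reported as unknown).
CURRENT_STATES = {0: "Reset", 1: "Initializing", 2: "Recovery", 5: "Normal", 6: "Wait", 7: "Transition"}


def decode_hfs1(value: int) -> dict:
    """Split an ME FW Status Register 1 value into a few commonly used fields."""
    state = value & 0xF  # bits 3:0, current working state
    return {
        "current_state": CURRENT_STATES.get(state, f"Unknown ({state})"),
        "fpt_bad": bool((value >> 5) & 1),       # bit 5, flash partition table reported bad
        "init_complete": bool((value >> 9) & 1),  # bit 9, firmware initialization complete
        "error_code": (value >> 12) & 0xF,        # bits 15:12, 0 means no error
        "operation_mode": (value >> 16) & 0xF,    # bits 19:16
    }
```

For 0x1E000255 this gives current_state "Normal", error_code 0 and fpt_bad False, which agrees with MEInfo's "CurrentState: Normal".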
Tried the new neutered ME file and flashed it exactly as instructed. Unfortunately, same result (and the same heart attack).
MEInfo does not mention BootGuard, and it seems to be disabled in the SPI image provided by MSI as well, so it should be safe to assume it is not the problem. You said Secure Boot is disabled and I highly doubt there is a TPM module installed, so the only logical conclusion is that this is a BIOS-specific issue. Meaning, something in the BIOS checks the ME and maybe tries to recover it, hence the boot loop? It's a guess. It would be interesting to see when this "check" takes place, maybe by removing just one "useless" section of the ME for starters and seeing if even that triggers the problem. But that would require a few more tests and possibly heart attacks.
I don't have a TPM module. I saw a header for it in the motherboard manual but I didn't buy anything for it. That's a very interesting theory. Well, if you want to remove a section, I suppose I could try it again. Having recovered from 2 "boot loops", I think it's safe to say the dual BIOS/SPI chip setup should be able to save my behind every time...???
If you can recover from an ME brick, you can recover from "anything". Some OEMs implement BIOS region recovery methods which work nicely with little user intervention, but an ME brick always requires an SPI image reflash. So you're good in that regard, not that I endorse constantly trying your luck. I made a new ME region (same instructions as above, not a full SPI image) which is intact except for one stupid module called MDMV, which is definitely not required for booting or basic ME functionality. We'll see how the system reacts.
Your new image with MDMV removed (which you say is not essential) boots. Coming to you live after the flash. To confirm it really is not essential, `lspci` still shows MEI:
`[Daniel@Daniel8 ~]$ lspci`
Very interesting. So something that gets removed causes this. All we have to do now is work backwards: remove modules one by one until it stops working. I created a set of 5 new ME regions (ME region only). Start from the smallest filename (fewest modules removed, most likely to work) and move towards the largest filename (most modules removed, least likely to work) until one stops working. Previous test = MDMV removed. Personally I suspect EFFS is the problem, which is why it's the last test. Maybe GLUT otherwise. We'll see.
The very first file, me_mdmv_mftp.bin, failed. One thing I forgot to mention: the debug LED on the motherboard gets stuck at 68. If you download the motherboard manual and look at page 51, "68" is mysteriously missing from the table.
Error code 68 is "PCI Host Bridge Initialization". This should be BIOS-related. Have you set any custom BIOS options or overclocking? I doubt it since you reflashed with the stock BIOS, but it never hurts to check. I have created two test ME regions here. These should be safe. File "me_mdmv_missing" lacks the partitions which are listed in the $FPT but cannot be found in flash (no starting address); the system should definitely boot with that. File "me_mdmv_missing_empty" additionally removes partitions which are listed in the $FPT with a starting address and size but are completely empty in flash. Again, I suspect this should work even on your system, which behaves strangely. So flash these two test files and we'll go from there if they work. File "me_mdmv" is the working MDMV test, kept for reference; don't flash that.
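The two categories above ("missing" and "empty") can be told apart mechanically. A minimal sketch, assuming: `fpt_off` points at the "$FPT" marker inside a dumped ME region, the entry count is the 32-bit value 4 bytes after the marker, 0x20-byte entries start 0x20 bytes after it with Name, Owner, Offset and Size as the first four fields, partition offsets are relative to the ME region start, and padding is 0xFF. `classify_partitions` is a hypothetical helper, not part of me_cleaner.

```python
import struct


def classify_partitions(me_region: bytes, fpt_off: int, pad: int = 0xFF) -> dict:
    """Label each FPT entry as "missing", "empty" or "present" (assumed FPT layout)."""
    (count,) = struct.unpack_from("<I", me_region, fpt_off + 4)
    result = {}
    for i in range(count):
        name, _owner, offset, size = struct.unpack_from(
            "<4s4sII", me_region, fpt_off + 0x20 + i * 0x20
        )
        label = name.rstrip(b"\x00").decode("ascii", "replace")
        if offset == 0:
            result[label] = "missing"  # listed in the FPT but has no place in flash
        elif all(b == pad for b in me_region[offset:offset + size]):
            result[label] = "empty"  # occupies flash space but holds only padding
        else:
            result[label] = "present"
    return result
```

Under this reading, "me_mdmv_missing" drops only the "missing" entries, while "me_mdmv_missing_empty" also drops the "empty" ones.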
The first one, "me_mdmv_missing", worked but the second one did not (boot loops). If I understand correctly, the first one lists the partitions by name only but not where they are in the image. The code for those partitions is still there but is effectively unreachable since they're not mapped anywhere. The second one is what me_cleaner usually does, which is keep the partition entry name and location but zero out its code. So is the moral of the story: don't address something that isn't usable?
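To make the "missing" variant concrete, here is a sketch of roughly what such an edit involves, under an assumed FPT layout (entry count 4 bytes after the "$FPT" marker, 0x20-byte entries starting 0x20 bytes after it, Offset as the third 32-bit field of each entry, 0xFF as filler): only table entries with no flash offset are removed, and no partition data is touched. `drop_missing_entries` is hypothetical; a real tool would also have to recompute the FPT header checksum afterwards.

```python
import struct

ENTRY_BASE, ENTRY_SIZE = 0x20, 0x20  # assumed: entries follow a 0x20-byte header


def drop_missing_entries(fpt: bytes) -> bytearray:
    """Remove FPT entries whose Offset is 0; `fpt` starts at the "$FPT" marker.

    The header checksum is NOT updated here.
    """
    (count,) = struct.unpack_from("<I", fpt, 4)
    table_end = ENTRY_BASE + count * ENTRY_SIZE
    entries = [
        bytes(fpt[ENTRY_BASE + i * ENTRY_SIZE:ENTRY_BASE + (i + 1) * ENTRY_SIZE])
        for i in range(count)
    ]
    kept = [e for e in entries if struct.unpack_from("<I", e, 8)[0] != 0]
    out = bytearray(fpt)
    struct.pack_into("<I", out, 4, len(kept))  # new entry count
    # Compact the surviving entries and fill the freed slots with 0xFF (assumed filler).
    out[ENTRY_BASE:table_end] = b"".join(kept).ljust(table_end - ENTRY_BASE, b"\xff")
    return out
```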
The ME firmware has a Flash Partition Table (FPT) at its beginning. Each entry is 0x20 (32) bytes long and the first 0x10 (16) bytes hold the Name, Owner, Offset and Size of the given partition. The "missing" partitions have only a Size, and without a starting offset they are nowhere to be found in the actual ME region flash. So they are just mentions, useless overall: they only take up space in the FPT, and their sizes are not even reserved anywhere as padding. The "empty" partitions (the first 4 in your case) have both an Offset and a Size, so they do take up space in the ME region flash, but it's just padding/empty. For example, PSVN starts at 0xBC0 and is filled with 0x40 padding bytes.

Corna's me_cleaner works differently. It removes everything except the Recovery (FTPR) partition from the FPT and fills the removed partitions with padding. This is acceptable for the ME co-processor itself, and thus for most systems, as proven here. However, in your case, I suspect MSI has implemented a check in the BIOS which possibly verifies whether the ME is corrupted. I've seen similar tactics from OEMs in the past, such as:

- Gigabyte (BIOS GUID with just the first 0x400 bytes of the ME region, or a 0x400 EFFS/settings ME subsection)
- Clevo/Sager (copy of the first 0x100 FPT header bytes right before the ME starts, meaning in the FD or GbE space)
- ASRock (check whether the ME version differs from values hardcoded in the BIOS AMITSE module and, if so, restore it via a full BIOS GUID copy of the ME region)
- ASUS (exact copy of the full ME region inside a BIOS GUID)
- and more...

Looking at the MSI SPI image, I couldn't find any obvious clue as to how that check is performed, meaning no BIOS GUID or modules with ME keywords which could be used for corruption checks. I believe that either:
The 2nd seems a lot more likely. The first can be easily verified. I created a single test ME region image which has NFTP listed in the FPT, but the actual NFTP contents in the ME flash are gone/padded. If that works, then the BIOS checks the FPT. If not, the BIOS refuses to boot because the ME reported an error, even a non-critical one.
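The NFTP test can be sketched as follows: keep the FPT entry intact but overwrite the partition's bytes with padding. This assumes `fpt_off` points at the "$FPT" marker in a dumped ME region, the entry count is 4 bytes after it, 0x20-byte entries start 0x20 bytes after it with Name, Owner, Offset and Size first, offsets are relative to the ME region, and 0xFF is the padding byte. `pad_partition` is a hypothetical helper, not the tool used to build the test image.

```python
import struct


def pad_partition(me_region: bytearray, fpt_off: int, target: bytes, pad: int = 0xFF) -> None:
    """Fill one partition's data with padding while leaving its FPT entry untouched."""
    (count,) = struct.unpack_from("<I", me_region, fpt_off + 4)
    for i in range(count):
        name, _owner, offset, size = struct.unpack_from(
            "<4s4sII", me_region, fpt_off + 0x20 + i * 0x20
        )
        if name == target and offset != 0:
            me_region[offset:offset + size] = bytes([pad]) * size  # same length, in place
            return
    raise KeyError(f"{target!r} not found with a flash offset")
```

If an image built this way boots, whatever the BIOS checks is apparently satisfied by the FPT entries alone; if it boot-loops, the ME's own error report is the more likely trigger.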
Thanks once again for taking the time to go the extra mile and explain things to a non-expert in a clear, easy-to-understand way.
No, not at all. The board you got is not bad; the BIOS from MSI might have a bug somewhere, as the boot loop does not seem like proper behavior whenever an ME loading error/corruption is encountered. Maybe they can fix that if it's reported. Something like "my ME got corrupted and I ended up in a boot loop, which does not seem normal", with no me_cleaner mentions of course. MSI is actually pretty cool: they have two socketed SPI chips (even if both get corrupted, they can be easily removed and reflashed with $5 programmers), their in-BIOS flasher reflashes both BIOS and ME (rare, cool and quite a butt-saver when the latter gets corrupted), they update the BIOS regularly etc. So no, I don't believe you made the wrong choice or that there is a moral to this story. Such recovery methods can actually save systems and users who have no idea what the ME even is, and a lot of OEMs use different implementations of similar scope either way. Anyway, this is beside the point. In the end, me_cleaner does not work on your board because the BIOS tries to recover from a corrupted ME. With that in mind, I think this issue can be closed now. :)
I didn't mean the board is physically bad, but that the "design" (BIOS) is bad. This board's chips are not socketed. There are just 2 of them so 1 can cover for the other. Just 1 little thing before closing: obviously no me_cleaner mention, but doesn't it seem a bit funny that I would know the ME flashing went bad? How would I have known that watching a 1%, 2%, 3%... bar? Make a guess because it "crapped out" at 75% (second half)? Or, if an ME corruption happened spontaneously, how could I "diagnose" that as the issue? Oh, and thanks again for all your help. I learned a lot from this. Who knew BIOSes were so complicated now.
Yes, I confused your motherboard with another case I was trying to resolve. From the pictures I can indeed see two soldered SPI chips. I'm not sure I understand the second part of the question. If you mean how to identify a broken ME, there are a lot of indicators, as it's deeply integrated into Intel systems. Usual symptoms can be 30-minute shutdowns, fans spinning constantly at full speed, no power management, wrong/half RAM detection, iGPU not working, wrong clocks and no overclocking, BIOS reporting the ME version as 0.0.0.0000 or N/A, AMT not working on Corporate/5MB SKUs, BIOS error messages related to the ME during boot, bad performance, sleep/wakeup issues etc. Usually a Google search will lead people to the correct places (my example) to ask for help. There are also Intel tools which can check whether the ME is working properly, like MEInfo and MEManuf. Thank you as well for indulging my (many) test files. You may not be able to get me_cleaner working (I'm not sure you would want that either way on such a nice/good board, hint) but at least we learnt of this MSI BIOS check that I wasn't aware of. Maybe Corna can add an extra warning regarding OEM ME recovery procedures which try to reverse me_cleaner's actions.
Thank you very much @platomav for the support you're putting into this project
I think it's the second one. Even if the BIOS has access to the ME region ( As a reference, here the MSI BIOS checks the status of Intel ME but, if me_cleaner has been used, it just prints an error message.
Here is the `lspci -v` info of the ME from Linux:

```
00:16.0 Communication controller: Intel Corporation C610/X99 series chipset MEI Controller #1 (rev 05)
```

Here are the hardware IDs reported by Windows Device Manager:

Here is the device instance path reported by Windows Device Manager:
I tried this on my MSI X99S MPOWER motherboard described here: https://ca.msi.com/Motherboard/X99S-MPOWER.html#hero-overview. Specifically, I tried the newest M.B revision of the BIOS. After flashing, my computer rebooted to a black screen. It looks like the video card did not initialize at all. I waited for 5 minutes but nothing happened. Afterwards, I force-restarted the computer and it went into a boot loop: it tries to turn on, turns off, turns on again. Luckily the motherboard has 2 BIOS chips, so I used the secondary BIOS to flash the primary one with an unmodified M.B revision and everything is OK now. Thank you for your hard work on this project. I suppose I might be willing to try it again, since recovering a bricked BIOS is pretty easy on this motherboard.
Here's the output when I run the script:
```
[Daniel@Daniel8 me_clean]$ python3 me_cleaner.py E7885IMS.MB0
Full image detected
The ME region goes from 0x1000 to 0x7fffff
Found FPT header at 0x1010
Found 20 partition(s)
ME firmware version 9.1.10.1000
Found FTPR header: FTPR partition spans from 0x48000 to 0xd0000
Removing extra partitions...
Removing extra partition entries in FPT...
Removing EFFS presence flag...
Correcting checksum (0xea)...
Reading FTPR modules list...
Wiping LZMA section (0xadbb4 - 0xd0000)
UPDATE (LZMA, 0x0adbb4 - 0x0addde): removed
ROMP (Huffman, 0x04eac0 - 0x04eec9): NOT removed, essential
BUP (Huffman, 0x04eec9 - 0x05fd1f): NOT removed, essential
KERNEL (Huffman, 0x05fd1f - 0x095093): removed
POLICY (Huffman, 0x095093 - 0x0adbb4): removed
ClsPriv (LZMA, 0x0addde - 0x0ae1b7): removed
SESSMGR (LZMA, 0x0ae1b7 - 0x0b9b51): removed
SESSMGR_PRIV (LZMA, 0x0b9b51 - 0x0bf430): removed
HOSTCOMM (LZMA, 0x0bf430 - 0x0c773a): removed
TDT (LZMA, 0x0c773a - 0x0ccaef): removed
FPF (LZMA, 0x0ccaef - 0x0ce5f2): removed
Done! Good luck!
```
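The "Correcting checksum (0xea)" step is presumably needed because the preceding steps change FPT header fields such as the entry count and the EFFS presence flag. Below is a minimal sketch of a common 8-bit zero-sum checksum, where the checksum byte is chosen so that all header bytes add up to 0 modulo 256. Whether the FPT uses exactly this scheme, and which header length and checksum offset apply, are assumptions the caller must supply; `fix_checksum` is hypothetical and is not me_cleaner's code.

```python
def fix_checksum(header: bytearray, checksum_offset: int) -> int:
    """Set header[checksum_offset] so that all header bytes sum to 0 modulo 256."""
    header[checksum_offset] = 0  # exclude the old checksum from the sum
    checksum = (-sum(header)) & 0xFF
    header[checksum_offset] = checksum
    return checksum
```

Because the old checksum byte is zeroed before summing, calling the function again on an already-corrected header returns the same value.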