Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

H7 dual core OTA fix #2966

Merged
merged 1 commit into from
Nov 14, 2024
Merged

H7 dual core OTA fix #2966

merged 1 commit into from
Nov 14, 2024

Conversation

robertc2000
Copy link
Collaborator

I encountered an issue while testing that prevented OTA from working properly on dual core H7 devices (H755 and H745).

These devices normally boot from both cores. CM4 requires by default a firmware at address 0x08100000 and CM7 requires a firmware at address 0x08000000.

The issue is we are using address 0x08100000 to store the new firmware, which means that after the next reset, both cores will start executing their corresponding firmwares. I noticed this always leads to a crash and the board gets bricked after this...

I put up a fix to disable CM4 from executing before resetting.

@scaprile
Copy link
Collaborator

I think we should partition flash differently, in order not to mess with the M4 and be able to update the M7.

@@ -35,6 +35,9 @@ static struct mg_flash s_mg_flash_stm32h7 = {
#define FLASH_OPTSR_PRG 0x20
#define FLASH_SIZE_REG 0x1ff1e880

#define STM_DBGMCU_IDCODE 0x5C001000
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this register present in all H7s ? We might break other OTAs by triggering a HardFault on invalid memory access

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I checked that. It appears to be on all H7s.

Comment on lines 127 to 132
if (STM_DEV_ID == 0x450) {
// H745/H755 and H747/H757 are running on dual core
// Disable CM4 to prevent the old and the new firmwares from booting
// simultaneously on each core.
MG_SET_BITS(MG_REG(bank + FLASH_OPTSR_PRG), MG_BIT(22), 0);
}
Copy link
Collaborator

@scaprile scaprile Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a quick workaround, not a fix...
The user bought a dual-core chip for a reason, and we are rendering the second core unusable after first upgrade.
The problem with dual-core OTA is that any core firmware might want to be updated.
IMO, at a minimum, we should be able to update the core we are running in without disabling the other core.
Ideally, we should be able to update both cores, but I guess that would require a different schema, as each core has its own image.

Copy link
Collaborator Author

@robertc2000 robertc2000 Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so in this case we could repartition the flash, like you mentioned. How about using only the first bank (only for these dual core boards)? We'll partition this instead.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So they designed a dual-flash-bank chip, where banks can be swapped, and assigned one bank for one core and the other bank for the other core, effectively rendering the swap feature unusable...
Can we change core base addresses ? If so, how would that work with Cube ? How will users flash their devices ?
Otherwise, we can only work with the portion that belongs to our core, or with both at the same time. The former is the way we've been doing it, the latter requires merging two images into one uploadable chunk; I'd rather avoid that.

One other issue: if we tinker with the M4 memory space, we need to stop that core before doing so, that is, probably the easiest path is to limit to the core we are running in.

When and if a customer request updating the M4, we'll devise a schema to do it; be it a single image or a selector for the core to be updated. For now, IMO, stick to the simplest way to update the M7 that coexists with what Cube will generate.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To answer your questions,

Yes, we can change the base addresses for each core. There are some option bytes which control this and modifying the values seems to be persistent across reboots. I tried to figure out how to do it in Cube, but I can't find such an option. Cube seems to generate code linked at the default addresses for each core. (CM4 code is linked at 0x81.... and CM7 at 0x80...). You can change these option bytes though in STM32CubeProgrammer, but I think that is irrelevant for now.

For now, I don't think we should deal with M4 memory space. As long as we use the first bank only for OTA, everything should be fine, since we are updating only M7.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, if it can't be done in a way that is transparent to the user (does not require users to be aware of it and go changing stuff or programming in a different way), I think we'd better stick to a single-bank approach. That has already been done and it only requires thinking flash is half the size and there is no swap, so we manually overwrite the lower portion of what is physical bank 1, and don't mess with bank 2, that belongs to the M4.
Is that correct ?
That will let us easily handle updates for the M4 when and if a customer requests that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly! We'll treat it as a single bank scenario by assuming the 2nd bank does not exist.

@@ -121,6 +124,12 @@ MG_IRAM static bool mg_stm32h7_swap(void) {
flash_clear_err(bank);
// printf("OPTSR_PRG 1 %#lx\n", FLASH->OPTSR_PRG);
MG_SET_BITS(MG_REG(bank + FLASH_OPTSR_PRG), MG_BIT(31), desired);
if (STM_DEV_ID == 0x450) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like H745/H755 and H747/H757 aren't the only ones that use this ID. H743 uses 0x450 as well. I did not expect that, since H743 has a different RM. They shouldn't have allowed to have the same ID, it complicates things. How are we supposed to distinguish between a H743 and a H745 at runtime??

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can always do it at compile time, but let me dive into ST waters and let's see what I can find.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know. It would be cleaner to do it at runtime though.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like single-core chips are in fact dual-core chips where the M4 has been disabled. This means that everything is the same, but probably we can check the BCM bits in the OPTION bytes.
Whether the chip is a single-core or a dual-core one, I think it is safe to say that if the CM4 is gated (no clock, BCM4 = 0 ), we can assume all flash is available.
WDYT ?
745 RM 13.3.14 SYSCFG_UR1
742 RM 12.3.12 shows this register is not present, notice the next register is UR2. The less-significant 16-bit portion of UR4 is reserved.
Worst case: we get a HardFault when accessing SYSCFG_UR1, this is an invasive way to detect a single core...
Best case: the option byte is readable

I don't see any other option if 743s really are 745s with the M4 disabled

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, but unfortunately UR1 reads as 0x10001 on both H743 and H745/H755... So no hard faults, but instead it returns the wrong value. It should have been 0x10000 instead.

I guess we'll have to decide at compile time if it's dual core or not. I did not find any other options.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So 743s are in fact 745 chips with either a "broken" or removed M4 core...
The other options I can come up with, require collaboration between cores and a timeout... a big no no no...
Let's add a #define for all dual cores. Does this very same situation replicate with the other dual core families we handle ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's the same for all dual core boards we have (H745/H755 and H747/H757). They are all under the same reference manual, anyway.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I would add MG_OTA_H7DUALCORE or something like that (give or take one or two underscores)

@scaprile scaprile merged commit 8b95a22 into master Nov 14, 2024
62 of 77 checks passed
@scaprile scaprile deleted the ota-h7-dual-core branch November 14, 2024 17:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants