-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improved Thumb mode support #27
Comments
As a quick response - thanks for giving it a try and providing a reference binary, we can go from there, let me have a look at it when I have a chance.
Yeah, but that feature is less a week old, I actually would like to do a more testing for it before making a formal release, so things may not work too well, and thanks for helping to test it again. Anyway, how ARM/Thumb interoperation works, is that if ARM code uses a suitable call instruction (
Ack, that's one of the feature which is missing yet - defining Thumb code (ARM is defined by default with "Make code (c)" command. One reason it's not implemented yet is concerns of what UI to use for it. I can quickly think of 2 options:
Let me know how these sound and if you have better ideas. (One thing which bothers me is what to do when we face an arch with 3 or more alternative ISAs. But I guess, it makes sense to stay realistic and follow YAGNI until it hits ;-). (When it'll require noticeable redesign anyway.)) |
Btw, did you try just to see ARM vs Thumb detection in action with (Please pull, there was a bug when starting up when initial function was Thumb.) |
"c" + "C" sounds good to me the way you described it. Another idea I have is to press "C" several times to switch between various alternative ISAs, when there are more alternatives. |
And by the way, thanks a lot for all your work on ScratchABit and ScratchABlock! |
So, even trying it as is, I see that few Thumb functions were detected, e.g. fun_0000f9fe . Btw, I found it using "Memory map" feature (Shift+i), scrolling around. Yeah, memory map feature should be improved, at the very least it should show current address, yet better a scrollbar. |
Well, that's not how it works. Pressing "c" marke that address as a code and starts analysis from it. Just imagine, that you can load 1MB of code, with start routine quickly ending with indirect jump. Then you found another code address, and - wonder - all the rest of code is direct jumps/call. Then, pressing "c" will lead to recursive descent analysis of this 1MB. There're bunch of side effects of that, like symbols get defines, functions, etc. You can't really "reanalyze" it in different ISA, as all previous artifacts will stay there. The closest thing to that would be implementing undo feature, so after you mispressed "c", you can undo and press "C". This reminds me that I wanted to implement undo, yeah ;-). In the meantime, how it works is that you'd need to manually "u"ndefine all mis-disassmbled instructions one by one, and press a correct key. Realistically, that shouldn't be many, but undo would be better of course. |
You definitely can write a Python plugin/script for SABit's native API, and run it with --script switch. The only caveat is that this API isn't stable. Here's example: https://github.com/pfalcon/xtensa-subjects/blob/master/2.0.0-p20160809/start.sh https://github.com/pfalcon/xtensa-subjects/blob/master/2.0.0-p20160809/proj_init.py
You could easily convert that to suitable calls for SABit, yeah.
And that's the winner. That's how it was intended to be. IDAPython API (which includes (well, included IDC) funcs) was supposed to be a "stable" way to deal with, however unbeautiful that API might be. You just use native Python syntax instead of IDC. Again, here's example: https://github.com/pfalcon/xtensa-subjects/blob/master/2.0.0-p20160809/esp8266_sdk_2_0_0_p20160809_map_script.py Well, going that way, you'll find that a lot of idc funcs are not implemented. While I'm working on "C" command implementation, feel free to beat me on adding few idc.py funcs ;-). |
Thanks. And I appreciate attention and really need help with catching goofs which may preclude other people from using/enjoying it, and to move it forwards. So, issue reports, criticism, patches, spreading the word is appreciated ;-). |
I real and properly working undo function would be the greatest feature possible, I think. I guess that it's a bit difficult to pull it off in a somewhat memory efficient way due to all the side-effects every action can trigger. |
From my experience, when you define something in the wrong instruction set, you usually don't get very far, so it seldomly found other functions that way. But yes, a real undo function would be great for that. |
Oh, this writing between the lines... Of course I didn't try your .def as is, I fixed it up first. So, the CPU plugin which should be used is:
All other ARM CPU plugins should be considered a debugging aids (well, we'll handle these matters in #29). Another issue is that your file is 128K, while you have in your .def:
That's 64K, i.e. half-file is lost. This also leads to exception on trying to save database (Shift+s) after initial loading. Summing up, the correct .def to use is:
Let's stick with this one for further testing to be on the same line, or if you fixed the issues above and have further updates on your side, please post the latest version. |
Ok, "C" command added in b4a390e, let me know how it goes. |
The easiest way to run a script on SABit startup is using |
"C" works fine on undefined bytes, but it does not work on ARM code. Also it does not work on errorneous code "UNEXPECTED value", I always have to undefine all that stuff first. It would be great to be able to mark a whole section of code and be able to undefine or "C" or "c" all that code at once, undefine always only works a single value. |
Yes, that's by design, doing it otherwise would lead to (more) mistakes.
Also as expected, working otherwise would again likely lead to more troubles.
These aren't expected to be there. I'm working thru stabilizing the workings of your patch, and afterwards, ways to reproduce such cases would need to be reported to be investigated/resolved. As was discussed earlier, some (arguably, pretty limited) drawbacks of the handling above will be addresses by implementing undo. |
I have 2 news: good and bad. Good: with your latest init.py (size: 1244310, md5sum: 157a4c80860932f3d5f1691d46baa654), from-scratch analysis result can be saved and disasm-written without exceptions. Bad: this latest init.py still has issues, e.g.:
It's implausible to have an ARM instruction at 0x0000BCF8, and 2 bytes later, at 0x0000bcfa, to have a Thumb instruction. And as the first line shows, at 0x0000b5e8, the CPU was in ARM state, and executed |
Ok, now I hopefully implemented the BLX handling correctly. Please try again. |
As a user I woud like to have a fast possibility to try various decoding options at any given data address. I would like to try decoding it as ARM, as Thumb, as DWord, as String, ... and easily switch between those options. When I try to decode an address with one of those options, and then directly afterwards at the same address with another of those options, ScratchABit should automatically undo the previous option and try the new option instead, and automatically undefine it in between when necessary. If it was longer before, ScratchABit should perhaps ask me, whether I really want to undefine this, or perhaps also whether I want to undefine it until the end of consecutive similarly defined addresses. That's the usability that I would like to have. |
Better, much better. Still some fine details to take care of.
As mentioned elsewhere, BLX has 2 forms: direct-addressing (i.e. immediate offset encoded in instruction) and indirect-addressing (address in a reg).
On this case, the instruction is There're definitely more cases like that - I'm diffing my grep -v filtered version against your latest one, and while there're mostly improvements, there're also some regressions. |
Ok, perhaps I got it right now, feel free to try again. And thanks a lot of the explanations! |
You seem to miss that "easy switch" also means "easily switch by mistake". Let me again point that the act of disassembly is a complex act with side effects. Consider that you pressed "c" and 1MB of code got disassembled, 1000 labels was created, etc. 1MB is helluva of code, so next month(s) you spent renaming those labels to make sense, etc. Then you by mistake pressed "C" there. What happens? All the previous labels don't make sense for new code, so either one month of your work goes to /dev/null if tool automatically deletes them, or you're left with irrelevant labels pointing into the middle of new instructions. 2nd point is establishing baseline for what's "easy" and what's "hard". "Easily do disassembly work" is the reason I embarked on writing ScratchABit. Before that, the choice I had is to typing "TLA 0xabcdef12" and pressing Enter in Radare (and you've got to know all those TLAs!). I thought that life's too short for that, and decided that there should be direct-manipulation alternative (additionally written in interpreted VHLL). But that doesn't mean that I'm, exaggerating, writing a tool where a single (mis)press may get your HDD formatted. You must accept that while many commands are a single press with immediately visible result, some require 2 keypresses, and some even legwork of user cursor keys multiple times (or multiple "u" presses). All that however explains why ScratchABit UX was designed like that, and likely will stay that way. But the reason I wanted it to be written in "interpreted VHLL" was to allow easily hacking on it by as many users as possible and trying wildest ideas. Feel free to hack your way of working in, feel free to try it on different projects, feel free to share results and patches. I can't promise that the core operations will be changes (but happy to hear experience and learn from it), but the aim is to allow users to implement any functionality to plugins, redefine any keys, etc. |
Getting there. Reality just gives examples of what I wrote above ;-) :
As you can see, you make a label/function at 0x0000BC72 which points inside an ARM instruction at 0x0000BC70, ScratchABit doesn't like that:
Can't have ARM instruction at address not divisible by 4. |
But this would be a case of "direct-addressing always flips ARM<->Thumb". It is jumping away from Thumb mode, uses BLX to a direct address, ... but since it's an odd address, it must be Thumb code. Why is it using BLX instead of BL here? |
I assume you means 2nd case, with 0x0000FFE6. So, at 0x387a, there's a Thumb instruction, which Capstone decodes as "blx fcn_0000ffe4". So, instruction at 0xffe4 must be an ARM instruction. Neither 0xffe6 nor 0xffe4 is odd address (0xffe5 would be). Actually, as I mentioned in the other ticket, this version of instruction doesn't even store lowest bits of offset: DDI 0406C.c, page 348:
I.e. immediate offset value stored in the instruction is just extended with 2 zero bits on the left (aka shifted by 2 left/multiplied by 4). "Odd addresses" come into play only if you deal with indirect instructions, with address in a register. |
It turned out that the root cause for these problems is a bug in the disassembler of OpenOCD, so unfortunately the disassembled instructions in my logfiles are wrong: https://sourceforge.net/p/openocd/tickets/175/ I am trying to filter those wrong instructions out now, please try again. |
And a patch for OpenOCD is also on the way: http://openocd.zylin.com/#/c/4382/1 |
Hmm, actually I could fix up the addresses myself too. Ok, please try again. |
Great, I don't see any artifacts caused by init.py. We can take the disassembly produced as a baseline, and now start to look how it can be improved on ScratchABit side. I can see at least a couple of issues, will dig into them as time permits. @thesourcerer8, Feel free to report any issues/possible improvements too, perhaps as a separate tickets, as this is a closed one anyway. |
D'oh, I mixed up with something else ;-). |
Anyway, continuing here re: issues with init.py, this time, more "serious", like possible problems with your tracing (or maybe openocd issues again):
Treating stuff at 0x0001E5E0 as Thumb instruction doesn't make much sense though, following rendition is much better:
As you can see, that part of code are thunks to perform long jumps to the part of code at 0x80000000. |
Another issue was that there was a bunch of UNEXPECTED flags=0x20, i.e. Thumb instructions marked, but not traced. Well, there's little choice apparently to make such instructions "entrypoints", like you did in your original patch. I feel it a little hacky, but as long as it's conditional and there's a way not to do it, I guess it's fine for now: a12404f . Witt that applied, the only (relevant) UNEXPECTED left is which is related to 0x0001e5e0 address above. |
Hi tried ScratchABit now again with my test files: http://www2.futureware.at/~philipp/ssd/analyse/EXT0CB6Q.dec.P21.frmw which contain both ARM and Thumb code, and it starts with ARM code at 0x0 and has a Thumb function that starts at 0x574.
According to a comment on the Kosagi forum, ScratchABit should support both ARM and thumb now, but I couldn't get it working. This is my .def file:
#cpu any_capstone # does not work
cpu arm_32_arm_capstone # works, only ARM
#cpu arm_32_capstone # works, only ARM
#cpu arm_32_thumb_capstone # only THUMB
#cpu arm_thumb # only THUMB
show bytes 4
area .bin 0x00(0x10000) rwx
load EXT0CB6Q.dec.P21.frmw 0x0
[entrypoints]
start = 0x000
Is there a way to define where ARM and where Thumb code is in interactive mode?
The text was updated successfully, but these errors were encountered: