MIPS Branch Delay Slot Instruction
- Compiler effectiveness for a single branch delay slot (CSE 240A, Dean Tullsen): the compiler fills about 60% of branch delay slots; about 80% of the instructions executed in branch delay slots are useful in computation; so about 50% (60% x 80%) of slots are usefully filled. Key point: it is hard to keep the pipeline completely full.
- Branch delay slot: because instructions execute in parallel in the pipeline, the instruction that follows a branch has already been fetched by the time the branch target is resolved. In other words, the instruction following a branch always executes, whether the branch is taken or not.
- I've spoken about the branch instructions before (and delay slots). Memory accesses also have load delay slots. For MIPS, the best case situation is that the memory address is in the cache, which results in a 1-cycle hazard due to an optimized path between DC ('Data-cache' stage) and RF.
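The fill-rate arithmetic in the first bullet above can be checked with a short sketch. The function names and the 20% branch frequency are hypothetical values chosen for illustration, not figures from the source.

```python
def useful_slot_fraction(fill_rate, useful_rate):
    """Fraction of branch delay slots that end up doing useful work."""
    return fill_rate * useful_rate


def wasted_cycles_per_instruction(branch_fraction, fill_rate, useful_rate):
    """Average cycles lost per instruction to empty or useless delay slots.

    branch_fraction is an assumed share of branches in the instruction mix.
    """
    return branch_fraction * (1.0 - useful_slot_fraction(fill_rate, useful_rate))


frac = useful_slot_fraction(0.60, 0.80)
print(f"usefully filled slots: {frac:.0%}")  # 48%, i.e. "about 50%"

# Assumption for illustration only: 20% of executed instructions are branches.
print(wasted_cycles_per_instruction(0.20, 0.60, 0.80))
```

Under these assumptions roughly half of the delay slots still cost a cycle, which is one way to read the "hard to keep the pipeline completely full" remark.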
I am dealing with a standard MIPS architecture. If I have a branch instruction, for instance beq, I know the result of the comparison in the execute stage. However, the branching logic is actually in the memory stage.
Filling the delay slot of a 32-bit jump instruction with a 16-bit instruction can cause issues. According to the documentation, such an operation is unpredictable. Multiple tests from the test-suite that fail on microMIPS show this spot as the source of failure. This patch adds the opcode Mips::PseudoIndirectBranch_MM alongside Mips::PseudoIndirectBranch and other instructions that are expanded to a jr instruction and do not allow a 16-bit instruction in their delay slots. The revision was created by mbrkusanin (Feb 21 2019), accepted by sdardis, and closed by commit rL354672: [mips][micromips] fix filling delay slots for PseudoIndirectBranch_MM (authored by petarj, Feb 22 2019).
Part 6: Jumps and Branches
The CPU will execute instructions in sequential order from top to bottom. This is good for simple programs but what if we want to do more than just add two numbers? For more complex programs, we may need loops or other conditions and these are accomplished through jumps and branches.
What is a branch?
A branch is like walking and reaching a fork in the road. There are two paths from where you are standing and depending on which path you choose, you will go somewhere else.
When disassembling a program and viewing the graph mode, a typical representation of a branch may look like this:
The fork in the road is the end of the first block, where a green arrow and a red arrow lead to separate blocks of code. The green path is taken if the branch condition is true, and the red path is taken if the branch condition is false.
In this example, bnez $v0, 0x814, the code takes the green path if $v0 does NOT contain the number 0; otherwise it takes the red path.
Branch Delay Slot
Branching seems simple enough; however, in the MIPS implementation it is a little more nuanced.
When the processor executes an instruction, the program counter is advanced during the Instruction Fetch (IF) stage.
Because of the pipeline structure, the instruction after a jump or branch has already been put into the pipeline while the jump or branch is still being executed. To deal with this, MIPS implements something called the branch delay slot.
Control hazards occur because the value of $pc after a branch is not known until the processor has determined whether the branch should be taken.
Instead of throwing the next instruction away when taking the branch, the instruction that is directly below a branch or jump always runs whether the branch is taken or not. This is why the instruction position after a branch or jump is called the branch delay slot.
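The "always runs" behaviour can be sketched with a tiny interpreter that keeps both a current and a next program counter. This is a minimal model for illustration, not a description of real hardware: the tuple encoding, the `run` and `demo` helpers, and the `addiu` placed in the delay slot are all hypothetical.

```python
def run(program, regs, start, max_steps=32):
    """Execute `program` (dict: address -> tuple) with one branch delay slot.

    A taken branch only changes the *next-next* address, so the instruction
    right after the branch always executes. Returns the executed addresses.
    """
    pc, npc = start, start + 4
    trace = []
    for _ in range(max_steps):
        if pc not in program:          # ran off the end of the toy program
            break
        trace.append(pc)
        op, *args = program[pc]
        target = None
        if op == "addiu":              # addiu rd, rs, imm
            rd, rs, imm = args
            regs[rd] = (regs[rs] + imm) & 0xFFFFFFFF
        elif op == "bnez":             # bnez rs, target
            rs, dest = args
            if regs[rs] != 0:
                target = dest
        elif op != "nop":
            raise ValueError(f"unsupported op: {op}")
        # The delay slot (old npc) becomes the current instruction either way.
        pc, npc = npc, (npc + 4 if target is None else target)
    return trace


PROGRAM = {
    0x7E8: ("bnez", "$v0", 0x814),
    0x7EC: ("addiu", "$t0", "$t0", 1),    # delay slot: runs in both cases
    0x7F0: ("addiu", "$a0", "$zero", 0),  # fall-through path
    0x814: ("addiu", "$a0", "$zero", 1),  # taken path
}


def demo(v0):
    regs = {"$zero": 0, "$v0": v0, "$t0": 0, "$a0": -1}
    trace = run(PROGRAM, regs, start=0x7E8)
    return [hex(a) for a in trace], regs["$t0"], regs["$a0"]


print(demo(5))  # branch taken
print(demo(0))  # branch not taken
```

In both runs the instruction at 0x7ec executes and increments $t0; only the address that follows it differs.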
As mentioned before, $pc is incremented by 4 during the instruction fetch stage; however, when the branch is taken, $pc is overwritten with the branch target during the branch instruction's execution (EX) stage. This determines which instruction is brought into the pipeline after the branch delay slot.
For the following instructions, if the branch is taken, lw $v0, -0x7fd0($gp) from address 0x814 will be put into the pipeline after the delay slot:
$pc | Instruction / Cycle | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|---|
0x7e4 | lw $v0, 0x18($fp) | IF | ID | EX | ME | WB | | | | |
0x7e8 | bnez $v0, 0x814 | | IF | ID | EX | ME | WB | | | |
0x7ec | nop | | | IF | ID | EX | ME | WB | | |
0x814 | lw $v0, -0x7fd0($gp) | | | | IF | ID | EX | ME | WB | |
0x818 | addiu $a0, $v0, str.x_is_NOT_0 | | | | | IF | ID | EX | ME | WB |
If the branch is NOT taken, $pc has not been modified by the branch instruction's execution stage, so the instruction from address 0x7f0 will be in the pipeline:
$pc | Instruction / Cycle | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|---|
0x7e4 | lw $v0, 0x18($fp) | IF | ID | EX | ME | WB | | | | |
0x7e8 | bnez $v0, 0x814 | | IF | ID | EX | ME | WB | | | |
0x7ec | nop | | | IF | ID | EX | ME | WB | | |
0x7f0 | lw $v0, -0x7fd0($gp) | | | | IF | ID | EX | ME | WB | |
0x7f4 | addiu $a0, $v0, str.x_is_0 | | | | | IF | ID | EX | ME | WB |
Note: When there is no useful work to place in the branch delay slot, a nop is often placed there instead.
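The staggered layout of these pipeline tables can be generated with a short sketch. It assumes an idealised five-stage pipeline in which each instruction enters IF one cycle after the previous one and nothing stalls; the helper names `pipeline_rows` and `format_table` are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "ME", "WB"]


def pipeline_rows(n_instructions, stages=STAGES):
    """Return one list of cycle cells per instruction.

    Instruction i (0-based) occupies stage s in cycle i + s + 1; all other
    cells are empty strings. No stalls or flushes are modelled.
    """
    total_cycles = n_instructions + len(stages) - 1
    rows = []
    for i in range(n_instructions):
        cells = [""] * total_cycles
        for s, stage in enumerate(stages):
            cells[i + s] = stage
        rows.append(cells)
    return rows


def format_table(instructions):
    """Render (address, text) pairs as a pipe-separated pipeline table."""
    rows = pipeline_rows(len(instructions))
    n_cycles = len(rows[0]) if rows else 0
    header = ["$pc", "Instruction"] + [str(c) for c in range(1, n_cycles + 1)]
    lines = [" | ".join(header)]
    for (addr, text), cells in zip(instructions, rows):
        lines.append(" | ".join([hex(addr), text] + [c or "  " for c in cells]))
    return "\n".join(lines)


taken_path = [
    (0x7E4, "lw $v0, 0x18($fp)"),
    (0x7E8, "bnez $v0, 0x814"),
    (0x7EC, "nop"),                      # branch delay slot
    (0x814, "lw $v0, -0x7fd0($gp)"),     # branch target
    (0x818, "addiu $a0, $v0, str.x_is_NOT_0"),
]
print(format_table(taken_path))
```

Swapping the last two entries for the instructions at 0x7f0 and 0x7f4 produces the not-taken table; the stage pattern is identical because the delay slot is executed either way.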
So what is a jump?
Jumps and branches both modify the program counter ($pc) in order to change code flow of the program and both utilize the branch delay slot; however, they are different based on how they modify the program counter.
Branches are conditional and are used to provide logic to the program and make it do different things. Branches need several bits in the machine code instruction for the registers being compared, so they have fewer bits left for the location of the branch: 16 bits in the classic MIPS encoding. That's why branches use a signed offset from the program counter and can't go as far as a jump.
Jumps are unconditional, so j and jal have more bits available: a 26-bit field that is shifted left by 2 and combined with the upper 4 bits of the address of the delay slot to form the target address.
In short, jumps can reach code that is a greater distance away in memory using a (nearly) absolute address, whereas branches are conditional, relative to $pc, and have a shorter range.
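The difference between the two addressing schemes can be sketched as follows, assuming the classic MIPS32 encodings (a 16-bit signed word offset for branches and a 26-bit word index for j/jal, both relative to the address of the delay slot). The function names are hypothetical.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of `value` as a signed number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(branch_pc, imm16):
    """PC-relative target: delay-slot address plus the signed offset times 4."""
    return (branch_pc + 4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF


def jump_target(jump_pc, index26):
    """j/jal target: upper 4 bits of the delay-slot address, then index times 4."""
    return ((jump_pc + 4) & 0xF0000000) | ((index26 & 0x03FFFFFF) << 2)


# The bnez at 0x7e8 reaches 0x814 with an offset of 10 instructions,
# counted from the delay slot at 0x7ec.
print(hex(branch_target(0x7E8, 10)))

# A j at the same address reaches 0x814 with the word index 0x814 >> 2.
print(hex(jump_target(0x7E8, 0x814 >> 2)))
```

With these encodings a branch can reach about 128 KB in either direction, while a jump can reach any word in the current 256 MB region; leaving that region requires a register jump such as jr.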
Further Reading
1. Branch Delay Slot Tricks
2. Delay Slot [ Wikipedia ]
3. Branch Prediction Schemes (IA State)