What information exactly does an instruction cache store?
Processors use both data and instruction caches to reduce the number of slow accesses to main memory. However, while it is clear to me that the data cache's purpose is to store frequently used data items (such as elements of an array or values used inside a loop), I cannot see what exactly the instruction cache stores that helps reduce memory access times.
In the image above, we have an example of an "addi" instruction, which adds a constant value to the value stored in general-purpose register "r2" and writes the result to general-purpose register "r1".
After this instruction is executed, what exactly is saved to the cache?
- It can't just be the opcode: most CPU instruction sets contain a few hundred unique opcodes or fewer, so if the instruction cache were pre-loaded with all possible opcodes, it would always have a 100% hit rate. That would defeat the purpose of having a cache, and I've read that instruction cache misses are very much possible.
- It can't be the values from main memory that are loaded into the general-purpose registers, since that's exactly what the data cache is for.
Thank you in advance.
memory cpu computer-architecture cache
asked May 13 at 16:39 by MartinX
Why do you think it matters to the cache if a particular instruction was executed or not? Instructions usually don't change at runtime.
– Dmitry Grigoryev, May 14 at 6:50
3 Answers
The instruction cache stores the individual instructions of the currently executing program. What it holds is the program's own machine code. Main memory is often too slow (or has too much latency) to feed the CPU its next instruction every time it is ready for one. This is why a fast cache near the CPU is used: that cache is the instruction cache.
It literally stores lines of machine code from program memory, i.e. the entire instruction shown in your original post: opcode, register fields, and immediate value together.
The fact that you even discuss "storing all possible opcodes in cache" points to a deeper misunderstanding. Talking about storing all possible opcodes in a cache (or any memory, for that matter) has no meaning. All the possible opcodes that the processor can run are hard-wired into the logic circuitry of the processor. They aren't "stored" anywhere.
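To make "the entire instruction" concrete, here is a minimal sketch. It assumes a MIPS-like 32-bit I-type layout and an immediate of 100, neither of which is stated in the question; the function names are made up for illustration. The point is only that the cached item is the whole instruction word, so two `addi` instructions with different operands are different words.

```python
# Assumed MIPS-like I-type layout (an assumption, not taken from the question):
#   [31:26] opcode | [25:21] source reg | [20:16] dest reg | [15:0] immediate
ADDI_OPCODE = 0b001000  # the opcode value MIPS uses for addi


def encode_addi(dest: int, src: int, imm: int) -> int:
    """Pack 'addi dest, src, imm' into one 32-bit instruction word."""
    return (ADDI_OPCODE << 26) | (src << 21) | (dest << 16) | (imm & 0xFFFF)


def decode_fields(word: int) -> dict:
    """Split a 32-bit word back into its fields, as decode logic would."""
    return {
        "opcode": word >> 26,
        "src": (word >> 21) & 0x1F,
        "dest": (word >> 16) & 0x1F,
        "imm": word & 0xFFFF,
    }


# 'addi r1, r2, 100' (the constant 100 is a hypothetical value)
word = encode_addi(dest=1, src=2, imm=100)
print(f"{word:#010x}")  # prints 0x20410064

# Same opcode, different destination register: a different word in memory,
# and therefore a different item as far as the instruction cache is concerned.
other = encode_addi(dest=3, src=2, imm=100)
print(word != other)  # prints True
```

The opcode is just one field of the word. The cache holds copies of these words at their memory addresses and knows nothing about which fields they contain.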
answered May 13 at 16:59 by Toor (edited May 13 at 17:04)
Note that this is only true for most CPUs. Recent Intel x86s store decoded micro-operations (i.e. the output of an early stage of the execution process), and I think AMD may have also switched to a micro-op cache rather than a strict instruction cache.
– Mark, May 13 at 20:29
@MartinX When you say "the entire instructions were somehow hard wired", are you saying that you thought something like the entire "ADD, Reg1, Reg2" was hard-wired, and then something like "ADD, Reg2, Reg3" was a separate hard-wiring? Because that's not the case. Not every possible combination of opcode and arguments has its own circuitry hard-wired into the CPU.
– Toor, May 13 at 21:50
@Mark: Intel P4 had a trace cache instead of an L1i cache. This worked out badly and was a big bottleneck (because it was slow to build traces on misses with its weak decoders). Intel since Sandybridge (realworldtech.com/sandy-bridge) and AMD since Zen still have regular L1i caches that cache x86 machine code bytes, but also have smaller, very fast decoded-uop caches. They still have powerful decoders for good throughput on uop cache misses, and it's not a trace cache. (A uop cache line can only cache contiguous uops from one 32B chunk, instead of following jumps.)
– Peter Cordes, May 13 at 22:04
@Mark: Some older AMD CPUs do store extra metadata alongside the L1i cache: they mark instruction boundaries in the cache to speed up decode. See Agner Fog's microarch pdf. Also, David Kanter mentions the pre-decode metadata in realworldtech.com/bulldozer/4. More info about it in his K10 write-up: realworldtech.com/barcelona/4
– Peter Cordes, May 13 at 22:10
@Toor: Intel calls their decoded-uop cache the "Decode Stream Buffer (DSB)", including in HW perf counter event names. Physically, it's very much built as an associative cache, with each "way" of a set holding up to 6 uops. It's indexed and tagged by virtual address (so it bypasses TLB lookups). Of course caches are built out of "tightly coupled" SRAM arrays, but what makes them caches is the management system and the lookup / indexing mechanism.
– Peter Cordes, May 13 at 22:16
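The "lookup / indexing mechanism" mentioned in the last comment can be sketched as the way an address is split into a tag, a set index, and a byte offset. This is a generic illustration with assumed geometry (32 KiB, 8 ways, 64-byte lines); it is not the layout of Intel's DSB or of any particular L1i cache, and `split_address` is a hypothetical helper name.

```python
def split_address(addr: int, cache_bytes: int = 32 * 1024,
                  ways: int = 8, line_bytes: int = 64):
    """Split an address into (tag, set_index, offset) for a set-associative cache.

    The default geometry is an assumption chosen for illustration only.
    """
    num_sets = cache_bytes // (ways * line_bytes)  # 64 sets with the defaults
    offset = addr % line_bytes                      # byte within the line
    set_index = (addr // line_bytes) % num_sets     # which set to search
    tag = addr // (line_bytes * num_sets)           # compared against each way
    return tag, set_index, offset


a = split_address(0x401234)
b = split_address(0x402234)
print(a)  # prints (1025, 8, 52)
print(b)  # prints (1026, 8, 52)
```

Both addresses select set 8 but carry different tags, so they can sit in different ways of the same set. A lookup is a hit when some valid way in the selected set holds a matching tag.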
The Instruction cache stores the most recently used instructions and their addresses so that if an instruction needs to be repeated it doesn't have to be retrieved from main memory - this is much quicker.
For example the first time a loop is performed the instructions will be retrieved from main memory and simultaneously placed into the cache. On subsequent iterations of the loop the instructions can then be quickly retrieved from the fast cache memory.
The addresses are stored in the cache together with information that indicates whether the cache is up-to-date so the CPU control knows whether it can use the cached instructions or needs to go to main memory.
edited May 13 at 17:15
answered May 13 at 17:01
Kevin White
$begingroup$
The instruction cache stores individual instructions of the currently executing program for the CPU; what it holds is a copy of the program code itself. Main memory is often too slow (or has too much latency) to feed the CPU its next instruction every time it is ready for one. This is why a fast cache close to the CPU is used: the instruction cache.
$endgroup$
edited May 13 at 19:11
answered May 13 at 16:49
evildemonic
$begingroup$
Why do you think it matters to the cache if a particular instruction was executed or not? Instructions usually don't change at runtime.
$endgroup$
– Dmitry Grigoryev
May 14 at 6:50