Monday, October 14, 2019
Cache Memory Plays A Lead Role Information Technology Essay
Answer:
Cache (pronounced "cash") memory is extremely fast memory that is built into a computer's central processing unit (CPU) or located next to it on a separate chip. The CPU uses cache memory to store instructions that are repeatedly required to run programs, improving overall system speed. It gives the CPU faster access to frequently or recently used data.

References: http://www.wisegeek.com/what-is-cache-memory.htm

Reason for Cache Memory:
There are various reasons for using a cache in a computer; some of them are described below.

RAM is very slow compared to the CPU, and it is also far from the CPU (connected through a bus). There is therefore a need for another, smaller memory that is very close to the CPU and also very fast, so that the CPU does not sit idle while it waits for data from main memory. This memory is known as cache memory. It is also a kind of RAM, but it is much faster than primary memory (main RAM). Because the CPU works on a time scale of nanoseconds or less, physical distance also plays a major role in performance.

Cache memory is designed to supply the CPU with the most frequently requested data and instructions. Because retrieving data from cache takes a fraction of the time that it takes to access it from main memory, having cache memory can save a lot of time.

When we work with more than one application, cache memory helps the processor locate the data and instructions of the running applications within nanoseconds, which improves the performance of the system. Cache memory communicates directly with the processor. It reduces the speed mismatch between processor and memory when the user switches from one application to another. It holds the recently used data and instructions of the applications that are currently running.
For example, a web browser stores recently visited web pages in a cache directory, so that we can return promptly to a page without requesting it from the original server. When we click the Reload button, the browser compares the cached page with the current page out on the network and updates our local version if required.

References:
1. http://www.kingston.com/tools/umg/umg03.asp
2. http://www.kingston.com/frroot/tools/umg/umg03.asp
3. http://ask.yahoo.com/19990329.html

How Cache Works?
Answer:
The cache is programmed (in hardware) to hold recently accessed memory locations in case they are needed again. Each instruction is saved in the cache after being loaded from memory the first time. The next time the processor wants to use the same instruction, it checks the cache first, sees that the instruction it needs is there, and loads it from cache instead of going to the slower system RAM. The number of instructions that can be buffered this way is a function of the size and design of the cache.

The details of how cache memory works vary between cache controllers and processors, so I won't describe the exact details. In general, though, cache memory works by attempting to predict which memory the processor is going to need next, loading that memory before the processor needs it, and saving the results after the processor is done with it.

Whenever the byte at a given memory address needs to be read, the processor first attempts to get the data from cache memory. If the cache doesn't have that data, the processor is stalled while the data is loaded from main memory into the cache. At the same time, the memory around the required data is also loaded into the cache.

When data is loaded from main memory into the cache, it usually has to replace something that is already in the cache. When this happens, the cache determines whether the memory that is going to be replaced has changed.
If it has, the cache first saves the changes to main memory and then loads the new data. The cache system doesn't worry about data structures at all, only about whether a given address in main memory is in the cache or not. In fact, if you are familiar with virtual memory, where the hard drive is used to make it appear that a computer has more RAM than it really does, cache memory is similar.

Let's take a library as an example of how caching works. Imagine a large library with only one librarian (the standard one-CPU setup). The first person comes into the library and asks for a CSA book (by Irv Englander). The librarian follows the path to the bookshelves (the memory bus), retrieves the book and gives it to the person. The book is returned to the library once the person has finished with it. Without a cache, the book would go back to the shelf. When the next person arrives and asks for the same CSA book, the same process happens and takes the same amount of time. With a cache, the librarian keeps recently returned books at the desk, so a repeated request is served without another trip to the shelves.

Cache memory is like a hot list of instructions needed by the CPU. The memory manager saves in cache each instruction the CPU needs; each time the CPU gets an instruction it needs from cache, that instruction moves to the top of the hot list. When the cache is full and the CPU calls for a new instruction, the system overwrites the data in cache that hasn't been used for the longest period of time. This way, the high-priority information that is used continuously stays in cache, while the less frequently used information drops out after a while.

It is similar to the Start menu: when you use a program frequently, it is listed on the Start menu, so you do not have to find it in the list of all programs. You simply open the Start menu and click on the program listed there, which saves you time.
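The hot list described above can be sketched in a few lines of Python. This is only an illustration of least-recently-used replacement, not how any real cache controller is built: the class name LRUCache, the capacity of two entries, and the dictionary standing in for main memory are all assumptions made for the example.

```python
from collections import OrderedDict


class LRUCache:
    """Toy hot list: keeps recently used entries, evicts the least recently used."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # address -> value, least recently used first
        self.hits = 0
        self.misses = 0

    def read(self, address, main_memory):
        if address in self.entries:
            self.hits += 1
            self.entries.move_to_end(address)  # move to the top of the hot list
            return self.entries[address]
        self.misses += 1
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # drop the entry unused for longest
        value = main_memory[address]  # the slow path: go to main memory
        self.entries[address] = value
        return value


# Hypothetical main memory and access pattern, chosen only for illustration.
main_memory = {addr: addr * 10 for addr in range(8)}
cache = LRUCache(capacity=2)
for addr in [0, 1, 0, 2, 0, 1]:
    cache.read(addr, main_memory)
print("hits:", cache.hits, "misses:", cache.misses)
```

Address 0 is used continuously and therefore stays in the cache, while addresses 1 and 2 push each other out.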
Working of cache in the Pentium 4:
L1 cache: 8 KB, 64-byte lines, four-way set associative
L2 cache: 256 KB, 128-byte lines, eight-way set associative

References:
http://computer.howstuffworks.com/cache.htm
http://www.kingston.com/tools/umg/umg03.asp
http://www.zak.ict.pwr.wroc.pl/nikodem/ak_materialy/Cache%20organization%20by%20Stallings.pdf

Levels of Cache

Level 1 Cache (L1):
The Level 1 cache, or primary cache, is on the CPU and is used for temporary storage of instructions and data organised in blocks of 32 bytes. Primary cache is the fastest form of storage. Because it is built into the chip with a zero wait-state (delay) interface to the processor's execution unit, it is limited in size.

Level 1 cache is implemented using Static RAM (SRAM) and until recently was traditionally 16KB in size. An SRAM cell uses several transistors per bit (six in the common design) and can hold data without external assistance for as long as power is supplied to the circuit. At its core are two cross-coupled transistor stages, each controlling the output of the other: a circuit known as a flip-flop, so called because it has two stable states which it can flip between. This is in contrast to dynamic RAM (DRAM), which must be refreshed many times per second in order to hold its data contents.

Intel's P55 MMX processor, launched at the start of 1997, was noteworthy for the increase in size of its Level 1 cache to 32KB. The AMD K6 and Cyrix M2 chips launched later that year upped the ante further by providing Level 1 caches of 64KB. 64KB has remained the standard L1 cache size, though various multiple-core processors may utilise it differently.

For all L1 cache designs, the control logic of the primary cache keeps the most frequently used data and code in the cache and updates external memory only when the CPU hands over control to other bus masters, or during direct memory access by peripherals such as optical drives and sound cards.
http://www.pctechguide.com/14Memory_L1_cache.htm

Level 2 Cache (L2):
Most PCs are offered with a Level 2 cache to bridge the processor/memory performance gap. Level 2 cache (also referred to as secondary cache) uses the same control logic as Level 1 cache and is also implemented in SRAM.

Level 2 cache typically comes in two sizes, 256KB or 512KB, and can be found soldered onto the motherboard, in a Card Edge Low Profile (CELP) socket or, more recently, on a COAST module. The latter resembles a SIMM but is a little shorter and plugs into a COAST socket, which is normally located close to the processor and resembles a PCI expansion slot.

The aim of the Level 2 cache is to supply stored information to the processor without any delay (wait-state). For this purpose, the bus interface of the processor has a special transfer protocol called burst mode. A burst cycle consists of four data transfers where only the address of the first 64-bit transfer is output on the address bus.

The most common Level 2 cache is synchronous pipeline burst. A synchronous cache requires a chipset, such as Triton, that supports it. It can provide a 3-5% increase in PC performance because it is timed to a clock cycle. This is achieved by the use of specialised SRAM technology which has been developed to allow zero wait-state access for consecutive burst read cycles. There is also asynchronous cache, which is cheaper and slower because it isn't timed to a clock cycle. Asynchronous SRAM is available in speeds between 12 and 20ns.
(http://www.pctechguide.com/14Memory_L2_cache.htm)

(picture: http://www.karbosguide.com/books/pcarchitecture/images/976.png)

L3 cache
Level 3 cache is something of a luxury item. Often only high-end workstations and servers need L3 cache. Currently, for consumers, only the Pentium 4 Extreme Edition even features L3 cache. L3 has been both on-die, meaning part of the CPU, and external, meaning mounted near the CPU on the motherboard.
It comes in many sizes and speeds.

The point of cache is to keep the processor pipeline fed with data. CPU cores are typically the fastest part of the computer. As a result, cache is used to pre-read or store frequently used instructions and data for quick access. Cache acts as a high-speed buffer memory that provides the CPU with data more quickly. So the concept of CPU cache levels is one of performance optimization for the processor.
http://www.extremetech.com/article2/0,2845,1517372,00.asp

The image below shows the complete cache hierarchy of the Shanghai processor. Barcelona has a similar hierarchy, except that it only has 2MB of L3 cache.

(picture: L3_Cache_Architecture, http://developer.amd.com/PublishingImages/L3_Cache_Architecture.jpg)

Cache Memory Organisation
In a modern microprocessor several caches are found. They vary not only in size and functionality; their internal organization is also typically different from cache to cache.

Instruction Cache
The instruction cache is used to store instructions. This helps to reduce the cost of going to memory to fetch instructions. The instruction cache often holds several other things as well, such as branch prediction information. In certain cases, this cache can even perform some limited operations. The instruction cache on UltraSPARC, for example, also pre-decodes the incoming instruction.

Data Cache
A data cache is a fast buffer that contains the application data. Before the processor can operate on the data, it must be loaded from memory into the data cache. The element needed is then loaded from the cache line into a register, and the instruction using this value can operate on it. The resulting value of the instruction is also stored in a register. The register contents are then stored back into the data cache. Eventually the cache line that this element is part of is copied back into main memory. In some cases, the cache can be bypassed and data is stored into the registers directly.
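The data cache flow described above (load a line, work in a register, store back into the cache, copy the line back to memory later) can be sketched as a small write-back cache. This is a simplified model under stated assumptions: the class WriteBackCache, the line size of four words, the two-line capacity and the list standing in for main memory are all hypothetical choices for the example.

```python
LINE_WORDS = 4  # words per cache line (illustrative assumption)


class WriteBackCache:
    """Toy data cache: stores update the cache line; memory is updated on eviction."""

    def __init__(self, memory, num_lines=2):
        self.memory = memory        # list of words standing in for main memory
        self.num_lines = num_lines
        self.lines = {}             # slot -> [block number, line data, dirty flag]

    def _write_back(self, line):
        start = line[0] * LINE_WORDS
        self.memory[start:start + LINE_WORDS] = line[1]
        line[2] = False

    def _line_for(self, address):
        block = address // LINE_WORDS
        slot = block % self.num_lines
        line = self.lines.get(slot)
        if line is None or line[0] != block:
            if line is not None and line[2]:
                self._write_back(line)  # copy the modified line back first
            start = block * LINE_WORDS
            line = [block, self.memory[start:start + LINE_WORDS], False]
            self.lines[slot] = line
        return line

    def load(self, address):
        return self._line_for(address)[1][address % LINE_WORDS]

    def store(self, address, value):
        line = self._line_for(address)
        line[1][address % LINE_WORDS] = value
        line[2] = True  # mark the line as changed (dirty)


memory = list(range(16))
cache = WriteBackCache(memory)
register = cache.load(5)       # element loaded from the cache line into a register
register = register + 100      # the instruction operates on the register
cache.store(5, register)       # register contents stored back into the data cache
stale = memory[5]              # main memory has not been updated yet
cache.load(13)                 # maps to the same slot, so the dirty line is evicted
updated = memory[5]            # now the line has been copied back
print(stale, updated)
```

Main memory only sees the new value once the modified line is pushed out of the cache.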
TLB Cache
Translating a virtual page address to a valid physical address is rather costly. The TLB (translation lookaside buffer) is a cache that stores these translated addresses. Each entry in the TLB maps to an entire virtual memory page. The CPU can only operate on data and instructions that are mapped into the TLB. If this mapping is not present, the system has to re-create it, which is a relatively costly operation.

The larger a page, the more effective capacity the TLB has. If an application does not make good use of the TLB (for example, because of random memory access), increasing the size of the page can be beneficial for performance, allowing a bigger part of the address space to be mapped into the TLB.

Some microprocessors, including UltraSPARC, implement two TLBs: one for pages containing instructions (I-TLB) and one for data pages (D-TLB).

An example of a typical cache organization is shown below:

Cache Memory Principles
• Small amount of fast memory
• Placed between the processor and main memory
• Located either on the processor chip or on a separate module

Cache Operation Overview
• The processor requests the contents of some memory location
• The cache is checked for the requested data
• If found, the requested word is delivered to the processor
• If not found, a block of main memory is first read into the cache, then the requested word is delivered to the processor

When a block of data is fetched into the cache to satisfy a single memory reference, it is likely that there will be future references to that same memory location or to other words in the block. This is the principle of locality of reference. Each block has a tag added to identify it.

Mapping Function
An algorithm is needed to map main memory blocks into cache lines, and a method is needed to determine which main memory block occupies a cache line. There are three techniques used:
• Direct
• Fully Associative
• Set Associative

Direct Mapping:
Direct mapped is a simple and efficient organization.
The (virtual or physical) memory address of the incoming cache line controls which cache location is going to be used. Implementing this organization is straightforward, and it is relatively easy to make it scale with the processor clock. In a direct mapped organization, the replacement policy is built in, because cache line replacement is controlled by the (virtual or physical) memory address.

Direct mapping assigns each memory block to a specific line in the cache. If a line is already taken up by a memory block when a new block needs to be loaded, the old block is discarded. The figure below shows how multiple blocks are mapped to the same line in the cache. This line is the only line that each of these blocks can be sent to. In the case of this figure, there are 8 bits in the block identification portion of the memory address.

Consider a simple example: a 4-kilobyte cache with a line size of 32 bytes, direct mapped on virtual addresses. Each load/store to the cache thus moves 32 bytes. If one variable of type float takes 4 bytes on our system, each cache line will hold eight (32/4=8) such variables.

(picture: http://csciwww.etsu.edu/tarnoff/labs4717/x86_sim/images/direct.gif)

The address for this is broken down something like the following:

Tag | 8 bits identifying line in cache | word id bits

Direct mapping is simple and inexpensive to implement, but if a program repeatedly accesses 2 blocks that map to the same line, the cache begins to thrash back and forth, reloading the line over and over again, which means the miss rate is very high.

Fully Associative:
The fully associative cache design solves the potential problem of thrashing with a direct-mapped cache. The replacement policy is no longer a function of the memory address, but considers usage instead. With this design, typically the cache line that has gone unused for the longest time is evicted from the cache. This policy is called least recently used (LRU). In the previous example, LRU prevents the cache lines of a and b from being moved out prematurely.
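To make the contrast concrete, the sketch below counts misses for the 4-kilobyte, 32-byte-line cache from the example above, once direct mapped and once fully associative with LRU replacement. It is a rough illustration only: the two addresses a_address and b_address are hypothetical, chosen exactly one cache size apart so that they compete for the same line, and the function names are made up for this sketch.

```python
from collections import OrderedDict

CACHE_BYTES = 4 * 1024                  # 4-kilobyte cache, as in the example
LINE_BYTES = 32                         # 32-byte lines
NUM_LINES = CACHE_BYTES // LINE_BYTES   # 128 lines
FLOATS_PER_LINE = LINE_BYTES // 4       # eight 4-byte floats per line


def direct_mapped_misses(addresses):
    """Each block can live in exactly one line, chosen by its address."""
    lines = [None] * NUM_LINES
    misses = 0
    for address in addresses:
        block = address // LINE_BYTES
        slot = block % NUM_LINES
        if lines[slot] != block:
            misses += 1
            lines[slot] = block  # the old block is discarded
    return misses


def fully_associative_misses(addresses):
    """Any block can live in any line; the least recently used one is evicted."""
    lines = OrderedDict()
    misses = 0
    for address in addresses:
        block = address // LINE_BYTES
        if block in lines:
            lines.move_to_end(block)
        else:
            misses += 1
            if len(lines) >= NUM_LINES:
                lines.popitem(last=False)
            lines[block] = True
    return misses


# Hypothetical addresses that map to the same line of the direct-mapped cache.
a_address = 0
b_address = CACHE_BYTES
trace = [a_address, b_address] * 10   # alternate between the two, ten times

direct_misses = direct_mapped_misses(trace)
assoc_misses = fully_associative_misses(trace)
print("direct mapped:", direct_misses, "fully associative:", assoc_misses)
```

With this access pattern the direct-mapped cache misses on every one of the 20 accesses, while the fully associative cache misses only on the first touch of each block.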
The downside of a fully associative design is cost. Additional logic is required to track usage of lines. The larger the cache size, the higher the cost. Therefore, it is difficult to scale this technology to very large (data) caches.

In a fully associative cache the address is broken into two parts: a tag used to identify which block is stored in which line of the cache (s bits) and a fixed number of least significant bits identifying the word within the block.

Tag | word id bits

Luckily, a good alternative exists.

Set Associative:
Set associative mapping addresses the problem of possible thrashing in the direct mapping method. Instead of having exactly one line that a block can map to in the cache, a few lines are grouped together to create a set. A block in memory can then map to any one of the lines of a specific set. There is still only one set that the block can map to, so part of the address selects that set.

Tag | set id bits | word id bits
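As a worked example, the Pentium 4 figures quoted earlier (L1: 8 KB, 64-byte lines, four-way; L2: 256 KB, 128-byte lines, eight-way) can be turned into a number of sets and an address split. The sketch assumes the usual textbook layout of tag, set index and byte offset, with all sizes being powers of two; the function names and the sample address 0x12345 are made up for illustration and say nothing about how the real hardware is wired.

```python
def cache_geometry(cache_bytes, line_bytes, ways):
    """Return (number of sets, offset bits, set index bits) for a cache."""
    num_lines = cache_bytes // line_bytes
    num_sets = num_lines // ways
    offset_bits = line_bytes.bit_length() - 1   # log2, valid for powers of two
    set_bits = num_sets.bit_length() - 1
    return num_sets, offset_bits, set_bits


def split_address(address, offset_bits, set_bits):
    """Split an address into (tag, set index, offset within the line)."""
    offset = address & ((1 << offset_bits) - 1)
    set_index = (address >> offset_bits) & ((1 << set_bits) - 1)
    tag = address >> (offset_bits + set_bits)
    return tag, set_index, offset


l1 = cache_geometry(8 * 1024, 64, 4)      # Pentium 4 L1 figures from the text
l2 = cache_geometry(256 * 1024, 128, 8)   # Pentium 4 L2 figures from the text
print("L1 sets, offset bits, set bits:", l1)
print("L2 sets, offset bits, set bits:", l2)

# Hypothetical address, split using the L1 geometry.
tag, set_index, offset = split_address(0x12345, l1[1], l1[2])
print("tag:", tag, "set:", set_index, "offset:", offset)
```

Under these assumptions the L1 cache has 32 sets of four lines each, so a block may sit in any of the four lines of its one set, and the tag tells the cache which block is actually there.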