
The Buffer Cache




We now know that files are stored on the hard disk and that processes can access existing files and create new ones. When a process requests a file, the kernel brings the file into main memory, where the process can read, change, or otherwise access it.
The kernel reads the super block to learn the layout of the file system, and the inode table to find a file's metadata. Whenever a process wants to access a file's data, the kernel reads the file's inode into memory, and it writes the inode back to the disk when the process is done with the file.
The kernel could transfer file data directly between the disk and memory on every access, but response time and throughput would be very low because of the disk's slow data-transfer speed.
To minimize the frequency of disk access, the kernel keeps a pool of internal buffers that hold recently and frequently accessed data. This pool is called the buffer cache.
When a process wants to read a file, the kernel first looks for the data in the buffer cache. If the data is found there, it is returned to the process without a disk access. If it is not found, the kernel reads it from the disk and then keeps it in the buffer cache so that it is available for subsequent requests.
To reduce disk-access frequency further, the kernel may also implement read-ahead (pre-caching) and delayed-write strategies.

Buffer Headers

When the system initializes, the kernel allocates space for the buffer cache. Each buffer consists of two parts: a data array that holds the data read from the disk, and a buffer header that identifies it.
The data in a buffer corresponds to a logical disk block of a file system; the buffer cache is an "in memory" representation of those disk blocks. The mapping is temporary, because the kernel may later load some other block's data into the same buffer.
The kernel never allows two buffers to hold the same disk block at the same time, as that could lead to inconsistencies: there is one and only one copy of any given block in the cache.
The buffer header contains metadata such as the device number and the block number the buffer currently maps. It stores the logical (file system) device number, not the physical device number. The header also contains a pointer to the buffer's data array (i.e. a pointer to the data region).
The buffer header also records the status of the buffer. The status is a combination of the following conditions:
·      The buffer is currently locked ("busy") or unlocked (free).
·      The buffer contains valid data.
·      The kernel must write the buffer's contents to disk before reassigning the buffer ("delayed write").
·      The kernel is currently reading the buffer's contents from disk or writing them to disk.
·      A process is waiting for the buffer to become free.
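The header fields described above can be collected into a small illustrative structure. This is a minimal Python sketch following the description in this section; the field names and the 1024-byte block size are assumptions for illustration, not taken from any real kernel source.

```python
from dataclasses import dataclass, field

@dataclass
class BufferHeader:
    """Illustrative model of a buffer header in the buffer cache."""
    device: int = -1             # logical (file system) device number
    block: int = -1              # disk block number this buffer maps
    locked: bool = False         # buffer is locked/busy
    valid: bool = False          # data array holds valid disk contents
    delayed_write: bool = False  # must be flushed before reassignment
    in_io: bool = False          # kernel is currently reading/writing it
    wanted: bool = False         # a process is waiting for this buffer
    # pointer to the data region; 1024 bytes is an assumed block size
    data: bytearray = field(default_factory=lambda: bytearray(1024))
```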





Structure of the buffer pool

The kernel manages the buffer pool with a least recently used (LRU) replacement policy. Once a buffer from the pool is allocated to a disk block, it cannot be used for any other block's data until it is freed. The kernel also maintains a free list of buffers, organized as a circular doubly linked list.
When the kernel wants to allocate a buffer, it removes a node from the free list, usually from the head of the list, although it can also take a specific buffer from the middle. When the kernel frees a buffer, it attaches it to the tail of the free list, so the buffers near the head are the least recently used.
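The free-list discipline can be illustrated with a small model. The real kernel threads a circular doubly linked list through the buffer headers themselves; in this Python sketch a deque stands in for that list while preserving the same LRU behavior: allocate from the head (or remove a specific buffer from the middle), and return freed buffers to the tail.

```python
from collections import deque

class FreeList:
    """Illustrative model of the buffer free list (LRU order)."""

    def __init__(self, buffers):
        self._q = deque(buffers)  # head = least recently used

    def allocate(self, buf=None):
        """Remove a buffer: from the head by default, or a specific
        buffer from the middle (as happens on a cache hit)."""
        if buf is None:
            return self._q.popleft() if self._q else None
        self._q.remove(buf)
        return buf

    def release(self, buf):
        """A freed buffer goes to the tail, making it the most
        recently used and the last candidate for reallocation."""
        self._q.append(buf)
```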



When the kernel wants to access a disk block, it searches the buffer pool for a particular device number-block number combination (which is recorded in the buffer header). Rather than searching the entire pool, the kernel organizes the buffers into queues, hashed as a function of the device number-block number combination. The figure below shows the buffers on their hash queues.

 
The important thing to note here is that no two buffers in the pool may contain the data of the same disk block.
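The hashed organization can be modeled in a few lines. In this illustrative Python sketch, NHASH and the hash function itself are arbitrary choices; a real kernel only needs a function that spreads buffers evenly across the queues so each search scans one short queue instead of the whole pool.

```python
NHASH = 64                       # number of hash queues (assumed)
hash_queues = [[] for _ in range(NHASH)]

def hashfn(device, block):
    """Map a (device, block) pair to a hash-queue index.
    The exact function is an illustrative choice."""
    return (device + block) % NHASH

def find_buffer(device, block):
    """Return the cached buffer for (device, block), or None.
    Only one queue needs to be scanned."""
    for buf in hash_queues[hashfn(device, block)]:
        if buf["device"] == device and buf["block"] == block:
            return buf
    return None

def insert_buffer(buf):
    """Place a buffer on the hash queue for its current mapping."""
    hash_queues[hashfn(buf["device"], buf["block"])].append(buf)
```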

Scenarios of retrieval of buffer

High-level kernel algorithms in the file subsystem invoke the buffer-pool algorithms to manage the buffer cache. The algorithms for reading and writing disk blocks use the algorithm getblk to allocate a buffer from the pool.

The five typical scenarios the kernel may encounter in getblk when allocating a buffer for a disk block are:
·      Block in the hash queue, and its buffer is free.
·      Cannot find block on the hash queue => allocate a buffer from free list.
·      Cannot find block on the hash queue => allocate a buffer from free list but buffer on the free list marked “delayed write” => flush “delayed write” buffer and allocate another buffer.
·      Cannot find block on the hash queue and free list of buffer also empty.
·      Block in the hash queue, but buffer is busy.
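The five scenarios above can be condensed into one decision loop. The following is a simplified Python model of the getblk logic, not a transcription of kernel code: cache is assumed to be a map from (device, block) to a buffer dict, free_list an LRU-ordered list, and flush and sleep stand-ins for asynchronous write-out and the kernel's sleep primitive.

```python
def getblk(device, block, cache, free_list, flush, sleep):
    """Illustrative sketch of getblk's five allocation scenarios."""
    while True:
        buf = cache.get((device, block))
        if buf is not None:
            if buf["busy"]:                 # scenario 5: cached but busy
                buf["wanted"] = True
                sleep(buf)                  # wait for the buffer
                continue                    # re-check: state may change
            free_list.remove(buf)           # scenario 1: cached and free
            buf["busy"] = True
            return buf
        if not free_list:                   # scenario 4: free list empty
            sleep(free_list)                # wait for any buffer
            continue
        buf = free_list.pop(0)              # least recently used buffer
        if buf["delayed_write"]:            # scenario 3: flush first...
            flush(buf)
            continue                        # ...and pick another buffer
        # scenario 2: reassign the buffer to the new block
        cache.pop((buf["device"], buf["block"]), None)
        buf.update(device=device, block=block, busy=True, valid=False)
        cache[(device, block)] = buf
        return buf
```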
The first scenario - Block in the hash queue, and its buffer is free.


Before continuing to the other scenarios, let us see what happens after a buffer is allocated. The kernel may read the data in the buffer and manipulate or change it. While doing so, it marks the buffer busy so that no other process can access the block. When the kernel is done using the buffer, it releases it with the brelse algorithm.
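The release path can be sketched similarly. This illustrative brelse wakes any waiting processes and returns the buffer to the free list: valid buffers go to the tail (most recently used), while buffers with invalid contents go to the head so they are reused first. wakeup is a stand-in for the kernel's wakeup primitive.

```python
def brelse(buf, free_list, wakeup):
    """Illustrative sketch of releasing a buffer."""
    wakeup(buf)                   # wake processes waiting for this buffer
    wakeup(free_list)             # wake processes waiting for any buffer
    if buf["valid"]:
        free_list.append(buf)     # tail: most recently used
    else:
        free_list.insert(0, buf)  # head: reuse invalid contents first
    buf["busy"] = False
```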
The second scenario - Cannot find block on the hash queue => allocate a buffer from free list.

The third scenario - Cannot find block on the hash queue => allocate a buffer from free list but buffer on the free list marked “delayed write” => flush “delayed write” buffer and allocate another buffer.
 







The fourth scenario - Cannot find block on the hash queue and free list of buffer also empty.
 





The fifth scenario - Block in the hash queue, but buffer is busy.




Algorithms for Reading and Writing Disk Blocks
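The read and write algorithms build directly on getblk and brelse. Below is an illustrative Python sketch of the classic bread and bwrite logic; getblk, brelse, disk_read, and disk_write are passed in as stand-ins for the routines described above and the disk-driver interface, so the sketch stays self-contained.

```python
def bread(device, block, getblk, disk_read):
    """Read a disk block through the cache: only go to the disk
    when the allocated buffer does not already hold valid data."""
    buf = getblk(device, block)
    if not buf["valid"]:          # cache miss: fetch from disk
        disk_read(buf)
        buf["valid"] = True
    return buf                    # caller releases with brelse when done

def bwrite(buf, disk_write, brelse, delayed=False):
    """Write a buffer: immediately, or mark it delayed-write so the
    flush happens later, before the buffer is reassigned."""
    if delayed:
        buf["delayed_write"] = True
    else:
        disk_write(buf)           # synchronous write to disk
    brelse(buf)
```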




Advantages of the buffer cache

·      Uniform disk access => simpler system design.
·      Copying data between user buffers and system buffers => eliminates the need for special alignment of user buffers.
·      Use of the buffer cache can reduce the amount of disk traffic.
·      A single image of each disk block contained in the cache => helps ensure file system integrity.

Disadvantages of the buffer cache

·      Delayed write => vulnerable to crashes that leave disk data in an incorrect state.
·      An extra data copy when reading from and writing to user processes => slows down transfers of large amounts of data.
 

Anurag
