--------------------------------------------------------------------------------------------------------------------------
Prepared By : Uday Shah (HOD - IT)
E-Mail : rupareleducation@gmail.com
Contact No : 7600044051
Data Structure Unit 5 : File Structure and Hasing
Basic concepts of File and file systems:
1.     File System
Service.
·       
Without a file
system, information placed in a storage medium would be one large body of data
with no way to tell where one piece of information stops and the next begins. 
·       
By separating
the data into pieces and giving each piece a name, the information is easily
isolated and identified. 
·       
Taking its name
from the way paper-based information systems are named, each group of data is
called a "file". 
·       
The structure and
logic rules used to manage the groups of information and their names are called
a "file system".
·       
There are many
different kinds of file systems. 
·       
Each one has
different structure and logic, properties of speed, flexibility, security, size
and more. 
·       
Some file
systems have been designed to be used for specific applications. 
Following are some Attributes of a File
·       
Name:  It
is the only information which is in human-readable form.
·       
Identifier: The
file is identified by a unique tag number within file system.
·       
Type: It is
needed for systems that support different types of files.
·       
Location: Pointer
to file location on device.
·       
Size: The
current size of the file.
·       
Protection: This
controls and assigns the power of reading, writing, executing.
·       
Time, date, and
user identification. This is the data for protection, security, and usage
monitoring.
2.  Disk Space
Allocation 
The Disk Space Allocation methods define how the files are
stored in the disk blocks. There are three main disk space or file allocation
methods.
·        
Contiguous Allocation
·        
Linked Allocation
·        
Indexed Allocation
The main idea behind these methods is to provide:
·        
Efficient disk space utilization.
·        
Fast access to the file blocks.
1. Contiguous Allocation
In this each file occupies a
contiguous set of blocks on the disk. For example, if a file requires n blocks
and is given a block b as the starting location, then the blocks assigned to
the file will be: b, b+1, b+2,……b+n-1. 
This means that given the starting
block address and the length of the file in terms of blocks required. 
The directory entry for a file with
contiguous allocation contains
·        
Address
of starting block
·        
Length
of the allocated portion.
Following is figure to show Contiguous
Allocation.
Advantages:
·        
Both
the Sequential and Direct Accesses are supported by this. 
·        
Direct
access, the address of the kth block of the file which starts at
block b can easily be obtained as.
·        
This
is extremely fast since the number of seeks are minimal because of contiguous
allocation of file blocks.
Disadvantages:
·       
This
method suffers from both internal and external fragmentation. This makes it
inefficient in terms of memory utilization.
·       
Increasing
file size is difficult because it depends on the availability of contiguous
memory at a particular instance.
2. Linked List Allocation
·       
In this allocation
system each file is a linked list of disk blocks which need not be contiguous. 
·       
The disk blocks
can be scattered anywhere on the disk.
The directory entry contains a pointer to the starting and the ending file block.
The directory entry contains a pointer to the starting and the ending file block.
·       
Each block
contains a pointer to the next block occupied by the file.
Advantages:
·        
This
is very flexible in terms of file size. File size can be increased easily since
the system does not have to look for a contiguous chunk of memory.
·        
This
method does not suffer from external fragmentation. This makes it relatively
better in terms of memory utilization.
Disadvantages:
·        
Because
the file blocks are distributed randomly on the disk, a large number of seeks
are needed to access every block individually. This makes linked allocation
slower.
·        
It
does not support random or direct access. We cannot directly access the blocks
of a file. 
·        
A
block k of a file can be accessed by traversing k blocks sequentially
sequential access from the starting block of the file via block pointers.
·        
Pointers
required in the linked allocation incur some extra overhead.
3. Indexed Allocation
·       
In this scheme, a
special block known as the Index block contains
the pointers to all the blocks occupied by a file. 
·       
Each file has its
own index block. The ith entry in the index block contains the disk address of
the ith file block. 
·       
The directory
entry contains the address of the index block as shown in the image:
Advantages:
·        
This
supports direct access to the blocks occupied by the file and therefore
provides fast access to the file blocks.
·        
It
overcomes the problem of external fragmentation.
Disadvantages:
·        
The
pointer overhead for indexed allocation is greater than linked allocation.
·        
For
very small files, say files that expand only 2-3 blocks, the indexed allocation
would keep one entire block (index block) for the pointers which is inefficient
in terms of memory utilization. However, in linked allocation we lose the space
of only 1 pointer per block.
·        
For
files that are very large, single index block may not be able to hold all the
pointers.
MS
Dos Fat File System
·        
The File Allocation Table (FAT) file system is a simple file system originally
designed for small disks and simple folder structures.
·        
The FAT file
system is named for its method of organization, the file allocation table,
which resides at the beginning of the volume. To protect the volume, two copies
of the table are kept, in case one becomes damaged.
·        
In addition, the
file allocation tables and the root folder must be stored in a fixed location
so that the files needed to start the system can be correctly located.
·        
A volume
formatted with the FAT file system is allocated in clusters. The default
cluster size is determined by the size of the volume. For the FAT file system,
the cluster number must fit in 16 bits and must be a power of two.
·        
In Short file allocation
table, FAT is a method of keeping track of the contents of a Hard Drive used
by early Microsoft operating
systems that was first introduced in 1977. 
File Allocation Table
·        
A file allocation
table (FAT) is a file system developed for hard drives that originally used 12
or 16 bits for each cluster entry into the file allocation table. 
·        
It is used by the
operating system (OS) to manage files on hard drives and other computer
systems. It is often also found on in flash memory, digital cameras and
portable devices. 
·        
It is used to
store file information and extend the life of a hard drive.
·        
Most hard drives
require a process known as seeking, this is the actual physical searching and
positioning of the read/write head of the drive. 
·        
The FAT file
system was designed to reduce the amount of seeking and thus minimize the wear
and tear on the hard disc.
·        
FAT was designed
to support hard drives and subdirectories. 
A FAT file
system has four different sections, each as a structure in the FAT partition.
The four sections are:
- Boot Sector:
     
- This
     is also known as the reserved sector it is located on the first part of
     the disc. 
- It
     contains the OS's necessary boot loader code to start a PC system, the
     partition table known as the master boot record (MRB) that describes how
     the drive is organized, and the BIOS parameter block (BPB) which describes
     the physical outline of the data storage volume.
- FAT Region:
     
- This
     region generally encompasses two copies of the File Allocation Table which
     is for redundancy checking and specifies how the clusters are assigned.
- Data Region:
     This is where the directory data and existing files are stored. It uses up
     the majority of the partition.
- Root Directory Region: This region is a directory table that contains the
     information about the directories and files. 
- FAT32
     usually stores the root directory in the data region so it can be expanded
     if needed.
Tree Structure Directory System:
·       
In Tree
structured directory system, any directory entry can either be a file or sub
directory. 
·       
Tree structured
directory system overcomes the drawbacks of two level directory system. 
·       
The similar kind
of files can now be grouped in one directory.
·       
Each user has its
own directory and it cannot enter in the other user's directory. 
·       
However, the user
has the permission to read the root's data but he cannot write or modify this. 
·       
Only administrator
of the system has the complete access of root directory.
·       
Searching is more
efficient in this directory structure. 
·       
A file can be
accessed by two types of path, either relative or absolute.
·       
Absolute path is
the path of the file with respect to the root directory of the system while
relative path is the path with respect to the current working directory of the
system. 
·       
In tree
structured directory systems, the user is given the privilege to create the
files as well as directories.
File Organization
File organization ensures that records are available for processing. It is used to determine an efficient file organization for each base relation.
For example, if we want to retrieve employee records in alphabetical order of name. Sorting the file by employee name is a good file organization. However, if we want to retrieve all employees whose marks are in a certain range, a file is ordered by employee name would not be a good file organization.
Types
of File Organization
There are three types of organizing the file:
1. Sequential access file organization
2. Direct access file organization
3. Indexed sequential access files organization
1. Sequential access file organization
2. Direct access file organization
3. Indexed sequential access files organization
Concept of Field , Record and File
Field: A field is an item of stored data. A field could be a name, a date, an
address, a description, a quantity, etc. When a field is defined it is given a
name (identifier) and a type, that defines the type of data that will be stored
in that field. This is exactly the same as defining a variable within a
program.
For ex. 
          int rno;
          char
name [20];
          float
amt;
According to above rno, name and amt is field of
different type which helps to create a record.
Record: A record is the collection of fields that relate to a
single entity. For example, we could have a student record that includes fields
for the student’s name, address, homeroom, date of birth, etc. A product record
could include fields for the serial number, description, cost price, quantity
in stock, etc.
For
ex. 
          structure
stud 
          {
                   int
rno;
                   char
name [20];
                   float
amt;
};
According to above structure stud is on record and
rno, name and amt is field of structure stud with different type which helps to
create a record.
File: A file is a collection of related records. For
example, a student file might include all of the records of students enrolled
at a school. A police department might keep a file of criminal records, which
includes details about all known criminals. Files are stored on secondary
storage devices such as hard disks, CD-ROMs etc.
Within a file, all records usually have the same
structure. That is, every record in the file contains the same fields. Only the
data stored in the fields of different record will be different.
Indexes: An indexed
file is a computer file with
an index that
allows easy random access to
any record given
its file key.
The key must be such that it uniquely identifies a
record. If more than one index is present the other ones are called alternate
indexes. The indexes are created with the file and maintained by the
system.
Sequential access
file organization : Storing and sorting in contiguous block within
files on tape or disk is called as sequential access file organization. In
sequential access file organization, all records are stored in a sequential
order. The records are arranged in the ascending or descending order of a key
field. Sequential file search starts from the beginning of the file and the
records can be added at the end of the file. In sequential file, it is not
possible to add a record in the middle of the file without rewriting the file.
Advantages
of sequential file 
·       
It is simple to program and easy to design.
·       
Sequential file is best use if storage space.
Disadvantages of sequential file 
·       
Sequential file is time consuming process.
·       
It has high data redundancy.
·       
Random searching is not possible.
Indexed sequential access file organization : Indexed sequential access file combines both sequential file and direct access file organization. In indexed sequential access file, records are stored randomly on a direct access device such as magnetic disk by a primary key. This file have multiple keys. These keys can be alphanumeric in which the records are ordered is called primary key. The data can be access either sequentially or randomly using the index. The index is stored in a file and read into memory when the file is opened.
Advantages of Indexed sequential access file
organization
·       
In indexed sequential
access file, sequential file and random file access is possible.
·       
It accesses the records
very fast if the index table is properly organized.
·       
The records can be
inserted in the middle of the file.
·       
It provides quick access
for sequential and direct processing.
·       
It reduces the degree of
the sequential search.
Disadvantages of Indexed sequential access file
organization
·       
Indexed sequential
access file requires unique keys and periodic reorganization.
·       
Indexed sequential
access file takes longer time to search the index for the data access or
retrieval.
·       
It requires more storage
space.
·       
It is expensive because
it requires special software.
·       
It is less efficient in
the use of storage space as compared to other file organizations.
 What is Hashing?
·       
Hashing is the
process of mapping large amount of data item to smaller table with the help of
hashing function.
·       
Hashing is also
known as Hashing Algorithm or Message Digest Function.
·       
It is a
technique to convert a range of key values into a range of indexes of an array.
·       
It is used to
facilitate the next level searching method when compared with the linear or
binary search.
·       
Hashing allows updating
and retrieving any data entry in a constant time.
·       
Constant time means
the operation does not depend on the size of the data.
·       
Hashing is used
with a database to enable items to be retrieved more quickly.
·       
It is used in
the encryption and decryption of digital signatures.
What
is Hash Table?
·        
Hash table or hash map is a data structure used to store key-value
pairs.
·        
It is a collection of items stored to make it easy to find them
later.
·        
It uses a hash function to compute an index into an array of
buckets or slots from which the desired value can be found.
·        
It is an array of list where each list is known as bucket.
·        
It contains value based on the key.
·        
Hash table is used to implement the map interface and extends
Dictionary class.
·        
Hash table is synchronized and contains only unique elements.
The above figure shows the
hash table with the size of n = 10. Each position of the hash table is called
as Slot. In the above hash table, there are n slots in the table,
names = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. Slot 0, slot 1, slot 2 and so on. Hash
table contains no items, so every slot is empty.
Liner
Hashing: 
·       
Linear Hashing is a dynamically updateable disk-based index
structure which implements a hashing scheme and which grows or shrinks one
bucket at a time. 
·       
The index is used to support exact match queries, i.e., find the
record with a given key. 
·       
Compared with the B Tree index which also supports exact match
queries, Linear Hashing has better expected query cost. 
·       
Compared with Extendable Hashing, Linear Hashing does not use a
bucket directory, and when an overflow occurs, it is not always the overflow n
bucket that is split. 
·       
The name Linear Hashing is used because the number of buckets
grows or shrinks in a linear fashion. 
·       
Overflows are handled by creating a chain of pages under the
overflow n bucket. 
The hashing function changes dynamically and at any given instant there can be at most two hashing functions used by the scheme.
:: Best Of Luck :: 
https://myshopprime.com/uday.shah3/shop
Please share and comment it... Thank you.





