1.3 Some Basic Concepts
1.4 Advantages and Disadvantages of Database Management System
The traditional file-oriented approach to information processing gives each application a separate master file and its own set of personal files. The COBOL language supported these file-oriented applications and was used to develop applications such as payroll, inventory, and financial accounting. In general, however, an organization also needs information to flow across these applications. This requires sharing of data, which is very difficult to implement in the traditional file approach. In addition, a major limitation of the file-based approach is that the programs are dependent on the files and the files are dependent on the programs.
These file-based approaches, which came into being as the first commercial applications of computers, suffered from the following significant disadvantages:
A database is defined as a collection of related information stored in such a manner that many users can share it for different purposes. The content of a database is obtained by integrating data from all the different sources, generally at a centralized location in an organization. Such data is made available to all users according to their requirements, and redundant data can be eliminated or at least minimized. The Database Management System (DBMS) creates an environment in which end users have better access to more and better-managed data than they did before the DBMS became the data management standard.
Some common database applications are student database systems, business inventory, accounting information, and organisational data. There can be a database that stores newspaper articles, magazines, books, and comics. There is already a well-defined market for specific information, served by databases for highly selected groups of users, on almost all subjects. MEDLINE is a well-known database service providing medical information for doctors, and similarly WESTLAW is a computer-based information service catering to the requirements of lawyers. The key to making all this possible is the manner in which the information in the database is managed. Some commercially available DBMSs are INGRES, ORACLE, DB2, and Sybase. A database management system, therefore, is a combination of hardware and software that can be used to set up and monitor a database, and to manage the updating and retrieval of the data stored in it. Most database management systems support the following facilities/capabilities:
(a) Creation, modification and deletion of data files;
(b) Addition, modification and deletion of data;
(c) Retrieval of data, collectively or selectively;
(d) Sorting or indexing of data;
(e) Creation of input forms and output reports, which may be either standardized or generated according to a specific user definition;
(f) Manipulation of stored data with mathematical functions, and support for concurrent transactions;
(g) Maintenance of data integrity and security;
(h) Creation of an environment for data warehousing and data mining.
Figure 1: The DBMS as an interface between physical Database and Users' requests
The DBMS interprets and processes users' requests to retrieve information from a database. Figure 1 shows that a DBMS serves as an interface to various types of interactions. The user may key in a retrieval query directly from a terminal, or the query may be coded in a high-level language program submitted for interactive or batch processing. In most cases, a query request has to pass through several layers of software in the DBMS and the operating system before the physical database can be accessed.
The DBMS responds to a query by invoking the appropriate subprograms, each of which performs its special function to interpret the query, or to locate the desired data in the database and present it in the order requested by the user. Thus, the DBMS shields database users from the programming they would otherwise have to do to organize data for storage, or to gain access to it once it is stored.
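The way a DBMS shields the user from storage details can be illustrated with a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for a DBMS; the student table, its columns, and the sample rows are hypothetical and are not taken from the text.

```python
import sqlite3

# Hypothetical in-memory database; table, columns and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (enrol_no TEXT PRIMARY KEY, name TEXT, programme TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [
        ("001", "Ravi", "MCA"),
        ("002", "Anita", "BCA"),
        ("003", "Meena", "MCA"),
    ],
)

# The user states only which data is wanted and in what order.
# How the rows are located on storage is left entirely to the DBMS.
rows = conn.execute(
    "SELECT name FROM student WHERE programme = ? ORDER BY name", ("MCA",)
).fetchall()
names = [row[0] for row in rows]
print(names)  # ['Meena', 'Ravi']
```

The query names no file, record position, or access method, which is the shielding described above.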
Thus, the DBMS acts as an intermediary between the users and the database, much like a salesperson in a consumer distributor system. A consumer specifies the desired items by filling out an order form, which is submitted to a salesperson at the counter. The salesperson presents the specified items to the consumer after they have been retrieved from the storage room. Consumers who place orders have no idea of where and how the items are stored; they simply select the desired items from an alphabetical list in a catalogue. The logical order of goods in the catalogue bears no relationship to the actual physical arrangement of the inventory in the storage room. Similarly, the database user needs to know only what data he or she requires; the DBMS will take care of retrieving it.
As discussed in the previous section, the file-oriented approach has major problems in coping with the fast-changing needs of an organization. Does any alternative approach exist? In this section, let us examine one such alternative, the database approach. But first, let us look at some basic concepts relating to the database approach.
Data-items: The term data item is the word for what has traditionally been called the field in data processing; it is the smallest unit of data that has meaning to its users. The phrases data element and elementary item are also sometimes used. Data items are grouped together to form aggregates described by various names. For example, the term data record refers to a group of data items, and a program usually reads or writes whole records. Data items may occasionally be broken down further, to what may be called an atomic level, for processing purposes.
For example, a data item such as a date is a composite value comprising the day, month, and year. For date arithmetic, these components may first have to be separated before the calculations are performed. Similarly, an identification number may be a data item that contains further information embedded in it. For example, IGNOU uses a 9-digit enrollment number. The first two digits of this number reflect the year of admission, the next two digits refer to the Regional Center where the student first opted for admission, the next four digits are a simple sequence number, and the last digit is a check digit. For purposes of processing, it may sometimes be necessary to split the data item.
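Splitting the enrollment number into its embedded parts can be sketched as follows. The function name, the field names, and the sample number are hypothetical. The text does not say how the check digit is computed, so it is only extracted here and not verified.

```python
from typing import NamedTuple


class Enrolment(NamedTuple):
    """Parts of a 9-digit enrollment number, following the 2 + 2 + 4 + 1 layout in the text."""
    year: str             # digits 1-2: year of admission
    regional_centre: str  # digits 3-4: Regional Center code
    sequence: str         # digits 5-8: sequence number
    check_digit: str      # digit 9: check digit (algorithm not given, so not verified)


def split_enrolment(number: str) -> Enrolment:
    """Split a 9-digit enrollment number into its component data items."""
    if len(number) != 9 or not (number.isascii() and number.isdigit()):
        raise ValueError("expected exactly 9 decimal digits")
    return Enrolment(number[0:2], number[2:4], number[4:8], number[8])


# Hypothetical sample number, used only to show the split.
parts = split_enrolment("071234567")
print(parts.year, parts.regional_centre, parts.sequence, parts.check_digit)
# prints: 07 12 3456 7
```

The parts are kept as strings so that leading zeros, such as the year 07, are not lost.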
Standardisation of data items can become a fairly serious problem in large organisations with several divisions. Each such unit tends to have its own ways of referring to the data items related to personnel, accounting, engineering, sales, production, purchase activities, etc. It would be extremely desirable, at the stage of adopting the database approach, to obtain a commitment from top management to standardise the schemas of the data items across the enterprise.
Entities and Attributes: The real world consists of objects that may be tangible, such as an employee or a component in an inventory, or intangible, such as an event, a job description, an identification number, or an abstract construct. All such items about which relevant information is stored in the database are called entities. The qualities of an entity that we store as information are called its attributes. An attribute may be expressed as a number or as text. It may even be a scanned picture, a sound sequence, or a moving picture, which is now possible in some visual and multimedia databases.
Data processing normally concerns itself with a collection of similar entities and records information about the same attributes of each of them. In the traditional approach, a programmer usually maintains a record about each entity and a data item in each record relates to each attribute. Similar records are grouped into files and such a 2-dimensional array is sometimes referred to as a flat file.
Logical and Physical Data: One of the key features of the database approach is to bring about a distinction between the logical and the physical structures of the data. The term logical structure refers to the way the programmers see it and the physical structure refers to the way data are actually recorded on the storage medium. For example, in distributed databases some records may physically be located at significantly remote places, yet are part of the overall database.
Schema and Subschema: Since the database approach focuses on the logical organization of data and decouples it from the physical representation, it is useful to have a term for the logical database description. A schema is a logical database description and is drawn as a chart of the types of data that are used. It gives the names of the entities and attributes, and specifies the relationships between them. It is a framework into which the values of the data items can be fitted. Like an information display system giving arrival and departure times at airports and railway stations, the schema remains the same though the values displayed in the system change from time to time.
The term schema is used to mean an overall chart of all the data item types and record types stored in a database. The term sub schema refers to the view of the data item types and record types that a particular user or application uses. Therefore, many different sub schemas can be derived from one schema. A simple analogy to distinguish between the schema and the sub schema: if the schema were a road map of Delhi showing major historical sites, educational institutions, railway stations, roadway stations and airports, a sub schema could be a similar map showing one route each from the railway station or the airport to the IGNOU campus at Maidan Garhi.
Data Dictionary: The data dictionary holds detailed information about the different structures and data types: details of the logical structures and their mapping onto the different structures, details of the relationships between data items, details of all users' privileges and access rights, and details of resource performance.
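In relational systems, one common way to realise a sub schema is a view defined over the schema's tables. The sketch below assumes this approach and uses Python's built-in sqlite3 module; the employee table, the employee_directory view, and the sample rows are hypothetical and are not taken from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema: the overall logical description (here a single record type).
# Sub schema: a view exposing only the data items one application needs.
conn.executescript(
    """
    CREATE TABLE employee (
        emp_id     INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        department TEXT NOT NULL,
        salary     REAL NOT NULL
    );
    CREATE VIEW employee_directory AS
        SELECT emp_id, name, department FROM employee;
    """
)
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 'Sales', 52000.0)")
conn.execute("INSERT INTO employee VALUES (2, 'Vikram', 'Stores', 48000.0)")

# An application using the sub schema never sees the salary data item.
cursor = conn.execute("SELECT * FROM employee_directory ORDER BY emp_id")
directory_columns = [column[0] for column in cursor.description]
directory_rows = cursor.fetchall()
print(directory_columns)  # ['emp_id', 'name', 'department']
```

Several such views, each tailored to a different application, can be derived from the same schema.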
The last two items discussed in this section will be further elaborated in the subsequent sections.
One of the main advantages of using a database system is that the organization can exert, via the Database Administrator (DBA), centralized management and control over the data. The database administrator is the focus of the centralized control. Any application requiring a change in the structure of a data record requires an arrangement with the DBA, who makes the necessary modifications. Such modifications do not affect other applications or users of the record in question. Therefore, these changes meet another requirement of the DBMS: data independence. The following are the important advantages of DBMS:
Centralized control of data by the DBA avoids unnecessary duplication of data and effectively reduces the total amount of data storage required. It also eliminates the extra processing necessary to trace the required data in a large storage of data. Another advantage of avoiding duplication is the elimination of the inconsistencies that tend to be present in redundant data files. Any redundancies that exist in the DBMS are controlled and the system ensures that these multiple copies are consistent.
A database allows the sharing of data under its control by any number of application programs or users.
Centralized control can also ensure that adequate checks are incorporated in the DBMS to provide data integrity. Data integrity means that the data contained in the database is both accurate and consistent. Therefore, data values being entered for storage can be checked to ensure that they fall within a specified range and are of the correct format. For example, the value for the age of an employee may be required to lie in the range 16 to 75. Another integrity check that should be incorporated in the database is to ensure that if there is a reference to a certain object, that object must exist. In the case of an automatic teller machine, for example, a user is not allowed to transfer funds from a nonexistent savings account to a checking account.
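Both kinds of check, the range check and the existence check, can be sketched with declarative constraints. The example assumes Python's built-in sqlite3 module as a stand-in for a DBMS; all table names, column names, and values are hypothetical, and the helper try_execute is introduced only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        age    INTEGER NOT NULL CHECK (age BETWEEN 16 AND 75)  -- range check
    );
    CREATE TABLE account (
        account_no INTEGER PRIMARY KEY,
        balance    REAL NOT NULL
    );
    CREATE TABLE transfer (
        transfer_id  INTEGER PRIMARY KEY,
        from_account INTEGER NOT NULL REFERENCES account(account_no),  -- must exist
        to_account   INTEGER NOT NULL REFERENCES account(account_no),  -- must exist
        amount       REAL NOT NULL
    );
    INSERT INTO account VALUES (100, 500.0);
    INSERT INTO account VALUES (200, 50.0);
    """
)


def try_execute(sql, params=()):
    """Return True if the DBMS accepts the statement, False if an integrity check rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


ok_age = try_execute("INSERT INTO employee VALUES (?, ?)", (1, 30))
bad_age = try_execute("INSERT INTO employee VALUES (?, ?)", (2, 12))
ok_transfer = try_execute("INSERT INTO transfer VALUES (?, ?, ?, ?)", (1, 100, 200, 25.0))
bad_transfer = try_execute("INSERT INTO transfer VALUES (?, ?, ?, ?)", (2, 999, 200, 25.0))
print(ok_age, bad_age, ok_transfer, bad_transfer)  # True False True False
```

Because the rules are declared once in the database, every application that stores data is subject to the same checks.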
Data is of vital importance to an organization and may be confidential. Unauthorized persons must not access such confidential data. The DBA who has the ultimate responsibility for the data in the DBMS can ensure that proper access procedures are followed, including proper authentication schemes for access to the DBMS and additional checks before permitting access to sensitive data. Different levels of security could be implemented for various types of data and operations. The enforcement of security could be data value dependent (e.g., a manager has access to the salary details of employees in his or her department only), as well as data-type dependent (but the manager cannot access the medical history of any employees, including those in his or her department).
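The difference between data-value-dependent and data-type-dependent security can be sketched as a simple access rule. This is only an illustration of the two kinds of rule, not how any particular DBMS implements security. The class names, field names, and the assumption that all other fields are readable are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Employee:
    name: str
    department: str
    salary: float
    medical_history: str


@dataclass(frozen=True)
class Manager:
    name: str
    department: str


# Data-type dependent rule: these fields are closed to every manager.
RESTRICTED_FIELDS = {"medical_history"}


def can_read(manager: Manager, employee: Employee, field: str) -> bool:
    """Decide whether a manager may read one field of one employee record."""
    if field in RESTRICTED_FIELDS:
        return False  # depends only on the type of data requested
    if field == "salary":
        # Data-value dependent rule: depends on the values in the record itself.
        return manager.department == employee.department
    return True  # assumption: remaining fields are not sensitive


boss = Manager("Asha", "Sales")
own_staff = Employee("Ravi", "Sales", 40000.0, "confidential")
other_staff = Employee("Meena", "Stores", 42000.0, "confidential")

print(can_read(boss, own_staff, "salary"))           # True
print(can_read(boss, other_staff, "salary"))         # False
print(can_read(boss, own_staff, "medical_history"))  # False
```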
Since the database is under the control of the DBA, she or he should resolve the conflicting requirements of various users and applications. In essence, the DBA chooses the best file structure and access method to get optimal performance for the response-critical applications, while permitting less critical applications to continue to use the database, albeit with a relatively slower response.
Data independence is usually considered from two points of view: physical data independence and logical data independence. Physical data independence allows changes in the physical storage devices or organization of the files to be made without requiring changes in the conceptual view or any of the external views and hence in the application programs using the database. Thus, the files may migrate from one type of physical media to another or the file structure may change without any need for changes in the application programs.
Logical data independence implies that application programs need not be changed if fields are added to an existing record; nor do they have to be changed if fields not used by application programs are deleted. Logical data independence indicates that the conceptual schema can be changed without affecting the existing external schemas. Data independence is advantageous in the database environment since it allows for changes at one level of the database without affecting other levels. These changes are absorbed by the mappings between the levels. (Please refer to the next section for details on the terms used in this paragraph.) Logical data independence is more difficult to achieve than physical data independence, since application programs are heavily dependent on the logical structure of the data they access.
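The first case, adding a field without changing the application, can be sketched as follows. The example assumes Python's built-in sqlite3 module and an application that reads through a view; the employee table, the staff_list view, and the email column are hypothetical and are not taken from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, department TEXT);
    CREATE VIEW staff_list AS SELECT emp_id, name FROM employee;
    INSERT INTO employee VALUES (1, 'Asha', 'Sales');
    """
)


def application_query():
    """The application's query; it is written once and never edited below."""
    return conn.execute(
        "SELECT emp_id, name FROM staff_list ORDER BY emp_id"
    ).fetchall()


before = application_query()

# The logical structure changes: a new field is added to the existing record type.
conn.execute("ALTER TABLE employee ADD COLUMN email TEXT")

after = application_query()
print(before == after)  # True: the application did not need to change
```

Here the view definition plays the role of the mapping between levels that absorbs the change.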
The concept of data independence is similar in many respects to the concept of abstract data type in modern programming languages like C++. Both hide implementation details from the users. This allows users to concentrate on the general structure rather than low-level implementation details.
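The analogy with abstract data types can be made concrete with a small sketch. It is written in Python rather than C++, and the two stack classes and the reverse function are hypothetical. Both classes offer the same push and pop operations but store their items differently, and the code that uses them does not need to know which one it was given.

```python
class ListStack:
    """Stack that keeps its items in a Python list."""

    def __init__(self):
        self._items = []

    def push(self, value):
        self._items.append(value)

    def pop(self):
        return self._items.pop()


class LinkedStack:
    """Stack that keeps its items as a chain of (value, next_node) pairs."""

    def __init__(self):
        self._head = None

    def push(self, value):
        self._head = (value, self._head)

    def pop(self):
        if self._head is None:
            raise IndexError("pop from empty stack")
        value, self._head = self._head
        return value


def reverse(values, stack):
    """Client code: relies only on push and pop, never on how items are stored."""
    for value in values:
        stack.push(value)
    return [stack.pop() for _ in values]


print(reverse([1, 2, 3], ListStack()))    # [3, 2, 1]
print(reverse([1, 2, 3], LinkedStack()))  # [3, 2, 1]
```

Swapping one implementation for the other leaves reverse untouched, just as a change in storage structure leaves an application program untouched under data independence.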
A significant disadvantage of a DBMS is cost. In addition to the cost of purchasing or developing the software, the hardware has to be upgraded to allow for the extensive programs and the workspaces required for their execution and storage. The processing overhead introduced by the DBMS to implement security, integrity, and sharing of the data degrades response time and throughput. An additional cost is that of migration from a traditionally separate application environment to an integrated one.
While centralization reduces duplication, the lack of duplication requires that the database be adequately backed up so that in the case of failure the data can be recovered. Backup and recovery operations are fairly complex in a DBMS environment, and this is exacerbated in a concurrent multi-user database system. Furthermore, a database system requires a certain amount of controlled redundancies and duplication to enable access to related data items.
Centralization also means that the data is accessible from a single source, namely the database. This increases the potential severity of security breaches and of disruption to the operation of the organization because of downtimes and failures. The replacement of a monolithic centralized database by a federation of independent and cooperating distributed databases resolves some of the problems resulting from failures and downtimes.