
Data Warehouse - An Introduction


A data warehouse is defined as a subject-oriented, integrated, nonvolatile, time-variant collection of data in support of management's decisions. More generally, data warehousing is a collection of decision-support technologies aimed at enabling knowledge workers, such as executives, managers, and analysts, to arrive at better and faster decisions. Data warehouses provide access to data for complex analysis, knowledge discovery, and decision-making. They support high-performance demands on an organization's data and information. A data warehouse provides an enormous amount of historical and static data through three tiers:
1. Relational databases
2. Multidimensional OLAP applications
3. Client analysis tools
A data warehouse supports several types of applications, such as online analytical processing (OLAP), decision-support systems (DSS), and data mining. OLAP is a term used to describe the analysis of complex data from the data warehouse.
OLAP is a software technology that allows users to analyze and view data easily and quickly from multiple points of view. OLAP provides dynamic, multidimensional support to executives and managers who need to understand different aspects of the data. Activities that are supported include:
• Analyzing financial trends
• Creating slices of data
• Finding new relationships among the data
• Drilling down into sales statistics
• Doing calculations across different dimensions, where each category of data (for example, product, location, sales numbers, or time period) is considered a dimension
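The slicing, drilling down, and cross-dimension calculations listed above can be sketched in a few lines of plain Python. This is only a minimal illustration, not how an OLAP server is implemented: the fact rows, the dimension names, and the helper functions `slice_cube` and `aggregate` are all hypothetical and chosen for this example.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, location, period, sales).
# The first three fields are dimensions; the last is the measure.
FACTS = [
    ("laptop", "north", "2023-Q1", 120),
    ("laptop", "south", "2023-Q1", 80),
    ("laptop", "north", "2023-Q2", 150),
    ("phone", "north", "2023-Q1", 200),
    ("phone", "south", "2023-Q2", 170),
]
DIMENSIONS = ("product", "location", "period")


def slice_cube(facts, dimension, value):
    """Slice: keep only the rows where one dimension has a fixed value."""
    idx = DIMENSIONS.index(dimension)
    return [row for row in facts if row[idx] == value]


def aggregate(facts, *dimensions):
    """Sum the sales measure, grouped by the given dimensions."""
    idxs = [DIMENSIONS.index(d) for d in dimensions]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in idxs)
        totals[key] += row[-1]
    return dict(totals)


# Summary view: total sales per product.
by_product = aggregate(FACTS, "product")

# Drill down: the same totals broken out by period as well.
by_product_period = aggregate(FACTS, "product", "period")

# Slice on one period, then calculate along another dimension.
q1_slice = slice_cube(FACTS, "period", "2023-Q1")
q1_by_location = aggregate(q1_slice, "location")
```

Grouping by fewer dimensions gives the summarized view, and adding a dimension to the grouping is the drill-down step. A real OLAP tool would typically precompute or index such aggregates rather than scan every row.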
There are OLAP tools that use distributed computing capabilities for analyses that require more storage and processing power than can be economically and efficiently located on an individual desktop.
DSS support an organization's leading decision makers with higher-level data for complex and critical decisions. A DSS queries a data warehouse or an OLAP database for relevant information that can be compared in order to make a business decision and predict the impact of that decision.
Finally, data mining is used for knowledge discovery, the process of searching data for unanticipated new knowledge.
Knowledge workers and decision makers use tools ranging from parametric queries to ad hoc queries to data mining. Thus, the access component of the data warehouse must provide support for structured queries (both parametric and ad hoc). Together, these make up a managed query environment.
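The difference between the two kinds of structured query can be sketched with Python's built-in `sqlite3` module. This is a minimal sketch under stated assumptions: the `sales` table, its columns, its rows, and the function `sales_by_location` are hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import sqlite3

# Assumption: a tiny in-memory table stands in for the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (product TEXT, location TEXT, period TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("laptop", "north", "2023-Q1", 120),
        ("laptop", "south", "2023-Q1", 80),
        ("laptop", "north", "2023-Q2", 150),
        ("phone", "north", "2023-Q1", 200),
        ("phone", "south", "2023-Q2", 170),
    ],
)


def sales_by_location(conn, period):
    """Parametric query: the statement is fixed; the user supplies only the period."""
    cur = conn.execute(
        "SELECT location, SUM(amount) FROM sales "
        "WHERE period = ? GROUP BY location ORDER BY location",
        (period,),
    )
    return cur.fetchall()


parametric_result = sales_by_location(conn, "2023-Q1")

# Ad hoc query: written on the spot by the analyst for a one-off question.
ad_hoc_result = conn.execute(
    "SELECT product, MAX(amount) FROM sales GROUP BY product ORDER BY product"
).fetchall()
```

In the parametric case the query text is prepared in advance and only the bound value changes between runs. In the ad hoc case the analyst composes the whole statement.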
