
Data Warehouse - An Introduction


A data warehouse is defined as a subject-oriented, integrated, nonvolatile, time-variant collection of data in support of management's decisions. More generally, data warehousing is a collection of decision support technologies aimed at enabling knowledge workers, such as executives, managers, and analysts, to arrive at better and faster decisions. Data warehouses provide access to data for complex analysis, knowledge discovery, and decision-making. They support high performance demands on an organization's data and information. They provide an enormous amount of historical and static data from three tiers:
1. Relational databases
2. Multidimensional OLAP applications
3. Client analysis tools
Data warehouses support several types of applications, such as online analytical processing (OLAP), decision-support systems (DSS), and data mining. OLAP is a term used to describe the analysis of complex data from the data warehouse.
OLAP is a software technology that allows users to analyze and view data easily and quickly from multiple points of view. OLAP provides dynamic, multidimensional support to executives and managers who need to understand different aspects of the data. Activities that are supported include:
- Analyzing financial trends
- Creating slices of data
- Finding new relationships among the data
- Drilling down into sales statistics
- Doing calculations across different dimensions, where each category of data (for example, product, location, sales numbers, or time period) is considered a dimension
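The slicing, drilling down, and per-dimension calculations listed above can be sketched in a few lines of plain Python. This is only an illustration, not how a real OLAP engine is built: the `FACTS` rows, the dimension names, and the helper functions `slice_rows` and `aggregate` are all made up for this example.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, location, period, sales).
FACTS = [
    ("laptop", "north", "2023-Q1", 120),
    ("laptop", "south", "2023-Q1", 80),
    ("laptop", "north", "2023-Q2", 150),
    ("phone", "north", "2023-Q1", 200),
    ("phone", "south", "2023-Q2", 170),
]
# Assumed dimension names, in the same order as the row fields.
DIMENSIONS = ("product", "location", "period")


def slice_rows(rows, dimension, value):
    """Slice: keep only the rows where one dimension has a fixed value."""
    i = DIMENSIONS.index(dimension)
    return [row for row in rows if row[i] == value]


def aggregate(rows, *dimensions):
    """Sum the sales figure, grouped by the chosen dimensions."""
    indexes = [DIMENSIONS.index(d) for d in dimensions]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in indexes)
        totals[key] += row[3]
    return dict(totals)


# Summary view: total sales per product.
by_product = aggregate(FACTS, "product")

# Drill down: break each product total out by location.
by_product_location = aggregate(FACTS, "product", "location")

# Slice on location = "north", then calculate along the time dimension.
north_by_period = aggregate(slice_rows(FACTS, "location", "north"), "period")

print(by_product)
print(by_product_location)
print(north_by_period)
```

Grouping by one dimension gives the summary view, and adding a second dimension to the same call is the drill-down. A slice simply fixes one dimension before aggregating.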
There are OLAP tools that use distributed computing capabilities for analyses that require more storage and processing power than can be economically and efficiently located on an individual desktop.
DSS support an organization's leading decision makers with higher-level data for complex and critical decisions. A DSS queries a data warehouse or an OLAP database for relevant information that can be compared in order to make a business decision and predict the impact of that decision.
Finally, data mining is being used for knowledge discovery, the process of searching data for unanticipated new knowledge.
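As a rough illustration of searching data for patterns nobody asked about in advance, the sketch below counts which pairs of items appear together in the same transaction. The `transactions` data and the `frequent_pairs` helper are invented for this example, and simple pair counting stands in for the far more capable algorithms that real data mining tools use.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase records, one set of items per transaction.
transactions = [
    {"laptop", "mouse"},
    {"laptop", "mouse", "bag"},
    {"phone", "case"},
    {"laptop", "bag"},
    {"phone", "case", "charger"},
]


def frequent_pairs(baskets, min_count):
    """Return the item pairs that occur together in at least min_count baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort the items so each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


pairs = frequent_pairs(transactions, min_count=2)
print(pairs)
```

No pair is specified ahead of time. The pairs that come out are whatever the data happens to contain, which is the sense in which the result is "unanticipated".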
Knowledge workers and decision makers use tools ranging from parametric queries to ad hoc queries to data mining. Thus, the access component of the data warehouse must provide support for structured queries (both parametric and ad hoc). Together, these make up a managed query environment.
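The difference between the two kinds of structured query can be shown with Python's built-in `sqlite3` module. This is a minimal sketch under assumed names: the `sales` table, its columns, and the `sales_for_region` function are hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import sqlite3

# In-memory database standing in for the warehouse (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("laptop", "north", 120), ("laptop", "south", 80), ("phone", "north", 200)],
)


def sales_for_region(region):
    """Parametric query: the SQL text is fixed and the caller supplies only a value."""
    row = conn.execute(
        "SELECT SUM(amount) FROM sales WHERE region = ?", (region,)
    ).fetchone()
    return row[0]


north_total = sales_for_region("north")

# Ad hoc query: the analyst writes a new statement for a one-off question.
ad_hoc = conn.execute(
    "SELECT product, SUM(amount) FROM sales GROUP BY product ORDER BY product"
).fetchall()

print(north_total)
print(ad_hoc)
```

A parametric query is prepared once and reused with different values, while an ad hoc query is composed on the spot. A managed query environment has to serve both.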
