Skip to main content

Data Warehouse - An Introduction


A data warehouse is defined as a subject-oriented, integrated, nonvolatile, time-variant collection of data in support of management's decisions. More generally, data warehousing is a collection of decision support technologies, aimed at enabling the knowledge worker, such as executive, manager, and analyst, to arrive at better and faster decisions. Data warehouses provide access to data for complex analysis, knowledge discovery, and decision-making. style="mso-spacerun: They support high performance demands on an organization's data and information. It provides an enormous amount of historical and static data from three tiers:
1.      Relational databases
2.      Multidimensional OLAP applications
3.      Client analysis tools
Several types of applications such as online analytical processing (OLAP), decision-support systems (DSS) and data mining are being supported. OLAP is a term used to describe the analysis of complex data from the data warehouse.
OLAP is a software technology that allows users to easily and quickly analyze and view data from multiple points-of-view. OLAP provides dynamic and multi-dimensional support to executives and managers who need to understand different aspects of the data. Activities that are supported include:
§  Analyzing financial trends
§  Creating slices of data
§  Finding new relationships among the data
§  Drilling down into sales statistics
§  Doing calculations through different dimensions where each category of data (that is, product, location, sales numbers, time period, etc.) is considered a dimension.
There are OLAP tools that use distributed computing capabilities for analyses that require more storage and processing power than can be economically and efficiently located on an individual desktop.
DSS support an organization's leading decision makers with higher-level data for complex and critical decisions. A DSS queries a data warehouse or an OLAP database for relevant information that can be compared in order to make a business decision and predict the impact of that decision.
Finally, data mining is being used for knowledge discovery, the process of searching data for unanticipated new knowledge.
Knowledge workers and decision makers use tools ranging from parametric queries to ad hoc queries to data mining. Thus, the access component of the data warehouse must provide support of structured queries (both parametric and ad hoc). These together make up a managed query environment.

Comments

Popular posts from this blog

Standard and Formatted Input / Output in C++

The C++ standard libraries provide an extensive set of input/output capabilities which we will see in subsequent chapters. This chapter will discuss very basic and most common I/O operations required for C++ programming. C++ I/O occurs in streams, which are sequences of bytes. If bytes flow from a device like a keyboard, a disk drive, or a network connection etc. to main memory, this is called   input operation   and if bytes flow from main memory to a device like a display screen, a printer, a disk drive, or a network connection, etc., this is called   output operation . Standard Input and Output in C++ is done through the use of  streams . Streams are generic places to send or receive data. In C++, I/O is done through classes and objects defined in the header file  <iostream> .  iostream  stands for standard input-output stream. This header file contains definitions to objects like  cin ,  cout , etc. /O Library Header Files There are...

Genetic Algorithm: Population, Fitness Function, Parent Selection, Cross over, Mutation

Genetic Algo Population Population is a subset of solutions in the current generation. It can also be defined as a set of chromosomes. There are several things to be kept in mind when dealing with GA population − The diversity of the population should be maintained otherwise it might lead to premature convergence. The population size should not be kept very large as it can cause a GA to slow down, while a smaller population might not be enough for a good mating pool. Therefore, an optimal population size needs to be decided by trial and error. The population is usually defined as a two dimensional array of –  size population, size x, chromosome size . Population Initialization There are two primary methods to initialize a population in a GA. They are − Random Initialization  − Populate the initial population with completely random solutions. Heuristic initialization  − Populate the initial population using a known heuristic for the problem. It has been observed that the e...

Difference between net platform and dot net framework...

Difference between net platform and dot net framework... .net platform supports programming languages that are .net compatible. It is the platform using which we can build and develop the applications. .net framework is the engine inside the .net platform which actually compiles and produces the executable code. .net framework contains CLR(Common Language Runtime) and FCL(Framework Class Library) using which it produces the platform independent codes. What is the .NET Framework? The Microsoft .NET Framework is a platform for building, deploying, and running Web Services and applications. It provides a highly productive, standards-based, multi-language environment for integrating existing investments with next-generation applications and services as well as the agility to solve the challenges of deployment and operation of Internet-scale applications. The .NET Framework consists of three main parts: the common language runtime, a hierarchical set of unified class librari...