Microsoft SQL Server (MSSQL)

Microsoft® SQL Server™ 2000 is a set of components that work together to meet the data storage and analysis needs of the largest Web sites and enterprise data processing systems.Its a family of products that meet the data storage requirements of the largest data processing systems and commercial Web sites, yet at the same time can provide easy-to-use data storage services to an individual or small business.

The data storage needs of a modern corporation or government organization are very complex. Some examples are:

  • Online Transaction Processing (OLTP) systems must be capable of handling thousands of orders placed at the same time.
  • Increasing numbers of corporations are implementing large Web sites as a mechanism for their customers to enter orders, contact the service department, get information about products, and for many other tasks that previously required contact with employees. These sites require data storage that is secure, yet tightly integrated with the Web.
  • Organizations are implementing off-the-shelf software packages for critical services such as human resources planning, manufacturing resources planning, and inventory control. These systems require databases capable of storing large amounts of data and supporting large numbers of users.
  • Organizations have many users who must continue working when they do not have access to the network. Examples are mobile disconnected users, such as traveling sales representatives or regional inspectors. These users must synchronize the data on a notebook or laptop with the current data in the corporate system, disconnect from the network, record the results of their work while in the field, and then finally reconnect with the corporate network and merge the results of their fieldwork into the corporate data store.
  • Managers and marketing personnel need increasingly sophisticated analysis of trends recorded in corporate data. They need robust Online Analytical Processing (OLAP) systems easily built from OLTP data and support sophisticated data analysis.
  • Independent Software Vendors (ISVs) must be able to distribute data storage capabilities with applications targeted at individuals or small workgroups. This means the data storage mechanism must be transparent to the users who purchase the application. This requires a data storage system that can be configured by the application, and then tune itself automatically so that the users do not need to dedicate database administrators to constantly monitor and tune the application.

Features of SQL Server 2000

Internet Integration
The SQL Server 2000 database engine includes integrated XML support. It also has the scalability, availability, and security features required to operate as the data storage component of the largest Web sites. The SQL Server 2000 programming model is integrated with the Windows DNA architecture for developing Web applications, and SQL Server 2000 supports features such as English Query and the Microsoft Search Service to incorporate user-friendly queries and powerful search capabilities in Web applications.

Scalability and Availability
The same database engine can be used across platforms ranging from laptop computers running Microsoft Windows® 98 through large, multiprocessor servers running Microsoft Windows 2000 Data Center Edition. SQL Server 2000 Enterprise Edition supports features such as federated servers, indexed views, and large memory support that allow it to scale to the performance levels required by the largest Web sites

Enterprise-Level Database Features
The SQL Server 2000 relational database engine supports the features required to support demanding data processing environments. The database engine protects data integrity while minimizing the overhead of managing thousands of users concurrently modifying the database. SQL Server 2000 distributed queries allow you to reference data from multiple sources as if it were a part of a SQL Server 2000 database, while at the same time, the distributed transaction support protects the integrity of any updates of the distributed data. Replication allows you to also maintain multiple copies of data, while ensuring that the separate copies remain synchronized. You can replicate a set of data to multiple, mobile, disconnected users, have them work autonomously, and then merge their modifications back to the publisher

Ease of installation, deployment, and use
SQL Server 2000 includes a set of administrative and development tools that improve upon the process of installing, deploying, managing, and using SQL Server across several sites. SQL Server 2000 also supports a standards-based programming model integrated with the Windows DNA, making the use of SQL Server databases and data warehouses a seamless part of building powerful and scalable systems. These features allow you to rapidly deliver SQL Server applications that customers can implement with a minimum of installation and administrative overhead.

Data warehousing
SQL Server 2000 includes tools for extracting and analyzing summary data for online analytical processing. SQL Server also includes tools for visually designing databases and analyzing data using English-based questions

Relational Database Components

The database component of Microsoft® SQL Server™ 2000 is a Structured Query Language (SQL)–based, scalable, relational database with integrated Extensible Markup Language (XML) support for Internet applications. Each of the following terms describes a fundamental part of the architecture of the SQL Server 2000 database component:

A database is similar to a data file in that it is a storage place for data. Like a data file, a database does not present information directly to a user; the user runs an application that accesses data from the database and presents it to the user in an understandable format. Database systems are more powerful than data files in that data is more highly organized. In a well-designed database, there are no duplicate pieces of data that the user or application must update at the same time. Related pieces of data are grouped together in a single structure or record, and relationships can be defined between these structures and records. When working with data files, an application must be coded to work with the specific structure of each data file. In contrast, a database contains a catalog that applications use to determine how data is organized. Generic database applications can use the catalog to present users with data from different databases dynamically, without being tied to a specific data format.

    A database typically has two main parts: first, the files holding the physical database and second, the database management system (DBMS) software that applications use to access data. The DBMS is responsible for enforcing the database structure, including:
  • Maintaining relationships between data in the database.
  • Ensuring that data is stored correctly, and that the rules defining data relationships are not violated.
  • Recovering all data to a point of known consistency in case of system failures.

Relational Database
Although there are different ways to organize data in a database, relational databases are one of the most effective. Relational database systems are an application of mathematical set theory to the problem of effectively organizing data. In a relational database, data is collected into tables (called relations in relational theory). A table represents some class of objects that are important to an organization. For example, a company may have a database with a table for employees, another table for customers, and another for stores. Each table is built of columns and rows (called attributes and tuples in relational theory). Each column represents some attribute of the object represented by the table. For example, an Employee table would typically have columns for attributes such as first name, last name, employee ID, department, pay grade, and job title. Each row represents an instance of the object represented by the table. For example, one row in the Employee table represents the employee who has employee ID 12345. When organizing data into tables, you can usually find many different ways to define tables. Relational database theory defines a process called normalization, which ensures that the set of tables you define will organize your data effectively.

SQL Server 2000 supports having a wide range of users access it at the same time. An instance of SQL Server 2000 includes the files that make up a set of databases and a copy of the DBMS software. Applications running on separate computers use a SQL Server 2000 communications component to transmit commands over a network to the SQL Server 2000 instance. When an application connects to an instance of SQL Server 2000, it can reference any of the databases in that instance that the user is authorized to access. The communication component also allows communication between an instance of SQL Server 2000 and an application running on the same computer. You can run multiple instances of SQL Server 2000 on a single computer. SQL Server 2000 is designed to support the traffic of the largest Web sites or enterprise data processing systems. Instances of SQL Server 2000 running on large, multiprocessor servers are capable of supporting connections to thousands of users at the same time. The data in SQL Server tables can be partitioned across multiple servers, so that several multiprocessor computers can cooperate to support the database processing requirements of extremely large systems. These groups of database servers are called federations. Although SQL Server 2000 is designed to work as the data storage engine for thousands of concurrent users who connect over a network, it is also capable of working as a stand-alone database directly on the same computer as an application. The scalability and ease-of-use features of SQL Server 2000 allow it to work efficiently on a single computer without consuming too many resources or requiring administrative work by the stand-alone user. The same features allow SQL Server 2000 to dynamically acquire the resources required to support thousands of users, while minimizing database administration and tuning. The SQL Server 2000 relational database engine dynamically tunes itself to acquire or free the appropriate computer resources required to support a varying load of users accessing an instance of SQL Server 2000 at any specific time. The SQL Server 2000 relational database engine has features to prevent the logical problems that occur if a user tries to read or modify data currently used by others.

Structured Query Language
To work with data in a database, you have to use a set of commands and statements (language) defined by the DBMS software. Several different languages can be used with relational databases; the most common is SQL. The American National Standards Institute (ANSI) and the International Standards Organization (ISO) define software standards, including standards for the SQL language. SQL Server 2000 supports the Entry Level of SQL-92, the SQL standard published by ANSI and ISO in 1992. The dialect of SQL supported by Microsoft SQL Server is called Transact-SQL (T-SQL). T-SQL is the primary language used by Microsoft SQL Server applications.

Extensible Markup Language
XML is the emerging Internet standard for data. XML is a set of tags that can be used to define the structure of a hypertext document. XML documents can be easily processed by the Hypertext Markup Language, which is the most important language for displaying Web pages. Although most SQL statements return their results in a relational, or tabular, result set, the SQL Server 2000 database component supports a FOR XML clause that returns results as an XML document. SQL Server 2000 also supports XPath queries from Internet and intranet applications. XML documents can be added to SQL Server databases, and the OPENXML clause can be used to expose data from an XML document as a relational result set.

Data Warehousing and Online Analytical Processing

Microsoft® SQL Server™ 2000 provides components that can be used to build data warehouses or data marts. The data warehouses or data marts can be used for sophisticated enterprise intelligence systems that process queries required to discover trends and analyze critical factors. These systems are called online analytical processing (OLAP) systems. The data in data warehouses and data marts is organized differently than in traditional transaction processing databases.

Enterprise-level relational database management software, such as SQL Server 2000, was designed originally to centrally store the data generated by the daily transactions of large companies or government organizations. Over the decades, these databases have grown to be highly efficient systems for recording the data required to perform the daily operations of the enterprise. Because the system is based on computers and records the business transactions of the enterprise, these systems are known as online transaction processing (OLTP) systems.

OLTP Systems
The data in OLTP systems is organized primarily to support transactions, such as:

  • Recording an order from a point-of-sale terminal or entered through a Web site.
  • Placing an order for more supplies when inventory levels drop to a defined level.
  • Tracking components as they are assembled into a final product in a manufacturing facility.
  • Recording employee data.
  • Recording holders of licenses, such as restaurant or driver licenses.

Individual transactions are completed quickly and access relatively small amounts of data. OLTP systems are designed and tuned to process hundreds or thousands of transactions being entered at the same time.

Although OLTP systems excel at recording the data required to support daily operations, OLTP data is not organized in a manner that easily provides the information required by managers to plan the work of their organizations. Managers need summary information from which they can analyze trends that affect their organization or team. They need to find the critical factors affecting the success of their organization, and how best to adjust those factors to improve the success of the enterprise. They need to find how the workload of their enterprise is affected by seasonal and yearly trends so that they can predict how many employees and resources will be required to perform future work.

OLAP Systems
Systems designed to handle the queries required to discover trends and critical factors are called online analytical processing (OLAP) systems. OLAP queries typically require large amounts of data. For example, the head of a government motor vehicle licensing department could ask for a report that shows the number of each make and model of vehicle registered by the department each year for the past 20 years. Running this type of query against the original detail data in an OLTP system has two effects:

  • The query takes a long time to aggregate (sum) all of the detail records for the last 20 years, so the report is not ready in a timely manner.
  • The query generates a very heavy workload that at least slows down the normal users of the system from recording transactions at their normal pace.

Another issue is that many large enterprises do not have only one OLTP system that records all the transaction data. Most large enterprises have multiple OLTP systems, many of which were developed at different times and use different software and hardware. In many cases, the codes and names used to identify items in one system are different from the codes and names used in another. Managers running OLAP queries generally need to be able to reference the data from several of these OLTP systems.

OLAP data is organized into multidimensional cubes. The structure of data in multidimensional cubes gives better performance for OLAP queries than data organized in relational tables. The basic unit of a multidimensional cube is called a measure. Measures are the units of data that are being analyzed. For example, a corporation that operates hardware stores wants to analyze revenue and discounts for the different products it sells. The measures are the number of units sold, revenue, and the sum of any discounts. The measures are organized along dimensions. In this example, a three dimensional cube could have these dimensions: time, store, and products. Think of these dimensions as forming the logical x, y, and z axis of a three-dimensional, virtual cube.

Each dimension is divided into units called members. The members of a dimension are typically organized into a hierarchy. Similar members are grouped together as a level of the hierarchy. For example, the top hierarchy level of a time dimension can be years, with months at the next level, then weeks, days, and finally hours at the bottom level of the hierarchy. At each intersection of the three dimensions, the values for the measures that match those three dimension values are recorded. For example, suppose that the hour starting at 1:00 P.M. Saturday, Feb. 19, 2000 is a time dimension member, Store #2 of Albany, New York is a store dimension member, and Easy-Clean Mops are a product dimension member. Where these three dimensions meet, the cell records that 10 mops were sold for revenues of $90.00 and an average discount of $1.00.

The specific dimensions and measures defined for the cubes in any particular OLAP system depend on the kinds of analysis important to the enterprise. Transforming OLTP data from relational tables into OLAP cubes, and the design of the cubes, is a complex area that is the subject of many third-party books.

OLAP systems operate on OLAP data in data warehouses or data marts. A data warehouse stores enterprise-level OLAP data, while a data mart is smaller and typically covers a single function in an organization.

Articles | Case Studies | Search | Email | Site map