Thursday, 20 October 2011

Explain Secondary Index

There are up to 32 unique and non-unique secondary indexes can be defined for a table. Comparing to primary indexes, Secondary indexes allow access to information in a table by alternate, less frequently used paths. A secondary index is a subtable that is stored in all AMPs, but separately from the primary table. The subtables, which are built and maintained by the system, contain the following;

  • RowIDs of the subtable rows
  • Base table index column values
  • RowIDs of the base table rows (points)
As shown in the following figure, the secondary index subtable on each AMP is associated with the base table by the rowID.

Secondary index are optional. Unlike the primary index, a secondary index can be added or dropped without recreating the table. There can be one or more secondary indexes in the CREATE TABLE statement, or add them to an existing table using the CREATE INDEX statement or ALTER TABLE statement. DROP INDEX can be used to dropping a named or unnamed secondary index. Since secondary indexes require subtables, these subtables require additional disk space and, therefore, may require additional I/Os for INSERTs, DELETEs, and UPDATEs.

If a Teradata SQL request uses secondary index values in a WHERE constraint, the optimizer may use the rowID in a secondary index subtable to access the qualifying rows in the data table. If a secondary index is used only periodically by certain applications and is not routinely used by most applications, disk space can be saved by creating the index when it is needed and dropping it immediately after use.

A unique secondary index is very efficient, it typically allows access of only two AMPs, requires no spool file, and has one row per value, therefore, when a unique secondary index is used to access a row, two AMPs are involved. Unique secondary indexes can thus improve performance by avoiding the overhead of scanning all AMPs. For example, if a unique secondary index is defined on the department_name column of the Customer_service.department table (assuming that no two departments have the same name), then the following query is processed using two AMPs:

SELECT department_number FROM customer_service.department
        WHERE department_name = 'Education';

In this example, the request is sent to AMP n, which contains the rowID for the secondary index value "Education", this AMP, in turn, sends the request to AMP m, where the data row containing that value is stored. Note that the rowID and the data row may reside on the same AMP, in which case only one AMP is involved.

A non-unique secondary index (NUSI) may have multiple rows per value. As a general rule, the NUSI should not be defined if the maximum number of rows per value exceeds the number of data blocks in the table.  A NUSI is efficient only if the number of rows accessed is a small percentage of the total number of data rows in the table. It can be useful for complex conditional expressions or processing aggregates. For example, if the contact_name column is defined as a secondary index for the customer_service.contact table, the following statement can be processed by secondary index:

    SELECT * FROM customer_service.contact
        WHERE contact_name = 'Mike';

After request is submitted, the optimizer first will determine if it is faster to do a full-table scan of the base table rows or a full-table scan of the secondary index subtable to get the rowIDs of the qualifying base table rows; then place those rowIDs into a spool file; finally use the resulting rowIDs to access the base table rows.
Non-unique secondary indexed accessed is used only for request processing when it is less costly than a complete table search.

Teradata allows the use of Primary and Secondary Indexes, however, there are few differences between the two.  In Teradata, Primary Indexes are required and Secondary Indexes are optional.

Each table must have a Primary Index for distributing the rows among the VPROCs.  Each table can have only one Primary Index, but it can have up to 32 Secondary Indexes.  Both Primary and Secondary Indexes may contain up to 16 columns, and they both may be unique or non-unique.

Row distribution among VPROCs is only affected by the Primary Index not the Secondary Index.  Secondary Indexes may be created and dropped at any time, but if the Primary Index must be changed or dropped, the table must be dropped and recreated.   Again, both the Primary and Secondary Indexes will affect the table performance, so poorly chosen indexes can cause some VPROCs to work harder than others.

Explain Primary Index

It is used for data distribution among the AMPs. There is always one index in a table and it is must for every table (even if you don't define it explicitly).

A primary index should be defined to provide a nearly uniform distribution of rows among the  AMPs, the more unique the index, the more even the distribution of rows and the better  space utilization.
The index should be defined on as few columns as possible. 
Primary index can be either Unique or non-unique. A unique index must have a unique value in the corresponding fields of every row.

A non-unique index permits the insertion of duplicate field values. The unique primary index is more efficient. Once created, the primary index cannot be dropped or modified, the  index must be changed by recreating the table. 

Once created, the primary index cannot be dropped or modified, the index must be changed by recreating the table. 

Creating index with create table command
create table organic (serial_No integer,
                                   organic_name char(15),
                                   Carbon_number smallint,
                                    amount smallint)  
unique primary index (serial_No);

create table inorganic (serial_No integer,
                                      inorgnic_name char(15),
                                      anion char(5),
                                      cation char(6),
                                      amount smallint)  
unique primary index (serial_No); 

non unique primary index:
create table student (student_ID char(15),
                                  student_name char(20),
                                  Department char(10))  
primary index (student_ID); 

Thursday, 29 September 2011

Need to analyze before creating a table in Teradata


Primary Index
The primary index (PI) distributes the records in a table across the AMPs, by hashing the columns that make up the PI to determine which records go to which AMP. If no PI is specified when a table is created, the first column of the table will be used as the PI.
When creating a table, care needs to be taken to choose a column or set of columns that evenly distribute the data across the AMPs. A PI that distributes data unevenly will at the very least impact the performance of the table, and depending on the size of the table, has the potential to negatively impact the entire system.
Even distribution of the PI isn’t the only criteria to use when choosing a PI. Consideration should also be given to how the data will be queried. If the data can be evenly distributed using different sets of columns, then the determination of which columns to use should be based on how the data will be queried and what other tables it will be joined to. If two tables that are frequently joined have the same PI, then joining them doesn’t require the records to be redistributed to other AMPs to satisfy a query.
A PI doesn’t have to be the same as the primary key (PK) of a table. The purpose of a PI is to evenly distribute the data, while the purpose of a PK is to identify unique records. The PI and PK can be the same, but it isn’t required.
Skew Factor
A table that has perfectly distributed data has a skew factor of 0%. The higher the skew factor is, the more unevenly data in a table is distributed. As a general rule, tables with a skew factor higher than 50% should be evaluated to determine if a different primary index would distribute the data better, and thereby improve performance.
Tables with fewer records than the number of AMPs will have a higher skew factor that 0%, simply because the records cannot be evenly distributed across all of the AMPs. For tables that have fewer records than the number of AMPs the skew factor of the table may not be improved by choosing a different primary index.
Table Creation
Along with choosing the PI of a table, another choice needs to be made when creating a table. The two options are SET and MULTISET. A SET table prohibits duplicate records with identical values in every column from existing in the table, while a MULTISET table allows them.
When using FastLoad or the TPT Load Operator, if you attempt to insert duplicate records into a SET table, the duplicates are discarded without any notification that an attempt to insert duplicates took place. 
A SET table with no unique primary index has a performance overhead because every record that is inserted must be evaluated to determine if a duplicate already exists in the table. This overhead can be minimized by defining a unique index on the table (see the Teradata Database Database Design manual for more information on minimizing duplicate row checks for tables without unique primary indexes).
Create table syntax examples:
CREATE SET TABLE … (results in a SET table being created)
CREATE MULTISET TABLE … (results in a MULTISET table being created)
CREATE TABLE … (results in a SET table being created with Teradata semantics mode and results in a MULTISET table being created with ANSI semantics mode)


Friday, 16 September 2011

Imp Concepts


Create New Table from Existing Table.
CREATE TABLE <databasename>.<tablename> AS
(Select   <columnnames>
FROM    <tablename>
WHERE condition)
with data;

CREATE TABLE <databasename>.<tablename> AS
(Select   <columnnames>
FROM    <tablename>
WHERE condition)
with no data;


What is Perm space , Spool space and Temp space.
Perm space=> It is a maximum amount of disk space for storing user data rows in any table located in the database. However, if no tables are stored within a database,  it is not required to have PERM space. Although a database without PERM space cannot store tables, it can store views and macros because they are physically stored in the Data Dictionary (DD) PERM space and require no user storage space. The DD is in a “database” called DBC.

Spool Space=> SPOOL space is workspace used for the temporary storage of rows during the execution of user SQL statements. unused PERM space is automatically available for use as SPOOL.

Temp Space=> TEMP space is allocated to any databases/users where Global Temporary Tables are used. Unused perm space is available for TEMP space.


Thursday, 15 September 2011

some useful query

Use of Extract Function.
SELECT DATE;
 
Date      
 ----------
 2011-09-22 
 
SELECT EXTRACT (YEAR FROM date) as "year";
year   
 -------
 2011    
 
SELECT EXTRACT (MONTH FROM date) as "month";
month   
 --------
 9     
 
To get the current system date.
select DATE;

To get the current system time and current session ‘Time Zone’ displacement.
select CURRENT_TIME;

To get all the objects in a database.
help database databasename;

It returns the current system timestamp (including year, month and day) and current session Time Zone displacement.
select CURRENT_TIMESTAMP;

 To get the amount of space occupy by each table in there respective Database.

select databasename,
       tablename,
       CAST ( sum(currentperm)/1024/1024/1024 AS DECIMAL(38,2) ) AS "SIZE(GB)"
from dbc.tablesize
where databasename = 'DATABASENAME'
group by databasename , tablename;

 To find all the Database and the amount of space used by it.

SELECT DatabaseName,
       SUM(MaxPerm)/(1024*1024*1024) (DECIMAL(15,6)) as "Max Perm (GB)",
       SUM(CurrentPerm)/(1024*1024*1024) (DECIMAL(15,6)) as "Current Perm (GB)",
       ((SUM(CurrentPerm))/ NULLIFZERO (SUM(MaxPerm)) * 100) (DECIMAL(15,6)) as "Percent Used"
FROM DBC.DiskSpace
WHERE MAXPERM >0
GROUP BY 1
ORDER BY 4 DESC;

Wednesday, 14 September 2011

Introduction of Teradata Database


The concept of teradata system is little bit different. There is only one database(your teradata system) with lots of (schemas/users). Here (schemas/users) is called database.
In this system dbc is the owner of all metadata. In dbc you can find all the object of teradata.

To get all tables name which consist the particular column.
select tablename, columnname from dbc.columns where columnname ='Emp_Id' and databasename = 'EMPDB';
To see all the objects in a particular database.
Help database databasename;