Pages

Tuesday, March 18, 2014

SSIS BULK INSERT error message

Problem

When you try to do a BULK INSERT from a CSV file, here is the error message you may run into:

“Could not bulk insert because file 'C:\csv_filename.txt' could not be opened. Operating system error code 3 (The system cannot find the path specified.).”

Solution

First of all, confirm that you are specifying the UNC (Universal Naming Convention) path and not just a drive letter. If the file lives in a remote location, then the path must follow the UNC format, i.e.:

             \\Server_Name or IP_ADDRESS\Shared_Folder_PATH(Name)\FILE_NAME

Note that a drive-letter path may be fine if the file is on the SAME server that runs SQL Server.
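For example, a minimal BULK INSERT statement using a UNC path might look like the sketch below. The server name, share name, table name, and file name are all placeholders; substitute your own, and adjust the delimiters to match your CSV file.

```sql
-- Hypothetical example: load a CSV file from a network share via a UNC path.
-- Server, share, table, and file names below are placeholders.
BULK INSERT dbo.MyStagingTable
FROM '\\MyServer\SharedFolder\csv_filename.txt'
WITH
(
    FIELDTERMINATOR = ',',   -- column delimiter in the CSV file
    ROWTERMINATOR   = '\n',  -- row delimiter
    FIRSTROW        = 2      -- skip the header row, if the file has one
);
```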

Secondly, make sure that the SQL Server service account has permission to access the file location.
To do this, you can use a domain user or create a new local user, and start the service with that account.
Go to the lower left corner of the desktop, click START, type “services.msc”, then choose the “SQL Server” service as shown below:

bulkInsertService
Check Log On As; a new window will pop up.

bulkInsertServiceLogon
The default setting is the Local System account. Try changing it to a domain user account and restart the SQL Server service after the change.

Last but not least, make sure that this account has Read and Write permissions on the folder containing the file. To do so, right-click on the folder --> Sharing and Security --> Permissions.

That’s it! The error message should now be swept away.

Sunday, March 16, 2014

SSAS error msg: Unable to retrieve data mining algorithms

Problem

When you try to create a new mining structure in SSAS, you may notice a warning message in the window as shown below:

ErrorMsgSetup_thumb1

“Unable to retrieve a list of supported data mining algorithms. Make sure you are connected to the correct Analysis Services server instance and the Analysis Services server named localhost is running and configured properly. You can continue with a default list of data mining algorithms.”

Solution:

Step 1: Check that the Analysis Services service has been started in SQL Server Configuration Manager.

SSAS_serviceStarted_thumb2

Step 2: Check the deployment property to make sure that it has been set up correctly.

SSAS_deployServer_thumb2

Friday, March 14, 2014

Snowflake Schemas Vs. Star Schemas

Today we will briefly discuss the difference between star and snowflake schemas, and what you should consider when making decisions about a data warehouse design.

What are star schemas and snowflake schemas?

In dimensional data warehousing, a star schema is a dimensional model composed of a central fact table and multiple “denormalized” dimension tables around it. Dimension hierarchies are flattened into a single table. For example, in the AdventureWorks sample data warehouse, the customer dimension may contain attributes about customers such as title, name, date of birth, and address all in one table. Or, for a data warehouse in healthcare, the provider dimension would contain attributes about providers and also about the practices those providers belong to. Thus, there is some redundancy in star schemas.

On the other hand, a snowflake schema can be regarded as a special type of star schema. It contains a central fact table and multiple normalized tables to represent each dimension. For example, the Customer dimension can be represented by a Customer table and a Geography table, as shown below.

snowFlake2

In the example of the provider dimension, it may have one Practice table containing all the practices’ attributes, and another table containing only the providers’ attributes plus a foreign key pointing to the Practice table.
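As a sketch (the table and column names here are illustrative, not taken from AdventureWorks), the same Customer dimension could be modeled both ways:

```sql
-- Star schema: one flattened, denormalized customer dimension.
CREATE TABLE DimCustomer_Star (
    CustomerKey   INT PRIMARY KEY,
    CustomerName  VARCHAR(100),
    City          VARCHAR(50),
    StateProvince VARCHAR(50),
    Country       VARCHAR(50)   -- geography attributes repeated per customer
);

-- Snowflake schema: geography normalized into its own table.
CREATE TABLE DimGeography (
    GeographyKey  INT PRIMARY KEY,
    City          VARCHAR(50),
    StateProvince VARCHAR(50),
    Country       VARCHAR(50)
);

CREATE TABLE DimCustomer_Snowflake (
    CustomerKey  INT PRIMARY KEY,
    CustomerName VARCHAR(100),
    GeographyKey INT REFERENCES DimGeography(GeographyKey)
);
```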

Differences or trade-offs

In terms of disk space, star schemas cost more, because snowflake schemas normalize the dimension tables and thus remove some of the attribute redundancy. (But compared to the size of the fact tables, the dimension tables are usually negligible.)

In terms of performance, however, star schemas usually do better, because a query against a snowflake schema may need many joins to perform a data aggregation.
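To illustrate the extra join cost, consider aggregating sales by country under each design (the table and column names below are hypothetical):

```sql
-- Star schema: a single join from the fact table to the flattened dimension.
SELECT c.Country, SUM(f.SalesAmount) AS TotalSales
FROM FactSales f
JOIN DimCustomer c ON f.CustomerKey = c.CustomerKey
GROUP BY c.Country;

-- Snowflake schema: one more join to reach the normalized geography table.
SELECT g.Country, SUM(f.SalesAmount) AS TotalSales
FROM FactSales f
JOIN DimCustomer c  ON f.CustomerKey  = c.CustomerKey
JOIN DimGeography g ON c.GeographyKey = g.GeographyKey
GROUP BY g.Country;
```

The deeper the snowflaked hierarchy, the more joins a query like this accumulates.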

Also, a dimension table in a snowflake schema may have one or more parent tables, so the dimensions in a snowflake schema must be highly controlled and managed to avoid update and insert anomalies.

Thus, the Kimball design tips recommend star schemas for both understandability and performance. Even when snowflake schemas are used, it is suggested to place views on top of the normalized dimension tables so that each dimension appears to the user as a single table.
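For example, a view can flatten a snowflaked customer dimension back into one logical table for end users (again, the names here are illustrative):

```sql
-- Present the snowflaked Customer + Geography tables as a single dimension.
CREATE VIEW DimCustomerView AS
SELECT
    c.CustomerKey,
    c.CustomerName,
    g.City,
    g.StateProvince,
    g.Country
FROM DimCustomer c
JOIN DimGeography g ON c.GeographyKey = g.GeographyKey;
```

Users query DimCustomerView exactly as they would a star-schema dimension, while the underlying tables stay normalized.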

Cases to choose snowflake schemas

There are some use cases where a snowflake schema may be the better choice:

  • Based on query frequency: Consider a very large customer dimension, such as Amazon’s or Facebook’s, where, say, 90 percent of the fact table measurements involve anonymous visitors about whom you collect little detail, and 10 percent involve reliably registered customers about whom you collect much detail across many dimensions. Moving the 10 percent of registered customers into a separate dimension can improve performance.
  • Sparsely populated attributes, where most dimension records are NULL, are moved to a sub-dimension.
  • Low-cardinality attributes that are queried independently. For example, a product dimension may contain thousands of products but only a handful of product types, so if the product types are often queried independently, it is better to have a separate product-type dimension. As another example, the provider dimension may contain a very large number of demographic attributes at the practice level. Rather than repeat these attributes on every provider row, we might create a special type of normalized practice table called a “dimension outrigger,” where the attributes are stored outside the provider dimension and referenced by it.
  • Attributes that are part of a hierarchy and are queried independently. Examples include the year, quarter, and month attributes of a date hierarchy; and multiple enterprise calendar dimensions.
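The outrigger case above can be sketched as follows; the tables and columns are hypothetical placeholders for a healthcare warehouse:

```sql
-- "Dimension outrigger": practice-level demographics live in their own table
-- and are referenced from the provider dimension, instead of being repeated
-- on every provider row.
CREATE TABLE DimPractice (
    PracticeKey  INT PRIMARY KEY,
    PracticeName VARCHAR(100),
    PracticeSize INT,
    Region       VARCHAR(50)
);

CREATE TABLE DimProvider (
    ProviderKey  INT PRIMARY KEY,
    ProviderName VARCHAR(100),
    Specialty    VARCHAR(50),
    PracticeKey  INT REFERENCES DimPractice(PracticeKey)  -- outrigger reference
);
```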

In summary, before deciding which schema to use, you should have business requirements defined for how the data warehouse will be used. You should reflect on how users want to aggregate and “slice and dice” historical fact table data. You must embrace the goal of enhancing business value as the primary purpose. Work closely with your business users, uncover their needs, and then design a system that meets them!