BI prestidigitation

Friday, July 22, 2016

All member

All member is a very handy member to use in some MDX calculations. Remember, when you drop any measure and no other dimension members, it doesn't mean that they aren't in use. Every cell (tuple) has a coordinate. If you don't select any member from hierarchy it doesn't mean that your current coordinate is unknown, it's .DefaultMember, most likely it's .All member (you can change default member in Attribute properties).

New clients case

In order to count clients who have first order in current period we may use min date of invoice in current period and compare with all over the history min date. If they match, then it's a new client.

SUM(
   [Client].[Client].[Client].Members,
   IIF(
       [Measures].[Invoice First Date] > 0 --check the first invoice this period
       and
       [Measures].[Invoice First Date] = ([Date].[Day].[All],[Date].[Month].[All],[Date].[Year].[All],[Measures].[Invoice First Date]), --check the first invoice all time and compare with current value
       1,
       NULL
   )
)

As you may mention the first month obviously has the biggest value, that's because the start point to record invoices is Oct 2012. Now we have a great tooling to analyze the efficiency of marketing department promotions by comparing previous periods with promotion periods aftereffect.

Thursday, July 14, 2016

Set operation within MDX and months of cooperation case

When it comes to MDX optimization most likely you have to avoid Set functions, really. Sometimes it's impossible, but avoiding them as often as they show up is the key to reach good performance. The idea behind SSAS is every measure has a coordinate (tuple) and you can't put there more then one member. For example, if you have the following calculation:

([Date].[Day].&[20141001],[Measures].[Invoice Amount])

It returns amount for the hard-coded day, but what are you going to do when you want, say, the same for two days? Probably first your guess is appending missing member:

([Date].[Day].&[20141001],[Date].[Day].&[20141002],[Measures].[Invoice Amount])

The result is NULL. Why? You put here two members in one tuple. It's like you set the coordinate (X1,X2,Y1,Z1) which is invalid, because there can be the only one X. In order to calculate it you should tell what to do with this members. You may aggregate it by native aggregation function:

Aggregate(
{[Date].[Day].&[20141001],[Date].[Day].&[20141002]},
[Measures].[Invoice Amount]
)

Now it results properly as you wished. Note that, you cannot use Aggregate function with calculated measures, you have to set certain one instead (Sum, Avg, Min, etc).

Let's get to performance testing to feel the difference. What we get when recalculate the same measure by days within one months or using only on month member.

The difference is huge and when you have to recalculate the measure for every tuple. It's non so noticeable for small number of tuples, but when your cube is huge, your performance will be very low. So as I said at the beginning — avoid using Set functions within MDX. Let's get to the real example.

Months of cooperation case

Let's say, we want to analyze how many months the client work with us. In order to do so we have to add two extra measures min and max date by invoices.

Since the measures are done we get the result in YYYYMMDD format and can't get count of months
arithmetically. Actually we can get it by using Set functions without extra measures:

As you see it's memory high-cost operation. Let's use new measures without Set operations. We have to find diffs between years and months:

The code:

(
   Round(
       ([Measures].[Invoice Last Date] - [Measures].[Invoice First Date]) / 10000 -- (20151009 - 20140906) / 10000 = 1 (years diff)
   ) * 12 -- 1 * 12 = 12 (convert years count to months count)
)
+
(
   Round([Measures].[Invoice Last Date] / 100)
   -
   Round([Measures].[Invoice Last Date] / 10000) * 100
) -- 201510 - 201500 = 10 (last months count)
-
(
   Round([Measures].[Invoice First Date] / 100)
   -
   Round([Measures].[Invoice First Date] / 10000) * 100
) -- 201409 - 20140900 = 9 (first months count)
+
1 -- extra missing month

Also you can do the same with cropping value using left and right functions and a little bit slower than previous code.

The code:

(
    Left([Measures].[Invoice Last Date],4)
    -
    Left([Measures].[Invoice First Date],4)
) * 12
+
Right(Left([Measures].[Invoice Last Date],6),2)
-
Right(Left([Measures].[Invoice First Date],6),2)
+
1

To sum it up once again: try to avoid Set MDX functions for performance reasons. Yo can find list of Set functions here.

Friday, July 8, 2016

Testing DWH

I expect this post will be updated over time. Here I want to describe my test DB to handle business cases.


List of tables

Data within tables

Let's create some Cube-orientated DWH:

create view [dbo].[dim_Product] as   
select  
    [ID]  
    ,[Name]  
from [Product]

create view [dbo].[dim_Client] as  
select   
    [ID]  
    ,[Name]  
from [Client]
  
create view [dbo].[dim_Date] as  
select   
    [Day]  
    ,[DayName]  
    ,[Month]  
    ,[MonthName]  
    ,[Year]  
from [Date] 
 
create view [dbo].[dim_Invoice] as  
select   
    [ID]  
    ,[Number]     
from [Invoice_Header]
  
create view [dbo].[dim_Price] as  
select   
    [Price]  
from [Invoice_Detail]  

create view [dbo].[fact_Invoice] as  
select   
     [Invoice_Detail].[ID]  
     ,[Product]  
     ,[Date]  
     ,[Client]  
     ,[Price]  
     ,[Amount]  
     ,[Qty]  
from [Invoice_Detail]  
     left join [Invoice_Header]  
     on [Invoice_Detail].[ID] = [Invoice_Header].[ID]

Now we can create relationships between dim and fact tables:

Here we have a cube:

Hello world!

Hey, it's my personal blog about BI area. I work as BI architect since 2014 and accumulated experiences I want both to share to people and to store for myself. I use SSAS, MSSQL, MySQL, PostgreSQL in daily work.

I'll cover mostly DWH + MDX algorithms for interesting business cases I work on. Some small tech tricks may appear here also. I'll use a fake DB to show examples.