Analysis Services – Chris Webb’s BI Blog

A Quick Look At Azure Data Catalog


I’ve been meaning to look at Azure Data Catalog for a long time but somehow never got round to it until now. I’m not sure why – perhaps it’s because there’s so much other cool stuff happening in the world of Power BI and Azure, and Azure Data Catalog seems a bit, um, boring in comparison. Maybe it’s a bit too much like documentation and all that other stuff we instinctively try to avoid. Whatever the reason, now I have looked at it I’m impressed and I can see its uses, not just for Power BI but also for traditional corporate BI scenarios. It certainly deserves a lot more attention from the Microsoft BI community than it seems to be getting at the moment.

First of all, what is Azure Data Catalog anyway? There’s lots of good documentation, videos and a tutorial here:
https://azure.microsoft.com/en-us/documentation/services/data-catalog/

…but the short answer is that it’s a cloud-based service for cataloguing all the data sources you want to use for BI. It doesn’t store any of the data itself (except for a small amount for preview purposes, and you can turn that off), just the connection details; it also stores metadata (column names, data types etc.) as well as descriptions, documentation and contact details for people who own or know about the data. End users can search the catalogue for data and even, for some data source types like SSAS, click a button to open that data source directly in Excel or Power BI Desktop.

[Screenshot: Azure Data Catalog]

Is it any good? Well, I managed to get going with it very easily. As well as an Azure subscription it does require you to have Azure Active Directory set up though, which is a pain – some of my customers (mostly the ones that have bought into Office 365 and Power BI) have it already, but I know for others it would be a showstopper. After setting everything up I was able to register SQL Server, SSAS and other types of data source very quickly using both the desktop application and the browser interface. It’s certainly simple enough for a reasonably technical end user to use and the implementation is very solid. I do have a few minor quibbles (I found the search results interface slightly confusing at first but soon came to grips with it) and it’s clear that it’s still very much a work in progress (right now the browser interface supports registering more data types than the desktop app, for example) but nothing very significant.

There is an obvious need for Azure Data Catalog if you are going to use Power BI and have embraced the whole concept of self-service BI: it makes it much easier for your users to find and consume the various different data sources that you have available. However, as I said earlier, I think on-premises users of SSAS and SSRS could also benefit from it too. It’s becoming increasingly common for me to see organisations with several SSAS cubes in production across various servers, each of them with a confusing list of measures and dimensions. Equally I’ve seen plenty of SSRS implementations with hundreds of reports, with nobody knowing what many of these reports do or even if they are used at all. Azure Data Catalog could provide a convenient central place to document and discover these cubes and reports. I guess it all depends on whether you can be bothered to do this documentation and annotation though, rather than whether you have a suitable tool for this purpose; I don’t think many organisations can be bothered, unfortunately.

One last thing: Azure Data Catalog is meant to include replacement functionality for the old Power BI for Office 365 Data Catalog but as yet it doesn’t allow you to publish or share Power Query queries. This is something that I and other Power Query/Power BI fans would love to see, especially if (unlike the old Data Catalog) it let you publish updates to your code without needing to manually update each Excel workbook/Power BI report that used it afterwards.



Monitoring SSAS Multidimensional Non Empty Filtering Using Profiler, Part 1


Filtering out empty values is something that SSAS does a lot of during query execution, and it is also a common cause of performance problems. In this series of posts (similar to my series earlier this year on results serialisation) I’ll look at the different types of non empty filtering that can occur in an MDX query, how they can be monitored using Profiler and what you can do to improve their performance.

Some of this information has come from an old white paper, but I’ve found that some of what that paper says is now out of date and I’m extremely grateful to Akshai Mirchandani of Microsoft for answering my questions on this subject. Even with the long-suffering Akshai’s help a lot of the information here is based on my own research and therefore potentially incomplete/inaccurate, potentially different for different versions of SSAS (I’m using SSAS 2014 for this series) and could potentially change again in the future, so take due care!

The first question to ask is: what counts as non empty filtering? There are actually several different operations that the engine treats as a non empty filter, a few of which I was surprised by; here are the ones I know about.

1) The NON EMPTY statement

Most MDX queries generated by client tools include a NON EMPTY statement on the rows and columns axis. For example, take a look at the results returned by this MDX query on the Adventure Works cube:

SELECT
{[Measures].[Internet Order Quantity]}
ON 0,
[Product].[Category].[Category].MEMBERS
ON 1
FROM
[Adventure Works]

[Screenshot: query results]

As you can see, the Components category has no data and returns a null. One way to remove the row for Components would be to add a NON EMPTY statement to the rows axis:

SELECT
{[Measures].[Internet Order Quantity]}
ON 0,
NON EMPTY
[Product].[Category].[Category].MEMBERS
ON 1
FROM
[Adventure Works]

[Screenshot: query results]

2) The NONEMPTY() function

Often confused with the NON EMPTY statement, but not the same thing: the NON EMPTY statement can only be used on an axis in a SELECT statement, whereas the NONEMPTY() function can be used in any MDX expression. Continuing our example, here’s how to use it to remove the Component category:

SELECT
{[Measures].[Internet Order Quantity]}
ON 0,
NONEMPTY(
 [Product].[Category].[Category].MEMBERS,
{[Measures].[Internet Order Quantity]})
ON 1
FROM
[Adventure Works]

The NONEMPTY() function is much more flexible than the NON EMPTY statement but essentially does the same thing – it isn’t any faster in what it does, but it does allow you to make certain assumptions about your data that can improve query performance (more of that later). One thing to remember is to always set the second parameter, because if you don’t you may get unexpected results.
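To illustrate the point with a sketch of my own (not a query from the original post): if the second parameter is omitted, the non empty check is evaluated against whatever measure is current in the context – often the cube’s default measure rather than the measure on your columns – so rows may be removed or kept when you don’t expect it.

SELECT
{[Measures].[Internet Order Quantity]}
ON 0,
//no second parameter: the filter may be evaluated against the default measure,
//not necessarily Internet Order Quantity
NONEMPTY(
 [Product].[Category].[Category].MEMBERS)
ON 1
FROM
[Adventure Works]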

There is also a NONEMPTYCROSSJOIN() function but it is deprecated and you should not be using it – everything that it does can be done more reliably with other functions.
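As a rough illustration (my own, so treat the details as a sketch), an old NonEmptyCrossJoin() expression can usually be rewritten as a crossjoin wrapped inside NonEmpty() with the measure to filter by made explicit:

//instead of the deprecated
//NONEMPTYCROSSJOIN([Product].[Category].[Category].MEMBERS, [Product].[Color].[Color].MEMBERS)
//use something like
NONEMPTY(
 [Product].[Category].[Category].MEMBERS
 *
 [Product].[Color].[Color].MEMBERS,
{[Measures].[Internet Order Quantity]})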

3) Autoexists

Autoexists is not a feature of the MDX language but rather something that SSAS does automatically to remove tuples from a set that it knows must always be null. It’s described in great detail here, but it’s quite easy to illustrate. We already know from the queries above which categories have data; similarly the following query shows there is data for all colours except Grey and Silver/Black:

SELECT
{[Measures].[Internet Order Quantity]}
ON 0,
[Product].[Color].[Color].MEMBERS
ON 1
FROM
[Adventure Works]

[Screenshot: query results]

However if you crossjoin every category and every colour on the rows axis, you don’t see every combination of category and colour returned:

SELECT
{[Measures].[Internet Order Quantity]}
ON 0,
[Product].[Category].[Category].MEMBERS
*
[Product].[Color].[Color].MEMBERS
ON 1
FROM
[Adventure Works]

[Screenshot: query results]

There is no row for the category Bikes and the colour White, for example. This is because the SSAS engine knows from the data in the dimension that no product exists that is both the colour White and in the category Bikes, so it doesn’t return that combination from the crossjoin – that particular tuple could never contain any data so there’s no point returning it. Notice that there are combinations, such as Components/Black, that exist in the dimension and are present on the rows axis but still return null because there is no value for Internet Order Quantity.

It’s important to remember that autoexists only takes place when you are working with sets of members from different hierarchies on the same dimension, never with sets of members from different dimensions.
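To illustrate (my own example query, assuming the standard Adventure Works Customer dimension): crossjoining a hierarchy from the Product dimension with one from the Customer dimension returns every combination, including ones that can never have data, because no autoexists filtering takes place across dimensions.

SELECT
{[Measures].[Internet Order Quantity]}
ON 0,
//no autoexists here: Category and Gender come from different dimensions,
//so every Category/Gender combination is returned, empty or not
[Product].[Category].[Category].MEMBERS
*
[Customer].[Gender].[Gender].MEMBERS
ON 1
FROM
[Adventure Works]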

4) The EXISTS() function and the EXISTING keyword

The EXISTS() function and the EXISTING keyword allow you to take advantage of autoexists for filtering inside your own expressions without having to actually do a crossjoin (there’s another variant of EXISTS() with a third parameter that behaves more like NONEMPTY() but it’s very rarely used so I’m going to ignore it).

For example, here’s how you can use the EXISTS() function to return all the categories that have a product that is White:

SELECT
{[Measures].[Internet Order Quantity]}
ON 0,
EXISTS(
[Product].[Category].[Category].MEMBERS
, {[Product].[Color].&[White]})
ON 1
FROM
[Adventure Works]

[Screenshot: query results]

The EXISTING keyword is used within calculations to apply autoexists filtering to a set based on the other hierarchies from the same dimension. The following query contains a calculated measure that counts the number of members on the Color level of the Color hierarchy, and unsurprisingly returns the same value each time it’s called:

WITH
MEMBER MEASURES.COLOURCOUNT AS
COUNT([Product].[Color].[Color].MEMBERS)
SELECT
{MEASURES.COLOURCOUNT}
ON 0,
[Product].[Category].[Category].MEMBERS
ON 1
FROM
[Adventure Works]

[Screenshot: query results]

However, if you add the EXISTING keyword just before the set in the calculated measure definition, like so:

WITH
MEMBER MEASURES.COLOURCOUNT AS
COUNT(EXISTING [Product].[Color].[Color].MEMBERS)
SELECT
{MEASURES.COLOURCOUNT}
ON 0,
[Product].[Category].[Category].MEMBERS
ON 1
FROM
[Adventure Works]

Then you’ll see that the calculation now returns the number of members on the Color level of the Color hierarchy after autoexists filtering has been applied; so for example the first line shows there are five distinct colours associated with the Category Bikes:

[Screenshot: query results]

Summary

OK, after that somewhat lengthy introduction, in part 2 I’ll show you how to use Profiler to monitor what’s going on inside SSAS when you do all of these different types of non empty filtering.


Monitoring SSAS Multidimensional Non Empty Filtering Using Profiler, Part 2


In part 1 of this series I introduced all of the different types of non empty filtering that can occur in Analysis Services and MDX. In this post I’ll show you how you can monitor each of these types of non empty filtering using Profiler using the queries from part 1 as examples.

Profiler events

The three Profiler events we’re going to be most concerned with here are:

  • Calculate Non Empty Begin, which is raised when the engine starts a non empty filter
  • Calculate Non Empty Current, which will be raised one or more times when the non empty filter is in progress and which will give us more detail about what’s happening
  • Calculate Non Empty End, which is raised when the engine finishes evaluating a non empty filter

I’ve also included the Query Begin/End events in my traces just to show when query execution starts and finishes. All the queries in this post will be run on a warm cache; although the values used for non empty filtering can be cached, SSAS can’t cache the output of a non empty filter (unless you do something like this) so for our purposes whether the cache is warm or cold is irrelevant.

[Screenshot: Profiler trace]

NON EMPTY and NONEMPTY()

In the simplest cases SSAS treats the NON EMPTY statement and the NONEMPTY() function the same, so let’s look again at the following query from my previous post:

SELECT
{[Measures].[Internet Order Quantity]}
ON 0,
NON EMPTY
[Product].[Category].[Category].MEMBERS
ON 1
FROM
[Adventure Works]

[Screenshot: query results]

Here’s what our Profiler trace shows us when this query is run:

[Screenshot: Profiler trace]

There are several things to notice here:

  • The Duration column tells us how long, in ms, both the query took to run (on the Query End line) and the non empty filter took to evaluate (on the Calculate Non Empty End line).
  • The EventSubclass column, for the Calculate Non Empty Current events, shows the different stages of evaluation. The possible values here are:
    • 1 – cell values are evaluated for the filtering
    • 2 – calculated members are evaluated (this is no longer used from SSAS 2008 on)
    • 3 – the tuples are rendered in the resultset
  • The IntegerData column shows what type of non empty filtering is taking place. The possible values here are:
    • 1 – All non empty filtering using NON EMPTY and NONEMPTY() using a fast algorithm (as shown in the screenshot above)
    • 2 – EXISTING operations using a fast algorithm.  This is a bit misleading though, because due to an internal bug this event will not fire for most uses of the EXISTING statement but will fire for some operations related to regular autoexists.
    • 3 – Autoexists operations using a fast algorithm
    • 11 – All non empty filtering using NON EMPTY and NONEMPTY() using a slower algorithm, for when complex calculations need to be evaluated
    • 12 – as (2) but with the slower algorithm (theoretically possible but should never occur in practice)
    • 13 – as (3) but with the slower algorithm (again, theoretically possible but should never occur in practice)
  • The ProgressTotal column for the Calculate Non Empty Current event (as with the Serialize Results Current event), when EventSubclass is 1, should return the number of tuples evaluated for the non empty filter. If the number of tuples is greater than 1000 you’ll see multiple events, one for each 1000 tuples, with the last event showing the overall total. Unfortunately there’s a bug here that means it returns one less than the actual number of tuples evaluated. What’s more, in most cases, the non empty filter operations in the SSAS engine are difficult to match to whatever’s going on in the query, so it’s only really useful as a relative measure of how expensive the operation is.

Putting this all together, for this query we can see that there was one non empty filter operation, it was using the fast NON EMPTY algorithm, the ProgressTotal column shows 3 tuples were returned (ie the three Categories on rows) and it took 1ms.

Autoexists

Now let’s look at the autoexists query from the previous post in this series:

SELECT
{[Measures].[Internet Order Quantity]}
ON 0,
[Product].[Category].[Category].MEMBERS
*
[Product].[Color].[Color].MEMBERS
ON 1
FROM
[Adventure Works]

[Screenshot: query results]

Here’s what Profiler shows:

[Screenshot: Profiler trace]

Even with what looks like a single autoexists operation there are three sets of Non Empty events here, two of which have IntegerData 3 (for autoexists) and one with IntegerData 2 (for EXISTING which, as noted above, is to be expected here with autoexists). The ProgressTotal column returns 26 for a query that returns 24 rows which I guess is near enough correct to be useful.

EXISTS() And Existing

As you might expect, the EXISTS() function produces a result very similar to a straightforward autoexists. Taking the EXISTS() query from my previous post:

SELECT
{[Measures].[Internet Order Quantity]}
ON 0,
EXISTS(
[Product].[Category].[Category].MEMBERS
, {[Product].[Color].&[White]})
ON 1
FROM
[Adventure Works]

[Screenshot: query results]

You can see there’s just one Non Empty Begin/End event pair generated, with IntegerData 3, and ProgressTotal returns 0 which means that one tuple is returned:

[Screenshot: Profiler trace]

Whereas the use of EXISTING inside a calculated measure, like so:

WITH
MEMBER MEASURES.COLOURCOUNT AS
COUNT(EXISTING [Product].[Color].[Color].MEMBERS)
SELECT
{MEASURES.COLOURCOUNT}
ON 0,
[Product].[Category].[Category].MEMBERS
ON 1
FROM
[Adventure Works]

[Screenshot: query results]

causes multiple Non Empty Begin/End pairs, very probably one for each member on the hierarchy based on the number of events and the values in the ProgressTotal column (the ProgressTotal values for the second, third, fourth and fifth non empty filters tally with the four values returned by the calculated measure; I don’t know what the first non empty filter is doing):

[Screenshot: Profiler trace]

The WHERE Clause And Subselects

Weirdly enough, the presence of a WHERE clause or a subselect in a query also triggers Non Empty Begin/End events [Subselects are something that a lot of SSAS developers get confused by, especially when it comes to the differences between a subselect and the WHERE clause – I recommend watching this short video if you’re unsure of what they do].

The following query with a WHERE clause generates a single Non Empty Begin/End event pair:

SELECT
{[Measures].[Internet Order Quantity]}
ON 0,
[Product].[Color].[Color].MEMBERS
ON 1
FROM
[Adventure Works]
WHERE([Customer].[Total Children].&[5])

[Screenshot: query results]

[Screenshot: Profiler trace]

Probably the only value in knowing that this happens is that you can ignore it when you see it.

A query with a subselect instead of a WHERE clause, while it returns exactly the same results in this case, produces different activity in Profiler:

SELECT
{[Measures].[Internet Order Quantity]}
ON 0,
[Product].[Color].[Color].MEMBERS
ON 1
FROM
(SELECT {[Customer].[Total Children].&[5]} ON 0
FROM
[Adventure Works])

[Screenshot: query results]

[Screenshot: Profiler trace]

Again, if you have a subselect in your query (they are very common in MDX generated by SSRS and Excel) you should ignore this activity. In both cases you’ll note that the IntegerData column shows 3 for autoexists and ProgressTotal shows 0.

Summary

What’s clear from these examples is that trying to relate what’s going on in the query to what you see in Profiler is quite tricky even for seemingly simple queries; for most real-world queries it would be almost impossible to do so with total confidence. That said, when I’m tuning queries I usually comment out large parts of the code to try to isolate problems, thus creating much simpler queries, and I hope the value of this post will lie in you being able to spot similar patterns in Profiler to the ones I show here when you do the same thing. In part 3 of this series I’ll show you some practical examples of how all this information can help you tune your own queries.


Monitoring SSAS Multidimensional Non Empty Filtering Using Profiler, Part 3


In Part 1 of this series I introduced the different types of non empty filtering that occur in Analysis Services Multidimensional and in Part 2 I showed how you can monitor this activity using Profiler. In this, the final part of the series, I’m going to show some examples of how you can use this information while tuning MDX queries.

Let’s start by looking at the following query:

SELECT
{[Measures].[Internet Sales Amount]}
ON 0,
NON EMPTY
[Customer].[Customer].[Customer].MEMBERS
*
[Product].[Subcategory].[Subcategory].MEMBERS
ON 1
FROM
[Adventure Works]
WHERE([Date].[Calendar Year].&[2003])

 

It returns 19004 rows – all of the combinations of Customer and Subcategory that have a value in the year 2003:

[Screenshot: query results]

Here’s what you can see in Profiler:

[Screenshot: Profiler trace]

There are two Non Empty operations here: the ProgressTotal column shows that the first is the NON EMPTY statement on the rows axis; the second we can ignore because it’s the evaluation of the WHERE clause. The Duration column shows that the first Non Empty operation takes just 54ms and the query as a whole takes 1021ms.

Now, let’s make things a bit more complicated by adding an extra filter so we only see the Customer/Subcategory combinations where Internet Sales Amount is less than $10:

SELECT
{[Measures].[Internet Sales Amount]}
ON 0,
NON EMPTY
FILTER(
[Customer].[Customer].[Customer].MEMBERS
*
[Product].[Subcategory].[Subcategory].MEMBERS
,
[Measures].[Internet Sales Amount]<10)
ON 1
FROM
[Adventure Works]
WHERE([Date].[Calendar Year].&[2003])

[Screenshot: query results]

Here’s what Profiler shows:

[Screenshot: Profiler trace]

The query now takes 2512ms. But why is it slower? The obvious assumption to make is that it’s the Filter() that has slowed things down, but it looks like the Filter() and the NON EMPTY are now being evaluated as a single operation because the first Non Empty operation in the trace is now taking 2408ms – the majority of the query duration.

Removing the NON EMPTY statement from the rows axis and putting the logic to filter out the customers with no sales into the Filter() function, like so:

SELECT
{[Measures].[Internet Sales Amount]}
ON 0,
FILTER(
[Customer].[Customer].[Customer].MEMBERS
*
[Product].[Subcategory].[Subcategory].MEMBERS
,
[Measures].[Internet Sales Amount]<10
AND
(NOT ISEMPTY([Measures].[Internet Sales Amount])))
ON 1
FROM
[Adventure Works]
WHERE([Date].[Calendar Year].&[2003])

 

…only makes things worse, increasing query duration to 4139ms. This confirms our suspicion that Filter() is the problem here and that NON EMPTY can remove the empty customers faster than Filter() can.

The problem with the last query but one is that the NON EMPTY statement is being applied after the Filter(). Wouldn’t it be faster to remove the empty customers first and then filter out the ones where Internet Sales Amount is less than $10, so the slower Filter() can be applied over a smaller set?

There are two ways we can do this. First of all, we can use the NonEmpty() function instead of the NON EMPTY statement to remove the empty customers. NonEmpty() is not faster than the NON EMPTY statement per se, but it does allow us to change the order that the different types of filtering are applied here, and that can make all the difference to performance. Here’s a new version of the query:

SELECT
{[Measures].[Internet Sales Amount]}
ON 0,
FILTER(
NONEMPTY(
[Customer].[Customer].[Customer].MEMBERS
*
[Product].[Subcategory].[Subcategory].MEMBERS,
[Measures].[Internet Sales Amount]),
[Measures].[Internet Sales Amount]<10)
ON 1
FROM
[Adventure Works]
WHERE([Date].[Calendar Year].&[2003])

[Screenshot: Profiler trace]

Query duration is now down to 217ms and the first Non Empty operation is only 57ms.

There’s another way of doing this. For MDX geek-points you could use the ultra-obscure HAVING clause in your query to do the filtering after the NON EMPTY, like so:

SELECT
{[Measures].[Internet Sales Amount]}
ON 0,
NON EMPTY
[Customer].[Customer].[Customer].MEMBERS
*
[Product].[Subcategory].[Subcategory].MEMBERS
HAVING
[Measures].[Internet Sales Amount]<10
ON 1
FROM
[Adventure Works]
WHERE([Date].[Calendar Year].&[2003])

From what I can see, the HAVING clause performs a few milliseconds faster than the previous query – measurable but not something a user would notice. I also tested a variation on Mosha’s classic calculated measure approach for Count/Filter optimisation but that performed worse than the two previous queries.
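For reference, the kind of query I mean is something like the following – my own reconstruction rather than the exact query I tested, so treat it as a sketch. The filter condition is pushed into a calculated measure that returns null when the condition fails, and NON EMPTY then removes those rows; because a calculated measure is involved, the NON EMPTY has to use the slower algorithm described in part 2, which presumably explains why it didn’t perform well here.

WITH
MEMBER MEASURES.SalesUnderTen AS
//return Internet Sales Amount only when it is under $10, otherwise null
IIF(
[Measures].[Internet Sales Amount]<10,
[Measures].[Internet Sales Amount],
NULL)
SELECT
{MEASURES.SalesUnderTen}
ON 0,
NON EMPTY
[Customer].[Customer].[Customer].MEMBERS
*
[Product].[Subcategory].[Subcategory].MEMBERS
ON 1
FROM
[Adventure Works]
WHERE([Date].[Calendar Year].&[2003])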


Profiler, Extended Events And Analysis Services


Last week one of the attendees on my SSAS cube design and performance tuning course in London told me that he had been prevented from running a Profiler trace on his SSAS Multidimensional instance by a DBA because “he should be using Extended Events”. He wasn’t too pleased, and I can understand why. This, plus the recent discussion about Profiler and Extended Events for the SQL Server relational engine provoked by Erin Stellato’s recent blog post on the subject, made me think it was worth writing a few words on this subject myself.

Microsoft is clear that in the long term, Extended Events (also commonly known as XEvents) are the replacement for Profiler. This page shows Profiler listed as functionality that will not be supported in a future version of SQL Server – although, importantly, it is not deprecated yet. Profiler is still officially supported for SSAS and the SQL Server relational engine in SQL Server 2016. What’s more, in my opinion it will be a long time before anyone doing serious performance tuning work with SSAS will be able to forget about Profiler and use Extended Events exclusively. Let me explain why.

First of all there is the fact that support for Extended Events in SSAS was only introduced with SSAS 2012. If you are using an earlier version you can’t use them. Even if you have some instances of 2012 or 2014 and some on earlier versions the desire to have a consistent set of tools for performance analysis and monitoring means you probably won’t want to use Extended Events yet.

Then there is the fact that in both SSAS 2012 and 2014 there is no user interface for working with Extended Events – instead, to create, start and stop a trace session you have to use XMLA commands. There are plenty of blog posts out there explaining how to do this but it’s still incredibly time-consuming and fiddly to do. Even in SSAS 2016, where there is a user interface for working with Extended Events and viewing their output, it’s pretty awful and nowhere near as good as Profiler (which is far from perfect itself, but at least useable). Perhaps at some point someone in the community will create a user-friendly tool for working with Extended Events, in the same way that the community created DAX Studio to make up for the shocking fact that even in SQL Server 2016 there is no proper support for running DAX queries in SQL Server Management Studio. I would prefer it if Microsoft did the job itself, though, and started taking tooling for BI seriously.

Thirdly, if you want to do anything useful with the .xel files beyond opening them up in SQL Server Management Studio, you’re going to need to use some TSQL and functions like sys.fn_xe_file_target_read_file. What happens if you don’t have an instance of the SQL Server relational engine handy though? Most SSAS shops use SQL Server as their data source, but not all – some use Oracle, or Teradata, or other relational databases, and for them installing an instance of SQL Server somewhere just to work with .xel files may not be an option.
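For example, a minimal sketch of the T-SQL involved (the file path is illustrative) looks something like this:

-- read events from an Extended Events file target and expose the XML payload
SELECT
object_name,
CAST(event_data AS xml) AS event_data_xml
FROM sys.fn_xe_file_target_read_file('C:\Temp\MySSASTrace*.xel', NULL, NULL, NULL);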

Ah, you say, but on the other hand Extended Events have many advantages over Profiler traces: for example, they are much more lightweight! As Adam Saxton says here:

[Extended Events] won’t have the same impact on performance that a traditional Profiler Trace has. For example, it is reported that 20,000 events/sec on a 2ghz CPU with 1GB of RAM takes less than 2% of the CPU.

If I was building a monitoring application (something like, say, SQL Sentry’s excellent Performance Advisor for Analysis Services) then this might be relevant. But 99% of the time I’m not running a Profiler trace on a Production server, I’m working on a dev or test server, and I always try to prevent other people doing stuff on a server while I’m doing tuning work too, so this is irrelevant. It’s a mistake to assume that Analysis Services users use Profiler for the same kinds of thing that SQL Server relational engine users do. For me, I use Profiler to get roughly the same kind of information that a SQL Server developer gets from a query plan: I use it to find out what’s going on in the engine when I run a single query, so the performance overhead is not something I care about.

That said, it certainly seems to be the case that Extended Events will provide more information than I can get from a Profiler trace and allow me to do more things with that data than I can with Profiler. In SSAS 2016 there are several events that are only available via Extended Events and not via Profiler, although I have no idea what they do; I’m sure, with a bit of research, I can find out. Will any of them be useful? I have no idea yet but I suspect a few will be.

Don’t get me wrong, I think Extended Events are a great technology and something all SSAS developers and administrators should learn. There’s still a lot of UI work to do by Microsoft before they are in a position to replace Profiler, but as I said earlier Microsoft hasn’t deprecated Profiler yet so it has given itself a lot of time to do this work. My only problem is with people like the aforementioned DBA who go around telling people they should be using Extended Events with SSAS right now because that’s what they’ve heard is the best practice and that’s what the recommendation is for the SQL Server relational engine. I’ll still need to use Profiler for a few more years yet.


What Monthly Updates For SQL Server Management Studio And SSDT Mean For SSAS Developers


Last month I upgraded the laptop I use for teaching to SQL Server 2016 and, as is always the way, not long afterwards I ran into a problem while I was doing a demo during a course: the SSAS Deployment Wizard didn’t work for SSAS 2016 Tabular projects created at the new 1200 compatibility level. I made some enquiries and was told that the tooling team hadn’t had time to do the necessary dev work before RTM but had finished it soon after, and that the new update would be in the July release of SQL Server Management Studio.

This reminded me of something I had heard about but forgotten, thinking it wasn’t all that relevant to me: SQL Server Management Studio is now no longer bundled with SQL Server but is instead a free, separate download that is updated monthly. This official blog post from May has all the important details; Aaron Bertrand’s blog post from April is a great summary of what this means for SQL Server in general (his interview with Kent van Hyning, the Engineering Manager for SQL Server Client Tools is also worth reading); and the SQL Server Release Services blog is the place to look for new update announcements. To get back to my story, when the July update for SSMS was released a few days ago I hit the Check For Updates menu option in SSMS, it updated, and I got a working version of the Deployment Wizard (the Changelog has a complete list of all of the fixes/improvements in this version). The moral of this story is that even though all of the publicity around the new direction for SQL Server Management Studio is directed at the SQL Server relational engine community, it’s equally important for us BI developers – and we are going to have to make the effort to update SSMS regularly from now on.

Of course a similar change has happened with SQL Server Data Tools, which is also now getting monthly updates. I’ve heard a few complaints around bugs and instability in SSDT recently and it looks like these issues are being fixed fairly rapidly: check out the list of SSAS-related fixes in the July update for SSDT in the post on the SSDT team blog and the associated Changelog. Again, as BI developers we’re going to have to get into the habit of updating SSDT on a regular basis.

While getting bugs fixed quickly is great, I really hope that this new focus on tooling means that we get new BI-related features in SSMS and SSDT. SSMS has, in particular, been shockingly bad at supporting BI developers: for example, I find it unbelievable that we still don’t have a DAX query window in SSMS when the language has been in SSAS Tabular since SQL Server 2012 (I know the community have filled the gap with DAX Studio, but that’s not the point). Aaron Nelson recently announced a Trello board where ideas for new features in SSMS can be debated but there’s very little BI-related stuff there at the time of writing.

Finally, a recent Reddit AMA with the SQL Server Tooling Team gave away a few interesting titbits about what might be happening in the future for BI:

  • In response to a question about version control for SSIS/SSAS/SSRS:
    In regards to AS and RS version control. We are considering improvements to AS model where we actually break up the single model file into independent objects (tables, measure, roles, etc…).
  • Regarding MSBuild support for SSAS and SSRS:
    At least for SSAS and SSRS, we have MSBuild support on the backlog, but it isn’t the highest priority at the moment.
    The SSIS team has plans for MSBuild support and are considering to support this in a future release of SSDT. Stay tuned!
  • Regarding tooling support for the new SSAS 2016 1200 compatibility level:
    The plan is to gradually improve the Tabular experiences in SSDT and SSMS, particularly for the new compat level 1200. We now have a better scripting language (TMSL) and a way better object model (TOM) and this sets us up for introducing improvements with the upcoming monthly releases. If everything goes to plan, you will see a significant new capability in the SSDT Tabular August release, and it will give you an idea where things are going. Sorry for not giving more details at this time. Let’s count the chickens when they hatch.
  • On the question of why there’s no support for DAX in SSMS:
    These things simply didn’t fit into the SQL Server 2016 release timeframe. You’ll see improvements in future versions of the tools.

Excel 2016 PivotTable MDX Changes Lead To Big Query Performance Gains


Yesterday, the Excel team announced a bunch of performance improvements in Excel 2016 for PivotTables connected to Analysis Services (Multidimensional or Tabular), Power Pivot and Power BI. If you haven’t read the official blog post already, here it is:

https://blogs.office.com/2016/07/07/faster-olap-pivottables-in-excel-2016/

In that post, Microsoft point out that how much of a performance increase you get will depend on a number of factors. I guess they have to do this to manage expectations. However I’m going to come right out and say that these changes are probably the most important thing that has happened in the last five years for Analysis Services or Power Pivot query performance and if you are using Excel PivotTables with SSAS or Power Pivot for reporting, you should upgrade to Excel 2016 (you’ll need the Office 365 click-to-run version) just for these changes.

The key improvement that Microsoft have made here is to stop Excel bringing back unnecessary subtotal values in the MDX queries that it generates. This has been a gigantic problem for years and several people have blogged about it: I did here, and among other people Rui Quintino and Richard Lees have posts on the subject that are worth reading. A lot of my consultancy work involves SSAS and MDX performance tuning and I think I see around five to ten customers every year who are seriously affected by this – and I’m sure this is just the tip of the iceberg. Even if you don’t think you are affected, you may have users who are putting up with slow refresh times for Excel-based reports without your knowledge.

Let’s go into some technical detail, see what the original problem was and what has changed in Excel 2016. I’m going to provide some examples using Excel 2013/2016 against SSAS Multidimensional but as I said the same problems occur (although less severely) with SSAS Tabular and Power Pivot.

Consider the following Excel 2013 PivotTable connected to the Adventure Works cube:

[Screenshot: the PivotTable]

[Screenshot: the PivotTable field list]

It’s typical of the kind of thing an end user wants to build, and there are two things to point out:

  1. Notice there are three fields that have been placed on the Rows axis of the PivotTable: the Gender and Marital Status hierarchies from the Customer dimension, and the Category hierarchy from the Product dimension. It’s pretty common for users to put many more fields together on an axis when they want to display detailed data.
  2. The user has also turned off grand totals and subtotals in the PivotTable so that only the detailed data is shown. This often happens in combination with (1) because if the user only wants to see the detailed data the subtotals and grand totals are a distraction.

In this case the PivotTable layout has been switched to Tabular and the Repeat All Item Labels option is selected to make the PivotTable look like a table, but these are just aesthetic changes.

This PivotTable contains 12 rows, but if you look at the MDX query that is generated by Excel 2013 to populate it, it returns 36 rows of data (actually the MDX in this case returns 36 columns, but that’s a quirk of Excel) because the subtotals and grand total that Excel is not displaying are still being returned. Here’s the query:

SELECT
NON EMPTY
CrossJoin(
CrossJoin(
Hierarchize(
{DrilldownLevel(
{[Customer].[Gender].[All Customers]}
,,,INCLUDE_CALC_MEMBERS)}),
Hierarchize(
{DrilldownLevel({[Customer].[Marital Status].[All Customers]}
,,,INCLUDE_CALC_MEMBERS)})),
Hierarchize(
{DrilldownLevel({[Product].[Category].[All Products]}
,,,INCLUDE_CALC_MEMBERS)}))
DIMENSION PROPERTIES PARENT_UNIQUE_NAME,HIERARCHY_UNIQUE_NAME
ON COLUMNS
FROM
[Adventure Works]
WHERE ([Measures].[Internet Sales Amount])
CELL PROPERTIES VALUE, FORMAT_STRING,
LANGUAGE, BACK_COLOR, FORE_COLOR, FONT_FLAGS

And here’s a sample of what it returns, with the unwanted values highlighted:

[Screenshot: query results with the unwanted subtotals and grand totals highlighted]

You can probably guess that these unwanted subtotals and grand totals make the query slower, and the more fields you put together on the rows or columns of a PivotTable, the greater the number of subtotals/grand totals returned and the slower things get. PivotTables connected to SSAS or Power Pivot often become completely unusable with more than six or seven fields put together on the same axis, and the only workarounds before Excel 2016 are either to write custom MDX in a named set (I show how to do this here) or, if you are using SSAS Multidimensional, to use member properties instead of separate fields (as Richard Lees shows here), but neither is a really great alternative.
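To give a flavour of the named set workaround (a sketch based on the PivotTable above, not the exact code from the post linked to): you define a set – for example in Excel’s Manage Sets dialog – that returns only the detail-level combinations, and put that on rows instead of the separate fields, so Excel never has to request the subtotals:

//a set containing only the detail-level combinations of the three hierarchies
[Customer].[Gender].[Gender].MEMBERS
*
[Customer].[Marital Status].[Marital Status].MEMBERS
*
[Product].[Category].[Category].MEMBERS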

Building the same PivotTable using Excel 2016, however, gives the following MDX:

SELECT
NON EMPTY
CrossJoin(
CrossJoin(
Hierarchize(
{[Customer].[Gender].[Gender].AllMembers}),
Hierarchize(
{[Customer].[Marital Status].[Marital Status].AllMembers})),
Hierarchize(
{[Product].[Category].[Category].AllMembers}))
DIMENSION PROPERTIES PARENT_UNIQUE_NAME,HIERARCHY_UNIQUE_NAME
ON COLUMNS  FROM [Adventure Works]
WHERE ([Measures].[Internet Sales Amount])
CELL PROPERTIES VALUE, FORMAT_STRING,
LANGUAGE, BACK_COLOR, FORE_COLOR, FONT_FLAGS

Which returns just the data needed for the PivotTable:

[Screenshot: query results]

What has changed? Focusing on just the Gender hierarchy, the difference between the two queries is that in Excel 2013 the selection on Gender is given using the DrillDownLevel() function on the All Member of the hierarchy. Isolated and put in its own query, the set expression used looks like this:

SELECT
{[Measures].[Internet Sales Amount]}
ON COLUMNS,
DrilldownLevel(
{[Customer].[Gender].[All Customers]}
,,,INCLUDE_CALC_MEMBERS)
ON ROWS
FROM
[Adventure Works]

[Screenshot: query results]

Notice how the All Member, used by Excel to return subtotals and grand totals, is returned by the set. Excel 2016 instead just asks for the members on the Gender level of the Gender hierarchy, not including the All Member. Again, put inside its own query, you can see what it returns:

SELECT
{[Measures].[Internet Sales Amount]}
ON COLUMNS,
{[Customer].[Gender].[Gender].AllMembers}
ON ROWS
FROM
[Adventure Works]

[Screenshot: query results]

Apart from the fact that no All Member is returned, it’s also a lot cleaner and easier to read.

I’ve been very excited about this change since I first heard it was happening, and I think it will benefit a lot of people. A big thanks to Alexander Lahuerta and the rest of the Excel team for making this happen!


What’s New In Analysis Services 2016 Multidimensional?


With the recent release of SQL Server 2016 some of you old faithful SSAS Multidimensional users out there might be wondering if it’s worth upgrading. There is an official page describing what’s new in Analysis Services 2016 here:

https://msdn.microsoft.com/en-us/library/bb522628.aspx

…but it’s mostly concerned with SSAS Tabular and, unfortunately, it’s somewhat misleading. For example, it says:

a number of enhancements have been made to multidimensional models; for example, distinct count ROLAP optimization for data sources like DB2 and Oracle, drill-through multi-selection support with Excel 2016, and Excel query optimizations.

In fact the drillthrough/multi-select improvements (which I blogged about here) already shipped as part of SSAS 2014 and are reliant on improvements in Excel 2016 as much as in SSAS; similarly the Excel 2016 query optimisations are not reliant on any changes in SSAS 2016 and will benefit users of all versions of SSAS.

So what has actually changed with SSAS 2016 Multidimensional? I don’t know all the details on every change, but here’s what I know right now:

  • As the above quote shows, there have been improvements in ROLAP distinct count performance for DB2 and Oracle. I’ve also heard there are improvements for the SQL generated in ROLAP mode to take advantage of SQL Server’s columnstore indexes.
  • We now have Database Consistency Checker for SSAS, which you can read more about here or in Dustin Ryan’s post here. For Multidimensional it will only tell you whether your partition indexes are corrupt (it does a lot more for Tabular) and if they are you need to delete the database and then either restore it or re-deploy/reprocess.
  • Extended Events for SSAS now have UI support in SQL Server Management Studio, although the UI still looks like it needs some work. I still see myself using Profiler for my SSAS performance tuning work for the foreseeable future – or at least until I work out what the extra information that Extended Events give you is useful for. I know other people, like Bill Anton, are more excited about Extended Events and their possibilities though.
  • You can now use computer accounts to be members of the Analysis Services Administrators group in SQL Server Management Studio.
  • If you are developing applications that need to create, alter, process or otherwise do stuff to an Analysis Services database in .NET code then you need to be aware that Analysis Management Objects (AMO) has been refactored to include a second assembly which “paves the way for future extensions to AMO, with clear division between generic and context-specific APIs”.
  • There’s a new default setting for the MemoryHeapType server property that helps to avoid memory fragmentation and is relevant to Tabular and Multidimensional; Marco Russo has the details here.

Not the most exciting or inspiring set of changes, I have to say, even if there’s lots of cool new stuff in SSAS Tabular 2016. There are also usually other fixes and improvements like the MemoryHeapType property that get added but are never documented, and when I find out about them I’ll be sure to blog or tweet. If you find any please let me know!



Deprecated And Discontinued Functionality In SSAS 2016


Some time ago I blogged about the deprecated and discontinued functionality in SSAS 2014, so I thought it would be a good idea to follow my last post on what’s new in SSAS 2016 Multidimensional with a discussion of what’s going or gone from it.

The same page that I linked to last time has been updated for 2016, and there are four more subpages with all the details. There’s nothing much interesting to say about the breaking changes (basically AMO has been rejigged) or behaviour changes (there’s no in-place upgrade for Tabular models using DirectQuery – you have to open the project and edit some settings) but the other two pages do have some news worthy of comment:

Discontinued Functionality

Here’s the official definition of a discontinued feature:

A discontinued feature is one that is no longer supported. It might also be physically removed from the product.

A few comments about what is now discontinued:

  1. The Non_Empty_Behavior property for calculated measures. To be honest, I’m happy to see this go: it doesn’t usually make much difference to performance and in most cases people use it incorrectly too. It really should be removed from Form View in the SSDT cube editor.
  2. COM assemblies. Note that this is not the same thing as .NET assemblies like the Analysis Services Stored Procedure Project! If you are using a custom MDX function implemented in a COM assembly you will probably find that the equivalent function implemented in a .NET assembly is a lot slower, but in my experience it’s almost always possible to avoid custom functions completely and use pure MDX – and this will give you the best query performance.
  3. I’m a bit sad to see the CalculationCurrentPass() and CalculationPassValue() functions die because you could do cool things like this with them, but I haven’t actually needed to use them for a long, long time.

Deprecated Functionality

Here’s the official definition of a deprecated feature:

A deprecated feature is a feature that will be cut from the product in a future release, but is still supported and included in the current release to maintain backward compatibility. Typically, a deprecated feature is removed in a major release, often within two releases of the original announcement. For example, deprecated features announced in SQL Server 2012 are likely to be unsupported by SQL Server 2016.

A few comments on what’s been deprecated:

  1. The only thing that someone may possibly be using on the list of features that will not be supported in the next major release is linked dimensions; remote partitions and remote linked dimensions were always a very bad idea.
  2. The death of dimension writeback (note: not the same thing as writing values back to cells) is a bit of a shame: I never needed to use it, and most client tools didn’t support it, but it always struck me as one of those features that people might have used more if they had known about it.
  3. I see session cubes (as used by Excel PivotTables’ grouping functionality) will not be supported in a future release, which is probably a good thing given all the problems they cause. However I bet there are a lot of Excel workbooks out there that will be affected when this does go.
  4. Local cubes will also go in a future release, though I doubt anyone uses them anymore. That said, I think it would be really useful if there was something similar that allows you to create a Power Pivot model that was a local copy of an SSAS Tabular database, with the option to filter the data in it. Offline access is not so important these days but this would allow users to create their own customised Power Pivot models from a properly designed, central model rather than always having to start from scratch.
  5. Profiler for trace capture is also deprecated, and I discussed this in-depth here.

 


Creating OR Filters With Cube Functions For Power Pivot And Analysis Services Reports In Excel


The Excel Cube Functions are incredibly powerful, and I’m still amazed at the kind of problems they can solve. This post describes how they can be used to build a report with a complex OR filter using data from Power Pivot (it’s equally applicable to SSAS) that shows a number of advanced uses of these functions. No knowledge of MDX or DAX is needed but if you’re new to the Excel Cube Functions I recommend that you watch this video of a presentation by Peter Myers, which provides an excellent introduction to them.

The Problem

Imagine you’ve got a very simple Power Pivot model that looks like this:

[Screenshot: the Power Pivot model]

There’s a Date table with dates, months and years in, and an Internet Sales table with sales data in and a measure called Sales Amount. Here’s what the data looks like in a PivotTable:

[Screenshot: the PivotTable]

Now, imagine that you want a report with the Sales Amount measure on columns and Years on rows, and you want to filter the data so that you only see values for Mondays in July or Wednesdays in September. Using the Fields, Items and Sets functionality you could filter the data to only show the day/month combinations you need for each year, but since you can’t put a named set into the Filter area of a PivotTable you would have to use Excel formulas to sum up the combinations to get the totals you need:

image

Luckily it is possible to build the report you need using the Cube Functions! Here’s how:

Step 1: Build Your Combinations Using CubeMember()

The first thing to point out is that the CubeMember() function does not have to just return a member, it can return a combination of members (in MDX this is known as a tuple). These combinations can be built in several ways, one of which is by using cell references to other cells that themselves contain CubeMember() functions. It’s probably easier to explain this by showing a worksheet that contains six cells with CubeMember() functions in. Here it is with the formulas visible:

[Screenshot: the worksheet with formulas visible]

Cells B5 and B10 contain references to days of the week; cells B6 and B11 contain references to months. Cells B7 and B12 contain CubeMember() functions that return the combinations we want to filter by: Mondays in July and Wednesdays in September respectively.
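The formulas themselves are only visible in the screenshot, so here is roughly what they would look like – the cell addresses, the connection name held in B2 and the member unique names are all illustrative and will depend on your model; the tuple syntax in B7 and B12 mirrors the CubeSet() formula shown in the next step:

B5:  =CUBEMEMBER($B$2,"[Date].[Day Name].&[Monday]")
B6:  =CUBEMEMBER($B$2,"[Date].[Month Name].&[July]")
B7:  =CUBEMEMBER($B$2,(B5,B6))
B10: =CUBEMEMBER($B$2,"[Date].[Day Name].&[Wednesday]")
B11: =CUBEMEMBER($B$2,"[Date].[Month Name].&[September]")
B12: =CUBEMEMBER($B$2,(B10,B11))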

Here’s what the formulas above return:

[Screenshot: the worksheet results]

You’ll notice that the ‘combination’ cells only show the month names, not the day/month combinations – this is just a feature of the CubeMember() function and can be a bit misleading, but rest assured they do return the combinations you need.

Step 2: Build A Set Using CubeSet()

The CubeSet() function is also able to build sets using cell references to cells containing CubeMember() functions. In this case I want a set containing the two ‘combination’ CubeMember() functions from B7 and B12. I can do this by using the formula:

=CUBESET($B$2,($B$7,$B$12), "The set of combinations")

Here are the formulas on the worksheet at this point:

[Screenshot: the worksheet with formulas visible]

And here’s the output:

image

Step 3: Reference The CubeSet() Function In Your Report

Now you have a CubeSet() function that returns the two day/month combinations, you can use this in a cube function report. When you reference a cell containing the CubeSet() function in a CubeValue() formula, the CubeValue() formula will return the aggregated value of all of the combinations in the CubeSet(). So for example, here’s a report with the Sales Amount measure on columns, Years on rows, and displaying the Sales Amount for each year filtered by the two day/month combinations:

[Screenshot: the report formulas]
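Since the report formulas are also only shown in the screenshot, the pattern is roughly as follows – cell addresses and member names are illustrative, and I’m assuming the CubeSet() formula from step 2 is in B14. Each value cell passes the CubeSet() cell into CubeValue() along with the year member for its row and the measure:

B16: =CUBEMEMBER($B$2,"[Measures].[Sales Amount]")
A17: =CUBEMEMBER($B$2,"[Date].[Year].&[2003]")
A18: =CUBEMEMBER($B$2,"[Date].[Year].&[2004]")
B17: =CUBEVALUE($B$2,$B$14,$A17,B$16)
B18: =CUBEVALUE($B$2,$B$14,$A18,B$16)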

And here’s the actual output:

[Screenshot: the report output]

Compare the numbers from the report at the bottom with the values calculated from the PivotTable in the screenshot earlier in this post, and you’ll see that we have indeed shown just the combined Sales Amount for Mondays in July and Wednesdays in September, broken down by Year.

You can download the example Power Pivot workbook for this post here.


SSAS Multidimensional Caching-Related Performance Problems With Power BI Tables


Last week I was doing some performance tuning for a customer using Power BI on top of a large SSAS Multidimensional cube via a Live connection. Some of their reports were performing particularly badly and I uncovered a problem with the DAX generated by Power BI for tables with a large number of rows, fields from two or more large hierarchies and totals turned on.

The problem is very easy to reproduce; I’m going to do it using a simplified version of the Adventure Works cube that contains only the Date and Product dimensions. Take a look at the following table from a Power BI report:

[Screenshot: the Power BI table]

It has the Date attribute from the Date dimension, and the Product attribute from the Product dimension, on rows and two measures on columns. The table has a large number of rows in it (both Date and Product are fairly large hierarchies) and if you look at the DAX generated by Power BI you can see that it only requests the first 501 rows. That’s fine – the real problem is that Power BI also generates a second DAX query to get the two values displayed in the Total line at the bottom of the table. The DAX looks something like this, and is easily identifiable because it uses the Row() function:

EVALUATE
  CALCULATETABLE(
    ROW(
      "Sales_Amount", 'Internet Sales Facts'[Sales Amount],
      "Tax_Amt", 'Internet Sales Facts'[Tax Amt]
    ),
    KEEPFILTERS(
      GENERATE(
        KEEPFILTERS(VALUES('Product'[Product.Key0])),
        CALCULATETABLE(
          FILTER(
            KEEPFILTERS(VALUES('Order Date'[Date.Key0])),
            OR(
              NOT(ISBLANK('Internet Sales Facts'[Sales Amount])),
              NOT(ISBLANK('Internet Sales Facts'[Tax Amt]))
            )
          )
        )
      )
    )
  )

This query has something in it – I don’t know what – that means that it cannot make use of the Analysis Services Storage Engine cache. Every time you run it SSAS will go to disk, read the data that it needs and then aggregate it, which means you’ll get cold-cache performance all the time. On a big cube this can be a big problem. This is very similar to problems I’ve seen with MDX queries on Multidimensional and which I blogged about here; it’s the first time I’ve seen this happen with a DAX query though. I suspect a lot of people using Power BI on SSAS Multidimensional will have this problem without realising it.

This problem does not occur for all tables – as far as I can see it only happens with tables that have a large number of rows and two or more hierarchies in. The easy way to check whether you have this problem is to refresh your report, run a Profiler trace that includes the Progress Report Begin/End and Query Subcube Verbose events (and any others you find useful) and then refresh the report again by pressing the Refresh button in Power BI Desktop without changing it at all. In your trace, if you see any of the Progress Report events appear when that second refresh happens, as well as Query Subcube Verbose events with an Event Subclass of Non-cache data, then you know that the Storage Engine cache is not being used.

[Screenshot: Profiler trace]

Also look at the Duration column in the trace for these events which shows the time in milliseconds that they took to execute. This will tell you how much of an impact this problem is having on your report refresh times.

The easy way to stop this happening is to turn off the totals row in the table:

[Screenshot: the table’s totals setting turned off in Power BI Desktop]

Displaying the totals in a separate table also seems to avoid the problem, although of course it doesn’t look as good. Putting only one hierarchy in the table, apart from your measures, also seems to solve the problem. You could also try all the usual methods to improve SSAS performance, such as building aggregations on the cube.

The dev team is aware of this problem but it’s unlikely to be fixed in the short term.


Defining Variables In DAX Queries


Variables are the best thing to happen to DAX since, well, forever – they are so cool I’m almost ready to like DAX as much as I like MDX. There are already several good articles and blog posts out there describing how to use them (see here and here), but I was looking at a Profiler trace the other day and saw something I hadn’t yet realised about them: you can declare and use variables in the DEFINE clause of a DAX query. Since my series of posts on DAX queries still gets a fair amount of traffic, I thought it would be worth writing a brief post showing how this works.

Say you have the following table (called Sales) in your model:

[Screenshot: the Sales table]

You can declare DAX variables in the DEFINE clause of a query like so:

DEFINE
    VAR MyMonth = "January"
    VAR FilteredMonths =
        FILTER ( VALUES ( Sales[Month] ), Sales[Month] = MyMonth )
EVALUATE
CALCULATETABLE ( Sales, FilteredMonths )

This query returns the following result:

[Screenshot: query results]

The benefits of using variables in this way are the same as you get when using variables in measures and calculated columns: improved readability, less repetition and potential performance improvements.

I also wondered whether I would be able to refer to these variables inside measures declared in the DEFINE clause, but unfortunately you can’t. The following query:

DEFINE
    VAR MyMonth = "January"
    VAR FilteredMonths =
        FILTER ( VALUES ( Sales[Month] ), Sales[Month] = MyMonth )
    MEASURE Sales[FirstMeasure] =
        COUNTROWS ( FilteredMonths )
EVALUATE
ROW ( "First Measure", [FirstMeasure] )

…returns the error

“Failed to resolve name ‘FilteredMonths’. It is not a valid table, variable or function name”.

However if you define your calculations inside the query itself, for example using the Summarize() or AddColumns() functions, or like so:

DEFINE
    VAR MyMonth = "January"
    VAR FilteredMonths =
        FILTER ( VALUES ( Sales[Month] ), Sales[Month] = MyMonth )
EVALUATE
ROW (
    "First Calc", COUNTROWS (FilteredMonths),
    "Second Calc", CALCULATE (SUM(Sales[Sales]), FilteredMonths)
)

…the query works:

[Screenshot: query results]

In a lot of cases, multiple calculations contain some of the same logic and being able to use variables to share tables and values between calculations opens up some really interesting opportunities for performance optimisations.
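For comparison, here is my rewrite of the previous query without variables; the filter logic has to be repeated inside each calculation, which is harder to read and gives the engine less opportunity to evaluate it just once:

EVALUATE
ROW (
    "First Calc", COUNTROWS ( FILTER ( VALUES ( Sales[Month] ), Sales[Month] = "January" ) ),
    "Second Calc", CALCULATE (
        SUM ( Sales[Sales] ),
        FILTER ( VALUES ( Sales[Month] ), Sales[Month] = "January" )
    )
)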


Nested Variables In DAX


Last week, at the SQL Server Days conference in Belgium, Kasper mentioned in his presentation that it was possible to define variables inside variables in DAX. So, for example, you could define a measure like so:

MyMeasure =
VAR Outer1 =
    VAR Inner1 = 1
    VAR Inner2 = 2
    RETURN Inner1 + Inner2
VAR Outer2 = 3
RETURN Outer1 + Outer2

This measure returns 6 as you might expect:

image

There aren’t any performance benefits to doing this, although of course it helps with code readability and organisation (thanks to Marius for confirming this).

With my newly rekindled love of DAX I thought this was quite interesting. I’m not really sure why though, given that it’s not particularly useful; I think Matt might be right:

image


SSAS Documentation With Squarehouse Commentary


A few months ago I found out about a new tool for managing and viewing Analysis Services Multidimensional documentation called Squarehouse Commentary. I’ve now had the chance to play around with it and talk to the company that makes it, and I have to say I’m impressed: in my opinion it’s by far the best tool of its type that I’ve seen. Rather than generating a large pile of documentation that no-one will read and which will be left forgotten on a file share or a SharePoint site, it allows you to sync the descriptions created in its own web application with the description properties in your SSAS cubes and dimensions. It also automatically generates actions so that end users can easily view rich web-based documentation for hierarchies and measures direct from whatever client tool they are using, in just a few clicks.

This video does a good job of showing off the main features:

Overall, definitely something to check out. It doesn’t support Tabular models yet, but I’m told that this is in the planning stages.


First Thoughts On The Integration Of Power Query/M Into Analysis Services Tabular


Last Friday’s big news was the release of the first CTP for Analysis Services v.next. Among several major new pieces of functionality (Ragged hierarchies! Drillthrough that works properly, even for calculations! Table-level security!) probably the biggest is the integration of Power Query/M into Analysis Services. As you can probably guess, I’m incredibly pleased that my two favourite technologies have got together. The technical details are given in this blog post, which I suggest you read if you haven’t done so already, but what I think is missing is an explanation of why this is so important and what kind of opportunities it opens up – hence this post. Of course this is just my take on the subject and not what Microsoft may actually be thinking; it’s also very early days, so as the functionality develops and I have more chance to think about this my opinions may change. If you have any ideas on this subject I would be interested to hear them so please leave a comment!

Why this had to happen: Power BI

There is an obvious reason why Microsoft decided to integrate Power Query/M into SSAS, and that is because it needs to support the conversion of Power BI models into Analysis Services Tabular models. There are two scenarios where this will be necessary.

The first is the ability to convert a Power BI model into an Azure Analysis Services Tabular model (listed as ‘planned’ here), something that will be a key selling point for Azure Analysis Services when it is released. The engine behind Power BI is essentially the same as the one used in Analysis Services, so migrating the data model should be straightforward, but since Power BI uses Power Query/M to load data, a migrated Azure Analysis Services model will also have to use Power Query/M.

The second scenario is similar to the first. We now know that on-premises Power BI will be delivered through Reporting Services v.next, and it’s reasonable to assume Reporting Services will need a database engine to store the data for published Power BI reports. That engine will have to be an Analysis Services instance of some kind (either standalone or running in-process inside Reporting Services) and again for that to work Analysis Services will have to support the same data access mechanisms as Power BI.

Better support for a larger number of data sources

I’ve just argued why Microsoft was obliged to include this functionality in SSAS v.next but in fact there are many positive reasons for doing this too. The most obvious one is to do with support for more data sources. At the moment SSAS Tabular supports a pretty good range of data sources, but the world of BI is getting more and more diverse and in order to stay relevant SSAS needs to support far more than it does today. By using Power Query/M as its data access mechanism, SSAS v.next will immediately support a much larger number of data sources and this number is going to keep on growing: any investment that Microsoft or third parties make for Power BI in this area will also benefit SSAS. Also, because Power Query/M can query and fold to more than just relational databases, I suspect that in the future this will allow for DirectQuery connections to many of these non-relational data sources too.

Different data sources for partitions in the same table

Another benefit of this change is that we’ll have a lot more flexibility with partitioning tables in an SSAS Tabular model. As the blog post says:

As long as a partition’s M query adheres to the column mappings of the table, you are free to perform any transformations and pull in data from any data source defined in the model.

In SSAS 2016 the partitions in a table all have to get data from the same data source, whereas in v.next we’ll be able to get data from different data sources in different partitions, and this opens up some interesting new possibilities. For example, I can imagine a simple budgeting application where the partitions in a table get data from different Excel workbooks stored in OneDrive for Business, and where each partition gets processed automatically when changes are saved to one of these workbooks, as in the sketch below.
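To make this more concrete, here’s a rough sketch of what the M queries behind two partitions of the same table might look like – one loading actuals from a SQL Server database, the other loading budget figures from an Excel workbook. All of the server, database, table, file and sheet names here are made up for the purposes of the example, and I’ve used a local file path rather than OneDrive for Business to keep things simple; the important point is that both queries return the same columns as the table they belong to.

// Partition 1: actuals from a SQL Server database
let
    Source = Sql.Database("myserver", "SalesDW"),
    FactSales = Source{[Schema = "dbo", Item = "FactSales"]}[Data],
    Actuals = Table.SelectColumns(FactSales, {"Date", "Product", "Sales"})
in
    Actuals

// Partition 2: budget figures from an Excel workbook
let
    Source = Excel.Workbook(File.Contents("C:\Budgets\Budget2017.xlsx"), true),
    BudgetSheet = Source{[Item = "Sales", Kind = "Sheet"]}[Data],
    Budget = Table.SelectColumns(BudgetSheet, {"Date", "Product", "Sales"})
in
    Budget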

Does this replace SSIS and my data warehouse? 

The short answer is no. Power Query/M is not a full-featured ETL tool and I don’t think it ever will be; it certainly does not have the kind of functionality needed to perform enterprise-level ETL right now. My view is that Microsoft have built Power Query/M into SSAS for the reasons above and not to encourage enterprise SSAS users to do their own quick-and-dirty ETL when loading data (although there is a risk that that will happen anyway). That said, I think the dividing line between corporate and self-service BI will become increasingly blurred over the next few years as the Microsoft BI stack develops, and we’ll see Analysis Services being used in self-service scenarios as well as the more traditional corporate ones.

Centralised data source objects

One last thing to point out is that the way SSAS v.next makes a distinction between data sources and other queries is very interesting. In Power BI and Power Query it’s easy to end up with data source connection information duplicated across multiple queries unless you know what you’re doing, and this can cause no end of problems later on in a project. As far as I can see, in SSAS v.next a “data source object” is an M query that only contains the connection to external data, while all other queries have to reference a data source to be able to access external data. This means, as the blog post says:

Referring to data source objects helps to centralize data source settings for multiple queries and simplifies deployments and maintenance if data source definitions must be updated later on. When updating a data source definition, all M queries that refer to it automatically use the new settings.

I wonder whether this concept is coming to Power BI and Power Query at some point? I hope so – it makes a lot of sense.
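To illustrate the general idea – and this is just a sketch of the concept using an ordinary query reference, with made-up server, database and table names, rather than the exact syntax SSAS v.next uses for data source objects – the connection details live in a single query and every other query references it:

// "Data source" query, named SalesDatabase: contains nothing but the connection
let
    Source = Sql.Database("myserver", "SalesDW")
in
    Source

// A table query that references SalesDatabase instead of repeating
// the connection details
let
    Source = SalesDatabase,
    FactSales = Source{[Schema = "dbo", Item = "FactSales"]}[Data],
    CurrentYearSales = Table.SelectRows(FactSales, each [SaleYear] = 2017)
in
    CurrentYearSales

If the server or database name ever changes, only the SalesDatabase query needs to be updated and every query that references it picks up the new connection automatically.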



The Show Hidden Cubes SSAS Connection String Property


If you need to write queries in SQL Server Management Studio against an SSAS cube that has its Visible property set to false, you have a problem because when you connect to your database you can’t see the cube or its metadata! Luckily you can override this and make the cube visible by setting the following connection string property when you connect:

Show Hidden Cubes=true

image

Connection string properties can be set in the Additional Connection Parameters tab of the connection dialog that appears when you open a new MDX query window.

Unfortunately this doesn’t make any hidden objects inside the cube, such as measures or dimensions, visible again – it just makes the cube itself visible. However, if you’re working on the Calculations tab of the Cube Editor in SSDT it is possible to make all hidden objects visible, as I show here.


More Detail On Detail Rows Expressions In SSAS Tabular V.Next


My second-favourite feature in SSAS Tabular v.next after Power Query integration is the Detail Rows expression property for measures – it not only brings drillthrough on measures to Tabular, it means that we can define meaningful drillthrough on any measure, no matter how it is calculated. There’s a basic description of the functionality in this blog post but I thought it would be useful to walk through a simple example showing how it can be used.

Consider a simple SSAS Tabular model with two tables in it. First, a table containing sales data called Sales:

image

Second, a date table called Date:

image

[Note: dates are in DD/MM/YYYY format]

There are two measures with the following definitions:

Total Sales:=SUM(Sales[Sales])

Total YTD Sales:=TOTALYTD([Total Sales], 'Date'[Date])

The measure Total YTD Sales gives the running total of sales from the beginning of the current year. I know it doesn’t follow Marco and Alberto’s best-practice pattern, but I wanted to keep things simple on the DAX front…
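For reference, the same measure could also be written using CALCULATE() and DATESYTD() – this is simply an equivalent formulation, not the full best-practice pattern referred to above:

Total YTD Sales:=CALCULATE([Total Sales], DATESYTD('Date'[Date]))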

Browsing the model in an Excel PivotTable gives the following result:

image

At this point if you double-click on cell C6 in the PivotTable you get the following, not very useful result, on a new worksheet:

image

What drillthrough does in SSAS Multidimensional, and what the new Detail Rows Expression property in SSAS Tabular v.next does, is allow an end user to see the detail-level data (usually the rows in the fact table) that was aggregated to give the value the user clicked on in the original PivotTable.

For the Total Sales measure, this property can be set with a DAX expression that returns a table something like this:

SELECTCOLUMNS(
	'Sales',
	"Date", 'Sales'[Date],
	"Sales Value", [Total Sales]
)

[For more details on the SelectColumns() function, see here]

image

Now when you double-click on cell C6 in the PivotTable you get the result of the table expression above filtered by the context of the cell you’ve clicked on – in this case, the date 4/1/2017. What appears in the new worksheet is data from the row from the Sales table for 4/1/2017:

image

This is already better than SSAS Multidimensional drillthrough because as a developer you have control over the column headers displayed in this table (in Multidimensional drillthrough the column names come out in a ridiculously user-unfriendly format) and the order that they are displayed in (which is equally painful to control in Multidimensional).

Now, consider cell D6 in the PivotTable, the cell that shows the year-to-date sales amount for 4/1/2017. If a user double-clicked on this cell they would expect to see all of the rows from the Sales table from 1/1/2017 to 4/1/2017, the rows whose sales have been aggregated to give the YTD total.

This can be achieved using the following expression in the Detail Rows Expression for the Total YTD Sales measure:

CALCULATETABLE(
	SELECTCOLUMNS(
		'Sales',
		"Date", 'Sales'[Date],
		"Sales Value", [Total Sales]
	),
	DATESYTD('Date'[Date])
)

image

Now, double-clicking on cell D6 in the PivotTable gives the following table:

image

It may not look all that impressive, but there are few words that can describe how happy this makes me feel. This is exactly what is not possible with drillthrough in SSAS Multidimensional, and why drillthrough in Multidimensional has always been so frustrating to use. It’s great to see the feature properly implemented in Tabular.


Finding Out Which MDX Calculations Are Being Evaluated By Your Query In Analysis Services Multidimensional, Part 1


Since Analysis Services 2012 there have been two trace events that provide a lot of information about what’s going on in the Formula Engine when you run a query in Analysis Services Multidimensional: Calculation Evaluation and Calculation Evaluation Detailed Information. The problem is that they are not properly documented anywhere and they provide so much information that it’s difficult to interpret what they are telling you. This post on Thomas Ivarrsson’s blog (which I strongly advise you to read before you carry on), with information provided by Akshai Mirchandani of the dev team, is the only place that has any details about them and unfortunately it’s by no means comprehensive.

I don’t have the knowledge to provide a full description of these two trace events, so instead in this series of posts I want to do something less ambitious but hopefully still useful: show how you can use them to find out which MDX calculations are being evaluated when you run a query, which is of course going to be useful if you are trying to tune that query. It’s not always as easy as you might think to work out which calculations are referenced by a query: for example financial cubes often have hundreds of calculated members and/or scoped assignments, many of which are dependent on other calculations.

Here’s a super-simple example to start off with. Imagine you have a cube with just one regular measure, Sales Amount, and just one calculated measure with the following definition:

CREATE MEMBER
CURRENTCUBE.MEASURES.[Sales Forecast] AS
[Measures].[Sales Amount] * 2;

Now, consider the following query:

WITH
MEMBER MEASURES.X as 123
SELECT
{[Measures].[Sales Forecast]}
ON 0
,
[Date].[Date].[Date].MEMBERS
ON 1
FROM
[test]
CELL PROPERTIES VALUE

image

The query returns the Sales Forecast calculated measure on columns and every member on the Date level of the Date hierarchy on rows – so not all that interesting. However there are two things to point out:

  • The WITH clause has a calculated measure that isn’t used in the query. The reason I’ve put this in the query is to stop the Formula Engine from caching the results of any MDX calculations for longer than the lifetime of the query (see here for more details); it doesn’t affect the Storage Engine cache however. This means that every time the query is run you know that all the calculations will be evaluated and that you’ll be able to see any related activity in Profiler, and that you can run the query on a warm Storage Engine cache and won’t see many Storage Engine-related events.
  • The CELL PROPERTIES clause only returns the VALUE property and not the FORMAT_STRING property which is normally returned as well. This reduces the number of Calculation Evaluation events that are raised in Profiler when the query runs and makes it easier to see the important information.

With a Profiler trace that includes the Calculation Evaluation and Calculation Evaluation Detailed Information events, when you run the query above you’ll see this:

image

There are a lot of events generated in the trace even for this simple query, but the important thing to look for is the line highlighted in the screenshot above: a Calculation Evaluation Detailed Information event with the following event subclass:

107 – RunEvalNode Finished Calculating Item

Any time you see this event you know that a calculation has been evaluated in bulk mode for a subcube (ie an area of cells) in your cube. You may see more than one RunEvalNode event for the same calculation in the same query if it was evaluated for more than one subcube.

The contents of the TextData column (which is displayed in the lower half of the screen in Profiler) for the RunEvalNode event in the trace shown above are as follows:

image

As you can see, it tells you the MDX expression that has been evaluated for the subcube. It also tells you the name of the calculated measure, but it’s the MDX expression that’s important here because scoped assignments that overlap with a single calculated measure could mean that many different MDX expressions must be evaluated for that calculated measure.

Now for the bad news: you won’t see a RunEvalNode event for any calculations that are evaluated in cell-by-cell mode. You probably know that inefficient or badly-written calculations are often evaluated in cell-by-cell mode, which is usually slower than bulk mode, but there are cases where the Formula Engine evaluates a perfectly good calculation in cell-by-cell mode because it’s the right thing to do. For example, take a look at the following query:

WITH
MEMBER MEASURES.X as 123
SELECT
{[Measures].[Sales Forecast]}
ON 0
FROM
[test]
CELL PROPERTIES VALUE

 

image

It’s basically the same query as the one above but with the Rows axis removed, so it only returns a single cell. In Profiler you won’t see a RunEvalNode event because in this case the Sales Forecast calculation is evaluated in cell-by-cell mode.

That said you will see other events relating to the evaluation node for the Sales Forecast calculation, such as the Calculation Evaluation event shown here, the last for this node (NodeIndex=0, the same value that is shown in the IntegerData column) in the trace:

image

Notice also the LazyEvaluation tag, which is 1 here, indicating a calculation that is evaluated in cell-by-cell mode.

So, to sum up, there are two ways to see which calculations are referenced by your query in a Profiler trace that includes the Calculation Evaluation and Calculation Evaluation Detailed Information events:

  1. If your calculation is evaluated in bulk mode you will see a Calculation Evaluation Detailed Information event with the Event Subclass 107 – RunEvalNode Finished Calculating Item.
  2. If your calculation is evaluated in cell-by-cell mode you will see Calculation Evaluation events for the Init-Build-Prepare stages of the evaluation node.

In the next post in this series I’ll look at a more complex scenario that shows some unexpected behaviour by SSAS.

[I am extremely grateful to Akshai Mirchandani for answering a lot of questions relating to this topic. If you want to learn more about the internals of the Formula Engine there are two other useful resources: this post by Jeffrey Wang, also of the dev team, and chapter 29 of the book “Microsoft SQL Server 2008 Analysis Services Unleashed”]


Finding Out Which MDX Calculations Are Being Evaluated By Your Query In Analysis Services Multidimensional, Part 2


In part 1 of this series I showed how you can use Profiler to find out which MDX calculations are being evaluated when a query runs on SSAS Multidimensional. In this post I’ll show a practical example of why this is so useful: a situation where SSAS evaluates a calculation that isn’t needed by a query.

Do you have a Date Tool dimension (also known as a Shell dimension or Time Utility dimension) in your cube? A lot of enterprise-level SSAS cubes use this technique to allow you to write a calculation once and have it apply to multiple measures. There are two main approaches to implementing Date Tool dimensions:

  • You can create a dimension with one hierarchy and one real member and then use calculated members for your calculations, or
  • You can create a dimension with one hierarchy and as many real members as you need calculations, and then use SCOPE statements on these members for your calculations

The second approach, described in detail in this article, is very popular but over the years I have seen several cases where customers of mine who use it have suffered from unexplained query performance problems, problems that have been solved by using the calculated member approach instead. It turns out that the Calculation Evaluation and Calculation Evaluation Detailed Information Profiler events can shed some light on the causes of these problems.

Here’s a simple test cube with a Date Tool dimension that has three real members on it:

image

Here’s the contents of the MDX Script, copied from the Calculations tab in the Cube Editor in SSDT:

CALCULATE;

SCOPE([Date Calc].[Date Calc].&[2 PPG]);
    THIS = ([Date Calc].[Date Calc].&[1 Value],
            [Date].[Calendar].CURRENTMEMBER.PREVMEMBER);
END SCOPE;

SCOPE([Date Calc].[Date Calc].&[3 YTD]);
    THIS = AGGREGATE(
            YTD([Date].[Calendar].CURRENTMEMBER),
            [Date Calc].[Date Calc].&[1 Value]);
END SCOPE;

As you can see, two of the members on the [Date Calc] dimension are overwritten by scoped assignments: [2 PPG] is overwritten with a previous period growth calculation and [3 YTD] is overwritten by a year-to-date calculation.

Here’s a query that includes a calculated measure defined in the WITH clause and returns two out of three of the members on the [Date Calc] dimension – but does not return the [3 YTD] calculation:

WITH
MEMBER [Measures].QueryCalc AS
[Measures].[Sales Amount] + 1

SELECT
{
[Measures].[Sales Amount],
[Measures].QueryCalc
}
*
{
[Date Calc].[Date Calc].&[1 Value],
[Date Calc].[Date Calc].&[2 PPG]
}
ON 0,
[Date].[Calendar].[Month].MEMBERS
ON 1
FROM
TEST

image

Running a Profiler trace as described in my previous post reveals that when this query is run, not only are the QueryCalc and [2 PPG] calculations evaluated, but [3 YTD] is evaluated too:

image

It’s worth pointing out that this query was constructed deliberately to show a scenario where SSAS does decide to evaluate the [3 YTD] calculation, but in other cases it may decide otherwise. It decides to do so here because of a number of factors, including prefetching – see Jeffrey’s blog post here and the section on “Unexpected partition scans” here for some background information on this topic. Remember that in most cases prefetching is a good thing and is beneficial for performance, so if you see something like this happening in your cube you need to be sure that it’s actually causing you a performance problem before you try to prevent it.

If this is a problem for you there are a few things you can do. Rewriting your query to use subselects (if you have control over the MDX query that is being used) is one option:

WITH
MEMBER [Measures].QueryCalc AS
[Measures].[Sales Amount] + 1

SELECT
{
[Measures].[Sales Amount],
[Measures].QueryCalc
}
*
{
[Date Calc].[Date Calc].MEMBERS
}
ON 0,
[Date].[Calendar].[Month].MEMBERS
ON 1
FROM
(SELECT
{
[Date Calc].[Date Calc].&[1 Value],
[Date Calc].[Date Calc].&[2 PPG]
}
ON 0
FROM
TEST)
CELL PROPERTIES VALUE

Using the following connection string properties also works, because it turns off prefetching:

disable prefetch facts=true; cache ratio=1

…but as I said, this might hurt query performance in other ways.

Finally, as I said, using calculated members on your Date Tool dimension instead of the real members/SCOPE statements approach will also work. In my opinion this is the best solution: the issues with calculated member selection in Excel that used to make the calculated member Date Tool approach problematic were fixed a long time ago, and it works even if you can’t change how your MDX queries are generated.
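For reference, here’s a rough sketch of what the calculated member version of the Date Tool dimension used in this example might look like in the MDX Script. It assumes that the [Date Calc] dimension now contains only the single real member [1 Value], and that the attribute hierarchy’s All member still has its default name; queries would then reference the calculated members by name rather than by key:

CALCULATE;

CREATE MEMBER CURRENTCUBE.[Date Calc].[Date Calc].[All].[2 PPG] AS
    ([Date Calc].[Date Calc].&[1 Value],
     [Date].[Calendar].CURRENTMEMBER.PREVMEMBER);

CREATE MEMBER CURRENTCUBE.[Date Calc].[Date Calc].[All].[3 YTD] AS
    AGGREGATE(
        YTD([Date].[Calendar].CURRENTMEMBER),
        [Date Calc].[Date Calc].&[1 Value]);

End users still see [2 PPG] and [3 YTD] as members on the [Date Calc] hierarchy, but because they are calculated members rather than real members overwritten by scoped assignments, this approach avoids the unwanted evaluation shown above.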


Finding Out (Approximately) How Long A Calculation Contributes To The Duration Of An MDX Query


In my last two blog posts (see here and here) I showed how to use the Calculation Evaluation and Calculation Evaluation Detailed Information trace events to work out which MDX calculations are evaluated when a query runs in Analysis Services Multidimensional. That’s very useful, but wouldn’t it be great if you could work out how long any single calculation contributes to the overall duration of a query? If you could, it would make performance tuning MDX calculations much easier.

While you can’t get an exact amount of time taken for each calculation, the good news is that it is possible to get a duration accurate to the nearest second if your calculation is evaluated in bulk mode.

Take a look at the following query:

WITH

MEMBER MEASURES.DAYRANK AS
RANK(
[Date].[Date].CURRENTMEMBER,
[Date].[Date].[Date].MEMBERS)-1

MEMBER MEASURES.HADSALE AS
IIF(
[Measures].[Internet Sales Amount]=0,
NULL,
MEASURES.DAYRANK)

MEMBER MEASURES.LASTSALERANK AS
MAX(
NULL:[Date].[Date].CURRENTMEMBER,
MEASURES.HADSALE)

MEMBER MEASURES.LASTSALE AS
([Measures].[Internet Sales Amount],
[Date].[Date].[Date].MEMBERS.ITEM(MEASURES.LASTSALERANK))

MEMBER MEASURES.SIMPLECALC AS
[Measures].[Internet Sales Amount] * 2

SELECT
HEAD([Customer].[Customer].[Customer].MEMBERS, 200)
*
{MEASURES.SIMPLECALC, MEASURES.LASTSALE}
ON 0,
[Date].[Date].[Date].MEMBERS
ON 1
FROM
[Adventure Works]

This query contains five calculated measures: the first four in the WITH clause, DAYRANK, HADSALE, LASTSALERANK and LASTSALE, are based on my approach for finding the last ever non-empty value for a measure across time; the final measure, SIMPLECALC, is, as the name suggests, a very simple calculation. This query takes around 13 seconds to run on my laptop, on a warm Storage Engine cache. Why does it take so long? It’s clearly the calculations that are the problem, but which one(s)?

Luckily all of the calculations in this query are evaluated in bulk mode so, as I discussed in my last two posts, there is an event raised with:

Event Class = Calculation Evaluation Detailed Information

Event Subclass = 107 – RunEvalNode Finished Calculating Item

…for each of the calculations when they are evaluated. Unfortunately the Duration column for this event always shows 0, but there is a way to see approximately how long the calculation took by comparing the Start Time and Current Time columns in the trace.

The 107 – RunEvalNode event for the measure SIMPLECALC shows the same time for the Start Time and Current Time columns:

image

This indicates that the SIMPLECALC calculation is evaluated in under a second.

However, the sequence of 107 – RunEvalNode events for the LASTSALE calculation shows something different:

image

There’s a gap of 7 seconds between the Start Time and the Current Time, and this indicates that the calculation took 7 seconds to evaluate. It’s a bit frustrating that there isn’t a way to get a more accurate duration here, but it’s still very clear which calculation is taking all the time. Even though the time for calculating LASTSALE includes the time taken for calculating LASTSALERANK, HADSALE and DAYRANK (all of which need to be calculated in order to calculate LASTSALE), the equivalent rows in the trace for these other calculations show they took under a second each. It’s only the logic inside LASTSALE itself that is slow – so that’s where any tuning needs to take place. Indeed, modifying the query to return LASTSALERANK instead of LASTSALE makes the query faster by around 6 seconds, supporting this conclusion.

If you’re curious about what the other 6 seconds of the query execution time is taken up by, it seems to be the serialisation of the results – something I blogged about here. The query returns a cellset with 400*1190=476,000 cells in it, and SSAS doesn’t cope well with queries that return a large amount of data.

