3 Things: Data Analytics Highlights from September 2024

11th October 2024. By Michael A

This month's '3 Things' blog post covers topics ranging from the availability of OpenAI o1, a new model that can reason, in Azure OpenAI, to the benefits of GPU acceleration in Polars DataFrames.

Read on as we highlight three things for each of the four technology areas that you should be aware of from last month.

Power BI


  • Tabular Model Definition Language (TMDL) support has been added to the ALM Toolkit. This means you can more easily compare differences between two TMDL models, cherry-pick changes to deploy, re-use common objects across models, and more. It's great to see TMDL becoming the go-to format for Power BI semantic models. Learn more.

  • The Copilot experiences across Fabric continue to expand and improve. There are two new enhancements to how Copilot can help you create Power BI reports: (1) it can work with you collaboratively to make sure it understands what you're trying to achieve with a report, and (2) it is more transparent, outlining the changes it makes to improve your report. These enhancements increase the likelihood that the reports it produces are fit for purpose. Learn more.

  • It's common in organisations to have a set of metrics that should ideally be reused across teams and departments. Power BI semantic models partially solve this problem by centralising metric definitions in the form of measures. However, those metrics are not readily discoverable outside the semantic model. 'Metrics sets', a new feature in Microsoft Fabric, addresses this problem by letting you create curated collections of metrics that are more discoverable and easier to consume across your business. Learn more.
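Part of what makes TMDL comparisons practical is that it is a plain-text, line-oriented format, so even a generic text diff makes model changes readable. The sketch below is not how ALM Toolkit works (the tool does its own object-level comparison); it simply runs Python's standard difflib over two hypothetical, simplified TMDL snippets for the same table, using made-up file paths, to show a newly added measure surfacing as an added line.

```python
import difflib

# Hypothetical, simplified TMDL for the same table in two models.
# TMDL uses tab indentation: properties sit one level below the object they belong to.
dev_model = """table Sales

\tmeasure 'Total Sales' = SUM(Sales[Amount])
\t\tformatString: #,##0

\tmeasure 'Order Count' = COUNTROWS(Sales)
"""

prod_model = """table Sales

\tmeasure 'Total Sales' = SUM(Sales[Amount])
\t\tformatString: #,##0
"""

# Plain line-by-line diff; the file paths are illustrative only.
diff = list(
    difflib.unified_diff(
        prod_model.splitlines(),
        dev_model.splitlines(),
        fromfile="prod/tables/Sales.tmdl",
        tofile="dev/tables/Sales.tmdl",
        lineterm="",
    )
)
print("\n".join(diff))
```

The new 'Order Count' measure appears as a line prefixed with `+`, while the unchanged 'Total Sales' measure shows up only as context.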



Microsoft Fabric


  • The first Fabric community conference (FabCon) in Europe took place towards the end of last month, and a huge number of announcements came with it. There's a new Microsoft Fabric SKU cost calculator, Fabric API support for Service Principals, a new T-SQL Notebook experience, the ability to share Spark sessions across Notebook Activities in Data Pipelines, the general availability of Snowflake Mirroring, a public preview of a new 'Copy Job' feature, improved Copilot experiences, and the list goes on. Get up to speed. Learn more.

  • The AI Skill capability in Fabric was introduced as a public preview about two months ago. It enables you to quickly create custom Q&A solutions powered by Large Language Models (LLMs). By augmenting LLMs with your data and business context, it can answer nuanced questions much as a data analyst would. A recent article from the Fabric team explores how Fabric AI skills can be extended with Azure OpenAI to answer questions from people across your business, and then go one step further by supplementing the responses with additional, contextually relevant information. Learn more.

  • Many people mistakenly treat Microsoft Fabric and Databricks as mutually exclusive options. Why settle for one when you can have the best of both? There had previously been somewhat 'hacky' ways of getting Databricks Unity Catalog (UC) tables to sync with, and be usable from, Fabric. Now there is a proper solution: the 'Mirrored Azure Databricks Catalog' feature, currently in public preview. This means you can use the many Fabric experiences with your Unity Catalog tables with minimal friction and no data movement. Learn more.



Azure Analytics and AI


  • Retrieval Augmented Generation (RAG) is a popular approach for making LLM solutions, such as chatbots, contextually aware: relevant information is retrieved from your own data and passed to the model alongside the user's question. A recent article on the Databricks blog explores how to implement this by combining Databricks with Pinecone, a vector database. The article covers every step from ingesting raw data through to creating, deploying, and testing a chatbot. Even if you're not using Databricks, you can still learn from the general approach. Learn more.

  • Adopting AI and incorporating it into business processes comes with a number of challenges. These include managing decentralised AI solutions, inconsistent governance and security, and difficulties in tracking usage and charging internal departments back for their AI model consumption. The Azure AI team published an article about the 'AI Hub Gateway Landing Zone', a solution accelerator that can help you address these challenges, either as a starting point or as a reference design for your AI solution architecture. Learn more.

  • OpenAI recently introduced o1, a new AI model that can reason through complex problems. Use cases include code generation, brainstorming, document comparison, and workflow management. You can now deploy solutions based on o1 with Azure OpenAI. Imagine the possibilities. Learn more.
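To make the retrieval step of RAG concrete, here is a minimal, self-contained sketch. It is not the Databricks or Pinecone approach: a real pipeline would use an embedding model and a vector database, whereas here a bag-of-words count vector and cosine similarity stand in for both. The documents, function names, and prompt wording are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical mini knowledge base standing in for ingested, chunked documents.
DOCUMENTS = [
    "Refunds are processed within 5 business days of receiving the returned item.",
    "Standard shipping takes 3 to 7 days; express shipping takes 1 to 2 days.",
    "Support is available Monday to Friday, 9am to 5pm.",
]


def embed(text: str) -> Counter:
    """Toy 'embedding': term counts of lower-cased words (a real system calls an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, k: int = 1) -> list[str]:
    """Return the k most similar documents (the job a vector database does at scale)."""
    q = embed(question)
    ranked = sorted(DOCUMENTS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, k: int = 1) -> str:
    """Augment the user's question with retrieved context before sending it to an LLM."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


print(build_prompt("How long does express shipping take?"))
```

The final call to an LLM is deliberately left out; the point is that the model only ever sees the question plus whatever the retrieval step selected.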



Open-Source Analytics


  • One of Delta Lake's many invaluable features is the ability to create copies of your data almost instantly. A recent article on the Delta Lake blog explores how you can use this cloning capability to quickly create table copies for purposes such as experimentation and disaster recovery. It also demonstrates how you can clone Parquet, Apache Iceberg, and Unity Catalog tables, something you may not have known was even possible. Learn more.

  • GPU acceleration in Polars can make compute-bound queries up to 13x faster than running them with Polars on the CPU. In a recent blog post, the Polars team covered how to get started, their design considerations, and the impact GPU-accelerated Polars could have on workloads that benefit from this optimisation. There are code examples and interactive notebooks you can download and try out yourself. Learn more.

  • Did you know DuckDB has full support for ACID transactions? In the world of analytics and AI, where change is constant, atomicity, consistency, isolation, and durability remain desirable characteristics of the data layer. The DuckDB team describe what this looks like in DuckDB and why it matters in OLAP solutions, for example when ingesting data while reports are being run, or when rolling back incorrect data transformations. Learn more.
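Delta Lake distinguishes shallow clones, which copy only the table metadata and keep referencing the source table's data files, from deep clones, which also copy the data files. The near-instant copies come from the shallow variety. The toy model below illustrates that difference in plain Python; it is a conceptual sketch only, not the Delta Lake API, and the `Table` class, the `storage` dictionary, and the file paths are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Toy table: a name plus the list of data files its metadata points to."""
    name: str
    files: list[str] = field(default_factory=list)


# Hypothetical object storage: file path -> file contents.
storage: dict[str, bytes] = {
    "sales/part-0.parquet": b"rows 0-999",
    "sales/part-1.parquet": b"rows 1000-1999",
}
source = Table("sales", ["sales/part-0.parquet", "sales/part-1.parquet"])


def shallow_clone(src: Table, name: str) -> Table:
    """Copy only the metadata: the clone references the source's existing files (cheap, near-instant)."""
    return Table(name, list(src.files))


def deep_clone(src: Table, name: str) -> Table:
    """Copy metadata and data: every file is written again under the clone's own location."""
    new_files = []
    for path in src.files:
        new_path = f"{name}/{path.split('/', 1)[1]}"
        storage[new_path] = storage[path]  # stands in for physically copying the file
        new_files.append(new_path)
    return Table(name, new_files)


experiment = shallow_clone(source, "sales_experiment")  # e.g. for experimentation
backup = deep_clone(source, "sales_backup")             # e.g. for disaster recovery
print(experiment.files)  # the same paths as the source table
print(backup.files)      # new paths under sales_backup/
```

The cost of a shallow clone is independent of data volume, whereas a deep clone scales with the number and size of files, which is why a deep clone tends to suit disaster recovery better: it does not depend on the source table's files staying around.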

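The "roll back an incorrect transformation" scenario is easy to picture in code. DuckDB is a separate package, so this sketch uses Python's built-in sqlite3 module as a stand-in; DuckDB accepts the same `BEGIN TRANSACTION` and `ROLLBACK` SQL statements, but the table, the data, and the faulty transformation shown here are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for an analytical store.
# isolation_level=None puts the connection in autocommit mode, so transactions are managed explicitly.
con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
con.execute("INSERT INTO sales VALUES ('North', 100.0), ('South', 250.0)")


def total() -> float:
    """Current sum of all sales amounts."""
    return con.execute("SELECT SUM(amount) FROM sales").fetchone()[0]


before = total()

# Run a transformation that turns out to be wrong (doubling every amount) inside a transaction...
con.execute("BEGIN TRANSACTION")
con.execute("UPDATE sales SET amount = amount * 2")
print("inside transaction:", total())  # inside transaction: 700.0

# ...so it can be undone atomically: ROLLBACK restores the state from before BEGIN.
con.execute("ROLLBACK")
print("after rollback:", total())  # after rollback: 350.0
```

Because the update is atomic, it is either applied in full with `COMMIT` or not at all, so a reader never sees a half-transformed table.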

