3 Things: Data Analytics Highlights from August 2024

6th September 2024 . By Michael A

This month's '3 Things' blog post includes topics ranging from the introduction of a new feature called 'AI Skills' in Fabric to energy-efficiency benchmark comparisons between Polars and Pandas.

Read on as we highlight three things for each of the four technology areas that you should be aware of from last month.

Power BI


  • The Tabular Model Definition Language (TMDL) is now generally available. This is a strong indicator of its stability and means it's only a matter of time before this developer-friendly experience reaches a similar milestone in Power BI Project Files (PBIP) and Fabric Git integration. For Fabric Git integration specifically, this means you will begin to see TMDL replacing Tabular Model Scripting Language (TMSL) as the semantic model definition language when committing and pushing your semantic model changes to Git. Learn more.

  • The Power BI paginated report authoring experience on the web continues to expand in features. The new authoring experience introduces views for editing and previewing your paginated report designs. Along with this comes the ability to add headers, footers, and parameters to your reports in addition to textboxes and images. Learn more.

  • Dynamic per-recipient subscriptions are generally available. If you were waiting for this milestone before adopting this feature, you have the green light. There is often a business need for individuals to have a pre-filtered version of a report pushed to their inbox rather than them having to pull it themselves. If your business requires something along these lines, this is a great feature to start exploring. Learn more.



Microsoft Fabric


  • The Microsoft Fabric team authored and published guidance on security best practices for OneLake. This new Microsoft Learn documentation covers topics including least privilege, secure by workload, and secure by use case. Familiarising yourself with its content will help you consider and apply the necessary controls in Fabric to protect your data in OneLake. Learn more.

  • The Copilot experiences in Fabric are helpful in so many ways, but they can sometimes fall short due to a lack of business context and the nuances of your data. A new Microsoft Fabric feature called 'AI Skills' aims to plug this gap. With it, you can create conversational Q&A solutions that can use your data, plus some contextual information you provide up-front, to correctly answer ad-hoc analytical questions from your business users. Learn more.

  • Microsoft Fabric has a number of capabilities that can simplify CI/CD practices and processes. A recent article from the Fabric team explores how you can combine Fabric's Git integration and deployment pipelines to promote your changes from your Fabric development environments all the way through to production. It also includes a step-by-step guide on what you need to do to start putting it into practice now. Learn more.



Azure Analytics and AI


  • GPT-4o (the 'o' is for 'omni') combines real-time audio, video, and text understanding into a single model. This can enable more human-like behaviours in your AI solutions. And now you can create your own fine-tuned versions of this model in the Azure OpenAI service. This capability is in preview, so this is a great time to start exploring how it can transform processes in your business. Learn more.

  • When it comes to customising the behaviour of your LLM solutions, you typically have four choices: prompt engineering, retrieval augmented generation (RAG), fine-tuning, and pre-training the model. A recent article from the Azure AI services team deep-dives into prompt engineering. It starts by introducing some of its fundamental concepts and then explains why it's so important. Learn more.

  • There's a tonne of insightful content from the Databricks Data + AI Summit that took place in June 2024. A recent post on the Databricks blog highlights some of the top sessions, ranging from an introduction to the Databricks Data Intelligence Platform to exploring how you can use Unity Catalog to bring data intelligence to healthcare. Learn more.



Open-Source Analytics


  • Apache Airflow 2.10.0 has arrived, and with it come a number of new features. 'Multiple Executor Configuration' (previously known as 'Hybrid Executors') removes a limitation where you had to choose a single executor per environment. A new 'DatasetAlias' class makes it possible for you to define complex dependencies for DAG executions based on dynamic metadata. Task instance history is now retained even after a task instance has been retried. These are just a few of the welcome improvements. Learn more.

  • Ibis, the portable dataframe library for Python, is dropping its backend support for Pandas and Dask in version 10.0. At first glance the announcement sounds bleak but, when you get past the headline, you'll quickly find it's because Ibis has a DuckDB backend that already does such an excellent job of querying Pandas DataFrames. Learn more.

  • Given that data processing in analytics and AI workloads is one of the most energy-intensive tasks in computing, it's no surprise that there is a heavy focus on energy efficiency across the industry. In a recent article, the team at Polars compared Polars with Pandas in this context and found that Polars is up to eight times more energy efficient than Pandas. Learn more.



Did You Find This Useful?

Get notified when we post something new by following us on X and LinkedIn.