Data Transformation Overview
The estimated time to complete this lab is 40 minutes.
In this lab, we will cover what is under the hood, and then you will connect to your data and transform it into a usable data model.
Reports in Power BI are created to understand raw data, to solve a problem, or both. Sometimes a problem isn’t evident until you see the data; at other times you know you have a problem but not how to solve it, and the answer lies in the data.
The data used in this Power BI training is based on where crocodiles are captured in the Northern Territory, described by broad geographic areas referred to as Crocodile Capture Zones. The coordinate locations of traps are not provided to the public.
The zones may be viewed using NR Maps. See folder: Parks and Culture > Wildlife. https://nrmaps.nt.gov.au/nrmaps.html
Zones were generally developed as follows:
- Borroloola: 100 km buffer from Borroloola township.
- Katherine Zone 1: 5 km buffer from the Katherine River.
- Litchfield: an extraction from reserves in GEODATA.
- Management Zone: originated in 2009 with minor modifications over the years.
- Nhulunbuy: 50 km buffer from Nhulunbuy township.
- Shoal Bay, Upper Harbour, Lower Harbour: straight-line boundaries were hand-drawn.
The spatial polygons are provided in ESRI shapefile format, in geographic coordinates with datum GDA94. The capture table is provided in Microsoft Excel format.
For more information about crocodile captures in the Northern Territory, visit nt.gov.au/emergency/community-safety/crocodile-capture-and-management/map-of-crocodile-captures
This data describes the daily captures within the Northern Territory Crocodile Capture Zones.
Custodian: Parks and Wildlife Commission of the Northern Territory
Agency: Department of Environment, Parks and Water Security, Northern Territory Government
Metadata: http://www.ntlis.nt.gov.au/metadata/export_data?type=html&metadata_id=C67DB1A044D43F
View: https://nrmaps.nt.gov.au/nrmaps.html
The capture zones can be viewed in NR Maps. See folder: Parks and Culture > Wildlife.
This dataset includes date, region and other capture attributes, which makes it well-suited real-world data for training materials.
Data description:
ORDER | FIELD | DATA FORM | DESCRIPTION |
1 | OBJECTID | | Unique identifier for each record |
2 | DATE_CAPTURED | date | Date of capture |
3 | SCIENTIFIC_NAME | text x 50 | Scientific name |
4 | COMMON_NAME | text x 50 | Common name |
5 | CAPTURE_METHOD | text x 50 | Method of capture |
6 | TOTAL_LENGTH | float | Length (cm) between the tip of snout and the tip of tail |
7 | HEAD_LENGTH | float | Length (cm) between the tip of snout and the end of dorsal cranial platform |
8 | SEX | text x 1 | Female or male |
9 | SNOUT_VENT_LENGTH | float | Length (cm) between the tip of snout and the beginning of vent (cloaca) |
10 | TAIL_COMPLETE | text x 1 | Whether the tail is complete or not |
12 | REGION | text x 50 | The regions for management zones are Katherine, Nhulunbuy, Darwin, Borroloola and Other/Unknown |
13 | ZONE_NAME | text x 50 | The capture zones are divided into 9 zones: Upper Harbour, Lower Harbour, Shoal Bay, Management Zone, Katherine, Borroloola, Litchfield, Nhulunbuy and Outside Management Zone. |
14 | ZONE_CODE | number | The code for capture zones |
15 | GROUP_NAME | text x 50 | The location group of traps inside the capture zones |
16 | LOCATION | text x 50 | The location area of traps inside the location group |
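Before loading the capture table into a model, it can help to write the expected type of each field down explicitly. The following is a minimal Python sketch, not part of the lab itself, that maps the field names from the table above to Python types and converts one raw text row. It treats SEX as text and SNOUT_VENT_LENGTH as a number, which is what the descriptions imply. The types chosen for OBJECTID and ZONE_CODE, the date format, and the sample values are all assumptions made for illustration.

```python
from datetime import date, datetime

# Field names come from the data description table. The Python types are an
# interpretation of the "data form" column; int for OBJECTID and ZONE_CODE
# is an assumption.
FIELD_TYPES = {
    "OBJECTID": int,
    "DATE_CAPTURED": date,
    "SCIENTIFIC_NAME": str,
    "COMMON_NAME": str,
    "CAPTURE_METHOD": str,
    "TOTAL_LENGTH": float,
    "HEAD_LENGTH": float,
    "SEX": str,
    "SNOUT_VENT_LENGTH": float,
    "TAIL_COMPLETE": str,
    "REGION": str,
    "ZONE_NAME": str,
    "ZONE_CODE": int,
    "GROUP_NAME": str,
    "LOCATION": str,
}


def coerce_value(field, text, date_format="%Y-%m-%d"):
    """Convert one raw text cell to the type declared for its field."""
    text = text.strip()
    if text == "":
        return None  # treat blank cells as missing values
    kind = FIELD_TYPES[field]
    if kind is date:
        # The date format is an assumption; adjust it to match the export.
        return datetime.strptime(text, date_format).date()
    return kind(text)


def coerce_row(raw):
    """Return a dict with every known field, typed, and None where absent."""
    return {field: coerce_value(field, raw.get(field, "")) for field in FIELD_TYPES}


if __name__ == "__main__":
    # Made-up sample values, not taken from the real dataset.
    sample = {
        "OBJECTID": "1",
        "DATE_CAPTURED": "2020-01-15",
        "TOTAL_LENGTH": "250.5",
        "SEX": "M",
        "ZONE_NAME": "Shoal Bay",
    }
    print(coerce_row(sample))
```

A row that fails conversion raises a ValueError, which is a quick way to spot cells that would also cause type errors after changing a column's data type during transformation.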
Prep 1: Fuel Cap Release & Other Options
Power BI Desktop offers a comprehensive range of options and settings that allow users to customise their experience and optimise various aspects of their data analysis and reporting. Let’s explore some of these sections in more detail:
About Power BI Desktop:
File Menu: Go to the File menu and select About. Note the version, the edition (32-bit or 64-bit), and the month and year. You will need these details when something unexpected happens and Corporate or Microsoft support asks for them.
Preview Features:
Preview Features: This section allows you to enable or disable experimental or beta features that are still under development. It is offered only with the Cloud edition of Power BI Desktop, not with the On Premises or Report Server edition. It provides early access to new functionality and enhancements, but be cautious, as these features may be less stable than released features.
Data Load:
Data Load: Here, you can configure settings related to data loading behaviour. It includes options such as enabling parallel loading to speed up data retrieval, defining data privacy levels to control data source access, and adjusting the data cache size to optimise performance.
Power Query Editor:
Global: This section provides options to control the behaviour of the Power Query Editor, the tool used for data transformation and cleansing. Users can adjust settings such as the number of undo steps, data type detection, and handling of column name changes.
Privacy: Users can define privacy levels for each data source, specifying whether data from different sources can be combined or accessed by other queries.
Diagnostics: This setting allows users to enable diagnostic logging for query performance analysis and troubleshooting.
Privacy:
Data Privacy: This section provides options for managing data privacy settings. You can specify the level of data privacy for each data source, control privacy levels for combining data from multiple sources, and enable Enhanced Data Privacy (EDP) mode for increased data security.
Security:
Data Security: You can configure settings related to data security, such as defining privacy levels, encrypting connections, and enabling or disabling Fast Combine, which allows Power BI Desktop to optimise data loading by bypassing certain privacy checks.
File Security: This option allows users to set a password to protect their Power BI Desktop files, restricting unauthorised access.
Save and Recover:
AutoRecover: You can enable and adjust automatic recovery to save your Power BI Desktop file at regular intervals, helping to prevent data loss in case of unexpected application crashes or system failures.
File Path: This setting allows you to specify the default file path for saving your Power BI Desktop files.
Report Settings:
Current File: This section contains settings specific to the current report. Users can customise the interaction behaviour of visuals, such as enabling cross-highlighting or enabling preview features like Smart Narratives or Smart Guides.
These options and settings provide users with fine-grained control over their Power BI Desktop environment, allowing them to optimise data loading, ensure data privacy and security, customise the behaviour of the Power Query Editor, and configure report-specific settings. By exploring and utilising these options effectively, users can tailor their Power BI Desktop experience to their specific needs and enhance their data analysis and reporting workflows.
Power BI Source Control:
Save as PBIP: If you’re working on a new project or you’ve opened an existing Power BI Desktop file (pbix), you can save your work as a Power BI project file (pbip). Let’s take a closer look at what you see in your project’s root folder:
<project name>.Dataset
A collection of files and folders that represent a Power BI dataset. It contains some of the most important files you’re likely to work on, like model.bim. To learn more about the files and subfolders in here, see Project Dataset folder.
<project name>.Report
A collection of files and folders that represent a Power BI report. To learn more about the files and subfolders in here, see Project report folder.
.gitignore
Specifies intentionally untracked files Git should ignore. Power BI Desktop creates the .gitignore file in the root folder when saving if it doesn’t already exist.
The Dataset and Report subfolders each have files that are ignored by Git by default, as specified in .gitignore:
- Dataset
.pbi\localSettings.json
.pbi\cache.abf
- Report
.pbi\localSettings.json
<project name>.pbip
The PBIP file contains a pointer to a report folder. Opening a PBIP file opens the targeted report and model for authoring.
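As a quick sanity check before committing a project to source control, the root folder layout described above can be verified with a short script. This is a minimal Python sketch under stated assumptions: the function names are hypothetical helpers written for this lab, the project name "Crocs" is made up, and the script only checks that the four items named in the text exist. It does not inspect their contents.

```python
import tempfile
from pathlib import Path


def expected_project_items(project_name):
    """Names the text lists in a saved PBIP project's root folder."""
    return [
        f"{project_name}.Dataset",
        f"{project_name}.Report",
        ".gitignore",
        f"{project_name}.pbip",
    ]


def missing_project_items(root, project_name):
    """Return the expected items that are not present under root."""
    root = Path(root)
    return [
        name
        for name in expected_project_items(project_name)
        if not (root / name).exists()
    ]


if __name__ == "__main__":
    # Build a throwaway folder that mimics a partially saved project.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "Crocs.Dataset").mkdir()
        (root / "Crocs.pbip").write_text("{}")
        print(missing_project_items(root, "Crocs"))
```

An empty list means all four items are present. Anything listed is worth investigating before the first commit, for example a missing .gitignore would let local settings and cache files be tracked.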