Export to Fabric Lakehouse Using Fabric Notebooks

Overview

Using Data Factory in Microsoft Fabric with Delta Sharing enables seamless integration and processing of shared Delta tables as part of your analytics workflows with Procore Analytics 2.0. Delta Sharing is an open protocol for secure data sharing, allowing collaboration across organizations without duplicating data.

This guide walks you through the steps to set up and use Data Factory in Fabric with Delta Sharing, utilizing Notebooks for processing and exporting data to a Lakehouse.

Prerequisites

  • Procore Analytics 2.0 SKU
  • Delta Sharing Credentials:
    • Access to Delta Sharing credentials provided by a data provider.
    • A sharing profile file (config.share), shown in the example after this list, containing:
      • Endpoint URL (Delta Sharing Server URL).
      • Access Token (Bearer token for secure data access).
  • Your config.yaml file, created with your specific credentials (see Set Up Configuration below).
  • Microsoft Fabric Environment:
    • A Microsoft Fabric tenant account with an active subscription.
    • A Fabric-enabled Workspace.
  • Packages and Scripts:
    • Download the fabric-lakehouse package. The directory should include:
      • ds_to_lakehouse.py: Notebook code.
      • readme.md: Instructions.
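
For reference, a Delta Sharing profile file (config.share) is a small JSON document in the format defined by the open Delta Sharing protocol. The values below are placeholders; use the endpoint URL and access token provided by your data provider:

{
  "shareCredentialsVersion": 1,
  "endpoint": "https://your-sharing-server.example.com/delta-sharing/",
  "bearerToken": "<your-access-token>"
}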

Steps

Set Up Configuration

  1. Create the config.yaml file and define the configuration using the following structure:
    source_config:
        config_path: path/to/your/delta-sharing-credentials-file.share
    tables: # Optional - Leave empty to process all tables
        - table_name1
        - table_name2
    target_config:
        lakehouse_path: path/to/your/fabric/lakehouse/Tables/ # Path to the Fabric Lakehouse
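
For reference, a filled-in config.yaml might look like the example below. The table names are placeholders, the config_path assumes both files are uploaded to the builtin resources folder described later in this guide, and the lakehouse_path is the ABFS path of your Lakehouse Tables folder (typically of the form abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<lakehouse>.Lakehouse/Tables/), which you can copy as described in the next section:

    source_config:
        config_path: ./builtin/config.share
    tables:
        - table_name1
        - table_name2
    target_config:
        lakehouse_path: abfss://YourWorkspace@onelake.dfs.fabric.microsoft.com/YourLakehouse.Lakehouse/Tables/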

Set Up Your Lakehouse

  1. Open your Microsoft Fabric workspace.
  2. Navigate to your Lakehouse and click Open Notebook, then New Notebook.
  3. If you don't know the value for lakehouse_path in config.yaml, you can copy it from this screen.
  4. Click the ellipsis next to Files and select Copy ABFS path.

5. Copy the code from ds_to_lakehouse.py and paste it into the notebook window (PySpark Python).


The next step is to upload your config.yaml and config.share files into the Resources folder of the Lakehouse. You can create your own directory or use the builtin directory that is already created for resources by the Lakehouse.
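
For context, the core of ds_to_lakehouse.py follows the general pattern sketched below: read config.yaml, list the shared tables with the delta-sharing client, and write each requested table into the Lakehouse as a Delta table. This is a simplified illustration only (it assumes the delta-sharing Python package is available in the session and loads each table through pandas); the shipped ds_to_lakehouse.py is the code you should actually run.

import yaml
import delta_sharing

# Read the configuration uploaded to the notebook resources folder
with open("./builtin/config.yaml") as f:
    config = yaml.safe_load(f)

profile = config["source_config"]["config_path"]           # path to config.share
lakehouse_path = config["target_config"]["lakehouse_path"]
requested = config.get("tables") or None                   # None = process all tables

client = delta_sharing.SharingClient(profile)

for table in client.list_all_tables():
    if requested and table.name not in requested:
        continue
    # Load the shared table and write it to the Lakehouse as a Delta table.
    # 'spark' is the SparkSession provided by the Fabric notebook.
    url = f"{profile}#{table.share}.{table.schema}.{table.name}"
    df = spark.createDataFrame(delta_sharing.load_as_pandas(url))
    df.write.format("delta").mode("overwrite").save(lakehouse_path + table.name)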



Note: Upload both files at the same level, for example into the standard builtin directory, and make sure the config_path property in config.yaml points to the uploaded config.share file.


6. Check the code of the notebook, lines 170-175. The example below shows the necessary change:

config_path = "./env/config.yaml"

to 

config_path = "./builtin/config.yaml"

Since the files are in the builtin folder rather than a custom env folder, keep track of your own file structure. You can upload the files into different folders, but in that case update the notebook code so that it finds config.yaml correctly.
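
If you are not sure where the uploaded files ended up, a quick check from a notebook cell can help; builtin below is the standard resources folder, so adjust the path if you used a custom directory:

import os

# List the uploaded resource files to confirm config.yaml and config.share
# are where config_path expects them
print(os.listdir("./builtin"))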
 


7. Click Run cell.



Validation

  • Wait until the job is finished; it should copy all of the data.
  • Once the job completes, verify that the data has been copied successfully to your Lakehouse.
  • Check the specified tables and ensure the data matches the shared Delta tables (see the snippet after this list).
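
As a quick spot check, you can compare the row count of a copied table with the row count of the shared source. The snippet below is an illustrative sketch: table_name1, share_name, and schema_name are placeholders, and profile and lakehouse_path are the same values used in your config.yaml.

import delta_sharing

# Row count of the copied Delta table in the Lakehouse
copied = spark.read.format("delta").load(lakehouse_path + "table_name1")
print("Lakehouse rows:", copied.count())

# Row count of the original shared table for comparison
shared = delta_sharing.load_as_pandas(profile + "#share_name.schema_name.table_name1")
print("Shared rows:", len(shared))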