Export to SQL Server Using Python Spark
Overview
This guide provides detailed instructions for setting up and using the Delta Sharing integration package on a Windows operation system to seamlessly integrate data into your workflows with Procore Analytics. The package supports multiple execution options, allowing you to choose your desired configuration and integration method.
Prerequisites
Ensure you have the following before proceeding:
- Procore Analytics 2.0 SKU
- Delta Sharing pofile file:
Obtain your *.share file containing Delta Sharing credentials. For convenience, copy it into the package directory. - Python Environment: Install Python 3 and pip on your system.
- Download Python.
- Alternatively, use the Microsoft Store.
Steps
- Prepare the Package
- Install Dependencies
- Generate Configuration
- Configure Cron Jobs and Immediate Execution
- Execution and Maintenance
Prepare the Package
- Create a new file named config.share with your Delta Share credentials in JSON format.
{
"shareCredentialsVersion": 1,
"bearerToken": "xxxxxxxxxxxxx",
"endpoint": "https://nvirginia.cloud.databricks.c...astores/xxxxxx"
}
- Get required fields.
Note: These details can be obtained from the Procore Analytics web application.- ShareCredentialsVersion: Version number (currently 1).
- BearerToken: Your Delta Share access token.
- Endpoint: Your Delta Share endpoint URL.
- Download and extract the package.
- Unzip the package to a directory of your choice.
- Copy the *.share Delta Sharing profile file into the package directory for easy access.
Install Dependencies
- Open a terminal in the package directory.
- Run the following command to install the dependencies:
- pip install -r requirements.txt
Generate Configuration
- Generate the config.yaml file by running python user_exp.py:
This script helps to generate the config.yaml file that contains necessary credentials and settings. - When configuring the data source, you'll be asked to provide:
- List of tables (comma-separated).
- Leave blank to sync all tables.
Example: `table1, table2, table3`. - Path to your `config.share` file.
- For the first time, you will provide your credentials like Delta Share source configuration location, tables, database, host and etc.
Note: Afterwards, you may reuse or update the config manually or by the running python user_exp.py again.
Configure Cron Jobs and Immediate Execution (Optional)
- Decide whether to set up a cron job for automatic execution.
- Provide a cron schedule:
- Format: * * * * * (minute, hour, day-of-month, month, day-of-week).
- Example for daily execution at 2 AM: 0 2 * * *
- To check scheduling logs, the file 'procore_scheduling.log' will be created as soon as scheduling is set up.
You can also check scheduling by running in terminal command:
For Linux and MacOs:
To edit/delete - edit scheduling cron by using:
```bash
EDITOR=nano crontab -e
```
- After running the command above, you should see something similar to:
- 2 * * * * /Users/your_user/snowflake/venv/bin/python /Users/your_user/snowflake/sql_server_python/connection_config.py 2>&1 | while read line; do echo "$(date) - $line"; done >> /Users/your_user/snowflake/sql_server_python/procore_scheduling.log # procore-data-import
- You also can adjust schedule cron or delete the whole line to stop it running by schedule.
For Windows:
- Check the schedule task is created:
```powershell
schtasks /query /tn "ProcoreDeltaShareScheduling" /fo LIST /v
``` - To edit/delete - scheduling task:
Open the Task Scheduler:- Press Win + R, type taskschd.msc, and press Enter.
- Navigate to the scheduled tasks.
- In the left pane, expand the Task Scheduler Library.
- Look for the folder where your task is saved (e.g., Task Scheduler Library or a custom folder).
- Find your task:
- Look for the task name ProcoreDeltaShareScheduling.
- Click on it to view its details in the bottom pane.
- Verify its schedule:
- Check the Triggers tab to see when the task is set to run.
- Check the History tab to confirm recent runs.
- To delete task:
- Delete task from the GUI.
Immediate Execution question:
- Option to run script for copying data immediately after configuration.
- After generating the config.yaml, the CLI is ready to be run anytime independently, by running script for copying data, depending on your package. See examples below:
python delta_share_to_azure_panda.py
OR
python delta_share_to_sql_spark.py
OR
python delta_share_to_azure_dfs_spark.py
Execution and Maintenance
Common Issues and Solutions
- Cron Job Setup:
- Ensure system permissions are correctly configured.
- Check system logs if the job fails to run.
- Verify the script delta_share_to_azure_panda.py has execute permissions.
- Configuration File:
- Ensure the file config.yaml is in the same directory as the script.
- Backup the file before making changes.
Support
For additional help:
- Review script logs for detailed error messages.
- Double-check the config.yaml file for misconfigurations.
- Contact your system administrator for permission-related issues.
- Reach out to Procore support for issues related to Delta Share access.
- Review log for failed tables: failed_tables.log.
Notes
- Always backup your configuration files before making changes.
- Test new configurations in a non-production environment to prevent disruptions.