Update apache-spark-sql-connector.md #127981
Conversation
@aukponmwan : Thanks for your contribution! The author(s) and reviewer(s) have been notified to review your proposed change.
Learn Build status updates of commit 0508e1f: 💡 Validation status: suggestions
File: articles/synapse-analytics/spark/data-sources/apache-spark-sql-connector.md
For more details, refer to the build report. Note: Your PR may contain errors, warnings, or suggestions unrelated to the files you changed. This happens when external dependencies such as GitHub aliases, Microsoft aliases, or cross-repo links are updated. Use these instructions to resolve them.
Can you review the proposed changes? Important: When the changes are ready for publication, add the #label:"aq-pr-triaged" comment.
Pull request overview
This PR replaces the Apache Spark SQL Server connector documentation with updated guidance for the new Spark connector for SQL databases (Preview). The update reflects a modernized connector preinstalled in the Synapse Spark 3.5 runtime with enhanced capabilities.
Key changes:
- Introduces the preview Spark connector for SQL databases with support for multiple SQL database types (Azure SQL, SQL Managed Instance, SQL Server on Azure VM, and Fabric SQL databases)
- Updates authentication guidance to emphasize integrated Microsoft Entra ID authentication as the default method
- Replaces legacy code examples with modern PySpark and Scala examples using the `.mssql()` method (see the sketch after this list)
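To make the new surface concrete, here is a minimal sketch of the `.mssql()` read/write pattern, based on the snippets quoted later in this review; the URL format, table name, and the implicit fall-back to integrated Microsoft Entra ID authentication are illustrative assumptions, not the PR's exact text.

```python
# Minimal sketch of the preview connector's .mssql() API (assumed shape,
# based on snippets quoted in this review). <server> and <database> are
# placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
url = "jdbc:sqlserver://<server>.database.windows.net:1433;database=<database>;"

# Read a table; with no explicit credentials, the connector is described as
# defaulting to integrated Microsoft Entra ID authentication.
df = spark.read.option("url", url).mssql("dbo.publicExample")
df.show()

# Write the DataFrame back, overwriting the target table.
df.write.mode("overwrite").option("url", url).mssql("dbo.publicExample")
```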
| .option("hostNameInCertificate", "*.database.windows.net") \ | ||
| .load() | ||
| ```python | ||
| import com.microsoft.sqlserver.jdbc.spark |
Copilot (AI) commented on Dec 4, 2025:
This import statement uses Java/Scala syntax in a PySpark code block. Python doesn't use this import syntax. Remove this line or use the correct Python import if one is required for the connector.
Suggested change (remove this line):

    import com.microsoft.sqlserver.jdbc.spark
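For reference, the corrected PySpark block might look like the following once the Scala-style import is dropped; this is a sketch assuming the legacy connector's `format("com.microsoft.sqlserver.jdbc.spark")` read path, with placeholder `url` and table values. The same fix applies to the two identical occurrences flagged below.

```python
# Sketch of the fixed PySpark block: in Python the connector is selected via
# the format string, so no Scala-style import is needed. url and <table> are
# placeholders.
df = spark.read \
    .format("com.microsoft.sqlserver.jdbc.spark") \
    .option("url", url) \
    .option("dbtable", "<table>") \
    .option("encrypt", "true") \
    .option("hostNameInCertificate", "*.database.windows.net") \
    .load()
```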
| .option("encrypt", "true") \ | ||
| .option("hostNameInCertificate", "*.database.windows.net") \ | ||
| .load() | ||
| import com.microsoft.sqlserver.jdbc.spark |
Copilot (AI) commented on Dec 4, 2025:
This import statement uses Java/Scala syntax in a PySpark code block. Python doesn't use this import syntax. Remove this line or use the correct Python import if one is required for the connector.
Suggested change (remove this line):

    import com.microsoft.sqlserver.jdbc.spark
    # [User/Password](#tab/userandpassword)

    ```python
    import com.microsoft.sqlserver.jdbc.spark
Copilot (AI) commented on Dec 4, 2025:
This import statement uses Java/Scala syntax in a PySpark code block. Python doesn't use this import syntax. Remove this line or use the correct Python import if one is required for the connector.
Suggested change (remove this line):

    import com.microsoft.sqlserver.jdbc.spark
    import org.apache.spark.sql.types._
    val url = "jdbc:sqlserver://<server>:<port>;database=<database>;"
    val row_data = Seq(
      Row("Alice", 2),
Copilot (AI) commented on Dec 4, 2025:
Age value for 'Alice' is inconsistent with the PySpark example (line 87) where it's 1. This inconsistency could confuse readers trying to understand the examples.
Suggested change:

    - Row("Alice", 2),
    + Row("Alice", 1),
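For comparison, a minimal sketch of the matching PySpark data construction with the consistent value (per the comment, Alice's age is 1 in the PySpark example); the schema here is assumed for illustration.

```python
# Sketch of the matching PySpark rows; the schema is assumed for illustration,
# with the age value matching the PySpark example per the review comment.
from pyspark.sql import Row
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True),
])
row_data = [Row("Alice", 1)]
df = spark.createDataFrame(row_data, schema)
```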
    df.write.mode("overwrite").option("url", url).option("accesstoken", token).mssql("dbo.publicExample")
    spark.read.option("accesstoken", token).mssql("dbo.publicExample").show()
Copilot (AI) commented on Dec 4, 2025:
[nitpick] The option name 'accesstoken' is inconsistent with typical casing conventions. Consider using 'accessToken' (camelCase) to match the option naming pattern shown in the legacy code examples and common convention.
Suggested change:

    - df.write.mode("overwrite").option("url", url).option("accesstoken", token).mssql("dbo.publicExample")
    - spark.read.option("accesstoken", token).mssql("dbo.publicExample").show()
    + df.write.mode("overwrite").option("url", url).option("accessToken", token).mssql("dbo.publicExample")
    + spark.read.option("accessToken", token).mssql("dbo.publicExample").show()
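For context, in a Synapse notebook the token passed here is typically obtained with mssparkutils; the audience URI below, and whether the preview connector expects camelCase `accessToken`, are assumptions based on the suggestion above, not confirmed by this PR.

```python
# Sketch: acquire a Microsoft Entra ID token in a Synapse notebook and pass
# it to the connector. The audience URI and option casing are assumptions;
# df and url are assumed defined as in the earlier examples.
from notebookutils import mssparkutils

token = mssparkutils.credentials.getToken("https://database.windows.net/")

df.write \
    .mode("overwrite") \
    .option("url", url) \
    .option("accessToken", token) \
    .mssql("dbo.publicExample")
```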
#sign-off

Invalid command: '#sign-off'. Only the assigned author of one or more files in this PR can sign off. @eric-urban
Replacing the SQL Server Connector for Apache Spark documentation.