Apache Spark Connector
The high-performance Apache Spark Connector provides read/write capabilities within your application, allowing you to perform numerous Apache Spark operations with minimal coding or even no-coding at all. Integrate Apache Spark data into applications such as SSIS, SQL Server, any ODBC-compatible application, or even directly within a programming language using this connector.
Integrate Apache Spark with these applications
SQL examples for Apache Spark Connector
Use these example Apache Spark SQL queries within SSIS, SQL Server or any ODBC-compatible application:
Select specific employees
Selects specific columns for employees in the Sales department. A basic filtering query.
-- Basic SELECT with a WHERE clause
SELECT
id,
name,
salary
FROM employees
WHERE department = 'Sales';
Rank employees by salary (Window Function)
Uses the ROW_NUMBER() window function to assign a rank to employees based on salary (highest first) within each department.
-- Using a window function for ranking
SELECT
id,
name,
department,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as salary_rank_in_dept
FROM employees;
Extract data from JSON string
Uses the built-in get_json_object function to extract the 'city' field from a JSON string stored in the 'user_data' column.
-- Assumes a table 'user_activity' with a STRING column 'user_data' containing JSON like:
-- {"name": "Alice", "address": {"street": "123 Main", "city": "Anytown"}}
SELECT
user_id,
get_json_object(user_data, '$.address.city') AS city
FROM user_activity
WHERE user_data IS NOT NULL; -- Example filter
Explode array elements (lateral view)
Uses LATERAL VIEW explode() to transform array elements into separate rows, finding users interested in 'hiking'. Assumes 'user_profiles' table has 'interests' ARRAY<STRING> column.
-- Using LATERAL VIEW to query array elements
SELECT DISTINCT
up.user_id,
up.name
FROM user_profiles up
LATERAL VIEW explode(interests) interest_table AS single_interest
WHERE single_interest = 'hiking';
Extract browser from User Agent (Regex)
Uses the regexp_extract function to parse a browser name (Firefox, Chrome, Safari, or MSIE) from a 'user_agent' string column in 'web_logs'.
-- Using a regular expression function
SELECT
user_agent,
regexp_extract(user_agent, '(Firefox|Chrome|Safari|MSIE)', 1) AS browser
FROM web_logs
LIMIT 100; -- Example limit