SELECT
[DISTINCT] [* | [ expression [[AS] column_name_alias] [, ...] ] ]
[INTO #temp_table_name]
[FROM table_name]
[WHERE condition [, ...] ]
[GROUP BY expression [, ...] ]
[HAVING condition [, ...] ]
[ORDER BY expression [ASC | DESC] [, ...] ]
[LIMIT row_count]
[WITH (option_name=option_value [, ...]) ] [;]
option_name := IDENTIFIER
option_value := NUMBER | STRING
SELECT [DISTINCT] [* | [ expression [[AS] column_name_alias] [, ...] ] ]
FROM #temp_table_name
[[INNER | OUTER | CROSS] JOIN #temp_table_name [alias] ON condition
[join_statement]
[join_statement]
...
]
[WHERE condition [, ...] ]
[GROUP BY expression [, ...] ]
[HAVING condition [, ...] ]
[ORDER BY expression [ASC | DESC] [, ...] ]
[LIMIT row_count]
select_into_temp;
[select_into_temp;]
[select_into_temp;]
...
select_from_temp_table;
--some comment in single line
-- OR --
/* some block comment */
select_from_temp_table_1
UNION | UNION ALL
select_from_temp_table_2
[UNION | UNION ALL]
select_from_temp_table_3
...
...
CREATE PROCEDURE proc_name
[@parameter_name [= 'Some Default Value'] ]
[@parameter_name [= 'Some Default Value'] ]
...
...
AS
sql-statement;
[sql-statement | sql-temp-table-usage] ;
[sql-statement | sql-temp-table-usage] ;
...
...
--Executing a PROCEDURE
EXEC proc_name [value1] [,value2] ... [,valueN]
--Deleting a PROCEDURE
DROP PROCEDURE proc_name
--==== Reading JSON from files or inline DATA ====

--Example: read a single file (SRC points to one file path).
SELECT * FROM $ WITH (SRC='zappysys-public-bucket/cust-1.json')

--Example: read many files at once using * and ? wildcards in SRC.
SELECT * FROM $
WITH (
SRC='zappysys-public-bucket/cust*-?.json'
--,RECURSIVE='True' --Include files from sub folder
)

--Example: query inline JSON supplied through the DATA option; the table name
--(rows) selects the "rows" array of the document.
SELECT * FROM rows
WITH
(DATA='
{
rows : [
{id:1, name: "AAA"},
{id:2, name: "BBB"},
{id:3, name: "CCC"}
]
}'
)

--Example: override column metadata (type/length) with the META option.
--Four equivalent ways to supply META are shown below (compact, json,
--file based, and named internal-storage entry).
SELECT * FROM rows
WITH
(DATA='
{
rows : [
{id:1, checknumber:"0001", name: "AAA"},
{id:2, checknumber:"0002", name: "BBB"},
{id:3, checknumber:"0003", name: "CCC"}
]
}'
--//Method-1: >>>DIRECT META (compact mode)<<<
,META='id:int;checknumber:string(10);name:string(10)'
-- Add: @OverrideMode:1 in the column list to override meta for only specific column(s) - In below example we are changing default datatype/length for id and checknumber, all other columns are dynamic so dont have to supply in META clause
--,META='@OverrideMode:1;id:int;checknumber:string(10)'
--//Method-2: >>>DIRECT META (json mode)<<<
--,META='[{Name:"id",Type:"Int32"},{Name:"checknumber",Type:"String",Length:10},{Name:"name",Type:"String",Length:10}]'
-- Add: @OverrideMode entry in the column list to override meta for only specific column(s). No need to set Type for that entry - In below example we are changing default datatype/length for id and checknumber, all other columns are dynamic so dont have to supply in META clause
--,META='[{Name:"@OverrideMode"}, {Name:"id",Type:"Int32"},{Name:"checknumber",Type:"String",Length:10}]'
--//Method-3: >>>META from File <<<
--,META='c:/data/meta.json'
--//Method-4: >>>META from Internal Storage - Just refer entry by name <<<
--,META='MY_META_STORAGE_NAME'
)

--Example: the @'...' literal form carries escape sequences such as \r\n and
--\uXXXX inside the DATA string (compare with the plain '...' literals above).
SELECT * FROM rows
WITH
(DATA=@'{rows : [{id:1, name: "AA\u0041"},\r\n {id:2, name: "BBB"},\r\n {id:3, name: "CCC"}]}'
)

--Example: stream a very large (gzip compressed) file; the --FAST suffix on
--Filter turns on STREAM mode and IncludeParentColumns must then be 'False'.
SELECT * FROM $
--LIMIT 10
WITH(
Filter='$.LargeArray[*]--FAST' --//Adding --FAST option turn on STREAM mode (large files)
,SRC='zappysys-public-bucket/large_file_10k_largearray_prop.json.gz'
--,SRC='zappysys-public-bucket/large_file*.json.gz'
,IncludeParentColumns='False' --//This Must be OFF for STREAM mode (read very large files)
,FileCompressionType='GZip' --Zip or None (Zip format only available for Local files)
)
--==== Built-in scalar function examples (SQLite-style functions) ====

--Example: string functions (substr, printf, || concatenation, upper/lower,
--replace, instr, trim variants, length).
SELECT
substr('ZappySys', 1, 5)
, printf('%s %s %.2f', 'v1', CompanyName, CompanyId)
, ('Str1' || CompanyName || 'Str2') as ConcatExample
, upper('ZappySys')
, lower('ZappySys')
, replace('ZappySys', 'Sys', 'XYZ')
, instr('ZappySys', 'Sys')
, trim(' Zappy Sys ') trim1
, trim('@@@ZappySys@@@', '@') trim2
, rtrim(' ZappySys ')
, ltrim(' ZappySys ')
, length( CompanyName )
FROM $ WITH (
FILTER='$.Info.Rows[*]',
DATA='{ Info: { Rows: [ { CompanyId: 1000 , CompanyName: "ZappySys" }, { CompanyId: 1001 , CompanyName: "Microsoft" } ] } }'
)

--Example: null-handling functions.
SELECT
COALESCE(10,20) coalesce1 --returns first non-null arg
,COALESCE(null,20) coalesce2 --returns first non-null arg
,COALESCE(null,20,30,40) coalesce3 --returns first non-null arg
,IFNULL(null,20) --returns first non-null arg. Same function as COALESCE but simple (just two arguments allowed)
,NULLIF(20,20) nullif1 --returns null if both argument same
,NULLIF(10,20) nullif2 --returns null if both argument same
FROM $ WITH (
FILTER='$.Info.Rows[*]',
DATA='{ Info: { Rows : [ { CompanyId: 1000 , CompanyName: "ZappySys" }, { CompanyId: 1001 , CompanyName: "Microsoft" }] } }')

--Example: date/time functions and date modifiers ('start of month', '+1 month', ...).
SELECT
DATE('now') date_now
,STRFTIME('%Y-%m-%dT%H:%M:%f','now') formatted_Date_time
,DATETIME('now') datetime_now
,DATE('now','localtime') datetime_now_local
,TIME('now') time_now
,JULIANDAY('now')
,DATE('now', 'start of month', '+1 month' , '-1 day' ) dt_modify1
,DATE('now', '+5 day') dt_modify2
FROM $ WITH (DATA='{ Info: { Rows : [ { CompanyId: 1000 , CompanyName: "ZappySys" }, { CompanyId: 1001 , CompanyName: "Microsoft" } ] } }')

--Example: math functions.
SELECT
abs(-500)
,random()
,round(1000.236, 2)
FROM $ WITH (DATA='{ Info: { Rows: [ { CompanyId: 1000 , CompanyName: "ZappySys" }, { CompanyId: 1001 , CompanyName: "Microsoft" } ] } }')

--NOTE(review): the statement below is an exact duplicate of the string
--function example above — possibly intentional repetition; confirm.
SELECT
substr('ZappySys', 1, 5)
, printf('%s %s %.2f', 'v1', CompanyName, CompanyId)
, ('Str1' || CompanyName || 'Str2') as ConcatExample
, upper('ZappySys')
, lower('ZappySys')
, replace('ZappySys', 'Sys', 'XYZ')
, instr('ZappySys', 'Sys')
, trim(' Zappy Sys ') trim1
, trim('@@@ZappySys@@@', '@') trim2
, rtrim(' ZappySys ')
, ltrim(' ZappySys ')
, length( CompanyName )
FROM $ WITH (
FILTER='$.Info.Rows[*]',
DATA='{ Info: { Rows: [ { CompanyId: 1000 , CompanyName: "ZappySys" }, { CompanyId: 1001 , CompanyName: "Microsoft" } ] } }'
)
--==== Regular expression function examples ====
--regex_search uses (?<name>...) named groups (.NET-style syntax).
SELECT
regex_ismatch(SSN, '\d{3}-\d{3}-\d{4}') IsValidSSN, --check pattern found
--NOTE(review): pattern above is 3-3-4 digits to match the sample data below;
--real SSNs are 3-2-4 (\d{3}-\d{2}-\d{4}) — confirm intent.
regex_replace(SSN, '\d','*') MaskedSSN, --search/replace pattern
regex_search(SSN, '\d{4}') SSNLast4Digits, --extract substring using pattern search
regex_search(Email, '(?<name>\w+)@(?<domain>\w+.com)',0, 2) DomainOf1stEmailByIndex, --extract 1st email (zero based index) and get domain name (by group name). For 2nd argument you can use numeric index or named argument. Both arguments are zero based index.
regex_search(Email, '(?<name>\w+)@(?<domain>\w+.com)',2, 2) DomainOf3rdEmailByIndex, --extract 3rd email (zero based index) and get 2nd part of match (i.e. domain name). Both arguments are zero based index.
regex_search('a@x.com | b@y.com | c@z.com', '(?<name>\w+)@(?<domain>\w+.com)','-0', 2) DomainOfLastEmail, --extract last email (use negative sign in quote - zero based index) and get 2nd part of match (i.e. domain name). Both arguments are zero based index.
regex_search(
'a@x.com | b@y.com | c@z.com'
, '(?<name>\w+)@(?<domain>\w+.com)'
, '*' --occurrence match index ( * , N or -N ) negative number will do reverse find (e.g. to get last match use -0 , to get second last use -1)
, 2 --group index from match (default is full match or -1)
--, 0 -- ignore case for pattern match (0=strict match, 1=ignore case, default is 0)
--, 0 -- throw error if not matched 0=don't throw error, 1=throw error
--, '#' -- separator for multi match
) GetAllDomains --extract domains from all emails and show as email (zero based index) and get 2nd part of match (i.e. domain name). Both arguments are zero based index.
FROM $ WITH (Filter='$.data.row[*]', DATA= '{
data: {
row:[{SSN: "111-222-4444", Email: "a@x.com | b@y.com | c@z.com" },{SSN: "111-X2Z-AB44", Email: "a@x.com | b@y.com | c@z.com"}]
}
}
')

--Example: <<format,FUN_TODAY>> placeholder in SRC is expanded to today's date
--in the given format before the path is used.
SELECT * FROM $ WITH(SRC='mybucket/myservice/myfile_<<yyyy-MM-dd,FUN_TODAY>>.dat') --read today's file

--Example: SELECT without FROM evaluates expressions and returns one row.
SELECT RTRIM('trimmed ') Col1,
DATE('now', '+5 day') Col2
--==== Input pre-processing, JSONPath filter, pivot and json_value examples ====

--Example: search/replace the raw input text before parsing (EnableCustomReplace).
--Here the empty number:{} objects are rewritten so each row gets a usable value.
SELECT * FROM rows
WITH
(DATA='
{
rows : [
{_id:1, _name: "AAA",number:{}},
{_id:2, _name: "BBB",number:{}},
{_id:3, _name: "CCC",number:{}}
]
}'
,EnableCustomReplace=1
,SearchFor='number:{}' --//or USE: --regex , --regex-ic (case-insensitive) or no suffix for plain text search.
,ReplaceWith='number:{blank:1}'
)

--Example: Filter (a JSONPath) selects which nodes become rows;
--IncludeParentColumns + ParentColumnPrefix bring in columns found above the
--filtered node (e.g. version, location).
SELECT * FROM $
WITH
(
--enter path in SRC or use static value in DATA'
--SRC='zappysys-public-bucket/api/data.json'
--SRC='zappysys-public-bucket/somefile.json'
--SRC='zappysys-public-bucket/some*.json'
DATA='
{ version:1,
doc: {
location: {lat:111.55,lon:-222.12},
rows: [
{id:1, name: "AAA"},
{id:2, name: "BBB"},
{id:3, name: "CCC"}
]
}
}'
,Filter='$.doc.rows[*]' -- for more information check this https://zappysys.com/blog/jsonpath-examples-expression-cheetsheet/
--NOTE(review): if the two equal-match filters below are uncommented, the
--inner quotes around AAA must be doubled ('') to stay inside the SQL string.
--,Filter='$.doc.rows[?(@name=='AAA')]' --//equal match example
--,Filter='$..[?(@name=='AAA')]' --//equal match example (scan all levels - parent or child)
--,Filter='$.doc.rows[?(@name=~/AAA|BBB/)]' --//regex match example
,IncludeParentColumns='true' --this will include any columns found above filtered node (e.g. rows).
,ParentColumnPrefix='p_'
)

--Example: EnablePivot turns property names into rows; Pivot_Name holds the
--property name (P101, ...) and Pivot_Value_<child> holds each child value.
SELECT
Pivot_Name as ProjectName,
Pivot_Value_id as Id,
Pivot_Value_code as Code,
Pivot_Value_budget as Budget
FROM $
WITH(
Filter='$.projects',
--enter path in SRC or use static value in DATA'
--SRC='zappysys-public-bucket/api/data.json'
--SRC='zappysys-public-bucket/some*.json'
--SRC='zappysys-public-bucket/somefile.json'
DATA='
{
projects : {
P101 : {id:1, code: "AAA", budget:1500.10},
P102 : {id:2, code: "BBB", budget:7000.20},
P103 : {id:3, code: "CCC", budget:1100.10}
}
}
',
EnablePivot='True'
)

--Example: json_value / json_array_* extract values from JSON stored inside a
--column ('' doubles a single quote inside the outer SQL string literal).
select
json_value(j1,"$.V1"),
json_array_first(j2),
json_array_last(j2),
json_array_nth(j2,1)
from array WITH (DATA=
'{"array" : [{"c1": "abc", "c2": "ABC", "j1": "{''V1'':''value1''}", "j2": "[13,22,''31'']"}] }')
--==== Array flattening examples ====

--Alternative (commented out): EnableArrayFlattening expands small arrays into
--columns automatically, up to MaxArrayItemsToFlatten items.
/*
SELECT * FROM value
WITH(SRC='zappysys-public-bucket/people.json'
, EnableArrayFlattening = 'True' --This is another way to extract data from small arrays. It will flatten arrays upto N items
, MaxArrayItemsToFlatten = 5
)
*/
-- OR --
--Example: flatten array elements manually with json_value and $.[index] paths.
SELECT
UserName,
Features as Features_Raw,
json_value(Features,'$.[0]') || ',' || json_value(Features,'$.[1]') Features_CSV,
AddressInfo as AddressInfo_Raw,
json_value(AddressInfo,'$.[0].Address') Address1_Address,
json_value(AddressInfo,'$.[0].City.Name') Address1_City,
json_value(AddressInfo,'$.[0].City.CountryRegion') Address1_Country,
json_value(AddressInfo,'$.[0].City.Region') Address1_Region,
json_value(AddressInfo,'$.[1].Address') Address2_Address,
json_value(AddressInfo,'$.[1].City.Name') Address2_City,
json_value(AddressInfo,'$.[1].City.CountryRegion') Address2_Country,
json_value(AddressInfo,'$.[1].City.Region') Address2_Region,
Cost
FROM value
WITH(SRC='zappysys-public-bucket/people.json'
)
--==== 2-dimensional array transforms and temp storage ====

--Example: simple 2D array where column names come from a separate JSONPath
--(ArrayTransColumnNameFilter reads the "columns" array).
SELECT * FROM rows
WITH(
--enter path in SRC or use static value in DATA'
--SRC='zappysys-public-bucket/api/data.json'
--SRC='zappysys-public-bucket/some*.json'
--SRC='zappysys-public-bucket/somefile.json'
DATA='
{
"columns": ["RecordID","CustomerID","CustomerName"],
"rows": [ [1,"AAA","Customer A"], [2,"BBB","Customer B"], [3,"CCC","Customer C"] ]
}'
,ArrayTransformType='TransformSimpleTwoDimensionalArray'
,ArrayTransColumnNameFilter='$.columns[*]'
)

--Example: simple 2D array with user-supplied column names
--(ArrayTransEnableCustomColumns / ArrayTransCustomColumns).
SELECT * FROM rows
WITH(
--enter path in SRC or use static value in DATA'
--SRC='zappysys-public-bucket/api/data.json'
--SRC='zappysys-public-bucket/some*.json'
--SRC='zappysys-public-bucket/somefile.json'
DATA='
{
"rows": [ [1,"AAA","Customer A"], [2,"BBB","Customer B"], [3,"CCC","Customer C"] ]
}'
,ArrayTransformType='TransformSimpleTwoDimensionalArray'
,ArrayTransEnableCustomColumns='True'
,ArrayTransCustomColumns='RecordID,CustomerID,CustomerName'
)

--Example: complex 2D array where each row wraps its values in f/v objects;
--column names and row values are each extracted with their own JSONPath.
SELECT * FROM rows
WITH(
--enter path in SRC or use static value in DATA'
--SRC='zappysys-public-bucket/api/data.json'
--SRC='zappysys-public-bucket/some*.json'
--SRC='zappysys-public-bucket/somefile.json'
DATA='
{
requestid:1,
schema:{ fields:[{name:"id"},{name:"name"},{name:"birthdate"}] },
rows: [ { f:[{v:10},{v:"aaa"},{v:"2015-01-01"}]}, { f:[{v:20},{v:"bbb"},{v:"2015-02-01"}] } ]
}'
,ArrayTransformType = 'TransformComplexTwoDimensionalArray'
,ArrayTransColumnNameFilter = '$.schema.fields[*].name'
,ArrayTransRowValueFilter = '$.f[*].v'
)

--Example: choose where intermediate results are buffered (Memory or Disk).
SELECT * FROM $
WITH(
Src='zappysys-public-bucket/cust-1.json'
,TempStorageMode='Disk' --or 'Memory'
--OR
--,TempStorageMode=1 --//Memory=0, Disk=1
)
--==== Filtering, grouping and ordering examples ====

--Example: compare a date column by wrapping it in DATETIME(); note the WITH
--clause comes after ORDER BY in this dialect.
SELECT * FROM value
WHERE OrderDate<=DATETIME('1997-12-31 00:00:00') -- OR use DATE('1997-12-31') DateTime column can be queried this way. You must wrap DATETIME function around it. Also must use 'yyyy-MM-dd' or 'yyyy-MM-dd HH:mm:ss' or 'yyyy-MM-dd HH:mm:ss.fff' format (where fff is milliseconds)
ORDER BY DATETIME(OrderDate)
WITH (SRC='zappysys-public-bucket/invoices.json')

--Example: aggregate with GROUP BY / HAVING, sort descending, keep top 3 rows.
SELECT
Country AS Invoice_Country
, SUM(UnitPrice * Quantity) AS Invoice_Total
FROM value
WHERE Discount > 0
--AND OrderDate<=DATETIME('1997-12-31 00:00:00') -- OR use DATE('1997-12-31') DateTime column can be queried this way. You must wrap DATETIME function around it. Also must use 'yyyy-MM-dd' or 'yyyy-MM-dd HH:mm:ss' or 'yyyy-MM-dd HH:mm:ss.fff' format (where fff is milliseconds)
GROUP BY Country
HAVING SUM(UnitPrice * Quantity) > 1000
ORDER BY Invoice_Total DESC--,DATETIME(OrderDate)
LIMIT 3
WITH (SRC='zappysys-public-bucket/invoices.json')
--==== CASE expression, temp tables, joins and UNION ====

--Example: simple CASE — compares id against each WHEN value.
SELECT
name
, CASE id
WHEN 1 THEN 1+1
WHEN 2 THEN 2+2
ELSE 0
END ThisIsCaseColumn
FROM rows
WITH
(DATA='
{
rows : [
{id:1, name: "AAA"},
{id:2, name: "BBB"},
{id:3, name: "CCC"}
]
}'
)

--Example: SELECT ... INTO #temp stores the result in a temp table that later
--statements in the same batch (separated by ;) can query.
SELECT * INTO #tmp FROM rows
WITH
(
--enter path in SRC or use static value in DATA
--SRC='zappysys-public-bucket/api/data.json'
--SRC='zappysys-public-bucket/some*.json'
--SRC='zappysys-public-bucket/somefile.json'
--OR--
DATA='
{
rows : [
{id:1, name: "AAA"},
{id:2, name: "BBB"},
{id:3, name: "CCC"}
]
}'
);
--//second statement in the same batch
select * from #tmp
where id > 1
--AND SomeDateColumn=DATETIME('2012-12-31 00:00:00') --DateTime column can be queried this way... Must use 'yyyy-MM-dd HH:mm:ss' format
;

--Example: load two temp tables, then JOIN them in a later statement.
SELECT * INTO #cust FROM rows
WITH (
--enter path in SRC or use static value in DATA
--SRC='zappysys-public-bucket/api/customers.json'
--SRC='zappysys-public-bucket/cust*.json'
--SRC='zappysys-public-bucket/customers.json'
DATA='{
rows: [
{customerid: 1, name: "AAA"},
{customerid: 1, name: "BBB"},
{customerid: 1, name: "CCC"}
]
}
'
);
SELECT * INTO #ord FROM rows
WITH (
DATA='{
rows: [
{orderid:1000, customerid: 1, orderdate: "2012-01-01T00:00:00",total:2000.50},
{orderid:1001, customerid: 1, orderdate: "2012-01-01T00:00:00",total:2000.50},
{orderid:1002, customerid: 2, orderdate: "2013-01-01T00:00:00",total:5000.70},
{orderid:1003, customerid: 2, orderdate: "2013-01-02T00:00:00",total:6000.70}
]
}'
);
--//second statement in the same batch
SELECT c.customerid,o.orderid,o.orderdate,o.total
FROM #cust c
INNER JOIN #ord o ON c.customerid=o.customerid
--LEFT OUTER JOIN #ord o ON c.customerid=o.customerid
--WHERE o.orderdate=DATETIME('2012-01-01 00:00:00') --for datetime column format must be queried in 'yyyy-MM-dd HH:mm:ss' format only
-- To Use greater than less than must need to use DATETIME function (e.g. WHERE o.OrderDate<=DATETIME('1997-12-31 23:12:59')
;

--Example: combine two temp tables with UNION ALL (keeps duplicate rows).
SELECT * into #tbl1 FROM rows
WITH (
--enter path in SRC or use static value in DATA
--SRC='zappysys-public-bucket/api/data.json'
--SRC='zappysys-public-bucket/some*.json'
--SRC='zappysys-public-bucket/somefile.json'
DATA='{ rows : [{id:1, name: "AAA"}, {id:2, name: "BBB"}]}'
);
SELECT * into #tbl2 FROM rows
WITH (DATA='{ rows : [{id:3, name: "CCC"}, {id:4, name: "DDD"}]}' );
select * from #tbl1
UNION ALL
select * from #tbl2;
--==== Basic query syntax notes ====
--Whenever possible write a simple query: when you use WHERE / GROUP BY / ORDER BY expressions the driver may fall back to client-side processing if the server side doesn't support that construct.
--For comment use line comment with two dash (e.g. --my comment) or use block comment for multi line comment (e.g. /*my comment*/ )
--Any query must contain at least two required parts:
--data source (or table) names and a list of column names
--column names come first (preceded by SELECT keyword)
--table names follow (preceded by FROM keyword)
--Here is the query that returns all rows with all columns (* stands for all columns) from 'Accounts' table
SELECT * FROM Accounts
--Use quoted identifiers when you have space or other character in names (e.g. Column or Table name contains space); both "..." and [...] forms are accepted
SELECT ID, "Account Name" FROM "Closed Accounts"
SELECT ID, [Account Name] FROM [Closed Accounts]
--returns all rows from 'Accounts' table, each row will have Id and Amount columns
SELECT Id, Amount FROM Accounts
--returns only those rows of 'Accounts' table that meet next condition: Amount is greater or equal to 1000.0
SELECT Id, Amount FROM Accounts WHERE Amount >= 1000.0
--returns only those rows of 'Accounts' table that meet next condition: Amount is greater than 1000 or DepartamentID is equal to 123
SELECT Id, Amount FROM Accounts WHERE (Amount > 1000.0 OR DepartamentID = 123)
--returns only those rows of 'Accounts' table that meet next condition: DepartamentID is equal to 1, 101 or 202
SELECT Id, Amount FROM Accounts WHERE DepartamentID IN (1, 101, 202)
--returns only those rows of 'Accounts' table that meet next condition: DepartamentID is equal to 321 and Amount is less than 100
SELECT Id, Amount FROM Accounts WHERE DepartamentID = 321 AND Amount < 100
--returns only those rows of 'Persons' table that meet next condition: Name column value starts with 'John'
SELECT * FROM Person WHERE Name LIKE 'John%'
--case statement (simple CASE: Country is compared against each WHEN value)
SELECT
FirstName,
LastName,
CASE Country
WHEN 'US' THEN 'NA'
WHEN 'Canada' THEN 'NA'
WHEN 'India' THEN 'Asia'
WHEN 'China' THEN 'Asia'
ELSE 'Unknown'
END ColRegion
FROM Person
--escaping string values (use of \n \r \t \v allowed; the @ prefix on the literal enables escape-sequence processing — see the @'...' DATA example earlier in this file which uses \r\n)
SELECT * FROM MyTable WITH( Body=@'Line1\nLine2')