import

Overview

Bulk import data from existing files or from any relational database source to redshift in single command. Command supports ODBC, OLEDB and ADO.net drivers to consume source data

Syntax

import {Db|File}<SOURCE_TYPE> [--enc-key ENCRYPTION_KEY_PATH]
           [--load-options INPUT] [--maxthreads INPUT] [--source-compressed]
           [--source-driver {ADONET_MSSQL|ODBC|OLEDB|ZS_REDSHIFT}]
           [--source-path INPUT] [--source-recursive] [--source-stagepath INPUT]
           [--source-query INPUT] [--archivemethod {Delete|Move}]
           [--source-archivemethod {Delete|Move}]
           [--source-stage-archivemethod {Delete|Move}]
           [--source-archivefolder INPUT] [--source-stage-archivefolder INPUT]
           [--escapechar] [--target-stage-archivemethod {Delete|Move}]
           [--target-stage-archivefolder INPUT] [--target-stagepath INPUT]
           [--target-table INPUT] [--no-split] [--maxrows-perfile INPUT]
           [--maxrows-all INPUT] [--logfile INPUT] [--firstrow-has-colnames]
           [--target-truncate] [--target-truncate-fast] [--nullas INPUT]
           [--treat-empty-asnull] [--target-columns INPUT] [--cleanup-onerror]
           [--treat-blank-asnull] [--coldelimiter INPUT] [--quotedvalue]
           [--dateformat INPUT] [--allow-invalidchar]
           [--invalidchar-replacement INPUT] [--maxerrors INPUT] [--testload]
           [--rows-to-skip INPUT] [--ignore-invalid-date] [--truncate-to-fit]
           [--roundec] [--identinsert] [--ignore-blank] [--trimend]
           [--no-compress] [--updatestats] [--allow-missing-cols]
           [--source-connstr INPUT] [--target-connstr INPUT]
           [--accesskey YOUR_ACCESS_KEY] [--secretkey YOUR_SECRET_KEY]
           [--serviceurl INPUT] [--region INPUT] [--consoleout FILEPATH]
           [-v|--verbose] [-w|--wait] [-i|--interactive]

Arguments

Parameter	Description
<SOURCE_TYPE>	Type of source from where you want to read data. Valid options are DB or FILE . If input files are already compressed in gz format then use --source-compressed option. Possible values are [Db \| File]
--enc-key ENCRYPTION_KEY_PATH	Specifies AES 256 encryption key file path. This encryption key is used to encrypt data before uploading to S3. Data is decrypted automatically by Redshift while loading S3 files into target table. To create new encryption key use mkkey command.
--load-options	Additional load options for Redshift Copy
--maxthreads	Maximum parallel threads
--source-compressed	Specifies whether source file(s) already compressed
--source-driver	Driver type (e.g. ADONET_SQL, ODBC, OLEDB)
--source-path	Source file(s) path (e.g. c:\cust*.txt --or-- c:\cust001.txt
--source-recursive	Include source files from sub directories
--source-stagepath	Source stage path
--source-query	Source data query
--archivemethod	Default Archive method for source and target stage file
--source-archivemethod	Archive method for source file(s)
--source-stage-archivemethod	Archive method for source stage file(s)
--source-archivefolder	Archive folder path for source file(s)
--source-stage-archivefolder	Archive folder path for source stage file(s)
--escapechar	Allow escape sequence character
--target-stage-archivemethod	Archive method for target stage file(s)
--target-stage-archivefolder	Archive folder path for target stage file(s)
--target-stagepath	Target staging folder path
--target-table	Target table where you want to load data
--no-split	Do not split data into multiple files. If you don't specify this option then data is exported to stage into multiple files (default rows per file can be set using --maxrows-perfile option (default is 1000000 per file)
--maxrows-perfile	Max rows per file when splitfiles option is enabled
--maxrows-all	Max rows to load from input dataset
--logfile	Log file path
--firstrow-has-colnames	Truncate target table before load
--target-truncate	Truncate target table before data load using transaction with rollback support
--target-truncate-fast	Truncate target table before data load without using transaction (not recommended. Cant rollback delete if data load fails)
--nullas	Null indicator string in data
--treat-empty-asnull	Treat empty string as NULL
--target-columns	List of columns to load in target table (input data must be in the same order)
--cleanup-onerror	Always make sure stage files cleaned up when archive method is set to Delete. When this option is not set files are left in stage area if error occurs.
--treat-blank-asnull	Treat blank string (empty or only spaces) as NULL
--coldelimiter	Field separator character for data files. (NOTE: use \t for tab)
--quotedvalue	Wrap quotes around value in data files
--dateformat	Date format in data file (e.g. YYYY-MM-DD HH24:MI:SS.MI)
--allow-invalidchar	Allow invalid UTF8 character in input data files
--invalidchar-replacement	Replacement char for invalid char
--maxerrors	Maximum allowed errors before failing data load
--testload	Load data in test mode. This option validates full load without actually loading it
--rows-to-skip	Number of initial rows to skip from data file
--ignore-invalid-date	Ignore invalid date
--truncate-to-fit	Truncate extra chars to fit target column
--roundec	Round decimal values rather than truncate
--identinsert	Enable identity insert
--ignore-blank	Enable identity insert
--trimend	Trim trailing blanks
--no-compress	Do not compress files before transfer. If this option is not specified then files are always compressed
--updatestats	Update statistics after load
--allow-missing-cols	Allow missing columns at the end in source file
--source-connstr	Source DB connection string
--target-connstr	Target DB connection string
--accesskey YOUR_ACCESS_KEY	Cloud account access key
--secretkey YOUR_SECRET_KEY	Cloud account secret key
--serviceurl	Service url for Amazon S3 service. Leave blank to route requests to default endpoint.
--region	Region of your S3 Bucket used for staging area for Redshift. Specify this option if S3 bucket is in different region than your redshift cluster.
--consoleout FILEPATH	Captures console output and saves to file
-v \| --verbose	Log verbosity level. -v means detailed, If you don't specify this option then minimum logging will be done.
-w \| --wait	Wait for user input before exit
-i \| --interactive	Continue in interactive mode after this command is completed. This switch is not applicable if you already in interactive mode

Notes

How to Setup Amazon Redshift Cluster

Click here to learn more

Examples

Examples
Import SQL Server records to Redshift Cluster using minimum default options import Db --source-driver ADONET_MSSQL --source-query "select ROW_NUMBER()Over(order by a.CustomerID) Id, a.,b.,c.OrderID,c.OrderDate,c.Freight from customers a,products b,orders c" --source-stagepath "c:\redshift\stage" --target-table "customerdata" --target-stagepath "bw-rstest/cmdstage" --logfile "c:\redshift\log.txt" --region us-east-1 --accesskey "AKIA***************" --secretkey "lPi+XQ********************" --source-connstr "Data Source=localhost;Initial Catalog=Northwind;Integrated Security=SSPI;" --target-connstr "Host=mytestcluster-1.csu****.us-east-1.redshift.amazonaws.com;Port=5439;Database=dev;UserName=masteruser;Password=*******;EnableSsl=true;Timeout=30;CommandTimeout=3600;"
Import SQL Server records to Redshift Cluster using minimum default options import Db --source-driver ADONET_MSSQL --source-query "select ROW_NUMBER()Over(order by a.CustomerID) Id, a.,b.,c.OrderID,c.OrderDate,c.Freight from customers a,products b,orders c" --source-stagepath "c:\redshift\stage" --target-table "customerdata" --target-stagepath "bw-rstest/cmdstage" --logfile "c:\redshift\log.txt" --region us-east-1 --accesskey "AKIA***************" --secretkey "lPi+XQ********************" --source-connstr "Data Source=localhost;Initial Catalog=Northwind;Integrated Security=SSPI;" --target-connstr "Host=mytestcluster-1.csu****.us-east-1.redshift.amazonaws.com;Port=5439;Database=dev;UserName=masteruser;Password=*******;EnableSsl=true;Timeout=30;CommandTimeout=3600;"
Import SQL Server records to Redshift Cluster - With Encryption turned on #-------------------------------------- #Step-1 : Create your encryption key (This step only required first time. Once encryption key is generated you can reuse it) #-------------------------------------- mkkey "c:\private.key" --aes #-------------------------------------- #Step-2 : Load your data to redshift #-------------------------------------- import Db --enc-key "c:\private.key" --source-driver ADONET_MSSQL --source-query "select ROW_NUMBER()Over(order by a.CustomerID) Id, a.,b.,c.OrderID,c.OrderDate,c.Freight from customers a,products b,orders c" --source-stagepath "c:\redshift\stage" --target-table "customerdata" --target-stagepath "bw-rstest/cmdstage" --logfile "c:\redshift\log.txt" --region us-east-1 --accesskey "AKIA***************" --secretkey "lPi+XQ********************" --source-connstr "Data Source=localhost;Initial Catalog=Northwind;Integrated Security=SSPI;" --target-connstr "Host=mytestcluster-1.csu****.us-east-1.redshift.amazonaws.com;Port=5439;Database=dev;UserName=masteruser;Password=*******;EnableSsl=true;Timeout=30;CommandTimeout=3600;"
Import SQL Server records to Redshift Cluster import Db --source-driver ADONET_MSSQL --source-query "select ROW_NUMBER()Over(order by a.CustomerID) Id, a.,b.,c.OrderID,c.OrderDate,c.Freight from customers a,products b,orders c" --source-archivemethod None --source-stage-archivemethod Delete --target-stage-archivemethod Delete --source-stagepath "c:\redshift\stage" --target-table "customerdata" --target-stagepath "bw-rstest/cmdstage" --target-truncate --logfile "c:\redshift\log.txt" --maxrows-perfile 100000 --region us-east-1 --accesskey "AKIA***************" --secretkey "lPi+XQ********************" --source-connstr "Data Source=localhost;Initial Catalog=Northwind;Integrated Security=SSPI;" --target-connstr "Host=mytestcluster-1.csu****.us-east-1.redshift.amazonaws.com;Port=5439;Database=dev;UserName=masteruser;Password=*******;EnableSsl=true;Timeout=30;CommandTimeout=3600;"
Import local flat files to Redshift cluster import File --source-path "c:\data\files\.part" --source-archivemethod None --source-stage-archivemethod Delete --target-stage-archivemethod Delete --source-stagepath "c:\redshift\stage" --target-table "customerdata" --target-stagepath "bw-rstest/cmdstage" --target-truncate --logfile "c:\redshift\log.txt" --maxrows-perfile 100000 --region us-east-1 --accesskey "AKIA**************" --secretkey "lPi+XQ********************" --target-connstr "Host=mytestcluster-1.csu****.us-east-1.redshift.amazonaws.com;Port=5439;Database=dev;UserName=masteruser;Password=*******;EnableSsl=true;Timeout=30;CommandTimeout=3600;"
Import local compressed gzip files to Redshift cluster import File --source-compressed --source-path "c:\data\gzip\.gz" --source-archivemethod None --target-stage-archivemethod Delete --target-table "customerdata" --target-stagepath "bw-rstest/cmdstage" --target-truncate --coldelimiter "\|" --logfile "c:\redshift\log.txt" --maxrows-perfile 100000 --region us-east-1 --accesskey "AKIA**************" --secretkey "lPi+XQ********************" --target-connstr "Host=mytestcluster-1.csu****.us-east-1.redshift.amazonaws.com;Port=5439;Database=dev;UserName=masteruser;Password=*******;EnableSsl=true;Timeout=30;CommandTimeout=3600;"
Import data from any ODBC data source to Redshift cluster. Use tab for column seperator import Db --source-driver ODBC --source-query "select ROW_NUMBER()Over(order by a.CustomerID) Id, a.,b.,c.OrderID,c.OrderDate,c.Freight from customers a,products b,orders c" --source-archivemethod None --source-stage-archivemethod Delete --target-stage-archivemethod Delete --source-stagepath "c:\redshift\stage" --target-table "customerdata" --target-stagepath "bw-rstest/cmdstage" --target-truncate --coldelimiter "\t" --logfile "c:\redshift\log.txt" --maxrows-perfile 100000 --region us-east-1 --accesskey "AKIA***************" --secretkey "lPi+XQ********************" ## Approach-1: Below syntax is for Odbc connectivity using DSN approach ## --source-connstr "Dsn=MyDsn;Uid=MyUser1;Pwd=MyP@ssword;" ## Approach-2: Below syntax is for Odbc connectivity using DSN less approach. Refer vendor connection string documentation for valid Driver name/Attributes --source-connstr "Driver={SQL Server Native Client 10.0};Server=localhost;Database=Northwind;Trusted_Connection=yes;" ## Adjust CommandTime out for long running Load. Default is 1 hr for command timeout and 30 seconds for connect timeout --target-connstr "Host=mytestcluster-1.csu****.us-east-1.redshift.amazonaws.com;Port=5439;Database=dev;UserName=masteruser;Password=*******;EnableSsl=true;Timeout=30;CommandTimeout=3600;"

www.zappysys.com | Products | All copyrights reserved. ZappySys LLC.