import

  • Overview
  • Syntax
  • Arguments
  • Notes
  • Examples
    Overview

    Bulk import data from existing files or from any relational database source into Redshift in a single command. The command supports ODBC, OLEDB, and ADO.NET drivers to consume source data.

    Download ZappyShell

    Syntax

    import {Db|File}<SOURCE_TYPE> [--enc-key ENCRYPTION_KEY_PATH]
               [--load-options INPUT] [--maxthreads INPUT] [--source-compressed]
               [--source-driver {ADONET_MSSQL|ODBC|OLEDB|ZS_REDSHIFT}]
               [--source-path INPUT] [--source-recursive] [--source-stagepath INPUT]
               [--source-query INPUT] [--archivemethod {Delete|Move}]
               [--source-archivemethod {Delete|Move}]
               [--source-stage-archivemethod {Delete|Move}]
               [--source-archivefolder INPUT] [--source-stage-archivefolder INPUT]
               [--escapechar] [--target-stage-archivemethod {Delete|Move}]
               [--target-stage-archivefolder INPUT] [--target-stagepath INPUT]
               [--target-table INPUT] [--no-split] [--maxrows-perfile INPUT]
               [--maxrows-all INPUT] [--logfile INPUT] [--firstrow-has-colnames]
               [--target-truncate] [--target-truncate-fast] [--nullas INPUT]
               [--treat-empty-asnull] [--target-columns INPUT] [--cleanup-onerror]
               [--treat-blank-asnull] [--coldelimiter INPUT] [--quotedvalue]
               [--dateformat INPUT] [--allow-invalidchar]
               [--invalidchar-replacement INPUT] [--maxerrors INPUT] [--testload]
               [--rows-to-skip INPUT] [--ignore-invalid-date] [--truncate-to-fit]
               [--roundec] [--identinsert] [--ignore-blank] [--trimend]
               [--no-compress] [--updatestats] [--allow-missing-cols]
               [--source-connstr INPUT] [--target-connstr INPUT]
               [--accesskey YOUR_ACCESS_KEY] [--secretkey YOUR_SECRET_KEY]
               [--serviceurl INPUT] [--region INPUT] [--consoleout FILEPATH]
               [-v|--verbose] [-w|--wait] [-i|--interactive] 
    

    Arguments

    Parameter Description
    <SOURCE_TYPE>
    Type of source from which you want to read data. Valid values are Db or File. If input files are already compressed in gz format, also specify the --source-compressed option.
    --enc-key ENCRYPTION_KEY_PATH
    Specifies the AES-256 encryption key file path. This encryption key is used to encrypt data before uploading to S3. Data is decrypted automatically by Redshift while loading the S3 files into the target table. To create a new encryption key, use the mkkey command.
    --load-options
    Additional load options for Redshift Copy
    --maxthreads
    Maximum parallel threads
    --source-compressed
    Specifies whether source file(s) are already compressed
    --source-driver
    Driver type (e.g. ADONET_MSSQL, ODBC, OLEDB)
    --source-path
    Source file(s) path (e.g. c:\cust*.txt or c:\cust001.txt)
    --source-recursive
    Include source files from subdirectories
    --source-stagepath
    Source stage path
    --source-query
    Source data query
    --archivemethod
    Default archive method for source and target stage files
    --source-archivemethod
    Archive method for source file(s)
    --source-stage-archivemethod
    Archive method for source stage file(s)
    --source-archivefolder
    Archive folder path for source file(s)
    --source-stage-archivefolder
    Archive folder path for source stage file(s)
    --escapechar
    Allow escape sequence character
    --target-stage-archivemethod
    Archive method for target stage file(s)
    --target-stage-archivefolder
    Archive folder path for target stage file(s)
    --target-stagepath
    Target staging folder path
    --target-table
    Target table where you want to load data
    --no-split
    Do not split data into multiple files. If you don't specify this option, data is exported to the stage area as multiple files; rows per file can be set using the --maxrows-perfile option (default is 1000000 rows per file)
    --maxrows-perfile
    Max rows per file when splitfiles option is enabled
    --maxrows-all
    Max rows to load from input dataset
    --logfile
    Log file path
    --firstrow-has-colnames
    First row of the input data contains column names
    --target-truncate
    Truncate target table before data load using transaction with rollback support
    --target-truncate-fast
    Truncate target table before data load without using a transaction (not recommended; the delete can't be rolled back if the data load fails)
    --nullas
    Null indicator string in data
    --treat-empty-asnull
    Treat empty string as NULL
    --target-columns
    List of columns to load in target table (input data must be in the same order)
    --cleanup-onerror
    Always ensure stage files are cleaned up when the archive method is set to Delete. When this option is not set, files are left in the stage area if an error occurs.
    --treat-blank-asnull
    Treat blank string (empty or only spaces) as NULL
    --coldelimiter
    Field separator character for data files. (NOTE: use \t for tab)
    --quotedvalue
    Wrap quotes around values in data files
    --dateformat
    Date format in data file (e.g. YYYY-MM-DD HH24:MI:SS.MI)
    --allow-invalidchar
    Allow invalid UTF8 character in input data files
    --invalidchar-replacement
    Replacement char for invalid char
    --maxerrors
    Maximum allowed errors before failing data load
    --testload
    Load data in test mode. This option validates the full load without actually loading the data
    --rows-to-skip
    Number of initial rows to skip from data file
    --ignore-invalid-date
    Ignore invalid date
    --truncate-to-fit
    Truncate extra chars to fit target column
    --roundec
    Round decimal values rather than truncate
    --identinsert
    Enable identity insert
    --ignore-blank
    Ignore blank rows in input data
    --trimend
    Trim trailing blanks
    --no-compress
    Do not compress files before transfer. If this option is not specified then files are always compressed
    --updatestats
    Update statistics after load
    --allow-missing-cols
    Allow missing columns at the end of the source file
    --source-connstr
    Source DB connection string
    --target-connstr
    Target DB connection string
    --accesskey YOUR_ACCESS_KEY
    Cloud account access key
    --secretkey YOUR_SECRET_KEY
    Cloud account secret key
    --serviceurl
    Service URL for the Amazon S3 service. Leave blank to route requests to the default endpoint.
    --region
    Region of the S3 bucket used as the staging area for Redshift. Specify this option if the S3 bucket is in a different region than your Redshift cluster.
    --consoleout FILEPATH
    Captures console output and saves to file
    -v | --verbose
    Log verbosity level. -v means detailed; if you don't specify this option, minimal logging is done.
    -w | --wait
    Wait for user input before exit
    -i | --interactive
    Continue in interactive mode after this command completes. This switch is not applicable if you are already in interactive mode

    Notes

    How to Setup Amazon Redshift Cluster

    Click here to learn more

    Examples

    Import SQL Server records to Redshift Cluster using minimum default options

    import Db 
    	--source-driver ADONET_MSSQL 
    	--source-query "select ROW_NUMBER()Over(order by a.CustomerID) Id, a.*,b.*,c.OrderID,c.OrderDate,c.Freight  from customers a,products b,orders c" 
    	--source-stagepath "c:\redshift\stage" 
    	--target-table "customerdata" 
    	--target-stagepath "bw-rstest/cmdstage" 
    	--logfile "c:\redshift\log.txt" 
    	--region us-east-1 
    	--accesskey "AKIA*****************" 
    	--secretkey "lPi+XQ************************"  
    	--source-connstr "Data Source=localhost;Initial Catalog=Northwind;Integrated Security=SSPI;"  
    	--target-connstr "Host=mytestcluster-1.csu********.us-east-1.redshift.amazonaws.com;Port=5439;Database=dev;UserName=masteruser;Password=*********;EnableSsl=true;Timeout=30;CommandTimeout=3600;"
      
    Import SQL Server records to Redshift Cluster - With Encryption turned on

    #--------------------------------------      
    #Step-1 : Create your encryption key (This step only required first time. Once encryption key is generated you can reuse it)      
    #--------------------------------------
    
    mkkey "c:\private.key" --aes
    
    #--------------------------------------
    #Step-2 : Load your data to redshift
    #--------------------------------------
    import Db 
      --enc-key "c:\private.key"
    	--source-driver ADONET_MSSQL 
    	--source-query "select ROW_NUMBER()Over(order by a.CustomerID) Id, a.*,b.*,c.OrderID,c.OrderDate,c.Freight  from customers a,products b,orders c" 
    	--source-stagepath "c:\redshift\stage" 
    	--target-table "customerdata" 
    	--target-stagepath "bw-rstest/cmdstage" 
    	--logfile "c:\redshift\log.txt" 
    	--region us-east-1 
    	--accesskey "AKIA*****************" 
    	--secretkey "lPi+XQ************************"  
    	--source-connstr "Data Source=localhost;Initial Catalog=Northwind;Integrated Security=SSPI;"  
    	--target-connstr "Host=mytestcluster-1.csu********.us-east-1.redshift.amazonaws.com;Port=5439;Database=dev;UserName=masteruser;Password=*********;EnableSsl=true;Timeout=30;CommandTimeout=3600;"
      
    Import SQL Server records to Redshift Cluster - With truncate, archive and file split options

    import Db 
    	--source-driver ADONET_MSSQL 
    	--source-query "select ROW_NUMBER()Over(order by a.CustomerID) Id, a.*,b.*,c.OrderID,c.OrderDate,c.Freight  from customers a,products b,orders c" 
    	--source-archivemethod None 
    	--source-stage-archivemethod Delete 
    	--target-stage-archivemethod Delete 
    	--source-stagepath "c:\redshift\stage" 
    	--target-table "customerdata" 
    	--target-stagepath "bw-rstest/cmdstage" 
    	--target-truncate 
    	--logfile "c:\redshift\log.txt" 
    	--maxrows-perfile 100000 
    	--region us-east-1 
    	--accesskey "AKIA*****************" 
    	--secretkey "lPi+XQ************************"  
    	--source-connstr "Data Source=localhost;Initial Catalog=Northwind;Integrated Security=SSPI;"  
    	--target-connstr "Host=mytestcluster-1.csu********.us-east-1.redshift.amazonaws.com;Port=5439;Database=dev;UserName=masteruser;Password=*********;EnableSsl=true;Timeout=30;CommandTimeout=3600;"
      
    Import local flat files to Redshift cluster

    import File 
    	--source-path "c:\data\files\*.part" 
    	--source-archivemethod None 
    	--source-stage-archivemethod Delete 
    	--target-stage-archivemethod Delete 
    	--source-stagepath "c:\redshift\stage" 
    	--target-table "customerdata" 
    	--target-stagepath "bw-rstest/cmdstage" 
    	--target-truncate 
    	--logfile "c:\redshift\log.txt" 
    	--maxrows-perfile 100000 
    	--region us-east-1 
    	--accesskey "AKIA*****************" 
    	--secretkey "lPi+XQ************************"  
    	--target-connstr "Host=mytestcluster-1.csu********.us-east-1.redshift.amazonaws.com;Port=5439;Database=dev;UserName=masteruser;Password=*********;EnableSsl=true;Timeout=30;CommandTimeout=3600;" 
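
    Import local CSV files with a header row to Redshift cluster

    The sketch below is illustrative only: it combines the file-format options documented above (--firstrow-has-colnames, --coldelimiter, --nullas, --treat-empty-asnull) for a comma-delimited file that uses the literal string NULL for missing values. Paths, bucket name and credentials are placeholders taken from the earlier examples.

    import File 
    	--source-path "c:\data\csv\*.csv" 
    	--firstrow-has-colnames 
    	--coldelimiter "," 
    	--nullas "NULL" 
    	--treat-empty-asnull 
    	--source-stagepath "c:\redshift\stage" 
    	--target-table "customerdata" 
    	--target-stagepath "bw-rstest/cmdstage" 
    	--logfile "c:\redshift\log.txt" 
    	--region us-east-1 
    	--accesskey "AKIA*****************" 
    	--secretkey "lPi+XQ************************"  
    	--target-connstr "Host=mytestcluster-1.csu********.us-east-1.redshift.amazonaws.com;Port=5439;Database=dev;UserName=masteruser;Password=*********;EnableSsl=true;Timeout=30;CommandTimeout=3600;"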
      
    Import local compressed gzip files to Redshift cluster

    import File 
    	--source-compressed 
    	--source-path "c:\data\gzip\*.gz" 
    	--source-archivemethod None 
    	--target-stage-archivemethod Delete 
    	--target-table "customerdata" 
    	--target-stagepath "bw-rstest/cmdstage" 
    	--target-truncate 
    	--coldelimiter "|"
      --logfile "c:\redshift\log.txt" 
    	--maxrows-perfile 100000 
    	--region us-east-1 
    	--accesskey "AKIA*****************" 
    	--secretkey "lPi+XQ************************"  
    	--target-connstr "Host=mytestcluster-1.csu********.us-east-1.redshift.amazonaws.com;Port=5439;Database=dev;UserName=masteruser;Password=*********;EnableSsl=true;Timeout=30;CommandTimeout=3600;" 
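
    Validate a load in test mode before running it for real

    The --testload and --maxerrors options documented above can be combined to validate input data without actually loading it. This is an illustrative sketch; paths, bucket name and credentials are placeholders taken from the earlier examples.

    import File 
    	--testload 
    	--maxerrors 10 
    	--source-path "c:\data\files\*.part" 
    	--source-stagepath "c:\redshift\stage" 
    	--target-table "customerdata" 
    	--target-stagepath "bw-rstest/cmdstage" 
    	--logfile "c:\redshift\log.txt" 
    	--region us-east-1 
    	--accesskey "AKIA*****************" 
    	--secretkey "lPi+XQ************************"  
    	--target-connstr "Host=mytestcluster-1.csu********.us-east-1.redshift.amazonaws.com;Port=5439;Database=dev;UserName=masteruser;Password=*********;EnableSsl=true;Timeout=30;CommandTimeout=3600;" 
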
    Import data from any ODBC data source to Redshift cluster. Use tab as the column separator

    import Db 
    	--source-driver ODBC 
    	--source-query "select ROW_NUMBER()Over(order by a.CustomerID) Id, a.*,b.*,c.OrderID,c.OrderDate,c.Freight  from customers a,products b,orders c" 
    	--source-archivemethod None 
    	--source-stage-archivemethod Delete 
    	--target-stage-archivemethod Delete 
    	--source-stagepath "c:\redshift\stage" 
    	--target-table "customerdata" 
    	--target-stagepath "bw-rstest/cmdstage" 
    	--target-truncate 
    	--coldelimiter "\t"  
    	--logfile "c:\redshift\log.txt" 
    	--maxrows-perfile 100000 
    	--region us-east-1 
    	--accesskey "AKIA*****************" 
    	--secretkey "lPi+XQ************************"   
      ## Approach-1: Below syntax is for ODBC connectivity using the DSN approach
      ## --source-connstr "Dsn=MyDsn;Uid=MyUser1;Pwd=MyP@ssword;"  
      ## Approach-2: Below syntax is for ODBC connectivity using the DSN-less approach. Refer to your vendor's connection string documentation for valid Driver name/attributes
      --source-connstr "Driver={SQL Server Native Client 10.0};Server=localhost;Database=Northwind;Trusted_Connection=yes;"  
      ## Adjust CommandTimeout for long running loads. Default is 1 hour for command timeout and 30 seconds for connect timeout
    	--target-connstr "Host=mytestcluster-1.csu********.us-east-1.redshift.amazonaws.com;Port=5439;Database=dev;UserName=masteruser;Password=*********;EnableSsl=true;Timeout=30;CommandTimeout=3600;"      
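
    Import data from an OLEDB data source to Redshift cluster

    The --source-driver option also accepts OLEDB. The sketch below assumes the SQL Server OLE DB provider (Provider=SQLOLEDB); the provider name and connection string attributes depend on your OLE DB provider, so refer to the vendor's connection string documentation. Paths, bucket name and credentials are placeholders taken from the earlier examples.

    import Db 
    	--source-driver OLEDB 
    	--source-query "select * from customers" 
    	--source-stagepath "c:\redshift\stage" 
    	--target-table "customerdata" 
    	--target-stagepath "bw-rstest/cmdstage" 
    	--logfile "c:\redshift\log.txt" 
    	--region us-east-1 
    	--accesskey "AKIA*****************" 
    	--secretkey "lPi+XQ************************"  
    	--source-connstr "Provider=SQLOLEDB;Data Source=localhost;Initial Catalog=Northwind;Integrated Security=SSPI;"  
    	--target-connstr "Host=mytestcluster-1.csu********.us-east-1.redshift.amazonaws.com;Port=5439;Database=dev;UserName=masteruser;Password=*********;EnableSsl=true;Timeout=30;CommandTimeout=3600;"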
          

