Package com.actian.spark_vector

package spark_vector

Linear Supertypes
AnyRef, Any

Value Members

  1. package loader


    Spark-Vector loader

    The Spark-Vector loader is a utility that facilitates loading files of different formats (currently CSV, Parquet, and ORC only) into Vector through Spark, using the Spark-Vector connector.

    For CSV parsing, the Spark-Vector loader uses Spark SQL's built-in csv data source (the csv method of org.apache.spark.sql.DataFrameReader).

    Example:

    This Scala snippet, executed in spark-shell, reads a CSV file:

    sqlContext.sql("""CREATE TEMPORARY TABLE large_table
    USING com.actian.spark_vector.sql.DefaultSource
    OPTIONS (
     host "vectorhost",
     instance "VI",
     database "dbName",
     table "large_table"
    )""")
    
    sqlContext.sql("""CREATE TEMPORARY TABLE csv_files
    USING csv
    OPTIONS (path "hdfs://namenode:8020/data/csv_file*", header "false", sep "|")
    """)
    
    val results = sqlContext.sql("""insert into table large_table select * from csv_files""")

    is equivalent to

    spark-submit --master spark://spark_master:7077 --class com.actian.spark_vector.loader.Main
     $SPARK_VECTOR/loader/target/spark_vector_loader-assembly-2.0-SNAPSHOT.jar load csv -sf "hdfs://namenode:8020/data/csv_file*"
     -sc "|" -vh vectorhost -vi VI -vd dbName -tt large_table
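    The csv data source can also be invoked programmatically through the DataFrameReader API. A sketch for spark-shell (Spark 2.x), reusing the path and separator from the example above; the val name is illustrative:

    ```scala
    // Programmatic equivalent of the CREATE TEMPORARY TABLE csv_files statement
    // above: read the pipe-delimited files with DataFrameReader, then register
    // the result under the same temporary table name.
    val csvFiles = sqlContext.read
      .option("header", "false")
      .option("sep", "|")
      .csv("hdfs://namenode:8020/data/csv_file*")
    csvFiles.registerTempTable("csv_files")
    ```

    After registration, the same insert into large_table shown above works unchanged.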

    To read a Parquet file, this

    sqlContext.read.parquet("hdfs://namenode:8020/data/parquet_file.parquet").registerTempTable("parquet_file")
    sqlContext.sql("""insert into table large_table select * from parquet_file""")

    is equivalent to

    spark-submit --master spark://spark_master:7077 --class com.actian.spark_vector.loader.Main
     $SPARK_VECTOR/loader/target/spark_vector_loader-assembly-2.0-SNAPSHOT.jar load parquet -sf "hdfs://namenode:8020/data/parquet_file.parquet"
     -vh vectorhost -vi VI -vd dbName -tt large_table

    To read an ORC file, this

    sqlContext.read.orc("hdfs://namenode:8020/data/orc_file.orc").registerTempTable("orc_file")
    sqlContext.sql("""insert into table large_table select * from orc_file""")

    is equivalent to

    spark-submit --master spark://spark_master:7077 --class com.actian.spark_vector.loader.Main
     $SPARK_VECTOR/loader/target/spark_vector_loader-assembly-2.0-SNAPSHOT.jar load orc -sf "hdfs://namenode:8020/data/orc_file.orc"
     -vh vectorhost -vi VI -vd dbName -tt large_table

    Of course, by using the Spark-Vector connector directly, one can load arbitrarily complex relations (not only files) into Vector, as well as files of any format that Spark is able to read.
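    For instance, any source Spark can read, even one the loader CLI does not handle, can be pushed through the same connector-backed table (a sketch for spark-shell; the JSON path and file contents are hypothetical):

    ```scala
    // Register a DataFrame from a format the loader CLI does not support (JSON),
    // then insert it into Vector through the connector-backed temporary table
    // large_table defined earlier. Path and data are hypothetical.
    sqlContext.read.json("hdfs://namenode:8020/data/events*.json")
      .registerTempTable("json_events")
    sqlContext.sql("insert into table large_table select * from json_events")
    ```

    Any relation Spark SQL can compute (joins, filters, aggregations) can replace the plain select * here.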

    For a complete list of available options, see Args.
