Package com.actian


package actian


Value Members

  1. package spark_vector


    Spark-Vector connector.


    With this connector, data can be loaded from Spark sources into Vector, and the results of Vector computations can be consumed in Spark and transformed into a DataFrame. Both operations run in parallel. When loading, the data from every input RDD partition is serialized using Vector's binary protocol and sent over socket connections to Vector end points. Some network traffic may be incurred at this point, but the connector tries to assign each Vector end point only RDD partitions that are local to it, so in most cases the transfer avoids the network. Conversely, when unloading, Vector data is sent through socket connections and consumed in Spark as an RDD.
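    The locality preference described above can be illustrated with a small, self-contained Scala sketch. It does not use the connector's real classes: `Partition`, `EndPoint`, and `LocalityAssignment.assign` are hypothetical names, and the fallback rule (a partition with no co-located end point goes to the least-loaded end point) is an assumption chosen for illustration, not the connector's documented policy.

    ```scala
    import scala.collection.mutable

    // Hypothetical model types for illustration only; not the connector's API.
    final case class Partition(index: Int, preferredHosts: Seq[String])
    final case class EndPoint(id: Int, host: String)

    object LocalityAssignment {
      /** Assigns every partition to exactly one end point. An end point on one
        * of the partition's preferred hosts is chosen when available (no
        * network hop); otherwise any end point may be used (assumed policy:
        * the least-loaded one), which implies remote network traffic. */
      def assign(partitions: Seq[Partition], endPoints: Seq[EndPoint]): Map[Int, Seq[Int]] = {
        require(endPoints.nonEmpty, "at least one end point is required")
        val load = mutable.Map(endPoints.map(_.id -> 0): _*)
        val assigned = mutable.Map.empty[Int, Seq[Int]]
        for (p <- partitions) {
          val local = endPoints.filter(e => p.preferredHosts.contains(e.host))
          val candidates = if (local.nonEmpty) local else endPoints
          // Least-loaded candidate wins; ties go to the lower end point id.
          val target = candidates.minBy(e => (load(e.id), e.id))
          load(target.id) += 1
          assigned(target.id) = assigned.getOrElse(target.id, Seq.empty[Int]) :+ p.index
        }
        // Every end point appears in the result, possibly with no partitions.
        endPoints.map(e => e.id -> assigned.getOrElse(e.id, Seq.empty[Int])).toMap
      }
    }
    ```

    For example, with end points 0 on `node1` and 1 on `node2`, and four partitions preferring `node1`, `node2`, `node1`, and `node3`, end point 0 receives partitions 0 and 2 and end point 1 receives partitions 1 and 3. Only partition 3, which has no co-located end point, has to cross the network.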

    Throughout the documentation, the terms DataStream and Vector end point are used interchangeably. A Vector DataStream is the logical stream through which Vector consumes binary data. Typically, DataStreams are executed in parallel (i.e., as many threads are allocated as there are DataStreams), but in some cases a single Vector thread handles multiple DataStreams. Each connection to a Vector end point, however, maps to exactly one DataStream.
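    The relationship between connections, DataStreams, and threads can be sketched as follows. This is an illustration under stated assumptions, not Vector's actual scheduler: connection `s` is taken to carry DataStream `s` (the one-to-one mapping above), and streams are dealt round-robin over the available threads, a policy assumed here purely for illustration. `DataStreamScheduling.assignStreams` is a hypothetical name.

    ```scala
    // Hypothetical illustration: Vector's real DataStream scheduling is internal
    // to Vector. DataStream s is served by connection s (one-to-one), and
    // streams are dealt round-robin over the available threads (an assumption).
    object DataStreamScheduling {
      /** Returns, for each thread id, the DataStream ids that thread handles. */
      def assignStreams(numStreams: Int, numThreads: Int): Map[Int, Seq[Int]] = {
        require(numStreams >= 0 && numThreads > 0, "invalid stream or thread count")
        // Thread t takes streams t, t + numThreads, t + 2 * numThreads, ...
        (0 until numThreads).map(t => t -> (t until numStreams by numThreads)).toMap
      }
    }
    ```

    With 5 DataStreams and 2 threads, thread 0 handles streams 0, 2, and 4, and thread 1 handles streams 1 and 3. With at least as many threads as DataStreams, each thread handles at most one stream, which corresponds to the typical fully parallel case.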
