Package com.actian.spark_vector.datastream

package datastream

Spark -> Vector loading part

Loading from Spark to Vector is initiated through the SparkSQL/DataFrames APIs, and unloading from Vector to Spark is initiated through the same APIs.

Linear Supertypes
AnyRef, Any

Type Members

  1. case class DataStreamClient(vectorProps: VectorConnectionProperties, table: String) extends Serializable with Logging with Product

A client to prepare loading and issue the load SQL query to Vector.

vectorProps

connection information

table

the table into which this client will load data

Note

This client opens a JDBC connection when it is instantiated. To prevent connection leaks, the close method must be called once the client is no longer needed.
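Since the connection is opened at construction time, it is safest to pair every client with a guaranteed close. The loan-pattern helper below is a hypothetical sketch (not part of the connector's API) that works for any AutoCloseable resource:

```scala
// Loan pattern sketch: guarantee close() runs even if the body throws.
// `withResource` is a hypothetical helper, not part of the connector's API.
object LoanSketch {
  def withResource[A <: AutoCloseable, B](resource: A)(body: A => B): B =
    try body(resource) finally resource.close()
}
```

Assuming DataStreamClient exposes a close() method as the note states, a call site would look like `LoanSketch.withResource(client) { c => /* prepare and load */ }`, so the JDBC connection is released on both success and failure.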

  2. case class DataStreamPartition(index: Int, rdd: RDD[_], parentIndices: Seq[Int]) extends Partition with Product with Serializable

A partition corresponding to one DataStream.

parentIndices

the indices of the parent partitions assigned to this DataStream, whose data will subsequently be serialized and sent through a single connection

  3. final class DataStreamPartitionAssignment extends BipartiteAssignment

Class that contains the matching algorithm used to assign RDD partitions to Vector hosts, based on affinities.

The algorithm tries to assign partitions to hosts for which they have affinity, so only partitions with affinity to at least one host are matched here; the others are assigned to a random node. The algorithm also aims to minimize the maximum number of partitions assigned to any single host, i.e. the most data a single host will have to process.
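The idea can be illustrated with a much-simplified greedy sketch. This is not the bipartite matching the class actually implements: here each partition with affinity goes to its least-loaded affine host, and (for determinism in this sketch) partitions without affinity go to the globally least-loaded host rather than a random one.

```scala
import scala.collection.mutable

// Simplified, hypothetical sketch of affinity-based partition assignment.
// affinities(i) lists the hosts partition i has affinity with (may be empty).
object AssignmentSketch {
  def assign(affinities: Seq[Seq[String]], hosts: Seq[String]): Map[String, Seq[Int]] = {
    val load = mutable.Map(hosts.map(_ -> 0): _*)                      // partitions per host
    val result = mutable.Map(hosts.map(_ -> mutable.Buffer.empty[Int]): _*)
    affinities.zipWithIndex.foreach { case (affine, partition) =>
      // Prefer affine hosts; fall back to all hosts when there is no affinity.
      val candidates = if (affine.nonEmpty) affine else hosts
      val host = candidates.minBy(load)                                // least-loaded candidate
      load(host) += 1
      result(host) += partition
    }
    result.map { case (h, ps) => h -> ps.toSeq }.toMap
  }
}
```

Picking the least-loaded candidate at each step is what keeps the maximum per-host partition count low, which is the property the real algorithm optimizes for.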

  4. case class VectorEndpoint(host: String, port: Int, username: String, password: String) extends Serializable with Product


    Information to connect to a VectorEndpoint (DataStream)

  5. case class VectorEndpointConf(vectorEndpoints: IndexedSeq[VectorEndpoint]) extends Serializable with Product


Configuration for read/write endpoints: one entry for each Vector endpoint expecting data.

Value Members

  1. object DataStreamPartitionAssignment

  2. object VectorEndpoint extends Logging with Serializable


Contains helpers to obtain VectorEndpoint information from Vector's SQL interface.

    Note

The way this information is obtained (by issuing a select from a system table) is likely to change in the future.

  3. object VectorEndpointConf extends Serializable

  4. def padding(pos: Int, typeSize: Int): Int


Helper to determine how much padding (number of filler bytes) needs to be written to properly align a value of size typeSize, given that writing is currently at position pos.
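The alignment rule can be sketched as a one-line modular computation. The object below is a hypothetical re-implementation for illustration, not the connector's actual code:

```scala
// Hypothetical sketch of the padding rule: the number of filler bytes needed
// so that the next value of size `typeSize` starts at an offset that is a
// multiple of `typeSize`. The outer modulo makes an already-aligned position
// need zero padding.
object PaddingSketch {
  def padding(pos: Int, typeSize: Int): Int =
    (typeSize - pos % typeSize) % typeSize
}
```

For example, at position 5 an 8-byte value needs padding(5, 8) == 3 filler bytes so that it starts at offset 8, while padding(0, 8) == 0 because position 0 is already aligned.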

  5. package reader

  6. package writer

