With this connector, data can be loaded from Spark sources into Vector, and the results of Vector computations can be consumed in Spark and transformed into
a DataFrame. Both operations are performed in parallel: the data coming from every input RDD partition is serialized using Vector's binary protocol and passed
through socket connections to Vector end points. Although this step may incur network communication, the connector tries to assign each Vector end point
only RDD partitions that are local to it, so in most cases the data does not have to cross the network. Conversely, when unloading, Vector data is sent through
socket connections and consumed in Spark into an RDD.
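The locality preference described above can be illustrated with a small, self-contained sketch. It is not the connector's actual code or API: the `EndPoint` class, the `assign_partitions` function, the host names and the port numbers are all hypothetical. The sketch assumes each RDD partition reports the set of hosts on which it is local (similar in spirit to Spark's preferred locations) and assigns it to the least-loaded end point on one of those hosts, falling back to a remote end point only when no local one exists.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class EndPoint:
    """Hypothetical model of a Vector end point (one DataStream) at host:port."""
    host: str
    port: int


def assign_partitions(preferred_hosts: List[Set[str]],
                      end_points: List[EndPoint]) -> Dict[int, EndPoint]:
    """Map each RDD partition index to a Vector end point (illustrative only).

    preferred_hosts[i] is the set of hosts on which partition i is local.
    A partition goes to the least-loaded end point on one of its local hosts;
    only if no such end point exists is it sent to a remote one.
    """
    if not end_points:
        raise ValueError("at least one end point is required")

    # Index end points by host so local candidates can be found quickly.
    by_host: Dict[str, List[EndPoint]] = defaultdict(list)
    for ep in end_points:
        by_host[ep.host].append(ep)
    load = {ep: 0 for ep in end_points}

    assignment: Dict[int, EndPoint] = {}
    for part, hosts in enumerate(preferred_hosts):
        local = [ep for host in hosts for ep in by_host.get(host, [])]
        # Fall back to every end point when none is local: this is the case
        # in which network communication is incurred.
        candidates = local or end_points
        # Ties are broken by (host, port) so the result is deterministic.
        target = min(candidates, key=lambda ep: (load[ep], ep.host, ep.port))
        assignment[part] = target
        load[target] += 1
    return assignment


if __name__ == "__main__":
    eps = [EndPoint("node1", 5001), EndPoint("node2", 5002)]
    plan = assign_partitions([{"node1"}, {"node2"}, {"node1"}, {"node3"}], eps)
    for part, ep in plan.items():
        print(part, ep.host)
```

In this example, partitions 0 and 2 stay on node1 and partition 1 stays on node2. Partition 3 is only local to node3, which has no end point, so it is sent over the network to node2, the less-loaded end point at that moment. Balancing by load within the local candidates keeps one end point from receiving every partition of a busy host.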

Throughout the documentation, we use the terms DataStream and Vector end point interchangeably. A Vector DataStream is a logical stream through which Vector consumes
binary data. Typically, DataStreams are executed in parallel (i.e., there are as many threads as allocated DataStreams), but in some cases a single Vector thread
handles multiple DataStreams. By contrast, each connection to a Vector end point always maps to exactly one DataStream.
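The relationship between connections, DataStreams and threads can be sketched as follows. This is a hedged illustration, not how Vector actually schedules its threads: the round-robin policy and the `assign_datastreams` and `thread_for_connection` helpers are hypothetical, and the sketch assumes, purely for illustration, that connection i is served by DataStream i. It shows the two cases described above: one thread per DataStream when enough threads are available, and threads shared among DataStreams otherwise, while each connection still belongs to exactly one DataStream.

```python
from typing import List


def assign_datastreams(num_datastreams: int, num_threads: int) -> List[List[int]]:
    """Distribute DataStream ids round-robin over at most num_threads threads.

    Entry t of the result lists the DataStreams served by thread t. When there
    are at least as many threads as DataStreams, each DataStream gets its own
    thread; otherwise some threads serve several DataStreams.
    """
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    workers = min(num_threads, num_datastreams)
    threads: List[List[int]] = [[] for _ in range(workers)]
    for ds in range(num_datastreams):
        threads[ds % workers].append(ds)
    return threads


def thread_for_connection(connection_id: int, threads: List[List[int]]) -> int:
    """Return the thread serving the connection's single DataStream.

    Hypothetical convention: connection i is bound to DataStream i.
    """
    for t, streams in enumerate(threads):
        if connection_id in streams:
            return t
    raise KeyError(f"no DataStream for connection {connection_id}")


if __name__ == "__main__":
    print(assign_datastreams(3, 4))  # [[0], [1], [2]]: one thread per DataStream
    print(assign_datastreams(5, 2))  # [[0, 2, 4], [1, 3]]: threads are shared
```

With five DataStreams and two threads, connection 3 is still handled by a single DataStream (DataStream 3), which happens to share thread 1 with DataStream 1; the number of threads affects only how DataStreams are scheduled, never how connections map to them.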