With the Spark-Vector connector, data can be loaded from Spark sources into Vector, and the results of Vector computations can be consumed in Spark and
transformed into a DataFrame. Both operations run in parallel. On load, the data from each input RDD partition is serialized using Vector's binary protocol
and sent over socket connections to Vector end points. This step can incur network communication, but the connector normally tries to assign only local
RDD partitions to each Vector end point. Conversely, on unload, Vector data is sent over socket connections and consumed in Spark as an RDD.
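
The locality preference described above can be illustrated with a small, self-contained sketch. This is not the connector's actual algorithm, which is not specified here; it is a simple greedy heuristic under stated assumptions. The function name `assign_partitions`, the host names, and the tie-breaking rule (least-loaded end point first) are all hypothetical.

```python
from collections import defaultdict


def assign_partitions(partition_hosts, endpoint_hosts):
    """Greedy, locality-first assignment of RDD partitions to end points.

    partition_hosts: dict mapping partition id -> list of preferred hosts
    endpoint_hosts:  list of host names; the list index is the end point id
    Returns a dict mapping end point id -> list of assigned partition ids.

    Hypothetical illustration only, not the connector's real logic.
    """
    if not endpoint_hosts:
        raise ValueError("at least one end point is required")

    assignment = {i: [] for i in range(len(endpoint_hosts))}

    # Index end points by the host they run on.
    endpoints_by_host = defaultdict(list)
    for idx, host in enumerate(endpoint_hosts):
        endpoints_by_host[host].append(idx)

    remote = []
    for pid in sorted(partition_hosts):
        # End points that share a host with this partition's data.
        local = [e for h in partition_hosts[pid]
                 for e in endpoints_by_host.get(h, [])]
        if local:
            # Prefer the least-loaded local end point.
            target = min(local, key=lambda e: len(assignment[e]))
            assignment[target].append(pid)
        else:
            remote.append(pid)

    # Partitions without a local end point go to the least-loaded one;
    # these are the assignments that incur network communication.
    for pid in remote:
        target = min(assignment, key=lambda e: len(assignment[e]))
        assignment[target].append(pid)

    return assignment


partitions = {0: ["node1"], 1: ["node2"], 2: ["node1"], 3: ["node3"]}
result = assign_partitions(partitions, ["node1", "node2"])
print(result)
```

In this example, partitions 0, 1, and 2 stay on their own hosts, while partition 3 has no co-located end point and must be sent over the network.
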
Throughout the documentation, the terms DataStream and Vector end point are used interchangeably. A Vector DataStream is a logical stream that consumes binary
data in Vector. Typically, DataStreams are executed in parallel (i.e., there are as many threads as allocated DataStreams), but in some cases a single
Vector thread handles multiple DataStreams. In contrast, each connection to a Vector end point maps to exactly one DataStream.
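
To make the relationship between DataStreams and threads concrete, here is a minimal sketch. How Vector actually schedules DataStreams onto threads is not described here; the round-robin rule and the function name `streams_per_thread` are assumptions made purely for illustration.

```python
def streams_per_thread(num_datastreams, num_threads):
    """Distribute DataStream ids over threads round-robin.

    Hypothetical model: with enough threads, each DataStream gets its own
    thread; with fewer threads, some threads handle several DataStreams.
    """
    if num_threads < 1:
        raise ValueError("at least one thread is required")

    # Never create more threads than there are DataStreams.
    threads = [[] for _ in range(min(num_threads, num_datastreams))]
    for stream_id in range(num_datastreams):
        threads[stream_id % len(threads)].append(stream_id)
    return threads


one_to_one = streams_per_thread(4, 4)  # typical case: one thread per DataStream
shared = streams_per_thread(5, 2)      # fewer threads than DataStreams
print(one_to_one)
print(shared)
```

In both cases every DataStream id appears exactly once, mirroring the rule that a connection maps to exactly one DataStream even when a thread serves several of them.
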