Having Issues? How to Order a Streamed DataFrame

Knoldus Blogs

A few days ago, I had to perform an aggregation on a streaming DataFrame. The moment I applied groupBy for the aggregation, the data got shuffled. So the question arose: how do I maintain order?

Yes, I can use orderBy on a streaming DataFrame in Spark Structured Streaming, but only in complete mode and after an aggregation. There is no way to order streaming data in append mode or update mode.

I tried different ways to solve this issue. For example, with Spark Structured Streaming I could sort the streamed data within each batch, but not across batches.
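To see why sorting within batches is not enough, here is a minimal sketch in plain Python (no Spark involved). The batch contents and the helper name `sort_within_batches` are made up for illustration; the idea is only that each micro-batch is sorted and emitted on its own, the way an append-style sink would see it.

```python
from typing import Iterable, List


def sort_within_batches(batches: Iterable[List[int]]) -> List[int]:
    """Sort each micro-batch independently and emit it right away."""
    emitted: List[int] = []
    for batch in batches:
        # Ordering is only guaranteed inside this one batch.
        emitted.extend(sorted(batch))
    return emitted


# Hypothetical event timestamps arriving in two micro-batches.
batches = [[5, 1, 9], [3, 7, 2]]

per_batch = sort_within_batches(batches)
global_order = sorted(x for batch in batches for x in batch)

print(per_batch)     # [1, 5, 9, 2, 3, 7]
print(global_order)  # [1, 2, 3, 5, 7, 9]
```

The second batch contains values smaller than some already emitted from the first batch, so the combined output is not globally ordered. Getting a global order needs some state that survives across batches.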

I started looking for solutions in other technologies such as Apache Flink and Apache Storm. What I found in the end was disappointment. 😦

A bit of light at the end of the tunnel

Luckily, there is Apache Kafka Streams, which provides access to its StateStore. Kafka Streams also provides the Processor API.

The low-level Processor…
