Playing with Akka Streams and Twitter
- Disclaimer : Since Play 2.5/Akka Streams 2, things shown in this post can be achieved more easily, for example without relying directly on the actor API. This this post for more information.
You may have heard of Reactive-Streams, a specification inspired by the Reactive Manifesto.
A lot of big actors are involved in this initiative, like Typesafe, Netflix, Pivotal, RedHat and Twitter.
Its goal is to “provide a standard for asynchronous stream processing with non-blocking back pressure”.
In a few words, back pressure is the ability for a data producer to adapt the data transmission rate depending on the speed of the consumer, to avoid overwhelming it.
This article will focus on Akka Streams, which is an implementation of Reactive-Streams that relies on Akka actors and provides an higher level API on top of them.
Play 2.5 will rely on Akka-Streams under the hood, and it will also allow to benefit from it to to handle data streams reactively, like we are doing today with Play 2 and the Iteratee API.
Akka Streams is still tagged as “experimental”, but the 1.0 version is close. With Play 2.4, it’s already possible to use it, via Reactive-Streams to Iteratee and Iteratee to Reactive-Streams conversions.
MixTweets, updated
In a previous post, we’ve seen how to handle streams of Twitter messages reactively, how to merge several Twitter searches in a unique stream, transform them into JSON and push this into an EventSource (Server Sent Event) output with the Iteratee API.
Now, we will do the same with Akka Streams and compare this two solutions.
How to define a source
In Akka Streams, data producers are called sources. It’s the equivalent of the Publisher
interface in the Reactive-Streams specification.
We are using twitter4j and Twitter Streaming API to receive messages. For each new message, we need to push data into a source.
To achieve this, we need a reference to an actor implementing the ActorPublisher
trait. We also need to bind the source to this actor.
Then, we can push some data into the actor and retrieve this data from the source.
The first line creates an actor reference and a publisher. Now we can keep references to the actor and to the source. When we send elements to the actor (with the !
method), it will dynamically add elements in the source. We must choose a size for the actor message queue (1000) and an OverflowStrategy
to handle cases where this size is exceeded.
Note that we need to materialize the source before being able to use the actor behind it.
The source is materialized into a publisher, and then we need to redefine a source from this publisher.
This trick helps us to solve a problem that is specific to our use case : we don’t want to consume the source right now because we must keep it for later, and merge it with other sources. Fortunately, this kind of use case should be simplified in future versions of Akka streams (an issue is opened in Github to address this).
In simpler cases, you can just declare a Source
, a Flow
(i.e. a bridge that takes an input a produces an output) and define a way to consume the data in the same time. Using directly a Sink
, the source will be materialized so you can get a reference to an actor.
Here is an example that just prints messages matching a pattern :
If you need more customization, it’s also possible to create a custom ActorPublisher
to do this.
In comparison, with Iteratee we would do things like that :
The (enum, channel)
tuple is quite similar to (actorRef, publisher)
.
We would also push new elements into the channel to feed the enumerator (i.e the data producer).
How to transform and merge sources
As we’re doing several Twitter searches, we have in result several sources. We will use a merge to have a single stream, containing all messages, so we can consume them more easily.
Note : At this stage, we could consume messages directly using a sink (an Akka Stream data consumer), for example source.runWith(Sink.foreach(println))
In comparison, with Iteratee would transform and merge streams like that :
The advantage of Akka Streams here is that we don’t need to use an intermediate structure like Enumeratee, we can just use a good old map
function.
Right now, the merge is really simpler with Iteratee. But there is already an issue opened in github (yes, another one) to be able to merge a stream of streams in a simple line, like streams.flatten(FlattenStrategy.concat)
.
Push to EventSource
To be compatible with Play controllers, we still need to use Enumerator and Iteratee objects.
Then we can use conversions included in Play 2.4 :
That’all!
You can see all the code (including details of the sourceToEnumerator
method) in this github project.
You can create an application account to configure the Twitter API client on the Twitter apps page.
Then open your browser on http://localhost:9000/liveTweets?query=java&query=ruby
(for example) to see the stream live!
In conclusion, I’ve tried to note a few good/bad points for each library :
Akka-stream
++ Very flexible, while the API is quite high level, we can easily go to a lower level with actors for more specific needs
++ It’s easy to defines flow and transformations on sources (need to use an Enumerattee to transform an input (Enumerator) in Iteratee API)
– A little more verbose / needs more boilerplate code for some basic stuffs. But Akka Streams is very young and will continue to provide new features quickly
Iteratee
++ Very good for this use case (broadcast and channel are very handy, merge is very easy) – Low level / complex cases can be more difficult
Both are very good stream processing libraries, with high level API and good capabilities to handle back pressure.
Note that I’m just beginning with Akka Streams, if you find better ideas to achieve this, please tell me :)