A Streaming Twitter Bot with Guzzle

There's been something bugging me about interacting with Twitter through its API. Sure, it's easy enough to pull data for records, even if you're building your own OAuth request by hand. It's not too hard to take that logic a step further and post tweets on a schedule either, something that thousands (if not tens of thousands) of 'bot' authors have already done. But what about realtime responses? Like, a script that responds immediately to triggers instead of waiting for a periodic check?

Twitter has an endpoint for that. Their streaming API does not have the wide variety of options that the REST API has; it is mostly limited to searches, users, and groups of users. That is enough for a simple fetching bot.

The first step is the most difficult: authenticating, connecting, and streaming activities from Twitter. I decided to use a library this time around instead of attempting to write everything from scratch (again), in the hope that there would be some built-in goodies for detecting events. I went with Guzzle, hoping that it would be able to handle Twitter's stream. It did. Well, it kinda did.

```php
<?php
require_once __DIR__ . '/vendor/autoload.php';
// yes, I used Composer for great autoloading

use Guzzle\Http\Client;
use Guzzle\Plugin\Oauth\OauthPlugin;
use Guzzle\Stream\PhpStreamRequestFactory;

// use 'normal' OAuth creds to connect to the streaming API via the plugin
$streaming_client = new Client('https://userstream.twitter.com/1.1');
$oauth = new OauthPlugin(array(
    'consumer_key'    => 'KEY',
    'consumer_secret' => 'SECRET',
    'token'           => 'TOKEN',
    'token_secret'    => 'TOKEN_SECRET',
));
$streaming_client->addSubscriber($oauth);

// an endpoint to pull all tweets relevant to the user in scope
$request = $streaming_client->get('user.json');
```

Guzzle handled all of the special cURL OAuth logic, which was pretty nice compared to how much tinkering it took to get my previous connection up and running. With just a handful of lines I have the request configured and authentication set, and I'm ready to open up a stream with Twitter.
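Guzzle's OauthPlugin hides the signing step entirely. For a sense of what it computes on each request, here is a minimal sketch of OAuth 1.0a HMAC-SHA1 signing as laid out in RFC 5849. This is not the plugin's own code: `oauth_signature()` is a hypothetical helper, the nonce and timestamp are placeholders, and building the final `Authorization: OAuth ...` header from the result is left out.

```php
<?php
// Hypothetical helper sketching OAuth 1.0a HMAC-SHA1 signing (RFC 5849).
// Not Guzzle's OauthPlugin code. PHP arrays can't hold repeated parameter
// names, so this sketch doesn't handle duplicate keys.
function oauth_signature(
    string $method,
    string $baseUrl,
    array $params,
    string $consumerSecret,
    string $tokenSecret
): string {
    // Percent-encode every name and value (rawurlencode follows RFC 3986).
    $pairs = array();
    foreach ($params as $name => $value) {
        $pairs[] = array(rawurlencode((string) $name), rawurlencode((string) $value));
    }

    // Sort by encoded name, then by encoded value (byte order).
    usort($pairs, function (array $a, array $b): int {
        return strcmp($a[0], $b[0]) ?: strcmp($a[1], $b[1]);
    });

    // Normalized parameter string: name=value pairs joined with '&'.
    $normalized = implode('&', array_map(function (array $p): string {
        return $p[0] . '=' . $p[1];
    }, $pairs));

    // Signature base string: METHOD & encoded URL & encoded parameter string.
    $base = strtoupper($method) . '&' . rawurlencode($baseUrl) . '&' . rawurlencode($normalized);

    // Signing key: encoded consumer secret & encoded token secret.
    $key = rawurlencode($consumerSecret) . '&' . rawurlencode($tokenSecret);

    return base64_encode(hash_hmac('sha1', $base, $key, true));
}

// Placeholder credentials and request values; a real request uses a random
// nonce and the current time().
$params = array(
    'oauth_consumer_key'     => 'KEY',
    'oauth_nonce'            => 'abc123',
    'oauth_signature_method' => 'HMAC-SHA1',
    'oauth_timestamp'        => '1400000000',
    'oauth_token'            => 'TOKEN',
    'oauth_version'          => '1.0',
);
$url = 'https://userstream.twitter.com/1.1/user.json';
$signature = oauth_signature('GET', $url, $params, 'SECRET', 'TOKEN_SECRET');
echo $signature, PHP_EOL;
```

Because the parameters are sorted before signing, the order they arrive in doesn't matter, but any change to the method, the URL, or a single parameter produces a different signature. That is the bookkeeping the plugin saves you from.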

```php
// open up a stream based on the request
$factory = new PhpStreamRequestFactory();
$stream = $factory->fromRequest($request);

// loop through the stream and do stuff
while (!$stream->feof()) {
    $line = $stream->readLine();
    $message = json_decode($line, true);
}
```

This is where things start to get hairy. The issue comes from Twitter moving away from HTTP/1.0 for their streaming responses. HTTP/1.1 allows them to use chunked transfers, something that most PHP functions do not like. There are a few ways around chunked transfers, using raw sockets and such, but they don't play nicely with Guzzle, and I didn't want to dive into the core of the Guzzle library and gut things out. That's not a great option when working with dependencies.
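For context, HTTP/1.1 chunked transfer encoding (RFC 7230, section 4.1) sends the body as a series of chunks: a hexadecimal size line, CRLF, that many bytes of data, and another CRLF, ending with a zero-size chunk. The sketch below decodes a complete, already-buffered chunked body just to show the framing. `decode_chunked()` is a hypothetical helper, it ignores trailer headers, and it isn't something the bot needs: a streaming response never ends, so a real client has to undo the framing incrementally.

```php
<?php
// Hypothetical helper: decode a complete HTTP/1.1 chunked body held in memory.
// Sketch only: chunk extensions are dropped and trailer headers are ignored.
function decode_chunked(string $body): string
{
    $decoded = '';
    $offset = 0;
    while (true) {
        if ($offset >= strlen($body)) {
            throw new UnexpectedValueException('Chunked body ended without a last chunk');
        }
        $lineEnd = strpos($body, "\r\n", $offset);
        if ($lineEnd === false) {
            throw new UnexpectedValueException('Missing CRLF after chunk size');
        }
        // The size line is hex; anything after ';' is a chunk extension.
        $sizeLine = substr($body, $offset, $lineEnd - $offset);
        $size = hexdec(trim(explode(';', $sizeLine)[0]));
        if ($size == 0) {
            break; // zero-size "last chunk" ends the body
        }
        $decoded .= substr($body, $lineEnd + 2, $size);
        // Skip the size line's CRLF, the data, and the CRLF after the data.
        $offset = $lineEnd + 2 + $size + 2;
    }
    return $decoded;
}

$decoded = decode_chunked("4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n");
echo $decoded, PHP_EOL;
```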

The heart of the issue, I think, is getting the stream to detect the end of one line and the beginning of the next; in this response, each line is a status update or an event. The script above would hang on $stream->readLine() and never move forward, since it never finished reading a line. I got around this by limiting how much I pulled at a time, effectively chunking the chunked response, and manually looking for the line endings. It's not pretty, but it worked.

```php
// loop through the stream and do stuff
$line = '';
while (!$stream->feof()) {
    $line .= $stream->readLine(512); // only read a small slice (up to 512 bytes) at a time

    // look for a line break: if there is one, manually break it off and decode the message
    while (strpos($line, "\r\n") !== false) {
        list($message, $line) = explode("\r\n", $line, 2);
        $message = json_decode($message, true);
    }
}
```

A while loop inside of a while loop? Well, I guess there is a first time for everything.
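The buffering trick doesn't depend on Guzzle, so it can be pulled out and tested on its own. Here is a minimal sketch with a hypothetical `split_messages()` helper: it appends each new slice to a buffer, returns every complete "\r\n"-terminated line as a decoded array, and keeps the unfinished tail for the next read. It also skips blank lines, which Twitter's streams sent as keep-alives, instead of handing them to json_decode(). The sample slices are made up.

```php
<?php
// Hypothetical helper: split a growing buffer into complete "\r\n"-delimited messages.
// $buffer is passed by reference so the partial tail survives between reads.
function split_messages(string &$buffer, string $slice): array
{
    $buffer .= $slice;
    $messages = array();
    while (($pos = strpos($buffer, "\r\n")) !== false) {
        $line = substr($buffer, 0, $pos);
        $buffer = substr($buffer, $pos + 2);
        if (trim($line) === '') {
            continue; // keep-alive blank line, nothing to decode
        }
        $decoded = json_decode($line, true);
        if (is_array($decoded)) {
            $messages[] = $decoded;
        }
    }
    return $messages;
}

// Simulate a stream whose slices cut a message in half.
$buffer = '';
$slices = array("\r\n{\"id_str\":\"1\",\"te", "xt\":\"hello\"}\r\n{\"id_str\":\"2\"}", "\r\n");
$received = array();
foreach ($slices as $slice) {
    foreach (split_messages($buffer, $slice) as $message) {
        $received[] = $message;
    }
}
echo count($received), PHP_EOL;
```

Inside the real loop, the slice would come from `$stream->readLine(512)` and each returned array would go to the detection code.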

Now that we have well-formed objects inside of that loop, it is trivial to detect what each message is and whether we want to respond to it. In this example we'll just respond with 'hi' to every individual who tweets at the bot.

```php
// a second client for the REST API, signed with the same OAuth plugin
$response_client = new Client('https://api.twitter.com/1.1');
$response_client->addSubscriber($oauth);

// loop through the stream and do stuff
$line = '';
while (!$stream->feof()) {
    $line .= $stream->readLine(512);

    // look for a line break: if there is one, manually break it off and decode the message
    while (strpos($line, "\r\n") !== false) {
        list($message, $line) = explode("\r\n", $line, 2);
        $message = json_decode($message, true);

        // only respond to tweets replying directly to the bot (SCREEN_NAME is its handle)
        if (isset($message['in_reply_to_screen_name']) && $message['in_reply_to_screen_name'] === 'SCREEN_NAME') {
            $return_message = ".@{$message['user']['screen_name']} hi!";
            $response_tweet = $response_client->post('statuses/update.json', array(), array(
                'status'                => $return_message,
                'in_reply_to_status_id' => $message['id_str'],
            ));
            $response_tweet->send();
        }
    }
}
```

The only thing worth pointing out in this script is the '.' prepended to the status reply. That is thanks to a quirk in Guzzle: it auto-detects POST values that begin with an '@' symbol and assumes the rest of the string is a file to upload. As far as I can tell, this comes from PHP's legacy cURL convention, where a POST field starting with '@' names a file. That doesn't work well with Twitter replies, which naturally start with '@'.
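Detection and reply-building don't need the network either. Here is a minimal sketch that pulls them out of the loop into a hypothetical `build_reply()` helper (PHP 7.1+ for the nullable types): it returns the POST fields for statuses/update.json when a decoded message is a reply to the bot, and null for everything else, such as events, deletes, keep-alives, or replies to other accounts. The screen names and IDs in the usage are made up.

```php
<?php
// Hypothetical helper: decide whether to answer a decoded stream message and,
// if so, build the POST fields for statuses/update.json.
function build_reply(?array $message, string $botScreenName): ?array
{
    if ($message === null
        || !isset($message['in_reply_to_screen_name'], $message['user']['screen_name'], $message['id_str'])
        || $message['in_reply_to_screen_name'] !== $botScreenName
    ) {
        return null; // not a tweet aimed at the bot
    }
    return array(
        // the leading '.' keeps the value from starting with '@',
        // so Guzzle won't treat it as a file path
        'status'                => ".@{$message['user']['screen_name']} hi!",
        'in_reply_to_status_id' => $message['id_str'],
    );
}

$tweet = array(
    'id_str'                  => '12345',
    'in_reply_to_screen_name' => 'examplebot',
    'user'                    => array('screen_name' => 'alice'),
);
$reply = build_reply($tweet, 'examplebot');
$ignored = build_reply(array('event' => 'follow'), 'examplebot');
var_dump($reply, $ignored);
```

Inside the stream loop, the `if` block then shrinks to posting the returned fields whenever they aren't null, e.g. `$response_client->post('statuses/update.json', array(), $fields)->send();`.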

And that does it! While running, this script opens a connection with the streaming API, waits for people to tweet directly at an account (a reply to its screen name), and replies with a simple 'hi!' back. For a more complex example, check out @fetchmeaphoto, a bot that fetches Bigstock results via Twitter (Bigstock Twitter Bot). It's very basic and pretty messy, but it works (when I have the process running, that is).

This works for a basic realtime bot, yet it's far from perfect. The stream may fail, data may get lost, duplicate statuses may be sent, and having the fetch, detection, and response all in one place is just asking for failures. I'm planning on building a realtime bot boilerplate in the near future that can keep multiple streams open (for redundancy), make backup REST requests for verification, and use a queueing system for outgoing tweets. But this works for now.
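As a rough idea of one piece of that boilerplate, here is a minimal sketch of an outgoing queue that drops duplicates by the ID of the tweet being answered, so a status seen twice (say, from two redundant streams) only gets one reply. The `ReplyQueue` class and its cap on remembered IDs are hypothetical, not part of the bot above, and actually sending the queued tweets is left out.

```php
<?php
// Hypothetical sketch: an outgoing reply queue that skips statuses already answered.
class ReplyQueue
{
    /** @var SplQueue */
    private $queue;
    /** @var array status IDs already queued, in insertion order */
    private $seen = array();
    /** @var int */
    private $maxSeen;

    public function __construct(int $maxSeen = 10000)
    {
        $this->queue = new SplQueue();
        $this->maxSeen = $maxSeen;
    }

    // Queue a reply; returns false if a reply to that status was already queued.
    public function push(array $reply): bool
    {
        $id = (string) $reply['in_reply_to_status_id'];
        if (isset($this->seen[$id])) {
            return false;
        }
        $this->seen[$id] = true;
        if (count($this->seen) > $this->maxSeen) {
            // PHP arrays keep insertion order, so the first key is the oldest ID.
            reset($this->seen);
            unset($this->seen[key($this->seen)]);
        }
        $this->queue->enqueue($reply);
        return true;
    }

    // Next reply to send, or null when the queue is empty.
    public function pop(): ?array
    {
        return $this->queue->isEmpty() ? null : $this->queue->dequeue();
    }
}

$queue = new ReplyQueue();
$first = $queue->push(array('status' => '.@alice hi!', 'in_reply_to_status_id' => '12345'));
$dupe  = $queue->push(array('status' => '.@alice hi!', 'in_reply_to_status_id' => '12345'));
$next  = $queue->pop();
$empty = $queue->pop();
var_dump($first, $dupe, $empty);
```

Capping the remembered IDs keeps memory bounded on a long-running process, at the cost of possibly answering a very old status twice if it somehow reappears.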