Streaming Twitter in PHP from Scratch

Over the last few months I've been playing around with Twitter's streaming API. I'm still at the tinkering stage, with a grand project in mind that's slowly taking shape. For a while I thought a steady stream of information was something that PHP just couldn't handle. Then I bumped into Guzzle and was able to quickly mock up a streaming bot. There were a few caveats with how the internals of Guzzle work, limitations that the latest version (4) addresses. Still, it left me wondering just how hard it would be to build a streaming client from scratch.

It turned out to be really easy. Thanks to my earlier work with Twitter authentication I was able to recycle a bunch of logic. In fact, it opened up a rather interesting possibility - more on that later.

The first thing is to build the OAuth header. This is done the same way the REST API authenticates, with an HMAC-SHA1 signature. The only real difference between this and REST API authentication is the endpoint. Instead of 'https://api.twitter.com' I need to point at 'https://userstream.twitter.com'. That's it.

    // define the configuration
    $api_key = API_KEY;
    $api_secret = API_SECRET;
    $access_token = ACCESS_TOKEN;
    $access_token_secret = ACCESS_TOKEN_SECRET;
    $timestamp = time();
    $nonce = md5(mt_rand()); // this can be any hash you prefer

    // first we need to build the signature
    // note: custom parameters need to be added in alphabetical order
    $oauth_hash_array = [
        'oauth_consumer_key' => $api_key,
        'oauth_nonce' => $nonce,
        'oauth_signature_method' => 'HMAC-SHA1',
        'oauth_timestamp' => $timestamp,
        'oauth_token' => $access_token,
        'oauth_version' => '1.0',
        'replies' => 'all',
    ];
    // RFC 3986 encoding keeps spaces as %20, which is what OAuth signing expects
    $oauth_hash = http_build_query($oauth_hash_array, '', '&', PHP_QUERY_RFC3986);

    // signature base string: method, url, and parameters, each encoded and joined by '&'
    $base_array = [
        'GET',
        'https://userstream.twitter.com/1.1/user.json',
        $oauth_hash,
    ];
    $base_array = array_map('rawurlencode', $base_array);
    $base = implode('&', $base_array);

    // signing key: consumer secret and token secret joined by '&'
    $key_array = [
        $api_secret,
        $access_token_secret,
    ];
    $key_array = array_map('rawurlencode', $key_array);
    $key = implode('&', $key_array);

    $signature = base64_encode(hash_hmac('sha1', $base, $key, true));
    $signature = rawurlencode($signature);

    // next we need to build the oauth header
    // note: custom parameters need to be added in alphabetical order
    $oauth_header_array = [
        'oauth_consumer_key' => $api_key,
        'oauth_nonce' => $nonce,
        'oauth_signature' => $signature,
        'oauth_signature_method' => 'HMAC-SHA1',
        'oauth_timestamp' => $timestamp,
        'oauth_token' => $access_token,
        'oauth_version' => '1.0',
        'replies' => 'all',
    ];
    array_walk($oauth_header_array, function(&$value, $key) {
        $value = "{$key}=\"{$value}\"";
    });
    $oauth_header_array = array_values($oauth_header_array);
    // separator comes first; the reversed argument order was removed in PHP 8
    $oauth_header = implode(',', $oauth_header_array);

All of this is very similar to the authentication building that was explained in my original Twitter REST API post. I've modernized a few things, using the cool new bracket syntax and a dedicated nonce value, which is all overshadowed by the blatant procedural nature of the entire script. I've mentioned it before - this whole process is just begging to be put into a class structure. Eh, this is a proof of concept.
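As a first step toward that structure, here is a minimal sketch that folds the signing steps into a single function. The name build_oauth_header and its parameter layout are hypothetical, not part of the original script. The nonce and timestamp are passed in so the result is repeatable. One deliberate difference, which is an assumption on my reading of OAuth 1.0 rather than something the script above does: request parameters such as 'replies' are included in the signature but left out of the header itself.

```php
<?php
// Hypothetical helper (not from the original script): wraps the signing steps
// in one function so the nonce and timestamp can be injected by the caller.
function build_oauth_header(
    string $method,
    string $url,
    array $query,        // request parameters, e.g. ['replies' => 'all']
    array $credentials,  // api_key, api_secret, access_token, access_token_secret
    string $nonce,
    int $timestamp
): string {
    $oauth = [
        'oauth_consumer_key'     => $credentials['api_key'],
        'oauth_nonce'            => $nonce,
        'oauth_signature_method' => 'HMAC-SHA1',
        'oauth_timestamp'        => (string) $timestamp,
        'oauth_token'            => $credentials['access_token'],
        'oauth_version'          => '1.0',
    ];

    // the signature covers the oauth values plus the request parameters, sorted by key
    $signed = array_merge($oauth, $query);
    ksort($signed);
    $parameter_string = http_build_query($signed, '', '&', PHP_QUERY_RFC3986);

    // base string: METHOD&url&parameters, each part percent-encoded
    $base = implode('&', array_map('rawurlencode', [
        strtoupper($method),
        $url,
        $parameter_string,
    ]));

    // signing key: consumer secret and token secret joined by '&'
    $key = rawurlencode($credentials['api_secret'])
        . '&' . rawurlencode($credentials['access_token_secret']);

    $oauth['oauth_signature'] = base64_encode(hash_hmac('sha1', $base, $key, true));
    ksort($oauth);

    // assumption: only the oauth_* values go into the header
    $pairs = [];
    foreach ($oauth as $name => $value) {
        $pairs[] = $name . '="' . rawurlencode($value) . '"';
    }

    return 'OAuth ' . implode(',', $pairs);
}
```

The return value would be sent as "Authorization: {$header}" through CURLOPT_HTTPHEADER, with time() and a fresh nonce supplied on each request.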

Once the header is built it's time to make a request. This is where the streaming nature of the script really comes in. There are a lot of ways to handle streams in PHP - this is one of my favorites.

    // dandy little closure to handle the incoming request
    // curl calls it once per received chunk and expects the number of bytes handled back
    $tweet_handler = function($curl_handle, $string) {
        $length = strlen($string);
        echo "Received {$length} bytes\n";
        echo "{$string}\n\n";
        flush();
        return $length;
    };

    // typical curl request with oauth header and closure call
    $curl_handle = curl_init();
    curl_setopt($curl_handle, CURLOPT_HTTPHEADER, ["Authorization: OAuth {$oauth_header}"]);
    curl_setopt($curl_handle, CURLOPT_HEADER, false);
    curl_setopt($curl_handle, CURLOPT_URL, 'https://userstream.twitter.com/1.1/user.json?replies=all');
    curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, true);
    // keep certificate verification on; turning it off exposes the request to interception
    curl_setopt($curl_handle, CURLOPT_SSL_VERIFYPEER, true);
    curl_setopt($curl_handle, CURLOPT_WRITEFUNCTION, $tweet_handler);
    curl_exec($curl_handle); // blocks for as long as the stream stays open
    curl_close($curl_handle);
    exit;

Yup, you can use an anonymous function to handle the incoming packets. There are several reasons this is amazing. First, you can do fancy use() logic to pass in extra utilities (database class, trigger to send outgoing requests, parsing, etc.) all from right here. Also, once this is broken out into a nice class structure, you can pass in the function as a parameter and never have output from the class (unless you want it) - the class can construct a request, execute it, and then just use the passed-in closure to deal with the response in its own scope.
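To make the use() idea concrete, here is a rough sketch of a factory that returns a write callback. The function name make_stream_handler is hypothetical. It assumes messages on the stream are separated by "\r\n" and that a single message may be split across several chunks, so the closure keeps a buffer by reference and only hands complete, decoded messages to whatever callable was passed in.

```php
<?php
// Hypothetical factory: returns a cURL write callback that buffers raw chunks
// and hands each complete, decoded message to $on_message.
function make_stream_handler(callable $on_message): Closure
{
    $buffer = '';

    return function ($curl_handle, string $chunk) use (&$buffer, $on_message): int {
        $buffer .= $chunk;

        // assumption: messages are separated by "\r\n"
        while (($position = strpos($buffer, "\r\n")) !== false) {
            $line = substr($buffer, 0, $position);
            $buffer = substr($buffer, $position + 2);

            if (trim($line) === '') {
                continue; // blank line between messages, nothing to decode
            }

            $message = json_decode($line, true);
            if (is_array($message)) {
                $on_message($message);
            }
        }

        // cURL aborts the transfer unless the callback reports every byte as handled
        return strlen($chunk);
    };
}
```

The returned closure can be passed straight to CURLOPT_WRITEFUNCTION, and the callable given to the factory is where a database class or outgoing request trigger would be pulled in with its own use() clause.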

Also, nothing here is terribly different from the REST logic. The OAuth signature and header are constructed the same way. The cURL request is basically the same. And the same handler function may be used on both the incoming messages from the stream and the full response from a REST request. There'd have to be some tweaks, as the best way to deal with the REST response is to json_decode it and loop through the expanded structure to pull out array/stdClass tweets, while the stream passes in JSON-encoded tweets one at a time, but that's an easy thing to work around.
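One possible shape for that workaround, sketched under the assumption that a REST body decodes to a JSON list of tweets and a streamed message decodes to a single JSON object. The helper name extract_tweets is made up for illustration.

```php
<?php
// Hypothetical helper: accepts either a REST body (a JSON list of tweets)
// or a single streamed message (one JSON object) and always returns a list.
function extract_tweets(string $json): array
{
    $decoded = json_decode($json, true);

    if (!is_array($decoded)) {
        return []; // not valid JSON, or a bare scalar
    }

    // a JSON list decodes to sequential integer keys; an object decodes to string keys
    $is_list = $decoded === []
        || array_keys($decoded) === range(0, count($decoded) - 1);

    return $is_list ? $decoded : [$decoded];
}
```

With this in place the same loop over the returned list can serve both sources, at the cost of decoding to arrays rather than stdClass objects.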

There were a few frustrations along the way. The streaming API is much more strict than the REST one and does not return detailed errors. For the longest time I was passing in the authorization header as 'Oauth' instead of 'OAuth', a difference that doesn't matter in the REST world, and was getting generic HTML responses from Twitter. There was also trouble with properly encoding the hash array before hashing it, which was no fun to debug either.
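One common form of that encoding trouble, offered as an illustration rather than a record of the exact bug: http_build_query defaults to form-style encoding, where a space becomes "+", while OAuth 1.0 signing expects RFC 3986 percent-encoding, where a space becomes "%20". The parameter value below is made up to show the difference.

```php
<?php
// a made-up parameter value with spaces and a reserved character, for illustration only
$params = ['track' => 'php & curl'];

// default encoding (RFC 1738): spaces become "+"
$form_encoded = http_build_query($params);

// RFC 3986 encoding, the form OAuth 1.0 signing expects: spaces become "%20"
$oauth_encoded = http_build_query($params, '', '&', PHP_QUERY_RFC3986);

echo $form_encoded, "\n";  // track=php+%26+curl
echo $oauth_encoded, "\n"; // track=php%20%26%20curl
```

If the two sides of the request disagree on this encoding, the signatures will not match and the request is rejected, which fits the generic failures described above.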

One more point: there doesn't seem to be any way to gain user context in a programmatic way. Ideally I want to use a single account and pull multiple streams from the 'user stream', which gives me access to all mentions of those accounts. To poll this endpoint I need user context. The easiest way to get this is to just use developer credentials from that account, but that means I'll need 'n' sets of credentials for 'n' accounts. Twitter's search endpoint does not guarantee to return all tweets, so if I want a reliable way to fetch all the mentions of a certain handle, this seems to be the only way.

If there are any questions or thoughts, feel free to leave a comment below or reach out to me via Twitter (@jpemeric). Enjoy twittering!