Parsing Twitter Feeds with PHP
The most difficult part of pulling information from Twitter's 1.1 API is the actual request. I've covered how to create the OAuth request in some previous posts: making a basic OAuth request and passing in extra parameters. Once you get the information back, though, what do you do with it?
Twitter will return JSON for their requests, which is easy to parse with PHP and allows for some relatively deep nesting of data. You'll need to decode it first with a simple function call.
$response = curl_exec($curl_request); // from the previous posts
$response = json_decode($response);
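If you want a little more safety around that call, here is a minimal sketch. The function name decode_twitter_response is made up for this example, and it assumes PHP 5.5 or later (for json_last_error_msg). The JSON_BIGINT_AS_STRING flag is optional, but it keeps large tweet ids from turning into floats on 32-bit builds.

```php
<?php
// Hypothetical helper (not from the earlier posts): wraps json_decode()
// with an error check so a bad response fails loudly.
function decode_twitter_response($raw)
{
    // JSON_BIGINT_AS_STRING keeps ids that overflow a native integer
    // as strings instead of converting them to floats.
    $decoded = json_decode($raw, false, 512, JSON_BIGINT_AS_STRING);

    if (json_last_error() !== JSON_ERROR_NONE) {
        throw new RuntimeException(
            'Could not decode Twitter response: ' . json_last_error_msg()
        );
    }

    return $decoded;
}

// Usage with a trimmed-down, hand-written response body.
$tweets = decode_twitter_response('[{"id":305697550890041344,"text":"Hello"}]');
echo $tweets[0]->text, "\n";
```

The id_str field is still the safer thing to store, since it is always a string no matter how the decode went.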
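Below is a dump of a response after being decoded. Since json_decode was called without its second parameter, the result is an array of stdClass objects, not an associative array. I limited the response, as each tweet object can take up a lot of room.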
array(20) {
[0]=>
object(stdClass)#25 (21) {
["created_at"]=>
string(30) "Sun Feb 24 15:15:50 +0000 2013"
["id"]=>
float(3.0569755089004E+17)
["id_str"]=>
string(18) "305697550890041344"
["text"]=>
string(87) "Think I'm failing at the weekend thing. (@ Blue Door Consulting) http://t.co/JY3XSRGnyT"
["source"]=>
string(61) "foursquare"
["truncated"]=>
bool(false)
["in_reply_to_status_id"]=>
NULL
["in_reply_to_status_id_str"]=>
NULL
["in_reply_to_user_id"]=>
NULL
["in_reply_to_user_id_str"]=>
NULL
["in_reply_to_screen_name"]=>
NULL
["user"]=>
object(stdClass)#24 (39) {
["id"]=>
int(26515074)
["id_str"]=>
string(8) "26515074"
["name"]=>
string(13) "Jacob Emerick"
["screen_name"]=>
string(8) "jpemeric"
["location"]=>
string(12) "Appleton, WI"
["description"]=>
string(132) "I'm a web programmer, developer, hiker, innovator, and a young man. Currently working at Blue Door Consulting as a programming poet."
["url"]=>
string(29) "https://home.jacobemerick.com/"
["entities"]=>
object(stdClass)#23 (2) {
["url"]=>
object(stdClass)#22 (1) {
["urls"]=>
array(1) {
[0]=>
object(stdClass)#21 (3) {
["url"]=>
string(29) "https://home.jacobemerick.com/"
["expanded_url"]=>
NULL
["indices"]=>
array(2) {
[0]=>
int(0)
[1]=>
int(29)
}
}
}
}
["description"]=>
object(stdClass)#20 (1) {
["urls"]=>
array(0) {
}
}
}
["protected"]=>
bool(false)
["followers_count"]=>
int(269)
["friends_count"]=>
int(259)
["listed_count"]=>
int(19)
["created_at"]=>
string(30) "Wed Mar 25 15:06:41 +0000 2009"
["favourites_count"]=>
int(20)
["utc_offset"]=>
int(-21600)
["time_zone"]=>
string(26) "Central Time (US & Canada)"
["geo_enabled"]=>
bool(false)
["verified"]=>
bool(false)
["statuses_count"]=>
int(4368)
["lang"]=>
string(2) "en"
["contributors_enabled"]=>
bool(false)
["is_translator"]=>
bool(false)
["profile_background_color"]=>
string(6) "FAFAFA"
["profile_background_image_url"]=>
string(81) "http://a0.twimg.com/profile_background_images/333242976/huron-mountain-sunset.jpg"
["profile_background_image_url_https"]=>
string(83) "https://si0.twimg.com/profile_background_images/333242976/huron-mountain-sunset.jpg"
["profile_background_tile"]=>
bool(false)
["profile_image_url"]=>
string(78) "http://a0.twimg.com/profile_images/2286764626/xnzz7ejexbfr3p8jwarp_normal.jpeg"
["profile_image_url_https"]=>
string(80) "https://si0.twimg.com/profile_images/2286764626/xnzz7ejexbfr3p8jwarp_normal.jpeg"
["profile_banner_url"]=>
string(57) "https://si0.twimg.com/profile_banners/26515074/1347979073"
["profile_link_color"]=>
string(6) "098F99"
["profile_sidebar_border_color"]=>
string(6) "B5B5B5"
["profile_sidebar_fill_color"]=>
string(6) "D8D8D8"
["profile_text_color"]=>
string(6) "000000"
["profile_use_background_image"]=>
bool(true)
["default_profile"]=>
bool(false)
["default_profile_image"]=>
bool(false)
["following"]=>
bool(false)
["follow_request_sent"]=>
bool(false)
["notifications"]=>
bool(false)
}
["geo"]=>
NULL
["coordinates"]=>
NULL
["place"]=>
NULL
["contributors"]=>
NULL
["retweet_count"]=>
int(0)
["entities"]=>
object(stdClass)#19 (3) {
["hashtags"]=>
array(0) {
}
["urls"]=>
array(1) {
[0]=>
object(stdClass)#18 (4) {
["url"]=>
string(22) "http://t.co/JY3XSRGnyT"
["expanded_url"]=>
string(21) "http://4sq.com/XUtlYf"
["display_url"]=>
string(14) "4sq.com/XUtlYf"
["indices"]=>
array(2) {
[0]=>
int(65)
[1]=>
int(87)
}
}
}
["user_mentions"]=>
array(0) {
}
}
["favorited"]=>
bool(false)
["retweeted"]=>
bool(false)
["possibly_sensitive"]=>
bool(false)
}
[1]=>
object(stdClass)#8 (21) {
["created_at"]=>
string(30) "Sun Feb 24 01:01:37 +0000 2013"
["id"]=>
…
With the return being a standard PHP array it's really easy to loop through the individual entries. Each tweet can be referenced by a standard object call. For example, if you wanted to display the raw content of each tweet with the date it was posted you would do something like this.
foreach($response as $tweet)
{
    echo "{$tweet->text} {$tweet->created_at}\n";
}
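The created_at string is not very friendly on its own. Here is a rough sketch of one way to reformat it, assuming the date always looks like the one in the dump above. The function name format_tweet_date and the default output format are just choices made for this example.

```php
<?php
// Hypothetical helper: turns Twitter's created_at string into a
// friendlier format, falling back to the raw string if parsing fails.
function format_tweet_date($created_at, $format = 'F j, Y g:ia')
{
    // The pattern mirrors "Sun Feb 24 15:15:50 +0000 2013".
    $date = DateTime::createFromFormat('D M d H:i:s O Y', $created_at);

    if ($date === false) {
        return $created_at;
    }

    return $date->format($format);
}

echo format_tweet_date('Sun Feb 24 15:15:50 +0000 2013'), "\n";
// February 24, 2013 3:15pm
```

The time stays in the +0000 offset that Twitter sends. Calling setTimezone on the DateTime object before formatting would be the place to shift it to a local zone.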
Enhancing the Tweet
If you just spit out the text property you are missing out on a lot, though. Tweets have a lot of rich entities attached to them, like usernames, urls, hashtags, and media objects. On twitter.com and well-formed Twitter clients these entities are hyperlinked to other resources. You can click on a hashtag within a tweet to start a search for that term and check out a conversation. That's where the entities object comes in.
For each type of entity (user_mentions, hashtags, urls, and media) there is a collection of data to do basic 'search and replace' on the default text field. For example, each 'urls' object contains a 'url' (the before), 'expanded_url' (what Twitter recognizes as the pre-shortened URL), 'display_url' (what Twitter uses as the anchor text for each link), and a pair of indices for the start and end points of the recognized entity. You have everything you need to create a rich, linked tweet and not a flat piece of text.
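To see what the indices mean, here is a small sketch using the text and the urls entity from the dump above. The function name entity_text is made up for this example. It uses plain substr, which is fine here because this tweet is pure ASCII.

```php
<?php
// Hypothetical helper: returns the slice of the tweet text that an
// entity's indices point at. indices[0] is the start offset and
// indices[1] is the offset just past the end.
function entity_text($text, array $indices)
{
    return substr($text, $indices[0], $indices[1] - $indices[0]);
}

// Values copied from the dumped tweet.
$text = "Think I'm failing at the weekend thing. (@ Blue Door Consulting) http://t.co/JY3XSRGnyT";
$indices = array(65, 87);

echo entity_text($text, $indices), "\n";
// http://t.co/JY3XSRGnyT
```

The slice matches the entity's url property exactly, which is what makes an index-based replacement possible.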
There are two ways you can enhance each entity. The first obvious method would be a simple str_replace.
foreach($response as $tweet)
{
    // start from the raw tweet text
    $text = $tweet->text;
    foreach($tweet->entities->urls as $urls_object)
    {
        $search = $urls_object->url;
        $replace = "<a href=\"{$urls_object->url}\" title=\"{$urls_object->expanded_url}\">{$urls_object->display_url}</a>";
        $text = str_replace($search, $replace, $text);
    }
}
This works great until you run into more complex cases. If a tweet goes something like "This is a ridiculous but possible #case http://domain.com/use#case", then you're going to have a bad time. Extending the previous chunk of code to hashtags would replace both instances of '#case', and the markup may get all messed up (depending on what order you replace the entities). This is why Twitter includes the indices: they let you replace defined pieces of the text and not depend on searching. You have to do the replacements in reverse order, though, because each replacement changes the length of the text and shifts every offset after it. Start from the front and you'll be replacing the wrong chunks of text.
$hashtag_link_pattern = '<a href="http://twitter.com/search?q=%%23%s&src=hash" rel="nofollow" target="_blank">#%s</a>';
$url_link_pattern = '<a href="%s" rel="nofollow" target="_blank" title="%s">%s</a>';
$user_mention_link_pattern = '<a href="http://twitter.com/%s" rel="nofollow" target="_blank" title="%s">@%s</a>';
$media_link_pattern = '<a href="%s" rel="nofollow" target="_blank" title="%s">%s</a>';
foreach($response as $tweet)
{
    $text = $tweet->text;
    $entity_holder = array();
    foreach($tweet->entities->hashtags as $hashtag)
    {
        $entity = new stdClass();
        $entity->start = $hashtag->indices[0];
        $entity->end = $hashtag->indices[1];
        $entity->length = $hashtag->indices[1] - $hashtag->indices[0];
        $entity->replace = sprintf($hashtag_link_pattern, strtolower($hashtag->text), $hashtag->text);
        $entity_holder[$entity->start] = $entity;
    }
    foreach($tweet->entities->urls as $url)
    {
        $entity = new stdClass();
        $entity->start = $url->indices[0];
        $entity->end = $url->indices[1];
        $entity->length = $url->indices[1] - $url->indices[0];
        $entity->replace = sprintf($url_link_pattern, $url->url, $url->expanded_url, $url->display_url);
        $entity_holder[$entity->start] = $entity;
    }
    foreach($tweet->entities->user_mentions as $user_mention)
    {
        $entity = new stdClass();
        $entity->start = $user_mention->indices[0];
        $entity->end = $user_mention->indices[1];
        $entity->length = $user_mention->indices[1] - $user_mention->indices[0];
        $entity->replace = sprintf($user_mention_link_pattern, strtolower($user_mention->screen_name), $user_mention->name, $user_mention->screen_name);
        $entity_holder[$entity->start] = $entity;
    }
    // the media key is missing from tweets like the one dumped above, so guard against it
    $media_entities = isset($tweet->entities->media) ? $tweet->entities->media : array();
    foreach($media_entities as $media)
    {
        $entity = new stdClass();
        $entity->start = $media->indices[0];
        $entity->end = $media->indices[1];
        $entity->length = $media->indices[1] - $media->indices[0];
        $entity->replace = sprintf($media_link_pattern, $media->url, $media->expanded_url, $media->display_url);
        $entity_holder[$entity->start] = $entity;
    }
    // sort by start index, highest first, so earlier offsets stay valid
    krsort($entity_holder);
    foreach($entity_holder as $entity)
    {
        $text = substr_replace($text, $entity->replace, $entity->start, $entity->length);
    }
}
This is a bit verbose but allows a lot of flexibility in terms of formatting and updates. A few functions could make this quite a bit more DRY.
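One possible way to dry it up is sketched below. It assumes PHP 5.3 or later for the closures. The names link_entities and $renderers are invented for this example, and only two entity types are wired up to keep it short. The idea is that each entity type only needs to say how it renders, while the collecting, sorting, and replacing happen once.

```php
<?php
// Hypothetical helper: builds the linked text for one tweet.
// $renderers maps an entity type to a callback that returns its HTML.
function link_entities($tweet, array $renderers)
{
    $replacements = array();

    foreach ($renderers as $type => $render) {
        // Some entity types may be missing from a given tweet.
        $entities = isset($tweet->entities->$type) ? $tweet->entities->$type : array();

        foreach ($entities as $entity) {
            $start = $entity->indices[0];
            $replacements[$start] = array(
                'length' => $entity->indices[1] - $start,
                'html'   => $render($entity),
            );
        }
    }

    // Highest start index first, so earlier offsets stay valid.
    krsort($replacements);

    $text = $tweet->text;
    foreach ($replacements as $start => $replacement) {
        $text = substr_replace($text, $replacement['html'], $start, $replacement['length']);
    }

    return $text;
}

$renderers = array(
    'hashtags' => function ($hashtag) {
        return sprintf(
            '<a href="http://twitter.com/search?q=%%23%s&src=hash">#%s</a>',
            strtolower($hashtag->text),
            $hashtag->text
        );
    },
    'urls' => function ($url) {
        return sprintf('<a href="%s" title="%s">%s</a>', $url->url, $url->expanded_url, $url->display_url);
    },
);

// A hand-written tweet, not real API output.
$tweet = json_decode('{"text":"Reading about #php at http://t.co/abc","entities":{"hashtags":[{"text":"php","indices":[14,18]}],"urls":[{"url":"http://t.co/abc","expanded_url":"http://example.com/","display_url":"example.com","indices":[22,37]}]}}');

echo link_entities($tweet, $renderers), "\n";
```

Adding user_mentions or media is then just one more entry in $renderers.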
NOTE: if you have any issues with multibyte characters check out this comment for help.
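For a rough idea of what a multibyte-safe replacement could look like, here is a sketch. It assumes the mbstring extension is loaded and that the text is UTF-8. As far as I understand, Twitter counts its indices in characters, while substr_replace counts bytes. The function name splice_utf8 is made up for this example.

```php
<?php
// Hypothetical helper: like substr_replace(), but $start and $length
// are counted in UTF-8 characters instead of bytes.
function splice_utf8($text, $replacement, $start, $length)
{
    $before = mb_substr($text, 0, $start, 'UTF-8');
    $after  = mb_substr($text, $start + $length, null, 'UTF-8');

    return $before . $replacement . $after;
}

// "Café #php" written with escaped bytes so the file encoding does not matter.
$text = "Caf\xC3\xA9 #php";

// The hashtag sits at character offsets 5 to 9.
echo splice_utf8($text, '<b>#php</b>', 5, 4), "\n";
```

With plain substr_replace the same offsets land one byte too early, because the accented character takes up two bytes.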
More with Tweet Data
There is a lot more that you can do with the data returned by Twitter: user information, replies (conversations), retweet_count (a limited metric of engagement), or even the source field to display how you're sending in the tweets. One disappointing thing about the data is favorites. Right now a tweet list does not contain a favorite count. You can get favorites from user objects or individual tweets, but not a list.
Anyways, I hope this helps out if you're planning on moving forward with Twitter integration. I've done many of these things on my lifestream website. Twitter is definitely one of the richer and easier APIs to deal with for reading once you get past the OAuth, especially compared to some of the nastier Google Data setups. Happy coding!