Parsing Twitter Feeds with PHP
The most difficult part of pulling information from Twitter's 1.1 API is the actual request. I've covered how to create the OAuth request in some previous posts: making a basic OAuth request and passing in extra parameters. Once you get the information back, though, what do you do with it?
Twitter returns JSON for these requests, which is easy to parse with PHP and allows for some relatively deep nesting of data. You'll need to decode it first with a simple function call.
$response = curl_exec($curl_request); // from the previous posts
$response = json_decode($response);
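If the request fails or returns something that isn't JSON, json_decode quietly returns null. Here is a minimal sketch of a more defensive decode step. The helper name decode_twitter_response and the trimmed sample body are made up for illustration; only json_decode, json_last_error, and json_last_error_msg are real PHP functions.

```php
<?php
// Hypothetical helper (not from the original post): decode a JSON body
// and fail loudly on bad input instead of handing null to later code.
function decode_twitter_response($raw_json)
{
    // JSON_BIGINT_AS_STRING keeps integers that overflow PHP's int type
    // as strings instead of converting them to lossy floats.
    $decoded = json_decode($raw_json, false, 512, JSON_BIGINT_AS_STRING);
    if ($decoded === null && json_last_error() !== JSON_ERROR_NONE) {
        throw new RuntimeException('Could not decode response: ' . json_last_error_msg());
    }
    return $decoded;
}

// Trimmed-down sample body with two fields from a single tweet.
$sample = '[{"id_str":"305697550890041344","text":"Think I\'m failing at the weekend thing."}]';
$tweets = decode_twitter_response($sample);
echo $tweets[0]->id_str, "\n";
```

Reading id_str rather than id sidesteps the integer-size question entirely, since it is always a string.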
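Below is a dump of a response after being decoded. Since json_decode was called without its second argument, the result is a plain array of stdClass objects rather than an associative array. Note that the numeric id comes back as a float in this dump, so id_str is the safer field to use for identifiers. I limited the response, as each tweet object can take up a lot of room.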
array(20) {
[0]=>
object(stdClass)#25 (21) {
["created_at"]=>
string(30) "Sun Feb 24 15:15:50 +0000 2013"
["id"]=>
float(3.0569755089004E+17)
["id_str"]=>
string(18) "305697550890041344"
["text"]=>
string(87) "Think I'm failing at the weekend thing. (@ Blue Door Consulting) http://t.co/JY3XSRGnyT"
["source"]=>
string(61) "foursquare"
["truncated"]=>
bool(false)
["in_reply_to_status_id"]=>
NULL
["in_reply_to_status_id_str"]=>
NULL
["in_reply_to_user_id"]=>
NULL
["in_reply_to_user_id_str"]=>
NULL
["in_reply_to_screen_name"]=>
NULL
["user"]=>
object(stdClass)#24 (39) {
["id"]=>
int(26515074)
["id_str"]=>
string(8) "26515074"
["name"]=>
string(13) "Jacob Emerick"
["screen_name"]=>
string(8) "jpemeric"
["location"]=>
string(12) "Appleton, WI"
["description"]=>
string(132) "I'm a web programmer, developer, hiker, innovator, and a young man. Currently working at Blue Door Consulting as a programming poet."
["url"]=>
string(29) "https://home.jacobemerick.com/"
["entities"]=>
object(stdClass)#23 (2) {
["url"]=>
object(stdClass)#22 (1) {
["urls"]=>
array(1) {
[0]=>
object(stdClass)#21 (3) {
["url"]=>
string(29) "https://home.jacobemerick.com/"
["expanded_url"]=>
NULL
["indices"]=>
array(2) {
[0]=>
int(0)
[1]=>
int(29)
}
}
}
}
["description"]=>
object(stdClass)#20 (1) {
["urls"]=>
array(0) {
}
}
}
["protected"]=>
bool(false)
["followers_count"]=>
int(269)
["friends_count"]=>
int(259)
["listed_count"]=>
int(19)
["created_at"]=>
string(30) "Wed Mar 25 15:06:41 +0000 2009"
["favourites_count"]=>
int(20)
["utc_offset"]=>
int(-21600)
["time_zone"]=>
string(26) "Central Time (US & Canada)"
["geo_enabled"]=>
bool(false)
["verified"]=>
bool(false)
["statuses_count"]=>
int(4368)
["lang"]=>
string(2) "en"
["contributors_enabled"]=>
bool(false)
["is_translator"]=>
bool(false)
["profile_background_color"]=>
string(6) "FAFAFA"
["profile_background_image_url"]=>
string(81) "http://a0.twimg.com/profile_background_images/333242976/huron-mountain-sunset.jpg"
["profile_background_image_url_https"]=>
string(83) "https://si0.twimg.com/profile_background_images/333242976/huron-mountain-sunset.jpg"
["profile_background_tile"]=>
bool(false)
["profile_image_url"]=>
string(78) "http://a0.twimg.com/profile_images/2286764626/xnzz7ejexbfr3p8jwarp_normal.jpeg"
["profile_image_url_https"]=>
string(80) "https://si0.twimg.com/profile_images/2286764626/xnzz7ejexbfr3p8jwarp_normal.jpeg"
["profile_banner_url"]=>
string(57) "https://si0.twimg.com/profile_banners/26515074/1347979073"
["profile_link_color"]=>
string(6) "098F99"
["profile_sidebar_border_color"]=>
string(6) "B5B5B5"
["profile_sidebar_fill_color"]=>
string(6) "D8D8D8"
["profile_text_color"]=>
string(6) "000000"
["profile_use_background_image"]=>
bool(true)
["default_profile"]=>
bool(false)
["default_profile_image"]=>
bool(false)
["following"]=>
bool(false)
["follow_request_sent"]=>
bool(false)
["notifications"]=>
bool(false)
}
["geo"]=>
NULL
["coordinates"]=>
NULL
["place"]=>
NULL
["contributors"]=>
NULL
["retweet_count"]=>
int(0)
["entities"]=>
object(stdClass)#19 (3) {
["hashtags"]=>
array(0) {
}
["urls"]=>
array(1) {
[0]=>
object(stdClass)#18 (4) {
["url"]=>
string(22) "http://t.co/JY3XSRGnyT"
["expanded_url"]=>
string(21) "http://4sq.com/XUtlYf"
["display_url"]=>
string(14) "4sq.com/XUtlYf"
["indices"]=>
array(2) {
[0]=>
int(65)
[1]=>
int(87)
}
}
}
["user_mentions"]=>
array(0) {
}
}
["favorited"]=>
bool(false)
["retweeted"]=>
bool(false)
["possibly_sensitive"]=>
bool(false)
}
[1]=>
object(stdClass)#8 (21) {
["created_at"]=>
string(30) "Sun Feb 24 01:01:37 +0000 2013"
["id"]=>
…
With the return being a standard PHP array it's really easy to loop through the individual entries. Each tweet is a stdClass object, so its fields are read with the -> operator. For example, if you wanted to display the raw content of each tweet with the date it was posted you would do something like this.
foreach($response as $tweet)
{
echo "{$tweet->text} {$tweet->created_at}\n";
}
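The created_at value is a plain string, so it usually needs parsing before display. A minimal sketch follows; the format string is an assumption inferred from the created_at values in the dump, and the America/Chicago zone is just an example.

```php
<?php
// Format string inferred from "Sun Feb 24 15:15:50 +0000 2013":
// day name, month name, day, time, UTC offset, year.
$created_at = 'Sun Feb 24 15:15:50 +0000 2013';
$date = DateTime::createFromFormat('D M d H:i:s O Y', $created_at);
if ($date === false) {
    throw new RuntimeException("Unexpected date format: {$created_at}");
}

// Convert to a local zone for display (example zone, pick your own).
$date->setTimezone(new DateTimeZone('America/Chicago'));
echo $date->format('F j, Y g:i a'), "\n";
```

Checking for false matters here because createFromFormat does not throw on a mismatch.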
Enhancing the Tweet
If you just spit out the text property you are missing out on a lot, though. Tweets have a lot of rich entities attached to them, like usernames, urls, hashtags, and media objects. On twitter.com and in well-formed Twitter clients these entities are hyperlinked to other resources. You can click on a hashtag within a tweet to boot up a search for that term and check out a conversation. That's where the entities object comes in.
For each type of entity (user_mentions, hashtags, urls, and media) there is a collection of data to do basic 'search and replace' on the default text field. For example, each 'urls' object contains a 'url' (the shortened link exactly as it appears in the text), 'expanded_url' (what Twitter recognizes as the pre-shortened URL), 'display_url' (what Twitter uses as the anchor text for each link), and a pair of indices for the start and end points of the recognized entity. The end index is one past the last character, so the length is simply end minus start. You have everything you need to create a rich, linked tweet and not a flat piece of text.
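To make the indices concrete, here is a small sketch that cuts the URL entity back out of the first tweet in the dump. The text and the 65/87 pair are copied from that dump; the variable names are only for illustration.

```php
<?php
// Text and indices copied from the first tweet in the dump.
$text = "Think I'm failing at the weekend thing. (@ Blue Door Consulting) http://t.co/JY3XSRGnyT";
$indices = array(65, 87);

// indices[0] is the first character of the entity, indices[1] is one past the last.
$length = $indices[1] - $indices[0];

// substr() counts bytes, which is only safe here because this sample is plain ASCII.
$entity_text = substr($text, $indices[0], $length);

echo $entity_text, "\n";
```

The extracted piece matches the 'url' field of the same entity, which is a handy sanity check when the offsets look wrong.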
There are two ways you can enhance each entity. The first obvious method would be a simple str_replace.
foreach($response as $tweet)
{
$text = $tweet->text; // start from the raw tweet text
foreach($tweet->entities->urls as $urls_object)
{
$search = $urls_object->url;
$replace = "<a href=\"{$urls_object->url}\" title=\"{$urls_object->expanded_url}\">{$urls_object->display_url}</a>";
$text = str_replace($search, $replace, $text);
}
}
This works great until you run into more complex cases. If a tweet goes something like "This is a ridiculous but possible #case http://domain.com/use#case", then you're going to have a bad time. A str_replace on the hashtag will replace both instances of '#case' and the markup may get all messed up (depending on what order you replace the entities). This is why Twitter includes the indices: they let you replace defined pieces of the text without depending on searching. You have to do the replacements in reverse order, though, starting from the end of the tweet. Each replacement changes the length of the string, so working front to back would shift every later index and you'd be replacing the wrong chunks of text.
$hashtag_link_pattern = '<a href="http://twitter.com/search?q=%%23%s&src=hash" rel="nofollow" target="_blank">#%s</a>';
$url_link_pattern = '<a href="%s" rel="nofollow" target="_blank" title="%s">%s</a>';
$user_mention_link_pattern = '<a href="http://twitter.com/%s" rel="nofollow" target="_blank" title="%s">@%s</a>';
$media_link_pattern = '<a href="%s" rel="nofollow" target="_blank" title="%s">%s</a>';
foreach($response as $tweet)
{
$text = $tweet->text;
$entity_holder = array();
foreach($tweet->entities->hashtags as $hashtag)
{
$entity = new stdclass();
$entity->start = $hashtag->indices[0];
$entity->end = $hashtag->indices[1];
$entity->length = $hashtag->indices[1] - $hashtag->indices[0];
$entity->replace = sprintf($hashtag_link_pattern, strtolower($hashtag->text), $hashtag->text);
$entity_holder[$entity->start] = $entity;
}
foreach($tweet->entities->urls as $url)
{
$entity = new stdclass();
$entity->start = $url->indices[0];
$entity->end = $url->indices[1];
$entity->length = $url->indices[1] - $url->indices[0];
$entity->replace = sprintf($url_link_pattern, $url->url, $url->expanded_url, $url->display_url);
$entity_holder[$entity->start] = $entity;
}
foreach($tweet->entities->user_mentions as $user_mention)
{
$entity = new stdclass();
$entity->start = $user_mention->indices[0];
$entity->end = $user_mention->indices[1];
$entity->length = $user_mention->indices[1] - $user_mention->indices[0];
$entity->replace = sprintf($user_mention_link_pattern, strtolower($user_mention->screen_name), $user_mention->name, $user_mention->screen_name);
$entity_holder[$entity->start] = $entity;
}
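foreach(isset($tweet->entities->media) ? $tweet->entities->media : array() as $media) // media is missing from tweets without attachments (see the dump above)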
{
$entity = new stdclass();
$entity->start = $media->indices[0];
$entity->end = $media->indices[1];
$entity->length = $media->indices[1] - $media->indices[0];
$entity->replace = sprintf($media_link_pattern, $media->url, $media->expanded_url, $media->display_url);
$entity_holder[$entity->start] = $entity;
}
krsort($entity_holder);
foreach($entity_holder as $entity)
{
$text = substr_replace($text, $entity->replace, $entity->start, $entity->length);
}
}
This is a bit verbose but allows a lot of flexibility in terms of formatting and updates. A few functions could make this quite a bit more DRY.
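As one possible way to DRY this up, here is a rough sketch with two helpers. The names collect_entities and link_tweet are hypothetical, and the htmlspecialchars escaping is an addition that the loop above does not do. The link patterns are the same ones used above, and the sample tweet is the '#case' example with indices counted by hand.

```php
<?php
// Hypothetical helper: add one entity type to $holder, keyed by start index.
function collect_entities(array &$holder, $entities, $build_link)
{
    foreach ($entities as $entity) {
        $holder[$entity->indices[0]] = array(
            'start'   => $entity->indices[0],
            'length'  => $entity->indices[1] - $entity->indices[0],
            'replace' => $build_link($entity),
        );
    }
}

// Hypothetical helper: return the tweet text with every entity linked.
function link_tweet($tweet)
{
    // Escaping is an addition here, not part of the original loop.
    $esc = function ($value) {
        return htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
    };
    $url_link = function ($u) use ($esc) {
        return sprintf('<a href="%s" rel="nofollow" target="_blank" title="%s">%s</a>',
            $esc($u->url), $esc($u->expanded_url), $esc($u->display_url));
    };
    $builders = array(
        'hashtags' => function ($h) use ($esc) {
            return sprintf('<a href="http://twitter.com/search?q=%%23%s&src=hash" rel="nofollow" target="_blank">#%s</a>',
                $esc(strtolower($h->text)), $esc($h->text));
        },
        'urls' => $url_link,
        'user_mentions' => function ($m) use ($esc) {
            return sprintf('<a href="http://twitter.com/%s" rel="nofollow" target="_blank" title="%s">@%s</a>',
                $esc(strtolower($m->screen_name)), $esc($m->name), $esc($m->screen_name));
        },
        'media' => $url_link,
    );

    $holder = array();
    foreach ($builders as $type => $build_link) {
        // Skip entity types that are empty or missing from this tweet.
        if (!empty($tweet->entities->$type)) {
            collect_entities($holder, $tweet->entities->$type, $build_link);
        }
    }

    // Highest start index first, so earlier offsets stay valid.
    krsort($holder);
    $text = $tweet->text;
    foreach ($holder as $entity) {
        $text = substr_replace($text, $entity['replace'], $entity['start'], $entity['length']);
    }
    return $text;
}

// Made-up tweet built from the '#case' example sentence.
$tweet = json_decode('{
    "text": "This is a ridiculous but possible #case http://domain.com/use#case",
    "entities": {
        "hashtags": [{"text": "case", "indices": [34, 39]}],
        "urls": [{"url": "http://domain.com/use#case", "expanded_url": "http://domain.com/use#case", "display_url": "domain.com/use#case", "indices": [40, 66]}],
        "user_mentions": []
    }
}');
echo link_tweet($tweet), "\n";
```

Keeping the builders in one array means a new entity type only needs one more entry, and the '#case' inside the URL is left alone because nothing is ever searched for.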
NOTE: if you have any issues with multibyte characters check out this comment for help.
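For reference, a minimal sketch of a character-based replacement. It assumes the mbstring extension is loaded, that the text is UTF-8, and that the indices count characters rather than bytes. The function name tweet_mb_substr_replace is made up, and so is the sample text.

```php
<?php
// Hypothetical stand-in for substr_replace() that counts characters, not bytes.
function tweet_mb_substr_replace($text, $replace, $start, $length)
{
    $before = mb_substr($text, 0, $start, 'UTF-8');
    $after  = mb_substr($text, $start + $length, null, 'UTF-8');
    return $before . $replace . $after;
}

// "\xC3\xA9" is an e-acute: one character but two bytes in UTF-8,
// so the hashtag starts at character 5 but at byte 6.
$text = "Caf\xC3\xA9 #php";

$by_bytes = substr_replace($text, '[tag]', 5, 4);           // lands one byte early
$by_chars = tweet_mb_substr_replace($text, '[tag]', 5, 4);  // lands on the hashtag

echo $by_bytes, "\n", $by_chars, "\n";
```

The byte-based call eats the space and leaves a stray 'p' behind, which is the kind of off-by-a-few result that shows up when a tweet contains accented letters.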
More with Tweet Data
There is a lot more that you can do with the data returned by Twitter: user information, replies (conversations), retweet_count (a limited metric of engagement), or even the source field to display how you're sending in the tweets. One disappointing thing about the data is favorites - right now a tweet list does not contain a favorite count. You can get favorites from user objects or individual tweets, but not a list.
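A small sketch of pulling a few of those fields together. The summarize_tweet helper is hypothetical, and the sample values are based on the first tweet in the dump. The source markup is an assumption about what the raw field looks like, since it typically arrives as an HTML link to the client.

```php
<?php
// Hypothetical helper: flatten a few of the fields mentioned above.
function summarize_tweet($tweet)
{
    return array(
        'author'   => $tweet->user->screen_name,
        'is_reply' => $tweet->in_reply_to_status_id_str !== null,
        'retweets' => (int) $tweet->retweet_count,
        // Keep only the client's label, dropping any link markup around it.
        'client'   => strip_tags($tweet->source),
    );
}

// Sample values based on the first tweet in the dump.
$tweet = json_decode('{
    "user": {"screen_name": "jpemeric"},
    "in_reply_to_status_id_str": null,
    "retweet_count": 0,
    "source": "<a href=\"http://foursquare.com\" rel=\"nofollow\">foursquare</a>"
}');
$summary = summarize_tweet($tweet);
echo "{$summary['author']} via {$summary['client']}\n";
```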
Anyways, I hope this helps out if you're planning on moving forward with Twitter integration. I've done many of these things on my lifestream website. Twitter is definitely one of the richer and easier APIs to deal with for reading once you get past the OAuth, especially compared to some of the nastier Google Data setups. Happy coding!
-
Jacob Emerick
Aug 5, '13
Hi Gert-Jan, Great to see you on here! No, I don't have any problem with you using the code - thank you for asking first. If you want to throw a comment or something in the theme mentioning me/this blog I wouldn't mind ;) On the side - one of my colleagues mentioned something about WordPress having this functionality built in and that you can extend some of the default functionality. I don't have a lot of background in this area so I'm not sure if this is applicable. Anyways, his handle is @dave_kz and his site is davekz.com if this sounds helpful. Anyways, thanks again for asking first!
-
Gert-Jan
Aug 6, '13
Great stuff, it works like a charm :) I can add you to the contributor list (with link) on our about-us contributor section if it's OK to also put your photo there. I could for example use your Twitter photo? Let me know where the link should go and I will add it.
-
Jacob Emerick
Aug 6, '13
Awesome - that sounds great. And yeah, the Twitter photo will work. A link back to this blog's home page [ blog.jacobemerick.com ] would be cool. Thanks again, sir!
-
Daniel Crabbe
May 2, '18
is it possible to get just one tweet by url or id?
-
Jacob P Emerick
May 2, '18
Hi Daniel, Yup! You can get a single tweet by id from this endpoint... https://developer.twitter.com/en/docs/tweets/post-and-engage/api-reference/get-statuses-show-id
-
Jacob Emerick
Sep 9, '13
Thanks for adding this, Alexander! It is somewhat confusing that Twitter is passing the index information assuming multi-byte support and PHP has a hard time working around that...
-
Jacob Emerick
Oct 5, '13
Hi Nelson, This script actually doesn't change the tweet object - the loop you have up there goes through and prepares another loop (at the bottom of my script in the post) to do the replacements. Are you using that loop (56 - 60) too?
-
santosh
Nov 6, '13
tweet: Congratulations @TigerairSG to be the first in Asia to retrofit #A320 with Sharklets! http://t.co/dZ7hZq4IxI pic.twitter.com/slK0IfrPMN
tweet: Congratulations @TigerairSG to be the first in Asia to retrofit #A320 with Sharklets! tigerair.com/news/TH_201311… pic.twitter.com/slK0IfrPMN
tweet: Congratulations @TigerairSG to be the first in Asia to retrofit #A320 with Sharklets! tigerair.com/news/TH_201311… pic.twitter.com/slK0IfrPMN
tweet: Congratulations @TigerairSG to be the first in Asia to retrofit #A320 with Sharklets! tigerair.com/news/TH_201311… pic.twitter.com/slK0IfrPMN
While using the code I am getting four tweets I need only the last out of these four ur help is appreciated
-
Jacob Emerick
Nov 11, '13
Hi Santosh, If you're getting repeats I think the problem could be in one of two places.
1 - The feed is giving you inactive tweets, maybe deleted or fail-to-post or something, and you should check for a 'status' field or something.
2 - Your code is creating duplicates.
If the problem is #2 then you could either do a simple array_pop to pull the last one or fix the code... Does that help? Sorry for the delay, work has been hectic the last few weeks!
-
Jacob Emerick
Nov 11, '13
Glad to hear it helped, Sergey! All work and no play makes dull boys (sorry, silly American expression) :P
-
Simon
Mar 11, '14
Thanks Chris, I was tearing my hair out about those indices seeming to be in the wrong place!
-
Jacob Emerick
Mar 12, '14
Yeah, thanks Chris! I linked to your comment from the code, hopefully if others have similar issues they'll be able to use this function.