Parsing Twitter Feeds with PHP

Category Web Development
Posted February 24, 2013
by Jacob Emerick

The most difficult part of pulling information from Twitter's 1.1 API is the actual request. I've covered how to create the OAuth request in some previous posts: making a basic OAuth request and passing in extra parameters. Once you get the information back, though, what do you do with it?

Twitter will return JSON for their requests, which is easy to parse with PHP and allows for some relatively deep nesting of data. You'll need to decode it first with a simple function call.


$response = curl_exec($curl_request); // raw JSON string, from the previous posts
$response = json_decode($response); // array of stdClass objects, one per tweet
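
If the request fails or returns something that isn't JSON, json_decode() quietly hands back NULL. Here is a minimal sketch of a more defensive decode. The helper name decode_twitter_response and the sample body are made up for illustration; the JSON_BIGINT_AS_STRING flag is there because large tweet ids can otherwise end up as floats, as the id in the dump that follows shows.

```php
<?php
// Hypothetical helper (not from the earlier posts): decode a Twitter
// response body and fail loudly if it is not valid JSON.
function decode_twitter_response($body)
{
    // JSON_BIGINT_AS_STRING keeps ids that overflow a native integer
    // as strings instead of converting them to floats.
    $decoded = json_decode($body, false, 512, JSON_BIGINT_AS_STRING);

    if ($decoded === null && json_last_error() !== JSON_ERROR_NONE) {
        throw new RuntimeException('Could not decode response, error code ' . json_last_error());
    }

    return $decoded;
}

// Made-up sample body standing in for curl_exec($curl_request).
$sample = '[{"id_str":"305697550890041344","text":"Hello"}]';
$tweets = decode_twitter_response($sample);
echo $tweets[0]->id_str . "\n";
```

Sticking to id_str whenever you need to store or compare ids sidesteps the precision problem entirely.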

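Below is a dump of a response after being decoded. Because json_decode() was called without its second argument, the result is an array of stdClass objects rather than an associative array. I limited the response, as each tweet object can take up a lot of room.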


array(20) {
  [0]=>
  object(stdClass)#25 (21) {
    ["created_at"]=>
    string(30) "Sun Feb 24 15:15:50 +0000 2013"
    ["id"]=>
    float(3.0569755089004E+17)
    ["id_str"]=>
    string(18) "305697550890041344"
    ["text"]=>
    string(87) "Think I'm failing at the weekend thing. (@ Blue Door Consulting) http://t.co/JY3XSRGnyT"
    ["source"]=>
    string(61) "foursquare"
    ["truncated"]=>
    bool(false)
    ["in_reply_to_status_id"]=>
    NULL
    ["in_reply_to_status_id_str"]=>
    NULL
    ["in_reply_to_user_id"]=>
    NULL
    ["in_reply_to_user_id_str"]=>
    NULL
    ["in_reply_to_screen_name"]=>
    NULL
    ["user"]=>
    object(stdClass)#24 (39) {
      ["id"]=>
      int(26515074)
      ["id_str"]=>
      string(8) "26515074"
      ["name"]=>
      string(13) "Jacob Emerick"
      ["screen_name"]=>
      string(8) "jpemeric"
      ["location"]=>
      string(12) "Appleton, WI"
      ["description"]=>
      string(132) "I'm a web programmer, developer, hiker, innovator, and a young man. Currently working at Blue Door Consulting as a programming poet."
      ["url"]=>
      string(29) "https://home.jacobemerick.com/"
      ["entities"]=>
      object(stdClass)#23 (2) {
        ["url"]=>
        object(stdClass)#22 (1) {
          ["urls"]=>
          array(1) {
            [0]=>
            object(stdClass)#21 (3) {
              ["url"]=>
              string(29) "https://home.jacobemerick.com/"
              ["expanded_url"]=>
              NULL
              ["indices"]=>
              array(2) {
                [0]=>
                int(0)
                [1]=>
                int(29)
              }
            }
          }
        }
        ["description"]=>
        object(stdClass)#20 (1) {
          ["urls"]=>
          array(0) {
          }
        }
      }
      ["protected"]=>
      bool(false)
      ["followers_count"]=>
      int(269)
      ["friends_count"]=>
      int(259)
      ["listed_count"]=>
      int(19)
      ["created_at"]=>
      string(30) "Wed Mar 25 15:06:41 +0000 2009"
      ["favourites_count"]=>
      int(20)
      ["utc_offset"]=>
      int(-21600)
      ["time_zone"]=>
      string(26) "Central Time (US & Canada)"
      ["geo_enabled"]=>
      bool(false)
      ["verified"]=>
      bool(false)
      ["statuses_count"]=>
      int(4368)
      ["lang"]=>
      string(2) "en"
      ["contributors_enabled"]=>
      bool(false)
      ["is_translator"]=>
      bool(false)
      ["profile_background_color"]=>
      string(6) "FAFAFA"
      ["profile_background_image_url"]=>
      string(81) "http://a0.twimg.com/profile_background_images/333242976/huron-mountain-sunset.jpg"
      ["profile_background_image_url_https"]=>
      string(83) "https://si0.twimg.com/profile_background_images/333242976/huron-mountain-sunset.jpg"
      ["profile_background_tile"]=>
      bool(false)
      ["profile_image_url"]=>
      string(78) "http://a0.twimg.com/profile_images/2286764626/xnzz7ejexbfr3p8jwarp_normal.jpeg"
      ["profile_image_url_https"]=>
      string(80) "https://si0.twimg.com/profile_images/2286764626/xnzz7ejexbfr3p8jwarp_normal.jpeg"
      ["profile_banner_url"]=>
      string(57) "https://si0.twimg.com/profile_banners/26515074/1347979073"
      ["profile_link_color"]=>
      string(6) "098F99"
      ["profile_sidebar_border_color"]=>
      string(6) "B5B5B5"
      ["profile_sidebar_fill_color"]=>
      string(6) "D8D8D8"
      ["profile_text_color"]=>
      string(6) "000000"
      ["profile_use_background_image"]=>
      bool(true)
      ["default_profile"]=>
      bool(false)
      ["default_profile_image"]=>
      bool(false)
      ["following"]=>
      bool(false)
      ["follow_request_sent"]=>
      bool(false)
      ["notifications"]=>
      bool(false)
    }
    ["geo"]=>
    NULL
    ["coordinates"]=>
    NULL
    ["place"]=>
    NULL
    ["contributors"]=>
    NULL
    ["retweet_count"]=>
    int(0)
    ["entities"]=>
    object(stdClass)#19 (3) {
      ["hashtags"]=>
      array(0) {
      }
      ["urls"]=>
      array(1) {
        [0]=>
        object(stdClass)#18 (4) {
          ["url"]=>
          string(22) "http://t.co/JY3XSRGnyT"
          ["expanded_url"]=>
          string(21) "http://4sq.com/XUtlYf"
          ["display_url"]=>
          string(14) "4sq.com/XUtlYf"
          ["indices"]=>
          array(2) {
            [0]=>
            int(65)
            [1]=>
            int(87)
          }
        }
      }
      ["user_mentions"]=>
      array(0) {
      }
    }
    ["favorited"]=>
    bool(false)
    ["retweeted"]=>
    bool(false)
    ["possibly_sensitive"]=>
    bool(false)
  }
  [1]=>
  object(stdClass)#8 (21) {
    ["created_at"]=>
    string(30) "Sun Feb 24 01:01:37 +0000 2013"
    ["id"]=>
…

Since the decoded response is a standard PHP array, it's easy to loop through the individual entries. Each tweet's fields are available as object properties. For example, if you wanted to display the raw content of each tweet with the date it was posted you would do something like this.


foreach($response as $tweet)
{
  echo "{$tweet->text} {$tweet->created_at}\n";
}
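
The created_at string is a bit raw for display. As a small sketch, it can be parsed with DateTime::createFromFormat() and printed in whatever layout you like. The helper name format_tweet_date and the default output format are my own choices here, not anything Twitter prescribes.

```php
<?php
// Hypothetical helper: turn Twitter's created_at string into a friendlier date.
// Twitter's layout looks like "Sun Feb 24 15:15:50 +0000 2013".
function format_tweet_date($created_at, $format = 'F j, Y g:i a')
{
    $date = DateTime::createFromFormat('D M d H:i:s O Y', $created_at);
    if ($date === false) {
        return $created_at; // fall back to the raw string if parsing fails
    }
    return $date->format($format);
}

echo format_tweet_date('Sun Feb 24 15:15:50 +0000 2013') . "\n";
```

The result stays in the offset Twitter sent (+0000), so call setTimezone() on the DateTime first if you want local time.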

Enhancing the Tweet

If you just spit out the text property you are missing out on a lot, though. Tweets have a lot of rich entities attached to them, like usernames, URLs, hashtags, and media objects. On twitter.com and in well-built Twitter clients these entities are rendered as hyperlinks to other resources. You can click on a hashtag within a tweet to boot up a search for that term and check out a conversation. That's where the entities object comes in.

For each type of entity (user_mentions, hashtags, urls, and media) there is a collection of data to do basic 'search and replace' on the default text field. For example, each 'urls' object contains a 'url' (the before), 'expanded_url' (what Twitter recognizes as the pre-shortened URL), 'display_url' (what Twitter uses as the anchor text for each link), and a pair of indices for the start and end points of the recognized entity. You have everything you need to create a rich, linked tweet and not a flat piece of text.
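
To see what the indices mean, here is a small sketch that slices the t.co link out of the first tweet in the dump above using its indices (65 and 87). I'm using mb_substr() on the assumption that the indices count characters rather than bytes, which also assumes the mbstring extension is installed.

```php
<?php
// Text and indices copied from the first tweet in the dump above.
$text = "Think I'm failing at the weekend thing. (@ Blue Door Consulting) http://t.co/JY3XSRGnyT";
$indices = array(65, 87);

$start  = $indices[0];
$length = $indices[1] - $indices[0]; // the end index is exclusive

$before = mb_substr($text, 0, $start, 'UTF-8');       // everything ahead of the entity
$entity = mb_substr($text, $start, $length, 'UTF-8'); // the recognized URL itself

echo $entity . "\n";
```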

There are two ways you can enhance each entity. The first obvious method would be a simple str_replace.


foreach($response as $tweet)
{
  $text = $tweet->text; // start from the raw tweet text
  foreach($tweet->entities->urls as $urls_object)
  {
    $search = $urls_object->url;
    $replace = "<a href=\"{$urls_object->url}\" title=\"{$urls_object->expanded_url}\">{$urls_object->display_url}</a>";
    $text = str_replace($search, $replace, $text);
  }
  // $text now holds the tweet with its URLs turned into links
}

This works great until you run into more complex cases. If a tweet goes something like "This is a ridiculous but possible #case http://domain.com/use#case", then you're going to have a bad time. Apply the same search-and-replace approach to hashtags and it will replace both instances of '#case', and the tweet may get all messed up (depending on what order you replace the entities). This is why Twitter includes the indices: they let you replace defined pieces of the text and not depend on searching. You have to do the replacements in reverse order, though, or else the earlier replacements shift the text and the remaining indices point at the wrong chunks.
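
Here is a quick sketch of that failure using the sample sentence. The link markup is simplified for illustration, and the indices 34 and 39 are ones I worked out by hand for this made-up tweet, standing in for what a hashtag entity would carry.

```php
<?php
$text = 'This is a ridiculous but possible #case http://domain.com/use#case';
$link = '<a href="http://twitter.com/search?q=%23case">#case</a>';

// Naive approach: search for the hashtag text and wrap every match in a link.
$broken = str_replace('#case', $link, $text);

// Index-based approach: replace only the span the entity points at.
$start = 34;
$end   = 39;
$fixed = substr_replace($text, $link, $start, $end - $start);

echo $broken . "\n"; // the URL fragment got linked as well
echo $fixed . "\n";  // only the real hashtag is linked
```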


$hashtag_link_pattern = '<a href="http://twitter.com/search?q=%%23%s&src=hash" rel="nofollow" target="_blank">#%s</a>';
$url_link_pattern = '<a href="%s" rel="nofollow" target="_blank" title="%s">%s</a>';
$user_mention_link_pattern = '<a href="http://twitter.com/%s" rel="nofollow" target="_blank" title="%s">@%s</a>';
$media_link_pattern = '<a href="%s" rel="nofollow" target="_blank" title="%s">%s</a>';

foreach($response as $tweet)
{
  $text = $tweet->text;
  $entity_holder = array();
  
  foreach($tweet->entities->hashtags as $hashtag)
  {
    $entity = new stdclass();
    $entity->start = $hashtag->indices[0];
    $entity->end = $hashtag->indices[1];
    $entity->length = $hashtag->indices[1] - $hashtag->indices[0];
    $entity->replace = sprintf($hashtag_link_pattern, strtolower($hashtag->text), $hashtag->text);
    
    $entity_holder[$entity->start] = $entity;
  }
  
  foreach($tweet->entities->urls as $url)
  {
    $entity = new stdclass();
    $entity->start = $url->indices[0];
    $entity->end = $url->indices[1];
    $entity->length = $url->indices[1] - $url->indices[0];
    $entity->replace = sprintf($url_link_pattern, $url->url, $url->expanded_url, $url->display_url);
    
    $entity_holder[$entity->start] = $entity;
  }
  
  foreach($tweet->entities->user_mentions as $user_mention)
  {
    $entity = new stdclass();
    $entity->start = $user_mention->indices[0];
    $entity->end = $user_mention->indices[1];
    $entity->length = $user_mention->indices[1] - $user_mention->indices[0];
    $entity->replace = sprintf($user_mention_link_pattern, strtolower($user_mention->screen_name), $user_mention->name, $user_mention->screen_name);
    
    $entity_holder[$entity->start] = $entity;
  }
  
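  // the media collection is missing when a tweet has no attachments
  foreach(isset($tweet->entities->media) ? $tweet->entities->media : array() as $media)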
  {
    $entity = new stdclass();
    $entity->start = $media->indices[0];
    $entity->end = $media->indices[1];
    $entity->length = $media->indices[1] - $media->indices[0];
    $entity->replace = sprintf($media_link_pattern, $media->url, $media->expanded_url, $media->display_url);
    
    $entity_holder[$entity->start] = $entity;
  }
  
  krsort($entity_holder);
  foreach($entity_holder as $entity)
  {
    $text = substr_replace($text, $entity->replace, $entity->start, $entity->length);
  }
}

This is a bit verbose but allows a lot of flexibility in terms of formatting and updates. A few functions could make this quite a bit more DRY.
NOTE: if you have any issues with multibyte characters, check out the comments from Alexander and Chris below for help.
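
As one possible way to DRY this up, here is a sketch that moves the repeated loop into a helper and does the replacement with mb_substr() so multibyte characters don't throw off the indices. The function names collect_entities and link_tweet are hypothetical, the sample tweet is made up, and the values dropped into the links are not escaped, so treat this as a starting point and not a finished implementation. It assumes the mbstring extension and PHP 5.4.8 or newer.

```php
<?php
// Hypothetical helper: queue a replacement for every entity of one type.
// $build is a callback that turns one entity object into link markup.
function collect_entities(stdClass $entities, $type, $build, array &$holder)
{
    // Some collections (media, for one) are missing when a tweet has none.
    if (!isset($entities->$type)) {
        return;
    }
    foreach ($entities->$type as $item) {
        $holder[$item->indices[0]] = array(
            'length'  => $item->indices[1] - $item->indices[0],
            'replace' => $build($item),
        );
    }
}

// Hypothetical helper: return the tweet text with its entities linked.
function link_tweet(stdClass $tweet)
{
    $pattern = '<a href="%s" rel="nofollow" target="_blank" title="%s">%s</a>';
    $holder = array();

    collect_entities($tweet->entities, 'hashtags', function ($h) use ($pattern) {
        $href = 'http://twitter.com/search?q=%23' . rawurlencode($h->text);
        return sprintf($pattern, $href, "#{$h->text}", "#{$h->text}");
    }, $holder);

    // URLs and media share the same fields, so one callback covers both.
    $link_url = function ($u) use ($pattern) {
        return sprintf($pattern, $u->url, $u->expanded_url, $u->display_url);
    };
    collect_entities($tweet->entities, 'urls', $link_url, $holder);
    collect_entities($tweet->entities, 'media', $link_url, $holder);

    collect_entities($tweet->entities, 'user_mentions', function ($m) use ($pattern) {
        return sprintf($pattern, "http://twitter.com/{$m->screen_name}", $m->name, "@{$m->screen_name}");
    }, $holder);

    // Work from the end of the text so the earlier indices stay valid.
    krsort($holder);
    $text = $tweet->text;
    foreach ($holder as $start => $entity) {
        $text = mb_substr($text, 0, $start, 'UTF-8')
            . $entity['replace']
            . mb_substr($text, $start + $entity['length'], null, 'UTF-8');
    }
    return $text;
}

// Made-up tweet for illustration, shaped like the API response.
$tweet = json_decode('{"text":"Héllo #php http://t.co/abc","entities":{"hashtags":[{"text":"php","indices":[6,10]}],"urls":[{"url":"http://t.co/abc","expanded_url":"http://example.com/","display_url":"example.com","indices":[11,26]}],"user_mentions":[]}}');
echo link_tweet($tweet) . "\n";
```

Keying the holder by start index and sorting it in reverse is the same trick as in the longer version above; the helper only removes the four near-identical loops.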

More with Tweet Data

There is a lot more that you can do with the data returned by Twitter: user information, replies (conversations), retweet_count (a limited metric of engagement), or even the source field to display how you're sending in the tweets. One disappointing thing about the data is favorites: right now a tweet list does not contain a favorite count. You can get favorites from user objects or individual tweets, but not a list.
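
As a rough sketch of pulling a few of those extra fields out of a tweet object: the field names come from the dump earlier in the post, while the sample values are made up and summarize_tweet is a hypothetical helper name. I'm assuming source arrives as an HTML link (the dump reports 61 characters for it, which fits markup that was lost in display), so the sketch strips the tags to get the client name.

```php
<?php
// Hypothetical helper: gather a few of the extra fields into a flat array.
function summarize_tweet(stdClass $tweet)
{
    return array(
        'author'   => $tweet->user->screen_name,
        // Assumption: source is an HTML link, so strip the markup.
        'client'   => strip_tags($tweet->source),
        // A non-null in_reply_to_status_id_str means the tweet is a reply.
        'is_reply' => $tweet->in_reply_to_status_id_str !== null,
        'retweets' => (int) $tweet->retweet_count,
    );
}

// Made-up tweet for illustration, shaped like the API response.
$tweet = json_decode('{"source":"<a href=\"http://foursquare.com\" rel=\"nofollow\">foursquare</a>","in_reply_to_status_id_str":null,"retweet_count":0,"user":{"screen_name":"jpemeric"}}');
print_r(summarize_tweet($tweet));
```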

Anyways, I hope this helps out if you're planning on moving forward with Twitter integration. I've done many of these things on my lifestream website. Twitter is definitely one of the richer and easier APIs to deal with for reading once you get past the OAuth, especially compared to some of the nastier Google Data setups. Happy coding!

Comments (18)

Gert-Jan Aug 5, '13 Hi Jacob, is it OK to use the 2nd piece of text to parse the url's for our opensource project at Nexus Themes (https://github.com/nexusthemes/nexusframework)? It looks very sophisticated and I can't wait to embed it. Regards, Gert-Jan @barkgj
- Jacob Emerick Aug 5, '13 Hi Gert-Jan, Great to see you on here! No, I don't have any problem with you using the code - thank you for asking first. If you want to throw a comment or something in the theme mentioning me/this blog I wouldn't mind ;) On the side - one of my colleagues mentioned something about wordpress having this functionality built in and that you can extend some of the default functionality. I don't have a lot of background in this area so I'm not sure if this is applicable. Anyways, his handle is @dave_kz and his site is davekz.com if this sounds helpful. Anyways, thanks again for asking first!
- Gert-Jan Aug 6, '13 Great stuff, it works like a charm :) I can add you to the contributor list (with link) on our about-us contributor section if its OK to also put your photo there. I could for example use your Twitter photo? Let me know where the link should go and I will add it.
- Jacob Emerick Aug 6, '13 Awesome - that sounds great. And yeah, the Twitter photo will work. A link back to this blog's home page [ blog.jacobemerick.com ] would be cool. Thanks again, sir!
- Daniel Crabbe May 2, '18 is it possible to get just one tweet by url or id?
- Jacob P Emerick May 2, '18 Hi Daniel, Yup! You can get a single tweet by id from this endpoint... https://developer.twitter.com/en/docs/tweets/post-and-engage/api-reference/get-statuses-show-id
Alexander Hicks Sep 9, '13 So, I started running into problems with special characters throwing the text off course. (This is because of multibyte UTF-8 characters not being that compatible with PHP). Anyway, the easy solution is to replace the last line
  $text = substr_replace($text, $entity->replace, $entity->start, $entity->length);
with
  $text = mb_substr($text, 0, $entity->start).$entity->replace.mb_substr($text, $entity->start+$entity->length);
This will handle multibyte characters correctly. There isn't an mb_substr_replace though, so this is still the shortest option. Also, you'll need to declare
  mb_internal_encoding("UTF-8");
at the start of your code.
- Jacob Emerick Sep 9, '13 Thanks for adding this, Alexander! It is somewhat confusing that Twitter is passing the index information assuming multi-byte support and PHP has a hard time working around that...
Nelson Oct 3, '13 When trying to call this via:
  foreach($tweet->entities->urls as $url)
  {
    $entity = new stdclass();
    $entity->start = $url->indices[0];
    $entity->end = $url->indices[1];
    $entity->length = $url->indices[1] - $url->indices[0];
    $entity->replace = sprintf($url_link_pattern, $url->url, $url->expanded_url, $url->display_url);
    $entity_holder[$entity->start] = $entity;
  }
  echo $tweet->entities->urls." ";
Am not getting the desired result of the hyperlinks embedded. Why am I missing this simple thing?
- Jacob Emerick Oct 5, '13 Hi Nelson, This script actually doesn't change the tweet object - the loop you have up there goes through and prepares another loop (at the bottom of my script in the post) to do the replacements. Are you using that loop (lines 56 - 60 of the script) too?
- santosh Nov 6, '13
  tweet: Congratulations @TigerairSG to be the first in Asia to retrofit #A320 with Sharklets! http://t.co/dZ7hZq4IxI pic.twitter.com/slK0IfrPMN
  tweet: Congratulations @TigerairSG to be the first in Asia to retrofit #A320 with Sharklets! tigerair.com/news/TH_201311… pic.twitter.com/slK0IfrPMN
  tweet: Congratulations @TigerairSG to be the first in Asia to retrofit #A320 with Sharklets! tigerair.com/news/TH_201311… pic.twitter.com/slK0IfrPMN
  tweet: Congratulations @TigerairSG to be the first in Asia to retrofit #A320 with Sharklets! tigerair.com/news/TH_201311… pic.twitter.com/slK0IfrPMN
  While using the code I am getting four tweets. I need only the last out of these four. Your help is appreciated.
- Jacob Emerick Nov 11, '13 Hi Santosh, If you're getting repeats I think the problem could be in one of two places. 1 - The feed is giving you inactive tweets, maybe deleted or fail-to-post or something, and you should check for a 'status' field or something. 2 - Your code is creating duplicates. If the problem is #2 then you could either do a simple array_pop to pull the last one or fix the code... Does that help? Sorry for the delay, work has been hectic the last few weeks!
Sergey Nov 6, '13 Hi, Jacob, thank you. Though replacing in reverse order is quite obvious, I had a hard time figuring it out until I found your article. I should stop working extra hours as it's difficult to concentrate :)
- Jacob Emerick Nov 11, '13 Glad to hear it helped, Sergey! All work and no play makes dull boys (sorry, silly American expression) :P
Chris Feb 13, '14 Just to further Alexander's comment. I found I got non-standard content with your code as well. I tried Alexander's code and it didn't help. I've made a couple changes to the code you may be interested in (and adjusting the code above with):
1) adding the following code near the top:
  function mb_substr_replace ($string, $replacement, $start, $length = 0)
  {
    if (is_array($string))
    {
      foreach ($string as $i => $val)
      {
        $repl = is_array ($replacement) ? $replacement[$i] : $replacement;
        $st = is_array ($start) ? $start[$i] : $start;
        $len = is_array ($length) ? $length[$i] : $length;
        $string[$i] = mb_substr_replace ($val, $repl, $st, $len);
      }
      return $string;
    }
    $result = mb_substr ($string, 0, $start, 'UTF-8');
    $result .= $replacement;
    if ($length > 0)
    {
      $result .= mb_substr ($string, ($start+$length+1), mb_strlen($string, 'UTF-8'), 'UTF-8');
    }
    return $result;
  }
Then I made the following change to the last foreach loop:
  $text = mb_substr_replace($text, $entity->replace, $entity->start, $entity->length);
to take advantage of the mb_substr_replace function above. Also, when the "media" array isn't present (I've seen this) I added an if (isset($tweet->entities->media)) to the media foreach. Oh ya, and for some reason, the hash tags indices are too close together without spacing, so I added a space at the end of:
  $hashtag_link_pattern = '#%s ';
- Simon Mar 11, '14 Thanks Chris, I was tearing my hair out about those indices seeming to be in the wrong place!
- Jacob Emerick Mar 12, '14 Yeah, thanks Chris! I linked to your comment from the code, hopefully if others have similar issues they'll be able to use this function.
Silver Bowen Apr 3, '18 Hi Jacob, Just wanted to let you know that folks are still getting use out of these Twitter API posts many years later. Thanks for putting them up! Silver Bowen
