Fake Linkedin Endorsements

One thing that you can do on the internet is to endorse people on Linkedin. You just visit one of your connections' profiles on the site, pick some skills pulled from their profile (and/or add new ones), and endorse them. This is a great way to 'validate' listed skills and give out simple public recommendations. It's also a great way to have a little bit of spammy fun with your network.

Outside of a notification (and, in the case of 'new' endorsements, a simple approval) there really isn't much stopping you from handing out random endorsements. As long as a skill exists on Linkedin's selective list you can endorse someone for it. There's a bit of humor to be had in endorsing a strictly-PHP developer for .NET or a steadfast football fan for hockey.

After a fun back-and-forth match with a coworker I began to wonder just how many endorsements were out there. Linkedin didn't seem to provide a full list. Was there a way I could scrape their site and get one?

Linkedin Endorsement Autocomplete

When you start to add a new endorsement to a user's profile an autocomplete option shows up. So, if you start typing 'ph' you get recommendations like 'photoshop', 'photography', 'php', etc. I looked into the JavaScript powering this feature. There isn't anything too sophisticated - the autocomplete list is based entirely on the input characters - so it was fairly easy to track down the AJAX call and pull it out.

    // if the query is 'ph'
    http://www.linkedin.com/ta/skill?query=ph&goback=%2Enmp_*1_*1_*1_*1_*1_*1_*1_*1_*1_*1

    // simpler version that also works
    http://www.linkedin.com/ta/skill?query=ph
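For anything beyond plain letters the query needs URL-encoding. A minimal sketch of building the simpler URL in PHP follows; the helper name build_skill_url is made up for illustration, and only the endpoint and the query parameter come from the call above.

```php
<?php
// Hypothetical helper: build the autocomplete URL for one search string.
// http_build_query() URL-encodes the value, so 'c++' or 'web design' are safe.
function build_skill_url($query)
{
    return 'http://www.linkedin.com/ta/skill?' . http_build_query(array('query' => $query));
}

echo build_skill_url('ph') . PHP_EOL;  // http://www.linkedin.com/ta/skill?query=ph
echo build_skill_url('c++') . PHP_EOL; // http://www.linkedin.com/ta/skill?query=c%2B%2B
```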

A nice JSON response comes back, complete with a 'display' version, url, and internal id for each skill. Here's an example of the response.

    {"resultList":
        [
            {
                "id":"281",
                "headLine":"<strong>Ph<\/strong>otoshop",
                "displayName":"Photoshop",
                "subLine":"",
                "url":"http://www.linkedin.com/skills/skill/Photoshop"
            },
            {
                "id":"193",
                "headLine":"<strong>Ph<\/strong>otography",
                "displayName":"Photography",
                "subLine":"",
                "url":"http://www.linkedin.com/skills/skill/Photography"
            },
            {
                "id":"261",
                "headLine":"<strong>PH<\/strong>P",
                "displayName":"PHP",
                "subLine":"",
                "url":"http://www.linkedin.com/skills/skill/PHP"
            }
        ]
    }
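Pulling the useful bits out of a response like this takes only a few lines of PHP. The sketch below assumes the body always has the resultList shape shown above; the helper name extract_skills and the trimmed two-item sample are made up for illustration.

```php
<?php
// Sample body shaped like the response above, trimmed to two results.
$sample_json = '{"resultList":['
    . '{"id":"281","headLine":"<strong>Ph<\/strong>otoshop","displayName":"Photoshop",'
    . '"subLine":"","url":"http://www.linkedin.com/skills/skill/Photoshop"},'
    . '{"id":"261","headLine":"<strong>PH<\/strong>P","displayName":"PHP",'
    . '"subLine":"","url":"http://www.linkedin.com/skills/skill/PHP"}]}';

// Hypothetical helper: map internal id => display name for one response body.
function extract_skills($json)
{
    $decoded = json_decode($json);
    if ($decoded === null || !isset($decoded->resultList)) {
        return array(); // malformed or unexpected body
    }
    $skills = array();
    foreach ($decoded->resultList as $row) {
        $skills[(int) $row->id] = $row->displayName;
    }
    return $skills;
}

$skills = extract_skills($sample_json);
foreach ($skills as $id => $name) {
    echo $id . ': ' . $name . PHP_EOL;
}
```

Keying the result by the internal id makes it easy to spot the same skill coming back from different queries.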

So this is nice and all but the response is usually limited to about ten options, max. How can you get more?

Letter Combinations

To avoid manually typing a bunch of possible queries into the url string I needed to write a script that would automatically hit Linkedin with different searches. It's been a while since I played with this type of programming so I found it easiest to sketch out the problem.

Sketch of 1, 2, and 3 letter combinations

I needed to loop through the alphabet multiple times, each time appending each character, to come up with a bunch of possible query strings. After realizing that I was going for a type of combination I worked out a formula to figure out how many queries I was looking at. For all of the 1, 2, and 3 character strings I came up with 27^3 - 27^2, or 18,954, which is the number of strings the script below generates. (Strictly speaking, a 26-letter alphabet has 26 + 26^2 + 26^3 = 18,278 distinct strings of those lengths; the extra 676 are the two-letter strings, which the loop builds twice.) That's a lot of requests to throw at a server. I decided that 4 characters (which would rack up half-a-million requests) weren't necessary (yet).

So, a script to come up with those character combinations.

    // let's define the alphabet
    $alphabet = array(
        'a', 'b', 'c', 'd',
        'e', 'f', 'g', 'h',
        'i', 'j', 'k', 'l',
        'm', 'n', 'o', 'p',
        'q', 'r', 's', 't',
        'u', 'v', 'w', 'x',
        'y', 'z');

    // need a holder to help w/ recursion
    $search_array = array();
    $temp_search_array = array('');

    // loop through the alphabet to create some search strings
    // a, aa, ab, ba, etc
    // note: on the third pass $temp_search_array still holds the one-letter
    // strings, so each two-letter string ends up in $search_array twice
    for($i = 1; $i <= 3; $i++)
    {
        foreach($alphabet as $character)
        {
            foreach($temp_search_array as $query)
            {
                $search_array[] = $query . $character;
            }
        }
        $temp_search_array = $search_array;
    }
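For comparison, here is a minimal sketch of a variant that builds each string exactly once by extending only the strings from the previous length. The function name build_queries is made up for illustration, and this is not the script that produced the numbers in this post.

```php
<?php
// Hypothetical helper: every distinct string of 1..$max_length letters.
function build_queries(array $alphabet, $max_length)
{
    $all = array();
    $previous = array(''); // strings of the previous length only
    for ($length = 1; $length <= $max_length; $length++) {
        $current = array();
        foreach ($previous as $prefix) {
            foreach ($alphabet as $character) {
                $current[] = $prefix . $character;
            }
        }
        $all = array_merge($all, $current);
        $previous = $current; // next pass extends only the newest strings
    }
    return $all;
}

$queries = build_queries(range('a', 'z'), 3);
echo count($queries) . PHP_EOL; // 26 + 26^2 + 26^3 = 18278
```

Calling array_unique() on the original $search_array should give the same set of strings in a different order.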

Scraping Linkedin

My first attempt was a blatant file_get_contents() using the AJAX url. That didn't work. Unsure what sort of validation Linkedin might be running against incoming requests, I decided to throw in a couple of extra steps. I randomized the order of the queries so the pattern wasn't obvious. I threw in a five-second delay to avoid hitting a max-requests-per-time limit. Changing the file_get_contents() call into a full cURL request with 'browser-like' headers made the incoming requests seem more legit. Also, in case the script failed at some point, I made sure to check the progress on each loop so I didn't make the same query twice on different runs.

These are all fairly basic first steps. If Linkedin still didn't like my requests I could randomize the delay, juggle a few different header configurations, even look into proxies or different servers to pull from different IP addresses. The basic steps worked, though. All 18,954 requests went through, one at a time, and took a little more than a day to complete.

    // the script is going to run a while
    set_time_limit(0);

    $mysqli = new mysqli(
        'localhost',
        'root',
        '',
        'scrape');

    // from looking at linkedin ajax calls
    $base_url = 'http://www.linkedin.com/ta/skill?query=%s';

    // make the request seem valid
    // pulled straight outa Chrome developer tools
    // CURLOPT_HTTPHEADER expects a list of 'Name: value' strings
    // Accept-Encoding is left to CURLOPT_ENCODING below so cURL decompresses the reply
    $header_array = array(
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language: en-US,en;q=0.8',
        'Cache-Control: max-age=0',
        'Connection: keep-alive',
        'User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.62 Safari/537.36');

    // now loop through all the search params and ask for them
    shuffle($search_array);
    foreach($search_array as $search)
    {
        $query = "SELECT `id` FROM `search` WHERE `description` = '{$search}' LIMIT 1";
        $result = $mysqli->query($query);
        if($result->num_rows == 1) // assume that existing queries have been used
            continue;

        $url = sprintf($base_url, urlencode($search));

        // request out to linkedin servers
        $handle = curl_init($url);
        curl_setopt($handle, CURLOPT_HTTPHEADER, $header_array);
        curl_setopt($handle, CURLOPT_ENCODING, ''); // accept and decode compressed replies
        curl_setopt($handle, CURLOPT_RETURNTRANSFER, TRUE);
        $response = curl_exec($handle);
        $response = json_decode($response);

        // if response is null then there's a good chance we're blocked
        if($response == NULL)
            exit('we got a problem: ' . $search);

        // else, insert the search as a 'holder'
        $query = "INSERT INTO `search` (`description`) VALUES ('{$search}')";
        $result = $mysqli->query($query);
        $search_id = $mysqli->insert_id;
        $date = date('Y-m-d H:i:s');

        // loop the results and insert the endorsements
        foreach($response->resultList as $response_row)
        {
            // skill names can contain quotes, so escape before building SQL
            $linkedin_id = $mysqli->real_escape_string($response_row->id);
            $title = $mysqli->real_escape_string($response_row->displayName);

            $query = "SELECT `id` FROM `endorsement` WHERE `linkedin_id` = '{$linkedin_id}' LIMIT 1";
            $result = $mysqli->query($query);
            if($result->num_rows == 1)
            {
                $endorsement_id = $result->fetch_object()->id;
            }
            else
            {
                $query = "INSERT INTO `endorsement` (`title`, `linkedin_id`) VALUES ('{$title}', '{$linkedin_id}')";
                $result = $mysqli->query($query);
                $endorsement_id = $mysqli->insert_id;
            }

            // just for extra connections, connect to search term
            $query = "INSERT INTO `search_endorsement` (`search`, `endorsement`, `date`) VALUES ('{$search_id}', '{$endorsement_id}', '{$date}')";
            $mysqli->query($query);
        }

        sleep(5); // sneaky
    }
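As a rough sanity check on the 'little more than a day' runtime, the fixed five-second sleep alone accounts for most of it. The sketch below assumes every query costs exactly the delay and ignores network and database time; the helper name estimated_hours is made up for illustration.

```php
<?php
// Hypothetical helper: hours spent sleeping for a given number of requests.
function estimated_hours($requests, $delay_seconds)
{
    return $requests * $delay_seconds / 3600;
}

// 18,954 queries at 5 seconds apiece
printf("%.1f hours\n", estimated_hours(18954, 5)); // 26.3 hours

// all four-letter strings (26^4 = 456,976) at the same pace
printf("%.1f days\n", estimated_hours(pow(26, 4), 5) / 24); // 26.4 days
```

The second figure is a big part of why the four-character pass can wait.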

A Whole Bunch of Endorsements

After checking all of the 1, 2, and 3 character combinations I ended up with a list of 14,605 endorsements. By looking at the Linkedin IDs and assuming that they are auto-incrementing (and without holes), though, it appears that there are at least 50,000 total out there. So I pulled less than 30% of all endorsements using these scripts.
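The coverage figure is simple division. A minimal sketch, assuming 50,000 is a fair lower bound on the total (so the real coverage can only be lower); the helper name coverage_percent is made up for illustration.

```php
<?php
// Hypothetical helper: share of the estimated total that was collected.
function coverage_percent($collected, $estimated_total)
{
    return $collected / $estimated_total * 100;
}

printf("%.1f%%\n", coverage_percent(14605, 50000)); // 29.2%
```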

A cursory glance at the collected list provides some solid ammunition for future endorsement spamming. Acid Mine Drainage, Bagpipes, Smiling, Sudoku and Potatoes are all great skills that are rarely applicable in the web development field. I plan on using these, and perhaps a few of the other thousands also out there, to have some fun with my colleagues.

Please note: I don't encourage giving out 'fake' endorsements to people. There are some people (namely HR managers and recruiters) who may take endorsement lists pretty seriously. I just enjoy throwing one out of left field on occasion and wanted to figure out what options were out there.