Fake Linkedin Endorsements
One thing that you can do on the internet is endorse people on Linkedin. You just visit one of your connections' profiles on the site, pick some skills pulled from their profile (and/or add new ones), and endorse them. This is a great way to 'validate' listed skills and give out simple public recommendations. It's also a great way to have a little bit of spammy fun with your network.
Outside of a notification (and, in the case of 'new' endorsements, a simple approval) there really isn't much stopping you from handing out random endorsements. As long as a skill exists on Linkedin's select list, you can endorse someone for it. There's a bit of humor to be had by endorsing a strictly-PHP developer for .NET or a steadfast football fan for hockey.
After a fun back-and-forth match with a coworker I began to wonder just how many endorsements were out there. Linkedin didn't seem to provide a full list. Was there a way I could scrape their site and get one?
Linkedin Endorsement Autocomplete
When you start to add a new endorsement to a user's profile, an autocomplete option shows up. So, if you start typing 'ph' you get recommendations like 'photoshop', 'photography', 'php', etc. I looked into the JavaScript powering this feature. There isn't anything too sophisticated (the autocomplete list is based entirely on the input characters), so it was fairly easy to track down the AJAX call and pull it out.
// if the query is 'ph'
http://www.linkedin.com/ta/skill?query=ph&goback=%2Enmp_*1_*1_*1_*1_*1_*1_*1_*1_*1_*1
// simpler version that also works
http://www.linkedin.com/ta/skill?query=ph
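For anything beyond plain letters, the query needs to be URL-encoded before it goes into that address. A minimal sketch, assuming the simpler endpoint above still behaves as described; `build_skill_url()` is a hypothetical helper name, not something Linkedin provides:

```php
<?php
// Hypothetical helper: builds the autocomplete URL for a search fragment.
// The endpoint is the one observed above; it may no longer respond.
function build_skill_url($fragment)
{
    $base_url = 'http://www.linkedin.com/ta/skill?query=%s';
    // rawurlencode() keeps spaces, '+', '#', etc. from breaking the query string
    return sprintf($base_url, rawurlencode($fragment));
}

echo build_skill_url('ph'), "\n";  // http://www.linkedin.com/ta/skill?query=ph
echo build_skill_url('c++'), "\n"; // http://www.linkedin.com/ta/skill?query=c%2B%2B
```

For the letter-only queries used later in this post the encoding is a no-op, so it only matters if the search space grows to include spaces or symbols.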
A nice JSON response comes back, complete with a 'display' version, URL, and internal id. Here's an example of the response.
{"resultList":
    [
        {
            "id":"281",
            "headLine":"<strong>Ph<\/strong>otoshop",
            "displayName":"Photoshop",
            "subLine":"",
            "url":"http://www.linkedin.com/skills/skill/Photoshop"
        },
        {
            "id":"193",
            "headLine":"<strong>Ph<\/strong>otography",
            "displayName":"Photography",
            "subLine":"",
            "url":"http://www.linkedin.com/skills/skill/Photography"
        },
        {
            "id":"261",
            "headLine":"<strong>PH<\/strong>P",
            "displayName":"PHP",
            "subLine":"",
            "url":"http://www.linkedin.com/skills/skill/PHP"
        }
    ]
}
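To show how little is needed to pull the useful fields out of that response, here is a rough sketch that decodes a trimmed copy of the sample and maps each internal id to its display name. `extract_skills()` is a hypothetical helper, and the check for `resultList` is an assumption about what a bad response would look like:

```php
<?php
// Trimmed copy of the sample response shown above (two of the three rows).
$sample = '{"resultList":[
    {"id":"281","headLine":"<strong>Ph<\/strong>otoshop","displayName":"Photoshop","subLine":"","url":"http://www.linkedin.com/skills/skill/Photoshop"},
    {"id":"261","headLine":"<strong>PH<\/strong>P","displayName":"PHP","subLine":"","url":"http://www.linkedin.com/skills/skill/PHP"}
]}';

// Hypothetical helper: returns array(linkedin id => display name),
// or an empty array if the body is not the JSON we expect.
function extract_skills($json)
{
    $decoded = json_decode($json);
    if($decoded === NULL || !isset($decoded->resultList))
        return array();

    $skills = array();
    foreach($decoded->resultList as $row)
    {
        $skills[$row->id] = $row->displayName;
    }
    return $skills;
}

print_r(extract_skills($sample));
```

PHP turns the numeric string ids into integer array keys, so the result is keyed by 281 and 261.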
So this is nice and all but the response is usually limited to about ten options, max. How can you get more?
Letter Combinations
To avoid manually typing a bunch of possible queries into the URL string, I needed to write a script that would automatically hit Linkedin with different searches. It's been a while since I played with this type of programming, so I found it easiest to sketch out the problem.
I needed to loop through the alphabet multiple times, each time appending each character, to come up with a bunch of possible query strings. After realizing that I was going for a type of combination, I worked out a formula for how many queries I was looking at. For all possible 1, 2, and 3 character combinations I came up with 27^3 - 27^2, or 18,954. (Strictly speaking, 26 letters give 26 + 26^2 + 26^3 = 18,278 distinct strings; the larger figure counts every two-character string twice.) Either way, that's a lot of requests to throw at a server. I decided that 4 characters (which would rack up roughly half a million requests) weren't necessary (yet).
So, a script to come up with those character combinations.
// let's define the alphabet
$alphabet = array(
    'a', 'b', 'c', 'd',
    'e', 'f', 'g', 'h',
    'i', 'j', 'k', 'l',
    'm', 'n', 'o', 'p',
    'q', 'r', 's', 't',
    'u', 'v', 'w', 'x',
    'y', 'z');
// holders: the full list, plus a copy of it to extend on the next pass
$search_array = array();
$temp_search_array = array('');
// loop through the alphabet to create some search strings
// a, b, ... then aa, ab, ba, etc
// note: the third pass extends the one-character strings again, so each
// two-character string lands in the list twice (18,954 entries, 18,278 distinct)
for($i = 1; $i <= 3; $i++)
{
    foreach($alphabet as $character)
    {
        foreach($temp_search_array as $query)
        {
            $search_array[] = $query . $character;
        }
    }
    $temp_search_array = $search_array;
}
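Because that loop extends everything collected so far on each pass, the two-character strings get built a second time on the third pass. A variant that only extends the strings from the previous pass is sketched here, assuming repeats are unwanted; `build_queries()` is a hypothetical name:

```php
<?php
// Hypothetical helper: every string of 1..$max_length characters over $alphabet.
function build_queries(array $alphabet, $max_length)
{
    $all = array();
    $previous = array(''); // strings built on the previous pass
    for($length = 1; $length <= $max_length; $length++)
    {
        $current = array();
        foreach($previous as $prefix)
        {
            foreach($alphabet as $character)
            {
                $current[] = $prefix . $character;
            }
        }
        $all = array_merge($all, $current);
        $previous = $current; // only extend the newest strings next time
    }
    return $all;
}

$queries = build_queries(range('a', 'z'), 3);
echo count($queries), "\n";               // 18278
echo count(array_unique($queries)), "\n"; // 18278
```

The count matches 26 + 26^2 + 26^3. The repeats in the simpler loop are harmless as long as whatever consumes the list skips queries it has already seen.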
Scraping Linkedin
My first attempt was a blatant file_get_contents() using the AJAX URL. That didn't work. Unsure of what sort of validation Linkedin might be running against incoming requests, I decided to throw in a couple of extra steps. I randomized the queries so the pattern wasn't obvious and ordered. I added a five-second delay between requests to avoid tripping a max-requests-per-time limit. Changing the file_get_contents() call into a full cURL request with 'browser-like' headers made the incoming requests seem more legit. Also, in case the script failed at some point, I checked progress on each loop so I didn't make the same query twice on different runs.
These are all fairly basic first steps. If Linkedin still didn't like my requests I could randomize the delay, juggle a few different header configurations, even look into proxies or different servers to pull from different IP addresses. The basic steps worked, though. All 18,954 requests went through, one at a time, and took a little more than a day to complete.
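I never needed the fallback ideas, but the first two (a randomized delay and rotating header configurations) could look roughly like this. Everything here is illustrative: `pick_delay()` and `pick_headers()` are hypothetical helpers, the 3 to 9 second range is arbitrary, and the second header set is made up for the example.

```php
<?php
// Hypothetical helper: a delay between $min and $max seconds (inclusive).
function pick_delay($min, $max)
{
    return mt_rand($min, $max);
}

// Hypothetical helper: picks one header set at random.
// Each set is a list of "Name: value" strings, the format CURLOPT_HTTPHEADER expects.
function pick_headers(array $header_sets)
{
    return $header_sets[array_rand($header_sets)];
}

// Two illustrative configurations; the second User-Agent is invented for this sketch.
$header_sets = array(
    array(
        'Accept-Language: en-US,en;q=0.8',
        'User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.62 Safari/537.36',
    ),
    array(
        'Accept-Language: en-GB,en;q=0.7',
        'User-Agent: ExampleBrowser/1.0',
    ),
);

$delay = pick_delay(3, 9);
$headers = pick_headers($header_sets);
echo "would sleep {$delay}s and send " . count($headers) . " headers\n";

// In a scraping loop these would replace the fixed values:
//     curl_setopt($handle, CURLOPT_HTTPHEADER, pick_headers($header_sets));
//     sleep(pick_delay(3, 9));
```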
// the script is going to run awhile
set_time_limit(0);
$mysqli = new mysqli(
    'localhost',
    'root',
    '',
    'scrape');
// from looking at linkedin ajax calls
$base_url = 'http://www.linkedin.com/ta/skill?query=%s';
// make the request seem valid
// pulled straight outa Chrome developer tools
// (CURLOPT_HTTPHEADER wants a plain list of "Name: value" strings, not key => value pairs)
// Accept-Encoding is left out: json_decode() needs an uncompressed body,
// and cURL only decompresses when CURLOPT_ENCODING is set
$header_array = array(
    'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language: en-US,en;q=0.8',
    'Cache-Control: max-age=0',
    'Connection: keep-alive',
    'User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.62 Safari/537.36');
// now loop through all the search params and ask for them
shuffle($search_array);
foreach($search_array as $search)
{
    // skip anything that an earlier pass or a previous run already handled
    $query = "SELECT `id` FROM `search` WHERE `description` = '{$search}' LIMIT 1";
    $result = $mysqli->query($query);
    if($result->num_rows == 1) // assume that existing queries have been used
        continue;

    $url = sprintf($base_url, $search);

    // request out to linkedin servers
    $handle = curl_init($url);
    curl_setopt($handle, CURLOPT_HTTPHEADER, $header_array);
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, TRUE);
    $response = curl_exec($handle);
    $response = json_decode($response);

    // if response is null (or has no result list) then there's a good chance we're blocked
    if($response === NULL || !isset($response->resultList))
        exit('we got a problem: ' . $search);

    // else, insert the search as a 'holder'
    $query = "INSERT INTO `search` (`description`) VALUES ('{$search}')";
    $result = $mysqli->query($query);
    $search_id = $mysqli->insert_id;
    $date = date('Y-m-d H:i:s');

    // loop the results and insert the endorsements
    foreach($response->resultList as $response_row)
    {
        // these values come from a remote server, so escape them before building SQL
        // (a title like "Children's Books" would otherwise break the query)
        $linkedin_id = $mysqli->real_escape_string($response_row->id);
        $title = $mysqli->real_escape_string($response_row->displayName);

        $query = "SELECT `id` FROM `endorsement` WHERE `linkedin_id` = '{$linkedin_id}' LIMIT 1";
        $result = $mysqli->query($query);
        if($result->num_rows == 1)
        {
            $endorsement_id = $result->fetch_object()->id;
        }
        else
        {
            $query = "INSERT INTO `endorsement` (`title`, `linkedin_id`) VALUES ('{$title}', '{$linkedin_id}')";
            $result = $mysqli->query($query);
            $endorsement_id = $mysqli->insert_id;
        }

        // just for extra connections, connect to search term
        $query = "INSERT INTO `search_endorsement` (`search`, `endorsement`, `date`) VALUES ('{$search_id}', '{$endorsement_id}', '{$date}')";
        $mysqli->query($query);
    }

    sleep(5); // sneaky
}
A Whole Bunch of Endorsements
After checking all of the 1, 2, and 3 character combinations I ended up with a list of 14,605 endorsements. By looking at the Linkedin IDs and assuming that they are auto-incrementing (and without holes), though, it appears that there are at least 50,000 total out there. So I pulled less than 30% of all endorsements using these scripts.
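The arithmetic behind that percentage, as a quick sketch. It leans on the same assumption as above (IDs that auto-increment from 1 without holes), and `estimate_coverage()` is a hypothetical helper:

```php
<?php
// Hypothetical helper: share of the ID space covered by what was collected,
// assuming IDs start at 1, auto-increment, and have no gaps.
function estimate_coverage($collected_count, $highest_id_seen)
{
    return $collected_count / $highest_id_seen;
}

// 14,605 collected; 50,000 is the rough lower bound quoted above
printf("%.1f%%\n", 100 * estimate_coverage(14605, 50000)); // 29.2%
```

Since 50,000 is only a lower bound on the total, the real coverage could be lower than this.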
A cursory glance at the collected list provides some solid ammunition for future endorsement spamming. Acid Mine Drainage, Bagpipes, Smiling, Sudoku, and Potatoes are all great skills that are rarely applicable in the web development field. I plan on using these, and perhaps a few of the thousands of others out there, to have some fun with my colleagues.
Please note: I don't encourage giving out 'fake' endorsements to people. There are some people (namely HR managers and recruiters) who may take endorsement lists pretty seriously. I just enjoy throwing one out of left field on occasion and wanted to figure out what options were out there.