For some reason, all social media APIs tend to be the least engineer-friendly* (*personal opinion) and are a real pain to work with. Regular international scandals around the misuse of users' personal data do not make things easier and usually lead to even more restrictions on data access for developers. Unless you use a member authentication token (aka social media sign-in button) in your app, most platforms won't grant you access to pages' public data or will at least make it difficult by requiring you to fill in enormous forms and provide demos to explain your use cases.
Security of your personal data on the web is good, don't get me wrong. But what if you need some basic public info to develop an app that will bring value to users and at the same time there doesn't seem to be a logical way to include a "Sign-in with your social media account" process?
LinkedIn page follower count
A common case where a platform exposes a page's follower count to anonymous users is the "Follow us" button that you can put on your website. LinkedIn has one as well. It looks like this:
That's the follower count for Microsoft, in case you are wondering. Here's the code to create one:
<script src="https://platform.linkedin.com/in.js" type="text/javascript"> lang: en_US</script>
<script type="IN/FollowCompany" data-id="1035" data-counter="bottom"></script>
where 1035 is the ID for Microsoft's LinkedIn page.
Let's put it on a web page and see what happens on the Network tab of DevTools.
Here it is - a public URL to retrieve a page's follower count without OAuth and/or request limitations:
https://www.linkedin.com/pages-extensions/FollowCompany?id={companyId}&counter=bottom
This is something we can work with. Let's write some code.
"""
linkedin_followers.py
"""
from sys import argv
from pyquery import PyQuery
import re
pq = PyQuery(url=f'https://www.linkedin.com/pages-extensions/FollowCompany?id={argv[1]}&counter=bottom')
widget_text = pq.text() # equivalent of <document.body.innerText> in JS
follower_count = int(re.sub(r'\D', '', widget_text)) # remove everything except digits and cast to int
print({ 'raw': widget_text, 'count': follower_count })
This will give us the following result:
$ python linkedin_followers.py 1035
{'raw': 'Follow\n7,750,599', 'count': 7750599}
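If you want to reuse this logic outside of a one-off script, the URL building and the text parsing can be pulled into two small functions. This is only a sketch: the names build_widget_url and parse_follower_count are made up for illustration, and it assumes the widget text keeps the "Follow" plus number shape shown above.

```python
import re


def build_widget_url(company_id: int) -> str:
    """Return the public follow-widget URL for a numeric company ID."""
    return (
        "https://www.linkedin.com/pages-extensions/FollowCompany"
        f"?id={int(company_id)}&counter=bottom"
    )


def parse_follower_count(widget_text: str) -> int:
    """Extract the follower count from the widget's visible text."""
    # Drop everything except digits, e.g. 'Follow\n7,750,599' -> '7750599'.
    digits = re.sub(r"\D", "", widget_text)
    if not digits:
        # The widget came back without a number (layout change, CAPTCHA page, etc.).
        raise ValueError(f"no follower count found in {widget_text!r}")
    return int(digits)


if __name__ == "__main__":
    print(build_widget_url(1035))
    print(parse_follower_count("Follow\n7,750,599"))
```

Failing loudly on text without digits is a deliberate choice here: the original one-liner would raise a less obvious error from int('') in the same situation.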
LinkedIn company ID
What if you want a number for microsoft, and not some 1035 page? Well, I don't have a pretty solution yet. Here are some things you can try depending on your use case.
Prompt to provide a LinkedIn ID as input
If you can take LinkedIn page IDs as input directly, you already have a working solution. If a user is going to fill in the input field, you could add a hint next to it explaining how to find the ID. The official LinkedIn documentation itself provides users with the following tip to set up a "Follow" button:
As a company page administrator, your Company ID can be retrieved by navigating to the admin section of your company page. For example, the LinkedIn Company Admin Page is https://www.linkedin.com/company/1337/admin/. We will use the Company ID 1337 in our example.
Crawl LinkedIn to get an ID
This method should not be your preferred solution. You should also be aware that sometimes LinkedIn will just call you out on being a bot and ask for a CAPTCHA. Sometimes though it will not. So you can automate this to some degree.
The first thing to consider is that a company ID won't change over time, so once obtained it should be stored in a database alongside the company name instead of being requested each time.
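As a rough illustration of that "look up once, then store" idea, here is a minimal sketch using Python's built-in sqlite3 module. The table name company_ids, the helper names, and the fetch_id callback (which stands in for whatever crawling step you end up using) are all assumptions made for this example.

```python
import sqlite3
from typing import Callable


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a tiny name -> LinkedIn ID cache."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS company_ids ("
        "name TEXT PRIMARY KEY, linkedin_id INTEGER NOT NULL)"
    )
    return conn


def get_company_id(
    conn: sqlite3.Connection, name: str, fetch_id: Callable[[str], int]
) -> int:
    """Return the cached ID for a company, fetching it only on a cache miss."""
    row = conn.execute(
        "SELECT linkedin_id FROM company_ids WHERE name = ?", (name,)
    ).fetchone()
    if row is not None:
        return row[0]
    # Cache miss: call the (hypothetical) crawler once and remember the result.
    company_id = fetch_id(name)
    conn.execute(
        "INSERT INTO company_ids (name, linkedin_id) VALUES (?, ?)",
        (name, company_id),
    )
    conn.commit()
    return company_id


if __name__ == "__main__":
    conn = open_cache()
    # A stand-in fetcher; a real one would crawl LinkedIn.
    print(get_company_id(conn, "microsoft", lambda name: 1035))
    print(get_company_id(conn, "microsoft", lambda name: -1))  # served from cache
```

Passing a file path instead of ":memory:" to open_cache would keep the IDs between runs.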
To crawl LinkedIn, you will first need a li_at cookie to identify yourself as a user. To get one, you need to emulate login. I find Nightmare (a Node.js library with Electron under the hood) to be very useful in such tasks. Here is an example of how you can use it for the LinkedIn sign-in process:
/**
 * linkedin_login.js
 */
var Nightmare = require('nightmare');
var TIMEOUT = 5000; // ms
var SHOW_BROWSER = true;
var linkedinLogin = function(email, password) {
  var nightmare = new Nightmare({ waitTimeout: TIMEOUT, show: SHOW_BROWSER });
  return nightmare
    .goto('https://www.linkedin.com/uas/login')
    .wait('#username') // wait until sign-in form appears
    .insert('#username', false) // clear
    .insert('#username', email)
    .insert('#password', false) // clear
    .insert('#password', password)
    .click('.btn__primary--large')
    .wait('.content')
    .cookies.get('li_at')
    .end();
};
var argvCount = process.argv.length;
var email = process.argv[argvCount - 2];
// clearly, you should not input your password as a command-line argument in a real world out there
var password = process.argv[argvCount - 1];
linkedinLogin(email, password).then(console.log);
Let's launch the script:
$ npm install nightmare
$ node linkedin_login.js [email] [password]
If you have specified SHOW_BROWSER = true (which is optional, but useful for debugging purposes), you will see an Electron window for a brief moment:
and your result after that:
{ name: 'li_at', value: 'COOKIE_VALUE', domain: '.www.linkedin.com', hostOnly: false, path: '/', secure: true, httpOnly: true, session: false, expirationDate: 1594297571.568987 }
As with the company ID, if you pull this value before each new request, the LinkedIn server's patience might run out quickly, so you had better store it somewhere and reuse it until its expirationDate.
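One way to decide whether a stored cookie can still be reused is to compare its expirationDate with the current time. The sketch below assumes the cookie has been saved as a dict with the same keys as the output above and that expirationDate is a Unix timestamp in seconds; the function name and the one-hour safety margin are arbitrary choices for illustration.

```python
import time
from typing import Optional


def cookie_is_fresh(
    cookie: dict, now: Optional[float] = None, margin_seconds: float = 3600.0
) -> bool:
    """Return True if the stored cookie is still safely before its expiration."""
    if now is None:
        now = time.time()
    expires_at = cookie.get("expirationDate")
    if expires_at is None:
        # No expiration recorded: treat the cookie as unusable and log in again.
        return False
    # Keep a safety margin so the cookie does not expire mid-request.
    return now + margin_seconds < float(expires_at)


if __name__ == "__main__":
    stored = {"name": "li_at", "value": "COOKIE_VALUE", "expirationDate": 1594297571.568987}
    print(cookie_is_fresh(stored))
```

Only when this returns False would you need to run the login script again.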
Now for the company ID itself. It seems to be exposed through various links on the page. One of them is this link:
So why don't we pull the ID right from it? Here's another Nightmare library approach:
/**
 * linkedin_company.js
 */
var Nightmare = require('nightmare');
var TIMEOUT = 5000; // ms
var SHOW_BROWSER = true;
var getLinkedinCompanyId = function(companyName, authCookieValue) {
  var nightmare = new Nightmare({ waitTimeout: TIMEOUT, show: SHOW_BROWSER });
  return nightmare
    .goto('https://www.linkedin.com')
    .cookies.set('li_at', authCookieValue)
    .goto('https://www.linkedin.com/company/' + companyName)
    .wait(function () {
      return document.getElementsByTagName('a');
    })
    .evaluate(function() {
      var links = Array.from(document.getElementsByTagName('a'))
        .filter(function (e) {
          return e.href.indexOf('facetCurrentCompany') >= 0;
        });
      return links[links.length - 1].href
    })
    .end()
    .then(function (employeesUrl) {
      var parsedUrl = new URL(employeesUrl);
      var currentCompanyParam = parsedUrl.searchParams.get('facetCurrentCompany');
      return parseInt(JSON.parse(currentCompanyParam)[0]);
    });
};
var argvCount = process.argv.length;
var company = process.argv[argvCount - 2];
var cookie = process.argv[argvCount - 1];
getLinkedinCompanyId(company, cookie).then(console.log);
Now let's run it with the cookie value we stored:
$ node linkedin_company.js microsoft "COOKIE_VALUE"
1035
And here we have it: 1035 is our ID for the microsoft page. We can do it again with any page, no matter the relation between our logged-in user and a company.
$ node linkedin_company.js apple "COOKIE_VALUE"
162479
Now we can use it with our linkedin_followers.py script:
$ python linkedin_followers.py 162479
{'raw': 'Follow\n6,983,914', 'count': 6983914}
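If you would rather keep the whole pipeline in Python, the last step of linkedin_company.js (pulling the ID out of the facetCurrentCompany query parameter) can be sketched with the standard library alone. The example URL below is an assumption inferred from how the script parses the parameter, namely as a JSON array whose first element is the ID; the function name is made up for illustration.

```python
import json
from urllib.parse import parse_qs, urlparse


def company_id_from_employees_url(employees_url: str) -> int:
    """Extract the numeric company ID from a link carrying facetCurrentCompany."""
    query = parse_qs(urlparse(employees_url).query)
    values = query.get("facetCurrentCompany")
    if not values:
        raise ValueError("URL has no facetCurrentCompany parameter")
    # The parameter is expected to hold a JSON array such as ["1035"].
    ids = json.loads(values[0])
    return int(ids[0])


if __name__ == "__main__":
    # Hypothetical link shape, URL-encoded form of facetCurrentCompany=["1035"].
    url = (
        "https://www.linkedin.com/search/results/people/"
        "?facetCurrentCompany=%5B%221035%22%5D"
    )
    print(company_id_from_employees_url(url))
```

This only covers the parsing; obtaining the link itself still requires an authenticated page load like the Nightmare script above.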