utils.social
A namespace that contains various utilities to help you extract social handles from text, URLs and HTML documents.
Example usage:
const Apify = require('apify');
const emails = Apify.utils.social.emailsFromText('alice@example.com bob@example.com');
social.LINKEDIN_REGEX
Regular expression to exactly match a single LinkedIn profile URL. It has the following form: /^...$/i
and matches URLs such as:
https://www.linkedin.com/in/alan-turing
en.linkedin.com/in/alan-turing
linkedin.com/in/alan-turing
The regular expression does NOT match URLs with additional subdirectories or query parameters, such as:
https://www.linkedin.com/in/linus-torvalds/latest-activity
Example usage:
if (Apify.utils.social.LINKEDIN_REGEX.test('https://www.linkedin.com/in/alan-turing')) {
    console.log('Match!');
}
social.LINKEDIN_REGEX_GLOBAL
Regular expression to find multiple LinkedIn profile URLs in text or HTML. It has the following form: /.../ig
and matches URLs such as:
https://www.linkedin.com/in/alan-turing
en.linkedin.com/in/alan-turing
linkedin.com/in/alan-turing
If the profile URL contains subdirectories or query parameters, the regular expression extracts just the base part of the profile URL. For example, from text such as:
https://www.linkedin.com/in/linus-torvalds/latest-activity
the expression extracts just the following base URL:
https://www.linkedin.com/in/linus-torvalds
Example usage:
const matches = text.match(Apify.utils.social.LINKEDIN_REGEX_GLOBAL);
if (matches) console.log(`${matches.length} LinkedIn profiles found!`);
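To make the difference between the exact-match form (/^...$/i) and the global form (/.../ig) concrete, here is a minimal sketch built on a simplified stand-in pattern. The names PROFILE_PATTERN, exactRegex and globalRegex are hypothetical, and the pattern is much looser than whatever the library actually ships; it only illustrates how anchoring changes the result.

```javascript
// Hypothetical, simplified stand-in pattern; NOT the library's actual LinkedIn regex.
const PROFILE_PATTERN = '(?:https?://)?(?:\\w+\\.)?linkedin\\.com/in/[a-z0-9_-]+';

// Anchored form: the whole string must be a single profile URL.
const exactRegex = new RegExp(`^${PROFILE_PATTERN}/?$`, 'i');
// Global form: finds every profile URL inside a longer text.
const globalRegex = new RegExp(PROFILE_PATTERN, 'ig');

const text = 'See https://www.linkedin.com/in/linus-torvalds/latest-activity '
    + 'and linkedin.com/in/alan-turing.';

// A bare profile URL passes the anchored test.
console.log(exactRegex.test('https://www.linkedin.com/in/alan-turing')); // true
// A URL with an extra subdirectory does not.
console.log(exactRegex.test('https://www.linkedin.com/in/linus-torvalds/latest-activity')); // false

// The global form extracts only the base part of each profile URL.
const profiles = text.match(globalRegex) || [];
console.log(profiles);
```

With this stand-in, profiles contains the two base URLs and the /latest-activity suffix is left out, which mirrors the behavior described above.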
social.INSTAGRAM_REGEX
Regular expression to exactly match a single Instagram profile URL. It has the following form: /^...$/i
and matches URLs such as:
https://www.instagram.com/old_prague
www.instagram.com/old_prague/
instagr.am/old_prague
The regular expression does NOT match URLs with additional subdirectories or query parameters, such as:
https://www.instagram.com/cristiano/followers
Example usage:
if (Apify.utils.social.INSTAGRAM_REGEX.test('https://www.instagram.com/old_prague')) {
    console.log('Match!');
}
social.INSTAGRAM_REGEX_GLOBAL
Regular expression to find multiple Instagram profile URLs in text or HTML. It has the following form: /.../ig
and matches URLs such as:
https://www.instagram.com/old_prague
www.instagram.com/old_prague/
instagr.am/old_prague
If the profile URL contains subdirectories or query parameters, the regular expression extracts just the base part of the profile URL. For example, from text such as:
https://www.instagram.com/cristiano/followers
the expression extracts just the following base URL:
https://www.instagram.com/cristiano
Example usage:
const matches = text.match(Apify.utils.social.INSTAGRAM_REGEX_GLOBAL);
if (matches) console.log(`${matches.length} Instagram profiles found!`);
social.TWITTER_REGEX
Regular expression to exactly match a single Twitter profile URL. It has the following form: /^...$/i
and matches URLs such as:
https://www.twitter.com/apify
twitter.com/apify
The regular expression does NOT match URLs with additional subdirectories or query parameters, such as:
https://www.twitter.com/realdonaldtrump/following
Example usage:
if (Apify.utils.social.TWITTER_REGEX.test('https://www.twitter.com/apify')) {
    console.log('Match!');
}
social.TWITTER_REGEX_GLOBAL
Regular expression to find multiple Twitter profile URLs in text or HTML. It has the following form: /.../ig
and matches URLs such as:
https://www.twitter.com/apify
twitter.com/apify
If the profile URL contains subdirectories or query parameters, the regular expression extracts just the base part of the profile URL. For example, from text such as:
https://www.twitter.com/realdonaldtrump/following
the expression extracts only the following base URL:
https://www.twitter.com/realdonaldtrump
Example usage:
const matches = text.match(Apify.utils.social.TWITTER_REGEX_GLOBAL);
if (matches) console.log(`${matches.length} Twitter profiles found!`);
social.FACEBOOK_REGEX
Regular expression to exactly match a single Facebook profile URL. It has the following form: /^...$/i
and matches URLs such as:
https://www.facebook.com/apifytech
facebook.com/apifytech
fb.com/apifytech
https://www.facebook.com/profile.php?id=123456789
The regular expression does NOT match URLs with additional subdirectories or query parameters, such as:
https://www.facebook.com/apifytech/photos
Example usage:
if (Apify.utils.social.FACEBOOK_REGEX.test('https://www.facebook.com/apifytech')) {
    console.log('Match!');
}
social.FACEBOOK_REGEX_GLOBAL
Regular expression to find multiple Facebook profile URLs in text or HTML. It has the following form: /.../ig
and matches URLs such as:
https://www.facebook.com/apifytech
facebook.com/apifytech
fb.com/apifytech
If the profile URL contains subdirectories or query parameters, the regular expression extracts just the base part of the profile URL. For example, from text such as:
https://www.facebook.com/apifytech/photos
the expression extracts only the following base URL:
https://www.facebook.com/apifytech
Example usage:
const matches = text.match(Apify.utils.social.FACEBOOK_REGEX_GLOBAL);
if (matches) console.log(`${matches.length} Facebook profiles found!`);
social.YOUTUBE_REGEX
Regular expression to exactly match a single YouTube video URL. It has the following form: /^...$/i
and matches URLs such as:
https://www.youtube.com/watch?v=kM7YfhfkiEE
https://youtu.be/kM7YfhfkiEE
Example usage:
if (Apify.utils.social.YOUTUBE_REGEX.test('https://www.youtube.com/watch?v=kM7YfhfkiEE')) {
    console.log('Match!');
}
social.YOUTUBE_REGEX_GLOBAL
Regular expression to find multiple YouTube video URLs in text or HTML. It has the following form: /.../ig
and matches URLs such as:
https://www.youtube.com/watch?v=kM7YfhfkiEE
https://youtu.be/kM7YfhfkiEE
Example usage:
const matches = text.match(Apify.utils.social.YOUTUBE_REGEX_GLOBAL);
if (matches) console.log(`${matches.length} YouTube videos found!`);
social.EMAIL_REGEX
Regular expression to exactly match a single email address. It has the following form: /^...$/i
social.EMAIL_REGEX_GLOBAL
Regular expression to find multiple email addresses in text. It has the following form: /.../ig
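One practical point applies to any JavaScript regular expression with the g flag, and therefore to the /.../ig form used by the *_GLOBAL constants: calling .test() or .exec() updates the regex's lastIndex property, so repeated calls on the same object can alternate between true and false. The sketch below shows this with a simplified stand-in pattern (emailRegexGlobal is a hypothetical name, not the library's regex). String.prototype.match() does not have this problem because it always scans from the start.

```javascript
// Hypothetical, simplified stand-in for a global email pattern;
// NOT the library's actual EMAIL_REGEX_GLOBAL.
const emailRegexGlobal = /[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}/ig;

// match() with a global regex returns every match and starts from index 0.
const allEmails = 'Contact alice@example.com or bob@example.com.'.match(emailRegexGlobal) || [];
console.log(allEmails);

// test() on the same global regex is stateful.
const single = 'alice@example.com';
const firstTest = emailRegexGlobal.test(single);  // match found, lastIndex moves past it
const secondTest = emailRegexGlobal.test(single); // search resumes at lastIndex, so no match
console.log(firstTest, secondTest); // true false

// Resetting lastIndex before a call makes test() start from the beginning again.
emailRegexGlobal.lastIndex = 0;
const afterReset = emailRegexGlobal.test(single);
console.log(afterReset); // true
```

For a yes/no check on a single value, the anchored, non-global form is usually the simpler choice.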
social.emailsFromText(text)
The function extracts email addresses from plain text. Note that the function preserves the order of emails and keeps duplicates.
Parameters:
text
:string
- Text to search in.
Returns:
Array<string>
- Array of email addresses found. If no emails are found, the function returns an empty array.
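Because the returned array keeps duplicates in their original order, callers who want unique addresses need to de-duplicate it themselves. The following is a minimal sketch of that step. emailsFromTextSketch is a hypothetical stand-in that only mimics the documented contract with a simplified pattern, so that the example runs without the library.

```javascript
// Hypothetical stand-in that mimics the documented contract of emailsFromText();
// the pattern is a simplification, not the library's regex.
function emailsFromTextSketch(text) {
    if (typeof text !== 'string') return [];
    // match() returns null when nothing is found, so fall back to an empty array.
    return text.match(/[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}/ig) || [];
}

const emails = emailsFromTextSketch('alice@example.com, bob@example.com, alice@example.com');
console.log(emails); // order preserved, duplicate kept

// A Set keeps the first occurrence of each value in insertion order.
const uniqueEmails = [...new Set(emails)];
console.log(uniqueEmails);
```

The same Set-based step should work on the array returned by the real function, since it is documented to be a plain array of strings.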
social.emailsFromUrls(urls)
The function extracts email addresses from a list of URLs. It looks for all mailto: URLs and returns the valid email addresses found in them. Note that the function preserves the order of emails and keeps duplicates.
Parameters:
urls
:Array<string>
- Array of URLs.
Returns:
Array<string>
- Array of email addresses found. If no emails are found, the function returns an empty array.
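The mailto: handling described above can be sketched roughly as follows. This is not the library's implementation: emailsFromUrlsSketch, MAILTO_PREFIX and SIMPLE_EMAIL are hypothetical names, the validation pattern is a simplification, and stripping a ?subject=... query part is a choice made for this sketch (the text above does not say how the library treats it).

```javascript
// Hypothetical sketch of mailto: handling; not the library's implementation.
const MAILTO_PREFIX = /^mailto:/i;
const SIMPLE_EMAIL = /^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$/i;

function emailsFromUrlsSketch(urls) {
    if (!Array.isArray(urls)) return [];
    const emails = [];
    for (const url of urls) {
        // Skip anything that is not a mailto: link.
        if (typeof url !== 'string' || !MAILTO_PREFIX.test(url)) continue;
        // Assumption of this sketch: drop the scheme and any query part.
        const address = url.replace(MAILTO_PREFIX, '').split('?')[0].trim();
        // Keep only values that look like an email address.
        if (SIMPLE_EMAIL.test(address)) emails.push(address);
    }
    return emails;
}

const mailtoEmails = emailsFromUrlsSketch([
    'mailto:alice@example.com',
    'https://example.com/contact',
    'MAILTO:bob@example.com?subject=Hello',
    'mailto:not-an-email',
]);
console.log(mailtoEmails);
```

Only the two well-formed mailto: links contribute to the result; the regular web URL and the invalid address are skipped, and input order is preserved.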
social.phonesFromText(text)
The function attempts to extract phone numbers from text. Please note that the results might not be accurate, since phone numbers appear in a large variety of formats and conventions. If you encounter any problems, please file an issue.
Parameters:
text
:string
- Text to search the phone numbers in.
Returns:
Array<string>
- Array of phone numbers found. If no phone numbers are found, the function returns an empty array.
social.phonesFromUrls(urls)
Finds phone number links in an array of URLs and extracts the phone numbers from them. Note that the phone number links look like tel://123456789, tel:/123456789 or tel:123456789.
Parameters:
urls
:Array<string>
- Array of URLs.
Returns:
Array<string>
- Array of phone numbers found. If no phone numbers are found, the function returns an empty array.
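To illustrate the three link forms listed above, here is a minimal sketch that strips the tel: prefix with zero, one or two slashes. phonesFromUrlsSketch and TEL_PREFIX are hypothetical names, and the sketch does no validation of the remaining digits, which the real function may well do.

```javascript
// Hypothetical sketch of tel: link handling; not the library's implementation.
// Matches the three documented prefixes: tel:, tel:/ and tel://
const TEL_PREFIX = /^tel:\/{0,2}/i;

function phonesFromUrlsSketch(urls) {
    if (!Array.isArray(urls)) return [];
    return urls
        // Keep only phone number links.
        .filter((url) => typeof url === 'string' && TEL_PREFIX.test(url))
        // Remove the scheme and surrounding whitespace.
        .map((url) => url.replace(TEL_PREFIX, '').trim())
        // Drop links that carried no number at all.
        .filter((phone) => phone.length > 0);
}

const phones = phonesFromUrlsSketch([
    'tel://123456789',
    'tel:/123456789',
    'tel:123456789',
    'https://example.com',
]);
console.log(phones);
```

All three tel: variants yield the same number here, and the non-phone URL is ignored.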
social.parseHandlesFromHtml(html, [data])
The function attempts to extract emails, phone numbers and social profile URLs from an HTML document, specifically LinkedIn, Twitter, Instagram and Facebook profile URLs. The function removes duplicates from the resulting arrays and sorts the items alphabetically.
Note that the phones field contains phone numbers extracted from special phone links such as [call us](tel:+1234556789) (see social.phonesFromUrls()) and potentially other sources with high certainty, while phonesUncertain contains phone numbers extracted from plain text, which might be very inaccurate.
Example usage:
const Apify = require('apify');

// await is only valid inside an async function, so the snippet runs inside Apify.main()
Apify.main(async () => {
    const browser = await Apify.launchPuppeteer();
    const page = await browser.newPage();
    await page.goto('http://www.example.com');
    const html = await page.content();
    const result = Apify.utils.social.parseHandlesFromHtml(html);
    console.log('Social handles:');
    console.dir(result);
    await browser.close();
});
Parameters:
html
:string
- HTML text
[data]
:*|null
- Optional object which will receive the text and $ properties that contain text content of the HTML and cheerio object, respectively. This is an optimization so that the caller doesn't need to parse the HTML document again, if needed.
Returns:
SocialHandles
- An object with the social handles.
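The post-processing described for this function (remove duplicates, then sort) can be pictured with a small helper. uniqueSorted is a hypothetical name used only for illustration, and the sketch relies on JavaScript's default string sort, which orders by UTF-16 code units (so uppercase letters sort before lowercase ones). Whether the library uses exactly this ordering is an assumption.

```javascript
// Hypothetical helper mirroring the documented post-processing:
// remove duplicates, then sort the remaining items.
function uniqueSorted(items) {
    // Set drops repeated values; sort() without a comparator orders strings
    // by UTF-16 code units.
    return [...new Set(items)].sort();
}

const twitterUrls = uniqueSorted([
    'https://www.twitter.com/apify',
    'https://twitter.com/apify',
    'https://www.twitter.com/apify',
]);
console.log(twitterUrls);
```

Note that two URLs pointing at the same profile but written differently (with and without www.) are distinct strings, so a purely string-based de-duplication like this one keeps both.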