
"🪡 Social account detection and extraction in js, e.g. for crawling/scraping."


dinaaas/js_socials_regex

 
 


SocialsRegex for JavaScript

Detect and extract URLs of social media profiles with ease. This JavaScript package provides regular expressions and utilities to identify and extract social media account URLs from text.

Installation

Install the package using npm:

npm install socials_regex

Usage

const { SocialsRegex, SocialExtraction } = require('socials_regex');

// Example usage in your application
const text = 'Visit my Yelp page: https://www.yelp.com/biz/example-business';
const platform = SocialsRegex.Platforms.PLATFORM_YELP;
const matches = SocialExtraction.extractMatchesByPlatform(platform, text);

console.log(matches[SocialsRegex.Platforms.PLATFORM_YELP].company);
// Output: [{ matched: 'https://www.yelp.com/biz/example-business', company: 'example-business' }]
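The result above is nested by platform and then by match type. As a minimal sketch of how such a result could be turned into a flat list, the helper below (`flattenMatches`, a hypothetical name that is not part of socials_regex) walks a sample object shaped like the output shown above. The sample data is hand-written so the snippet runs without the package installed.

```javascript
// Hypothetical helper, not provided by socials_regex.
// Turns { platform: { type: [match, ...] } } into a flat array of rows.
function flattenMatches(matchesPerPlatform) {
  const rows = [];
  for (const [platform, byType] of Object.entries(matchesPerPlatform)) {
    for (const [type, items] of Object.entries(byType)) {
      for (const item of items) {
        rows.push({ platform, type, ...item });
      }
    }
  }
  return rows;
}

// Sample shaped like the Yelp result above (assumed, not produced by the package here).
const sampleResult = {
  yelp: {
    company: [
      { matched: 'https://www.yelp.com/biz/example-business', company: 'example-business' },
    ],
  },
};

console.log(flattenMatches(sampleResult));
```

A flat list like this can be easier to write to CSV or a database table than the nested structure.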

Another example

const { SocialsRegex, SocialExtraction, PlatformsRegex } = require('socials_regex');

// Example text containing social platform URLs
const text = `
  Check out my GitHub: https://github.com/example_user
  Also, find me on Twitter: https://twitter.com/example_twitter
  Also, find me on Twitter: https://twitter.com/example_twitter
  Also, find me on Twitter: https://twitter.com/example_twitter

  Contact me on LinkedIn: https://linkedin.com/in/example_linkedin
`;

// Use the static method directly
const matchesPerPlatform = SocialExtraction.extractMatchesPerPlatform(text);
console.log('Matches per platform:', matchesPerPlatform);

const twitterMatches = SocialExtraction.extractMatchesByPlatform(SocialsRegex.Platforms.PLATFORM_TWITTER, text);
console.log('Twitter matches:', twitterMatches);

// Two regexes are passed (GitHub and Twitter users); the result holds one array per regex, in order.
const githubMatches = SocialExtraction.extractMatchesByRegex([
  PlatformsRegex.REGEX[SocialsRegex.Platforms.PLATFORM_GITHUB].user,
  PlatformsRegex.REGEX[SocialsRegex.Platforms.PLATFORM_TWITTER].user,
], text);
console.log('GitHub matches:', githubMatches);

/*
Output:

Matches per platform: {
  twitter: { user: [ [Object], [Object], [Object] ] },
  linkedin: { profile: [ [Object] ] },
  github: { user: [ [Object] ] }
}
Twitter matches: { twitter: { user: [ [Object], [Object], [Object] ] } }
GitHub matches: [
  [
    {
      matched: 'https://github.com/example_user',
      login: 'example_user'
    }
  ],
  [
    {
      matched: 'https://twitter.com/example_twitter',
      username: 'example_twitter'
    },
    {
      matched: 'https://twitter.com/example_twitter',
      username: 'example_twitter'
    },
    {
      matched: 'https://twitter.com/example_twitter',
      username: 'example_twitter'
    }
  ]
]
*/
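The same URL appears three times in the text above, so it is reported three times. If repeated matches are not wanted, one possible approach is to deduplicate on the `matched` field after extraction. The sketch below uses a hypothetical helper, `uniqueByMatched`, which is not part of socials_regex, and hand-written sample data that mirrors the Twitter output above.

```javascript
// Hypothetical helper, not provided by socials_regex.
// Keeps the first occurrence of each distinct `matched` URL.
function uniqueByMatched(matches) {
  const seen = new Set();
  return matches.filter((m) => {
    if (seen.has(m.matched)) return false;
    seen.add(m.matched);
    return true;
  });
}

// Sample mirroring the three identical Twitter matches shown above.
const twitterUsers = [
  { matched: 'https://twitter.com/example_twitter', username: 'example_twitter' },
  { matched: 'https://twitter.com/example_twitter', username: 'example_twitter' },
  { matched: 'https://twitter.com/example_twitter', username: 'example_twitter' },
];

const uniqueTwitterUsers = uniqueByMatched(twitterUsers);
console.log(uniqueTwitterUsers.length); // → 1
```

Deduplicating on the full URL is a simple choice; comparing on the extracted username instead would also merge URLs that differ only in scheme or a `www.` prefix.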

Features

  • Detect the platform a URL points to (support for major platforms).
  • Extract information contained within the URL without accessing the link.
  • Extract emails and phone numbers from hyperlinks.
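To illustrate how information can be pulled out of a URL without requesting it, here is a minimal sketch using plain JavaScript regular expressions with named capture groups. The two patterns and the `extractAll` helper are simplified assumptions written for this example; they are not the regexes or functions shipped with socials_regex, which may be stricter or cover more URL variants.

```javascript
// Illustrative patterns only (assumptions), not the ones shipped with the package.
const GITHUB_USER = /https?:\/\/(?:www\.)?github\.com\/(?<login>[\w-]+)/g;
const MAILTO_LINK = /mailto:(?<email>[^\s?"'<>]+@[^\s?"'<>]+)/g;

// Hypothetical helper: returns one object per match, containing the matched
// text plus every named capture group. The regex must have the `g` flag.
function extractAll(regex, text) {
  return Array.from(text.matchAll(regex), (m) => ({ matched: m[0], ...m.groups }));
}

const sampleText = 'Code: https://github.com/example_user, mail: mailto:jane@example.com';

console.log(extractAll(GITHUB_USER, sampleText));
// → [ { matched: 'https://github.com/example_user', login: 'example_user' } ]
console.log(extractAll(MAILTO_LINK, sampleText));
// → [ { matched: 'mailto:jane@example.com', email: 'jane@example.com' } ]
```

Everything here happens on the string itself, so no network request is made to the linked site.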

Supported Platforms

const { SocialsRegex, PlatformsRegex } = require('socials_regex');

// List the supported platform names
const supportedPlatforms = PlatformsRegex.supportedPlatformsRegex();
console.log(supportedPlatforms);
// Output:
// [
//   'yelp',                  'whatsapp',
//   'stackexchange network', 'crunchbase',
//   'angellist',             'xing',
//   'vimeo',                 'telegram',
//   'stackoverflow',         'stackexchange',
//   'snapchat',              'skype',
//   'reddit',                'phone',
//   'medium',                'hackernews',
//   'email',                 'youtube',
//   'instagram',             'twitter',
//   'linkedin',              'github',
//   'facebook'
// ]

// Or list the platform constant names
const platformKeys = SocialsRegex.Platforms.all();
console.log(platformKeys);
// Output:
// [
//   'PLATFORM_FACEBOOK',
//   'PLATFORM_GITHUB',
//   'PLATFORM_LINKEDIN',
//   'PLATFORM_TWITTER',
//   'PLATFORM_INSTAGRAM',
//   'PLATFORM_YOUTUBE',
//   'PLATFORM_EMAIL',
//   'PLATFORM_HACKER_NEWS',
//   'PLATFORM_MEDIUM',
//   'PLATFORM_PHONE',
//   'PLATFORM_REDDIT',
//   'PLATFORM_SKYPE',
//   'PLATFORM_SNAPCHAT',
//   'PLATFORM_STACKEXCHANGE',
//   'PLATFORM_STACKOVERFLOW',
//   'PLATFORM_TELEGRAM',
//   'PLATFORM_VIMEO',
//   'PLATFORM_XING',
//   'PLATFORM_ANGELLIST',
//   'PLATFORM_CRUNCHBASE',
//   'PLATFORM_STACKEXCHANGE_NETWORK',
//   'PLATFORM_WHATSAPP',
//   'PLATFORM_YELP',
//   'all',
//   'show'
// ]
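The constant listing above ends with `'all'` and `'show'`, which look like helper names rather than platforms (an assumption based only on the output shown). If only the platform constants are needed, one option is to filter on the `PLATFORM_` prefix. The sketch below works on a shortened, hand-copied sample of that output so it runs without the package.

```javascript
// Shortened sample copied from the listing above.
const listedKeys = ['PLATFORM_FACEBOOK', 'PLATFORM_GITHUB', 'PLATFORM_YELP', 'all', 'show'];

// Keep only entries that follow the PLATFORM_ naming convention.
const platformConstants = listedKeys.filter((key) => key.startsWith('PLATFORM_'));

console.log(platformConstants);
// → [ 'PLATFORM_FACEBOOK', 'PLATFORM_GITHUB', 'PLATFORM_YELP' ]
```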

Supported Regexes

const { SocialsRegex } = require('socials_regex');
const supportedRegexes = SocialsRegex.Regexes.all();
console.log(supportedRegexes);
// Output: ['ANGELLIST_URL_REGEX', 'CRUNCHBASE_URL_REGEX', 'EMAIL_URL_REGEX', 'FACEBOOK_URL_REGEX', 'GITHUB_URL_REGEX', 'HACKERNEWS_URL_REGEX', ...]

Development

  • Clone the repository: git clone https://github.com/talaatmagdyx/js_socials_regex.git
  • Install dependencies: npm install
  • Run tests: npm test

References

  • social-media-profiles-regexs: Extract URLs of social media profiles with regular expressions.
  • Ruby socials_regex: Social account detection and extraction for Ruby. Detect and extract URLs of social accounts: throw in URLs, get back URLs of social media profiles by type.

Contributing

Bug reports and pull requests are welcome on GitHub; see Contributing for guidelines. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the code of conduct.

License

The package is available as open source under the terms of the MIT License.

Reporting Bugs / Feature Requests

Please open an issue on GitHub for feedback, new feature requests, or bug reports.

Pull Request

Please read Contributing before opening a pull request.

Code of Conduct

Everyone interacting in the SocialsRegex project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.
