Scrape Facebook public pages without an API key

Facebook Scraper

Scrape Facebook public pages without an API key. Inspired by twitter-scraper.

Install

To install the latest release from PyPI:

pip install facebook-scraper

Or, to install the latest master branch:

pip install git+https://github.com/kevinzg/facebook-scraper.git

Usage

Send the unique page name, profile name, or ID as the first parameter and you're good to go:

>>> from facebook_scraper import get_posts

>>> for post in get_posts('nintendo', pages=1):
...     print(post['text'][:50])
...
The final step on the road to the Super Smash Bros
We’re headed to PAX East 3/28-3/31 with new games
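The example above slices post['text'] directly, but the Notes below warn that any field may be None. The helper below is a minimal, hypothetical sketch (preview_texts is not part of facebook_scraper). It takes any iterable of post dicts, such as the one get_posts yields, skips posts without text, and stops consuming the iterable once it has enough previews.

```python
from typing import Any, Iterable, List, Mapping


def preview_texts(posts: Iterable[Mapping[str, Any]], limit: int, width: int = 50) -> List[str]:
    """Return up to `limit` previews of post text, skipping posts whose text is missing."""
    previews: List[str] = []
    for post in posts:
        text = post.get("text")
        if not text:  # fields may be None, so skip posts without text
            continue
        previews.append(text[:width])
        if len(previews) >= limit:
            break  # stop pulling from the iterable as soon as we have enough
    return previews
```

With the library installed, preview_texts(get_posts('nintendo', pages=3), limit=2) would return at most two 50-character previews.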

Optional parameters

The following parameters apply to the get_posts function.

group: group ID, to scrape groups instead of pages. Default is None.
pages: how many pages of posts to request. The first 2 pages may have no results, so try a number greater than 2. Default is 10.
timeout: how many seconds to wait before timing out. Default is 5.
credentials: tuple of user and password used to log in before requesting the posts. Default is None.
extra_info: bool; if True, the function will try to make an extra request to get the post reactions. Default is False.
youtube_dl: bool; use youtube-dl for (high-quality) video extraction. youtube-dl must be installed in your environment. Default is False.
post_urls: list, URLs or post IDs to extract posts from. Alternative to fetching based on username.
cookies: One of:
- The path to a file containing cookies in Netscape or JSON format. You can export cookies from your browser after logging into Facebook with an extension like EditThisCookie (Chrome) or Cookie Quick Manager (Firefox). Make sure to include both the c_user and xs cookies; otherwise you will get an InvalidCookies exception.
- A CookieJar
- A dictionary that can be converted to a CookieJar with cookiejar_from_dict
options: dictionary of options:
- Set options={"comments": True} to extract comments, or options={"reactors": True} to extract the people reacting to the post. Either can instead be set to a number to limit how many comments or reactors are retrieved. The default limits are 5000 for comments and 3000 for reactors.
- Set options={"progress": True} to show a tqdm progress bar while extracting comments and replies.
- Set options={"allow_extra_requests": False} to disable the extra requests made when extracting post data (these requests are required for some things, like full text and image links).
- Set options={"posts_per_page": 200} to request 200 posts per page. The default is 4 posts per page.
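The cookies parameter also accepts a dictionary, and a missing c_user or xs cookie causes an InvalidCookies exception. The sketch below checks an export before any request is made. It assumes the export is a JSON array of objects with "name" and "value" keys, which is how EditThisCookie exports cookies; other extensions may use different keys. cookies_from_json is a hypothetical helper, and it raises a plain ValueError, not the library's InvalidCookies.

```python
import json
from typing import Dict

# The two cookies the library requires (see the cookies parameter above).
REQUIRED_COOKIES = ("c_user", "xs")


def cookies_from_json(raw: str) -> Dict[str, str]:
    """Turn a JSON cookie export (a list of {"name": ..., "value": ...} objects) into a dict."""
    entries = json.loads(raw)
    cookies = {entry["name"]: entry["value"] for entry in entries}
    missing = [name for name in REQUIRED_COOKIES if name not in cookies]
    if missing:
        # Fail early, before any request to Facebook is made.
        raise ValueError(f"cookie export is missing: {', '.join(missing)}")
    return cookies
```

The resulting dict can then be passed as the cookies argument, for example get_posts('nintendo', pages=3, cookies=cookies_from_json(handle.read())) after opening the exported file.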

CLI usage

$ facebook-scraper --filename nintendo_page_posts.csv --pages 10 nintendo

Run facebook-scraper --help for more details on CLI usage.

Note: If you get a UnicodeEncodeError try adding --encoding utf-8.
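A minimal sketch for reading the CLI's CSV output back into Python, assuming the file was written with --encoding utf-8 and has a header row. The column names depend on what the CLI writes; post_id below is only an assumed example. The csv module returns every cell as a string, and an empty cell is assumed here to mean the field was None, so int_or_none (a hypothetical helper) converts counts back to numbers.

```python
import csv
from typing import Dict, List, Optional


def load_posts_csv(path: str) -> List[Dict[str, str]]:
    """Read a CSV written by the CLI into a list of row dicts (all values are strings)."""
    # newline="" is what the csv module expects; utf-8 matches --encoding utf-8.
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def int_or_none(value: str) -> Optional[int]:
    """Convert a numeric cell to int; an empty cell becomes None."""
    return int(value) if value.strip() else None
```

For example, after running the command above, int_or_none(row["likes"]) for each row in load_posts_csv("nintendo_page_posts.csv") would recover the like counts, if the CLI writes a likes column.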

Post example

{'available': True,
 'comments': 459,
 'comments_full': None,
 'factcheck': None,
 'fetched_time': datetime.datetime(2021, 4, 20, 13, 39, 53, 651417),
 'image': 'https://scontent.fhlz2-1.fna.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/58745049_2257182057699568_1761478225390731264_n.jpg?_nc_cat=111&ccb=1-3&_nc_sid=8024bb&_nc_ohc=ygH2fPmfQpAAX92ABYY&_nc_ht=scontent.fhlz2-1.fna&tp=14&oh=7a8a7b4904deb55ec696ae255fff97dd&oe=60A36717',
 'images': ['https://scontent.fhlz2-1.fna.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/58745049_2257182057699568_1761478225390731264_n.jpg?_nc_cat=111&ccb=1-3&_nc_sid=8024bb&_nc_ohc=ygH2fPmfQpAAX92ABYY&_nc_ht=scontent.fhlz2-1.fna&tp=14&oh=7a8a7b4904deb55ec696ae255fff97dd&oe=60A36717'],
 'is_live': False,
 'likes': 3509,
 'link': 'https://www.nintendo.com/amiibo/line-up/',
 'post_id': '2257188721032235',
 'post_text': 'Don’t let this diminutive version of the Hero of Time fool you, '
              'Young Link is just as heroic as his fully grown version! Young '
              'Link joins the Super Smash Bros. series of amiibo figures!\n'
              '\n'
              'https://www.nintendo.com/amiibo/line-up/',
 'post_url': 'https://facebook.com/story.php?story_fbid=2257188721032235&id=119240841493711',
 'reactions': {'haha': 22, 'like': 2657, 'love': 706, 'sorry': 1, 'wow': 123}, # if `extra_info` was set
 'reactors': None,
 'shared_post_id': None,
 'shared_post_url': None,
 'shared_text': '',
 'shared_time': None,
 'shared_user_id': None,
 'shared_username': None,
 'shares': 441,
 'text': 'Don’t let this diminutive version of the Hero of Time fool you, '
         'Young Link is just as heroic as his fully grown version! Young Link '
         'joins the Super Smash Bros. series of amiibo figures!\n'
         '\n'
         'https://www.nintendo.com/amiibo/line-up/',
 'time': datetime.datetime(2019, 4, 30, 5, 0, 1),
 'user_id': '119240841493711',
 'username': 'Nintendo',
 'video': None,
 'video_id': None,
 'video_thumbnail': None,
 'w3_fb_url': 'https://www.facebook.com/Nintendo/posts/2257188721032235'}
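Because any count in a post dict may be None, aggregate metrics need a fallback. The helpers below are hypothetical, not part of facebook_scraper. The first treats missing likes, comments, and shares as zero. The second sums the reactions dict, which is only filled in when extra_info was set.

```python
from typing import Any, Mapping, Optional


def engagement(post: Mapping[str, Any]) -> int:
    """Sum likes, comments and shares; a missing (None) count is treated as 0."""
    return sum(post.get(key) or 0 for key in ("likes", "comments", "shares"))


def total_reactions(post: Mapping[str, Any]) -> Optional[int]:
    """Sum the per-type counts in 'reactions', or None if the field is missing or empty."""
    reactions = post.get("reactions")
    if not reactions:
        return None
    return sum(reactions.values())
```

For the example post above, engagement returns 4409 (3509 + 459 + 441) and total_reactions returns 3509, the same as its likes count.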

Notes

There is no guarantee that every field will be extracted (they might be None).
Group posts may be missing some fields like time and post_url.
Group scraping may return only one page and may not work on private groups.
If you scrape too much, Facebook might temporarily ban your IP.
The vast majority of unique IDs on Facebook (post IDs, video IDs, photo IDs, comment IDs, profile IDs, etc.) can be appended to https://www.facebook.com/ to get a URL that redirects to the corresponding object.
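Two small, hypothetical helpers for these notes. object_url builds the redirecting URL from an ID. throttled pauses between items of any iterable. It assumes get_posts fetches further pages only as its iterator is consumed, so slowing iteration would slow requests. Facebook does not publish a rate limit, so any delay value you pick is a guess.

```python
import time
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

# Per the note above, most IDs appended to this root redirect to the matching object.
FACEBOOK_ROOT = "https://www.facebook.com/"


def object_url(object_id: str) -> str:
    """Build the redirecting URL for a Facebook ID (post, video, photo, profile, ...)."""
    return FACEBOOK_ROOT + str(object_id).strip().strip("/")


def throttled(items: Iterable[T], delay: float) -> Iterator[T]:
    """Yield items unchanged, sleeping `delay` seconds before each one after the first."""
    for index, item in enumerate(items):
        if index:  # no pause before the first item
            time.sleep(delay)
        yield item
```

For example, object_url('2257188721032235') gives https://www.facebook.com/2257188721032235, and for post in throttled(get_posts('nintendo', pages=5), delay=2.0) would space out the loop.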

Profiles

The get_profile function can extract information from a profile's about section. Pass in the account name or ID as the first parameter.
Note that the information Facebook serves, such as Date of birth and Gender, depends on whether you're logged in (cookies parameter). Usage:

from facebook_scraper import get_profile
get_profile("zuck") # Or get_profile("zuck", cookies="cookies.txt")

Outputs:

{'About': "I'm trying to make the world a more open place.",
 'Education': 'Harvard University\n'
              'Computer Science and Psychology\n'
              '30 August 2002 - 30 April 2004\n'
              'Phillips Exeter Academy\n'
              'Classics\n'
              'School year 2002\n'
              'Ardsley High School\n'
              'High School\n'
              'September 1998 - June 2000',
 'Favourite Quotes': '"Fortune favors the bold."\n'
                     '- Virgil, Aeneid X.284\n'
                     '\n'
                     '"All children are artists. The problem is how to remain '
                     'an artist once you grow up."\n'
                     '- Pablo Picasso\n'
                     '\n'
                     '"Make things as simple as possible but no simpler."\n'
                     '- Albert Einstein',
 'Name': 'Mark Zuckerberg',
 'Places lived': [{'link': '/profile.php?id=104022926303756&refid=17',
                   'text': 'Palo Alto, California',
                   'type': 'Current town/city'},
                  {'link': '/profile.php?id=105506396148790&refid=17',
                   'text': 'Dobbs Ferry, New York',
                   'type': 'Home town'}],
 'Work': 'Chan Zuckerberg Initiative\n'
         '1 December 2015 - Present\n'
         'Facebook\n'
         'Founder and CEO\n'
         '4 February 2004 - Present\n'
         'Palo Alto, California\n'
         'Bringing the world closer together.'}
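In the output above, 'Places lived' is a list of dicts with link, text, and type keys, while most other fields are multi-line strings. The sketch below looks up a place by its type. place_of_type is a hypothetical helper. It tolerates a missing key, because the fields returned depend on what Facebook serves.

```python
from typing import Any, Mapping, Optional


def place_of_type(profile: Mapping[str, Any], place_type: str) -> Optional[str]:
    """Return the text of the first 'Places lived' entry whose type matches, else None."""
    # The key may be absent entirely, since Facebook serves different fields per session.
    for place in profile.get("Places lived") or []:
        if place.get("type") == place_type:
            return place.get("text")
    return None
```

With the output above, place_of_type(profile, 'Current town/city') returns 'Palo Alto, California' and place_of_type(profile, 'Home town') returns 'Dobbs Ferry, New York'.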

To extract friends, pass friends=True; to limit the number of friends retrieved, set friends to the desired number.

Group info

The get_group_info function can extract info about a group. Pass in the group name or ID as the first parameter.
Note that in order to see the list of admins, you need to be logged in (cookies parameter).

Usage:

from facebook_scraper import get_group_info
get_group_info("latesthairstyles") # or get_group_info("latesthairstyles", cookies="cookies.txt")

Output:

{'admins': [{'link': '/africanstylemagazinecom/?refid=18',
             'name': 'African Style Magazine'},
            {'link': '/connectfluencer/?refid=18',
             'name': 'Everythingbrightandbeautiful'},
            {'link': '/Kaakakigroup/?refid=18', 'name': 'Kaakaki Group'},
            {'link': '/opentohelp/?refid=18', 'name': 'Open to Help'}],
 'id': '579169815767106',
 'members': 6814229,
 'name': 'HAIRSTYLES',
 'type': 'Public group'}
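A short sketch of working with a get_group_info result shaped like the output above. Both helpers are hypothetical. admin_names returns an empty list when no admins are included, which is expected when you are not logged in. group_summary formats the member count with thousands separators.

```python
from typing import Any, List, Mapping


def admin_names(group: Mapping[str, Any]) -> List[str]:
    """Names of the group's admins; empty if the admin list wasn't returned."""
    return [admin["name"] for admin in group.get("admins") or []]


def group_summary(group: Mapping[str, Any]) -> str:
    """One-line summary such as 'NAME (Public group): 1,234 members'."""
    members = group.get("members")
    count = f"{members:,}" if isinstance(members, int) else "unknown"
    return f"{group.get('name')} ({group.get('type')}): {count} members"
```

For the output above, group_summary returns 'HAIRSTYLES (Public group): 6,814,229 members'.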

To-Do

Async support
~~Image galleries~~ (images entry)
~~Profiles or post authors~~ (get_profile())
~~Comments~~ (with options={'comments': True})

Alternatives and related projects

facebook-post-scraper. Has comments. Uses Selenium.
facebook-scraper-selenium. "Scrape posts from any group or user into a .csv file without needing to register for any API access".
Ultimate Facebook Scraper. "Scrapes almost everything about a Facebook user's profile". Uses Selenium.
Unofficial APIs. List of unofficial APIs for various services; none for Facebook for now, but it might be worth checking in the future.
major-scrapy-spiders. Has a profile spider for Scrapy.
facebook-page-post-scraper. Seems abandoned.
- FBLYZE. Fork (?).
RSSHub. Generates an RSS feed from Facebook pages.
RSS-Bridge. Also generates RSS feeds from Facebook pages.

Release history

0.2.59

Aug 31, 2022

0.2.58

Jul 20, 2022

0.2.57

Jun 28, 2022

0.2.56

May 11, 2022

0.2.55

Mar 30, 2022

0.2.54

Feb 18, 2022

0.2.53

Feb 10, 2022

0.2.52

Jan 14, 2022

0.2.51

Dec 20, 2021

0.2.50

Nov 10, 2021

0.2.49

Nov 1, 2021

0.2.48

Oct 20, 2021

0.2.47

Sep 29, 2021

0.2.46

Sep 7, 2021

0.2.45

Jul 12, 2021

0.2.44 (this version)

Jul 4, 2021

0.2.43

Jun 22, 2021

0.2.42

Jun 8, 2021

0.2.41

Jun 2, 2021

0.2.40

Jun 1, 2021

0.2.39

May 31, 2021

0.2.38

May 25, 2021

0.2.37

May 19, 2021

0.2.36

May 14, 2021

0.2.35

May 10, 2021

0.2.34

May 6, 2021

0.2.33

May 3, 2021

0.2.32

May 3, 2021

0.2.30

Apr 27, 2021

0.2.29

Apr 20, 2021

0.2.28

Apr 11, 2021

0.2.27

Apr 1, 2021

0.2.26

Mar 28, 2021

0.2.25

Mar 20, 2021

0.2.24

Mar 15, 2021

0.2.23 yanked

Mar 15, 2021

Reason this release was yanked:

Don't use, throws exception

0.2.22

Mar 12, 2021

0.2.21

Mar 10, 2021

0.2.20

Mar 8, 2021

0.2.19

Feb 4, 2021

0.2.18

Jan 12, 2021

0.2.17

Nov 10, 2020

0.2.15

Nov 9, 2020

0.2.14

Nov 8, 2020

0.2.13

Oct 5, 2020

0.2.12

Sep 17, 2020

0.2.11

Sep 16, 2020

0.2.10

Aug 21, 2020

0.2.9

Aug 6, 2020

0.2.8

Aug 1, 2020

0.2.7

Jul 30, 2020

0.2.6

Jul 20, 2020

0.2.5

Jul 17, 2020

0.2.4

Jul 8, 2020

0.2.3

Jun 29, 2020

0.2.3a0 pre-release

Jun 24, 2020

0.2.2a3 pre-release

Jun 15, 2020

0.2.2a2 pre-release

May 28, 2020

0.2.2a1 pre-release

May 28, 2020

0.2.2a0 pre-release

May 25, 2020

0.2.1a0 pre-release

May 24, 2020

0.1.12

Apr 16, 2020

0.1.11

Apr 9, 2020

0.1.10

Mar 25, 2020

0.1.9

Mar 12, 2020

0.1.8

Jan 23, 2020

0.1.7

Jan 19, 2020

0.1.6

Dec 31, 2019

0.1.5

Oct 18, 2019

0.1.4

Sep 8, 2019

0.1.3

Jul 31, 2019

0.1.2

Apr 30, 2019

0.1.1

Mar 15, 2019

Download files

Source Distribution

facebook-scraper-0.2.44.tar.gz (35.0 kB)

Uploaded Jul 4, 2021 Source

Built Distribution

facebook_scraper-0.2.44-py3-none-any.whl (33.4 kB)

Uploaded Jul 4, 2021 Python 3

File details

Details for the file facebook-scraper-0.2.44.tar.gz.

File metadata

Download URL: facebook-scraper-0.2.44.tar.gz
Upload date: Jul 4, 2021
Size: 35.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.1.6 CPython/3.8.5 Linux/4.19.128-microsoft-standard

File hashes

Hashes for facebook-scraper-0.2.44.tar.gz
Algorithm	Hash digest
SHA256	`fce3e942a51ec279d5fa34da224df0e4fa33a16f00a6299d0733883096e1a457`
MD5	`82394f314478ea14b6967fe569ec4dd9`
BLAKE2b-256	`cd216ae8a9005c7c323740d22a8900c8537fac345058b06a443e0870cb1c8417`

File details

Details for the file facebook_scraper-0.2.44-py3-none-any.whl.

File metadata

Download URL: facebook_scraper-0.2.44-py3-none-any.whl
Upload date: Jul 4, 2021
Size: 33.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.1.6 CPython/3.8.5 Linux/4.19.128-microsoft-standard

File hashes

Hashes for facebook_scraper-0.2.44-py3-none-any.whl
Algorithm	Hash digest
SHA256	`18011cc05189fe14103942a4f8942ffc8266a3909e606571d87f9edcf1cd4f3d`
MD5	`51606f9c2fff0fe2e8e915a1d3024522`
BLAKE2b-256	`1e50515fa225db8fc802cea2ad53f8346481b30b33973a719e8d6a8b7266942c`