How to Download Facebook Page Posts and Comments to Excel with Python

Are you a social media marketer? Do you want to track your competitor’s Facebook page? Or are you a social media researcher looking to extract publicly available data from a Facebook page? This tutorial is for you!

This tutorial uses the Facebook Graph API, a legal way of mining Facebook data, to extract information from public pages. Through the Graph API, you can download Facebook page posts and comments to Excel.

Number of reactions for Facebook posts

If you don’t have time to do this by hand and want the best tool for the job, you can sign up for a free trial (no credit card required) with Quintly, a cloud-based Facebook scraper tool.

It’s simple to use and has a lot of functionality. You can also use it to scrape Instagram, YouTube, Google+, Twitter, LinkedIn, and Pinterest.

The tutorial starts with simple Facebook API calls in your browser, then moves on to automating them with a Python script. This might sound technical, but no worries: no programming knowledge is required! Later, I will introduce you to the cloud-based Facebook scraper tool I use to save time.

Part 1: Scraping Facebook Page Posts and Comments with URL Queries

Step 1: Get a Facebook Graph API Access Token

To begin, you must get an access token from the Facebook Graph API Explorer. Just log in to your regular Facebook account to get the access token, which has a lifespan of two hours.

Graph API explorer Getting access token
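
To quickly check that the token works, you can call the Graph API’s /me endpoint, which returns the name and ID of the logged-in account. Here is a minimal Python sketch; the token value is a placeholder you’d swap for your own:

import json
from urllib.request import urlopen

access_token = "EAACEdxxx"  # placeholder -- paste your own token here

url = "https://graph.facebook.com/me?access_token={}".format(access_token)
with urlopen(url) as response:
    print(json.loads(response.read()))  # e.g. {'name': '...', 'id': '...'}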

Step 2: Scraping Facebook Page Posts with URL Queries

To scrape Facebook Page posts, run the following query with the access token obtained in Step 1.

Let us use Barack Obama’s page as an example. Some Facebook pages don’t have a short, clean URL like Barack Obama’s page does. A page URL might look like “https://www.facebook.com/Funny-Dog-Videos-691496887527705/”, in which case “691496887527705” is the page ID and takes the place of the page name in queries. For now, we’ll stick with the simpler URL of Obama’s page.

Barack Obama Facebook Page

To extract posts from Barack Obama’s page from 01-January-2018 to 30-April-2018, we use the following query:

https://graph.facebook.com/v2.9/barackobama/posts/?fields=message,link,permalink_url,created_time,type,name,id,comments.limit(0).summary(true),shares,likes.limit(0).summary(true),reactions.limit(0).summary(true)&until=2018-04-30&since=2018-01-01&limit=100&access_token=EAACEdxxx

Replace the “barackobama” page name and the access token with your own.
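
If you’d like to run the same query from Python instead of the browser, here is a minimal sketch using only the standard library. The page name, dates, and token below are the same placeholders used above:

import json
from urllib.request import urlopen

page_id = "barackobama"
access_token = "EAACEdxxx"  # placeholder

fields = ("message,link,permalink_url,created_time,type,name,id,"
          "comments.limit(0).summary(true),shares,"
          "likes.limit(0).summary(true),reactions.limit(0).summary(true)")
url = ("https://graph.facebook.com/v2.9/{}/posts/?fields={}"
       "&since=2018-01-01&until=2018-04-30&limit=100"
       "&access_token={}").format(page_id, fields, access_token)

with urlopen(url) as response:
    posts = json.loads(response.read())

print(len(posts["data"]), "posts returned")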

Below is the result we get from the above query:

Facebook Page Data in JSON

Facebook Graph API data in JSON

You can copy the returned JSON and convert it to CSV as shown below.

JSON to CSV conversion online
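
If you’d rather not rely on an online converter, a few lines of Python can flatten the returned “data” array into a CSV. A minimal sketch, assuming you saved the browser output as posts.json (a file name chosen purely for illustration); nested fields such as the comments summary would need extra handling:

import csv
import json

with open("posts.json", encoding="utf-8") as f:
    posts = json.load(f)["data"]

# keep only the flat, top-level fields
columns = ["id", "created_time", "type", "message", "permalink_url"]

with open("posts.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(posts)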

You can also use the “before” and “after” cursors to page through the results, but as you can see, the process is a bit tedious. This is where automation with Python comes in.
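
As a taste of that automation, here is a rough sketch of a cursor-driven loop: each response carries a “paging” object whose “next” key (absent on the last page) is a ready-made URL for the following batch. The page name and token are placeholders:

import json
from urllib.request import urlopen

# hypothetical starting query; swap in your own page name and token
next_url = ("https://graph.facebook.com/v2.9/barackobama/posts/"
            "?limit=100&access_token=EAACEdxxx")
all_posts = []

while next_url:
    with urlopen(next_url) as response:
        batch = json.loads(response.read())
    all_posts.extend(batch["data"])
    # "next" is absent on the last page, which ends the loop
    next_url = batch.get("paging", {}).get("next")

print(len(all_posts), "posts collected")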

Step 3: Scraping Facebook Page Comments with URL Queries

You can download the comments for a specific post using the query below:

https://graph.facebook.com/{post-ID}/comments?access_token=XXX

You can get the post ID from the Excel sheet downloaded in Step 2 and the access token from Step 1.

Facebook Posts IDs

The query will look like this:

https://graph.facebook.com/6815841748_10155804349451749/comments?access_token=EAACEd

Querying Facebook Graph API comments in JSON

You can copy the above comments and use a JSON to CSV converter to get them into CSV.

Comments to CSV
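
Both steps, fetching the comments and flattening them, can also be done in a short Python sketch; the post ID is the example from above and the token is a placeholder:

import csv
import json
from urllib.request import urlopen

post_id = "6815841748_10155804349451749"
access_token = "EAACEdxxx"  # placeholder

url = "https://graph.facebook.com/{}/comments?access_token={}".format(
    post_id, access_token)
with urlopen(url) as response:
    comments = json.loads(response.read())["data"]

with open("comments.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(
        f, fieldnames=["id", "created_time", "message"], extrasaction="ignore")
    writer.writeheader()
    writer.writerows(comments)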

We can harvest data by pasting URLs directly into the browser, but it gets tedious when you want more data, and this method has some limitations.

In the next part, I will use Python to automate the process.

Part 2: Scraping Facebook Page Posts and Comments with Python

Don’t worry if you don’t know anything about programming; I have a working script which I’m going to use in the tutorial. You can use it, too.

Step 1: Install Python 3 and a Code Editor

Go to the Python website and install Python 3. Follow this guide on how to install Python correctly on Windows, Linux, and Mac. The guide recommends the Atom code editor, but personally, I use Notepad++ on Windows.

If you have followed the guide correctly, here is what you should see on Windows:

Python3 installation test on Windows PowerShell

Step 2: Edit and Save the Python code on your computer

If you have followed the previous steps, you should have three things: the access token, the Facebook page to scrape, and the preferred period.

If your access token has expired (tokens expire after two hours), just click the “Get Token” button again to get a new one.

In the script below, replace the values within the quotation marks with your own: “page_id”, “since_date”, “until_date”, and “access_token”.

Scrape Facebook posts script attributes
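
For reference, the block to edit near the top of the script looks like this:

page_id = "barackobama"       # the Facebook page to scrape
since_date = "2018-01-01"     # start date, formatted YYYY-MM-DD
until_date = "2018-04-30"     # end date, formatted YYYY-MM-DD
access_token = "EAACEdEose0"  # your token from Step 1 (expires after two hours)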

You can copy the code below, paste it into your editor, and save it to a folder as a .py file, which we will run later from the command line.

import json
import datetime
import csv
import time
try:
    from urllib.request import urlopen, Request
except ImportError:
    from urllib2 import urlopen, Request

#app_id = "<FILL IN>"
#app_secret = "<FILL IN>"  # DO NOT SHARE WITH ANYONE!
page_id = "barackobama"

# input date formatted as YYYY-MM-DD
since_date = "2018-01-01"
until_date = "2018-04-30"

#access_token = app_id + "|" + app_secret
access_token = "EAACEdEose0"

def request_until_succeed(url):
    req = Request(url)
    success = False
    while success is False:
        try:
            response = urlopen(req)
            if response.getcode() == 200:
                success = True
        except Exception as e:
            print(e)
            time.sleep(5)

            print("Error for URL {}: {}".format(url, datetime.datetime.now()))
            print("Retrying.")

    return response.read()


# Needed to write tricky unicode correctly to csv
def unicode_decode(text):
    try:
        return text.encode('utf-8').decode()
    except UnicodeDecodeError:
        return text.encode('utf-8')


def getFacebookPageFeedUrl(base_url):

    # Construct the URL string; see http://stackoverflow.com/a/37239851 for
    # Reactions parameters
    fields = "&fields=message,link,permalink_url,created_time,type,name,id," + \
        "comments.limit(0).summary(true),shares,reactions" + \
        ".limit(0).summary(true)"

    return base_url + fields


def getReactionsForStatuses(base_url):

    reaction_types = ['like', 'love', 'wow', 'haha', 'sad', 'angry']
    reactions_dict = {}   # dict of {status_id: tuple<6>}

    for reaction_type in reaction_types:
        fields = "&fields=reactions.type({}).limit(0).summary(total_count)".format(
            reaction_type.upper())

        url = base_url + fields

        data = json.loads(request_until_succeed(url))['data']

        data_processed = set()  # set() removes rare duplicates in statuses
        for status in data:
            id = status['id']
            count = status['reactions']['summary']['total_count']
            data_processed.add((id, count))

        for id, count in data_processed:
            if id in reactions_dict:
                reactions_dict[id] = reactions_dict[id] + (count,)
            else:
                reactions_dict[id] = (count,)

    return reactions_dict


def processFacebookPageFeedStatus(status):

    # The status is now a Python dictionary, so for top-level items,
    # we can simply call the key.

    # Additionally, some items may not always exist,
    # so must check for existence first

    status_id = status['id']
    status_type = status['type']

    status_message = '' if 'message' not in status else \
        unicode_decode(status['message'])
    link_name = '' if 'name' not in status else \
        unicode_decode(status['name'])
    status_link = '' if 'link' not in status else \
        unicode_decode(status['link'])
    status_permalink_url = '' if 'permalink_url' not in status else \
        unicode_decode(status['permalink_url'])

    # Time needs special care since a) it's in UTC and
    # b) it's not easy to use in statistical programs.

    status_published = datetime.datetime.strptime(
        status['created_time'], '%Y-%m-%dT%H:%M:%S+0000')
    status_published = status_published + \
        datetime.timedelta(hours=-5)  # EST
    status_published = status_published.strftime(
        '%Y-%m-%d %H:%M:%S')  # best time format for spreadsheet programs

    # Nested items require chaining dictionary keys.

    num_reactions = 0 if 'reactions' not in status else \
        status['reactions']['summary']['total_count']
    num_comments = 0 if 'comments' not in status else \
        status['comments']['summary']['total_count']
    num_shares = 0 if 'shares' not in status else status['shares']['count']

    return (status_id, status_message, link_name, status_type, status_link,status_permalink_url,
            status_published, num_reactions, num_comments, num_shares)


def scrapeFacebookPageFeedStatus(page_id, access_token, since_date, until_date):
    with open('{}_facebook_statuses.csv'.format(page_id), 'w',encoding="utf-8") as file:
        w = csv.writer(file)
        w.writerow(["status_id", "status_message", "link_name", "status_type",
                    "status_link","permalink_url", "status_published", "num_reactions",
                    "num_comments", "num_shares", "num_likes", "num_loves",
                    "num_wows", "num_hahas", "num_sads", "num_angrys",
                    "num_special"])

        has_next_page = True
        num_processed = 0
        scrape_starttime = datetime.datetime.now()
        after = ''
        base = "https://graph.facebook.com/v2.9"
        node = "/{}/posts".format(page_id)
        parameters = "/?limit={}&access_token={}".format(100, access_token)
        since = "&since={}".format(since_date) if since_date \
            is not '' else ''
        until = "&until={}".format(until_date) if until_date \
            is not '' else ''

        print("Scraping {} Facebook Page: {}\n".format(page_id, scrape_starttime))

        while has_next_page:
            after = '' if after == '' else "&after={}".format(after)
            base_url = base + node + parameters + after + since + until

            url = getFacebookPageFeedUrl(base_url)
            statuses = json.loads(request_until_succeed(url))
            reactions = getReactionsForStatuses(base_url)

            for status in statuses['data']:

                # Ensure it is a status with the expected metadata
                if 'reactions' in status:
                    status_data = processFacebookPageFeedStatus(status)
                    reactions_data = reactions[status_data[0]]

                    # calculate thankful/pride through algebra
                    num_special = status_data[7] - sum(reactions_data)
                    w.writerow(status_data + reactions_data + (num_special,))

                num_processed += 1
                if num_processed % 100 == 0:
                    print("{} Statuses Processed: {}".format
                          (num_processed, datetime.datetime.now()))

            # if there is no next page, we're done
            if 'paging' in statuses and 'next' in statuses['paging']:
                after = statuses['paging']['cursors']['after']
            else:
                has_next_page = False

        print("\nDone!\n{} Statuses Processed in {}".format(
              num_processed, datetime.datetime.now() - scrape_starttime))


if __name__ == '__main__':
    scrapeFacebookPageFeedStatus(page_id, access_token, since_date, until_date)

Step 3: Scraping Facebook Posts with Reactions data count

Here is an example of how I saved the Python code on my computer. You can name the file anything you want; just make sure it has the .py extension.

Computer Folder with Scrape Posts Python code

Now open your command-line program (PowerShell on Windows). Navigate to the folder containing your saved script, as described in the previous steps.

Powershell navigating to folder

Once you are there, type “python scriptname.py” to run the script and start scraping. For me, it is “python scrape_posts.py”.
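
For example, assuming the script was saved to a Scripts folder (the path below is only an illustration), the commands are:

cd C:\Users\YourName\Documents\Scripts
python scrape_posts.py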

PowerShell running a script to scrape Facebook Posts

As you can see, it seems that Obama hasn’t posted much: only seven posts from the beginning of 2018 to the end of April.

The script saves the output CSV file in the same folder as the Python code.

Computer folder with output from scraping Facebook page posts

The output file content looks like the image below:

CSV with Facebook posts reactions data

Step 4: Scraping Facebook Comments

Now that we have scraped the Facebook posts, it’s time to get the comments associated with the posts. Make sure the output file from the previous step is still intact in the same folder.

Below is a new script for you to use. You run it the same way as the previous one. If you don’t want comments for some posts, delete those posts’ rows from the CSV obtained in the last step, since this new script reads that file.

The procedure to run the code is the same as in the previous step: enter the same Facebook page name in the “file_id” field and the “access_token” used in the last step.

scrape Facebook Comments Credentials
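
The block to edit in this script is shorter and looks like this:

file_id = "barackobama"       # the same page name used in the posts script
access_token = "EAACEdEose0"  # a fresh token from Step 1 if the old one expired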

You can copy the code below, paste it into your editor, and save it to a folder as a .py file, which we will run later from the command line. I saved my code as “scrape_comments.py”.

Computer folder with script to scrape Facebook comments

import json
import datetime
import csv
import time
try:
    from urllib.request import urlopen, Request
except ImportError:
    from urllib2 import urlopen, Request

#app_id = "<FILL IN>"
#app_secret = "<FILL IN>"  # DO NOT SHARE WITH ANYONE!
file_id = "barackobama"

#access_token = app_id + "|" + app_secret
access_token = "EAACEdEose0"

def request_until_succeed(url):
    req = Request(url)
    success = False
    while success is False:
        try:
            response = urlopen(req)
            if response.getcode() == 200:
                success = True
        except Exception as e:
            print(e)
            time.sleep(5)

            print("Error for URL {}: {}".format(url, datetime.datetime.now()))
            print("Retrying.")

    return response.read()

# Needed to write tricky unicode correctly to csv


def unicode_decode(text):
    try:
        return text.encode('utf-8').decode()
    except UnicodeDecodeError:
        return text.encode('utf-8')


def getFacebookCommentFeedUrl(base_url):

    # Construct the URL string
    fields = "&fields=id,message,reactions.limit(0).summary(true)" + \
        ",created_time,comments,from,attachment"
    url = base_url + fields

    return url


def getReactionsForComments(base_url):

    reaction_types = ['like', 'love', 'wow', 'haha', 'sad', 'angry']
    reactions_dict = {}   # dict of {status_id: tuple<6>}

    for reaction_type in reaction_types:
        fields = "&fields=reactions.type({}).limit(0).summary(total_count)".format(
            reaction_type.upper())

        url = base_url + fields

        data = json.loads(request_until_succeed(url))['data']

        data_processed = set()  # set() removes rare duplicates in statuses
        for status in data:
            id = status['id']
            count = status['reactions']['summary']['total_count']
            data_processed.add((id, count))

        for id, count in data_processed:
            if id in reactions_dict:
                reactions_dict[id] = reactions_dict[id] + (count,)
            else:
                reactions_dict[id] = (count,)

    return reactions_dict


def processFacebookComment(comment, status_id, parent_id=''):

    # The status is now a Python dictionary, so for top-level items,
    # we can simply call the key.

    # Additionally, some items may not always exist,
    # so must check for existence first

    comment_id = comment['id']
    comment_message = '' if 'message' not in comment or comment['message'] \
        == '' else unicode_decode(comment['message'])

    num_reactions = 0 if 'reactions' not in comment else \
        comment['reactions']['summary']['total_count']

    if 'attachment' in comment:
        attachment_type = comment['attachment']['type']
        attachment_type = 'gif' if attachment_type == 'animated_image_share' \
            else attachment_type
        attach_tag = "[[{}]]".format(attachment_type.upper())
        comment_message = attach_tag if comment_message == '' else \
            comment_message + " " + attach_tag

    # Time needs special care since a) it's in UTC and
    # b) it's not easy to use in statistical programs.

    comment_published = datetime.datetime.strptime(
        comment['created_time'], '%Y-%m-%dT%H:%M:%S+0000')
    comment_published = comment_published + datetime.timedelta(hours=-5)  # EST
    comment_published = comment_published.strftime(
        '%Y-%m-%d %H:%M:%S')  # best time format for spreadsheet programs

    # Return a tuple of all processed data

    return (comment_id, status_id, parent_id, comment_message,
            comment_published, num_reactions)


def scrapeFacebookPageFeedComments(file_id, access_token):
    with open('{}_facebook_comments.csv'.format(file_id), 'w',encoding="utf-8") as file:
        w = csv.writer(file)
        w.writerow(["comment_id", "status_id", "parent_id", "comment_message",
                    "comment_published", "num_reactions",
                    "num_likes", "num_loves", "num_wows", "num_hahas",
                    "num_sads", "num_angrys", "num_special"])

        num_processed = 0
        scrape_starttime = datetime.datetime.now()
        after = ''
        base = "https://graph.facebook.com/v2.9"
        parameters = "/?limit={}&access_token={}".format(
            100, access_token)

        print("Scraping {} Comments From Posts: {}\n".format(
            file_id, scrape_starttime))

        with open('{}_facebook_statuses.csv'.format(file_id), 'r',encoding="utf-8") as csvfile:
            reader = csv.DictReader(csvfile)

            # Uncomment below line to scrape comments for a specific status_id
            # reader = [dict(status_id='5550296508_10154352768246509')]

            for status in reader:
                has_next_page = True
                after = ''  # reset the paging cursor for each new status

                while has_next_page:

                    node = "/{}/comments".format(status['status_id'])
                    after = '' if after == '' else "&after={}".format(after)
                    base_url = base + node + parameters + after

                    url = getFacebookCommentFeedUrl(base_url)
                    # print(url)
                    comments = json.loads(request_until_succeed(url))
                    reactions = getReactionsForComments(base_url)

                    for comment in comments['data']:
                        comment_data = processFacebookComment(
                            comment, status['status_id'])
                        reactions_data = reactions[comment_data[0]]

                        # calculate thankful/pride through algebra
                        num_special = comment_data[5] - sum(reactions_data)
                        w.writerow(comment_data + reactions_data +
                                   (num_special, ))

                        if 'comments' in comment:
                            has_next_subpage = True
                            sub_after = ''

                            while has_next_subpage:
                                sub_node = "/{}/comments".format(comment['id'])
                                sub_after = '' if sub_after == '' else "&after={}".format(
                                    sub_after)
                                sub_base_url = base + sub_node + parameters + sub_after

                                sub_url = getFacebookCommentFeedUrl(
                                    sub_base_url)
                                sub_comments = json.loads(
                                    request_until_succeed(sub_url))
                                sub_reactions = getReactionsForComments(
                                    sub_base_url)

                                for sub_comment in sub_comments['data']:
                                    sub_comment_data = processFacebookComment(
                                        sub_comment, status['status_id'], comment['id'])
                                    sub_reactions_data = sub_reactions[
                                        sub_comment_data[0]]

                                    num_sub_special = sub_comment_data[
                                        5] - sum(sub_reactions_data)
                                    w.writerow(sub_comment_data +
                                               sub_reactions_data + (num_sub_special,))
                                    
                                    num_processed += 1
                                    if num_processed % 100 == 0:
                                        print("{} Comments Processed: {}".format(
                                            num_processed,
                                            datetime.datetime.now()))

                                if 'paging' in sub_comments:
                                    if 'next' in sub_comments['paging']:
                                        sub_after = sub_comments[
                                            'paging']['cursors']['after']
                                    else:
                                        has_next_subpage = False
                                else:
                                    has_next_subpage = False

                        # output progress occasionally to make sure code is not
                        # stalling
                        num_processed += 1
                        if num_processed % 100 == 0:
                            print("{} Comments Processed: {}".format(
                                num_processed, datetime.datetime.now()))

                    if 'paging' in comments:
                        if 'next' in comments['paging']:
                            after = comments['paging']['cursors']['after']
                        else:
                            has_next_page = False
                    else:
                        has_next_page = False

        print("\nDone!\n{} Comments Processed in {}".format(
            num_processed, datetime.datetime.now() - scrape_starttime))


if __name__ == '__main__':
    scrapeFacebookPageFeedComments(file_id, access_token)


# The CSV can be opened in all major statistical programs. Have fun! 🙂

On the command line, navigate to the folder containing the script and the posts CSV extracted in the previous step, then run the script.
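
Again assuming the illustrative Scripts folder from earlier:

cd C:\Users\YourName\Documents\Scripts
python scrape_comments.py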

Scraping Facebook comments in action

Those seven posts we extracted earlier have 3465 comments!

Scraping Facebook comments finished

Computer folder with extracted Facebook comments

The output file content looks like the image below:

Facebook comments in CSV

Part 3: Scraping Facebook Page Posts and Comments with Quintly, a Cloud-based Facebook Data Scraper

As I mentioned earlier, the above procedures are not for everyone. They are very raw, time-consuming, and involve some manual work.

For example, if you want to be updated on the performance of the posts or download new comments on a daily basis, then you would have to redo the work every single day.

One of the tools I use and recommend for scraping Facebook posts and comments, as well as other social media platforms, is Quintly. It’s a cloud-based scraper that works 24/7 for you, calling the APIs and aggregating the data in one interface.

They have a 14-day free trial.

Let’s continue with our previous example, this time using Quintly.

Step 1: Search for a Profile, Select It, and Add It

To search for Facebook, Instagram, RSS feed, or Pinterest profiles, insert the entire URL into the search box.

Search for profiles on Quintly

Here is the Barack Obama profile added to my dashboard.

Quintly dashboard

I’m on the Medium package, which lets me add up to 15 profiles across Facebook, Twitter, Instagram, Google+, YouTube, LinkedIn, and Pinterest.

Quintly medium package

Step 2: Scrape Facebook Posts by Adding the “Own Posts Table” Metric

In our case, we want post reactions, so I have added the “Own Posts Table” metric and set the date range from January 1, 2018 to April 30, 2018.

Quintly Own Posts Table metric

Then you can export the Facebook posts to Excel.

Exporting Facebook posts data on Quintly

The output file content looks like the image below:

Quintly exported Facebook data

Step 3: Scrape Facebook Post Comments by Adding the “Comments Table” Metric

Since we want post comments, I have added the “Comments Table” metric and set the date range from January 1, 2018 to April 30, 2018.

Quintly Comments Table metric

Then you can export the Facebook comments to Excel.

Exporting Facebook comments on Quintly

The output file content looks like the image below:

Comments in CSV exported by Quintly

Conclusion

As you can see, this method is very effective! If you want to save time and improve your business or research, I recommend signing up for Quintly. You can scrape Facebook automatically, as well as other social media pages on Twitter, LinkedIn, YouTube, Google+, Pinterest, and Instagram.

I hope you have enjoyed the tutorial! If you have any questions, don’t hesitate to drop a comment.

Frequently Asked Questions

Can you scrape Facebook Group Emails?

Scraping Facebook emails or emails left in Facebook groups is not supported by the Facebook API.

Can you scrape Facebook Profiles?

Scraping Facebook profiles is not supported by the Facebook API.

Can you scrape Facebook Groups?

Yes, it’s possible to scrape open public groups with the API, but not private groups, even if you are a member or an admin.
