Machine-assisted sentiment analysis on NYTimes comments

TLDR: I extract the comments from a recent NYTimes article on a proposed public transit option and run sentiment analysis on the comment text using a "Machine Learning as a service" API to gauge public reaction to the news. I operationalize Albert Wenger's "Future of Programming" post by stitching together various web services: Google Sheets + Blockspring + Indico + Tableau.

Courtesy: NYTimes & Friends of the Brooklyn Queens Connector


Here is my stream of consciousness as I read the Times article on the proposed streetcar to connect Brooklyn and Queens:

  1. Woah - shiny new train!
  2. It is hellish to get to anywhere between Brooklyn and Queens - glad to see the city taking steps to address this.
  3. Wait, shiny new train that travels at 12 mph and will be ready only in 2024? 
  4. (From wife) Street cars remind me of charming European cities.
  5. What about buses? The SBS is pretty OK, you can make buses shiny too, and they can be here before 2024!
  6. Hey, that's my school's logo on the bus! 
  7. I wonder what everyone else is saying ...

...and hence this post. Since I was not entirely sure how I felt about a streetcar connecting BKLYN-QNS, I thought it would be cool to navigate the spectrum of opinion that internet comments offer. This could also be useful for policymakers who announce a new program and want to gauge public response. These comments are far from representative of all public reaction on this topic, but this article was on the front page of The New York Times on Feb 3, 2016, and the Times moderates its comments, so there is some filtering of internet-isms.

This post is also influenced by Albert Wenger's blog post on the future of programming (yes, the same dude who is proposing a Basic Minimum Income Guarantee), which describes stitching together digital "services", rather than writing blocks of code, to accomplish a digital task.

First, I found and tweaked code from Neal Caren's blog post to extract NYTimes comments. It basically constructs the following URL for each page of comments: <ARTICLE-URL>&offset=<INCREMENT-BY-25-FOR-EACH-PAGE>&sort=newest
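A minimal sketch of that pagination idea, assuming the total comment count is known up front (a real scraper might instead keep requesting pages until one comes back empty). `page_urls` and the base string are hypothetical; the actual NYTimes request format behind <ARTICLE-URL> is not reproduced here.

```python
# Hypothetical sketch: build one request URL per page of comments.
# `base_request` stands in for the "<ARTICLE-URL>" part of the request,
# which already carries a query string (hence the leading "&").

PAGE_SIZE = 25  # offset step per page, as described above


def page_urls(base_request: str, total_comments: int, page_size: int = PAGE_SIZE) -> list[str]:
    """Return one URL per page, stepping the offset by page_size."""
    urls = []
    for offset in range(0, total_comments, page_size):
        urls.append(f"{base_request}&offset={offset}&sort=newest")
    return urls


if __name__ == "__main__":
    # Illustrative only: a made-up base string and comment count.
    for url in page_urls("https://example.com/comments?url=ARTICLE", 60):
        print(url)
```

With 60 comments this yields three URLs, with offsets 0, 25, and 50.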

Here is the raw comment output for the streetcar article

The available fields from each comment are:

  •     commentID
  •     status
  •     commentSequence
  •     userID
  •     userDisplayName
  •     userLocation
  •     userTitle
  •     userURL
  •     commentTitle
  •     commentBody
  •     createDate
  •     updateDate
  •     approveDate
  •     recommendations
  •     editorsSelection
  •     commentType
  •     trusted
  •     recommendedFlag
  •     reportAbuseFlag
  •     permID
  •     timespeople
  •     sharing
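Before uploading to Google Sheets, the comment records can be flattened into CSV. A sketch, assuming each comment has already been decoded from JSON into a Python dict keyed by the field names above; the subset of columns kept here is an illustrative choice, not the one used in this post.

```python
import csv
import io

# Illustrative subset of the fields listed above.
FIELDS = ["commentID", "userDisplayName", "userLocation",
          "commentBody", "createDate", "recommendations", "editorsSelection"]


def comments_to_csv(comments: list[dict], fields: list[str] = FIELDS) -> str:
    """Serialize comment dicts to CSV text with a header row."""
    buf = io.StringIO()
    # extrasaction="ignore" drops keys not in `fields`;
    # missing keys become empty cells (DictWriter's default restval is "").
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for comment in comments:
        writer.writerow(comment)
    return buf.getvalue()
```

The `csv` module quotes any comment body that contains commas, quotes, or line breaks, so free-text comments survive the import into a spreadsheet.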

Once I extract all the comments, I put them in a Google spreadsheet and use Blockspring to connect to Indico and run sentiment analysis on each comment. Indico is a "Machine Learning as a service" company. They abstract away the complex (and "important to understand") math behind cutting-edge machine learning approaches and make these techniques easy enough to use that you don't need an applied math degree, as I hope to demonstrate here.

Blockspring lets users extend Google Sheets by connecting tabular data to a host of digital services, ranging from Amazon products to image recognition and, in this case, sentiment analysis. It's a freemium service, and I am using the free version.

This is how easy it is to call Indico's sentiment analysis API from a plain-jane Google Sheet using Blockspring:

=BLOCKSPRING("higher-quality-sentiment-analysis-indico", "text","i think this is a terrible idea!") returns a sentiment score of 0.004784229677 


=BLOCKSPRING("higher-quality-sentiment-analysis-indico", "text","i think this is a fantastic idea!") returns a sentiment score of 0.9275863767

I do this for all comments and then load the results into Tableau. What you see here is a dashboard showing comment activity on the streetcar article, overlaid with sentiment scores and other bells and whistles.
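Scoring every row amounts to mapping one scoring function over each comment body. A sketch with the scorer left pluggable: `toy_scorer` is a hypothetical keyword stand-in, not Indico's model, and exists only so the example runs offline. In the post, the Blockspring formula plays this role in each sheet row.

```python
from typing import Callable, Iterable

# A scorer maps comment text to a float in [0, 1].
Scorer = Callable[[str], float]


def score_comments(comments: Iterable[dict], scorer: Scorer) -> list[dict]:
    """Return copies of the comment records with a 'sentiment' column added."""
    scored = []
    for comment in comments:
        row = dict(comment)  # copy so the input records are not mutated
        row["sentiment"] = scorer(row.get("commentBody", ""))
        scored.append(row)
    return scored


def toy_scorer(text: str) -> float:
    """Hypothetical stand-in for the real API: a crude keyword heuristic."""
    lowered = text.lower()
    if "terrible" in lowered:
        return 0.0
    if "fantastic" in lowered:
        return 1.0
    return 0.5
```

Keeping the scorer as a parameter means a real API client could replace `toy_scorer` without touching the rest of the pipeline.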

Some of the sentiment scores are way off, but most look pretty good. The sentiment histogram shows that opinion is divided: close to 14% of the comments are very negative, while the next-largest group (8%) is very positive. As for commenting behavior over time, close to half the comments were posted within the first 3 hours after the article went up (11 PM to 2 AM).
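The histogram shares and the time-window figure are simple proportions. A sketch, assuming scores lie in [0, 1], that ten equal-width bins are used (the post does not say how Tableau binned them), and that createDate values have already been parsed into `datetime` objects.

```python
from collections import Counter
from datetime import datetime, timedelta


def bucket_shares(scores: list[float], n_buckets: int = 10) -> list[float]:
    """Fraction of scores in each of n_buckets equal-width bins over [0, 1]."""
    if not scores:
        return [0.0] * n_buckets
    # A score of exactly 1.0 would land in bin n_buckets, so clamp it to the last bin.
    counts = Counter(min(int(s * n_buckets), n_buckets - 1) for s in scores)
    return [counts[i] / len(scores) for i in range(n_buckets)]


def share_within(created: list[datetime], start: datetime, hours: float = 3) -> float:
    """Fraction of comments created in the window [start, start + hours)."""
    if not created:
        return 0.0
    cutoff = start + timedelta(hours=hours)
    inside = sum(1 for t in created if start <= t < cutoff)
    return inside / len(created)
```

The first and last entries of `bucket_shares` correspond to the "very negative" and "very positive" ends of the histogram.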

The average sentiment is 0.4254, while the median is 0.3564. In its documentation, Indico states that "Values greater than 0.5 indicate positive sentiment, while values less than 0.5 indicate negative sentiment." Since both the mean and the median fall below 0.5, the overall response to this article is negative.
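These summary statistics and Indico's 0.5 cutoff can be reproduced with Python's `statistics` module. A sketch; the `overall` label (negative only when both mean and median fall below 0.5) is my own rule of thumb, not Indico's, and scores of exactly 0.5 count as neither positive nor negative.

```python
import statistics

# Indico's documented cutoff: > 0.5 positive, < 0.5 negative.
THRESHOLD = 0.5


def summarize(scores: list[float]) -> dict:
    """Mean, median, positive/negative counts, and a coarse overall label."""
    mean = statistics.mean(scores)
    median = statistics.median(scores)
    positive = sum(1 for s in scores if s > THRESHOLD)
    negative = sum(1 for s in scores if s < THRESHOLD)  # exactly 0.5 counts as neither
    if mean < THRESHOLD and median < THRESHOLD:
        overall = "negative"
    elif mean > THRESHOLD and median > THRESHOLD:
        overall = "positive"
    else:
        overall = "mixed"
    return {"mean": mean, "median": median, "positive": positive,
            "negative": negative, "overall": overall}
```

Under this rule, the post's mean (0.4254) and median (0.3564) would both yield the label "negative".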

Click to interact with dashboard



As always, "comments", feedback, critique, and any other ideas are appreciated.

NYC Trolley Throwback video.

Varun, Team ARGO

PS: Advance apologies for any grammatical infractions
