Sentiment Analysis on iOS Using SwiftUI, Natural Language, and Combine: Hacker News Top Stories

Applications that can understand natural-language text never cease to amaze me. Apple made significant strides with its Natural Language framework in 2019. In particular, the built-in sentiment analysis feature makes it easier to build smarter NLP-based iOS applications.

Besides the improvements to the Natural Language framework, SwiftUI and Combine were the two biggies that were introduced during WWDC 2019.

SwiftUI is a declarative framework written in Swift that helps developers build user interfaces quickly. Combine, on the other hand, is Apple’s own reactive programming framework, designed to power modern application development, especially when handling asynchronous tasks.

Our Goal

We’ll be using the Hacker News API to fetch the top stories using a Combine-powered URLSession.
Subsequently, we’ll run Natural Language’s built-in sentiment analysis over the top-level comments of each story to get an idea of the general reaction.
Over the course of the tutorial, we’ll see how reactive programming makes it easier to chain multiple network requests and transform and pass the results to the Subscriber.

Getting Started

To start, let’s create a new Xcode SwiftUI project. We’ll be using the official Hacker News API, which offers almost real-time data.

In order to create a SwiftUI List that holds the top stories from Hacker News, we need to set up our ObservableObject class. This class is responsible for fetching the stories from the API and passing them on to the SwiftUI List. The following code does that for you:

import Combine
import Foundation

class HNStoriesFeed : ObservableObject{

    @Published var storyItems = [StoryItem]()
    var urlBase = "https://hacker-news.firebaseio.com/v0/topstories.json?print=pretty"

    var cancellable : Set<AnyCancellable> = Set()
        
    private var topStoryIds = [Int]() {
        didSet {
            fetchStoryById(ids: topStoryIds.prefix(10))
        }
    }

    init() {
        fetchTopStories()
    }

    func fetchStoryById<S>(ids: S) where S: Sequence, S.Element == Int{

        Publishers.MergeMany(ids.map{FetchItem(id: $0)})
        .collect()
        .receive(on: DispatchQueue.main)
        .sink(receiveCompletion: {
            if case let .failure(error) = $0 {
                print(error)
            }
        }, receiveValue: {
            self.storyItems = self.storyItems + $0
        })
        .store(in: &cancellable)
        
    }
    
    func fetchTopStories(){

        URLSession.shared.dataTaskPublisher(for: URL(string: urlBase)!)
        .map{$0.data}
        .decode(type: [Int].self, decoder: JSONDecoder())
        .sink(receiveCompletion: { completion in
          switch completion {
          case .failure(let error):
            print("Something went wrong: \(error)")
          case .finished:
            print("Received Completion")
          }
        }, receiveValue: { value in
            self.topStoryIds = value
        })
        .store(in: &cancellable)

    }
}

There’s a lot happening in the above code. Let’s break it down:

fetchTopStories is responsible for returning an array of integer ids for the stories.
To save time, we’re passing the top 10 story identifiers to the fetchStoryById function, where we’re fetching the Hacker News stories using a custom publisher FetchItem and merging the results.
The collect() operator of Combine is responsible for merging all the stories fetched from the API into a single array.
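
To see what MergeMany and collect() do in isolation, here is a minimal sketch that swaps the network requests for Just publishers. The ids and the titles variable are made up for illustration and are not part of the tutorial's code.

```swift
import Combine

// Stand-ins for the per-story requests: each publisher emits one value and finishes.
let ids = [101, 102, 103]
let publishers = ids.map { Just("story \($0)") }

var titles = [String]()

// MergeMany funnels the output of all publishers into one stream;
// collect() buffers the values and emits a single array once every publisher has finished.
let cancellable = Publishers.MergeMany(publishers)
    .collect()
    .sink { titles = $0 }

print(titles)
```

With real network requests, the responses arrive in whatever order they complete, so the collected array is not guaranteed to follow the order of the ids.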

Let’s look at how to construct our custom Combine publisher next.

Creating a Custom Combine Publisher

To create a custom publisher, we need to conform the struct to the Publisher protocol and set the Output and Failure types of the stream as shown below:

struct FetchItem: Publisher {
    typealias Output = StoryItem
    typealias Failure = Error

    let id: Int
    
    func receive<S>(subscriber: S) where S: Subscriber, Failure == S.Failure, Output == S.Input {
        let request = URLRequest(url: URL(string: "https://hacker-news.firebaseio.com/v0/item/\(id).json")!)
        URLSession.DataTaskPublisher(request: request, session: URLSession.shared)
            .map { $0.data }
            .decode(type: StoryItem.self, decoder: JSONDecoder())
            .receive(subscriber: subscriber)
    }
}

The id property is the story identifier passed to the initializer.
Implementing the receive(subscriber:) method is crucial. It connects the publisher to the subscriber, and its where clause ensures that the publisher’s Output and Failure types match the subscriber’s Input and Failure types.
Inside receive(subscriber:), we’re making another API request. This time, we’re fetching the story and decoding it into a StoryItem model, which carries an id, an author (by), an optional title, and an optional kids array of comment ids.
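
The StoryItem model itself isn't reproduced in this text. A minimal sketch, assuming only the fields that the views in this tutorial read (the sample JSON values are invented):

```swift
import Foundation

// Hypothetical model: only the fields used elsewhere in this tutorial.
struct StoryItem: Identifiable, Codable {
    let id: Int
    let by: String
    let title: String?
    let kids: [Int]?   // ids of the top-level comments, absent when there are none
}

// Decoding a trimmed-down sample payload (made-up values, not a real story).
let sampleJSON = """
{"id": 1, "by": "someone", "title": "A sample story", "kids": [2, 3], "type": "story"}
""".data(using: .utf8)!

let story = try! JSONDecoder().decode(StoryItem.self, from: sampleJSON)
print(story.title ?? "", story.kids ?? [])
```

Conforming to Identifiable is what lets a SwiftUI List iterate over the items directly, and keys the model doesn't declare (such as type) are simply ignored during decoding.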

The array of StoryItems is then published to the SwiftUI view, which acts as a built-in subscriber. The following code is responsible for displaying the Hacker News stories in the SwiftUI list:

struct ContentView: View {
    @ObservedObject var hnFeed = HNStoriesFeed()
    
    var body: some View {
        NavigationView{
            List(hnFeed.storyItems){ articleItem in
                
                NavigationLink(destination: LazyView(CommentView(commentIds: articleItem.kids ?? []))){
                    StoryListItemView(article: articleItem)
                }
            }
            .navigationBarTitle("Hacker News Stories")
        }
    }
}

struct StoryListItemView: View {
    var article: StoryItem
    
    var body: some View {
        
        VStack(alignment: .leading) {
            Text(article.title ?? "")
                .font(.headline)
            Text("Author: \(article.by)")
                .font(.subheadline)
        }
    }
}

The NavigationLink takes the user to the destination screen, where the comments are displayed. We’ve wrapped the destination view, CommentView, in a LazyView so that it’s built only when the user actually navigates to it. Otherwise, NavigationLink creates its destination as soon as the row is built, which here would start fetching comments for every story in the list. It’s a common pitfall with NavigationLinks.
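
LazyView is not a SwiftUI type, and its definition isn't shown in this text. One common way to write such a wrapper, offered here as an assumption rather than the article's exact code, is:

```swift
import SwiftUI

// Wraps a view-building expression so it runs only when SwiftUI asks for `body`,
// not when the enclosing NavigationLink is created.
struct LazyView<Content: View>: View {
    private let build: () -> Content

    // @autoclosure lets callers write LazyView(CommentView(...)) without braces.
    init(_ build: @autoclosure @escaping () -> Content) {
        self.build = build
    }

    var body: Content {
        build()
    }
}
```

Because the initializer only stores a closure, the wrapped view's own init (and any work it kicks off) is deferred until the destination is actually rendered.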

Before we jump into the comments section and the subsequent sentiment analysis using NLP, let’s look at what we’ve built so far:

Fetching Hacker News Comments and Analyzing Sentiment Scores

The kids property in the StoryItem model contains the ids for the top-level comments. We’ll use a similar approach for multiple network requests as we did earlier, using Combine publishers.

The difference here is the inclusion of Natural Language’s built-in sentiment analysis to give a sentiment score to each comment, followed by calculating the mean sentiment score for that story.

The following code is from the HNCommentFeed class, which conforms to ObservableObject:

import Combine
import Foundation
import NaturalLanguage

class HNCommentFeed : ObservableObject{
    
    let nlTagger = NLTagger(tagSchemes: [.sentimentScore])
    let didChange = PassthroughSubject<Void, Never>()
    var cancellable : Set<AnyCancellable> = Set()
    
    @Published var sentimentAvg : String = ""
    
    var comments = [CommentItem](){
        didSet {
            
            var sumSentiments : Float = 0.0
            
            for item in comments{
                let floatValue = (item.sentimentScore as NSString).floatValue
                sumSentiments += floatValue
            }
            
            // Guard against dividing by zero when no comments were fetched.
            let ave = comments.isEmpty ? 0 : sumSentiments / Float(comments.count)
            sentimentAvg = String(format: "%.2f", ave)
            didChange.send()
        }
    }
    
    private var commentIds = [Int]() {
        didSet {
            fetchComments(ids: commentIds.prefix(10))
        }
    }
    
    func fetchComments<S>(ids: S) where S: Sequence, S.Element == Int{

        Publishers.MergeMany(ids.map{FetchComment(id: $0, nlTagger: nlTagger)})
        .collect()
        .receive(on: DispatchQueue.main)
        .sink(receiveCompletion: {
            if case let .failure(error) = $0 {
                print(error)
            }
        }, receiveValue: {

            self.comments = self.comments + $0
        })
        .store(in: &cancellable)
    }

    func getIds(ids: [Int]){
        self.commentIds = ids
    }
}

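Once the comments are fetched through the custom publisher, the didSet observer calculates the mean sentiment score and stores it in sentimentAvg, which is itself a @Published property. Assigning to sentimentAvg is what notifies SwiftUI, because an ObservableObject announces changes through objectWillChange. SwiftUI does not observe the separate didChange subject, so the didChange.send() call only matters to subscribers you attach yourself.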
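
The averaging step can also be pulled out into a small helper. This is a sketch rather than the article's code: averageSentiment(of:) is a hypothetical name, and unlike NSString's floatValue (which treats unparsable strings as 0) it skips scores that can't be parsed.

```swift
import Foundation

// Averages score strings such as "0.6" or "-1.0".
// Returns nil when nothing can be parsed, instead of dividing by zero.
func averageSentiment(of scores: [String]) -> String? {
    let values = scores.compactMap { Float($0) }
    guard !values.isEmpty else { return nil }
    let mean = values.reduce(0, +) / Float(values.count)
    return String(format: "%.2f", mean)
}

print(averageSentiment(of: ["0.5", "-0.5", "1.0"]) ?? "n/a") // prints "0.33"
print(averageSentiment(of: []) ?? "n/a")                     // prints "n/a"
```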

Before we look at the SwiftUI view that holds the comments with their respective scores, let’s look at the custom Combine publisher FetchComment, as shown below:

struct FetchComment: Publisher {
    typealias Output = CommentItem
    typealias Failure = Error

    //1
    let id: Int
    let nlTagger: NLTagger

    func receive<S>(subscriber: S) where S: Subscriber, Failure == S.Failure, Output == S.Input {
        let request = URLRequest(url: URL(string: "https://hacker-news.firebaseio.com/v0/item/\(id).json")!)
        URLSession.DataTaskPublisher(request: request, session: URLSession.shared)
            .map { $0.data }
            .decode(type: CommentItem.self, decoder: JSONDecoder())
            .map{
                commentItem in
 
                //2
                // Note: Apple documents that the HTML importer should not be called from a background thread.
                let data = Data(commentItem.text?.utf8 ?? "".utf8)
                var commentString = commentItem.text

                if let attributedString = try? NSAttributedString(data: data, options: [.documentType: NSAttributedString.DocumentType.html], documentAttributes: nil) {
                    commentString = attributedString.string
                }
                
                //3
                self.nlTagger.string = commentString

                var sentimentScore = ""
                if let string = self.nlTagger.string{
                    //4
                    let (sentiment,_) = self.nlTagger.tag(at: string.startIndex, unit: .paragraph, scheme: .sentimentScore)
                    sentimentScore = sentiment?.rawValue ?? ""
                }

                //5
                let result = CommentItem(id: commentItem.id, text: commentString, sentimentScore: sentimentScore)
                return result
            }
            .print()
            .receive(subscriber: subscriber)
    }
}

Much like the previous custom publisher, we need to define the Output and Failure types. Besides that, we’re doing quite a number of things in the map operator to transform the CommentItem into another new instance, which holds the sentiment score as well.
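
The CommentItem model isn't shown in this text either. A minimal sketch, assuming the API response carries only id and text while sentimentScore is filled in locally (the sample JSON is invented):

```swift
import Foundation

// Hypothetical model. `sentimentScore` is not part of the API response,
// so it gets a default value and is left out of CodingKeys.
struct CommentItem: Identifiable, Codable {
    let id: Int
    let text: String?
    var sentimentScore: String = ""

    enum CodingKeys: String, CodingKey {
        case id, text
    }
}

let commentJSON = """
{"id": 7, "text": "Nice &amp; useful", "type": "comment"}
""".data(using: .utf8)!

// Decoding leaves the score empty; the memberwise initializer attaches one afterwards.
let decoded = try! JSONDecoder().decode(CommentItem.self, from: commentJSON)
let scored = CommentItem(id: decoded.id, text: decoded.text, sentimentScore: "0.8")
print(decoded.sentimentScore.isEmpty, scored.sentimentScore)
```

Keeping the score as a String mirrors how the views in this tutorial read it; storing a numeric type would avoid the repeated string-to-float conversions.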

Let’s look at the steps marked with numbered comments in FetchComment:

1. We pass in the id of the comment and the nlTagger instance from HNCommentFeed. The tagger segments the text into units such as sentences or paragraphs and processes the information in each one. In our case, it’s configured with the sentimentScore scheme, which yields a floating-point value between -1 and 1 depending on how negative or positive the text is.
2. The comment’s text in the decoded CommentItem is an HTML string. We convert it to UTF-8 data and run it through NSAttributedString’s HTML importer to get a plain string without tags or HTML entities.
3. We set the plain string on the nlTagger’s string property. This is the text the tagger analyzes.
4. We ask the tagger for the sentimentScore tag of the paragraph at the start of the string. The tag’s rawValue is the score as a string, and we fall back to an empty string when no tag is returned.
5. Finally, we create a new CommentItem instance that holds the sentimentScore. This result is passed downstream to the subscriber.
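
The tagging steps can be tried on their own, outside of Combine. The helper below is a sketch under a few assumptions: sentimentScore(for:) is a hypothetical name, it creates a fresh tagger per call instead of sharing one, and it returns a Double rather than a string.

```swift
import NaturalLanguage

// Returns the sentiment score of the first paragraph of `text`,
// or nil when the text is empty or the tagger yields no numeric score.
func sentimentScore(for text: String) -> Double? {
    guard !text.isEmpty else { return nil }
    let tagger = NLTagger(tagSchemes: [.sentimentScore])
    tagger.string = text
    let (tag, _) = tagger.tag(at: text.startIndex, unit: .paragraph, scheme: .sentimentScore)
    return tag.flatMap { Double($0.rawValue) }
}

for sample in ["I love this, great work!", "This is terrible and broken."] {
    if let score = sentimentScore(for: sample) {
        print(sample, score)
    }
}
```

The exact scores depend on the on-device model, so treat the printed values as indicative rather than fixed.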

The code for the CommentView SwiftUI struct which holds the comments along with their score is given below:

struct CommentView : View{
    
    @ObservedObject var commentFeed = HNCommentFeed()
    
    var body: some View {
    
        List(commentFeed.comments){ item in
    
            
            Text(item.sentimentScore)
                .background(((item.sentimentScore as NSString).floatValue >= 0.0) ? Color.green : Color.red)
                .frame(alignment: .trailing)
            
            Text(item.text ?? "")
            
        }
        .navigationBarTitle("Comment Score \(commentFeed.sentimentAvg)")
        .navigationBarItems(trailing: (((commentFeed.sentimentAvg as NSString).floatValue >= 0.0) ? Image(systemName: "smiley.fill").foregroundColor(Color.green) : Image(systemName: "smiley.fill").foregroundColor(Color.red)))
    }

    init(commentIds: [Int]) {
        commentFeed.getIds(ids: commentIds)
    }
}

We’ve set an SF Symbol (new in iOS 13) as the Navigation Bar Button, the color of which represents the overall sentiments of the top-level comments of that story.

As a result, we get the following output in our application:

Conclusion

Using Apple’s built-in sentiment score for NLP, we see that most top stories attract polarizing opinions on Hacker News. While a lot of comments are cryptic, which can cause accuracy issues in sentiment analysis even with custom models, Apple’s built-in sentiment analysis does a fine job. The Natural Language framework has shown some good progress, and there’s a lot more to look forward to in WWDC 2020.

Let’s take a step back and look at what we’ve learned in this piece.

We saw:

How the Combine framework makes it easy to handle multiple network requests with URLSession. We managed to chain requests, set up dependent API requests, and synchronize the API results while avoiding the dreaded callback hell.
How to create custom publishers and ensure that the contract between the publisher and subscriber is maintained (see the where clause in the receive methods).
How to use Combine operators to our advantage. We managed to transform a bunch of comments to add an additional property, the sentiment score, by performing natural language processing inside the Combine operators.

Moving forward, you can extend the above implementation by adding an endless scrolling functionality. This gives you all the top Hacker News stories. Here’s a good reference for implementing endless scrolling in a SwiftUI-based application.

The full source code of the above application is available in this GitHub repository.

That’s a wrap for this one. Thanks for reading, and I hope you enjoyed the mix of Combine and the Natural Language framework.
