Thursday, January 30, 2020

Catching up with developments in Recommendation Algorithms using Deep Learning



In my quest to identify technology and business gaps for Voicy.AI, I have been spending time catching up with developments in Recommendation Algorithms using Deep Learning. I started my research by reading the Recommendations paper from YouTube. Recommendation in general is a two-step process consisting of Retrieval and Re-Ranking. The authors phrased retrieval as a multi-class classification problem instead of reusing inverted-index scoring mechanisms. I liked the tricks of negative sampling and sub-linear scoring using hashing techniques to optimize training and serving in production, respectively. I then moved to another important development in recommendation systems leveraging joint training of Wide and Deep Learning neural nets, pioneered by the Apps team at Google. I was impressed with the authors' observation that the Wide model is good for memorization and the Deep model is good for generalization.
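As a concrete illustration of the retrieval-as-classification and negative-sampling ideas, here is a minimal PyTorch sketch with made-up toy dimensions; it shows the trick in miniature and is not the paper's actual model.

```python
# Minimal sketch of retrieval as multi-class classification with negative sampling.
# Toy dimensions and random data; not the YouTube model itself.
import torch
import torch.nn.functional as F

num_items, dim, batch, num_neg = 10_000, 64, 32, 100

item_emb = torch.nn.Embedding(num_items, dim)          # the "softmax" item vectors
user_vec = torch.randn(batch, dim)                     # stand-in for the user tower output
pos_items = torch.randint(0, num_items, (batch,))      # watched/clicked items

# Score a shared sample of negatives instead of all items (the negative-sampling trick).
neg_items = torch.randint(0, num_items, (num_neg,))
candidates = torch.cat([pos_items, neg_items])         # positives first, then negatives

logits = user_vec @ item_emb(candidates).T             # (batch, batch + num_neg)
labels = torch.arange(batch)                           # each row's positive sits at its own index
loss = F.cross_entropy(logits, labels)
loss.backward()

# At serving time, scoring is made sub-linear with hashing / approximate nearest
# neighbour search over the item embeddings instead of a full softmax.
```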


I then stumbled upon another research paper from the UCL folks. The authors focus on the retrieval problem of recommendations in the context of journalism. I liked how they use the structure of the problem and separate attention models to construct profiles that predict recommendations. It was impressive to see the big leaps DL algorithms have made on the recommendation problem compared to the Collaborative Filtering algorithms of a few years back.


What is your opinion about the next DL paradigm for recommendations? Any suggestions for other popular research papers on DL-based recommendations?

Vision QA Platform

Any robotics companies out there? We are planning to work on a Vision-QA platform leveraging Neural Modules and Search Technologies that robotics companies can use to understand their environment and answer questions in natural language. It would be great to collaborate with robotics companies on the platform.

Section Identification in Video Content - Patent approval

The USPTO has approved one of my patents at Amazon for section identification in videos (https://lnkd.in/gn9Cv55). Knowing what I know now, I would have proposed an architecture leveraging Neural Modules and Deep Reinforcement Learning for the idea. Thanks to the A9/Amazon management team for giving me the opportunity.

Google Duplex technology solution for your business



Were you amused by the demo of Google Assistant calling a business? The demo fused elements of Speech Recognition, Natural Language Understanding, Natural Language Generation and Text to Speech, and was nicely presented by Google's team.


We at Voicy.AI have been perfecting Google Duplex-like technology for businesses since 2015, building upon our customer engagement patents from 2010. Specifically, our patent titled "Systems and methods for virtual agents to help customers and businesses" describes several of the ideas shown in Google's demo. We also describe several next-generation customer engagement solutions and conversational commerce ideas in the patent. It is a great sense of personal satisfaction and pride to know that our team recognized the opportunity before the industry leaders and the great teams at Google, Facebook and Amazon.


In addition to patents, we also pioneered one-click virtual assistants for businesses on the telephone using our technology. We converted a complex implementation consisting of a telephony platform, Speech to Text, Dialog Engine technology and custom integration work, typically costing millions of dollars from bigger companies, into a simple one-click SaaS solution.


You can subscribe to our AI telephone assistant solution, reduce your customer service costs from thousands to hundreds of dollars, and provide 24/7 service to your customers. You not only get our technology, but also get coverage from our patents. Please reach out to info@voicy.ai to see how we can help your business.





How Facebook can trump Google in advertising



A major share of internet advertising revenue goes to Facebook and Google. Facebook makes its advertising money using re-targeting and latent targeting on user profiles. Google uses an auction model built on search queries to fill its coffers. A search query is one of the most actionable intents a user expresses on the internet, and Google has built an amazing business around it.


Facebook has a more or less complete monopoly on social communication. Social posts and messages, however, don't carry a merchandising intent the way search queries do. Facebook has cleverly used the re-targeting mechanism (Disclaimer: I patented the idea before anyone had implemented it) to make its ads more actionable.


We will eventually be at a stage where Facebook and Google are fighting for the same ad dollars. Who has the strategic advantage to win the war? In my view, it will be Facebook.


Elaborating more, Facebook controls the user's interests and influences across its social properties. Facebook can use users' personal data to predict the search queries and information that a user will "Google" in the near future and surface it as part of the user's stream. You might ask, is that possible?


How can Facebook predict search queries before they happen? Facebook has a search engine on its page and a partnership with Microsoft's Bing. Through its popular properties and partners, it has access to what the user is doing at any given point in time, what their influences are and what their search queries will be. It also has significant information about a user's activities outside its social walls through its re-targeting program.


Using the above data, one can use variations of Sequence-to-Sequence algorithms to generate search queries. The input sequence can be the aggregated behavior: social profile embeddings, image embeddings in the social stream, previous search queries and location information can all serve as inputs to the Sequence-to-Sequence algorithm. We can use a variational encoder to represent the input data. The output sequence is a list of search queries that the user would type on Google. One can also pose query prediction as a recommendation problem: train a Wide and Deep neural net on the user's data and search queries to predict future queries. We can also borrow techniques from Zero Query search engines to do the predictions and generate the information directly in social streams, so that users don't have to go to Google to get it.
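To make the Sequence-to-Sequence idea concrete, here is a minimal hypothetical sketch in PyTorch. The dimensions, vocabulary and data are invented for illustration, and a real system would add the variational encoder and the richer inputs described above.

```python
# Hypothetical sketch: a tiny seq2seq model that maps an aggregated behaviour
# sequence (profile/image/query/location embeddings) to predicted query tokens.
import torch
import torch.nn as nn

vocab_size, dim = 5000, 128

class QueryPredictor(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.GRU(dim, dim, batch_first=True)   # encodes behaviour embeddings
        self.decoder = nn.GRU(dim, dim, batch_first=True)   # generates query tokens
        self.tok_emb = nn.Embedding(vocab_size, dim)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, behaviour_seq, query_tokens):
        _, state = self.encoder(behaviour_seq)               # summarise the user's behaviour
        dec_out, _ = self.decoder(self.tok_emb(query_tokens), state)
        return self.out(dec_out)                             # logits over the query vocabulary

model = QueryPredictor()
behaviour = torch.randn(8, 20, dim)           # 8 users, 20 behaviour embeddings each
queries = torch.randint(0, vocab_size, (8, 6))
logits = model(behaviour, queries)            # (8, 6, vocab_size)
```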


It would be a great win for users and Facebook if they could cut the interruption of social browsing by 50%. Facebook can make money by asking advertisers to bid on predicted search queries. It might be an easy sell to the advertisers, given Facebook's relationships and engagement numbers.


If I were Google, I would be really scared of this possibility and eventuality (most probably within the next two years). I would break Facebook's monopoly on communication as early as possible.


Disclaimer: My friends at Facebook and other social networks, if you decide to implement this idea, I would appreciate it if you could pay me a royalty on the patent I filed with the title "Advanced techniques to improve content presentation experiences for Businesses and Users". Please don't ignore legal notices from a poor innovator :).

Graph CNNs

Has anyone used Graph CNNs for recommendations involving images and text in an eCommerce context? Pinterest is claiming big success in the PinSage (https://lnkd.in/gE4wU8n) paper. Are there other promising directions involving Active Deep Reinforcement Learning or Evolutionary Learning for recommendation and discovery problems? I would love to hear about production experience from other teams in my extended network.

Natural Language Generation for Chatbot

Generating natural language from a few seed sentences for an intent, without a human in the loop, has been a challenging problem in the chatbot industry. Here is my proposal: a) Train a sequence-to-sequence model with attention on a filtered SemEval dataset. b) Loop through each seed sentence and generate a semantically equivalent sentence with the model from step a. c) Take the generated sentences from step b, add them as seed sentences, and recursively apply step b until you get sentences that are no longer semantically similar, as evaluated by, say, a Siamese Network. What are your thoughts? Can you please let me know the results if anyone gets a chance to implement it?
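Below is a rough runnable sketch of the recursive loop in steps (a)-(c). The paraphrase model and the Siamese similarity scorer are stand-in stubs for trained models; only the control flow is meant to match the proposal.

```python
# Sketch of the recursive expansion loop. The two stubs stand in for trained models.

def paraphrase(sentence: str) -> str:
    """Placeholder for the seq2seq-with-attention paraphrase model from step (a)."""
    return sentence  # a trained model would return a reworded sentence

def siamese_similarity(a: str, b: str) -> float:
    """Placeholder for a Siamese network scoring semantic similarity in [0, 1]."""
    return 1.0 if a == b else 0.5

def expand_seeds(seeds, threshold=0.8, max_rounds=5):
    accepted = set(seeds)
    frontier = list(seeds)
    for _ in range(max_rounds):                  # step (c): recurse until nothing new passes
        new_sentences = []
        for seed in frontier:                    # step (b): paraphrase every current seed
            candidate = paraphrase(seed)
            if candidate not in accepted and siamese_similarity(seed, candidate) >= threshold:
                new_sentences.append(candidate)
        if not new_sentences:
            break
        accepted.update(new_sentences)
        frontier = new_sentences
    return accepted

print(expand_seeds(["what time do you close", "when are you open"]))
```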

Knowledge Graph relevance in the embedding world

I am curious whether you are finding knowledge graphs useful for Natural Language Understanding in a world where Embeddings and Deep Learning algorithms are taking over. Would it still be valid to propose a knowledge graph for improving search, chatbot and recommendation use cases in an eCommerce context in addition to Deep Learning approaches? Would we just use an ensemble model to figure out the noise? It would be good to hear thoughts and observations (my friends at Amazon, eBay, Etsy, Pinterest and Walmart) from any recent practical implementations.

Contextual Bandits for Recommendations

Interesting framework for Contextual Bandits: https://lnkd.in/gvC7zb7. Let me know if you have found success using these algorithms for home page recommendations.

Isaac SDK from Nvidia

Nvidia released the Isaac SDK to build intelligent robots. Sensing and navigation are supported by the platform, and it can be a great productivity booster for companies developing custom robots. Noticeably, the platform does not have natural language conversational capability. Would it not be cool to add Vision QA algorithms and Dialog Engine functionality to make robots talk and act like humans? What are your thoughts?

Generic Virtual Assistant platform

How can you add capabilities to Google's Dialogflow, Amazon Lex, and Microsoft Bot Framework so that every website can have a conversational agent with a few clicks? You can crawl the website offline, gather HTML tags, use a knowledge graph and build intents that can be used in natural language conversations. I wrote this patent back in 2015, when conversational systems were just catching on, anticipating a big product gap that could be addressed with technology. I am happy to share that my general Conversational Assistant platform (https://lnkd.in/gC4yuAT) got approved by the Indian Patent Office (which, in general, is conservative in approvals compared to the USPTO). Please reach out to info at voicy dot ai if any corp dev/legal folks at Google, Microsoft, GoDaddy or Amazon are interested in licensing or acquiring the patent.
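As a rough illustration of the crawl-and-build-intents idea (not the patented implementation), here is a hypothetical sketch using requests and BeautifulSoup; the URL, heuristics and the intent-naming scheme are made up.

```python
# Hypothetical sketch: crawl a page offline, pull out headings/links, and draft
# candidate intents from them for a conversational agent.
import requests
from bs4 import BeautifulSoup

def draft_intents(url: str):
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    intents = []
    # Headings and link anchors are cheap signals for what a site can talk about.
    for tag in soup.find_all(["h1", "h2", "h3", "a"]):
        text = tag.get_text(strip=True)
        if 3 <= len(text.split()) <= 8:          # keep short, intent-like phrases
            intents.append({
                "intent": text.lower().replace(" ", "_"),
                "training_phrase": text,
                "source_tag": tag.name,
            })
    return intents

# Example: print(draft_intents("https://example.com"))
```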

How can you build a constantly learning Virtual Assistant using Graph and Search techniques

Have you ever run into a problem wherein your chatbot/virtual agent, at some point, is not able to handle a conversation sequence with the user and has to back out to a human for help? The human then analyzes the context and answers the user's questions.
Can you use the human in the loop to constantly improve the capabilities of the virtual agent?
Let us say you are developing a virtual assistant to handle customer service calls on the telephone for a hotel chain. Your virtual assistant had to back out and take the help of a human to resolve the customer's issue.
The virtual assistant can listen to the recording of the conversation between the customer service representative and the customer, convert the conversation to text using speech-to-text techniques, and analyze the conversation for future use.
The stored conversations/dialogs are used to improve the intelligence of the software system on a continuous basis by storing them in a graph data structure on an inverted index for efficient future retrieval.
A dialog can be defined as the smallest element of the customer and business interaction. The system can build a bipartite graph with a hierarchy of dialogs. A dialog itself can be represented by two nodes and an edge between them. The dialogs are connected and branched off as new combinations arise for business interactions across different communication platforms. The graph can be built on an inverted index data structure to support efficient text search.
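Here is a toy sketch of such a dialog graph sitting on a simple token-level inverted index. The class names and structure are illustrative only, not the patented implementation.

```python
# Toy sketch of the dialog graph: each node is one utterance, edges follow the
# conversation, and a token -> node inverted index sits alongside for cheap lookup.
from collections import defaultdict

class DialogNode:
    def __init__(self, node_id, speaker, text):
        self.node_id = node_id
        self.speaker = speaker        # "customer" or "representative"
        self.text = text              # parametrized text, e.g. "Hello {customer_name}!"
        self.children = []            # next possible dialog turns

class DialogGraph:
    def __init__(self):
        self.nodes = {}
        self.inverted_index = defaultdict(set)   # token -> ids of nodes containing it

    def add_node(self, speaker, text, parent_id=None):
        node = DialogNode(len(self.nodes), speaker, text)
        self.nodes[node.node_id] = node
        if parent_id is not None:
            self.nodes[parent_id].children.append(node.node_id)
        for token in text.lower().split():
            self.inverted_index[token].add(node.node_id)
        return node.node_id

graph = DialogGraph()
root = graph.add_node("representative",
                      "Hello {customer_name}! This is {company}. How can I help you?")
graph.add_node("customer", "What time do you close today?", parent_id=root)
```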
Elaborating further, to start with, an opening sentence from the customer service representative such as "Hello {Customer Name}! This is {Company}. How can I help you?" will be represented as the root node of the graph. Note that the data in the node will have placeholders for the customer name and the business name. The placeholders in the conversation are identified while building the graph by looking for fuzzy string matches against an input dictionary consisting of entries such as the business name, the customer name and the items served by the business. The node is annotated with information about who the speaker was (customer or customer service representative). The node will also have features such as semantic mappings of the sentence and a vector computed using a sentence2vec algorithm, trained with a convolutional neural network on the domain the software agent is being built for.
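A minimal sketch of the fuzzy-match placeholder idea, using Python's standard-library SequenceMatcher as a stand-in for a production fuzzy matcher; the threshold, dictionary and the simple token-by-token matching are assumptions for illustration.

```python
# Sketch of placeholder identification by fuzzy matching tokens against an input
# dictionary (business name, customer name, menu items, ...).
from difflib import SequenceMatcher

def parametrize(sentence, dictionary, threshold=0.85):
    """dictionary maps placeholder name -> known value, e.g. {"customer_name": "Alice"}."""
    out = []
    for token in sentence.split():
        replaced = token
        for placeholder, value in dictionary.items():
            if SequenceMatcher(None, token.lower(), value.lower()).ratio() >= threshold:
                replaced = "{" + placeholder + "}"   # punctuation on the token is dropped here
                break
        out.append(replaced)
    return " ".join(out)

print(parametrize("Hello Alice! This is Acme. How can I help you?",
                  {"customer_name": "Alice", "company": "Acme"}))
```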
A semantically different response from the customer is created as a child node of the question from the customer service representative. Semantic equivalence to the existing nodes in the graph can be established using Learning-to-Rank algorithms such as LambdaMART, borrowed from search, after doing an inexpensive first-pass ranking on the inverted index of the conversation graph. In one implementation, the result with the highest Learning-to-Rank score exceeding a certain threshold is used as the representative for the customer input. The semantic equivalence comparison and scoring are done after tokenizing, stemming, normalizing and parametrizing (recognizing placeholders in) the input query. Slot-filling algorithms are used to parametrize the customer responses; they can use HMM/CRF models to identify the part-of-speech tags associated with the keywords, and statistical methods to identify the relationships between the words. If there is a match to an existing dialog from the customer, the software system stores the dialog context and does not create a new node. If there is no match, a new node is added under the node of the last conversation turn.
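Building on the toy DialogGraph sketched above, here is a sketch of the match-or-add decision. The token-overlap scorer is only a stand-in for a learned LambdaMART/GBDT re-ranker, and the threshold is arbitrary.

```python
# Sketch of "match or add a node": cheap candidate lookup on the inverted index,
# then an expensive re-rank (stubbed here with token overlap).

def candidate_nodes(graph, query_tokens):
    """First pass: any node sharing a token with the query (inexpensive)."""
    ids = set()
    for token in query_tokens:
        ids |= graph.inverted_index.get(token, set())
    return ids

def rerank_score(node, query_tokens):
    """Placeholder for a learned Learning-to-Rank scorer such as LambdaMART."""
    node_tokens = set(node.text.lower().split())
    query_tokens = set(query_tokens)
    return len(node_tokens & query_tokens) / max(len(node_tokens | query_tokens), 1)

def match_or_add(graph, parent_id, speaker, utterance, threshold=0.6):
    query_tokens = utterance.lower().split()
    best_id, best_score = None, 0.0
    for node_id in candidate_nodes(graph, query_tokens):
        score = rerank_score(graph.nodes[node_id], query_tokens)
        if score > best_score:
            best_id, best_score = node_id, score
    if best_score >= threshold:
        return best_id                                                # reuse the existing dialog node
    return graph.add_node(speaker, utterance, parent_id=parent_id)    # otherwise grow the graph
```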
Some tasks are simple questions and answers, such as "User: What is your specialty? Customer Service Representative: Our specialty is Spicy Chicken Pad Kee Mow." These can be indexed on the graph as orphan parent-child relationships.
One of the challenges we run into when building the graph to constantly learn is a change in context. If there is no change in context, we create the node as a child of the previous node. If there is a change in context, we need to start a new node separate from the previous state in the graph. To detect a change in context while the customer talks to the customer service representative, we can use a Bayesian or SVM classifier. The classifier can be trained on crowdsourced training data using features such as the number of tokens common to the current and previous task, and the matching-score percentage between what the customer said and the highest-scoring existing dialog. To improve accuracy, we can train a different classifier for each domain.
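A minimal sketch of such a context-change classifier with scikit-learn; the two features and the tiny training set are fabricated purely to show the shape of the problem.

```python
# Sketch of the context-change classifier: two hand-crafted features (token overlap
# with the previous turn, best match score against existing dialogs) fed to an SVM.
from sklearn.svm import SVC

# features: [tokens shared with previous turn, best existing-dialog match score]
X = [[5, 0.9], [4, 0.8], [0, 0.1], [1, 0.2], [3, 0.7], [0, 0.3]]
y = [0, 0, 1, 1, 0, 1]        # 1 = context changed, 0 = same context

clf = SVC(probability=True).fit(X, y)
print(clf.predict([[0, 0.15]]))   # a turn with no overlap and a weak match -> new context
```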
It is to be noted that the graph can also be constructed manually by an interaction designer and then inserted into the inverted index. In yet another implementation, a Recurrent Neural Network can be trained on the interactions between the customer and the customer service representative, if we have a lot of training data. To add personalization to the recurrent neural network models, user profiles can be clustered into several macro groups. We can use an unsupervised clustering algorithm such as K-Means, or create manually curated clusters based on information about the user such as age group, location and gender. We can then boost the weight of the examples that had a positive conversion with the customer service representatives; in one implementation, this is done by duplicating the positive inputs in the training data. The positive inputs can be characterized by signals such as the order price and satisfaction from the customer. Note that this idea of personalization in neural networks is not specific to conversational customer interactions and can be used for things such as building models that send automatic responses to emails.
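A small sketch of the clustering and positive-example boosting ideas with scikit-learn; the profile features, cluster count and boosting weight are illustrative assumptions.

```python
# Sketch: cluster user profiles with K-Means (one model could then be trained per
# cluster), and up-weight positively-converting conversations by duplicating them.
import numpy as np
from sklearn.cluster import KMeans

user_profiles = np.random.rand(200, 3)          # e.g. encoded age group, location, gender
cluster_ids = KMeans(n_clusters=4, n_init=10).fit_predict(user_profiles)

def boost_positives(examples, labels, weight=3):
    """Duplicate positive conversations to increase their effective training weight."""
    boosted_x, boosted_y = [], []
    for x, y in zip(examples, labels):
        copies = weight if y == 1 else 1
        boosted_x.extend([x] * copies)
        boosted_y.extend([y] * copies)
    return boosted_x, boosted_y
```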
The graph on the inverted index is then used by a software agent to answer questions about the business. The software agent starts from the root node of the graph and greets the customer on a call, over SMS or on Facebook Messenger. When the customer responds to the greeting with a question about the business, the agent searches for the closest match to that question using techniques borrowed from information retrieval. In one implementation, this is done by using the inverted index to look up possible matches for the user input with an inexpensive algorithm, and then evaluating those matches with an expensive algorithm such as a Gradient Boosted Decision Tree. Before hitting the inverted index, we run stemming, tokenization and normalization algorithms on the input query so that the matching algorithms can search the input properly.
This was an idea I wrote up in 2016 in a patent application for Voicy.AI (if you ever plan to use this technique for your company, please consider paying a royalty to a poor innovator). Components such as sentence2vec can now be replaced with BERT, and the RNN can be augmented further with attention techniques.
This approach gives enterprises both control over the virtual agent and a path for its continuous evolution.

YouTube recommendations using Mixture of Experts

Interesting read from the YouTube Recommendations team (https://lnkd.in/e-k4YqN). They used a shallow tower (in the spirit of Wide and Deep neural networks) to remove position bias and introduced a Mixture of Experts to optimize for multiple objectives. Another interesting variation was to use pointwise scoring in the Learning-to-Rank re-ranker for scalability reasons. It was also interesting to learn that they tuned the combination weights across the multiple objectives manually. What are your thoughts?
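For readers unfamiliar with the idea, here is a toy softmax-gated Mixture of Experts head in PyTorch; the sizes and structure are simplified assumptions and this is not YouTube's actual architecture.

```python
# Toy sketch of a softmax-gated Mixture of Experts head, one expert per signal
# (e.g. clicks vs. watch time), combined by a learned gate.
import torch
import torch.nn as nn

class MoEHead(nn.Module):
    def __init__(self, dim=64, num_experts=4):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(dim, 1) for _ in range(num_experts)])
        self.gate = nn.Linear(dim, num_experts)

    def forward(self, x):
        weights = torch.softmax(self.gate(x), dim=-1)                  # (batch, num_experts)
        outputs = torch.cat([e(x) for e in self.experts], dim=-1)      # (batch, num_experts)
        return (weights * outputs).sum(dim=-1)                         # gated combination

head = MoEHead()
scores = head(torch.randn(16, 64))     # pointwise score per (user, item) pair
```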

Conversational Interaction Conference - 2020

I will be speaking at the Conversational Interaction Conference 2020 (https://lnkd.in/gtcVYtQ) on the topic of "Techniques for Personalization in Virtual Assistants". I plan to speak on the convergence of Deep Learning, Deep Reinforcement Learning, Personalization, NLG, Digital Personas and Multi-Modal Virtual Assistant experiences. It would be great to get more suggestions from the LinkedIn community.

Interactive advertising presentation

Stepping into 2020, I have been ruminating about progress in the Conversational AI Marketing space. I presented some of my thoughts at the Conversational Interaction conference in 2017; the slides are at https://lnkd.in/g4Paqnq. I still see a blue ocean in the AI-backed Interactive Marketing space. What are your thoughts?

Amazon patent approval

I am excited to share that one of my patents (https://lnkd.in/guFt7vZ) for Amazon got approved. In this patent, I came up with an approach that uses holograms to give users an experiential response to search queries. Getting a search patent in a world dominated by Google feels good. Thanks to Amazon and my co-inventors Erick Cantu-Paz, François Huet, David (Ciemo) Ciemiewicz and Priyank Singh for getting here.

Is A/B test technology still relevant?

Lots of companies use A/B test results as a way to measure the value of features to their users. Do you think the practice is still relevant? I feel that A/B testing needs to be phased out in favor of an infrastructure leveraging ideas from Contextual Bandits, Deep Reinforcement Learning and Counterfactual Policy Estimation. Features would get to market faster, and companies would be able to use the best algorithm for a given context. What are your thoughts?
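As a sketch of what a bandit-driven alternative to a fixed A/B split might look like, here is a minimal epsilon-greedy contextual bandit; the context features, variants and reward simulation are made up for illustration.

```python
# Minimal epsilon-greedy contextual bandit sketch: pick a home-page variant per
# request based on context, then update from the observed reward, instead of a
# fixed 50/50 A/B split.
import numpy as np

class EpsilonGreedyBandit:
    def __init__(self, num_arms, context_dim, epsilon=0.1, lr=0.01):
        self.weights = np.zeros((num_arms, context_dim))   # one linear reward model per arm
        self.epsilon, self.lr = epsilon, lr

    def choose(self, context):
        if np.random.rand() < self.epsilon:                # explore
            return np.random.randint(len(self.weights))
        return int(np.argmax(self.weights @ context))      # exploit the best predicted arm

    def update(self, arm, context, reward):
        pred = self.weights[arm] @ context
        self.weights[arm] += self.lr * (reward - pred) * context   # simple SGD step

bandit = EpsilonGreedyBandit(num_arms=3, context_dim=5)
for _ in range(1000):
    ctx = np.random.rand(5)
    arm = bandit.choose(ctx)
    reward = float(np.random.rand() < 0.1 * (arm + 1))     # simulated click probability
    bandit.update(arm, ctx, reward)
```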