Saturday, July 4, 2020

Digital Clone technology for devices - Thoughts from 2018

Image recognition models trained on popular benchmarks such as ImageNet have surpassed human-level accuracy. Word vectors composed from text corpora have increased the natural language understanding capability of software systems. End-to-end deep learning has improved speech recognition and achieved parity with humans. We can apply these advancements to redefine how customers interact with businesses on devices.

Users carry mobile devices all the time. The phone captures rich information about the user's favorite applications, search queries, and the places he visits. Despite this rich information, the current generation of software and hardware is still not able to recommend the content the user wants to see. The user still needs to go to a search engine or a portal to get information and content. The user still needs to learn how to interact with each application, as there is no universal interface to help him. Switching context to Augmented Reality (AR) applications, the user has to painstakingly drag virtual objects into the real world by hand to see how they look there, and the current generation of AR applications has limited support for natural language interaction.

In this invention disclosure, we will describe how smarter hardware and software leveraging natural language, image, and user behavior analysis can improve experiences for users and businesses. We will discuss a personalized behavior model of the user that simulates his thinking process. The personalized behavior model takes in aggregated user behavior and the application context captured by the pixels on the screen the user is looking at, and helps with generating actions and content recommendations.

It is to be noted that this is unlike the virtual agents from Apple, Google, and Amazon, which are triggered by hotwords in the user's speech and don't use visual or previous context about the user to generate content recommendations.

Behavior processor:

The current generation of devices knows everything about the user. They can capture where the user was, what the user is seeing currently, what he has seen and read in the past, whom he talked to, what messages he sent to friends, and so on. Deep learning and reinforcement learning techniques have improved image understanding, text extraction, and natural language understanding capabilities.

Despite this rich information and technology progress, the current generation of devices is still not able to predict the content the user likes. The user still has to go to a search engine and painfully type the search query on a small keyboard to find the information he wants. The user still has to go to the browser and type www.yahoo.com to read news articles. The user can't converse with the applications on his devices, even though machines can now read and understand what the user is reading and answer questions in natural language.

In this invention, we will describe a hardware and software component called the behavior analyzer, which can be embedded in devices. The behavior analyzer running on the device can use the application context, by modeling what the user is looking at, together with aggregated information about the user, to generate content the user will like and to execute actions on his behalf.

In an embodiment, to figure out what the user is reading and viewing on the phone, the behavior processor will have a Location Analyzer, Vision Analyzer, Text Analyzer, Application Context Analyzer, a Memory component, a Controller component, and a Model Manager. Using these components, the behavior processor will form a hypothesis about the user's activity, continuously learn from the user's interactions with content on the phone, and try to generate appropriate content or responses in a timely manner.

In an embodiment, we can use a combination of multiple architectures, as discussed in the disclosure below, to generate content and action recommendations based on the context.

Behavior processor to recommend content:

Let us say a user is taking pictures of his family on New Year's Day. If the user is generally active on a social network and posts pictures after taking them, then there is a high probability that he will share the New Year pictures on Facebook. In current experiences, the user has to open the social network app, choose the pictures taken with the camera, and then post them on Facebook.

Switching context to another experience, would it not be easier to show search results before the user decides to go to a search engine such as Google.com and type a search query?

Experiences like the above can be improved substantially using the behavior processor. The behavior processor can run as an offline process, whenever the user starts an application, or as a process that executes every 'x' minutes, where 'x' is a configurable parameter. The application context analyzer component can take the pixels the user is looking at on the device and process them with a text recognition component to extract text embeddings. The pixels can also be fed to an object detection DNN to get image embeddings for the application. In an embodiment, we can train a general model for the behavior processor based on the user cluster associated with the device. In an embodiment, the users can be clustered using the K-Means clustering algorithm.
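
As a minimal sketch of the clustering step, here is how users might be grouped with scikit-learn's K-Means; the feature layout (a fixed-size aggregated behavior vector per user) is an illustrative assumption, not a prescribed schema:

```python
# Minimal sketch: cluster users into behavior groups with K-Means.
import numpy as np
from sklearn.cluster import KMeans

# Each row is one user's aggregated behavior vector, e.g. a concatenation
# of averaged text embeddings, image embeddings, and app-usage statistics.
user_features = np.random.rand(1000, 64)  # placeholder data

kmeans = KMeans(n_clusters=8, random_state=0, n_init=10)
cluster_ids = kmeans.fit_predict(user_features)

# cluster_ids[i] selects which generalized behavior model is shipped
# to user i's device.
print(cluster_ids[:10])
```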

The generalized model for the user cluster can be trained with a neural network on anonymized training data from the users in the cluster. We will use techniques borrowed from provisional document 62543400, titled "Techniques to improve Content Presentation Experiences for businesses", to build the general model. In an embodiment, the generalized model for predicting a sequence of user actions can be trained as a Recurrent Neural Network or a sequence-to-sequence model with attention on the user activity.

The generalized model can be built with a Deep Neural Network by feeding in training data from the Location Analyzer, Vision Analyzer, Application Context Analyzer, and Memory component, along with the application actions and content follow-ups within the user cluster. The DNN will learn to predict application actions, such as entering search engine queries, sharing to social networks, sending an SMS message to a friend, or calling a merchant, and content generation, such as proactively showing interesting news items or an update from the social network.
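
A minimal sketch of such a DNN, assuming the analyzer outputs are concatenated into one fixed-size context vector; the dimensions and action vocabulary are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Illustrative action vocabulary; the real set would come from the cluster data.
ACTIONS = ["search_query", "share_social", "send_sms", "call_merchant", "show_news"]

class ActionPredictor(nn.Module):
    def __init__(self, input_dim=256, hidden_dim=128, num_actions=len(ACTIONS)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_actions),
        )

    def forward(self, x):
        return self.net(x)  # logits over application actions

model = ActionPredictor()
# context = location + vision + app-context + memory embeddings, concatenated
context = torch.randn(32, 256)                   # a batch of 32 contexts
targets = torch.randint(0, len(ACTIONS), (32,))  # observed follow-up actions
loss = nn.CrossEntropyLoss()(model(context), targets)
loss.backward()  # trained offline on the anonymized cluster data
```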

The trained general model for the user cluster can then be pushed to the device. In an embodiment, the model manager component will initialize the general model either during the device setup or as part of the booting process.

The general model can then be further retrained and personalized for the user. In an embodiment, this can be done using reinforcement learning methods. We can model content and action recommendation as an MDP. The aggregated user behavior updates from social networks and news articles can be the state for the user. The possible action space can be to show a content recommendation, display an application action, or do nothing. The reward function can be correctly predicting the action at time t. We can then use policy learning or value iteration approaches to figure out an action. To start with, a general reinforcement learning model can be learned offline on the user cluster, using the generalized model. The general model can then be personalized to the user by adjusting the reinforcement learning model to maximize explicit user interaction. The personalized user behavior model can then be persisted on a remote server over the internet, so it can be used on the user's other devices and across internet ecosystems.
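
A minimal REINFORCE-style sketch of this MDP, with the state as the aggregated-behavior vector, a three-way action space (show content, show an application action, do nothing), and reward 1 when the sampled action matches the user's observed behavior; all sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 3))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def reinforce_step(states, observed_actions):
    """One policy-gradient update on a batch of logged interactions."""
    dist = torch.distributions.Categorical(logits=policy(states))
    actions = dist.sample()
    rewards = (actions == observed_actions).float()    # 1 if predicted correctly
    loss = -(dist.log_prob(actions) * rewards).mean()  # REINFORCE objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

reinforce_step(torch.randn(32, 128), torch.randint(0, 3, (32,)))
```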

In another embodiment, an end-to-end neural network can be trained using an architecture consisting of policy-gradient deep reinforcement learning on top of a Deep Neural Network (DNN). The DNN with attention can generate user behavior embeddings from the offline user cluster behavior data. The generic model can then be personalized for the user by adjusting the loss function in the policy-gradient deep reinforcement learning to predict that user's actions.

In yet another embodiment, we can train a general model to do imitation learning for user clusters on the behavior sequence data. We can then apply techniques from one-shot learning to fine-tune it to the individual user's behavior.

It is to be noted that we are proposing an architecture for personalization and simulating user behavior that differs from the current generation of ML models. Most current systems are built on the premise of a single global model for all groups of users, with personalization done by adding user features as additional inputs to that model. A single model for all users substantially simplifies validation and debugging. The architecture in this disclosure instead builds out a single model per user. A per-user model gives the model more freedom to choose parameters applicable to that specific user, and it can be made complex enough to mimic that user's behavior and actions. It also removes the burden on the model of optimizing across all groups of users at once.

Behavior processor as a virtual agent for an application:

Patent application US 15/356,512 describes a virtual agent for an application/website which can converse in natural language using external API integrations. The behavior processor can also act as a virtual agent that interacts in natural language or natural speech for an application, without the application manually adding an external API service.

The behavior processor has the application context: what the user is seeing and looking at in the application, who the user is, and the buttons and text in the application. The behavior processor will also have access to external intelligence added through manual rules and/or derived by crawling the application. It can also use information about the user aggregated from multiple ecosystems.

In an embodiment, the behavior processor can use the information identified in the above paragraph to answer questions about the service in the application and perform actions in the application.

In an embodiment, the behavior processor can use imitation learning and one-shot learning approaches to execute actions in the application on behalf of the user. The behavior processor can learn from other users' interactions aggregated on the cloud.
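
As a rough sketch of personalizing the cluster-level imitation model with only a handful of the user's own demonstrations (a simple stand-in for one-shot fine-tuning; the frozen-backbone choice and all sizes are assumptions):

```python
import torch
import torch.nn as nn

# General model learned on the user cluster: shared layers + action head.
general_model = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 5))

# Freeze the shared layers learned on the cluster...
for p in general_model[0].parameters():
    p.requires_grad = False

# ...and fine-tune only the action head on the user's few demonstrations.
opt = torch.optim.Adam(general_model[2].parameters(), lr=1e-4)
demo_states = torch.randn(4, 256)         # four demonstrated contexts
demo_actions = torch.randint(0, 5, (4,))  # actions the user actually took
for _ in range(20):
    loss = nn.CrossEntropyLoss()(general_model(demo_states), demo_actions)
    opt.zero_grad()
    loss.backward()
    opt.step()
```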

Behavior processor to help with Augmented Reality application:

Companies such as Flipkart, Amazon, and Walmart sell furniture, dresses, shoes, and other merchandise in their mobile eCommerce apps. Before purchasing, the user wants to see how the furniture fits in his living room, or how a dress fits on him.

The eCommerce companies use augmented reality experiences to increase user engagement with merchandise in their mobile applications. For instance, a user can choose a TV stand from the furniture category, point the camera of his mobile phone at his living room, and move the chosen TV stand around to get a physical sense of how it looks in the living room.

This painful experience of moving a virtual object such as furniture from the mobile app into the physical world can be improved by adding to the flow a software virtual agent that interacts in natural language. The virtual agent can be embedded within the app, either via a third-party library or as part of the application itself, or it can be triggered through a general voice agent on the phone such as Siri on iPhone or Google Assistant on Android. The behavior processor described above can also act as the virtual agent for the eCommerce application.

The virtual agent can take the voice input, optionally convert the voice to text, and figure out the intent of the user. The entities associated with the user's utterance can be figured out using slot-filling algorithms. The bitmap of the physical scene captured by the camera, a textual description of the image, and the image of the object being shopped can be provided as additional context to the virtual agent, which can use this context when figuring out intents and entities.
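
Illustrative glue code for this flow; every function below is a hypothetical stub standing in for a real speech, NLU, or vision component:

```python
from dataclasses import dataclass

@dataclass
class AgentContext:
    scene_bitmap: bytes    # frame captured by the physical camera
    scene_caption: str     # textual description of the scene
    product_image: bytes   # the virtual object being shopped

def speech_to_text(audio: bytes) -> str:
    return "I want to see how this TV stand looks in my living room"  # stub ASR

def classify_intent(utterance: str, ctx: AgentContext) -> str:
    return "place_virtual_object"  # stub intent classifier using visual context

def fill_slots(utterance: str, ctx: AgentContext) -> dict:
    return {"object": "TV stand", "target": "living room"}  # stub slot filler

def handle_turn(audio: bytes, ctx: AgentContext):
    utterance = speech_to_text(audio)
    return classify_intent(utterance, ctx), fill_slots(utterance, ctx)

print(handle_turn(b"", AgentContext(b"", "a living room with a sofa", b"")))
```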

In an embodiment, the virtual agent can use Neural Module Networks to jointly understand the virtual image in the application, the title and category of the image, the physical context, and the natural language utterance. In an implementation, the neural modules can be dynamically assembled by parsing the natural language utterance. In another embodiment, we can train an end-to-end model using reinforcement learning.

After understanding the intent using the neural modules, we need to complete the action. One action can be moving the virtual object from the site into the user's physical environment. Another example is taking a fish from Google Images and putting it into a physical aquarium to see how the virtual fish looks in an aquarium at home.

Action sequences for an intent, such as moving an object from one location to another, can be configured manually for a natural language intent. A Deep Neural Network can also be trained to produce actions from training data consisting of actions, natural language utterances, and scene input. In an embodiment, we can use a deep reinforcement learning approach on top of neural modules for natural language understanding, object detection, and scene understanding to execute actions.
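
A minimal sketch of the manually configured option; the step names are illustrative assumptions, and the dispatch would go to the AR runtime in practice:

```python
# Hand-authored action sequences per natural language intent.
ACTION_SEQUENCES = {
    "place_virtual_object": [
        "detect_target_surface",    # find the floor / aquarium in the scene
        "segment_virtual_object",   # cut the object out of the product image
        "anchor_object_to_scene",   # place it at the detected surface
        "render_overlay",
    ],
    "remove_virtual_object": ["select_placed_object", "delete_overlay"],
}

def execute(intent: str):
    for step in ACTION_SEQUENCES.get(intent, []):
        print(f"running {step}")    # stand-in for calling the AR runtime

execute("place_virtual_object")
```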

In another embodiment, we can use imitation learning techniques to execute action sequences, gathering training data with techniques borrowed from search-engine query rewriting. For instance, let us say a user says "I want to see how a Guppy fish looks in my aquarium" while pointing the augmented reality device at his aquarium.

Let us say the behavior processor does not recognize the utterance in the context of the visual scene and says "Sorry, I can't help you". The user will then go to an image search engine such as Google.com, search for Guppy fish, and move the Guppy fish into the aquarium himself.

The behavior processor can learn from this interaction for the user cluster and apply it in future iterations. This can be done by applying one-shot learning techniques on the general model that we trained for AR applications.

Unified Model for different application scenarios:

In this disclosure, we talked about how a Behavior Processor can use application and user context to simplify user interactions.

We proposed different use cases for the behavior processor, and different DNN architectures for different use cases. We can build a unified software component by combining these use cases. In an embodiment, we can run a simple deep learning classifier on the application and user context to decide which model to run, as sketched below. In another embodiment, we can train an end-to-end neural network on all the use cases and build a unified model that helps the user across application contexts.
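
A minimal sketch of that router, assuming a fixed context-embedding size and an illustrative list of specialized models:

```python
import torch
import torch.nn as nn

MODELS = ["content_recommender", "app_virtual_agent", "ar_assistant"]

# Small classifier that reads the application/user context embedding
# and picks which specialized behavior model to run.
router = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, len(MODELS)))

context_embedding = torch.randn(1, 128)   # placeholder context
choice = router(context_embedding).argmax(dim=-1).item()
print("dispatching to:", MODELS[choice])
```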

Summary:

In this disclosure, we propose a behavior processor on the user’s devices. The behavior processor simulates user behavior by leveraging application and user context and helps the user with different use cases using Natural Language and Vision techniques.

Monday, February 24, 2020

Generic virtual assistant platform

How can you add capabilities to Google's Dialogflow, Amazon Lex, and Microsoft Bot Framework so that every website can have a conversational agent with a few clicks? You can crawl the website offline, gather HTML tags, use a knowledge graph, and build intents that can be used in natural language conversations.

I wrote this patent back in 2015, when conversational systems were just catching on, anticipating a big product gap that could be addressed using technology. I am happy to share that my general Conversational Assistant platform (https://lnkd.in/gC4yuAT) got approved by the Indian Patent Office (which, in general, is conservative in approvals compared to the USPTO). Please reach out to info at voicy dot ai if any corp dev/legal folks at Google, Microsoft, GoDaddy, or Amazon would be interested in licensing or acquiring the patent.

How can you build a constantly learning Virtual Assistant using Graph and Search techniques

Have you ever run into a problem wherein your chatbot/virtual agent, at some point, is not able to handle a conversation sequence with the user and has to hand off to a human for help? The human then analyzes the context and answers the user's questions.

Can you use the human in the loop to constantly improve the capabilities of the virtual agent?

Let us say you are developing a virtual assistant to handle customer service calls on the telephone for a hotel chain. Your virtual assistant had to back out and take the help of a human to resolve the customer's issue.

The virtual assistant can listen to the recording of the conversation between the customer service representative and the customer, convert the conversation to text using speech-to-text techniques, and analyze the conversation for future use.

The stored conversations/dialogs are used to improve the intelligence of the software system on a continuous basis, by storing the conversations in a graph data structure on an inverted index for efficient future retrieval.

A dialog can be defined as the smallest element of the customer-business interaction. The system can build a bipartite graph with a hierarchy of dialogs, where a dialog itself is represented by two nodes and an edge between them. The dialogs are connected and branched off as new combinations arise in the business interactions across different communication platforms. The graph can be built on an inverted index data structure to support efficient text search.
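
A toy sketch of the dialog graph backed by an inverted index; a production system would use a real search engine, and the structure below is only illustrative:

```python
from collections import defaultdict

class DialogNode:
    def __init__(self, text, speaker):
        self.text = text        # with placeholders, e.g. "Hello {Customer Name}!"
        self.speaker = speaker  # "agent" or "customer"
        self.children = []      # branches of the conversation

class DialogGraph:
    def __init__(self, root):
        self.root = root
        self.index = defaultdict(set)   # token -> nodes (the inverted index)
        self._index_node(root)

    def _index_node(self, node):
        for token in node.text.lower().split():
            self.index[token].add(node)

    def add_child(self, parent, node):
        parent.children.append(node)
        self._index_node(node)

root = DialogNode("Hello {Customer Name}! This is {Company}. How can I help you", "agent")
graph = DialogGraph(root)
graph.add_child(root, DialogNode("Do you have rooms available tonight", "customer"))
```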

Elaborating further, the opening sentence from the customer service representative, such as “Hello {Customer Name}! This is {Company}. How can I help you”, will be represented as the root node of the graph. Note that the data in the node has placeholders for the customer name and the business name. The placeholders in a conversation are identified by looking for fuzzy string matches against an input dictionary consisting of the business name, the customer name, the items served by the business, etc. The node is annotated with who the speaker (customer or customer service representative) was. The node will also have features such as semantic mappings of the sentence and a vector computed with a sentence2vec algorithm, by training a convolutional neural network on the domain that the software agent is trained for.

A semantically different response from the customer is created as a child node of the question from the customer representative. Semantic equivalence to the existing nodes on the graph can be determined using learning-to-rank algorithms such as LambdaMART, borrowed from search, after doing a first-pass inexpensive ranking on the inverted index of the conversation graph. In an implementation, the highest-scoring result from the learning-to-rank algorithm, if it exceeds a certain threshold, is used as the representative for the customer input. The semantic equivalence comparison and scoring is done after tokenizing, stemming, normalizing, and parameterizing (recognizing placeholders in) the input query. Slot-filling algorithms are used to parameterize the customer responses; they can use HMM/CRF models to identify the part-of-speech tags associated with the keywords, and statistical methods to identify the relationships between the words. If there is a match to an existing dialog from the customer, then the software system will store the dialog context and not create a new node. If there is not a match, then a new node is added as a child of the last conversation node.
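
A toy illustration of the two-pass match, building on the DialogNode/DialogGraph sketch above; Jaccard token overlap stands in for the LambdaMART scorer, and the threshold is an assumption:

```python
def candidates(query, graph):
    """Cheap first pass: gather nodes sharing any token via the inverted index."""
    cands = set()
    for token in query.lower().split():
        cands |= graph.index.get(token, set())
    return cands

def score(query, node):
    """Expensive second pass, stubbed with Jaccard overlap."""
    q, n = set(query.lower().split()), set(node.text.lower().split())
    return len(q & n) / max(len(q | n), 1)

def match_or_create(query, graph, last_node, threshold=0.6):
    best = max(candidates(query, graph), key=lambda n: score(query, n), default=None)
    if best is not None and score(query, best) >= threshold:
        return best                           # reuse the existing dialog node
    new_node = DialogNode(query, "customer")  # otherwise branch the graph
    graph.add_child(last_node, new_node)
    return new_node
```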

Some tasks are simple question-and-answer pairs, such as “User: What is your specialty? Customer Service Representative: Our specialty is Spicy Chicken Pad Kee Mow”. These tasks can be indexed on the graph as orphan parent-child pairs.

One of the challenges we run into when building a constantly learning graph is a change in context. If there is no change in context, we create the node as a child of the previous node. If there is a change, we need to start a new node, separate from the previous state in the graph. To detect a change in context while the customer talks to the customer service representative, we can use a Bayesian or SVM machine learning classifier. The classifier can be trained on crowdsourced training data using features such as the number of tokens common to the current and previous turns, and the matching score between what the customer said and the best-matching existing dialog. To improve accuracy, we can train a different classifier for each domain.
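
A minimal sketch of such a classifier with scikit-learn's SVC; the two features follow the text (token overlap with the previous turn, best match score against existing dialogs), and the training rows are toy data:

```python
import numpy as np
from sklearn.svm import SVC

# features: [tokens shared with the previous turn, best existing-dialog match score]
X = np.array([[5, 0.9], [4, 0.8], [0, 0.1], [1, 0.2]])
y = np.array([0, 0, 1, 1])   # 0 = same context, 1 = context changed

clf = SVC(kernel="rbf").fit(X, y)
print(clf.predict([[0, 0.15]]))   # -> [1]: start a new branch in the graph
```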

It is to be noted that the graph can also be constructed manually by an interaction designer and then inserted into the inverted index. In yet another implementation, a Recurrent Neural Network can be trained on the interactions between the customer and the customer service representative, if we have a lot of training data. To implement personalization in a recurrent neural network, user profiles can be clustered into several macro groups, either with an unsupervised algorithm such as K-Means or with manually curated clusters based on information such as the age group, location, and gender of the customer. We can then boost the weight of examples that had a positive conversion with the customer service representatives; in an implementation, this is done by duplicating the positive inputs in the training data. Positive inputs can be characterized by signals such as the order price and customer satisfaction. Note that this idea of personalization in neural networks is not specific to conversational customer interactions and can be used in applications such as building models that send automatic responses to emails.

The graph on the inverted index is then used by a software agent to answer questions about the business. The software agent starts from the root node of the graph and greets the customer on a call, over SMS, or on Facebook Messenger. When the customer responds with a question about the business, the agent searches for the closest match to the question using techniques borrowed from information retrieval. In an implementation, this is done by using the inverted index to look up possible matches for the user input with an inexpensive algorithm first, and then evaluating the matches with an expensive algorithm such as a Gradient Boosted Decision Tree. Before hitting the inverted index, we have to run stemming, tokenization, and normalization on the input query so that the matching algorithms can search it properly.

This was an idea I wrote up in 2016 in a patent application for Voicy.AI. Components such as sentence2vec can now be replaced with BERT, and the RNN can be augmented further with attention techniques.

This approach gives enterprises both control over the virtual agent and a path for its evolution.

Conversational AI Marketing

Stepping into 2020, I have been ruminating about progress in the Conversational AI Marketing space. I presented some of my thoughts at the Conversational Interaction conference in 2017; the slides are at https://lnkd.in/g4Paqnq.

I still see a blue ocean in the AI-backed Interactive Marketing space. What are your thoughts?

Alternative platform for AB Tests

Lots of companies use A/B test results as a way to measure the value of features to their users.

Do you think the practice is still relevant? I feel that A/B testing needs to be phased out in favor of infrastructure leveraging ideas from Contextual Bandits, Deep Reinforcement Learning, and Counterfactual Policy Estimation.

Features would get to market faster, and companies would be able to use the best algorithms for a given context.

What are your thoughts?
#reinforcementlearning #artificialintelligence

Thursday, February 13, 2020

Conference Presentation: Techniques to personalize conversations for virtual assistants

It was a great pleasure presenting at the Conversational Interaction conference (https://lnkd.in/gtcVYtQ) on the topic of "Techniques to personalize conversations for virtual assistants".

I met several interesting people and heard about research happening in the space. It is a great conference for people specializing in Conversational AI.

I have updated my slides at https://lnkd.in/gRbqYjb

It would be great to know your thoughts on my presentation.

Thursday, January 30, 2020

Catching up with developments in Recommendation Algorithms using Deep Learning



In my quest to identify technology and business gaps for Voicy.AI, I have been spending time catching up with developments in recommendation algorithms using deep learning. I started my research by reading the Recommendations paper from YouTube. Recommendation in general is a two-step process consisting of retrieval and re-ranking. The authors phrased retrieval as multi-class classification instead of reusing inverted-index scoring mechanisms. I liked the tricks of negative sampling and sub-linear scoring using hashing techniques to optimize training and serving in production, respectively. I then moved to another important development in recommendation systems: the joint training of Wide and Deep neural nets pioneered by the Apps team at Google. I was impressed with the authors' observation about how the wide model is good for memorization while the deep model is good for generalization.


I then stumbled upon another research paper, from UCL folks, that focuses on the retrieval problem of recommendations in the context of journalism. I liked how the authors used the structure of the problem and separate attention models to construct profiles for predicting recommendations. It was impressive to see the big leaps DL algorithms have made on the recommendation problem since the collaborative filtering algorithms of a few years back.


What is your opinion about the next DL paradigm for recommendations? Any suggestions for more popular research papers on DL-based recommendations?

Vision QA Platform

Any robotics companies out there? We are planning to work on a Vision-QA platform leveraging neural modules and search technologies that robotics companies can use to understand their environment and answer questions in natural language. It would be great to collaborate with robotics companies on the platform.

Section Identification in Video Content - Patent approval

The USPTO has approved one of my patents at Amazon for section identification in videos (https://lnkd.in/gn9Cv55). Knowing what I know now, I would have proposed an architecture leveraging neural modules and deep reinforcement learning for the idea. Thanks to the A9/Amazon management team for giving me the opportunity.

Google Duplex technology solution for your business



Were you amused by the demo of Google Assistant calling a business? The demo fused elements of speech recognition, natural language understanding, natural language generation, and text-to-speech, and was nicely presented by Google's team.


We at Voicy.AI have been perfecting Google Duplex-like technology for businesses since 2015, building upon our customer engagement patents from 2010. Specifically, our patent titled "Systems and methods for virtual agents to help customers and businesses" describes several ideas that were shown in Google's demo. We also describe several next-generation customer engagement solutions and conversational commerce ideas in the patent. It is a great sense of personal satisfaction and pride to know that our team recognized the opportunity before the industry leaders and the great teams at Google, Facebook, and Amazon.


In addition to patents, we also pioneered one-click virtual assistants for businesses on the telephone using our technology. We converted a complex implementation, consisting of a telephony platform, speech-to-text, dialog engine technology, and custom integration typically costing millions of dollars from bigger companies, into a simple one-click SaaS solution.


You can subscribe to our AI telephone assistant solution, reduce your customer service costs from thousands to hundreds of dollars, and provide 24/7 service to your customers. You not only get our technology, but also coverage from our patents. Please reach out to info@voicy.ai to see how we can help your business.





How Facebook can trump Google in advertising



A major share of internet advertising revenue goes to Facebook and Google. Facebook makes its advertising money using re-targeting and latent targeting on user profiles. Google uses an auction model built on search queries to fill its coffers. The search query is one of the most actionable intents from a user on the internet, and Google has built an amazing business around it.


Facebook has a near monopoly on social communication. But social posts and messages don't carry a merchandising intent the way search queries do. Facebook has trickily used the re-targeting mechanism (disclaimer: I patented the idea before anyone implemented it) to make its ads more actionable.


We will eventually be at a stage where Facebook and Google are fighting for the same ad dollars. Who has the strategic advantage to win the war? In my view, it will be Facebook.


Elaborating more, Facebook controls the user's interests and influences across its social properties. Facebook can use users' personal data to predict the search queries and information they will "Google" in the near future and make them part of the user's stream. You might ask: is that possible?


How can Facebook predict search queries before they happen? Facebook has a search engine on its page and a partnership with Microsoft's Bing. Through its popular properties and partners, it has access to what the user is doing at any given point in time, what their influences are, and what their search queries will be. It also has significant information about a user's activities outside the social walls through its re-targeting program.


Using the above data, one can use variations of sequence-to-sequence algorithms to generate search queries. The input sequence can be the aggregated behavior: we can use social profile embeddings, image embeddings from the social stream, previous search queries, and location information as inputs to the sequence-to-sequence algorithm, and a variational autoencoder to represent the input data. The output sequence can be a list of search queries that the user will type on Google. One can also pose query prediction as a recommendation problem and train a wide-and-deep neural net on the user's data and search queries. We can also borrow techniques from zero-query search engines to make the predictions and surface the information in social streams, so that users don't have to go to Google to get information.
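
A compact encoder-decoder sketch of the query-prediction idea; the vocabulary, dimensions, and GRU choice are illustrative assumptions rather than a prescribed design:

```python
import torch
import torch.nn as nn

VOCAB, EMB, HID = 5000, 64, 128

class QueryPredictor(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB)
        self.encoder = nn.GRU(EMB, HID, batch_first=True)  # aggregated behavior in
        self.decoder = nn.GRU(EMB, HID, batch_first=True)  # query tokens out
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, behavior_ids, query_ids):
        _, h = self.encoder(self.embed(behavior_ids))
        dec, _ = self.decoder(self.embed(query_ids), h)
        return self.out(dec)   # logits over the next query token

model = QueryPredictor()
behavior = torch.randint(0, VOCAB, (8, 20))  # 8 users, 20 behavior events each
query = torch.randint(0, VOCAB, (8, 6))      # teacher-forced query tokens
print(model(behavior, query).shape)          # torch.Size([8, 6, 5000])
```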


It would be a great win for users and Facebook if they can cut the interruption to social browsing by 50%. Facebook can make money by asking advertisers to bid on predicted search queries. With Facebook's relationships and engagement numbers, it might be an easy sell to the advertisers.


If I were Google, I would be really scared of this possibility and eventuality (most probably within the next two years). I would break Facebook's monopoly on communication as early as possible.


Disclaimer: My friends at Facebook and other social networks, if you decide to implement this idea, I would appreciate it if you paid me a royalty for the patent I filed, titled "Advanced techniques to improve content presentation experiences for Businesses and Users". Please don't ignore legal notices from a poor innovator :).

Graph CNNs

Has anyone used graph CNNs for recommendations involving images and text in an eCommerce context? Pinterest is claiming big success in the PinSage paper (https://lnkd.in/gE4wU8n). Are there any other promising directions involving active deep reinforcement learning or evolutionary learning for recommendation and discovery problems? I would love to hear about production experience from other teams in my extended network.

Natural Language Generation for Chatbot

Generating natural language from a few seed sentences for an intent, without a human in the loop, has been a challenging problem in the chatbot industry. Here is my proposal: a) train a sequence-to-sequence model with attention on a filtered SemEval dataset; b) loop through each seed sentence and generate semantically equivalent sentences using the model from step a; c) take the generated sentences from step b, add them as seed sentences, and recursively apply step b until you get sentences that are no longer semantically similar, as evaluated by, say, a Siamese network (a sketch of the recursion follows below). What are your thoughts? Can you please let me know the results, if anyone gets a chance to implement it?
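
A sketch of the recursion in steps (a)-(c); the generator and similarity scorer are hypothetical stubs standing in for the trained sequence-to-sequence model and the Siamese network:

```python
def generate_paraphrases(sentence: str) -> list:
    return [sentence + " please", "could you " + sentence]  # stub seq2seq model

def semantically_similar(a: str, b: str) -> bool:
    return True  # stub Siamese-network check

def expand_seeds(seeds, max_rounds=3):
    pool, frontier = set(seeds), list(seeds)
    for _ in range(max_rounds):
        new = []
        for s in frontier:
            for cand in generate_paraphrases(s):
                # keep only candidates the Siamese network accepts as equivalent
                if cand not in pool and semantically_similar(s, cand):
                    pool.add(cand)
                    new.append(cand)
        if not new:   # stop once no new semantically-equivalent sentences appear
            break
        frontier = new
    return pool

print(expand_seeds(["book a table for two"]))
```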

Knowledge Graph relevance in the embedding world

I am curious whether you are finding knowledge graphs useful for natural language understanding in a world where embeddings and deep learning algorithms are taking over. Would it still be valid to propose a knowledge graph for improving search, chatbot, and recommendation use cases in an eCommerce context, in addition to deep learning approaches? Would we just use an ensemble model to filter out the noise? It would be good to hear thoughts/observations (my friends at Amazon, eBay, Etsy, Pinterest, and Walmart) from any recent practical implementations.

Contextual Bandits for Recommendations

Interesting framework for contextual bandits: https://lnkd.in/gvC7zb7. Let me know if you have found success using these algorithms for home page recommendations.

Isaac SDK from Nvidia

Nvidia released the Isaac SDK to build intelligent robots. Sensing and navigation are supported by the platform. It can be a great productivity booster for companies developing custom robots. Noticeably, the platform does not have natural language conversational capability. Would it not be cool to add Vision QA algorithms and dialog engine functionality to make robots talk and act like humans? What are your thoughts?

YouTube recommendations using Mixture of Experts

Interesting read from the YouTube Recommendations team (https://lnkd.in/e-k4YqN). They used a shallow tower (wide-and-deep neural networks) to remove position bias and introduced a Mixture of Experts to optimize for multiple objectives. Another interesting variation was using pointwise similarity in the learning-to-rank re-ranker for scalability reasons. It was also interesting to learn that they tuned the weights of the multiple experts manually. What are your thoughts?

Conversational Interaction Conference - 2020

I will be speaking at the Conversational Interaction Conference 2020 (https://lnkd.in/gtcVYtQ) on the topic of "Techniques for Personalization in Virtual Assistants". I plan to speak on the convergence of deep learning, deep reinforcement learning, personalization, NLG, digital personas, and multi-modal virtual assistant experiences. It would be great to get more suggestions from the LinkedIn community.

Amazon patent approval

I am excited to share that one of my patents for Amazon (https://lnkd.in/guFt7vZ) got approved. In this patent, I came up with an approach to use holograms to provide an immersive, experiential interface for search queries. Getting a search patent in a world dominated by Google feels good. Thanks to Amazon and my co-inventors Erick Cantu-Paz, François Huet, David (Ciemo) Ciemiewicz, and Priyank Singh for getting here.