Mohammad Omar is cofounder and CEO at LXT, an emerging leader in global AI training data that powers intelligent technology.
I believe that artificial intelligence (AI) is one of our most important technological innovations but that we’re still in the early stages of AI maturity, with much still to be achieved across industries. This pivotal technology will have an endless number of applications, and there will be many paths for innovators to shape its future.
Technology that helps machines understand the way people communicate is one of the most promising new breeds of AI. As globalization continues, the translation and localization industry represents a key area for AI innovation, and several companies in the space have transformed into AI-powered businesses building new language-oriented applications.
What does the future look like for these players? What are the most important customer needs driving it, and what types of organizations will emerge to support this evolution?
Simplifying The Complexity Of Language
There are more than 7,100 languages spoken in the world today. Grammar provides the structure of a language, while its vocabulary carries most of its meaning. But there are translation and localization challenges that go beyond the structure of language itself. These include figures of speech, sarcasm and irony, phrasal verbs like “look up” or “get over” that may not translate directly into other languages, and words with multiple meanings or no equivalent in another language. Each of these challenges is becoming easier to solve with today’s AI technology.
Companies that want to reach new markets with end users in different languages should make sure their AI solutions can respond accurately to users in their native languages. This requires training them with high-quality language data sourced from native speakers or, in the case of search engines and social media, having query results, advertisements and more evaluated by individuals with local context from their specific region of the world.
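As a concrete illustration, a data pipeline can enforce this kind of quality gate before training begins. The minimal Python sketch below filters a corpus down to utterances produced and verified by native speakers of a target locale; the record fields and sample data are illustrative assumptions, not a specific vendor’s schema.

```python
# A minimal sketch of one quality gate: keep only training utterances that
# were both produced and verified by native speakers of the target locale.
# The field names ("locale", "native_speaker", "verified_by_native") are
# illustrative assumptions about how such metadata might be stored.

records = [
    {"text": "¿Dónde está la estación?", "locale": "es-MX",
     "native_speaker": True, "verified_by_native": True},
    {"text": "Where is the station?", "locale": "es-MX",
     "native_speaker": False, "verified_by_native": False},
]

def native_verified(records, target_locale):
    """Filter a corpus down to native-produced, native-verified examples."""
    return [r for r in records
            if r["locale"] == target_locale
            and r["native_speaker"]
            and r["verified_by_native"]]

training_set = native_verified(records, "es-MX")
print(len(training_set))  # 1
```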
Powering The Rise Of Conversational AI
Machine learning (ML) and AI have already made the process of translation much more efficient, and they’re making equal contributions to the field of localization, where the process of pairing the right contextual images with localized content can now be automated.
Chatbots and voice-powered assistants from Alexa to Siri now assist us in all kinds of ways in our digital and real-world lives. Now, a movement in conversational AI is making these machine-to-human interactions even more common and natural. Major brands have realized the value of this added layer of communication and begun to build it into customer touchpoints. In fact, according to Accenture, 95% of customer interactions are expected to be AI-enabled by 2025.
Addressing Global Language Variables
Humans learn language across the many variables outlined above through increased exposure to sounds, words and sentence structure, which eventually leads to refinement and an understanding of its more subtle and complex aspects. Similarly, AI applications need large amounts of data to be trained accurately. A general rule of thumb is that a model will require at least 10 times as many training examples as it has parameters for any given use case. In addition, roughly 80% of that data will be needed for learning and 20% for validation.
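Taken together, these two rules of thumb make dataset sizing a quick back-of-the-envelope calculation. Here is a minimal sketch in Python; the parameter count in the example is made up for illustration, not a recommendation.

```python
# Back-of-the-envelope sizing using the rules of thumb above:
# ~10 training examples per model parameter, split 80/20 train/validation.
def estimate_data_needs(num_parameters: int, examples_per_parameter: int = 10,
                        train_fraction: float = 0.8):
    """Estimate total examples needed, split into training and validation sets."""
    total = num_parameters * examples_per_parameter
    train = int(total * train_fraction)
    validation = total - train
    return total, train, validation

# Hypothetical example: a small intent-classification model with 50,000 parameters.
total, train, val = estimate_data_needs(50_000)
print(f"Total: {total:,}  train: {train:,}  validation: {val:,}")
# Total: 500,000  train: 400,000  validation: 100,000
```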
A complete and well-designed data strategy is essential for businesses to have the needed scope of data to properly train their ML algorithms. And without a well-trained algorithm, the most sophisticated chatbots, home assistants, search engines and other apps and solutions will fall short of providing a useful end-user experience.
The data sourcing and annotation process should start from the very beginning of application development. There are several critical steps, including designing the prototype, collecting data, labeling and organizing data, training the algorithm, deploying the application, collecting live data and continually improving the experience.
Let’s take a closer look at three of these steps.
1. Designing The Prototype: Just as important as the initial prototype is the data strategy that will inform the intelligence driving the application. The key considerations include the type of data needed to cover all scenarios, how to validate and build confidence in that data, opportunities to diversify the data, how often retraining will be needed, where the data will come from and, finally, how it will be prepared for the application training process. This will require the involvement of the right stakeholders upfront.
2. Managing The Data: The first step in the data management process is to determine an appropriate source for AI training data. There are numerous options available online, from a simple Google search to open-source datasets. Alternatively, companies may crowdsource their own data or work with a third-party vendor. The next step is to organize and label the data so that the ML algorithm can recognize it and begin to form hypotheses from the inputs. Another key consideration in the labeling process is to guard against data annotation bias. One way to do this is to collect labels from multiple independent sources and annotators and compare them, as shown in the sketch after this list.
3. Training The Algorithm And Deploying The Application: Once the AI training data has been sourced with enough examples of potential real-world scenarios, organizations can implement a data pipeline that continually feeds the algorithm so it can begin learning. The algorithm finds patterns and relationships in the data that should result in the application operating correctly in anticipated scenarios. Once the product team determines that the algorithm has reached an acceptable level of accuracy, it’s time to deploy the application to a production environment.
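One lightweight way to surface annotation bias, per step 2 above, is to have several independent annotators label each item and accept only labels with strong agreement. Below is a minimal sketch in Python; the agreement threshold, data structure and sample labels are illustrative assumptions, not a prescribed workflow.

```python
from collections import Counter

# Majority-vote each item's labels across independent annotators; items whose
# annotators disagree are flagged for review rather than fed to training.
def aggregate_labels(annotations: dict[str, list[str]], min_agreement: float = 0.75):
    """Return (accepted item -> label, list of disputed item ids)."""
    accepted, disputed = {}, []
    for item_id, labels in annotations.items():
        label, count = Counter(labels).most_common(1)[0]
        if count / len(labels) >= min_agreement:
            accepted[item_id] = label
        else:
            disputed.append(item_id)  # route back for review or re-annotation
    return accepted, disputed

annotations = {
    "utterance_001": ["positive", "positive", "positive", "neutral"],
    "utterance_002": ["negative", "neutral", "positive", "neutral"],
}
accepted, disputed = aggregate_labels(annotations)
print(accepted)   # {'utterance_001': 'positive'}
print(disputed)   # ['utterance_002']
```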
Just as important as the initial product launch is the ongoing retraining and fine-tuning of ML algorithms. As your user base grows, an application or product will be exposed to new natural language cues and contexts that require their own training data to handle well.
To ensure a delightful and productive experience for your users, it’s important to commit to continually annotating new data and correcting the algorithm’s outputs. Depending on the purpose of the application, this could mean updating it with new data on a monthly, weekly or even daily basis.
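In practice, that cadence can be encoded as a simple gate in the pipeline: retrain when enough time has passed and enough fresh, annotated data has accrued. The sketch below is a minimal illustration; the weekly interval, example-count threshold and dates are placeholder assumptions, not a specific product’s policy.

```python
import datetime

# Retrain when both conditions hold: the retraining interval has elapsed and
# enough newly annotated examples have accumulated. All values are placeholders.
RETRAIN_INTERVAL = datetime.timedelta(days=7)  # weekly cadence, per the text above

def should_retrain(last_trained: datetime.datetime,
                   new_examples: int,
                   min_new_examples: int = 1_000) -> bool:
    """Decide whether to kick off a retraining run."""
    due = datetime.datetime.now() - last_trained >= RETRAIN_INTERVAL
    return due and new_examples >= min_new_examples

last_trained = datetime.datetime(2022, 1, 1)
if should_retrain(last_trained, new_examples=2_400):
    print("Kick off retraining with the newly annotated data.")
```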