What is the future of work? If you’ve watched sci-fi shows, you’ve been exposed to a variety of drool-worthy visions of the future where computers do our bidding, and they probably all appear similar at first glance. Take the OG Star Trek and the recent phenom Westworld. Both feature general artificial intelligence, the type of thinking in which we humans engage. Characters talking to Star Trek’s ship the Enterprise have conversations as they would with another person, albeit one with the speed of thought and technological superpowers of a fantasy starship. Westworld takes the comparison even further with androids virtually indistinguishable from people in every way. Prick them and they do bleed.
General intelligence is widely predicted to be on the horizon for real-world AI as well, though estimates of how far off that horizon lies range from decades to centuries. But Westworld’s Season 3 slipped in something that may be upon us much sooner. When android “host” leader Dolores leaves the “park” and enters the human world, she has a whiz-bang virtual assistant in her ear. Sure, it can summon an Uber, but it can also lease her an apartment in under three seconds. She can make the request as she’s walking up to the front desk and be a tenant by the time she arrives. This could be another generally intelligent AI she’s talking to, one that’s remote or even disembodied. But it doesn’t have to be. It could instead be software using nl2api. And unlike general intelligence, a scene like this in Westworld could soon become a reality. Bringing it to life is our mission here at PowerUser.
In this blog post, we’ll define nl2api, examine how it differs from current technologies, go over (at a very high level) how it can be implemented, and lay out its potential impact on the future of work. Westworld (minus the gunslinging robots), here we come.
nl2api stands for natural language to API translation. (For the non-techies, an API or Application Programming Interface is a defined set of interactions that can be used to communicate with a software application.) A user’s voice or text request is parsed for intent to determine which application the user is addressing, what action should be performed, what argument values have been provided, and what return data is desired.
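To make that concrete, here's a minimal sketch of what such a parse might produce. The structure, field names, and the example request below are all illustrative assumptions, not a real nl2api schema.

```python
from dataclasses import dataclass, field

# Hypothetical structure for a parsed user request: which application is
# being addressed, what action to perform, what argument values were
# provided, and what return data is desired.
@dataclass
class ParsedIntent:
    application: str                                   # service being addressed
    action: str                                        # action to perform
    arguments: dict = field(default_factory=dict)      # argument values provided
    return_fields: list = field(default_factory=list)  # return data desired

# "Lease me the apartment at 125 Main St starting today"
intent = ParsedIntent(
    application="property_manager",
    action="create_lease",
    arguments={"address": "125 Main St", "start_date": "today"},
    return_fields=["lease_id", "monthly_rent"],
)

assert intent.action == "create_lease"
```

Everything downstream — choosing calls, wiring them together — operates on a structure like this rather than on the raw text.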
Some may define nl2api as translating a user’s request to a single API call, but we consider this limiting. If that were the case, users would have to know all the API calls available in a system; otherwise, they would be likely to request something that can’t be performed because it isn’t contained in a single action. Imagine the frustration such a system would engender.
nl2api should be able to understand both the user’s intent and how to accomplish what the user needs given all the API actions available in a service. But let’s not stop there. What if carrying out the user’s intent isn’t possible within a single software service, but is when multiple services are combined? Truly valuable nl2api should be able to accommodate cross-platform interactions, as well.
Let’s be specific about some of the ways a multi-call user request can manifest. The arguments provided and the return data requested may not all be present in the API call that best matches the action the user wants to take, but may instead be spread among multiple calls. The user may want to perform an action more than once, necessitating a loop. The user may want to trigger an action only when a certain condition is met, which can be tested for by calling a different action or listening for its execution. Or the user may want to transfer data across services.
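One hypothetical way to represent such a multi-call plan is as a graph of call nodes, where each node can draw inputs from earlier nodes, loop over a list result, or run conditionally. The node structure, services, and actions below are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

# Illustrative call-graph node for a multi-call plan. Each node may wire
# its inputs to outputs of earlier nodes, repeat over a list output
# (a loop), or execute only when a condition holds (a trigger).
@dataclass
class CallNode:
    service: str
    action: str
    inputs_from: dict = field(default_factory=dict)  # arg -> "node.field"
    repeat_over: Optional[str] = None                # loop over a list output
    run_if: Optional[str] = None                     # condition gating execution

# Hypothetical plan: read leads from one service, email each one via a
# second service, then notify a third service if any leads were found.
plan = [
    CallNode("crm", "list_new_leads"),
    CallNode("email", "send_welcome",
             inputs_from={"to": "list_new_leads.email"},
             repeat_over="list_new_leads.results"),
    CallNode("chat", "notify_team",
             inputs_from={"text": "list_new_leads.count"},
             run_if="list_new_leads.count > 0"),
]
```

Note that the plan spans three services — exactly the cross-platform case a single-call definition of nl2api can't express.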
Products exist today that attempt to make this process easier, though they generally do not use natural language: IFTTT and the no-code (more honestly, low-code) drag-and-drop workflow applications that rely on hard-coded data formatting and decision-tree building. As the number of SaaS platforms used in an organization has skyrocketed, users have become overwhelmed with learning how to communicate with them and how to get them to communicate with one another. No-code workflow apps are a first pass at solving this challenge. Their downside is that they put the onus on the user to build and maintain every workflow they might need. That’s fine for automating a few common manual workflows, but it eventually runs into problems both with scaling the number of workflows and with maintaining them over time as new application versions are released and user needs change.
What nl2api brings to the table are the capabilities necessary to address both sides of the problem: understanding a user’s intent in the moment and translating it into a series of steps that accomplish the desired task. That sounds good, but it’s a hard problem. So how can it be done short of either general artificial intelligence or a trillion-parameter black-box translation model?
Most AI applications become technically successful by applying constraints to narrow the universe of possibilities they encompass, and this is no different. The universe in this case is the set of API calls available. That comes with a lot of specific structure, but also a lot of variation in the services represented. Let’s look at the constraints we can apply:
Data type validation
When two API calls interact, they do so by transferring data returned by one call to the input arguments of another call. Since each API call has known and well-defined data types and validation, connections between API calls can be constrained to only those for which the output data meets input validation requirements.
Conceptual data categorization
Just because the output of an action satisfies the requirements of another action’s input doesn’t mean it makes any sense to connect them. A company’s quarterly expenses and its quarterly revenue are both floating-point numbers to two decimal places in the same range of magnitude, but of course they should not be substituted for one another. To connect revenue only with revenue, we use data type classes beyond scalars in the API call definitions. Inputs and outputs must share the same class to connect with one another. With appropriately defined classes, their descriptions can be connected to user intent as well.
Connecting data conceptually across services is more difficult because the class definitions are provided by different organizations. To solve this, a common set of classes is independently defined from which both can draw. Communication is facilitated only when a common class exists that effectively describes the data and both services elect to apply the class.
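The cross-service rule can be sketched in a few lines. The shared registry and class names below are made-up examples, not a real taxonomy.

```python
# Hypothetical shared registry of conceptual classes, defined
# independently of any one service; services elect to apply them.
SHARED_CLASSES = {"monetary.revenue", "monetary.expense", "contact.email"}

def can_connect(output_class: str, input_class: str) -> bool:
    """Outputs and inputs connect only when both carry the same shared class."""
    return output_class == input_class and output_class in SHARED_CLASSES

# Both values are floats in the same range, but the classes keep them apart:
assert can_connect("monetary.revenue", "monetary.revenue")
assert not can_connect("monetary.revenue", "monetary.expense")
```

The scalar type check answers "can these connect?"; the class check answers "should they?"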
Conceptual activity categorization
There’s tremendous commonality in the types of tasks SaaS applications perform. Categories include create/read/update/delete (CRUD), transferring data, managing permissions, broadcasting messages, etc. Actions can be understood as corresponding to labeled common task types. Applying these labels assists in matching interpreted user intent to appropriate actions, as well as actions to each other. While a defined list of labels will not capture every possibility, it will encompass a large majority, especially if multiple labels can be applied to an action.
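Here's one way the label matching might look, with invented action names and labels. Actions can carry multiple labels, and an interpreted intent narrows the candidates to actions sharing at least one.

```python
# Hypothetical actions tagged with common task-type labels.
ACTION_LABELS = {
    "create_invoice": {"create"},
    "export_contacts": {"read", "transfer_data"},
    "share_folder": {"update", "manage_permissions"},
    "post_announcement": {"broadcast_message"},
}

def candidate_actions(intent_labels: set) -> list:
    """Return actions sharing at least one label with the interpreted intent."""
    return [action for action, labels in ACTION_LABELS.items()
            if labels & intent_labels]

# "Give Dana access to the Q3 folder" -> a permissions-flavored intent:
assert candidate_actions({"manage_permissions"}) == ["share_folder"]
```

Like the type and class constraints, this is a filter, not a final answer — it just keeps intent matching from ranging over every action in every service.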
In addition to constraints, we can take advantage of learning patterns of interaction. Every non-read action committed by the nl2api system must receive confirmation from the user before execution to prevent the wrong action from being taken. These confirmations constitute feedback that can be used to further train the system on which API call graph solved a particular type of intent. Over time, this improves call graph construction.
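A rough sketch of that confirmation loop, with all names invented for illustration: the user must approve the call graph before any non-read action runs, and each approval or rejection is logged as a training example pairing an intent type with a candidate graph.

```python
feedback_log = []  # accumulated (intent, graph, approved) training examples

def execute_with_confirmation(intent_type: str, call_graph: list, confirm) -> bool:
    """Ask the user to confirm before executing; record the answer as feedback."""
    approved = confirm(call_graph)       # user reviews the proposed call graph
    feedback_log.append({
        "intent": intent_type,
        "graph": call_graph,
        "approved": approved,            # doubles as a training label
    })
    return approved                      # execution proceeds only if approved

# Simulated user approving a hypothetical two-call graph:
ok = execute_with_confirmation(
    "schedule_meeting",
    ["calendar.find_slot", "calendar.create_event"],
    confirm=lambda graph: True,
)
assert ok and feedback_log[-1]["approved"]
```

Rejections are as valuable as approvals here: they tell the system which graphs looked plausible under the constraints but didn't match the user's actual intent.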
The example we gave at the beginning of this post was a consumer technological utopia, set in an admittedly dystopian fictional world. But the real impact of nl2api will fall not on consumer behavior but on the future of work.
As companies rely on an increasing number of SaaS applications, each offering more depth and complexity over time, the cognitive load on knowledge workers increases. It becomes impractical to learn all SaaS platforms and challenging to coordinate them. Workflow applications manage cross-communications and no-code interfaces attempt to reduce technical prerequisites for operation. But software is much better suited to manage and communicate with software than humans are. If we can reduce the number of human touchpoints, we can unlock huge gains in efficiency and coordination.
SaaS providers also benefit by making their platforms more accessible and more fully integrated. How many applications struggle with user adoption among non-admins who don’t receive extensive training? And how many applications can’t get their foot in the door of a potential new client because integration with legacy applications would present too much of a challenge?
Imagine a virtual assistant that is familiar with thousands of public APIs and how they interoperate. It’s not generally intelligent, but it’s capable of understanding a description of what you need done and translating that to a graph of API calls that span multiple services. It’s accessible by voice even if you’re not at a keyboard. There is no learning curve beyond being able to articulate in natural language what you want done. And it can do it all in less than three seconds as you walk into a building wearing stylish clothes with your hair slicked back, an earbud in your ear, and the intense expression of someone about to take over the world.