How Will AI Automate Paperwork?

14 Nov 2024 by John Werner · Forbes

Blue Cartoon Characters Design Vector Art Illustration. A smiling blue man works with AI (Generative ... Artificially Intelligent, Robot Arm) to write and draw. (Image: Getty)

What happens when generative AI picks up a pencil, or a brush, or a hammer and chisel?

OK, the last one was a bit facetious. Put another way, the world is still a long way from an AI that can sculpt.

But there is real interest in getting AI agents to use tools – mostly the kinds of tools you associate with the digital desktop. That brings us closer to a world where non-human actors can “do” many of the things that people do online.

We (humankind) are now in the era of ‘agentic AI’ – some people would say we got there earlier this year. So it’s pretty new. If you listen to Tejas Kulkami of Common Sense Machines talking about some of the newest applications during the recent IEEE/Imagination in Action event, you get the idea that people are beginning to understand how these systems explore, how they become creative, and how they work toward goals.

In describing how this works, Kulkami helped the audience to imagine the use of digital tools by an artificial agent.

“You have a set of environments - those environments have different rules,” he said. “There’s a policy that takes action.”

The Path Toward AGI

Discussing some of the projects he was involved in at DeepMind in 2018, Kulkami set up a scenario with two agent modules. One is a learner module that learns how to take action. The second is a reward agent that figures out how to use a reward system.
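
For readers who think in code, here is a minimal sketch of that setup: an environment with its own rules, a policy that takes action, a learner that improves the policy, and a separate reward module that scores outcomes. It is a toy illustration only, not the DeepMind system Kulkami described. Every name in it (`ToolEnvironment`, `RewardModule`, `Learner`, the three tools) is a hypothetical stand-in, and the learning rule is a plain running average chosen for simplicity.

```python
import random


class ToolEnvironment:
    """Toy environment (assumption): its hidden rule is that one tool finishes the task."""

    def __init__(self, tools, correct_tool):
        self.tools = tools
        self.correct_tool = correct_tool

    def step(self, tool):
        # The environment only reports what happened, not how good it was.
        return "task_done" if tool == self.correct_tool else "nothing_happened"


class RewardModule:
    """Hypothetical reward agent: turns an outcome into a number."""

    def score(self, outcome):
        return 1.0 if outcome == "task_done" else 0.0


class Learner:
    """Hypothetical learner: keeps a running average reward per tool."""

    def __init__(self, tools, epsilon=0.2, seed=0):
        self.values = {t: 0.0 for t in tools}
        self.counts = {t: 0 for t in tools}
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def policy(self, greedy=False):
        # Try every tool once before trusting the estimates.
        untried = [t for t in self.values if self.counts[t] == 0]
        if untried and not greedy:
            return untried[0]
        # Occasionally explore; otherwise pick the best-looking tool.
        if not greedy and self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.values))
        return max(self.values, key=self.values.get)

    def update(self, tool, reward):
        # Incremental mean of the rewards seen for this tool.
        self.counts[tool] += 1
        self.values[tool] += (reward - self.values[tool]) / self.counts[tool]


def train(steps=200):
    tools = ["pencil", "brush", "eraser"]
    env = ToolEnvironment(tools, correct_tool="brush")
    rewarder = RewardModule()
    learner = Learner(tools)
    for _ in range(steps):
        tool = learner.policy()
        outcome = env.step(tool)
        learner.update(tool, rewarder.score(outcome))
    return learner


learner = train()
print(learner.policy(greedy=True))  # the tool the learner has settled on
```

The point of splitting things this way is that the environment never tells the learner what "good" means. That judgment lives in the reward module, which could be swapped out without touching the rest.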

An additional element is the capability of AI agents to learn those rules, master those tools, and eventually use them in creative ways.

Kulkami referenced Anthropic’s Claude model, where the agent is now learning to use a computer like a human – making keyboard and mouse inputs, navigating an operating system, and exploring what’s possible with that interface.

Up until quite recently, that was an exclusively human experience.

By contrast, as observers can see, 3-D modeling with AI is still in its infancy, and it comes with many more challenges.

“It’s one of those problems,” Kulkami said. “There’s almost no data on the Internet. It’s extremely data scarce.”

Navigating Topology

Another problem is topology - how to match items in three-dimensional space.

However, Kulkami suggested, these models will have an ace up their sleeves.

Essentially, they’ll be able to look over your shoulder and learn from what you’re doing as a human user…

Do As I Do

Describing nascent desktop analysis systems, Kulkami suggested that agents will learn by watching every human input and then taking those inputs into account.

“It watches everything that you’re doing,” he explained, “and then it learns to mimic your keyboard and mouse actions, so that it can learn to control all the tools.”
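
A rough way to picture "watch, then mimic" is the simplest form of imitation learning: record what the human did in each situation, then repeat the most common choice. The sketch below assumes the screen can be boiled down to a short label and each keyboard or mouse input to a string. Both are simplifications I am making for illustration, as are the function name and the demo data. Real systems learn from raw pixels and input streams with neural networks, not lookup tables.

```python
from collections import Counter, defaultdict


def learn_from_demonstrations(demonstrations):
    """Build a state -> action table from (state, action) pairs.

    Hypothetical helper: for each observed screen state, keep the action
    the human took most often.
    """
    counts = defaultdict(Counter)
    for state, action in demonstrations:
        counts[state][action] += 1
    # most_common(1) returns a one-item list of (action, count).
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}


# Made-up log of a person filling out a form (illustrative data only).
demos = [
    ("empty_form", "click:name_field"),
    ("name_field_focused", "type:Jane Doe"),
    ("name_filled", "click:submit"),
    ("empty_form", "click:name_field"),
    ("empty_form", "press:tab"),
]

policy = learn_from_demonstrations(demos)
print(policy["empty_form"])  # click:name_field
```

The obvious weakness is that a table like this has nothing to say about a screen it has never seen, which is one reason scarce data makes problems like 3-D so hard.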

Obviously, this can have far-reaching ramifications. Kulkami talked about how it can have applications far beyond 3-D.

“If we can solve the 3-D problem, we can solve any problem,” he said.

Taking a critical look at what these systems are doing now shows us where we might be with agentic AI within a couple of years. Kulkami notes that computers “will be just running in the background” doing tasks. What will humans be doing?

To me, this is one of the most powerful and effective ways to imagine AI capabilities. In the analog age, you needed a physical hand to lift and use a physical pen. The biological brain directed that physical hand.

In the digital age, you don’t need a physical pen or a physical hand to fill out paperwork. And now you don’t need a human brain, either. It’s all very new, and quite an extreme concept with far-reaching impacts on every industry.

“Claude can … interact directly with other desktop apps by emulating a human user’s keystrokes, mouse movements, and cursor clicks through the ‘Computer Use’ API,” writes Andrew Tarantola at Digital Trends. “Claude’s Constitutional AI architecture means that it is tuned to provide accurate answers, rather than creative ones. The chatbot can also competently summarize research papers, generate reports based on uploaded data, and break down complex math and science questions into easily followed step-by-step instructions. While it may struggle to write you a poem, it excels at generating verifiable and reproducible responses, especially with its newly introduced analysis tool.”

Try this stuff out for yourself. You’ll be amazed! Amazon is putting robust Anthropic capability into its product dashboards, so that’s one place where execs may encounter this first. In any case, keep watching this space for more on the next big advances.