An Application of Artificial Intelligence: Creating Immortality in the Digital World

Historically, progress in artificial intelligence has been driven largely by application scenarios. Defining a good application scenario and its metric may be more important than solving the problem itself. In this blog, we propose a new and potentially important application: enabling people to live forever in the digital world. Immortality is a dream rooted in everyone's mind, yet our bodies are vulnerable to disease, aging, and death. Is it possible to create immortality using artificial intelligence? What are the principles for designing such a system?

As the first part of this series of blogs, we propose four principles for creating immortality in the digital world (i.e., a digital person):

  1. An always-on multimodal information capturing system
  2. A good question-answering (QA) system
  3. A customizable multimodal output system
  4. (A personalized learning and evolving mechanism for the QA system)

How do the above principles relate to my (Peidong Wang: http://www.peidongwang.com) research area, automatic speech recognition (ASR)? ASR is mainly used as one modality for capturing the information in human conversations. A simplified digital person based mainly on ASR could thus consist of:

  1. An always-on ASR system capturing conversations involving the user in daily life
  2. A simple QA system: during inference, compare the input question with the conversations stored in step 1, and output the answers whose questions are the same as or similar to the input question
  3. For audio output in the multimodal output system, play the saved audio or generate (customized) audio using text-to-speech (TTS) techniques
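
The simple QA system in step 2 can be sketched roughly as follows. This is only a minimal illustration, not a proposed implementation: it assumes the ASR transcripts from step 1 have already been segmented into question/answer pairs, and it uses character-level string similarity from Python's standard library as a stand-in for whatever similarity measure a real system would use. The names `QAPair`, `similarity`, `answer_question`, and the `threshold` value are all hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class QAPair:
    """One stored exchange (hypothetical structure for ASR transcripts)."""
    question: str  # transcribed question heard in a conversation
    answer: str    # the user's transcribed reply


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] after lowercasing and trimming."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def answer_question(query: str, memory: list[QAPair],
                    threshold: float = 0.6) -> list[str]:
    """Return stored answers whose questions resemble the query, best first.

    The threshold is an arbitrary assumption and would need tuning.
    """
    scored = [(similarity(query, pair.question), pair.answer) for pair in memory]
    matches = [(score, ans) for score, ans in scored if score >= threshold]
    matches.sort(key=lambda item: item[0], reverse=True)
    return [ans for _, ans in matches]


# Toy "memory" standing in for conversations captured in step 1.
memory = [
    QAPair("What is your favorite food?", "I love dumplings."),
    QAPair("Where did you grow up?", "In a small town by the sea."),
]
print(answer_question("what's your favorite food", memory))
```

In a fuller system, the returned text could be mapped back to the saved audio segment or passed to a TTS engine, as described in step 3.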

Some of the technical areas involved in the above ASR-based system are:

  1. Since the ASR system is always-on, the microphone should be wearable
  2. To protect users’ privacy, on-device ASR may be used (in practice, the system may be a combination of on-device and server-based ASR)
  3. For conversations, we may use speaker recognition (this may also influence the QA system), speaker separation, speaker diarization, and single- and multi-channel speech enhancement to improve ASR accuracy
