
Live Closed Captioning

For Live Broadcast and Live Webcast Closed Captions



Steno Machine vs. Speech Recognition

Since it is impossible to type on an ordinary keyboard as fast as words are spoken, Realtime captioning can only be done properly with a steno machine. A professional Realtime captioner can type more than 200 words per minute with over 99% accuracy using a steno machine. Because of their skill, these professionals are quite expensive to hire for a long-term project.

For low-budget situations there is an alternative. It is not as good as Realtime captioning with a steno machine, but in some cases it is acceptable.

Using Speech Recognition Software

CPC makes YouCaption, speech recognition software optimized for real-time closed captioning using your voice. It works with live presentations, live TV broadcasts, and live text streaming to a web server. It is based on the same engine as Dragon Naturally Speaking, so the accuracy is similar, but it is tuned for live captioning rather than dictation or note taking.

As of 2012, the only mainstream speech recognition engines still on the market are Dragon Naturally Speaking (Windows) and Dragon MacSpeech (Mac). They convert speech to text fairly well after you train the software to your voice. The more you train it, the better it gets. The main problem with these packages is that they must be trained to your voice: they do not recognize the voice of someone they have never heard before. They can also recognize only one selected voice at a time. You can have multiple profiles trained, but only one speaker's profile can be open at a time.

Shadow Speaking

With the aid of this software, you can do live captioning pretty well if you act somewhat like an interpreter who translates from one language to another at a live event. You do not have to translate; you just listen to the words from the different speakers and repeat them out loud (because the speech recognition software only recognizes your voice).

If you are a speaker for live events on a regular basis, you may train speech recognition software to your voice and then you can caption live using YouCaption and/or CaptionMaker software.


Realtime Captioning - TV News and Live Presentations
(With Steno Machine)

Published in The Community Ear, February 1996
by Dr. Dilip Som

There are three types of captions:

  1. Post production captions (Pop-on) for movies, videos, TV sitcoms, TV soap operas etc.
  2. Teleprompting captions (Roll-up) for speeches and presentations with prepared scripts and low budget local TV news.
  3. Realtime captions (Roll-up) for TV news, TV talk shows and live presentations - situations where no prepared script is available beforehand.

In this article we will discuss how realtime captioning works. For a realtime captioning system, you need:

  1. A steno machine (writer) to enter data.
  2. A high-speed computer with a sizable amount of memory to store a rather large dictionary which is used to translate the steno keystrokes into words.
  3. Realtime captioning software which translates every steno keystroke into English and then sends that data to an encoder.
  4. A caption encoder to insert the caption data onto the video.
  5. A highly skilled court reporter (realtime captioner). See below.

The whole system costs about $11,000 or more, and a good captioner charges $100 per hour or more. It is definitely the most expensive captioning process of all. Many local TV stations, even if they can find the money to buy the system, simply cannot afford the recurring expense of the realtime captioner. For them the teleprompting/captioning system is a viable alternative.
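
To see why the recurring fee weighs more heavily than the purchase price, the figures above can be put into a rough calculation. The sketch below is illustrative only: the $11,000 and $100 per hour figures come from this article, while the half-hour weekday newscast (about 260 broadcasts a year) is an assumed schedule and the function name is made up for the example.

```python
SYSTEM_COST = 11_000   # one-time purchase, in dollars (figure from the article)
HOURLY_RATE = 100      # captioner fee, in dollars per hour (figure from the article)


def yearly_captioner_fee(hours_per_broadcast, broadcasts_per_year,
                         hourly_rate=HOURLY_RATE):
    """Return the recurring captioner fee for one year of broadcasts."""
    return hours_per_broadcast * broadcasts_per_year * hourly_rate


# Assumed schedule: a half-hour newscast every weekday, about 260 a year.
fee = yearly_captioner_fee(0.5, 260)
print(f"Captioner fee per year: ${fee:,.0f}")
print(f"One-time system cost:   ${SYSTEM_COST:,}")
```

Under these assumptions the yearly fee already exceeds the one-time system cost, before any paid preparation time is counted.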

Many captioners caption TV news from their office at home - with only a computer, software and a steno machine - hundreds of miles away from a TV station. The computer is connected through a telephone line to an encoder located at the TV station. The TV station feeds the news through the encoder before broadcasting. As the captioner watches the broadcast (via cable or satellite), he/she enters the words into the steno machine. The encoder receives the captions through the telephone line and combines the video with the captions. The time delay for this process is insignificant.

However, because of both human delay (the captioner's response time) and machine processing (translating steno keystrokes to English), the captions usually appear a couple of seconds after the words are spoken.

Preparation and Fixes
Before the show starts, the realtime captioner goes through the material that will be covered and enters all the proper names and new words that might not be in their dictionary. For a half-hour show, the captioner sometimes needs to spend an additional half hour or more on this.

Some TV stations rebroadcast their evening news later at night. You may have observed that the part of the late night news which is a rebroadcast of the evening news has virtually no errors. This happens because the captioner gets the opportunity to fix all the errors before the next air time. At the time of rebroadcast, the captioner simply presses only one key to send one line of captions at a time from the script prepared earlier for the evening news. Since one whole line goes out at a time, captions are painted on the screen very smoothly from left to right. You can easily identify these captions. You may sometimes even see the captions appear before they are spoken!
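
The one-key-per-line rebroadcast workflow described above can be sketched as follows. This is a minimal illustration, not how any particular captioning product works: the class name ScriptedCaptions and the press_key method are hypothetical, and the returned line stands in for whatever would actually be sent to the encoder.

```python
class ScriptedCaptions:
    """Hold a corrected script and release it one line per key press."""

    def __init__(self, script_text):
        # Keep only non-empty lines, in their original order.
        self.lines = [line.strip() for line in script_text.splitlines()
                      if line.strip()]
        self.position = 0

    def press_key(self):
        """Return the next whole caption line, or None when the script is done."""
        if self.position >= len(self.lines):
            return None
        line = self.lines[self.position]
        self.position += 1
        return line


script = ScriptedCaptions("GOOD EVENING.\n\nHERE IS TONIGHT'S NEWS.\n")
print(script.press_key())
print(script.press_key())
```

Because each key press releases a complete, already corrected line, the captioner controls only the timing, which is why such captions can run slightly ahead of the speech.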

Captioning at the top of the screen
Decoders adopting the new 1993 FCC/EIA specifications can support Roll-up captioning at the top of the screen. All decoders manufactured before July 1993 can only display Roll-up captions at the bottom of the screen. If you have a new decoder or a new TV with a built-in decoder, you might sometimes see captions appear at the top of the screen to avoid conflicts with the graphics (names of people, etc.) at the bottom of the screen. Of course, the captioner has to make an extremely fast decision to move the captions to the top when necessary. If the captioner does not do that, even with the new decoders you will see the captions covering the graphics at the bottom.

Do not blame the captioners for all errors
Not all the errors you see on the TV news are the result of mistakes made by the captioners. Captioners work under intense pressure, and it is quite possible that they make some errors. A good captioner normally has an error rate of no more than 2%.
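
To put the 2% figure in perspective, here is a rough calculation. The speaking rate of 200 words per minute is an assumption chosen for illustration, and the function name is hypothetical.

```python
def expected_errors(words_per_minute, minutes, error_percent):
    """Return the expected number of wrong words over a stretch of captioning."""
    total_words = words_per_minute * minutes
    return total_words * error_percent / 100


# Assumed rate: 200 words per minute, with the 2% error ceiling quoted above.
per_minute = expected_errors(200, 1, 2)
per_half_hour = expected_errors(200, 30, 2)
print(per_minute, per_half_hour)
```

At that assumed rate, a 2% error rate works out to about 4 wrong words a minute, or about 120 over a half-hour show.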

Errors can also arise because the caption information is a very sensitive signal encoded in a small area inside the video. If the picture reception on your TV is not very good, you will invariably see some mistakes in captions due to the distortions of the caption signals.

Ultimate solution
Ideally, if there were a machine (speech recognition) that could translate any speech into captions, you would not need the highly expensive captioners - the factor which prevents many TV stations from captioning their news. But the present speech-recognition systems are still in their infancy. They can only interpret speech at about 40 wpm, and cannot interpret more than one speaker at a time. A large amount of research and money is being spent every year to make this technology feasible, but there is still a long way to go.


Realtime Captioning and Steno Writing

There are more than 20,000 court reporters in the United States. In court they can afford to make some mistakes, because they can fix them later and present the final corrected copy the next day. Realtime captioners do not have that luxury: the captions go out live, so there is no time for corrections. This is why not all court reporters are realtime captioners. In fact, only a small percentage (less than 2%) are certified as realtime captioners. Their average writing speed is around 250 words per minute.

The steno machine has only 24 keys, with thousands of possible keystroke combinations. There are 4 keys for vowels (A, O, E and U). To write an I, you press E and U together. The rest of the keys are divided into two groups: the left keys are for the beginning of a word, and the right keys are for the end of a word. The left keys are S, T, K, P, W, H and R; the right keys are F, R, P, B, L, G, T, S, D and Z. Not all consonants are available as a single key. For example, to write B at the beginning of a word, you press P and W together. You may press a single key or as many as 19 keys at the same time. Each stroke, whether a single key or several keys, can produce a syllable, a word or even a phrase. It depends entirely on how you design your dictionary to respond to your keystrokes.

It is astounding that anyone can remember thousands of keystrokes. Of course, there are some standards which people follow, but every captioner adds their own keystrokes to their system. I will give you a few examples.

Keystrokes        English word(s)

TH                this
g_                is
PW A_LS _D        balanced
PWLTD SKWR_ET     budget
SPH A_FT          as a matter of fact

Captioners keep a number of phrases and abbreviations like this in their repertoire to attain writing speed in excess of 250 wpm.
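
The dictionary lookup described in this section can be sketched in a few lines. This is a simplified illustration, not the way any real captioning software is implemented: the entries are copied from the examples above, each keystroke string is treated as an opaque dictionary key, and the names STENO_DICTIONARY, COMBINED_KEYS, letter_for and translate are hypothetical. Strokes missing from the dictionary are passed through in brackets, which is an assumption about how untranslated strokes might be shown.

```python
# Illustrative entries only, copied from the examples in this article.
# A working captioner's dictionary is far larger.
STENO_DICTIONARY = {
    "TH": "this",
    "PW A_LS _D": "balanced",
    "PWLTD SKWR_ET": "budget",
    "SPH A_FT": "as a matter of fact",
}

# Key combinations that stand for a letter with no key of its own.
COMBINED_KEYS = {
    frozenset("PW"): "B",   # P and W pressed together write B
    frozenset("EU"): "I",   # E and U pressed together write I
}


def letter_for(keys):
    """Return the letter a set of keys pressed together stands for, or None."""
    return COMBINED_KEYS.get(frozenset(keys))


def translate(strokes):
    """Look up each stroke; unknown strokes are kept, wrapped in brackets."""
    words = [STENO_DICTIONARY.get(stroke, "[" + stroke + "]")
             for stroke in strokes]
    return " ".join(words)


print(letter_for("PW"))
print(translate(["TH", "PWLTD SKWR_ET"]))
```

Because the output depends only on the dictionary, adding a personal abbreviation is a matter of adding one more entry, which is how each captioner ends up with a system of their own.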



   

