Live Closed Captioning
For Live Broadcast and Live Webcast Closed Captions
Steno Machine vs. Speech Recognition
Since it is impossible to type on a standard keyboard as fast as words are spoken, Realtime captioning can only be done properly with a steno machine. A professional Realtime captioner can write more than 200 words per minute with over 99% accuracy using a steno machine. Because of their specialized skills, these professionals are quite expensive for a long-term project.
For low-budget situations there is a possible alternative. It is not as good as Realtime captioning with a steno machine, but in some cases it is acceptable.
Using Speech Recognition Software
CPC makes speech recognition software called YouCaption, which is optimized for real-time closed captioning using your voice. It works with live presentations, live TV broadcasts, and live text streaming to a web server. It is based on the same engine as Dragon Naturally Speaking, so the accuracy is similar, but it is optimized for live captioning instead of dictation or note taking.
As of 2012, the only mainstream speech recognition engine still on the market is Dragon Naturally Speaking (Windows) or Dragon MacSpeech (Mac). It can convert speech to text fairly well after you train the software to your voice. The more you train it, the better it gets. The main problem with these software packages is that you have to train the system with your own voice; it does not recognize a voice it has never heard before. It can also recognize only one selected voice at a time. You can have multiple profiles trained, but only one speaker's profile can be open at a time.
Shadow Speaking
With the aid of this software, if you can act somewhat like an interpreter who translates from one language to another at a live event, you can do live captioning reasonably well. You do not have to translate; you just listen to the words from the different speakers and repeat them out loud (because the speech recognition software only recognizes your voice).
If you are a speaker for live events on a regular basis, you may train speech recognition software to your voice and then you can caption live using YouCaption and/or CaptionMaker software.
Realtime Captioning - TV News and Live Presentations (With Steno Machine)
Published in The Community Ear, February 1996, by Dr. Dilip Som
There are three types of captions:
- Post production captions (Pop-on) for movies, videos, TV sitcoms, TV soap operas, etc.
- Teleprompting captions (Roll-up) for speeches and presentations with prepared scripts, and for low-budget local TV news.
- Realtime captions (Roll-up) for TV news, TV talk shows and live presentations - situations where no prepared script is available beforehand.
In this article we will discuss how realtime captioning works. For a realtime captioning system, you need:
- A steno machine (writer) to enter data.
- A high-speed computer with a sizable amount of memory to store a rather large dictionary, which is used to translate the steno keystrokes into words.
- Realtime captioning software which translates every steno keystroke into English and then sends that data to an encoder.
- A caption encoder to insert the caption data onto the video.
- A highly skilled court reporter (realtime captioner). See below.
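The translation step in the list above can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions: the dictionary entries, the `translate` function and the `send_to_encoder` stand-in are all hypothetical, and real captioning software is far more elaborate. The idea it shows is that each stroke, or run of strokes, is looked up in the captioner's dictionary, and a stroke with no entry is passed through untranslated.

```python
# Hypothetical entries for illustration only; a working dictionary is far
# larger and is personal to each captioner.
SAMPLE_DICTIONARY = {
    ("KAT",): "cat",
    ("KAT", "HROG"): "catalog",
}


def translate(strokes, dictionary, max_entry_len=3):
    """Translate a list of steno strokes into text.

    The longest matching dictionary entry is tried first. A stroke with
    no entry is passed through unchanged so it stays visible in the output.
    """
    words = []
    i = 0
    while i < len(strokes):
        for n in range(min(max_entry_len, len(strokes) - i), 0, -1):
            entry = tuple(strokes[i:i + n])
            if entry in dictionary:
                words.append(dictionary[entry])
                i += n
                break
        else:
            # No entry found: emit the raw stroke.
            words.append(strokes[i])
            i += 1
    return " ".join(words)


def send_to_encoder(text):
    """Stand-in for the link to a caption encoder (an assumption)."""
    print(text)


send_to_encoder(translate(["KAT", "HROG", "KAT", "STPH"], SAMPLE_DICTIONARY))
```

Trying the longest entry first lets a multi-stroke word take priority over the shorter words its strokes would otherwise produce.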
The whole system costs about $11,000 or more, and a good captioner charges $100 per hour or more. It is definitely the most expensive captioning process of all. Many local TV stations, even if they can find the money to buy the system, simply cannot afford the recurring expense of the realtime captioner. For them, the teleprompting/captioning system is a viable alternative.
Many captioners caption TV news from their office at home - with only a computer, software and a steno machine - hundreds of miles away from a TV station. The computer is connected through a telephone line to an encoder located at the TV station. The TV station feeds the news through the encoder before broadcasting. As the captioner watches the broadcast (via cable or satellite), he/she enters the words into the steno machine. The encoder receives the captions through the telephone line and combines the video with the captions. The transmission delay in this process is insignificant.
However, because of both human response time (the captioner's) and machine processing time (translating steno keystrokes to English), the captions usually appear a couple of seconds after the words are spoken.
Preparation and Fixes
Before the show starts, the realtime captioner goes through the materials which will be covered in the show and enters all the proper names and new words that might not be in the captioner's dictionary. For a half-hour show, the captioner sometimes needs to spend an additional half hour or more on this.
Some TV stations rebroadcast their evening news later at night. You may have observed that the part of the late night news which is a rebroadcast of the evening news has virtually no errors. This happens because the captioner gets the opportunity to fix all the errors before the next air time. At the time of rebroadcast, the captioner simply presses only one key to send one line of captions at a time from the script prepared earlier for the evening news. Since one whole line goes out at a time, captions are painted on the screen very smoothly from left to right. You can easily identify these captions. You may sometimes even see the captions appear before they are spoken!
Captioning at the top of the screen
Decoders adopting the new 1993 FCC/EIA specifications can support Roll-up captioning at the top of the screen. All decoders manufactured before July 1993 can only display Roll-up captions at the bottom of the screen. If you have a new decoder or a new TV with a decoder built-in, you might sometimes see captions appear at the top of the screen to avoid conflicts with the graphics (names of people, etc.) at the bottom of the screen. Of course, the captioner has to make an extremely fast decision to move the captions to the top when necessary. If the captioner does not do that, even with the new decoders you will see the captions covering the graphics at the bottom.
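The behavior described above can be modeled with a small Python sketch. It is a simplified illustration, not how any decoder is actually built: the class name, the three-line depth and the 15-row screen are assumptions made for the example. It shows the newest line appearing on the base row while older lines roll up and off, with the whole window moving to the top only when the decoder supports it.

```python
from collections import deque

SCREEN_ROWS = 15  # assumed number of caption rows on the screen


class RollUpCaptions:
    """Toy model of a roll-up caption window (hypothetical, for illustration)."""

    def __init__(self, depth=3, decoder_supports_top=True):
        self.depth = depth
        self.lines = deque(maxlen=depth)  # the oldest line rolls off automatically
        self.decoder_supports_top = decoder_supports_top
        self.at_top = False

    def move(self, to_top):
        # A decoder without top-of-screen support keeps captions at the bottom.
        self.at_top = to_top and self.decoder_supports_top

    def add_line(self, text):
        self.lines.append(text)

    def render(self):
        """Return {row_number: text}, with the newest line on the base row."""
        base = self.depth if self.at_top else SCREEN_ROWS
        return {base - offset: text
                for offset, text in enumerate(reversed(self.lines))}


captions = RollUpCaptions(depth=3)
for line in ["GOOD EVENING.", "OUR TOP STORY", "TONIGHT IS", "THE WEATHER."]:
    captions.add_line(line)

print(captions.render())  # bottom of the screen
captions.move(to_top=True)  # the captioner moves captions clear of the graphics
print(captions.render())  # top of the screen
```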
Do not blame the captioners for all errors
Not all the errors you see on the TV news are the result of mistakes made by the captioners. The captioners work under intense pressure, and it is quite possible that they make some errors. A good captioner normally does not make more than 2% errors.
Errors can also arise because the caption information is a very sensitive signal encoded in a small area inside the video. If the picture reception on your TV is not very good, you will invariably see some mistakes in captions due to the distortions of the caption signals.
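To put the 2% figure in perspective, here is a rough calculation in Python. The function name is made up for this example, and the inputs are taken from this article: a writing speed of about 250 words per minute and a half-hour show. It treats the error rate as a simple share of words written, which is an assumption.

```python
def caption_error_budget(words_per_minute, minutes, error_percent):
    """Return (total words written, errors at the given error rate)."""
    total_words = words_per_minute * minutes
    errors = total_words * error_percent // 100
    return total_words, errors


total, errors = caption_error_budget(250, 30, 2)
print(f"{total} words, up to {errors} errors")
```

Even at this level of accuracy, a half-hour newscast can therefore contain a noticeable number of visible mistakes, on the order of five per minute.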
Ultimate solution
Ideally, if there were a speech-recognition machine that could translate any speech into captions, you would not need the highly expensive captioners - the factor which prevents many TV stations from captioning their news. But present speech-recognition systems are still in their infancy. They can only interpret speech at about 40 wpm, and cannot interpret more than one speaker at a time. A large amount of research and money is being spent every year to make this technology feasible, but there is still a long way to go.
Realtime Captioning and Steno Writing
There are more than 20,000 court reporters in the United States. When they work in the court environment they can afford to make some mistakes: they can fix the mistakes later and present the final corrected copy the next day. In realtime captioning, they do not have that luxury. Since the captions go out live, there is no time for corrections. This is why not all court reporters are realtime captioners. In fact, only a small percentage (less than 2%) are certified as realtime captioners. Their average writing speed is around 250 words per minute.
The steno machine has only 24 keys, with thousands of possible keystroke combinations. There are 4 keys for vowels (A, O, E and U). To write an I, you need to press E and U together. The rest of the keys are divided into two parts. The left keys are for the beginning of a word, and the right keys are for the end of a word. The left keys are S, T, K, P, W, H and R; the right keys are F, R, P, B, L, G, T, S, D and Z. Not all consonants are available as a single key. For example, to write B, you need to press P and W together. You may press only one key, or as many as 19 keys at the same time.
Each keystroke, whether it is a single key or multiple keys, can produce a syllable, a word or even a phrase. It depends entirely on how you design your dictionary to respond to your keystrokes. It is astounding to think how it is possible to remember thousands of keystrokes. Of course, there are some standards which people follow, but every captioner adds their own keystrokes to their system. I will give you a few examples.
| Keystrokes | English word(s) |
| TH | this is balanced budget as a matter of fact |
Captioners keep a number of phrases and abbreviations like this in their repertoire to attain writing speed in excess of 250 wpm.
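The key layout described in this section can be sketched in Python as follows. This is an illustration only: it covers just the keys and the two combinations named in the text (I as E+U, B as P+W), the function names are hypothetical, and the hyphen used when a stroke has no vowel is a common notation that is assumed here rather than taken from this article.

```python
LEFT_KEYS = "STKPWHR"       # beginning of a word
VOWEL_KEYS = "AOEU"
RIGHT_KEYS = "FRPBLGTSDZ"   # end of a word

# Letters without a key of their own are written as combinations.
LEFT_COMBOS = {"B": "PW"}
VOWEL_COMBOS = {"I": "EU"}


def _press(letters, bank, combos):
    """Return the keys of one bank needed for the letters, in keyboard order."""
    pressed = set()
    for letter in letters.upper():
        for key in combos.get(letter, letter):
            if key not in bank:
                raise ValueError(f"{letter!r} cannot be written in this bank")
            pressed.add(key)
    return "".join(key for key in bank if key in pressed)


def make_stroke(left="", vowels="", right=""):
    """Combine left, vowel and right keys into one stroke, all pressed together."""
    start = _press(left, LEFT_KEYS, LEFT_COMBOS)
    middle = _press(vowels, VOWEL_KEYS, VOWEL_COMBOS)
    end = _press(right, RIGHT_KEYS, {})
    if end and not middle:
        # Without a vowel, a hyphen marks where the right-hand keys begin.
        return f"{start}-{end}"
    return start + middle + end


print(make_stroke("B", "I", "T"))  # B and I are each written with two keys
print(make_stroke("TH"))
print(make_stroke(right="T"))
```

What a given stroke means is not fixed by the keyboard; it depends on the dictionary the captioner has built.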
