Live Captioning (Steno Machine and Speech Recognition)
- Steno Machine vs. Speech Recognition
- Realtime Captioning: TV News and Live Presentations (February 1996)
- Realtime Captioning and Steno Writing (February 1996)
Steno Machine vs. Speech Recognition
Since it is impossible to type as
fast as words are spoken, Realtime captioning can only be done
properly by using a steno machine. A professional Realtime captioner
can type more than 200 words per minute with over 99% accuracy using a
steno machine. Because of their skill, these professionals are
quite expensive for a long-term project.
For low-budget situations, there is a
possible alternative solution. It is not as good as Realtime
captioning with steno machines, but in some cases it is acceptable.
Using speech recognition software
There are quite a few software packages on the market, such as
Naturally Speaking (English and Spanish) and ViaVoice (English),
which can convert speech to text fairly well after you train the
software to your voice. The more you train it, the better it gets.
The main problem with these packages is that you have to train the
system with your own voice; it does not recognize the voice of
someone it has never heard before.
With the aid of this software, you can do live captioning pretty
well if you act somewhat like an interpreter, who typically
translates from one language to another at a live event. You do not
have to translate; you just listen to the words from the different
speakers and repeat them out loud (because the speech recognition
software only recognizes your voice).
If you are a speaker for live events
on a regular basis, you may train speech recognition software to your
voice and then you can caption live using CaptionMaker software.
Realtime Captioning - TV News and Live Presentations (With Steno Machine)
Published in The Community Ear, February 1996, by Dr. Dilip Som
There are three types of captions:
- Post production captions (Pop-on)
for movies, videos, TV sitcoms, TV soap operas etc.
- Teleprompting captions (Roll-up)
for speeches and presentations with prepared scripts and low
budget local TV news.
- Realtime captions (Roll-up) for TV
news, TV talk shows and live presentations - situations where
there are no prepared scripts available beforehand.
In this article we will discuss how
realtime captioning works. For a realtime captioning system, you need:
- A steno machine (writer) to enter
data.
- A high-speed computer with a
sizable amount of memory to store a rather large dictionary which
is used to translate the steno keystrokes into words.
- Realtime captioning software which
translates every steno keystroke into English and then sends that
data to an encoder.
- A caption encoder to insert the
caption data onto the video.
- A highly skilled court reporter (realtime captioner). See below.
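The flow from steno writer to encoder listed above can be sketched roughly as follows. This is only an illustration, not how any particular captioning package works: the names `STENO_DICTIONARY`, `translate` and `FakeEncoder` are hypothetical, the two dictionary entries are taken from the examples later in this article, and passing unknown strokes through unchanged is an assumption made for the sketch. A real dictionary holds many thousands of entries, and real software also handles words built from several strokes.

```python
# Toy dictionary: one steno stroke -> English text.
# Both entries come from the examples given in this article.
STENO_DICTIONARY = {
    "TH": "this",
    "SPH A_FT": "as a matter of fact",
}


def translate(strokes):
    """Look up each stroke in the dictionary, in the order it was written."""
    words = []
    for stroke in strokes:
        # Assumption: a stroke with no dictionary entry is passed through
        # as raw steno so that it stays visible instead of vanishing.
        words.append(STENO_DICTIONARY.get(stroke, stroke))
    return words


class FakeEncoder:
    """Hypothetical stand-in for a caption encoder; it only records text."""

    def __init__(self):
        self.sent = []

    def send(self, text):
        self.sent.append(text)


encoder = FakeEncoder()
for word in translate(["SPH A_FT", "TH"]):
    encoder.send(word)
print(" ".join(encoder.sent))
```

The sketch makes one point from the list concrete: the captioner's speed depends on the dictionary, because a single stroke can stand for a whole phrase.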
The whole system costs about $11,000
or more. And a good captioner charges $100 per hour or more. It is
definitely the most expensive captioning process of all. Many local TV
stations, even if they can manage to get the money to buy the system,
simply cannot afford the recurring expense of the realtime captioner.
For them the teleprompting/captioning system is a viable alternative
solution.
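To see why the recurring expense weighs more than the purchase price, here is a rough worked calculation. The $11,000 and $100 per hour figures are the ones quoted above; the schedule (one billable hour per weekday, 260 weekdays a year) is purely an assumption for illustration, and `first_year_cost` is a hypothetical helper name.

```python
SYSTEM_COST = 11_000    # dollars, one-time ("about $11,000 or more")
CAPTIONER_RATE = 100    # dollars per hour ("$100 per hour or more")


def first_year_cost(hours_per_day, days_per_year):
    """One-time system cost plus captioner fees for the first year."""
    return SYSTEM_COST + CAPTIONER_RATE * hours_per_day * days_per_year


# Assumed schedule: 1 billable hour per weekday, 260 weekdays per year.
print(first_year_cost(1, 260))  # 37000
```

Under these assumptions the captioner's fees ($26,000) are more than twice the cost of the equipment in the first year alone, and they recur every year after that.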
Many captioners caption TV news from
their office at home - with only a computer, software and a steno
machine - hundreds of miles away from a TV station. The computer is
connected through a telephone line to an encoder located at the TV
station. The TV station feeds the news through the encoder before
broadcasting. As the captioner watches the broadcast (via cable or
satellite), he/she enters the words into the steno machine. The
encoder receives the captions through the telephone line and combines
the video with the captions. The time delay for this process is
insignificant.
However, because of both human response time (the captioner) and
machine processing time (translating steno keystrokes to English),
the captions usually appear a couple of seconds after the words are
spoken.
Preparation and Fixes
Before the show starts, the realtime captioner goes through the
materials which will be covered in the show and enters all the proper
names and new words that might not be in the dictionary.
For a half-hour show, the captioner sometimes needs to spend an
additional half hour or more to do this.
Some TV stations rebroadcast their
evening news later at night. You may have observed that the part of
the late night news which is a rebroadcast of the evening news has
virtually no errors. This happens because the captioner gets the
opportunity to fix all the errors before the next air time. At the
time of rebroadcast, the captioner simply presses one key to send
one line of captions at a time from the script prepared earlier for
the evening news. Since one whole line goes out at a time, captions
are painted on the screen very smoothly from left to right. You can
easily identify these captions. You may sometimes even see the
captions appear before they are spoken!
Captioning at the top of the screen
Decoders adopting the new 1993 FCC/EIA specifications can support
Roll-up captioning at the top of the screen. All decoders manufactured
before July 1993 can only display Roll-up captions at the bottom of
the screen. If you have a new decoder or a new TV with a decoder
built-in, you might sometimes see captions appear at the top of the
screen to avoid conflicts with the graphics (names of people, etc.)
at the bottom of the screen. Of course, the captioner has to make an
extremely fast decision to move the captions to the top when
necessary. If the captioner does not do that, even with the new
decoders you will see the captions covering the graphics at the
bottom.
Do not blame the captioners for all errors
Not all the errors you see on the TV news are the result of mistakes
made by the captioners. Captioners work under intense pressure,
and it is quite possible that they make some errors. A good captioner
normally has an error rate of no more than 2%.
Errors can also arise because the
caption information is a very sensitive signal encoded in a small area
inside the video. If the picture reception on your TV is not very
good, you will invariably see some mistakes in captions due to the
distortions of the caption signals.
Ultimate solution
Ideally, if there were a machine (speech recognition) that could
translate any speech into captions, you would not need the highly
expensive captioners, the factor which prevents many TV stations from
captioning their news. But present speech-recognition systems are
still in their infancy. They can only interpret speech at about 40
wpm, and cannot interpret more than one speaker at a time. A
large amount of research effort and money is spent every year to make
this technology feasible, but there is still a long way to go.
Realtime Captioning and Steno Writing
There are more than 20,000 court reporters in the United States.
When they work in the court environment they can afford to make some
mistakes: they can fix the mistakes later and present the final
corrected copy the next day. In realtime captioning, they do not have
that luxury. Since the captions go out live, there is no time for
corrections. This is why not all court reporters are realtime
captioners. In fact, only a small percentage (less than 2%) are
certified as realtime captioners. Their average writing speed is
around 250 words per minute.
The steno machine has only 24 keys, with thousands of possible
keystroke combinations. There are 4 keys for vowels (A, O, E and U).
To write an I, you need to press E and U together. The rest of the
keys are divided into two parts. The left keys are for the beginning
of a word, and the right keys are for the end of a word. The left
keys are S, T, K, P, W, H and R; the right keys are F, R, P, B, L, G,
T, S, D and Z. Not all consonants are available as a single key. For
example, to write B, you need to press P and W together. You may
press only one key, or as many as 19 keys all at the same time. Each
keystroke, whether it is a single key or multiple keys, can produce a
syllable, a word or even a phrase. It depends entirely on how you
design your dictionary to respond to your keystrokes.
It is astounding that anyone can remember thousands of keystrokes.
Of course, there are some standards which people follow. But every
captioner adds their own keystrokes to their system. I will give you
a few examples.
| Keystrokes | English word(s) |
| TH | this |
| g_ | is |
| PW A_LS _D | balanced |
| PWLTD SKWR_ET | budget |
| SPH A_FT | as a matter of fact |
Captioners keep a number of phrases
and abbreviations like this in their repertoire to attain writing
speed in excess of 250 wpm.
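The key layout described in this article can be written down as a small data structure. The sketch below is a simplification under stated assumptions: it covers only the two combinations mentioned here (P and W together for B, E and U together for I), the function names `read_chord` and `_expand` are hypothetical, and the result is only the letters a chord spells. What a stroke finally produces on screen is decided by the captioner's dictionary.

```python
# Key groups as listed in this article.
LEFT_KEYS = set("STKPWHR")       # beginning of a word
VOWEL_KEYS = set("AOEU")
RIGHT_KEYS = set("FRPBLGTSDZ")   # end of a word

# Only the two combinations the article mentions; real steno theory has more.
LEFT_COMBOS = {"PW": "B"}
VOWEL_COMBOS = {"EU": "I"}


def _expand(keys, combos):
    """Replace key combinations with the letter they stand for."""
    for pressed, letter in combos.items():
        keys = keys.replace(pressed, letter)
    return keys


def read_chord(left="", vowels="", right=""):
    """Return the letters spelled by one chord (all keys pressed together)."""
    sections = ((left, LEFT_KEYS), (vowels, VOWEL_KEYS), (right, RIGHT_KEYS))
    for part, allowed in sections:
        unknown = set(part) - allowed
        if unknown:
            raise ValueError(f"keys not in this section: {sorted(unknown)}")
    return _expand(left, LEFT_COMBOS) + _expand(vowels, VOWEL_COMBOS) + right


print(read_chord("PW", "EU", "T"))  # BIT
```

Pressing P, W, E, U and T together therefore spells B, I, T in a single stroke, which is how one hand movement can stand for a whole syllable or word.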