Hearing is Believing vs. Believing is Hearing: Blind vs. Sighted Listening Tests, and Other
Interesting Things
3894 (H-6)
Floyd E. Toole and Sean E. Olive
Harman International Industries, Inc.
Northridge, CA 91329, USA
Presented at
the 97th Convention
1994 November 10-13
San Francisco
This preprint has been reproduced from the author's advance
manuscript, without editing, corrections or consideration by the
Review Board. The AES takes no responsibility for the
contents.
Additional preprints may be obtained by sending request and
remittance to the Audio Engineering Society, 60 East 42nd St.,
New York, New York 10165-2520, USA.
All rights reserved. Reproduction of this preprint, or any portion
thereof, is not permitted without direct permission from the
Journal of the Audio Engineering
Society.
AN AUDIO ENGINEERING SOCIETY PREPRINT
Although it is taken for granted that many factors influence listeners
as they form opinions of sound quality, it is interesting to actually put them to
the test, and to assess the strength of the factors. These experiments had
several objectives.
On the subjective side, they were to determine the extent to
which listeners' opinions about loudspeaker sound quality are affected by not
seeing (blind tests) and seeing (sighted tests) the loudspeakers being
evaluated, to examine the performance of listeners with and without
experience in critical listening, and to examine the influence of the sex of the
listener. On the product side, the objectives were to evaluate the differences
among three high-quality expensive loudspeakers and a high-performance,
small, inexpensive system which would serve as an "honesty" check for
listeners in the sighted tests. The results contain some reassurances and some
surprises.
0 INTRODUCTION
Many years of experience with listening tests, conducted under blind
and double-blind circumstances, have proven their worth and reliability in
quantifying subjective responses to several measurable parameters. The
results of these psychoacoustic investigations have been repeatable and, in
retrospect, the relationships between the subjective and objective domains
have been logical [1,2,3]. In properly conducted controlled tests, listeners
have been shown to be extremely sensitive to small changes in sound quality.
Yet, there continue to be animated arguments about the validity of blind
tests. The majority of these apply to the "great debate" issues, related to the
question of whether certain differences are audible or not. This paper does
not address this debate. All of the differences associated with the subjective
evaluations in these experiments were very audible, even to untrained
listeners.
The question here was: given that certain differences among
products are clearly audible, to what extent are listeners' opinions altered
when they are aware of the products being listened to?
It is probably safe to say that everyone in audio has been involved in
"sighted" subjective evaluations at one time or other. It is probably a safe
generalization
to say that most people in audio think that they can ignore the
effects of prior knowledge when they focus on the sounds of the products
under examination.
Others would argue that it is difficult to impossible not to
be biased in some way by expectations.
But... have you ever put it to the test?
Probably not. Neither had we.
Experience is one of those variables among listeners that is very
difficult to quantify. For example, musicians are experienced listeners, but is
experience in focusing on musical attributes equivalent to that of focusing on
timbral and spatial attributes? Some evidence suggests that it is not.
Gabrielsson found that musicians who were not also audiophiles were not
especially good judges of sound quality [4]. The famous pianist Glenn Gould
came to appreciate the insights of non-musicians [5]. Our own tests have
confirmed this. So, listeners with different backgrounds could be expected to
have differing abilities or preferences in subjective evaluations. This is an
enormously broad topic, but we thought that it would be interesting to take a
first step towards understanding the importance of this variable.
Everyone knows that females have different preferences from males.
Right? If so, where is the proof? Some earlier tests included both male and
female participants [6]. In this case, they were all professional sound
engineers and producers. In the final analysis, the opinions of the female
participants were indistinguishable from those of their male colleagues. There
was one difference, however: a lower percentage of them had hearing loss so
that, as a population, they were more reliable listeners. At the consumer level,
there remains the question about sexual bias in listener preferences. We now
have some data.
The issues addressed in these evaluations are important. They are also
not unidimensional. Definitive answers must await more data but, in the
meantime, it is interesting to have some light shed on the issues.
2 OBJECTIVES
These tests had several objectives. On the subjective side they were:
· to determine the extent to which listeners' opinions about loudspeaker
sound quality are affected by not seeing (blind tests) and seeing (sighted
tests) the loudspeakers being evaluated,
· to examine the influence of having experience in critical listening,
and
· to examine the influence of the sex of the listener.
On the product side, the objectives were:
· to evaluate the differences between two different variations of the same
basic loudspeaker system. They employed the same drivers in the same
enclosure, but the crossover networks were designed by two different
engineers having slightly different opinions about the optimum spectral
smoothness and balance. These are high-priced, high-end products.
· to compare these performances with that of a current audiophile favorite
of a comparable price and size.
· to evaluate a compact, inexpensive, subwoofer/satellite system, and to use it
as an "honesty" check for listeners in the sighted tests. Previous tests had
shown that, within its power-handling capabilities, it performed in a
manner that belied its low price and small size.
3 METHOD
So that the results would carry some weight, forty (40) listeners
participated in the blind and sighted tests. They were all employees of Harman
International companies. This means that, in the sighted tests, the listeners
had one bias in common: brand loyalty.
The effects were tested using male experienced listeners and both male
and female inexperienced listeners. In these tests, listeners were considered
to be inexperienced if they had no previous experience in controlled listening
tests. Other definitions are possible, which might include persons with no
critical listening experience whatsoever. The participants were categorized
under the following headings.
LISTENERS
            BLIND TESTS                      SIGHTED TESTS
Sex         Experienced    Inexperienced     Experienced    Inexperienced
Male        10             10                8              4
Female      0              5                 0              3
Total       10             15                8              7
Unfortunately,
it was not possible to balance all levels within each
category. In Experiment 1 different listeners participated in the blind and
sighted versions. As a further test, Experiment 2 was conducted, in which the
same four experienced listeners participated
in both versions.
The listening room was typical of a domestic listening situation.
In the
blind tests, the identities of the loudspeakers were hidden from the listeners
with a visually opaque screen made of loudspeaker grille cloth. The grilles
were removed from the loudspeakers
so that, in effect, the grille cloth hid the
entire loudspeaker,
not just the drivers. In the sighted tests, the screen was
removed and listeners were told the brand name, model number and retail
price of each speaker prior to the start of the test.
The tests were conducted over a period of 1.5 weeks using a multiple (4
loudspeakers at a time) presentation method. The monophonic tests were
conducted with the loudspeakers adjusted for equal loudness within 0.5 dB
using B-weighted pink noise. Playback levels, which were constant
throughout the tests, were set for typical "good listening". They were not
intended to explore the power-handling capabilities of the systems.
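As an illustration of the level-matching step, the following minimal sketch computes the gain trim each loudspeaker would need to reach a common reference level, and checks the 0.5 dB tolerance. The measured levels are hypothetical numbers chosen for the example, the function names are ours, and reading "within 0.5 dB" as a limit on the loudest-to-quietest spread is an assumption.

```python
# Hypothetical playback levels in dB (B-weighted pink noise) at the listening
# position. These numbers are illustrative only; they are not from the paper.
measured_db = {"G": 78.4, "D": 77.9, "S": 76.2, "T": 79.1}

TOLERANCE_DB = 0.5  # matching tolerance stated in the text


def gain_trims(levels, reference):
    """Return the gain change (dB) each loudspeaker needs to reach the reference."""
    return {name: round(reference - level, 2) for name, level in levels.items()}


def is_matched(levels, tolerance=TOLERANCE_DB):
    """True when the loudest-to-quietest spread is within the tolerance.

    Interpreting "within 0.5 dB" as a limit on the spread is an assumption.
    """
    return max(levels.values()) - min(levels.values()) <= tolerance


reference_db = measured_db["G"]  # arbitrary choice of reference loudspeaker
trims = gain_trims(measured_db, reference_db)
adjusted = {name: measured_db[name] + trims[name] for name in measured_db}

print(trims)
print(is_matched(measured_db), is_matched(adjusted))
```

In practice the trims would be applied at the amplifier or playback chain and the levels re-measured, since the calculation alone does not confirm the acoustic result.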
Listeners completed two rounds, in each round giving ratings for four
different loudspeakers for each of the 4 different programs. Between rounds,
the speaker locations were changed. The order of rounds was randomized
among the ten different listening groups. Listeners remained in the same seat
locations throughout
all tests.
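The text does not say how the order of rounds was randomized. One possible scheme, sketched below under that caveat, shuffles the two loudspeaker arrangements independently for each of the ten listening groups. The labels for the two arrangements, the function name and the seed are all hypothetical.

```python
import random

# Labels for the two loudspeaker arrangements are hypothetical placeholders.
ROUNDS = ("location set 1", "location set 2")
N_GROUPS = 10  # ten listening groups, per the text


def assign_round_orders(n_groups, rounds, seed=None):
    """Return one randomly ordered tuple of rounds per listening group."""
    rng = random.Random(seed)  # a seed makes the schedule reproducible
    orders = []
    for _ in range(n_groups):
        order = list(rounds)
        rng.shuffle(order)
        orders.append(tuple(order))
    return orders


orders = assign_round_orders(N_GROUPS, ROUNDS, seed=1994)
for group, order in enumerate(orders, start=1):
    print(f"group {group}: {' then '.join(order)}")
```

A simple shuffle like this does not guarantee that each order is used equally often; a counterbalanced design would assign each order to exactly half of the groups.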
Listeners rated the loudspeakers using a 10-point Preference Scale,
where higher ratings indicate greater preferences. This is not the same as the
Fidelity Scale used in earlier tests by Toole [1]. The Preference Scale is
designed to accentuate perceived differences between the loudspeakers [7].
The duration of the entire test was about 20-30 minutes.
Excerpts from the 4 programs were digitally copied from CD onto a
hard disk, edited into 30-second repeating loops, and then transferred to R-DAT
for the test playbacks.
The programs used were:
ABBREVIATION   PROGRAM
TC             Tracy Chapman/Fast Car
LF             Little Feat/Let the Good Times Roll
PS             Paul Simon/Graceland
SS             Full Orchestra/Stars & Stripes
4 RESULTS
4.1 EXPERIMENT ONE
The ratings from both experiments were analyzed using a repeated-measures
analysis of variance (ANOVA). This is an analysis which evaluates
the contribution of the individual variables, and the interactions between
them, to the variations in listeners' numerical judgments.
APPENDIX 1 shows the ANOVA table for each source of variance and
their interactions. If the H-F value is <= 0.05, the source is taken to have had
a statistically significant effect on the loudspeaker ratings: if the source had
no real effect, differences this large in the listener responses would be
expected to occur by chance no more than 5% of the time. "Method" is the
variable "blind" vs. "sighted".
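The 0.05 decision rule can be sketched as a simple filter over a table of probabilities. The values below are the P-Value entries for the between-listener sources in the Appendix 1 excerpt. Applying the same threshold to them that the text applies to H-F values is an assumption made for illustration, and the function name is ours.

```python
ALPHA = 0.05  # significance threshold stated in the text

# P-Value entries from the Appendix 1 excerpt (between-listener sources).
p_values = {
    "Experience": 0.0662,
    "Method": 0.0785,
    "Seat": 0.9744,
    "Experience * Method": 0.0150,
    "Experience * Seat": 0.3227,
    "Method * Seat": 0.1531,
}


def significant_sources(probabilities, alpha=ALPHA):
    """Return the sources whose probability is at or below the threshold."""
    return [source for source, p in probabilities.items() if p <= alpha]


print(significant_sources(p_values))  # → ['Experience * Method']
```

Among these entries only the Experience * Method interaction passes the threshold; the within-listener sources discussed in the list that follows are not part of this excerpt.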
It is reassuring that the influential factors and interactions are those
that one would logically think should modify listener opinions. It is important
to note that, in a test of sound quality, not all of the important variables were
related to sound. Visual cues had several statistically significant influences.
· Speaker (This is a very strong influence, as it should be.)
· Speaker * Method (Seeing the products had a strong influence on
ratings of the loudspeakers.)
· Speaker * Seat (Where the listener sat in the room had a strong effect
on the ratings. This is an acoustical effect related to the peculiarities of the
listening room, and the directional properties of the loudspeakers.)
· Speaker * Program (The choice of music affected the ratings. Listeners'
tastes in music occasionally are involved here but, much more important, are
the facts that (a) musical selections are not all equally good at revealing
problems in loudspeakers, and (b) that different recordings have different
spectral balances ("voices") which interact with the different "voicings" of the
loudspeakers.)
· Speaker * Program * Experience (The choice of music affected opinions
of experienced listeners differently than those of inexperienced listeners.
Some listeners, mainly inexperienced ones, tend to stay with a first
impression, and not change it through quite wide variations in program
material.)
· Speaker * Program * Experience * Method (As above, but seeing the
loudspeakers made a difference.)
· Speaker * Round (The locations of the loudspeakers in the room made a
difference to the sound. This is a well known phenomenon. It has been
scientifically demonstrated that, in tests involving good and closely rated
loudspeakers, the locations of the loudspeakers in the room can be the
dominant factor in determining the ratings [8,9].)
· Speaker * Round * Method (As above, but seeing the loudspeakers made
the differences matter less.)
The specifics of these factors are discussed
in the following sections.
4.1.1 Loudspeaker Preferences
Combining the results of all listeners in all rounds indicates that models
"D" and "G" were slightly to moderately preferred over "S" and "T", Figure 1.
Be careful not to be misled by the scaling of this graph. The total vertical
height is less than a third of the 10-point preference scale. In the normal
context of listening tests, this is a small range of ratings, indicating a fairly
close contest. There were no truly bad loudspeakers here.
[Interaction bar chart. Effect: Speaker. Dependent: Preference Ratings. With 95% confidence error bars. Horizontal axis: LOUDSPEAKER (G, D, S, T).]
Figure 1. Combined results of all tests, blind and sighted, showing the cell
means of preference ratings for all listeners. Note that the vertical scale is
only a portion of the 0 - 10 preference scale.
The results show that the loudspeakers fell into two closely rated pairs,
"G" and "D", and "S" and "T". The listeners clearly preferred the first pair to
the second pair but, within the pairs, the error bars indicate that they had no
preference that was statistically important. Remember that these results
contain a mixture of all tests, sighted and blind. There is more to this story.
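To make the role of the error bars concrete, the sketch below computes a mean and an approximate 95% confidence interval for two sets of ratings and checks whether the intervals overlap. The ratings are invented for illustration, and the normal approximation is an assumption; the paper does not state how its error bars were computed. Overlapping intervals are only a rough visual guide, not a substitute for the ANOVA.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_ci(ratings, confidence=0.95):
    """Return (mean, lower, upper) using a normal approximation (an assumption)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    m = mean(ratings)
    half_width = z * stdev(ratings) / sqrt(len(ratings))
    return m, m - half_width, m + half_width


def intervals_overlap(a, b):
    """True when two (mean, lower, upper) intervals share any values."""
    return a[1] <= b[2] and b[1] <= a[2]


# Invented ratings for two hypothetical products; not data from the paper.
product_a = [7.0, 7.5, 8.0, 7.5]
product_b = [5.0, 5.5, 6.0, 5.5]

ci_a = mean_ci(product_a)
ci_b = mean_ci(product_b)
print(ci_a, ci_b, intervals_overlap(ci_a, ci_b))
```

With small listener panels a t-distribution multiplier would give wider intervals than the normal approximation used here.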
4.1.2 Effect of Loudspeaker Position and Test Method (Blind Versus Sighted)
There is now abundant evidence that the listening room and the positions
of loudspeakers within the room are significant influences on listener
opinions of loudspeakers. In the blind tests, in which listeners had only the
sound to rely on, preference ratings were strongly dependent on the locations
of the loudspeakers, Figure 2.
[Interaction bar chart. Effect: Speaker * Round * Method. Dependent: Preference Ratings. With 95% confidence error bars. Groups: LOCATION 1 BLIND, LOCATION 2 BLIND, LOCATION 1 SIGHTED, LOCATION 2 SIGHTED.]
Figure 2 A comparison of results for the blind and sighted tests, for each of
the loudspeaker locations.
In fact, in the blind tests, where opinions were based solely on sound,
the location of the loudspeakers in the room had more of an effect on the
preference ratings than the loudspeakers themselves. In location 1, the
listeners exhibited no clear preference for any of the four loudspeakers. In
location 2, however, there were strong preferences. Obviously, important
differences in sound quality were introduced by the position factor.
However, in the sighted tests, the ratings were very strongly
differentiated, and they did not change appreciably between the two sets of
room locations. In other words, in this test, when listeners knew what they
were listening to, the opinions were dictated more by the product identity than
by the sound.
If we isolate the visual and political factors we have the following
possible scenario. It is easy to believe that loudspeakers "G" and "D" would be
viewed favorably because they were the most expensive, the largest, quite
attractive, and they were products of the company that employed the listeners.
Loudspeaker "T" was slightly smaller, slightly less expensive, a prestige
product, but made by a competitor. Loudspeaker "S" was absolutely tiny,
relatively inexpensive, and plastic. It was a product of the host company, but
could anything that small and cheap be any good? Many listeners in the
sighted tests admitted afterwards that before the music even started they
believed that loudspeaker "S" would sound inferior, although they admitted its
strong performance surprised them.
Combining the data from tests in both of the loudspeaker locations,
Figure 3 isolates the effect of seeing the products that are being evaluated.
The separation of the products into two groups is clear in both results, but the
preference of loudspeaker "T" over loudspeaker "S" is evident only in the
"sighted" results.
[Interaction bar chart. Effect: Speaker * Method. Dependent: Preference Ratings. With 95% confidence error bars. Groups: BLIND, SIGHTED.]
Figure 3 A comparison of listener preferences in blind and sighted tests,
combining the results of tests performed in two loudspeaker locations.
Some of us would like to think that we can ignore visual factors when
arriving at opinions of sound quality... but can we?
An overall effect that is interesting, but not of any real consequence
here, is that listeners in the "sighted" tests (both experienced and
inexperienced) used higher ratings compared to listeners in the blind tests,
Figure 4. We can speculate that listeners may use higher ratings when they
can see the products because they are more confident in their opinions, or
they are less concerned about revealing inconsistencies in their judgments, or
both.
[Bar chart of mean preference ratings. Groups: BLIND, SIGHTED.]
Figure 4 Averaged across all experiments, a comparison of the ratings used
by listeners in blind and sighted tests.
[Bar chart of mean preference ratings. Groups: BLIND, SIGHTED. Legend: EXPERIENCED LISTENERS, INEXPERIENCED LISTENERS.]
Figure 5 A comparison of the ratings used by experienced and inexperienced
listeners in blind and sighted tests.
4.1.3 The Effect of Listener Experience
Separating the listeners by experience, Figure 5, it becomes clear that it
is the experienced listeners in the blind tests that caused the strongest
differentiation. Experienced listeners used lower ratings than inexperienced
listeners in the blind tests but in the sighted tests the difference
disappeared. While it is interesting to speculate about why this occurs,
the absolute ratings used by listeners are of no consequence to the important
result, which is the relative ratings of the products under evaluation. In this
it can be clearly stated that the inexperienced male listeners had the same
loudspeaker preferences as the experienced male listeners, Figure 6.
[Interaction bar chart. Effect: Speaker * Experience * Sex. Dependent: Preference Ratings. With 95% confidence error bars. Groups: EXPERIENCED MALES, INEXPERIENCED MALES, EXPERIENCED FEMALES, INEXPERIENCED FEMALES.]
Figure 6 Loudspeaker preferences classified by sex and listening
experience.
4.1.4 The Effect of Sex
There is a popular belief that females have different preferences in
loudspeakers than males. The folklore is rich with tales of irreconcilable
differences between the sexes. Some evidence suggests that other factors may
have been involved, such as price, size, style, loudness, purchasing
priorities, etc. Here, though, we ignore everything but the sound, and ask the
question. Figure 6 shows that the opinions of inexperienced male and
inexperienced female listeners (see the two right-hand histograms) are
remarkably similar. Viva la similarité!
4.1.5 The Effect of Listener Location
The interactions among loudspeaker, loudspeaker position and listener
position are strong and complex. When comparing loudspeakers that are
comparably good in terms of timbral accuracy, as these loudspeakers are, one
must be aware of these effects if the results are to be trusted. In these
experiments the physical differences in loudspeaker locations between rounds
were much smaller than differences between listener locations. Listener
location is therefore the stronger variable.
Figure 7 shows that front row listeners in seats 1 and 2 had similar
loudspeaker preferences, although seat 2 listeners slightly preferred model
"D". Back row listeners (seats 3, 4 and 5) showed differing preferences.
[Bar chart of preference ratings by seat. Groups: FRONT ROW, BACK ROW.]
Figure 8 Loudspeaker preference ratings as a function of listener
location.
It is interesting to look for trends in these data. Loudspeaker "G" stays
within about 0.3 of a preference rating, except for seat 5. Loudspeaker "D"
stays within about 0.5 of a preference rating, except for seat 4. Loudspeaker
"S" stays within 0.8 of a preference rating for all of the seats, or 0.5 with the
exception of seat 4. Loudspeaker "T", in contrast, spans 2.3 points on the scale.
To put this in perspective, listeners are instructed to separate ratings by
at least 0.5 if they have a "slight preference", by about 1.5 if they have a
"moderate preference" and by more than 2.0 if they have a "strong
preference". The wide variations in the rating of loudspeaker "T" as a
function of listener position are an indicator of at least two important things:
(1) listener location is not to be ignored and, (2) loudspeaker "T" does not have
a reliable relationship with the listening situation. Two possibilities come to
mind: (1) there are large variations in low-frequency coupling as a function
of listener location, (2) loudspeaker "T" has inconsistencies in directivity that
are revealed differently in different loudspeaker/listener orientations. The
low-frequency variations would apply to all of the loudspeakers, suggesting
that the second possibility may have some validity. This will be illustrated in
section 6, when measurements are discussed.
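The rating instructions can be turned into a small lookup that labels a rating gap. The sketch below applies it to the seat-to-seat spans quoted above. The exact cut-off points between categories are an assumption, since the instructions say only "at least 0.5", "about 1.5" and "more than 2.0", and the function name is ours.

```python
def preference_strength(rating_gap):
    """Label a gap between two ratings using the listeners' instructions.

    The boundaries between categories are assumed: the text gives only
    "at least 0.5", "about 1.5" and "more than 2.0".
    """
    gap = abs(rating_gap)
    if gap > 2.0:
        return "strong"
    if gap >= 1.5:
        return "moderate"
    if gap >= 0.5:
        return "slight"
    return "none stated"


# Spans of ratings across seats, as quoted in the text. The spans for "G" and
# "D" exclude one seat each (seat 5 and seat 4 respectively).
seat_spans = {"G": 0.3, "D": 0.5, "S": 0.8, "T": 2.3}

for speaker, span in seat_spans.items():
    print(speaker, span, preference_strength(span))
```

Read this way, the change in the rating of loudspeaker "T" from seat to seat is as large as the gap a listener would use to express a strong preference between two different products.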
These data underline the very great importance of having a good
listening room and loudspeaker/listener arrangement, and knowing the
biases that can be introduced by loudspeaker or listener position within the
room. Thorough randomizing of these factors can help, but it prolongs the test
enormously. It is better to avoid strong positional biases by working in an
acoustical environment that is a known factor, something that is rarely
possible, as we all know. It is also essential to track listener responses as a
function of seat, since something of importance may be revealed.
5 RESULTS - EXPERIMENT TWO
In this experiment the same four experienced listeners, seated in the
same seats, did the experiment in blind and sighted methods, in that order.
The ANOVA table for this experiment (see Appendix 2) shows significant
interactions between Method * Speaker and Speaker * Round, both with H-F
values near 0.03.
[Interaction bar chart. Effect: Method * Speaker. Dependent: Preference Rating. With 95% confidence error bars. Horizontal axis: loudspeakers G, D, S, T. Legend: BLIND, SIGHTED.]
Figure 9 Blind vs. sighted ratings for the four loudspeakers, using the same
four experienced listeners in both tests.
Figure 9 shows that, in the blind tests, it was a very close contest, with
no strong preferences being evident. The group means suggest a slight
preference for "D" and "S" over "T" and "G".
When the screen was removed and the test repeated, the results were
very different. Even with the same experienced listeners, in the same seats,
performing both tests, seeing the loudspeakers added the same sequence of
strong biases that was seen in the results of Experiment One, which used
different listeners in blind and sighted tests (see Figure 1). The biases: the
ratings of loudspeakers "G" and "D" are increased by amounts suggestive of
moderate preference, loudspeaker "S" drops by an amount suggesting slight
(decline in) preference, and "T" increases by an amount suggesting a slight
preference.
5.1 The Effect of Program
Figure 10 shows that, in the blind tests, the ratings varied with
program, something that is to be expected, and which is commonly seen. In
the sighted tests, this effect almost completely disappeared. Obviously,
listeners' opinions were more attached to the products that they could see,
than they were to the differences in sound associated with program.
[Interaction bar chart. Effect: Method * Program. Dependent: Preference Rating. With 95% confidence error bars. Horizontal axis: PROGRAM (TC, LF, PS, SS). Legend: BLIND, SIGHTED.]
Figure 10 The effect of program on preference ratings for both blind and
sighted tests.
5.2 The Effect of Loudspeaker Position
Figure 11 shows that the locations of the loudspeakers had strong effects
on the ratings in the blind tests (open bars), while in the sighted tests (dark
bars), the speaker placements had little effect on the ratings. Just as in
Experiment 1, the fact that the listeners knew what was being listened to
caused them to be much less responsive to real differences in sound quality.
[Interaction bar chart. Effect: Method * Speaker * Round. Dependent: Preference Rating. With 95% confidence error bars. Horizontal axis: G POS.1, G POS.2, D POS.1, D POS.2, S POS.1, S POS.2, T POS.1, T POS.2. Legend: BLIND, SIGHTED.]
Figure 11. Preference ratings as a function of loudspeaker position, for both
blind and sighted tests.
6 MEASUREMENTS
This project did not set out to be a test of opinion vs. measurements, but
the temptation is strong to have a look at some limited objective data.
Anechoic measurements at 2 meters were performed on the four
loudspeakers on axis, and 30 and 60 degrees horizontally off axis. These are
shown in Appendices 3(a - d). Note that the measurements below 100 Hz are not
accurate and should be ignored. Measurements of this kind constitute the
absolute minimum useful data for loudspeaker assessment. Nevertheless, they
can provide important insights into why the products might have performed
the way they did.
Loudspeakers "G" and "D" (Appendices 3(a) and (b) respectively) reveal
their common origin, but there are significant differences. Overall, they are
well behaved, exhibiting relatively smooth, relatively flat axial curves, wide
dispersion, and good directional uniformity. Viewed overall, these are
creditable performances. Both systems are relatively free from resonant
colorations, but loudspeaker "D" is the less refined of the pair. There are also
distinctive spectral balances, with loudspeaker "G" being the brighter, more
treble-biased, and loudspeaker "D" exhibiting a more temperate top end. Still,
these are not exaggerated cases, and this is supported by the high ratings that
both received in the listening tests. The lack of a clear preference for either
of these loudspeakers is evidence of listeners being divided in their
acceptance of their different attributes.
Loudspeaker "S" (Appendix 3(c)) is also well behaved on and off axis.
Directivity is relatively constant, failing noticeably only when the tweeter
turns on just above 4 kHz. Overall, the level above 400 Hz is slightly elevated,
which might make instruments whose fundamentals fall below this frequency
sound thin or forward. Of course, this will be dependent on how the separate
subwoofer sums with the satellite in the room. This is a more than respectable
performance, especially in this class of product. In the end, though, it is a
small loudspeaker, with inexpensive components. As a result, low bass output is
limited, and at very high sound levels there are limitations. Its creditable
performance in the blind listening tests is evidence of a design that does many
things well, most of the time.
Loudspeaker "T" (Appendix 3(d)) presents a more complex situation. On
axis, it has a slightly bright balance and a small interference dip at 1 kHz. The
dip is likely to be sensitive to vertical angle, but the measurement was made on
the intended listening axis. By itself, it is not a large problem, but coupled
with the obvious directional inconsistencies revealed in the off-axis curves, it
becomes an issue. The inconsistent directivity is right in the middle of the
very important voice-frequency range, and causes audible coloration in this
range. The variable directivity also means that the sound quality will change
with both loudspeaker and listener position. Still, it has other virtues, such as
extended low bass performance, and the ability to play moderately loud without
distress.
In Section 4.1.5 it was speculated that loudspeaker "T" might have
inconsistencies in directivity that could account for the strong
position-dependent variations in preference rating. That speculation appears
to have been correct.
To sum up, even these simple measurements
are sufficient to reassure us
that the results of these subjective tests have a basis in physical reality. A
more exhaustive inquiry would be interesting.
7 CONCLUSIONS
"It is important to note that, in a test of sound quality, not all of the important
variables were related to sound. Visual cues had several statistically
significant influences." (Section 4.1)
"In the normal context of listening tests, this is a small range of ratings,
indicating a fairly close contest. There were no truly bad loudspeakers here."
(Section 4.1.1) Some of the following conclusions may have been different if
differences between the products had been greater.
"... when listeners knew what they were listening to, the opinions were
dictated more by the product identity than by the sound." (Section 4.1.1) The
strength of the biases would be different in a test with products having
greater performance differences. Nevertheless, the visual biases would still
be present as unwanted influences.
"It can be clearly stated that the inexperienced male listeners had the same
loudspeaker preferences as the experienced male listeners." (Section 4.1.3) In
a race this close, it is clear that we had some very canny inexperienced
listeners.
"the opinions of inexperienced male and inexperienced female listeners are
remarkably similar. Viva la similarité!" (Section 4.1.3)
"These data underline the very great importance of having a good
listening room and loudspeaker/listener arrangement, and knowing the
biases that can be introduced by loudspeaker or listener position within the
room. Thorough randomizing of these factors can help, but it prolongs the test
enormously. It is better to avoid strong positional biases by working in an
acoustical environment that is a known factor, something that is rarely
possible, as we all know. It is also essential to track listener responses as a
function of seat, since something of importance may be revealed." (Section
4.1.5)
"Even with the same experienced listeners performing both tests, seeing the
loudspeakers added the same sequence of strong biases that was seen in the
results of Experiment One, with different listeners in the blind and sighted
tests." (Section 5) No one, it seems, is totally immune to the effect of visual
biases.
"Obviously, listeners' opinions were more attached to the products that they
could see, than they were to the differences in sound associated with
program." (Section 5.1)
"... the fact that the listeners knew what was being listened to caused them to
be much less responsive to real differences in sound quality [caused by
changes in loudspeaker position in the room]." (page 11) If your opinion of
Brand X were already on record, would you change it if you thought the same
loudspeaker sounded different in another test? It could also be a special case
of selective perception.
In summary, in listening tests where the audible differences between
products were not difficult to hear, knowledge of product identity while
listening had profound effects on listener opinions. In some instances, altered
listener preferences resulted from listeners being less responsive to audible
differences in the sighted tests than they were in the blind tests. For example:
(a) they were less responsive to differences caused by loudspeaker location in
the room, and (b) they were less responsive to differences associated with
program material.
Overall, though, it was clear that the psychological
factor of simply
revealing the identities of the products altered the preference ratings by
amounts that were comparable with any physical factor examined in these
tests, including the differences between the products themselves.
That an
effect of this kind should be observed is not remarkable, nor is it unexpected.
15
What is surprising is that the effect is so strong, and that it applies about
equally to experienced and inexperienced listeners. Since all of this is
independent of the sounds arriving at the listeners' ears, we are led to
conclude that, under some circumstances, believing is hearing!
The bottom line: if you want to know how a loudspeaker truly sounds,
you would be well advised to do the listening tests "blind".
8 REFERENCES
1. Toole, F.E., "Loudspeaker Measurements and Their Relationship to Listener
Preferences", J. Audio Eng. Soc., vol. 34, pt. 1, pp. 227-235 (1986 April), pt. 2,
pp. 323-348 (1986 May).
2. Toole, F.E. and S.E. Olive, "The Modification of Timbre by Resonances:
Perception and Measurement", J. Audio Eng. Soc., vol. 36, pp. 122-142 (1988
March).
3. Toole, F.E., "Loudspeakers and Rooms for Stereophonic Sound
Reproduction", Proceedings of the 8th International Conference, Audio Eng.
Soc. (1990 May).
4. Gabrielsson, A. and Sjogren, H., "Perceived Sound Quality of Sound
Reproducing Systems", J. Acoust. Soc. Am., vol. 65, 1019 (1979).
5. Gould, Glenn, "An Experiment in Listening - Who are the Most Perceptive
Listeners", High Fidelity Magazine, vol. 25, pp. 54-59 (August 1975).
6. Toole, F.E., "Subjective Measurements of Loudspeaker Sound Quality and
Listener Performance", J. Audio Eng. Soc., vol. 33, 2 (1985).
7. Toole, F.E., "Subjective Evaluation", in "Loudspeaker and Headphone
Handbook", Second Edition, edited by John Borwick, Butterworths, London (in
press).
8. P.L. Schuck, S. Olive, J. Ryan, F.E. Toole, S. Sally, M. Bonneville, E.
Verreault, K. Momtahan, "Perception of Reproduced Sound in Rooms: Some
Results from the Athena Project", pp. 49-73, Proceedings of the 12th
International Conference, Audio Eng. Soc. (1993 June).
9. S.E. Olive, P. Schuck, S. Sally, M. Bonneville, "The Effects of Loudspeaker
Placement on Listeners' Preference Ratings", 93rd Convention, Audio Eng.
Soc., preprint no. 3350 (1992 Oct.).
APPENDIX 1

Type III Sums of Squares

Source                                        df  Sum of Squares  Mean Square  F-Value  P-Value     G-G     H-F
Experience                                     1          81.988       81.988    3.777    .0662
Method                                         1          74.625       74.625    3.438    .0785
Seat                                           4          10.277        2.569     .118    .9744
Experience * Method                            1         153.770      153.770    7.084    .0150
Experience * Seat                              4         108.401       27.100    1.248    .3227
Method * Seat                                  4         163.355       40.839    1.881    .1531
Experience * Method * Seat                     4          92.852       23.213    1.069    .3976
Subject(Group)                                20         434.130       21.707
Speaker                                        3         206.662       68.887    9.346    .0001   .0001   .0001
Speaker * Experience                           3          25.876        8.625    1.170    .3287   .3284   .3287
Speaker * Method                               3          87.940       29.313    3.977    .0119   .0124   .0119
Speaker * Seat                                12         166.372       13.864    1.881    .0552   .0566   .0552
Speaker * Experience * Method                  3          39.672       13.224    1.794    .1580   .1591   .1580
Speaker * Experience * Seat                   12          91.860        7.655    1.039    .4270   .4269   .4270
Speaker * Method * Seat                       12          70.744        5.895     .800    .6490   .6471   .6490
Speaker * Experience * Method * Seat          12          99.301        8.275    1.123    .3596   .3602   .3596
Speaker * Subject(Group)                      60         442.267        7.371
Program                                        3           3.365        1.122    2.062    .1148   .1313   .1148
Program * Experience                           3           1.200         .400     .736    .5349   .5047   .5349
Program * Method                               3            .989         .330     .606    .6137   .5748   .6137
Program * Seat                                12           7.704         .642    1.180    .3178   .3290   .3178
Program * Experience * Method                  3            .436         .145     .267    .8488   .8000   .8488
Program * Experience * Seat                   12           4.030         .336     .617    .8191   .7818   .8191
Program * Method * Seat                       12           5.743         .479     .880    .5712   .5529   .5712
Program * Experience * Method * Seat          12           3.708         .309     .568    .8589   .8218   .8589
Program * Subject(Group)                      60          32.637         .544
Round                                          1            .721         .721     .684    .4180   .4180   .4180
Round * Experience                             1           2.618        2.618    2.482    .1308   .1308   .1308
Round * Method                                 1            .295         .295     .280    .6026   .6026   .6026
Round * Seat                                   4            .411         .103     .097    .9821   .9821   .9821
Round * Experience * Method                    1            .087         .087     .083    .7764   .7764   .7764
Round * Experience * Seat                      4           3.581         .895     .849    .5109   .5109   .5109
Round * Method * Seat                          4           1.011         .253     .240    .9126   .9126   .9126
Round * Experience * Method * Seat             4           3.837         .959     .910    .4774   .4774   .4774
Round * Subject(Group)                        20          21.094        1.055
Speaker * Program                              9          23.334        2.593    3.173    .0014   .0136   .0014
Speaker * Program * Experience                 9          15.741        1.749    2.140    .0283   .0742   .0283
Speaker * Program * Method                     9           6.245         .694     .849    .5719   .5092   .5719
Speaker * Program * Seat                      36          19.478         .541     .662    .9281   .8395   .9281
Speaker * Program * Experience * Method        9          16.100        1.789    2.189    .0247   .0686   .0247
Speaker * Program * Experience * Seat         36          24.804         .689     .843    .7219   .6463   .7219
Speaker * Program * Method * Seat             36          26.950         .749     .916    .6092   .5617   .6092
Speaker * Program * Experience * Method ...   36          26.963         .749     .917    .6086   .5612   .6086
Speaker * Program * Subject(Group)           180         147.085         .817
Speaker * Round                                3          34.095       11.365    3.451    .0220   .0370   .0220
Speaker * Round * Experience                   3          17.365        5.788    1.758    .1649   .1819   .1649
Speaker * Round * Method                       3          30.266       10.089    3.063    .0348   .0528   .0348
Speaker * Round * Seat                        12          40.982        3.415    1.037    .4283   .4262   .4283
Speaker * Round * Experience * Method          3           5.932        1.977     .600    .6173   .5667   .6173
Speaker * Round * Experience * Seat           12          27.389        2.282     .693    .7516   .7068   .7516
Speaker * Round * Method * Seat               12          15.472        1.289     .392    .9615   .9291   .9615
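The F-values in Appendix 1 can be cross-checked from the sums of squares and degrees of freedom alone. The sketch below, in Python, assumes the usual mixed-design convention that each between-subject effect is tested against Subject(Group) and each Speaker effect against Speaker * Subject(Group). The function and variable names are illustrative and are not part of the original analysis.

```python
# Cross-check of selected Appendix 1 rows: MS = SS / df, F = MS_effect / MS_error.
# Assumption: between-subject effects use Subject(Group) as the error term and
# Speaker effects use Speaker * Subject(Group). All names here are illustrative.

SUBJECT_GROUP = (20, 434.130)             # (df, sum of squares)
SPEAKER_BY_SUBJECT_GROUP = (60, 442.267)  # (df, sum of squares)

# effect name -> (df, sum of squares, error term, F-value printed in the table)
ROWS = {
    "Experience": (1, 81.988, SUBJECT_GROUP, 3.777),
    "Method": (1, 74.625, SUBJECT_GROUP, 3.438),
    "Experience * Method": (1, 153.770, SUBJECT_GROUP, 7.084),
    "Speaker": (3, 206.662, SPEAKER_BY_SUBJECT_GROUP, 9.346),
    "Speaker * Method": (3, 87.940, SPEAKER_BY_SUBJECT_GROUP, 3.977),
}


def mean_square(df, sum_sq):
    """Mean square is the sum of squares divided by its degrees of freedom."""
    return sum_sq / df


def f_ratio(df, sum_sq, error):
    """F is the effect mean square divided by the error mean square."""
    return mean_square(df, sum_sq) / mean_square(*error)


def recompute(rows):
    """Return {effect: (recomputed F rounded to 3 places, tabulated F)}."""
    return {
        name: (round(f_ratio(df, ss, err), 3), printed)
        for name, (df, ss, err, printed) in rows.items()
    }


if __name__ == "__main__":
    for name, (computed, printed) in recompute(ROWS).items():
        print(f"{name:<22} computed F = {computed:.3f}   tabulated F = {printed:.3f}")
```

Agreement to within rounding for these rows supports the row and column assignments in the table; any other row can be checked by adding it to `ROWS` with the matching error term.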
APPENDIX 2

Type III Sums of Squares

Source                        df  Sum of Squares  Mean Square  F-Value  P-Value     G-G     H-F
Subject                        3          40.199       13.400
Method                         1          41.120       41.120    5.090    .1093   .1093   .1093
Method * Subject               3          24.235        8.078
Speaker                        3          31.226       10.409    1.361    .3156   .3289   .3258
Speaker * Subject              9          68.831        7.648
Program                        3           2.264         .755     .885    .4850   .4364   .4636
Program * Subject              9           7.678         .853
Round                          1            .056         .056     .099    .7739   .7739   .7739
Round * Subject                3           1.713         .571
Method * Speaker               3          41.131       13.710    4.624    .0320   .0751   .0320
Method * Speaker * Sub...      9          26.685        2.965
Method * Program               3           3.016        1.005    3.130    .0802   .1191   .0802
Method * Program * Su...       9           2.890         .321
Speaker * Program              9           5.203         .578    1.566    .1758   .2778   .1758
Speaker * Program * S...      27           9.964         .369
Method * Round                 1            .063         .063     .045    .8461   .8461   .8461
Method * Round * Subject       3           4.193        1.398
Speaker * Round                3          23.801        7.934    4.482    .0347   .0616   .0347
Speaker * Round * Subj...      9          15.931        1.770
Program * Round                3           2.573         .858    4.531    .0337   .1050   .0789
Program * Round * Subj...      9           1.704         .189
Method * Speaker * Pro...      9            .630         .070     .202    .9918   .8493   .9918
Method * Speaker * Pro...     27           9.348         .346
Method * Speaker * Rou...      3          12.163        4.054    3.970    .0468   .1115   .0698
Method * Speaker * Rou...      9           9.193        1.021
Method * Program * Ro...       3            .547         .182    1.452    .2916   .2998   .2916
Method * Program * Ro...       9           1.129         .125
Speaker * Program * R...       9           2.106         .234     .779    .6369   .5188   .6369
Speaker * Program * R...      27           8.110         .300
Method * Speaker * Pro...      9           2.905         .323    1.685    .1417   .2573   .1439
Method * Speaker * Pro...     27           5.173         .192

Dependent: PreferenceRating
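The P-Value column of Appendix 2 can be reproduced, to table precision, from each F-value and its two degrees of freedom. The following Python sketch evaluates the upper tail of the F distribution by integrating the regularized incomplete beta function with Simpson's rule. It is not the procedure used by the original statistics package; the function name `f_upper_tail` is hypothetical, and the code assumes f > 0 and more than two error degrees of freedom.

```python
import math


def f_upper_tail(f, df_effect, df_error, steps=2000):
    """Approximate P(F > f) for an F distribution (assumes f > 0, df_error > 2).

    Uses P(F > f) = I_x(df_error / 2, df_effect / 2) with
    x = df_error / (df_error + df_effect * f), where I_x is the regularized
    incomplete beta function, integrated here with composite Simpson's rule.
    """
    a = df_error / 2.0
    b = df_effect / 2.0
    x = df_error / (df_error + df_effect * f)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def density(t):
        if t <= 0.0:
            return 0.0  # valid because a > 1 when df_error > 2
        return math.exp(
            (a - 1.0) * math.log(t) + (b - 1.0) * math.log(1.0 - t) - log_beta
        )

    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = x / steps
    total = density(0.0) + density(x)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * density(i * h)
    return total * h / 3.0


if __name__ == "__main__":
    # (F-value, effect df, error df, tabulated P-Value) taken from Appendix 2
    for f, d1, d2, printed in [(5.090, 1, 3, 0.1093), (4.624, 3, 9, 0.0320)]:
        p = f_upper_tail(f, d1, d2)
        print(f"F({d1},{d2}) = {f:.3f}: computed p = {p:.4f}, tabulated p = {printed:.4f}")
```

The two rows used here are Method, tested against Method * Subject, and Method * Speaker, tested against its interaction with Subject. The G-G and H-F columns apply epsilon corrections to the degrees of freedom and are not reproduced by this sketch.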
APPENDIX 3

[Figure: response curves plotted against log frequency (Hz).]
Appendix 3 (a) Free-field frequency response measurements of
loudspeaker "G" showing (top to bottom) 0° (on the listening axis), 30° off axis
horizontally and 60° off axis horizontally. The data were 0.17-octave smoothed.
Data below about 100 Hz are not accurate.

[Figure: response curves plotted against log frequency (Hz).]
Appendix 3 (b) Free-field frequency response measurements of
loudspeaker "D" showing (top to bottom) 0° (on the listening axis), 30° off axis
horizontally and 60° off axis horizontally. The data were 0.17-octave smoothed.
Data below about 100 Hz are not accurate.

[Figure: response curves plotted against log frequency (Hz).]
Appendix 3 (c) Free-field frequency response measurements of
loudspeaker "S" (satellite only, no subwoofer) showing (top to bottom) 0° (on
the listening axis), 30° off axis horizontally and 60° off axis horizontally. The
data were 0.17-octave smoothed. Data below about 100 Hz are not accurate.

[Figure: response curves plotted against log frequency (Hz).]
Appendix 3 (d) Free-field frequency response measurements of
loudspeaker "T" showing (top to bottom) 0° (on the listening axis), 30° off axis
horizontally and 60° off axis horizontally. The data were 0.17-octave smoothed.
Data below about 100 Hz are not accurate.