
ORIGINAL RESEARCH
The statistical distribution of the intensity of pixels within spots of DNA microarrays: what is the appropriate single-value representative?
Javier Nuñez-Garcia,1 Vassilios Mersinias,2 Kwang-Hyun Cho,3 Colin P Smith,2 Olaf Wolkenhauer4
1Veterinary Laboratories Agency, Addlestone, Surrey, UK; 2School of Biomedical and Life Sciences, University of Surrey,
Guildford, Surrey, UK; 3School of Electrical Engineering, University of Ulsan, Ulsan, Korea; 4Department of Computer
Science, University of Rostock, Rostock, Germany
Abstract: This paper opens a discussion about an important issue in the analysis of data from spotted DNA microarrays: how to summarise
into a single value the distribution for the intensity values of the pixels within a spot. Although the most popular statistic used is the
median, there is no clear study demonstrating why it is more appropriate than other measures of central tendency such as the mean or the
mode. Here, we argue that the median intensity is not the most appropriate measure for many common cases and discuss a frequently
encountered case of a ‘doughnut’-shaped spot for which the mode is closest to the ‘expected’ spot intensity. For an ‘ideal’ spot with a clear
boundary and uniformly hybridised, the intensity of its pixels should approximately be normally distributed. In practical situations, these
two requirements are often not met due to the physical properties of pins and the particularities of the printing and hybridisation processes.
As a consequence, the distribution of the intensity of the pixels is usually negatively skewed. This asymmetry results in a larger displacement
for the mean and median than for the mode from the ideal situation mentioned above.
Keywords: microarrays, spot finding, intensity distribution, mean, median, mode
Introduction
An ideal spot on a DNA microarray would have the same
amount of hybridised genetic material in each of the pixels
within a well defined spot boundary. Under these conditions
the intensity of any pixels inside the spot would be highly
similar. If a normally distributed measurement error is
assumed (eg by the scanner), the distribution of the intensities
of the pixels within a spot would approximately follow a
normal distribution. In this ideal case, the mean, median and
mode would have almost identical values and they would
provide a close approximation of the ‘real’ intensity of the
spot.
Due to technical impediments, the above conditions are
far from being a reality in many DNA microarrays. In
addition to the error introduced by the scanner, there are other
significant factors that lower the intensity of the pixels within
a spot from the ‘real’ or ‘expected’ spot intensity. Examples
are the ‘doughnut’ effect, detailed in Tran et al (2002), and
the non-uniform spreading of genetic material within a spot
during the printing process. These lead to different levels
of hybridisation and, thus, regions with different intensities
within a spot.
Applied Bioinformatics 2003:2(4) 229–239
© 2003 Open Mind Journals Limited. All rights reserved.
These factors result in a negatively skewed distribution
of the intensities of pixels within the spot. In Figure 1, the
histograms of the differences between the median (me), mean
(ma) and mode (mo) in both channels (signal, s; reference,
r) are shown (see Figures 1a, 1c and 1e and Figures 1b, 1d
and 1f, respectively) for a microarray example (see Appendix
2). Note that the mode is larger than the median and this is
larger than the mean for most of the spots. This shows the
negative asymmetry of the distributions of the intensities of
pixels within the spots. It is well known that for asymmetric
probability distributions there is not a clear single-value
candidate to summarise the central tendency in such a
distribution. The choice of such a measure depends on the
context of the problem. For example, in Hyndman (1995)
and Polonik (1995) the authors use nonsymmetric intervals
around the mode to summarise density functions. We found
that the asymmetry in the distribution of the intensities of
the pixels within a spot is significant enough to consider the
choice of this representative value as a crucial step in the
process to interpret array data.
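The ordering mean < median < mode described above can be checked numerically for a single spot. The following is a minimal sketch, not the procedure used in this paper: it uses Pearson's second skewness coefficient, 3 × (mean − median) / standard deviation, as a simple sign test for asymmetry. The function name `median_skewness` and the pixel values are hypothetical and chosen only for illustration.

```python
from statistics import mean, median, pstdev


def median_skewness(values):
    """Pearson's second skewness coefficient: 3 * (mean - median) / standard deviation."""
    spread = pstdev(values)
    if spread == 0:
        return 0.0  # all pixels identical: no asymmetry to report
    return 3 * (mean(values) - median(values)) / spread


# Hypothetical spot: most pixels near 20 000, a few dim pixels (hole or fuzzy edge).
skewed_spot = [20000, 20100, 19900, 20050, 19950, 12000, 9000]
# Hypothetical 'ideal' spot: symmetric scatter around 20 000.
symmetric_spot = [19800, 19900, 20000, 20100, 20200]

print(median_skewness(skewed_spot))     # negative: the mean lies below the median
print(median_skewness(symmetric_spot))  # zero: mean and median coincide
```

A negative coefficient only indicates that the mean lies below the median; it says nothing about where the mode lies, which requires a density estimate.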
Correspondence: Olaf Wolkenhauer, Department of Computer Science, University of Rostock, Albert Einstein Str 21, 18059 Rostock, Germany; tel +49 381 498 3335; fax +49 381 498 3336; email wolkenhauer@informatik.uni-rostock.de
Nuñez-Garcia et al
Figure 1 Distribution of the difference between the median (me), mean (ma) and mode (mo) in the signal (s) and reference (r) channels; histograms (a), (c) and (e) and histograms (b), (d) and (f), respectively.
Since the ‘real’ value of the intensity of a spot is unknown,
it is difficult to prove which approach is best. In principle,
all values in the scale of intensities are candidates to represent
the spot intensity.1 However, the set of adequate candidates
reduces to a few statistics. In this set are included the mean
value, the median and the mode, which are most commonly
provided by spot-finding software programs such as
ImaGene™ (BioDiscovery, http://www.biodiscovery.com/imagene.asp)
or GenePix® (Axon Instruments, http://www.axon.com/gn_GenePixSoftware.html).2 The mean
value is sensitive to outliers, especially when the sample size
is small and the sample values are high. Outliers are
produced, for example, in the spotting or scanning stages.
Within spots, pixel outliers frequently occur. For very bright
spots, outliers lower the mean value considerably. The
occurrence of outliers provides a good reason to discard the
use of the mean value. Although the median has become the
standard statistic to represent the intensity of a set of pixels
that form a spot, to the best of our knowledge there is no
detailed study showing that it is the most appropriate choice.
Statistical properties of spots in DNA microarrays
Figure 2 Original image of our example microarray (a) and the reconstructed images using the mean (b), the median (c) and the mode (d).
Some supervised steps in the process of scanning and
quantifying an image are based on ‘eye’ examination; for
example, adjusting the gain of the scanner, fitting a grid to
the spots or flagging anomalous spots. In what follows, we
show a simple example where the mode is favoured by ‘eye’
examination. In Figure 2, we show the original image of a
subgrid (Figure 2a) and the reconstructed images using the
mean (Figure 2b), the median (Figure 2c) and the mode
(Figure 2d). The image reconstructed from the mode is the
most similar to the original one. Note that the human eye
cannot distinguish small variations of intensities (ie about
±5000 in the scale of 0–65 535). This means that the
differences visible by eye in Figure 2 reflect substantial
differences in intensity between the statistics.
Since there does not exist any methodology to prove which
statistic is the most adequate as representative of the possible
values that a variable can take according to a probability
distribution, we considered this example a reasonable
motivation to investigate the distribution of the intensity
within spots. In the following section we investigate the
variability of the distribution of the intensities of the pixels
within a spot depending on different artifacts, such as the
doughnut effect or the choice of the boundaries of the spots.
This is followed by a discussion of whether the asymmetries
of the distributions of the intensities of the pixels within the
spots for both channels influence the corresponding gene
expression profile, expressed as the log2 of the ratio of both
channels’ intensities.
Throughout this paper, we use examples of spots from a
microarray that is described in Appendix 2.
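The text does not describe how the reconstructed images of Figure 2 were produced. One plausible reading, sketched below under that assumption, is that every pixel of a spot is replaced by the chosen single-value statistic of that spot. The function `reconstruct`, the label map (0 for background, a positive integer per spot) and the tiny example image are all hypothetical.

```python
from statistics import mean, median


def reconstruct(image, labels, statistic):
    """Replace the pixels of each labelled spot by one statistic of that spot.

    image and labels are equally sized lists of rows; label 0 marks background
    pixels, which are left unchanged.
    """
    groups = {}
    for pixel_row, label_row in zip(image, labels):
        for value, label in zip(pixel_row, label_row):
            if label != 0:
                groups.setdefault(label, []).append(value)
    summary = {label: statistic(values) for label, values in groups.items()}
    return [
        [summary[label] if label != 0 else value
         for value, label in zip(pixel_row, label_row)]
        for pixel_row, label_row in zip(image, labels)
    ]


# Hypothetical 2 x 3 image with one spot (label 1) containing a dim pixel.
image = [[900, 20000, 20100],
         [880, 12000, 19900]]
labels = [[0, 1, 1],
          [0, 1, 1]]

by_mean = reconstruct(image, labels, mean)
by_median = reconstruct(image, labels, median)
```

Any callable that maps a list of intensities to one number can be passed as `statistic`, so a density-based mode estimator could be plugged in the same way.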
Distribution of the intensities of pixels within a spot
As mentioned above, the ideal spot would have the same amount of hybridised genetic material for each of the pixels within a well defined spot boundary. This is difficult to achieve with the technology used in most laboratories. In Tran et al (2002), the authors detail this point and also the doughnut effect. They point out that the cause of this effect is the result of some type of crystallisation complex that prevents penetration of the labelled target into the centre of the spot, leading to an unhybridised area. The intensity of the pixels of an ideal spot would follow a normal distribution with a standard deviation depending on the accuracy of the measurement apparatus. In this case, the mean, median and mode will lead to the same value. However, this is only a theoretical situation. Depending on where the spot-finding software places the boundaries of the spot, we find different distributions for the intensity of its pixels and for the intensity of pixels in the background. Figure 3 and Figure 4 illustrate this point. The regions considered as a spot are the pixels inside the circle. In Figure 3, we show how ImaGene detected in two different runs with different positions of the initial grid (squares), two different boundaries (circles) for the same spot. In Table 1, the values for some statistics corresponding to both images are provided. In Figures 4a and 4b, the distribution of the intensities within the spot is almost normal.3 The less steep slope in the left tail of Figure 4b (thick curve) is due to the low-intensity pixels in the centre of the spot, caused by the doughnut effect. We can also see the background intensity distribution (Figure 4b, thin curve). In Figures 4c and 4d, some pixels have been added to the spot and the asymmetry becomes more obvious due to the lower intensity of the pixels included in the circle. Consequently, the distribution of the background is becoming ‘more’ normal. In Figures 4e and 4f, this is even more obvious; two local modes appear in the density function (the 2 highest peaks in the thick curve in Figure 4f). The mode on the right represents the lower intensity pixels due to the doughnut effect plus some pixels with the same intensity at the boundary of the spot. The mode on the left represents a part of the background pixels that are also included in the circle of the spot diameter. In Table 2, the values of the mean, median and mode for the different spot diameters are shown.

Figure 3 Two different discerned spots (circles) from two different initial positions of the grid (squares).

Table 1 Different statistics of the distribution of the spots in Figure 3
Mean      Median    Mode
10 746    10 816    12 557
11 109    11 197    12 557

Figure 4 Different distributions ((b), (d) and (f)) of a spot intensity (thick curve) and the background intensity (thin curve), depending on the diameter of the corresponding discerned spot (the circle in (a), (c) and (e)). In the histograms, the vertical lines correspond to the mean (dashed line), median (solid line) and mode (dotted line) for each curve.

Table 2 Parameters for the distributions of the spots in Figure 4
Spot diameter  Area of signal  Signal    Signal  Signal    Area of background  Background  Background  Background
(pixels)       (pixels)        mean      median  mode      (pixels)            mean        median      mode
10             81              18 716.3  19 969  20 970.5  360                 2246.07     844.5       891.694
12             113             16 764.4  17 269  20 047.4  328                 1311.69     757         882.046
14             149             13 846.3  14 686  19 953.8  292                 895.582     701         819.154

Figure 5 For a low-intensity spot, the different distributions ((b), (d) and (f)) of spot intensity (thick curve) and the background intensity (thin curve), depending on the diameter of the corresponding discerned spot (the circle in (a), (c) and (e)). In the histograms, the vertical lines correspond to the mean (dashed line), median (solid line) and mode (dotted line) for each curve.

Table 3 Parameters for the distributions of the spots in Figure 5
Spot diameter  Area of signal  Signal   Signal  Signal   Area of background  Background  Background  Background
(pixels)       (pixels)        mean     median  mode     (pixels)            mean        median      mode
10             81              4211.49  4221    4252.45  360                 1163.06     761         641.003
12             113             4024.66  4167    4245.58  328                 930.015     698         615.323
14             149             3544.72  3654    4181.92  292                 793.387     657.5       602.351

Table 4 Background subtraction for the parameters given in Table 2
Spot diameter  Signal mean –    Signal median –    Signal mode –
(pixels)       background mean  background median  background mode
10             16 470.3         19 124.5           20 078.8
12             15 452.7         16 512             19 165.4
14             12 950.7         13 985             19 134.6

Table 5 Background subtraction for the parameters given in Table 3
Spot diameter  Signal mean –    Signal median –    Signal mode –
(pixels)       background mean  background median  background mode
10             3048.44          3460               3611.44
12             3094.65          3469               3630.26
14             2751.33          2996.5             3579.57

Figure 6 Plot of the mode (mo) against median (me) intensities for all the spots; signal channel (a) and reference channel (b).
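The dependence of the three statistics on the chosen spot boundary can be illustrated with a small synthetic sketch. Everything in it is assumed for illustration and none of it is data from the example array: the doughnut-shaped image, its intensity levels (hole, ring, fuzzy edge, background), the deterministic pseudo-noise, and the helper names `kde_mode`, `synthetic_pixel` and `spot_statistics`. Only the three diameters (10, 12 and 14 pixels) and the use of a Gaussian kernel density estimate for the mode follow the text; the grid search for the maximum is a simplification.

```python
import math
from statistics import mean, median


def kde_mode(values, bandwidth, grid_points=512):
    """Grid intensity that maximises a Gaussian kernel density estimate."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return float(lo)
    step = (hi - lo) / (grid_points - 1)
    grid = [lo + k * step for k in range(grid_points)]

    def density(x):
        # Unnormalised sum of Gaussian kernels; the constant factor does not move the maximum.
        return sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    return max(grid, key=density)


def synthetic_pixel(x, y, centre=11.5):
    """Hypothetical doughnut spot: dim hole, bright ring, fuzzy edge, dark background."""
    d = math.hypot(x - centre, y - centre)
    noise = ((7 * x + 13 * y) % 11 - 5) * 60  # deterministic pseudo-noise in [-300, 300]
    if d < 2.0:
        level = 12000.0                                # unhybridised hole
    elif d <= 5.0:
        level = 20000.0                                # well hybridised ring
    elif d <= 6.5:
        level = 20000.0 - (d - 5.0) / 1.5 * 19100.0    # fuzzy boundary fading out
    else:
        level = 900.0                                  # background
    return max(0.0, level + noise)


def spot_statistics(diameter, size=24, centre=11.5, bandwidth=400.0):
    """Mean, median and KDE mode of the pixels inside a circle of the given diameter."""
    pixels = [synthetic_pixel(x, y, centre)
              for y in range(size) for x in range(size)
              if math.hypot(x - centre, y - centre) <= diameter / 2]
    return {"n": len(pixels), "mean": mean(pixels),
            "median": median(pixels), "mode": kde_mode(pixels, bandwidth)}


results = {d: spot_statistics(d) for d in (10, 12, 14)}
for d, stats in results.items():
    print(d, stats)
```

In this toy image the mean falls steadily as the circle grows, and the median drops once the dim pixels outnumber the ring pixels, while the estimated mode stays near the ring intensity. That is the same qualitative pattern as in Table 2, although the numbers themselves are artificial.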
This behaviour is enhanced when the intensities of the pixels of a spot are high, since the pixels placed on the hole (if it exists) and at the fuzzy boundary of the spot differ even further from the rest of the more intense pixels of the spot. Thus, the mean value, the median and the mode tend to separate from each other. When the intensities of a spot are low, the asymmetry tends to disappear due to the lower difference in intensity between the spot and the background. Consequently, the distance between the three statistics decreases. Figure 5 shows the equivalent to Figure 4 for a spot with low intensity. In Figure 6, the mode is plotted against the median intensities for both channels for the spots of the example microarray. Note that the higher the values of the mode and/or median are, the larger the deviation is from the line y=x. By looking at the corresponding Tables 2 and 3, it is apparent that the mode is more robust to variations in the diameter of the spot, even when the background is subtracted (see Tables 4 and 5).
Following this reasoning, suppose that we want to correct the two effects treated here (the doughnut effect and the fuzziness at the boundary of the spots). Kim et al (2001) and Brown et al (2001) suggest discarding an upper and lower percentage of pixels before the histogram is calculated; 15% on both sides and a 2σ length interval about the mean, respectively. For the case shown in this paper, discarding the same percentage at both sides of the sample of pixels does not correct the asymmetry in the histogram. However, the lower percentage threshold can be applied. The recalculated histogram will keep the same mode but the mean and median will increase, getting closer to the mode. With this solution, the mean and median become dependent on the value of the threshold. This is a drawback since there is not an infallible method to decide its value. What would be the intensity of the pixels that form the hole of the ‘doughnut’ if the hybridisation was perfectly performed? The intensity would ‘probably’ be around the mode, where there are the most repeated intensities, ie the most likely. From this point of view, the mode seems a good representative since it is the most independent of both effects.
Because of the small sample size of pixels within a spot in comparison with the range of possible intensities, it is necessary to estimate the probability density function of the distribution of the intensities before the mode can be calculated. This seems a disadvantage of the mode with respect to the mean and median, since the mode depends on the methodology used to estimate the density function or histogram. In this work, kernel density estimation with Gaussian functions was used. The only parameter to be adjusted, on which the mode depends, is the bandwidth of the Gaussians. We tried a range from 400 to 1500 for some random spots, and the maximum variability found for the mode was about 400 intensity units. This was verified for a sample of spots covering the whole range of intensities. This value is intensity-dependent. For spots with low intensity, the intensities of the pixels within the spots are closer to each other than for spots with higher intensity, where the intensities of the pixels are spread over a larger region. Thus, in the first case the bandwidth parameter is not as crucial as for the second case, in terms of the variability mentioned above. However, high-intensity signals can support a higher absolute error (not relative), since the final gene expression is usually provided in fold changes.
Effect of the asymmetry on gene expression
In this section we show that the distance between the mode and the median can result in a statistically significant difference when calculating the expression level of genes. The standard gene expression level assumed here is the $\log_2$ transformation of the ratio of both channels’ intensities. For a gene, $i \in \{1, \dots, n\}$, where n is the number of spots on the array, this is:
$$E_i = \log_2\left(\frac{s_i}{r_i}\right) \tag{1}$$
where $s_i$ and $r_i$ are the intensities of spot i for the signal and reference channel respectively. When the representative value of the intensity distribution of spot i is given by the median of the distribution, we write:
$$E_i^{me} = \log_2\left(\frac{s_i^{me}}{r_i^{me}}\right) \tag{2}$$
as the expression of gene i. When the mode is used we write:
$$E_i^{mo} = \log_2\left(\frac{s_i^{mo}}{r_i^{mo}}\right) \tag{3}$$
Two genes are equally expressed if the proportion of the signal measurement with respect to the reference value is the same for both genes. Thus, our goal is to investigate if the following difference is significant:
$$E_i^{mo} - E_i^{me} = \log_2\left(\frac{s_i^{mo}}{r_i^{mo}}\right) - \log_2\left(\frac{s_i^{me}}{r_i^{me}}\right) \tag{4}$$
The properties of the log transformation allow us to rewrite equation (4) as:
$$E_i^{mo} - E_i^{me} = \log_2\left(\frac{s_i^{mo}}{s_i^{me}}\right) - \log_2\left(\frac{r_i^{mo}}{r_i^{me}}\right) \tag{5}$$
For a distribution of the intensity of the pixels within spot i, the following element (where k = r, s) defines a measure of proximity between the mode and median:
$$S_i^k = \log_2\left(\frac{k_i^{mo}}{k_i^{me}}\right) \tag{6}$$
Note that by taking the ratio we obtain a value that not only depends on the Euclidean distance between the median and the mode but also on their value. For example, consider two distributions (i, j) with modes equal to 60 000 and 30 000, and medians equal to 40 000 and 20 000, respectively. Then, we have that $S_i$ is equal to $S_j$ even though the Euclidean distances between the modes and the medians, ie 60 000 – 40 000 and 30 000 – 20 000, are not equal.
The difference of gene expression levels given by the median and the mode is equal to the difference $S_i^s - S_i^r$ between both channels. In Figure 7, the distribution of this difference (see equation 4) is shown for the case study. The use of the mode or the median to calculate the gene expression level would not cause any significant difference if $E_i^{mo} - E_i^{me}$ is equal to zero for all i, or likewise if $S_i^s - S_i^r$ is equal to zero for all i. This only happens in the following two cases:
1. When $s_i^{mo} = s_i^{me}$ and $r_i^{mo} = r_i^{me}$, which implies that the distributions for both channels are symmetric and unimodal. This is the ideal case with no fuzzy boundaries or doughnut effects.
2. When $S_i^s = S_i^r$ but $s_i^{mo} \neq s_i^{me}$ and/or $r_i^{mo} \neq r_i^{me}$. This is the case when:
$$\frac{s_i^{mo}}{s_i^{me}} = \frac{r_i^{mo}}{r_i^{me}} \tag{7}$$
This only occurs if a real number t exists such that $s_i^{mo} = t \times r_i^{mo}$ and $s_i^{me} = t \times r_i^{me}$. This case, although feasible, is unlikely.
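The relations in equations (4) to (6) can be checked numerically with a minimal sketch. The function names `proximity` and `expression` are illustrative. The first example uses the modes and medians quoted in the text (60 000 and 40 000; 30 000 and 20 000). In the second example the signal values are the diameter-10 mode and median from Table 2, while the reference values are hypothetical.

```python
import math


def proximity(mode, median):
    """S_i^k = log2(mode / median) for one channel k, as in equation (6)."""
    return math.log2(mode / median)


def expression(signal, reference):
    """E_i = log2(signal / reference), as in equation (1)."""
    return math.log2(signal / reference)


# Worked example from the text: equal ratios, unequal Euclidean distances.
s_i = proximity(60000, 40000)
s_j = proximity(30000, 20000)
print(s_i, s_j)  # both equal log2(1.5)

# Signal mode and median from Table 2 (diameter 10); reference values are hypothetical.
s_mo, s_me = 20970.5, 19969.0
r_mo, r_me = 8000.0, 7900.0

lhs = expression(s_mo, r_mo) - expression(s_me, r_me)  # equation (4)
rhs = proximity(s_mo, s_me) - proximity(r_mo, r_me)    # S_i^s - S_i^r, equation (5)
print(lhs, rhs)  # identical up to floating-point rounding
```

The difference vanishes only when both channels have the same mode-to-median ratio, which is the condition of case 2.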
In Figure 6, the mode is plotted against the median for the
signal and reference channels for the example array. We
observe that most of the spots have an asymmetry in the left
tail of the distribution with respect to the mode (the mode is
usually larger than the median and mean). This is due to the
factors mentioned in the introduction. Figure 8 shows the
histograms of Ss and Sr. Note that in the histogram for Ss the
mass is less concentrated near zero than in the histogram for
Sr. The mean values of both distributions are 0.08 and 1.28,
respectively. We also observe that Ss and Sr are positive for
most of the spots. This is consistent with Figure 6. Examples
of this asymmetry are shown in Figures 4 and 5.
A different method of representing gene expression was
introduced by Brody et al (2001): the log transformation of
the median of the ratios of both channels, pixel-by-pixel
within a spot. In Figure 9, the asymmetric intensity
histograms of both channels are shown for a spot of our array,
as well as the histogram of the ratios of the signal and
reference channels, pixel-by-pixel, as explained in Brody et
al (2001). We also observe that for this approach the
asymmetry found in the distributions of the intensities within
spots introduces significant differences when either the
median or the mode are used. For this particular spot, the
expression levels are 0.46, 0.52 and 0.55 using the mean,
median and mode, respectively.
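The two ways of forming an expression value from the pixels of one spot can be contrasted in a short sketch. It assumes a base-2 logarithm for the pixel-by-pixel approach (the text only says ‘log transformation’) and paired signal and reference pixel lists of equal length; the function names and pixel values are hypothetical.

```python
import math
from statistics import median


def log_ratio_of_medians(signal, reference):
    """log2 of (median signal intensity / median reference intensity)."""
    return math.log2(median(signal) / median(reference))


def log_median_of_ratios(signal, reference):
    """log2 of the median of the pixel-by-pixel ratios signal / reference."""
    ratios = [s / r for s, r in zip(signal, reference) if r > 0]  # skip zero reference pixels
    return math.log2(median(ratios))


# Hypothetical spot in which every signal pixel is exactly twice its reference pixel.
reference = [1000.0, 1200.0, 900.0, 1100.0, 1050.0]
signal = [2 * r for r in reference]

print(log_ratio_of_medians(signal, reference))  # 1.0
print(log_median_of_ratios(signal, reference))  # 1.0
```

For a perfectly proportional spot both values agree. They can diverge once the two channels have differently shaped (for example, differently skewed) pixel distributions, and replacing `median` by a mode estimator changes the result in the same way as discussed above.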
Since in most practical experiments ‘ideal’ spots are not consistently obtained, we conclude that there will be a statistically significant difference when calculating the expression level of genes using different statistics. A study is needed to decide on the best value that summarises the whole spot.
Figure 7 Distribution of the difference of the gene expression (see equation 4) using the median and mode of the pixels for every spot in the example microarray.
Figure 8 Histograms of indicators Ss (a) and Sr (b).
Figure 9 (a) Distribution of the intensities of a spot in both channels (r, reference channel; s, signal channel). (b) The gene expression given by the log transformation of the median of the pixels’ ratios’ intensities.
Conclusions
We have opened a discussion about the shortcomings of a
crucial part of generating gene expression data from spotted
microarrays. The use of the median as the representative
intensity of a spot is considered within the microarray
community as standard, although there is no convincing basis
for this. We have shown that it can make a significant
difference whether the median or the mode is used to
calculate gene expression levels as a log ratio of the intensities
in both channels. Thus, it is important to study which is the
appropriate representative statistic for the kind of
distributions that we obtain from the spots of microarrays.
The mean or the median are not always the best choice, as
this paper shows. Several examples were provided of typical
spots where the mode is more appropriate due to the
asymmetry of the distribution of the intensity of the pixels
within the spot. We have investigated the robustness of the
median, mean and mode in relation to the diameter of the
spot. In our case study, the mode was the best choice.
An important conclusion from this study is that for any
subsequent analysis of spot intensities (as representations of
expression levels), the original image rather than the output
of the spot-finding software should be considered as the ‘raw
data’. While image parameters continue to be inspected by
eye and the choice of spot statistics is unclear, it will be
important to have access to the raw image through databases.
Acknowledgements
The authors would like to thank the Wellcome Trust funded
Bacterial Microarray Group at St George’s Hospital Medical
School in London and the TB group of the Department for
Environment, Food, and Rural Affairs at the Veterinary
Laboratories Agency, Weybridge. Olaf Wolkenhauer’s work
has been supported by a post-genomics grant of the UK
Department for the Environment, Food, and Rural Affairs
(DEFRA), conducted in collaboration with the Veterinary
Laboratories Agency (VLA), Weybridge.
Notes
1 Most commonly, each pixel of the image is stored with a 16-bit resolution, which provides a set of 65 536 possible intensity values (0 to 65 535).
2 Definitions of mean, median and mode are provided in Appendix 1.
3 The density functions are calculated by Gaussian kernel estimation. The mode is the intensity value where the density function achieves its maximum.
NOTE: All websites accessed 22 December 2003.
Appendix 1
Mean, median and mode definitions
For any set of pixels, a density function, say f(·), that explains
the distribution of the intensity of the pixels can be calculated.
Histograms are the most popular density estimates and the
easiest to construct. Some other techniques, such as kernel
density estimation, provide density functions with better properties.
For example, they can provide infinitely differentiable density
functions. A density function is fully informative of the
distribution of the intensities of a group of pixels, although
always more difficult to treat than a single number. Hence, it
is very convenient to summarise the density function into a
single number. The following question arises: which is the
best intensity value that represents the intensities of a set of
pixels or a spot? There does not exist a unique and infallible
answer to this question, as we argue in the introduction.
There are some parameters or statistics that characterise
the distribution of a random variable. Three of the most used
are the:
• Mode, which is defined as the intensity for which the
density function achieves its maximum value. Note that
the maximum could be achieved at more than one
intensity. In this case the distribution is called bimodal if
there are two modes, or multimodal in general.
• Median, which is defined as the intensity that divides
the density function into two halves, each with probability
0.5. It is also called the 50th percentile.
• Mean, denoted by µ, which is the ‘balance point’ of the
distribution or, as defined in physics, the centre of gravity.
It is calculated by the following formula:
$$\mu = \int_{\mathbb{R}} x f(x)\,dx$$
where $f:\mathbb{R}\to\mathbb{R}$ is the density function, with $\mathbb{R}$ the set of
real numbers and $x \in \mathbb{R}$.
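The three definitions above can be evaluated for a density function tabulated on a regular grid. This is a minimal sketch under stated assumptions: the grid is equally spaced, the integral is approximated by a simple Riemann sum, and the function name `density_summary` and the triangular example density are hypothetical.

```python
def density_summary(xs, fs):
    """Mean, median and mode of a density fs tabulated on an equally spaced grid xs."""
    dx = xs[1] - xs[0]
    total = sum(fs) * dx                       # normalising constant
    probs = [f * dx / total for f in fs]       # probability mass per grid cell
    mean = sum(x * p for x, p in zip(xs, probs))
    # Median: first grid point at which the cumulative probability reaches 0.5.
    cumulative = 0.0
    median = xs[-1]
    for x, p in zip(xs, probs):
        cumulative += p
        if cumulative >= 0.5:
            median = x
            break
    # Mode: grid point with the highest density (the first one if there are ties).
    mode = xs[max(range(len(fs)), key=fs.__getitem__)]
    return mean, median, mode


# Hypothetical symmetric, unimodal (triangular) density on the grid 0..10.
grid = list(range(11))
density = [0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0]
print(density_summary(grid, density))  # all three statistics coincide at 5
```

For a symmetric and unimodal density the three values coincide; for a skewed density they separate, with the mode staying at the peak.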
These statistics can also be calculated directly from the set of pixels without the need of a density function. Thus, for a given set of intensities, say $x_1, \dots, x_n$, the mean (also called average in this case) is calculated by the formula:
$$\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$$
If we order the intensities from the smallest to the largest, say $x_{(1)}, \dots, x_{(n)}$, the median intensity is equal to $x_{((n+1)/2)}$ or $(x_{(n/2)} + x_{(n/2+1)})/2$ for an odd or even sample size n, respectively. The mode, on the other hand, is the intensity most repeated among the set of pixels. Note that a spot has relatively few pixels compared with the size of the discrete set of possible intensities. For example, for a tiff image with 16 bits per pixel, the possible intensities are 0, 1, 2 ... 65 535. As a consequence, the most common case would be that there is not any repeated intensity among the set of pixels. Thus, the mode will need to be calculated using a density function.
The only case for which these three statistics take the same value is when the intensity of a group of pixels has a symmetric and unimodal density function, such as a Gaussian function, which corresponds to a normally distributed random variable.
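The sample versions of the three statistics can be sketched as follows. The mean and median come from Python's standard `statistics` module. The mode is taken as the maximiser of a Gaussian kernel density estimate, found by a simple grid search, which is an assumption of this sketch rather than a description of the software used in the paper. The function name `kde_mode`, the bandwidth of 400 and the pixel values are illustrative.

```python
import math
from statistics import mean, median


def kde_mode(values, bandwidth, grid_points=512):
    """Grid intensity that maximises a Gaussian kernel density estimate."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return float(lo)
    step = (hi - lo) / (grid_points - 1)
    grid = [lo + k * step for k in range(grid_points)]

    def density(x):
        # Unnormalised sum of Gaussian kernels centred on the pixel intensities.
        return sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    return max(grid, key=density)


# Hypothetical spot: nine bright pixels scattered around 20 050 and three dim ones.
pixels = [20100, 19800, 20300, 19950, 20050, 20200,
          19900, 20000, 12000, 9000, 15000, 20150]

summary = {
    "mean": mean(pixels),                      # pulled down by the dim pixels
    "median": median(pixels),                  # average of the two middle order statistics
    "mode": kde_mode(pixels, bandwidth=400),   # stays with the bright cluster
}
print(summary)
```

No intensity is repeated in this sample, so counting repeated values would not give a usable mode. The density-based estimate depends on the bandwidth, which has to be chosen by the analyst.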
Appendix 2
Microarray example: materials and methods
DNA microarrays, target labelling and hybridisation
While the design and production of Streptomyces DNA
microarrays will be reported elsewhere (Hotchkiss et al, unpub),
information can be found in the Streptomyces coelicolor
Microarray Resource at the University of Surrey, Guildford,
UK (http://www.surrey.ac.uk/SBMS/Fgenomics/Microarrays/index.html).
In brief, 150–500-bp PCR (polymerase
chain reaction) products representing ~7300 of the predicted
ORFs (open reading frames) of the fully sequenced S.
coelicolor A3(2) genome (Bentley et al 2002) were
robotically synthesised, purified and spotted on Corning
CMT-GAPS II glass slides (Corning, Acton, MA, USA).
Post-processed slides were used for hybridisation of the
probes with labelled cDNA or genomic DNA.
Total RNA was isolated from ‘mid-logarithmic’ plate
cultures of S. coelicolor MT1110 (SCP1-, SCP2-) grown on
Oxoid nutrient agar at 30 °C. For genomic DNA isolation,
S. coelicolor M145 was grown to stationary phase in shaken
flasks of yeast extract–malt extract liquid medium at 30 °C.
DNA was extracted and purified by the Kirby mix method.
Cy3-labelled cDNA and Cy5-labelled genomic DNA were
synthesised and hybridised on the array. Protocols on RNA
isolation, target labelling and array hybridisation can be
found at http://www.surrey.ac.uk/SBMS/Fgenomics/Microarrays/index.html.
Scanning and image acquisition of the hybridised array
was performed with an Affymetrix® 428 scanner at 10-µm
resolution. The generated tiff images were quantified with
ImaGene v5.1 software (BioDiscovery, http://www.biodiscovery.com/imagene.asp).
Mathematica® v4 (Wolfram
Research, http://www.wolfram.com) was used for any further
analysis. A typical image of a segment of the Streptomyces
microarray is shown in Figure 2a.
References
Bentley S, Chater K, Cerdeno-Tárraga A, Challis G, Thomson N, James
K, Harris D, Quail M, Kieser H, Harper D et al. 2002. Complete
genome sequence of the model actinomycete Streptomyces coelicolor
A3(2). Nature, 417:141–7.
Brody J, Williams B, Wold B, Quake S. 2001. Significance and statistical
errors in the analysis of DNA microarray data. Proc Natl Acad Sci
USA, 99:12975–8.
Brown C, Goodwin P, Sorger P. 2001. Image metrics in the statistical
analysis of DNA microarray data. Proc Natl Acad Sci USA, 98:
8944–9.
Hotchkiss G, Mersinias V, Bucca G, Hinds J, Butcher P, Smith C.
Manuscript in preparation.
Hyndman R. 1995. Highest-density forecast regions for nonlinear nonnormal time series models. J Forecasting, 14:431–41.
Kim J, Kim H, Lee Y. 2001. A novel method using edge detection for
signal extraction from cDNA microarray image analysis. Exp Mol
Med, 33:83–8.
Polonik W. 1995. Measuring mass concentrations and estimating density
contour clusters – an excess mass approach. Ann Stat, 23:855–81.
Tran P, Peiffer D, Shin Y, Meek L, Brody J, Cho K. 2002. Microarray
optimizations: increasing spot accuracy and automated identification
of true microarray signals. Nucleic Acids Res, 30:E54.