How to Design Diverse Libraries of Solid Catalysts?* Catharina Klanner

Catharina Klanner (a), David Farrusseng (b), Laurent Baumes (c), Claude Mirodatos (b), Ferdi Schüth (a)
(a) MPI für Kohlenforschung, Kaiser-Wilhelm-Platz 1, 45470 Mülheim, Germany
Phone: +49-208-306 2373, Fax: +49-208-306 2995, email: schueth@kofo.mpg.de
(b) Institut de Recherches sur la Catalyse, CNRS, 2, avenue A. Einstein, 69626 Villeurbanne Cedex, France
(c) Equipe de Recherche en Ingénierie des Connaissances, Université Lumière Lyon 2, Bâtiment L, 5, avenue P. Mendès-France,
69676 Bron Cedex, France
and Institut de Recherches sur la Catalyse, CNRS, 2, avenue A. Einstein, 69626 Villeurbanne Cedex, France
Review Article
1 Introduction
High throughput experimentation (HTE) in catalysis research and materials science has, in spite of its relatively
short history, already reached an impressive level of
sophistication with respect to synthetic methods [1], reactor
technology [2], and fast analytical assays [3], and several
review papers are available which cover these developments
[4]. In order to fully exploit the advantages associated with
the success in the above mentioned areas, equally sophisticated methods are required to manage the flow of data and
to extract useful information from these data. However,
suitable solutions to these problems, often even partial
solutions, are still lacking. Fully integrated and adapted
informatics tools to capture, store and treat the high
throughput workflow of data for heterogeneous catalysis
and materials are yet to be developed. Equally important,
efficient software based methods for library design, for
which developments on the fundamental level are still
necessary, are urgently needed to make full use of the novel
experimental tools.
Some of the problems are similar to those in high throughput
drug discovery, where advanced software support solutions
followed the experimental developments. However, the
high complexity of solids and heterogeneously catalyzed
processes creates novel challenges going beyond those faced
in drug discovery. These challenges are often not yet acknowledged in the community. In this essay we will point out
often-underestimated fundamental differences between
drug discovery and materials science, which are faced in
software assisted library design for high throughput approaches. Even if HTE increases the screening power by
orders of magnitude, the number of potential experiments
to be carried out is infinite (see, for example, the considerations of Jansen concerning the number of possible solid
compounds [5]). Therefore new methodologies have to be
developed which allow the design of efficient libraries.
* We would like to acknowledge the FCI and the EU (Marie-Curie-Program) for financial support and S. Kaskel for helpful
discussions.
QSAR Comb. Sci. 22 (2003)
DOI: 10.1002/qsar.200320003
Novel concepts and strategies for screening with specialized
software components are proposed to enhance discovery
and optimization rates.
HTE in heterogeneous catalysis and materials science
relies on the iterative preparation and testing of large
libraries of solids, either in a parallel mode or sequentially.
The process starts with the design of an initial set of catalysts,
which can be done either randomly, or following certain
rules, or be based on the experience and intuition of the
chemist who designs the library. Such an initial library is
then prepared and tested. After analyzing the results and
based on the analysis, a new set of experiments is designed.
This methodology is not fundamentally different from the
one used in the past in the search for novel catalysts and
processes. However, the role of the chemist drastically
changes because the numbers of experiments to be conducted and the amount of data to be collected and treated
are orders of magnitude higher. Without an efficient
informatics environment, it is impossible to plan and design
such vast numbers of experiments [6].
In the beginning of a high throughput discovery program
two possible starting situations can be identified: (i) screening is based on prior information and catalytic systems are
available which show some activity for the desired reaction,
or (ii) there is essentially no precedent for a catalyst, or the
systems previously investigated do not seem to have the
potential for further improvement. The first situation is
often described by the term "optimization program", the
second situation subsumed under the label "discovery
program".
2 Design of Focused Libraries in Optimization
Programs
In the first case, relevant and/or potentially relevant factors
are thought to be known so that libraries can be designed in a
well defined frame. Library design would concentrate on a
composition and parameter space around systems which are
known to work and which are varied in a systematic and
efficient way. In order to do this, some tools, which will be
briefly discussed in the following, are available. These tools
© 2003 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
have already been used in conventional and also in high
throughput catalysis research.
The response surface, which is a modeling technique from
the DoE (Design of Experiments) toolbox, provides quantitative interpolations while minimizing the number of
experiments (for examples of the application in catalysis see
refs. [7] [8] [9]). The experiments are designed in such a way
that the multivariate regression results in a more robust
model with respect to statistical calculations. However, as
for all modeling techniques, a model has to be a priori
postulated and a posteriori validated. The main limitation is
that regressions can only operate on continuous variables. In
addition, this technique can hardly be applied for more than
eight parameters.
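The response-surface idea can be illustrated with a minimal sketch, assuming a single continuous factor (for instance a loading or a temperature) and an a priori postulated quadratic model. The function names and the four data points are invented for illustration and are not taken from the cited studies; the fit uses ordinary least squares via the normal equations.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_quadratic(xs, ys):
    """Least-squares fit of the postulated model y = b0 + b1*x + b2*x**2."""
    rows = [[1.0, x, x * x] for x in xs]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve_linear(xtx, xty)


def stationary_point(coeffs):
    """Factor level at which the fitted surface has zero slope."""
    _, b1, b2 = coeffs
    return -b1 / (2.0 * b2)


# Hypothetical measurements (factor level, response); invented numbers.
levels = [0.0, 1.0, 2.0, 3.0]
responses = [1.0, 2.0, 1.0, -2.0]
model = fit_quadratic(levels, responses)
best_level = stationary_point(model)
```

The fitted model still has to be validated a posteriori on additional experiments, as stated above; the sketch only covers the interpolation step.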
Expert system based methods have also been used to
select or to optimize catalysts for given applications even
before the advent of high throughput experimentation in
catalysis research. The expert systems were trained either by
input of heuristic knowledge, based on literature data or
experience [10], or neural networks were used to suggest
optimized catalysts by correlating crucial parameters with
performance. The latter approach was chosen by Hattori in
the nineties [11] [12] [13] for several examples with some
success but does not seem to have been widely adopted. The
reason for this may be that initially only very limited
consistent data sets were available, on which these methods
could be applied. The situation, however, is changing, since
high throughput experimentation now provides the possibilities to generate large, consistent sets of performance data
in reasonable time frames. Artificial neural networks can
become efficient tools to guide the combinatorial development of catalysts and economize on experiment time. Thus,
in this context the use of neural networks will probably
become much more important in the future, and the first
publications are emerging [14].
Many optimization methods inspired by Darwin's evolution theory (Genetic Algorithms, Evolutionary Strategies, ...)
have been developed for numerous and various purposes.
They are commonly grouped under the name of Evolutionary Algorithms. The principles of natural evolution
based on population selection, crossover and mutation are
their common features, and some applications in heterogeneous catalysis have been reported for optimizing catalysts.
The group of Baerns applied an evolutionary strategy to
optimize catalysts for the oxidative dehydrogenation of
propane [15] and in the low temperature oxidation of
propane [16]. Elements of the periodic table were a priori
selected according to heuristic knowledge to form the
parameter space which bound the search. After evaluation
of randomly selected quaternary formulations in a catalytic
flow system, the catalysts were ranked with respect to the
target criterion. The next generation was designed following
principles of biological evolution and thus an altered
catalyst population was generated. Optimization occurs
through generations of populations of trial solutions with
increasing average fitness. For oxidative dehydrogenation of
propane the conversion-selectivity plot as a function of the
number of generations clearly shows the convergence of the
algorithm after 4 generations towards a zone at 15%
conversion and 55% selectivity to propene, achieved over
catalysts containing V, Mg, Mo and Ga. Approaches similar to the one described above were recently presented at two conferences for different reactions [17].
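The generational loop described above (ranking, selection, crossover, mutation) can be sketched as follows. This is an illustrative toy, not the algorithm used in the cited work: the element pool, the population sizes, the mutation width and in particular `toy_fitness`, which stands in for a real catalytic measurement, are all assumptions.

```python
import random

# Illustrative element pool (assumption, not the published selection).
ELEMENTS = ["V", "Mg", "Mo", "Ga", "Fe", "Mn"]


def normalize(fractions):
    """Rescale so that the composition sums to one."""
    total = sum(fractions)
    return [f / total for f in fractions]


def random_catalyst(rng):
    return normalize([rng.random() for _ in ELEMENTS])


def crossover(a, b, rng):
    """Uniform crossover: each element fraction is taken from either parent."""
    return normalize([x if rng.random() < 0.5 else y for x, y in zip(a, b)])


def mutate(catalyst, rng, sigma=0.1):
    """Perturb each fraction with Gaussian noise, keeping it positive."""
    return normalize([max(1e-9, x + rng.gauss(0.0, sigma)) for x in catalyst])


def toy_fitness(catalyst, target=(0.4, 0.3, 0.2, 0.1, 0.0, 0.0)):
    """Stand-in for a measured target criterion; higher is better."""
    return -sum((x - t) ** 2 for x, t in zip(catalyst, target))


def evolve(fitness, generations=4, pop_size=20, n_parents=5, seed=0):
    rng = random.Random(seed)
    population = [random_catalyst(rng) for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        best_per_generation.append(fitness(ranked[0]))
        parents = ranked[:n_parents]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - n_parents)
        ]
        population = parents + children  # parents survive (elitism)
    return max(population, key=fitness), best_per_generation


best, history = evolve(toy_fitness)
```

In a real program each call to the fitness function corresponds to synthesizing and testing a catalyst, so the population size and the number of generations are dictated by the experimental throughput rather than by computing time.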
3 Diversity and Descriptors
While rational methods for improving initial hits are thus
available, library design in case (ii) listed above is much
more difficult. If there is no useful guideline for the design of
an initial library, or if one deliberately decides to discard
previous ideas about the relevant factors influencing the
performance of solids in a given reaction (a reason for this
possibly being that so far discovery programs using conventional wisdom have failed), one will typically desire to
design a "diverse" library in order to increase chances of
discovering regions in parameter space which would justify
further exploration. Such libraries are meant to result in the
discovery of compositions and conditions around which
more focused libraries can be constructed to optimize the
systems further. In the following the terms "diverse" and
"descriptor" are central in the discussion. We will therefore
at this point explain what is meant by these terms which are
being used extensively in high throughput drug discovery.
There is no objective definition of "molecular diversity" in
drug discovery [18], and the same holds for solids. We will
call a library of solids "diverse" if it gives a maximum of
different responses in a certain target application. It would
also be possible to use the term "diverse" to describe the
range of different solids used, i.e. with respect to composition, synthesis conditions, etc. However, for the application
of this term as outlined in the following, the first definition
is more appropriate. One should keep in mind, though, that
the term "diverse" is highly context dependent. As a simple
example for the definition introduced, in the catalytic
oxidation of propene a library of only four solids would
be called "diverse" if, under given conditions, one catalyst
would be totally inactive, one catalyst would lead to total
combustion, one catalyst would lead to the formation of acrolein
and one catalyst to the formation of oligomers. A non-diverse library would be one where all solids result in the
same reaction product under the same conditions. It is
clear that the diversity in a given application has to be
defined more precisely, for instance by describing many
classes of performance, into which catalysts and conditions
could be grouped. This classification could also entail
different responses to different reaction conditions, for
instance, different temperature levels, different feed compositions, pressure, residence time and so on. Even if the
definition has to be specified for a certain application, the
general principle of a diverse library should have become
clear.
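The performance-based definition of diversity can be made concrete with a small sketch. It assumes that each catalyst has already been assigned to one performance class; the number of distinct classes follows the definition given above, while the Shannon entropy of the class distribution is an additional measure introduced here for illustration only.

```python
import math
from collections import Counter


def response_diversity(responses):
    """Return (number of distinct classes, Shannon entropy of the classes)."""
    counts = Counter(responses)
    n = len(responses)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return len(counts), entropy


# The four-solid propene oxidation example from the text.
diverse = ["inactive", "total combustion", "acrolein", "oligomers"]
# A non-diverse library: all solids give the same product.
uniform = ["acrolein", "acrolein", "acrolein", "acrolein"]

n_classes_diverse, entropy_diverse = response_diversity(diverse)
n_classes_uniform, entropy_uniform = response_diversity(uniform)
```

With finer performance classes (temperature levels, feed compositions, residence times) the class labels would become tuples, but the counting stays the same.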
Figure 1. Exemplary representation of different attributes which can be assigned to a solid catalyst. The inner circle contains attributes
which are correlated to the entire solid. The middle circle represents the mass ratio of all elements in the catalyst, and the outer contains
parameters which are related to each element, respectively. Attributes and combinations of attributes which correlate with catalytic
activity are descriptors, which all together form a descriptor vector.
Another key concept is that of a "descriptor", which is also
known from drug discovery. This concept is explained in
Fig. 1: A catalyst in a catalytic reaction is characterized by
different attributes, which are connected with the catalyst
synthesis, the ingredients, the reaction conditions and so on.
Some of these attributes are irrelevant with respect to
catalytic performance, but others are thought to somehow
correlate with catalytic activity, and these latter ones are
called "descriptors". A descriptor is not necessarily just
one attribute; combinations of attributes may also be
descriptors. This shall be exemplified for the case of acidic
isomerization catalysts. If zeolite based catalysts are considered, the aluminum content will certainly be important
for the performance. In the case of zirconias, however,
aluminum content is most probably an unimportant category, and instead the sulfate content plays a major role.
Thus the combination of silicon and aluminum (for zeolites)
or zirconium and sulfur (for sulfated zirconias) will be
descriptors which correlate with activity in isomerization
reactions. In addition, the performance of a catalyst will
typically not be related to one descriptor only, but to a
combination of different descriptors. For instance, in the
example mentioned above (reactions catalyzed by strong
acids), the presence of silicon and aluminum alone is not
sufficient to correlate to activity. Additional requirements
to induce activity are synthesis under hydrothermal conditions and the presence of protons obtained by ion
exchange and possibly other parameters. The individual
descriptors are thus combined to form a "descriptor vector".
4 Concept of a Discovery Program
The discussion below will address the question how to create
diverse libraries for solids and how to identify a descriptor
vector. The major question in the design of a discovery
library is how to select the samples to be tested in order to
maximize the browsing of the parameter space. When no knowledge is available on
which the design of an initial library can be based, it is
obvious that one should try to cover the available compositional and parameter space as completely as possible with as
few experiments as necessary. "Design of Experiments"
(DoE) is a long established mathematical technique also in
catalysis research [19] which is very powerful for designing
experiments to optimize efficiency and to arrive at statistically significant conclusions concerning the influence of
parameters and their cross-correlations with a minimum
number of experiments. Commercially available software
can generate experimental matrices for the analysis of more
than 100 factors. However, such packages are in some cases not
sufficiently flexible to be adapted to problems like materials
synthesis, for which a set of synthetic rules with given
boundary conditions already exists. In addition, in a DoE
approach one needs to decide prior to design, which factors
should be analyzed, and it is not a priori clear whether these
are the ones which are the most relevant ones. DoE tools will
suggest the statistically optimally diverse library within the
preselected boundaries, but not de novo create diverse
libraries. Nevertheless, DoE techniques can be suitable
means to increase efficiency in high throughput experimentation.
In the following, however, we will discuss some concepts of
how software assisted methods may actually go beyond a
statistical optimization and support the de novo design of
diverse libraries without predefining boundaries in which
to operate. In order to illustrate the relevant issues, we will
first discuss how diverse libraries are created in drug
discovery programs and then contrast this with the problems
faced when dealing with solids and their catalytic performance.
The search for optimal diversity in libraries is an
important subfield in high throughput drug discovery
programs, and many methods have been proposed to
optimally design diverse libraries (see, for instance, two
excellent reviews [20] [21]). Catalysis research might be able
to build upon the experience gained in these applications.
4.1 The "Similar Property Principle" in Drug Discovery
Scientists working in the field of drug discovery mostly rely
on the "similar property principle", which states that
structurally similar molecules will exhibit similar physicochemical and biological activity [22] [23]. This assumption is
reasonable, since identical functionality, for instance the
presence of a donor group in a drug molecule, could be
expected to lead to similar binding properties, or similar
sizes and shapes of two molecules might lead to similar fit
into a pocket of a possible receptor. Although the exact
correlation between structure of a molecule and its performance as a drug is normally not known, computer algorithms
are used to identify "similar" molecules. Then only selected
examples out of a class of "similar" molecules are actually
tested and the number of syntheses and tests necessary is
dramatically reduced. In order to create diverse libraries,
one tries to select molecules from as many different classes
as possible so that maximally dissimilar molecules are
explored. The assignment of a molecule to a certain class is
made possible by representing the molecules in a computer
readable form by "descriptors" valid for molecular entities.
These captured features of the molecule are thought to be
correlated with its function. The descriptors can, for
instance, be two-dimensional fingerprints, such as absence
or presence of certain chemical functionalities, or can be so-called pharmacophores, which relate to the relative spatial
arrangement of three selected chemical functionalities, or
physico-chemical properties, or many others [24]. Many
different descriptors of this kind have been suggested, and
typically, molecules are not only described by one descriptor
but by a whole set of descriptors, which represent the
molecule in the computer and which are the basis for
diversity analysis. Using these descriptors, the degree of
similarity of two molecules can be described by certain
parameters, such as the Tanimoto coefficient, or other
measures [25]. With these means the molecules are grouped
together, for which different algorithms are available, and
thus classes are identified where molecules belonging to the
same class are "similar" and molecules from different
classes are "dissimilar". For the creation of a diverse library,
only few molecules from each class are selected and actually
synthesized and tested.
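The selection scheme just described can be sketched in a few lines. The Tanimoto coefficient is computed here on fingerprints represented as sets of present features; the grouping step uses a simple leader algorithm with a similarity threshold, chosen for brevity and not necessarily one of the algorithms used in the cited work. The feature names and the threshold are invented.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two feature sets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def leader_clusters(fingerprints, threshold=0.5):
    """Assign each molecule to the first cluster whose leader is similar enough."""
    clusters = []  # each cluster is a list of indices; the first one is the leader
    for i, fp in enumerate(fingerprints):
        for cluster in clusters:
            if tanimoto(fp, fingerprints[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def pick_representatives(clusters, per_class=1):
    """Take only a few members of each class for synthesis and testing."""
    return [i for cluster in clusters for i in cluster[:per_class]]


# Hypothetical 2D fingerprints: presence of chemical functionalities.
library = [
    {"hydroxyl", "carboxyl", "aromatic ring"},
    {"hydroxyl", "carboxyl"},
    {"amine", "aliphatic ring"},
    {"amine"},
]
classes = leader_clusters(library)
selected = pick_representatives(classes)
```

The result of a leader algorithm depends on the order of the molecules, which is one reason why more elaborate clustering methods are preferred in practice.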
4.2 Is there a "Similar Property Principle" for Solids?
If a similar approach were possible for solids, the
efficiency of HTE in catalysis and materials research could
be substantially increased. However, there are fundamental
differences between molecules and solids with respect to
diversity and with respect to their representation in a
computer. If the synthesis sequence for an organic molecule
is known, the identity of the resulting product is typically
known a priori, even if there may be complications by side
reactions or incomplete conversion, while the properties of a
solid can only be established a posteriori by analysis, after
the solid has been made. Synthesis planning of inorganic
solids is only in its infancy [5, 26], even considering just the
structure of solids, and even more so their textural
parameters and defect structures. There is thus no basis for
judging the properties of a solid from the ingredients and
the information on the synthesis protocol alone. In drug
discovery, many of the possible descriptors can easily be
calculated from the two-dimensional molecular structure of
the molecule, quite similar to what chemists do intuitively
while looking at the structural formula. A simple representation such as the two-dimensional structural formula of an
organic molecule is impossible for complex solids. Descriptors of solids thus have to be developed following different
lines, as will be discussed below.
Even more difficult is the question of diversity. The
diversity of a library of solids can be considered from two
sides, i.e. from the side of the solid, or from the side of the
performance of the solid in the desired application. This is
the same as in drug discovery. It can be the molecules which
are diverse, but it may also be their performance in a binding
test. However, due to the assumed validity of the similar
property principle, diversity in the molecule library should
correspond to diversity in performance in the case of drug
development. Generating diverse libraries in drug discovery
is therefore simpler than when dealing with solids (even if it
still is a formidable task), since the problem can be reduced
to analyzing the molecules without initially considering their
performance.
If dealing with solid catalysts on the other hand, a high
diversity of the library of solids could be totally irrelevant
with respect to the catalytic performance of this library. As
pointed out by Schlögl [27] at a very early stage of the
development of HTE approaches towards heterogeneous
catalysis, properties of solids can change in a discontinuous
way with composition, for instance, if new phases are
developed or if a miscibility region is left and phase
separation occurs. These differences between solid state
chemistry and molecular chemistry have long been acknowledged in solid state chemistry and materials science
[28]. In catalysis, also, the effect of additives on the promoter
level is strong when the promoter is present at low
concentrations, but might level out, if the concentration is
increased or could even be detrimental for the performance.
In addition to these points, the surface of a solid, on which
the reaction proceeds, is typically not uniform, but consists
of many different sites, many of which could be metastable
and therefore difficult to predict. The catalytically relevant
sites may in fact not be regular sites, but rather defects in the
solid. Thus, the underpinning for descriptor based preselection of compounds, which is used in the drug industry,
the similar property principle, is basically missing when
dealing with solids. A purely compositional diversity does
not need to result also in a performance diversity. Solids
which appear to be very similar, judging from their
composition, may have very different catalytic performance.
Gold in the form of very small particles performs radically
differently from bulk gold, for example, although the catalysts
in both cases might consist of the same support and contain
identical amounts of gold [29]. It will thus be necessary to
develop novel ways which take into account these nonlinear
dependencies for the classification of solids to allow a
computer prescreening.
However, taking into account the nonlinearities for the
creation of a diverse library creates a problem: to do so, one
would need to know how the performance of the
solid depends on its composition and on its preparation
parameters, and this information is not available (except in
very simple cases, where, for instance, volcano shaped
dependencies between certain properties and performance
are known [30]). In fact, if it were available, one would not
need the high throughput approach, because the right
composition could be calculated back from the desired
performance.
4.3 Methodology to Design Discovery Libraries
How could one then develop suitable descriptors to predict
the performance diversity of a library of solid catalysts? We
propose an approach to design diverse discovery libraries
based upon such descriptors. Briefly, the process consists of
the following steps, some of which are discussed in more
detail below, some of which are only mentioned, because
they relate to information science problems which are less
relevant in a chemistry context, although crucial for the
success of the program: (i) a diverse collection of compounds, based upon experience
and "chemical intuition", is synthesized (which means that compounds are also included
for which it is known that they do not have any interesting
catalytic properties in the target application), (ii) these
compounds are described by as many attributes as possible
and encoded in computer readable form, (iii) the compound
collection is tested in a target application, (iv) the results of
the test, together with the attributes, are classified and subjected
to algorithms for dimensionality reduction and correlation
to obtain the descriptor vector, and (v) finally a new diverse
library is created in silico, made and tested to validate the
descriptor vector.
The key requirement is thus to develop a set of descriptors
(regardless of what they are) which can predict diversity in an
initial catalyst library. Then the question arises, what kinds
of properties are potentially useful as elements of a
descriptor vector and how would one identify these
descriptors among the attributes of a catalyst? A descriptor
certainly needs to be a property which is either known from
the synthesis, or tabulated in a data base, or which can be
very easily calculated. Since descriptor vectors shall be used
for virtual prescreening in order to create a diverse library,
any descriptor which needs lengthy calculations or requires
the real synthesis of the solid is not useful. Possible
descriptors are then (i) synthesis related characteristics,
for instance, elements added to the synthesis and their
concentrations, additives used in the synthesis, synthesis
temperature, synthetic method, etc., (ii) tabulated data, for
instance, electronegativities of constituent elements, the
differences in electronegativities, melting points, enthalpy
of formation of different oxides of the constituent elements,
differences in the formation enthalpies of the different
oxides, number of readily accessible oxidation states of
constituents in oxides, ratios of ionic radii (relates to
structure), or (iii) calculated data, for instance, adsorption
enthalpies of reagents on constituents of the catalyst,
provided that the calculation is fast.
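As a minimal sketch of how synthesis related and tabulated attributes might be encoded in computer readable form, the following function builds a numeric attribute vector for one solid. The choice of attributes, the list of synthetic methods and the function name are assumptions made for illustration; the electronegativities are approximate Pauling values.

```python
# Approximate Pauling electronegativities (tabulated data).
PAULING_EN = {"V": 1.63, "Mg": 1.31, "Mo": 2.16, "Ga": 1.81}

# Hypothetical list of synthetic methods, encoded one-hot (nominal attribute).
METHODS = ["impregnation", "coprecipitation", "sol-gel"]


def attribute_vector(composition, synthesis_temp_c, method):
    """Encode one solid as a list of numbers.

    composition: dict mapping element symbol to molar amount.
    """
    total = sum(composition.values())
    fractions = {el: amount / total for el, amount in composition.items()}
    # Composition-weighted mean electronegativity of the constituents.
    mean_en = sum(frac * PAULING_EN[el] for el, frac in fractions.items())
    # Largest electronegativity difference among the elements present.
    present = [PAULING_EN[el] for el, frac in fractions.items() if frac > 0]
    en_spread = max(present) - min(present)
    one_hot = [1.0 if method == m else 0.0 for m in METHODS]
    element_part = [fractions.get(el, 0.0) for el in sorted(PAULING_EN)]
    return element_part + [mean_en, en_spread, float(synthesis_temp_c)] + one_hot


vector = attribute_vector({"V": 1.0, "Mg": 1.0}, 500, "sol-gel")
```

Which of these entries actually qualify as descriptors is not decided at this stage; that only follows from the correlation with measured performance.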
It is expected that a successful descriptor vector will have
many elements, since no simple correlations of properties
and the performance of solids in certain applications can be
expected for complex catalysts. The question, however, is,
how one would select those elements which can indeed be
used in a descriptor vector for a certain target application,
since no a priori information should enter.
To implement an approach for the identification of
significant descriptor vectors, a process outlined in the
following for the example of oxidation reactions seems to be
useful (Fig. 2). At the beginning, an initial library of solids
has to be designed and synthesized. The solids are chosen
using the criterion to be as diverse as possible. This initial
diverse library is created based on chemical intuition and
literature data, which can be supported by software based
visualization tools.
In the next step each solid is described as precisely as
possible. All accessible information on the synthesis
procedure should be taken into account and stored in a
relational database. The synthesis related data is very
important because in a later validation step, not only
physicochemical data on the solid is needed, but also a
recipe for synthesis. Another part of the database contains
element specific attributes which have been measured for
almost all elements. Two major groups of fixed parameters
may be taken into account, i.e. physical properties related to
the elements themselves and values related to the element
oxides, in the case of an oxidation reaction.
Figure 2. Outline of a concept for the evaluation of descriptors for solids. a) Development of a descriptor vector b) De novo library
design. For detailed information see text.
The first group consists of physical properties such as boiling point, heat
capacity, electronegativity, and atomic radius, whereas the
second group contains parameters like dielectric constant, enthalpy of oxide formation, or density. Additionally,
the characteristics of the element ions such as ionic radius, or
possible oxidation states are taken into consideration. It is
obvious that no simple relationship will exist between such
descriptors of a solid and its performance, but combinations
of many parameters could allow the prediction of the diversity in
the performance of different catalysts.
Parallel to the storage of the characteristics in the data
base, the solids are actually synthesized and their performance
evaluated in a catalytic test in a model reaction. A good
model reaction should lead to a variety of reaction products,
which assures a high diversity in the response of the library.
As a good model for the evaluation of hydrocarbon
oxidation catalysts the oxidation of propene is suggested.
The wider the variety of possible products the more
precisely the activity patterns can be described. With the
knowledge of the performance of each solid in a specific
reaction, all solids can be classified into clusters of similar
behavior. This can be done by well-known classification
techniques such as clustering and partitioning methods. It is
a priori not clear which method is the most suitable one, but
in drug discovery programs hierarchical methods have
proved to be superior to non-hierarchical clustering techniques if property data are used to describe molecules
[31] [32].
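The grouping of solids into clusters of similar behavior can be sketched with a small agglomerative (hierarchical) procedure. The sketch assumes that each solid is represented by a response vector of (conversion, selectivity to acrolein, selectivity to CO2); the numbers are invented, and average linkage with Euclidean distance is only one of several possible choices.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, points):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        _, a, b = min(
            (average_linkage(clusters[a], clusters[b], points), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical responses: (conversion, selectivity acrolein, selectivity CO2).
responses = [
    (0.00, 0.00, 0.00),  # inactive
    (0.05, 0.00, 0.00),  # nearly inactive
    (1.00, 0.00, 1.00),  # total combustion
    (0.95, 0.05, 0.90),  # mostly combustion
    (0.50, 0.90, 0.10),  # selective to acrolein
    (0.55, 0.85, 0.10),  # selective to acrolein
]
performance_classes = agglomerate(responses, n_clusters=3)
```

The number of clusters is fixed by hand here; in practice it would have to be chosen from the data, for example by inspecting the merge distances.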
Now both parts, the measured response patterns, which
are grouped in clusters of similar response, and the catalysts
coded with their attributes, are accessible for correlation.
The aim is to find descriptor vectors which are discriminative with respect to the cluster assignment. In other
words, each cluster, corresponding to a unique activity
pattern, will be represented by a distinct set of attributes and
correlations of attributes, the descriptor vector. This descriptor vector has predictive power for diversity of the
catalytic behavior of this library of solids. One should not
underestimate the problems with respect to the algorithms
and data treatment associated with this work. The attributes
of the catalysts in a given reaction will have different data
formats, such as binary, continuous, nominal, ordinal, and
not all algorithms are equally well (or at all) suited to deal
with different data types. Furthermore, the number of
attributes is very large compared with the number of
catalysts, and reduction of the diversity space is thus
necessary. Since several of the attributes may be related
and not independent of each other, suitable methods to
identify these related attributes with subsequent elimination of the redundant ones have to be used. One of the most
frequently employed methods to do this is principal
component analysis. A discussion of these more technical
aspects, however, would go far beyond the scope of this
essay and we only mention it here to alert the reader to this
problem. Many techniques have been developed in drug
industry, and some of the solutions found there will be
transferable to catalyst libraries.
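The elimination of related attributes can be illustrated with a simple correlation filter. This is deliberately not principal component analysis but a cruder stand-in: an attribute is dropped when its Pearson correlation with an already kept attribute exceeds a threshold. The attribute names, values and the threshold are invented, and the sketch handles continuous attributes only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def drop_redundant(columns, threshold=0.95):
    """Keep an attribute only if it is not strongly correlated with a kept one."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical attribute table: one list of values per attribute.
attributes = {
    "ionic_radius_angstrom": [1.0, 2.0, 3.0, 4.0],
    "ionic_radius_pm": [100.0, 200.0, 300.0, 400.0],  # same information
    "synthesis_temp": [3.0, 1.0, 4.0, 1.0],
}
independent = drop_redundant(attributes)
```

Binary, nominal and ordinal attributes need other measures of association, which is part of the data-type problem mentioned above.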
In a last step, the descriptor vector suggested on the basis
of the procedure outlined above has to be validated on
solids, which have not been used to identify the descriptors.
Therefore a new, very large library of solids has to be created
"in silico" at random, for instance by using randomizers to
create compositions and synthetic procedures which would
be used in the synthesis of the solids. Then the descriptor
vectors of these randomly generated solids in the library are
determined. Classification algorithms could be used to
group these suggested catalysts into different classes, and then
only a few solids from each class would be selected. By this
procedure, a smaller subset of the in silico library would be
identified which should still represent the diversity of the
huge parent library. Only these selected solids are then
actually synthesized in order to verify whether the library is
as diverse as predicted by the algorithm. For the verification,
the new library has to undergo a catalytic test, and the results
have to be classified as previously described. A comparison of the experimentally measured and the theoretically
predicted performance classes will reveal whether the
concept is applicable.
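The selection step described above can be sketched as follows. This is only an illustration under stated assumptions: the element pool, the calcination temperatures and the function `descriptor_vector` are hypothetical placeholders (a real descriptor vector would come from the calibration step), and the classification is done by simple cell-based partitioning of the descriptor space, which stands in for whatever classification algorithm is actually chosen.

```python
import random
from collections import defaultdict

ELEMENTS = ["Mo", "V", "Bi", "Fe"]            # hypothetical element pool
CALCINATION_TEMPS_K = [673, 773, 873, 973]    # hypothetical synthesis parameter


def random_solid(rng):
    """Draw one virtual solid: a normalized composition plus one synthesis parameter."""
    raw = [rng.random() for _ in ELEMENTS]
    total = sum(raw)
    composition = {el: r / total for el, r in zip(ELEMENTS, raw)}
    return {
        "composition": composition,
        "calcination_temp_K": rng.choice(CALCINATION_TEMPS_K),
    }


def descriptor_vector(solid):
    """Placeholder descriptor with every component scaled to the range [0, 1]."""
    comp = solid["composition"]
    temp_scaled = (solid["calcination_temp_K"] - 673) / 300.0
    return (comp["Mo"] + comp["V"], comp["Bi"], temp_scaled)


def assign_class(vector, bins=3):
    """Cell-based partitioning: cut each descriptor axis into `bins` intervals."""
    return tuple(min(int(v * bins), bins - 1) for v in vector)


def select_subset(library, per_class=2, seed=0):
    """Group solids by class and keep at most `per_class` random members of each."""
    rng = random.Random(seed)
    classes = defaultdict(list)
    for solid in library:
        classes[assign_class(descriptor_vector(solid))].append(solid)
    subset = []
    for label in sorted(classes):
        members = classes[label]
        subset.extend(rng.sample(members, min(per_class, len(members))))
    return subset, classes


rng = random.Random(42)
library = [random_solid(rng) for _ in range(10_000)]
subset, classes = select_subset(library)
print(f"{len(library)} virtual solids, {len(classes)} occupied classes, "
      f"{len(subset)} selected for synthesis")
```

Only the solids in `subset` would then be synthesized and tested. Their measured performance classes can afterwards be compared with the class labels assigned here to judge whether the descriptor vector is predictive.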
5 Conclusions
We have outlined the similarities and dissimilarities between library design in drug discovery and in heterogeneous
catalysis and suggested a method to create diverse libraries
of catalysts. This method of creating diverse libraries does
not need to assume the validity of the similar property
principle, since the descriptors are identified via a “calibration” run which is assumed to be prototypical for a class of
reactions, just as propene oxidation may be a prototype for
alkene oxidation reactions, possibly even for hydrocarbon
oxidation reactions. A related approach has, in fact, also
been used in drug discovery programs. The so-called
“affinity fingerprint” [33] [34] is no longer obtained
from the molecular structure, but by (initially) measuring
the binding constants of the molecules to be described
against certain reference proteins. The list of binding
constants (the “affinity fingerprint”) is used as a descriptor.
For the use of the affinity fingerprint, the validity of the
similar property principle is not necessary. Since it was
subsequently realized that binding constants could also be
calculated, so that time-consuming experimental determination of the binding constants is no longer necessary, in
silico prescreening of molecule libraries is now possible and
has been successfully applied using the affinity fingerprint [35]. We
believe that the methodology outlined above could become
equally useful in library design for HTE in catalysis.
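As a small illustration of how a list of binding constants can serve as a descriptor, the sketch below stores one fingerprint per molecule and compares fingerprints by their Euclidean distance. The protein names, the binding constants, the use of negative decadic logarithms and the choice of distance measure are all assumptions made for this example and are not taken from the cited work.

```python
import math

# Hypothetical reference panel; the order defines the fingerprint positions.
REFERENCE_PROTEINS = ["protein_A", "protein_B", "protein_C"]

# Invented binding constants (mol/L), one value per reference protein.
binding_constants = {
    "molecule_1": [1e-6, 5e-8, 2e-5],
    "molecule_2": [2e-6, 4e-8, 1e-5],
    "molecule_3": [3e-9, 1e-4, 7e-7],
}


def affinity_fingerprint(constants):
    """Turn binding constants into a fingerprint of negative decadic logarithms."""
    return [-math.log10(k) for k in constants]


def fingerprint_distance(fp_a, fp_b):
    """Euclidean distance between two fingerprints of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(fp_a, fp_b)))


fingerprints = {
    name: affinity_fingerprint(k) for name, k in binding_constants.items()
}
d12 = fingerprint_distance(fingerprints["molecule_1"], fingerprints["molecule_2"])
d13 = fingerprint_distance(fingerprints["molecule_1"], fingerprints["molecule_3"])
print(f"distance 1-2: {d12:.2f}, distance 1-3: {d13:.2f}")
```

In this invented data set, molecules 1 and 2 bind to the reference panel in a similar way and therefore lie close together in fingerprint space, whereas molecule 3 lies far from both. No structural information about the molecules enters the comparison.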
In principle, such an approach would be very useful even
without its implementation in an HTE program, since the
discovery of descriptors actually generates “knowledge” on
the performance of solid catalysts. However, the generation
of consistent datasets without employing high throughput
experimentation would need an excessive amount of experimental time. Thus, the very methodology of HTE, which makes
library design tools necessary, for the first time also allows
powerful methods for this library design to be developed. If
approaches such as the one outlined above are successful,
catalysis research will benefit to a much larger extent from
HTE approaches than just by the acceleration of synthesis
and testing. The identification and analysis of attributes
which are useful as descriptors will give a deeper insight into
the factors which are key for a certain catalytic performance
and will thus bring us closer to a real design of catalysts.
Eventually, at the very end, HTE approaches may make
themselves obsolete because our basis of consistent data is
so solid that sufficient “knowledge” for the design of
catalysts is available without the need to run
extensive discovery programs. In addition, high throughput
technologies might in the future help to advance a more
knowledge-based catalyst development in another
respect as well: in the previous discussion it was stated that a
descriptor should be a property which is known or can
rapidly be computed to allow “in silico” screening. If high
throughput characterization tools are sufficiently powerful,
the information from physico-chemical characterization could also be used to develop descriptors. This would still
mean that the full libraries have to be synthesized, as
opposed to the pure “in silico” screening, but the level of
physico-chemical understanding could be dramatically
improved if measured properties were incorporated.
The discussion suggests that software assisted methods
for the creation of diverse libraries are more difficult to
implement in the case of solids than for drug-like
molecules. However, there are two aspects where dealing
with solids is easier, especially for catalytic applications.
First, if the concept outlined above is properly implemented, the software based library design tool will suggest how to
prepare a certain solid to be tested, since synthesis related
parameters are among the attributes used to encode the
virtual library. This is typically not the case for drug-like
molecular entities, for which a whole software based
methodology for synthesis planning has been developed in
parallel to other tools used in high throughput drug
discovery [36]. Second, and perhaps more importantly,
although a solid is more difficult to describe than a molecule,
the response space is far simpler: in the case of solid
catalysts, one wants to predict their behavior if exposed to
gaseous or liquid reagents in a (normally) simple reactor.
Although this is difficult enough, in the drug industry one tries
to predict the performance of a potential drug molecule in a
living body with its multitude of feedback loops and wide
variety of different environments. This is a problem of
formidable complexity. We are thus optimistic that, in spite
of the problems related to the computer representation of
solids and the non-validity of the similar property principle,
there is much room for software assisted strategies in the
discovery of solid catalysts and other materials.
References
[1] a) X. D. Xiang, X. Sun, G. Briceno, Y. Lou, K.-A. Wang, H.
Chang, W. G. Wallace-Freedman, S.-W. Chen, P. G. Schultz,
Science 1995, 268, 1738–1740; b) D. E. Akporiaye, I. M.
Dahl, A. Karlsson, R. Wendelbo, Angew. Chem. 1998, 110,
629–631; Angew. Chem. Int. Ed. 1998, 37, 609–611; c) E.
Reddington, A. Sapienza, B. Gurau, R. Viswanathan, S.
Sarangapani, E. S. Smotkin, T. E. Mallouk, Science 1998, 280,
1735–1737; d) J. Klein, C. W. Lehmann, H.-W. Schmidt, W. F.
Maier, Angew. Chem. 1998, 110, 3557–3561; Angew. Chem.
Int. Ed. 1998, 37, 3369–3372; e) C. Hoffmann, A. Wolf, F.
Schüth, Angew. Chem. 1999, 111, 2971–2975; Angew. Chem.
Int. Ed. 1999, 38, 2800–2803; f) T. Johann, A. Brenner, M.
Schickardi, O. Busch, F. Marlow, S. Schunk, F. Schüth, Angew.
Chem. 2002, 114, 3096–3100; Angew. Chem. Int. Ed. 2002, 41,
2966–2968.
[2] a) C. Hoffmann, A. Wolf, F. Schüth, Angew. Chem. 1999, 111,
2971–2975; Angew. Chem. Int. Ed. 1999, 38, 2800–2803; b) C.
Hoffmann, H.-W. Schmidt, F. Schüth, J. Catal. 2001, 198,
348–354; c) P. Claus, D. Hönicke, T. Zech, Catal. Today 2001,
67, 319–339; d) S. Thomson, C. Hoffmann, S. Ruthe, H.-W.
Schmidt, F. Schüth, Appl. Catal., A 2001, 220, 253–264; e) U.
Rodemerck, P. Ignaszewski, M. Lucas, P. Claus, M. Baerns,
Top. Catal. 2000, 13, 249–252.
[3] a) F. C. Moates, M. Somani, J. Annamalai, J. T. Richardson,
D. Luss, R. C. Willson, Ind. Eng. Chem. Res. 1996, 35,
4801–4803; b) A. Holzwarth, H.-W. Schmidt, W. F. Maier, Angew.
Chem. 1998, 110, 2788–2792; Angew. Chem. Int. Ed. 1998, 37,
2644–2647; c) S. M. Senkan, Nature 1998, 394, 350–353; d) E.
Reddington, A. Sapienza, B. Gurau, R. Viswanathan, S.
Sarangapani, E. S. Smotkin, T. E. Mallouk, Science 1998, 280,
1735–1737; e) P. Cong, R. D. Doolen, Q. Fan, D. M. Giaquinta, S. Guan, E. W. McFarland, D. M. Pooray, K. Self,
H. W. Turner, W. H. Weinberg, Angew. Chem. 1999, 111,
507–512; Angew. Chem. Int. Ed. 1999, 38, 483–488; f) S. M.
Senkan, S. Ozturk, Angew. Chem. 1999, 111, 867–871;
Angew. Chem. Int. Ed. 1999, 38, 791–795; g) M. Orschel, J.
Klein, H.-W. Schmidt, W. F. Maier, Angew. Chem. 1999, 111,
2961–2965; Angew. Chem. Int. Ed. 1999, 38, 2791–2794;
h) H. Su, E. S. Yeung, J. Am. Chem. Soc. 2000, 122,
7422–7423; i) C. M. Snively, G. Okarsdottir, J. Lauterbach, Angew.
Chem. 2001, 113, 3117–3120; Angew. Chem. Int. Ed. 2001, 40,
3028–3030; j) T. Johann, A. Brenner, M. Schickardi, O.
Busch, F. Marlow, S. Schunk, F. Schüth, Angew. Chem. 2002,
114, 3096–3100; Angew. Chem. Int. Ed. 2002, 41, 2966–2968;
k) O. Busch, C. Hoffmann, T. Johann, H.-W. Schmidt, W.
Strehlau, F. Schüth, J. Am. Chem. Soc., in print.
[4] a) B. Jandeleit, D. J. Schaefer, T. S. Powers, H. W. Turner,
W. H. Weinberg, Angew. Chem. 1999, 111, 2648–2689;
Angew. Chem. Int. Ed. 1999, 38, 2494–2532; b) S. Senkan,
Angew. Chem. 2001, 113, 322–341; Angew. Chem. Int. Ed.
2001, 40, 312–329; c) M. T. Reetz, Angew. Chem. 2001, 113,
292–320; Angew. Chem. Int. Ed. 2001, 40, 284–310; d) J. M.
Newsam, F. Schüth, Biotechnol. Bioeng. 1999, 61, 203–216;
e) F. Gennari, P. Seneci, S. Miertus, Catal. Rev. – Sci. Eng.
2000, 42, 385–402.
[5] M. Jansen, Angew. Chem. 2002, 114, 3896–3917; Angew.
Chem. Int. Ed. 2002, 41, 3747–3766.
[6] D. Farrusseng, L. Baumes, C. Hayaud, I. Vauthey, P. Denton,
C. Mirodatos in Principles and Methods for Accelerated
Catalyst Design (Eds.: E. Derouane et al.) NATO Science
Series, Kluwer Academic Publishers, Dordrecht, 2002,
p. 469–479.
[7] J. P. Pirard, B. Kalitvenzeff, Ind. Eng. Chem. Fundam. 1978,
17, 11–17.
[8] M. Iborra, J. F. Izquierdo, F. Cunill, J. Tejero, Ind. Eng. Chem.
Res. 1992, 31, 1840–1848.
[9] P. Rao, S. Divakar, World J. Microbiol. Biotech. 2002, 18,
341–345.
[10] a) E. Körting, M. Baerns, Chem. Ing. Tech. 1990, 62, 365–372;
b) M. Baerns, N. Guan, E. Körting, U. Lindner, M. Lohrengel,
H. Papp, Int. J. Energy Res. 1994, 18, 197–204.
[11] T. Hattori, S. Kito, Catal. Today 1995, 23, 347–355.
[12] S. Kito, T. Hattori, Y. Murakami, Appl. Catal., A 1994, 114,
L173–178.
[13] S. Kito, T. Hattori, Y. Murakami, Ind. Eng. Chem. Res. 1992,
31, 979–981.
[14] A. Corma, J. M. Serra, E. Argente, V. Botti, S. Valero, Chem.
Phys. Chem. 2002, 3, 939–945.
[15] D. Wolf, O. V. Buyevskaya, M. Baerns, Appl. Catal., A 2000,
200, 63–77.
[16] U. Rodemerck, D. Wolf, O. V. Buyevskaya, P. Claus, S. M.
Senkan, M. Baerns, Chem. Eng. J. 2002, 82, 3–11.
[17] From Eurocombicat 2002: a) A. Corma, J. M. Serra, A. Chica,
Book of Abstracts p. 47–48; b) G. Grubert, S. Kolf, L.
Cholinska, M. Baerns, P. van Geem, R. Parton, Book of
Abstracts p. 49–51; c) J. M. Serra, A. Corma, D. Farrusseng,
L. Baumes, C. Mirodatos, C. Flego, C. Perego, Book of
Abstracts p. 98–99.
[18] H. Kubinyi, Curr. Opin. Drug Discovery. Dev. 1998, 1, 16–27.
[19] For instance: a) R. Ramos, M. Menendez, J. Santamaria,
Catal. Today 2000, 56, 239–245; b) M. Nele, A. Vidal, D. L.
Bhering, J. C. Pinto, V. M. M. Salim, Appl. Catal., A 1999, 178,
177–189; c) E. A. Dawson, P. A. Barnes, Appl. Catal., A 1992,
90, 217–231.
[20] J. Bajorath, J. Chem. Inf. Comput. Sci. 2001, 41, 233–245.
[21] D. K. Agrafiotis, J. C. Myslik, F. R. Salemme, Mol. Diversity
1999, 4, 1–22.
[22] G. M. Maggiora, M. A. Johnson, Concepts and Applications
of Molecular Similarity, Wiley, New York, 1990, p. 99–117.
[23] R. D. Brown, Y. C. Martin, J. Chem. Inf. Comput. Sci. 1996,
36, 572–584.
[24] R. Todeschini, V. Consonni, Handbook of Molecular Descriptors, Wiley-VCH, Weinheim, 2000.
[25] For a good discussion of similarity searching see, for instance:
P. Willett, J. M. Barnard, G. M. Downs, J. Chem. Inf. Comput.
Sci. 1998, 38, 983–996.
[26] J. C. Schön, M. Jansen, Angew. Chem. 1996, 108, 1358–1377;
Angew. Chem. Int. Ed. 1996, 35, 1286–1304.
[27] R. Schlögl, Angew. Chem. 1998, 110, 2467–2470; Angew.
Chem. Int. Ed. 1998, 37, 2333–2336.
[28] H. Schäfer, Angew. Chem. 1971, 83, 35–42; Angew. Chem.
Int. Ed. 1971, 10, 43–50.
[29] M. Haruta, N. Yamada, T. Kobayashi, S. Iijima, J. Catal. 1989,
115, 301–309.
[30] See, for instance, several examples in: G. Ertl, H. Knözinger,
J. Weitkamp (Eds.) Handbook of Heterogeneous Catalysis,
Wiley-VCH, Weinheim, 1997.
[31] G. M. Downs, P. Willett, W. Fisanick, J. Chem. Inf. Comput.
Sci. 1994, 34, 1094–1102.
[32] R. D. Brown, Y. C. Martin, J. Chem. Inf. Comput. Sci. 1996,
36, 572–584.
[33] J. N. Weinstein, K. W. Kohn, M. R. Grever, V. N. Viswanadhan, L. V. Rubinstein, A. P. Monks, D. A. Scudiero, L. Welch,
A. D. Koutsoukos, A. J. Chiausa, K. D. Paull, Science 1992,
258, 447–451.
[34] L. M. Kauvar, D. L. Higgins, H. O. Villar, J. R. Sportsman, A.
Engvist-Goldstein, R. Bukar, K. E. Bauer, H. Dilley, D. M.
Rocke, Chem. Biol. 1995, 2, 107–118.
[35] H. Briem, U. F. Lessel, Perspect. Drug Discovery Des. 2000,
20, 231–244.
[36] a) W.-D. Ihlenfeldt, J. Gasteiger, Angew. Chem. 1995, 107,
2807–2829; Angew. Chem. Int. Ed. 1995, 34, 2613–2633; b) J.
Gasteiger, M. Pförtner, M. Sitzmann, R. Höllering, O. Sacher,
T. Kostka, N. Karg, Perspect. Drug Discovery Des. 2000, 20,
1–21.
Received on May 15, 2003; Accepted on May 8, 2003
© 2003 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
QSAR Comb. Sci. 22 (2003)