Comparing Merged Markush Service (MMS) and Marpat Search Results: Two Case Studies

ACS Symposium – Challenges In Structure Searching
April 8, 2007
Joe Terlizzi
Questel, Inc.
jterlizzi@questel.com
Background
• In October 2007, the JFA pharmaceutical group, a
group of professional searchers from the
pharmaceutical industry in Japan, asked me to
compare two chemical structures searched in both
MMS and Marpat. They had received inconclusive
results when they ran the searches.
• The following two case studies conducted at that time
illustrate many of the similarities and differences in the
two systems. They show my search procedure,
results, and conclusions.
• The Merged Markush Service (MMS), jointly produced
by Thomson and the French Patent Office (INPI), is a
database containing both Markush structures and
specific compounds from patents. It is based on the
Markush DARC system, and the service is hosted
exclusively on Questel.
• Marpat, produced by Chemical Abstracts Service and
available only on STN, contains Markush structures from
patents. It can be searched with the REGISTRY file (for
specific structures) using the CASLINK cluster on STN.
• Both MMS and Marpat are usually recommended for a
basic chemical structure search strategy covering
Markush structures in patents.
• Derwent’s Chemical Fragmentation Code system
(only available to Derwent Subscribers) can also yield
unique answers, but since it is not a graphical system
and was not requested by the JFA, it was not used for
this study.
Comparing MMS and Marpat
CASE 1:
The following chemical structure is from an EP patent
document published in 1987. In a JFA study, this document
was retrieved from MMS. It was not retrieved in Marpat.
Why not?
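[Structure diagram: ring system bearing =O and COOH groups, with variable
ring positions X and Y, three N atoms, and substituents R and R1, defined below]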
X: N or CR2
Y: N or CR2
R: C1-5 alkyl, C3-6 cycloalkyl
R1: halogen, amino, -N=CH-R2
R2: independently H, halogen, hydroxyl,
C1-5 straight or branched alkyl, or an optionally
substituted aromatic or heteroaromatic residue
Case 1: The query was created in MMS in the following way:
[Query diagram: the structure above drawn as G0, with X = G1 and Y = G2]
[Query diagram: R = G3 and R1 = G4]
A free site was put on the carbon in G1, G2 and G4 to cover R2
In MMS, the AA search resulted in 19 answers and no RX
candidates:
 1  CN = 87060014-01
 2  CN = 90010268-01
 3  CN = 97085103-01
 4  CN = 8722-02601
 5  CN = 8751-17501
 6  CN = 9004-11601
 7  CN = 9044-07401
 8  CN = 9048-37001
 9  CN = 9048-43801
10  CN = 9144-00701
11  CN = 9631-16901
12  CN = RAITC9
13  CN = RAITCA
14  CN = RAITCM
15  CN = RAITCN
16  CN = RAITCP
17  CN = RAITD3
18  CN = RAITD8
19  CN = RAJLWV
The first CN listed corresponds to the EP document:
AN   - 87060014
CN   - 87060014-01-N; 87060014-01-T
PN   - EP224121 - 19870603 [EP-224121]
AP   - EP86115667 19861112 [1986EP-0115667]
PR   - IT2288785 19851119 [1985IT-0022887]
       IT2288885 19851119 [1985IT-0022888]
RL   - US4758567 - 19880719 [US4758567]
PA   - RORER ITALIANA S.p.A. / Via Valosa di Sopra, 9 / I-20052 S.Fruttuoso di Monza (Milan) (IT)
       ROTTAPHARM S.p.A. / Via Dandolo, 4 / I-21100 Varese (IT) (Updated 8825)
IC1  - C07D-215/56
IC2  - C07D-471/04; A61K-031/47; A61K-031/53; A61K-031/495
ET   - 7- 4-amino-piperazinyl - or 7- 4-chloro-piperazinyl quinolinone and
       azaquinolinone derivatives, a process for the preparation thereof and
       pharmaceutical compositions containing them.
EAB  - 4-oxo-7-piperazino-(quinoline or azaquinoline)-3-carboxylic acid
       derivatives. Process of preparation thereof. These compounds are
       antibacterial agents
PHCN - 11 : INFECTION
       08 : NEPHROLOGY, UROLOGY
CASE 1 results in MMS:
• There were 12 unique patent records
retrieved – 3 from PHARM and 10 from
DWPI, with 1 overlap.
• The most recent record was US7256187
from August 2007.
• The EP record from the JFA study was
only in PHARM, since it was indexed in
the BACKF segment (Backfile).
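The record count above (3 from PHARM, 10 from DWPI, 1 overlap, 12 unique) is a set union in which the shared record is counted once. A minimal Python sketch of that arithmetic follows. The record identifiers are made-up placeholders, not the records actually retrieved. Only the counts come from the slide.

```python
# Placeholder identifiers (hypothetical); only the counts 3, 10 and 1 come from the slide.
pharm = {"rec_shared", "pharm_only_1", "pharm_only_2"}
dwpi = {"rec_shared"} | {f"dwpi_only_{i}" for i in range(1, 10)}


def unique_record_count(a: set, b: set) -> int:
    """Count records retrieved from either segment, counting overlaps once."""
    return len(a | b)


total = unique_record_count(pharm, dwpi)
print(len(pharm), len(dwpi), len(pharm & dwpi), total)  # 3 10 1 12
```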
The query was created for Marpat:
[Query diagram: core bearing =O and CO2H groups, with variable groups
G1, G2, and G3 and the generic nodes Ak, Cb, and X]
• There were not many differences in creating this query in
MMS and Marpat. Some differences were:
• G1 could be repeated in Marpat; you cannot repeat a G group in
MMS.
• In Marpat, rings default to possible non-hydrogen substitution (see next
slide), whereas in MMS no free sites were placed on the rings.
[Marpat query diagram with ring nodes annotated: the default is for
non-hydrogen attachments.]
Searchers can choose to override defaults in Marpat.
Some other differences in this query in MMS and Marpat were:
• Atom/Class in Marpat (translation in MMS) was set to CLASS
(equivalent to BT in MMS) on G group substituents. The MMS
structure defaults to equal translation.
[Screenshot: Atom/Class Match Level]
Case 1 Results using CASLINK (Marpat/Registry/MarpatPrev):
• CASLINK search had 25 results.
• There were unique patent results in both MMS and Marpat for
this structure.
• The most recent patent, US7256187, was only in MMS.
• The EP patent missing from JFA study was found in Marpat!
Why didn’t the JFA study retrieve this result in
Marpat?
The JFA query used for Marpat was not as broad as
my query; therefore the EP patent was not
retrieved.
What were the possible differences in my query and
the JFA’s?
• Not sure, but it could have been that my query
allowed for substitution on the rings.
Why the difference between the Marpat and MMS results?
• Because open substitution made the Marpat query broader
than the MMS query, Marpat returned more results.
CASE 1 Conclusions:
• Default levels in both systems must always be
taken into account. Atom/Class in Marpat
corresponds to Translation Level in MMS. The defaults
differ considerably between the two systems, with Marpat
having the broader search defaults.
• Non-hydrogen attachment defaults (Marpat) and
free sites (MMS) must also be taken into account.
• Unique results are often obtained in each
system.
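To make the effect of match level concrete, here is a toy Python sketch. It assumes, as a simplification, that class-level matching lets a specific query group also match a generic term that covers it in the indexed Markush structure. The `GENERIC_CLASS` table and the `node_matches` function are hypothetical illustrations, not the matching algorithm of Marpat or MMS.

```python
# Toy model only: real Marpat/MMS matching is far more involved.
# Hypothetical mapping from a specific group to the generic term that covers it.
GENERIC_CLASS = {"methyl": "alkyl", "ethyl": "alkyl", "phenyl": "carbocycle"}


def node_matches(query_group: str, file_group: str, match_level: str) -> bool:
    """Return True if a specific query group matches a group indexed in the file.

    ATOM:  only an identical specific group matches.
    CLASS: a generic term covering the query group matches as well.
    """
    if query_group == file_group:
        return True
    if match_level == "CLASS":
        return GENERIC_CLASS.get(query_group) == file_group
    return False


indexed = ["methyl", "alkyl", "phenyl"]
for level in ("ATOM", "CLASS"):
    hits = [g for g in indexed if node_matches("methyl", g, level)]
    print(level, hits)
# ATOM ['methyl']
# CLASS ['methyl', 'alkyl']
```

In this simplified picture the broader setting can only add answers, which is consistent with the broader Marpat defaults producing more hits in Case 1.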
CASE 2:
The following structure is from a US patent document published in
1990. Is there unique retrieval in MMS or Marpat in a freedom-to-operate
search? (Do not take into account specific structures in
MMS or differences in patent coverage.)
[Structure diagram: core with variable positions X, Y, and Z and
substituents R1, R2, and R3, defined below]
X: N or CH
Y: O, S, or NH
Z: O, S, or NH
R1: an unsubstituted carbocyclic or heterocyclic aromatic
group, or a carbocyclic or heterocyclic aromatic group
substituted with at least one lower alkyl, lower alkoxy,
halogen, lower alkylthio or nitro group
R2: C1-3 alkyl
R3: C1-3 alkyl
or R2 together with R3 may form a heterocyclic ring,
which includes at least one heteroatom selected from
O, N, or S.
Case 2 Using MMS
[Query diagram: the original query, with X = G1]
[Query diagram: Y = G2 and Z = G3]
G2 and G3 are identical.
[Query diagram: R1 = G4]
5 free sites have been applied to the superatoms.
R2 and R3 are substituted with free sites.
Because of MMS tautomer rules, unspecified bonds are applied.
CASE 2 RESULTS
There was 1 answer in MMS; it was the 1990 US patent.
AN - 1990-375408 [50]
TI - New penta:cyclic furo:benzoxazine derivs. having antimicrobial
     and antitumour activity and useful as intermediates to known
     tetra:cyclic antitumour cpds.
PN - US4973693 A 19901127 DW1990-50 Eng *
     AP: 1989US-0401746 19890901
   - JP03223292 A 19911002 DW1991-46 Jpn
     AP: 1990JP-0231832 19900901
AP - 1989US-0401746 19890901; 1990JP-0231832 19900901
PR - 1989US-0401746 19890901
PA - (FUJI) FUJISAWA PHARM CO LTD
   - (RIPT) RI PATENTS INC
   - (RICV) UNIV RICE
CN - 9050-34901-N 9050-34902-N
Case 2 Using Marpat
• The query was drawn similarly in Marpat, but there was one
problem. G groups in Marpat cannot be attached to more than
2 nodes, and the system would not prepare the query. MMS
allows attachments to more than two nodes.
• Since X has three attachments in the original query, a different
solution had to be sought.
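The attachment limit described above can be checked before a query is submitted. The sketch below is an illustration only: the function name, the bond-list representation, and the node labels are hypothetical and are not part of STN or Questel software. It counts the distinct neighbors of each G group and reports any that exceed the two-node limit stated in the text.

```python
from collections import defaultdict

MAX_G_ATTACHMENTS = 2  # limit described in the text for Marpat G groups


def over_attached_g_groups(bonds, g_nodes, limit=MAX_G_ATTACHMENTS):
    """Return {g_node: neighbor_count} for G groups attached to more than `limit` nodes."""
    neighbors = defaultdict(set)
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {g: len(neighbors[g]) for g in g_nodes if len(neighbors[g]) > limit}


# Hypothetical connection table: X (drawn as G1) has three neighbors, Y (G2) has two.
bonds = [("G1", "C1"), ("G1", "C2"), ("G1", "R3"), ("G2", "C1"), ("G2", "C3")]
print(over_attached_g_groups(bonds, ["G1", "G2"]))  # {'G1': 3}
```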
• The following query was searched using CASLINK.
• The A atom (any atom except H) was substituted for the G
group. This allowed for a three-node attachment.
• The Cy variable was used for cyclic substitution at R1.
• All nodes were open for substitution.
[Query diagram: Cy at R1, A in place of X, and two G2 groups (S, N, or O)]
• Another problem – This query had too many iterations in
Marpat and would not run. The system limit was exceeded.

S L2 SSS SAM FILE=MARPAT
SAMPLE SEARCH INITIATED 16:22:46 FILE 'MARPAT'
SAMPLE SCREEN SEARCH COMPLETED    8566 TO ITERATE
23.3% PROCESSED    2000 ITERATIONS    0 ANSWERS
INCOMPLETE SEARCH (SYSTEM LIMIT EXCEEDED)
SEARCH TIME: 00.00.02

FULL FILE PROJECTIONS:  ONLINE  **INCOMPLETE**
                        BATCH   **INCOMPLETE**
PROJECTED ITERATIONS:   167698 TO 174942
PROJECTED ANSWERS:      0 TO 0
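The figures in the sample search output are consistent with the percentage being the iterations completed divided by the candidates left to iterate after screening: the search stopped at 2000 of 8566. The short Python check below rests on that reading, which is inferred from the numbers rather than taken from STN documentation. The second call uses the figures reported for the full search later in this case.

```python
def percent_processed(iterations_done: int, to_iterate: int) -> float:
    """Share of screened candidates that were iterated, as a percentage (one decimal)."""
    return round(100 * iterations_done / to_iterate, 1)


print(percent_processed(2000, 8566))  # 23.3  (sample search, stopped at the system limit)
print(percent_processed(2868, 2868))  # 100.0 (full search that completed)
```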
• Replacing A with C and with N, with the intention of running
two queries, also exceeded the system limits!
• Another tactic was tried. R2 and R3 were substituted
with Ak (alkyl). I received the following message:

[Query diagram: nodes Cy, A, G1, two Ak, and two G2, with
G2 = [@1-@2],[@3-@4],[@5-@6] defined over the numbered N, S, and O fragments]

STRUCTURE TOO LARGE - SEARCH ENDED
A structure in your query is too large. You may delete
attributes or atoms to reduce the size of the structure
and try again.
• The query shown was entered:
[Query diagram: Cy, A, and two G2 groups (S, N, or O)]
• The bonds coming off the fused ring were denoted as
ring bonds.
• Match level was ATOM for all nodes except the Cy
variable, which was CLASS.
• This query ran!
• The full search had 1 answer: the US patent from 1990!
• No other answers were obtained in Marpat.

=> sss l1 full
FULL SEARCH INITIATED 17:13:38 FILE 'MARPAT'
FULL SCREEN SEARCH COMPLETED    2868 TO ITERATE
100.0% PROCESSED    2868 ITERATIONS    1 ANSWERS
SEARCH TIME: 00.00.06
L3    1 SEA SSS FUL L1

=> dis bib
L3   ANSWER 1 OF 1 MARPAT COPYRIGHT 2007 ACS on STN
AN   114:164261 MARPAT Full-text
TI   Preparation of novel pentacyclic compounds as antimicrobial and
     antitumor agents
IN   Goto, Shunsuke; Fukuyama, Tohru
PA   Japan
SO   U.S., 11 pp.
     CODEN: USXXAM
DT   Patent
LA   English
FAN.CNT 1
     PATENT NO.    KIND  DATE      APPLICATION NO.  DATE
     ------------  ----  --------  ---------------  --------
PI   US 4973693    A     19901127  US 1989-401746   19890901
     JP 03223292   A     19911002  JP 1990-231832   19900901
PRAI US 1989-401746      19890901
CASE 2 Conclusions:
• Is there unique retrieval for this case in Marpat or
MMS? No.
• Complex queries may not always run in MMS or Marpat
but can usually be adjusted to run.
• Features such as bond attributes, unspecified bonds,
and unspecified atoms can often help a structure
to process.
• Speed and number of iterations will be different in each
system.
Summary:
• Know the defaults in both MMS and Marpat, especially
translation and free sites (MMS) and class level and
non-hydrogen attachment (Marpat) defaults
• Familiarize yourself with bond normalization rules in
both systems – they are different!
• Learn the unique features of both systems (ring
isolation, subset searching, JOIN command, etc.)
• Use both systems for a comprehensive search!
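When answers from both systems are combined, patent numbers first need a common form: the Case 2 records show US4973693 in MMS and US 4973693 in Marpat. The Python sketch below is one possible way to do this and is not a feature of either service. The helpers `normalize_pn` and `compare_answers` are hypothetical names. It normalizes the numbers and then splits them into shared and unique answers.

```python
def normalize_pn(pn: str) -> str:
    """Remove whitespace and uppercase, so 'US 4973693' equals 'US4973693'."""
    return "".join(pn.split()).upper()


def compare_answers(mms: list, marpat: list) -> dict:
    """Split normalized patent numbers into shared and system-unique answers."""
    a = {normalize_pn(p) for p in mms}
    b = {normalize_pn(p) for p in marpat}
    return {"both": a & b, "mms_only": a - b, "marpat_only": b - a}


# Patent numbers as printed in the Case 2 records of each system.
result = compare_answers(["US4973693", "JP03223292"], ["US 4973693", "JP 03223292"])
for key in ("both", "mms_only", "marpat_only"):
    print(key, sorted(result[key]))
```

For Case 2 both "only" sets come out empty, matching the conclusion that neither system had unique retrieval.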
HELP
MMS Documentation and Information:
www.questel.com/mms
Marpat Documentation and Information:
www.cas.org
Questel Help Desk:
help@questel.com
STN Help Desk:
help@cas.org
Finally, many thanks to:
Sandy Burcham (Service Is Our Business)
Judy Philipsen (Philipsen Search Services)
Kyoko Kaji (Pfizer) for allowing me to use these JFA examples
Thank You!