Describing hydrogen-bonded structures; topology graphs, nodal symbols and connectivity tables, exemplified by five polymorphs of each of sulfathiazole and sulfapyridine

Background: Structural systematics is the comparison of sets of chemically related crystal structures with the aim of establishing and describing relevant similarities and relationships. An important topic in this context is the comparison of hydrogen-bonded structures (HBSs) and their representation by suitable descriptors.

Results: Three different description methods for HBSs are proposed: a graphical representation, a symbolic representation and connectivity tables. The most comprehensive description is provided by a modified graph of the underlying net topology of an HBS which contains information on the multiplicity of links, the directionality and chemical connectivity of hydrogen bonds and on symmetry relations. By contrast, the alternative symbolic representation is restricted to essential properties of an HBS, i.e. its dimensionality, topology type and selected connectivity characteristics of nodes. A comparison of their connectivity tables readily identifies differences and similarities between crystal structures with respect to the intermolecular interaction modes adopted by their functional groups. The application of these methods to the known polymorphs of sulfathiazole and sulfapyridine is demonstrated, and it is shown that they enable the rationalisation of previously reported and intricate relationships.

Conclusions: The proposed methods facilitate the comprehensive description of the most relevant aspects of an HBS, including its chemical connectivity, net topology and symmetry characteristics, and they represent a new way to recognise similarities and relationships in organic crystal structures.

Graphical abstract: Graphical representation of the mixing of structures Stz-IV and Stz-V to give structure Stz-III.

Electronic supplementary material: The online version of this article (doi:10.1186/s13065-014-0076-x) contains supplementary material, which is available to authorized users.


Background
In crystallographic studies, the structural systematics approach is used to increase our knowledge and understanding of the assembly of organic molecules into crystal structures [1][2][3][4][5][6][7][8][9][10]. Such investigations are carried out on polymorphs, solvates, salts and molecular complexes, in which a particular molecule can occur in different crystal structure environments, but also on families of compounds whose molecular structures are very closely related through small but systematic modifications to a parent molecule.
As the forces acting during the assembly of molecules into crystal structures are diverse, they should be considered in their entirety in any assessment. Consequently, the search for packing similarities, based only on geometrical considerations, has to be the cornerstone of any strategy for the comparison of groups of structures, and the XPac software [11] was developed in our laboratory for this purpose. However, structural patterns often reflect the presence of directed intermolecular interactions, exemplified by hydrogen bonding between conventional [12] donor and acceptor groups. The identification, description and comparison of such patterns could provide valuable pointers for progress in the area of crystal structure design and crystal growth. Even though geometrically similar structure patterns associated with hydrogen bonding are regularly identified as an integral part of an XPac study, the most fundamental property of a hydrogen-bonded structure (HBS) is its specific mode of intermolecular connections, and two molecular packing arrangements which agree in this characteristic are not necessarily also geometrically similar. Accordingly, a further strategy for identifying and describing structural similarities is required which enables the systematic comparison of different crystal structures with respect to their HBSs. Several useful methods for the description of certain aspects of an HBS have been proposed in the past, but none of these provides a comprehensive picture or is particularly suited for the structural systematics approach.

Hydrogen-bonded structures: some considerations
Methodologies for describing networks in crystal structures of organic compounds which are based on intermolecular interactions have been discussed frequently over many years. Indeed, this is a sub-topic in an area of much wider scope, interest and activity, which also concerns the topology of network structures in elemental solid forms, in simple mixed inorganic solids (silicates, zeolites and the like) and, more recently, in metal-organic frameworks (MOFs). Palin and Powell [13] first described an organic crystal as a network with molecules as nodes, linked by H-bonds. Wells further explored this idea, initially in tandem with his descriptions of inorganic solid state structures [14] and subsequently in more focussed studies [15], and developed a classification scheme based on molecules as single points, with connecting H-bonds as lines. Kuleshova and Zorky [16] proposed a symbolic graphical description which is based on the essential unit of the underlying net of the HBS. These authors introduced the descriptor $G_m^n(k)$, where the symbol G corresponds to the dimensionality of the HBS as either an island (i.e. finite cluster, I), chain (C), layer (L) or framework (F). The parameters n and m were originally defined by Wells [15], with n being the number of intermolecular H-bonds formed by a molecule and m the number of molecules to which the latter is joined, while k denotes the size of the essential ring of the net (for the whole crystal, the ratio between the number of H-bonds and the number of molecules is n/2).
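The counting behind the parameters n and m can be illustrated with a minimal sketch. The input format (a list of donor/acceptor molecule labels), the function name and the example environment are hypothetical and serve only to show how the two numbers differ when a neighbour is joined by more than one H-bond.

```python
from collections import Counter


def wells_parameters(hbonds, molecule):
    """Return (n, m) for one molecule.

    hbonds   -- iterable of (donor_molecule, acceptor_molecule) label pairs,
                one pair per intermolecular H-bond (hypothetical input format)
    molecule -- label of the molecule of interest
    """
    partners = Counter()
    for donor, acceptor in hbonds:
        if donor == acceptor:
            continue  # intramolecular bonds are not counted
        if donor == molecule:
            partners[acceptor] += 1
        elif acceptor == molecule:
            partners[donor] += 1
    n = sum(partners.values())  # H-bonds formed by the molecule
    m = len(partners)           # distinct molecules it is joined to
    return n, m


# Hypothetical environment of molecule "M0": a two-point connection to "M1"
# and one-point connections to "M2" and "M3".
example = [("M0", "M1"), ("M1", "M0"), ("M2", "M0"), ("M0", "M3")]
n, m = wells_parameters(example, "M0")

# If every molecule in the crystal has the same n, each H-bond is shared by
# two molecules, so there are n/2 H-bonds per molecule.
bonds_per_molecule = n / 2
```

Here n = 4 and m = 3, because the two-point connection to "M1" contributes two H-bonds but only one neighbour.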
The link between molecular networks and the classical infinite structures of inorganic mineral types became very clear when topologies of both types were compared, and the same network nomenclature was shown to be relevant for their classification [17]. In their 2005 monograph on networks in molecule-based materials, Öhrström and Larsson reviewed the terminology, which is largely still in use today, and gave a summary of the developmental thinking [18]. More recent work has focussed on enhanced software for analysing and producing graphical representations of networks, exemplified by the program TOPOS, developed by Blatov and collaborators [19,20], which is based on the Voronoi polyhedron partitioning approach to identify intermolecular contacts. The most recent developments have included capabilities to represent packing geometries also for molecular crystals which are not necessarily dependent on hydrogen bonding [21][22][23]. Here, the connection of molecular nodes, based on positive Voronoi contact, is used to define the type of net.
A method for the representation of the more local characteristics of an HBS was proposed by Etter [24,25], who implicitly considered the actual chemistry behind the H-bonds, that is, which functional groups are bonded to which others. This led to a number of papers, also by other authors (e.g. Bernstein [26,27]), in which a graph-set approach was used to describe HBSs. This methodology has been widely adopted, in particular for the description of sub-components of HBSs, such as rings and chains. Due to its very specific nature, this nomenclature has somewhat limited value for comparisons. For example, the symbol $R_2^2(8)$ describes a ring which is closed by two pairs of functional groups, and the 8 identifies the total number of atoms in the ring. By contrast, in the area of "nodal networks" the size of the ring is not significant, since topologically these atoms are mainly spacers in a system in which a node (molecule) is linked to another node via two connectors, usually of the donor-acceptor type. Analogous molecules whose donor-acceptor connectors are separated by a different number of atoms may nevertheless form HBSs of the same topology.
In 1997 Desiraju [28] revisited some of the ideas quoted above, and also the work of Robertson [29], including the use of nodes and networks to describe packing and H-bonding in crystal structures, and suggested that the node connections were of greater significance than the nodes themselves. The possibilities offered by this approach and other methods cited above were subsequently explored by one of us [30]. The aim of the present work is the definition of a set of detailed, informative and useful descriptors for comparing HBSs, which answer the questions listed below.
1. For a molecule involved in hydrogen bonding, which donor(s) are connected to which acceptor(s)?
2. What are the symmetry relationships between connected molecules?
3. What is/are the most informative way/s to represent the type and topology of the resulting array of connected molecules?
First, three different description methods for HBSs (graphical representation, symbolic representation and connectivity table) will be described. These methods will then be applied to the polymorphs of two closely related chemical compounds, sulfathiazole and sulfapyridine. The results obtained will be discussed in the context of both previous studies and alternative HBS description methods.

Results
Methods for the representation of an HBS
a) Graphical representation
Conventional hydrogen bonds [12], D − H•••A, are reliably formed between molecules with suitable functional groups that can serve as H-bond donors (D) and acceptors (A). In general, different sets of H/A combinations are possible, depending on the number of hydrogen atoms (H) that can be donated and the number of available acceptor sites. Each set of H/A combinations can lead to a variety of distinct HBSs, which are either finite (islands) or periodic in 1, 2 or 3 dimensions (chains, layers, frameworks). A suitable representation method should convey a maximum of information about an individual HBS and, at the same time, enable a comparison with other HBSs that are formed by the same molecule or by closely related molecules.
The underlying topology of an HBS is described by a net composed of nodes representing molecules and links representing intermolecular connections by D − H•••A bonds. Using the TOPOS software [31,32], a diagram of the net is readily obtained and its topology can be determined. The type of the net is denoted by the three-letter RCSR (Reticular Chemistry Structure Resource) symbol [33] or, in the case of a novel topology, its point symbol [34] can be used instead. The topological net of an HBS exhibits the following additional and important characteristics:
1. it usually contains more than one crystallographically independent type of link;
2. a link can represent a one-point or multiple-point connection, i.e. two molecules are connected to one another by a single D − H•••A interaction or by multiple H-bonds;
3. a link between two chemically identical molecules can be associated with a crystallographic symmetry operation; in the case of a Z' > 1 structure, the two H-bonded molecules can display a handedness relationship and possibly also a local symmetry or a pseudo-symmetry relationship;
4. the H-bonds which define the links possess a chemical identity, i.e. links are associated with specific H/A combinations;
5. each H-bond possesses directionality, i.e. H → A.
Therefore, a comprehensive representation of an HBS can be achieved with a modified diagram of the topological net containing the following additional features:
1. the RCSR symbol or the point symbol of the net;
2. crystallographically independent molecules are represented as nodes of different colour;
3. individual H-bonds are indicated by arrows (H → A) placed next to a link;
4. the underlying H/A combination(s) and a symbol for the associated symmetry element (or handedness relationship) are given for each link in the legend of the diagram.
Crystallographic symmetry elements are indicated by their printed symbols as defined in the International Tables for Crystallography [35]. Molecular conformations are relevant when polymorphs are compared, specifically the possible occurrence of molecular chirality. The latter can be real; it can be conformational, i.e. the result of conformational restrictions; or it can be "pseudo-chirality", which arises when fundamentally achiral molecules adopt rigid conformations that are "frozen" in the solid state. Although pseudo-chirality is generally of no importance chemically, it is of considerable importance in crystal structure pattern descriptions. For a Z' = 1 structure, this type of conformational relationship is inherent in the crystallographic symmetry elements. For connections between chemically identical but crystallographically distinct molecules, a plus symbol (+) indicates that the latter have the same handedness and a minus (−) denotes that they are of the opposite handedness. Alternatively, the relevant symbols for known (local) pseudo-symmetry elements, enclosed in brackets, may be given. A cross (×) is used if no such relationship can be identified, in particular for connections between chemically distinct molecules.
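The information attached to a single link of the modified topology graph (multiplicity, H/A combinations, direction of each H-bond and the symmetry or handedness relation) can be pictured as a small data structure. The following is only an illustrative sketch: the class names, field names and the plain-text relation labels ("2_1", "+", "-", "x") are assumptions made for this example and are not part of TOPOS or of any other software mentioned here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class HBond:
    h_site: str   # donor hydrogen site, e.g. "H1"
    a_site: str   # acceptor site, e.g. "A4"
    source: str   # node that donates the H atom (tail of the arrow H -> A)
    target: str   # node that accepts it (head of the arrow)


@dataclass
class Link:
    nodes: Tuple[str, str]  # labels of the two connected molecules
    relation: str           # symmetry symbol, "+", "-" or "x"
    hbonds: List[HBond] = field(default_factory=list)

    def add(self, hbond: HBond) -> None:
        # An H-bond belongs to this link only if it joins the same two nodes.
        if {hbond.source, hbond.target} != set(self.nodes):
            raise ValueError("H-bond does not join the two nodes of this link")
        self.hbonds.append(hbond)

    @property
    def multiplicity(self) -> int:
        # 1 = one-point connection, 2 = two-point connection, and so on.
        return len(self.hbonds)

    def legend(self) -> str:
        pairs = ", ".join(f"{b.h_site}->{b.a_site}" for b in self.hbonds)
        return f"{self.nodes[0]}-{self.nodes[1]} [{self.relation}]: {pairs}"


# Illustrative two-point connection with parallel arrows: both H-bonds are
# donated by molecule "M" to its screw-related neighbour "M'".
link = Link(nodes=("M", "M'"), relation="2_1")
link.add(HBond("H1", "A4", source="M", target="M'"))
link.add(HBond("H3", "A2", source="M", target="M'"))
print(link.legend())
```

Storing the donor and acceptor node with each H-bond keeps the arrow direction independent of the order in which the two nodes of the link are listed.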

b) HBS symbols / nodal symbols
The graphical representation provides the most comprehensive information about an HBS, but it may also be useful to encode just its most essential characteristics in a descriptor of the composition $D\{n_m\}_1.\{n_m\}_2 \cdots \{n_m\}_p[T]$, where D is a dimensionality symbol (C = chain, L = layer or F = framework), n the number of intermolecular H-bonds of a molecule, m the number of neighbours to which the latter is joined and p the number of crystallographically independent molecules in the HBS. The expression $\{n_m\}_i$ denotes the connectivity symbol $n_m$ for the i-th molecule (node) (i = 1, 2, … p). T is a topology identifier of the net consisting of its point symbol [34], followed by the three-letter RCSR symbol [33] (if available), for example $4^2.8^4$-pts, or another common name for the net. Both the dimensionality (D) of the HBS and the number of connected neighbours per molecule (m) are given explicitly as a matter of convenience, even though these parameters can also be deduced from the net topology type (T).
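How the short descriptor is assembled from its parts can be shown with a brief sketch. The plain-text rendering used here ("_" for the subscript m and "^" for the exponents of the point symbol) and the function name are assumptions chosen for this illustration, not a notation prescribed by the method.

```python
def hbs_symbol(dimension, nodes, topology):
    """Assemble a plain-text HBS symbol.

    dimension -- "C" (chain), "L" (layer) or "F" (framework)
    nodes     -- list of (n, m) pairs, one per crystallographically
                 independent molecule
    topology  -- topology identifier T, e.g. "3^6.4^6.5^3-hxl"
    """
    if dimension not in ("C", "L", "F"):
        raise ValueError("dimension must be 'C', 'L' or 'F'")
    parts = []
    for n, m in nodes:
        # Every neighbour is reached by at least one H-bond, so m <= n.
        if not 1 <= m <= n:
            raise ValueError(f"inconsistent connectivity: n={n}, m={m}")
        parts.append(f"{n}_{m}")
    return f"{dimension}{'.'.join(parts)}[{topology}]"


# Layer structure with two independent molecules, each forming eight
# H-bonds to six neighbours, on a net of hxl topology.
symbol = hbs_symbol("L", [(8, 6), (8, 6)], "3^6.4^6.5^3-hxl")
print(symbol)  # L8_6.8_6[3^6.4^6.5^3-hxl]
```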
In an extended version, this descriptor is followed by a colon and the symmetry information for the links of each of the i = 1, 2, … p crystallographically independent molecules, enclosed in square brackets, where $o_j$ is the relationship symbol for the symmetry or handedness relationship (see above) associated with the link to the j-th neighbour (j = 1, 2, … m).

c) Connectivity table
Interactions between chemically distinct molecules are denoted by a cross (×) and intramolecular H-bonds by the symbol S ("self"). The involvement of an H or A site in a certain number of H-bond interactions results in the same number of entries in the corresponding row (H) or column (A). For a given molecule, the sum of all entries (except for the symbol S) in the rows associated with it, plus the sum of all entries in the corresponding columns, equals the number n of its intermolecular H-bonds. The analysis of a set of H-bond connectivity tables gives an overview of viable H/A combinations and shows preferred H/A pairings. However, it is not possible to draw conclusions about the topology type of an HBS solely from the information contained in its connectivity table. A rather different type of matrix known as NIPMAT (nonbonded interaction pattern matrix) [36] for the rationalisation of all intermolecular interactions was previously proposed by Rowland [37].
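The counting rule for connectivity tables (entries in a molecule's rows plus entries in its columns give n, with S excluded) can be checked with a small sketch. The dictionary layout, the molecule and site labels and the example table are hypothetical; they illustrate the rule and do not reproduce any table from this work.

```python
def bonds_per_molecule(table, molecule):
    """Count the intermolecular H-bonds n of one molecule.

    table maps ((donor_molecule, h_site), (acceptor_molecule, a_site))
    to a list of relation symbols, one symbol per H-bond of that H/A type.
    The symbol "S" marks an intramolecular H-bond and is not counted.
    """
    n = 0
    for ((donor, _h), (acceptor, _a)), entries in table.items():
        count = sum(1 for entry in entries if entry != "S")
        if donor == molecule:     # entry lies in one of the molecule's rows
            n += count
        if acceptor == molecule:  # entry lies in one of its columns
            n += count
    return n


# Hypothetical table for a structure with two independent molecules A and B.
table = {
    (("A", "H1"), ("B", "A1")): ["+"],    # A donates to B (same handedness)
    (("B", "H1"), ("A", "A1")): ["+"],    # B donates to A
    (("A", "H2"), ("A", "A2")): ["S"],    # intramolecular bond in A
    (("B", "H2"), ("B", "A2")): ["2_1"],  # B bonded to a screw-related B
}

n_A = bonds_per_molecule(table, "A")
n_B = bonds_per_molecule(table, "B")
```

In this example n_A = 2 and n_B = 4. The B-to-B entry is counted twice for molecule B, once as a donated and once as an accepted H-bond.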

Application to polymorphs of sulfathiazole
a) General
Sulfathiazole (Stz), 4-amino-N-(1,3-thiazol-2-yl)benzenesulfonamide, is a classical polymorphic compound with known crystal structures of five polymorphs (denoted Stz-I, Stz-II, Stz-III, Stz-IV and Stz-V, in accordance with the pharmaceutical nomenclature [38]; Additional file 1: Table S1) and more than 100 solvates [38][39][40][41]. Blagden et al. described the HBSs of four polymorphs [39] using Etter's graph set methodology [24], and the packing relationships of five Stz forms were previously investigated by us [38]. The Stz molecule contains three D − H and four A sites (Figure 1) which can engage in classical D − H•••A interactions. The Stz polymorph family provides a very good example to demonstrate the advantages of our approach because its HBSs are among the most complex and diverse found in small organic molecules.

b) Definition of matching H and A sites
Sulfathiazole is an example of a pseudo-chiral system and indeed Blagden et al. [39] first coined the term pseudochirality in their analysis of Stz polymorphs. This pseudochirality originates from the freezing-in of the conformation adopted for the S-sulfonamido single bond, characterised by the corresponding torsion angle C − N − S − C. Moreover, all the known Stz polymorphs contain the imide tautomer with the proton on the ring nitrogen atom. The A and H sites were assigned according to the following rules (Figure 1): 1. A1 is the imido N atom;

c) Polymorph Stz-IV
Polymorph IV has the monoclinic space group $P2_1/c$ and its asymmetric unit contains one molecule. Two parallel hydrogen bonds link neighbouring Stz molecules into a chain with two-fold screw symmetry. In this chain, each molecule is bonded via its amido group to the aniline N atom of a neighbouring molecule (H1•••A4) and also via the aniline H3 site to the sulfonyl site A2 (H3•••A2) of the same molecule. Additionally, it forms H2•••A2 bonds to two other molecules to which it is related by translations along the a axis. These latter interactions involve the second aniline H atom (H2) and again the sulfonyl O atom A2. Neither the imido N atom A1 nor the sulfonyl site A3 is used, while the sulfonyl site A2 is employed in two H-bonds, as can be seen from the connectivity table in Figure 2.
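The site usage described for Stz-IV can be tallied in a few lines. This is a minimal sketch based only on the interactions named in the preceding paragraph; the link labels "screw" and "translation" and the variable names are illustrative, and the neighbour count relies on the stated assumption that neither operation is its own inverse.

```python
from collections import Counter

A_SITES = ("A1", "A2", "A3", "A4")

# (H site, A site, link) for each independent interaction of Stz-IV as
# described in the text; the link labels are illustrative.
STZ_IV = [
    ("H1", "A4", "screw"),        # amido H to aniline N, along the 2_1 chain
    ("H3", "A2", "screw"),        # aniline H3 to sulfonyl O, same neighbour
    ("H2", "A2", "translation"),  # aniline H2 to sulfonyl O, along a
]

acceptor_use = Counter(a for _h, a, _link in STZ_IV)
unused_acceptors = [a for a in A_SITES if acceptor_use[a] == 0]

# With one molecule in the asymmetric unit, each interaction is both donated
# and accepted by every molecule.
n = 2 * len(STZ_IV)

# Assumption: a screw axis or a translation is not its own inverse, so each
# link reaches one neighbour on the donor side and a different neighbour on
# the acceptor side.
m = 2 * len({link for _h, _a, link in STZ_IV})
```

The tally gives two H-bonds at A2, none at A1 and A3, and n = 6 with m = 4.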
Altogether, each molecule is engaged in six hydrogen bonds which connect it to four neighbouring molecules, resulting in a layer structure with sql topology which lies parallel to (001) (Figure 3a) and whose symbol is $L6_4[4^4.6^2\text{-sql}]$.

d) Polymorph Stz-V
[…] Altogether, each molecule is connected to four neighbours via six hydrogen bonds, resulting in an sql net parallel to (101) (Figure 3b), which has the same symbol.

e) Polymorph Stz-III
The crystal structure of form III has the space group symmetry $P2_1/c$ and contains two independent molecules, denoted A and B. Each A molecule donates two hydrogen bonds of the H1•••A4' and H3•••A2' types to a B molecule and in turn accepts two analogous hydrogen bonds from a second B molecule, i.e. H1'•••A4 and H3'•••A2. As a result of these parallel two-point connections, alternating A and B molecules of the same handedness are linked into an H-bonded chain parallel to [010]. Indeed, it was shown that this chain possesses a non-crystallographic $2_1$ symmetry [38]. The H2 site of the aniline NH$_2$ group in molecule A is bonded to the sulfonyl O site A2' of a B molecule of the opposite handedness (H2•••A2'), and the A and B molecules involved in this particular interaction are related by a local glide-reflection operation [38]. The H2' site of molecule B is bonded to the sulfonyl O site A2 of an A-type molecule which is related to this B molecule by a local translation operation [38], i.e. both are of the same handedness.
Altogether, the D − H•••A interactions result in an sql net parallel to $(10\bar{2})$ in which the two molecule types are arranged in an alternating fashion along the links (Figure 3c). This net is uninodal, but the A and B sites differ in the local symmetry element (glide-reflection plane or translation), and therefore in the kind of pseudochirality relationship, associated with two of their hydrogen bonds. Simultaneously […]

[…] HBS is described by the symbol $L8_6.8_6[3^6.4^6.5^3\text{-hxl}]$ as both types of molecule are involved in eight hydrogen bonds to six neighbours. The equivalence of the A and B molecules is also indicated by the long symbol $L8_6.8_6[3^6.4^6$ […]

The interpenetration of the nov framework (A) by a single hcb layer (B) is depicted in Figure 4b, and the two nets are linked by an H2'•••A4 bond in which the NH$_2$ groups of A and B molecules of the same handedness serve as the H-bond donor and acceptor site, respectively. The resulting A + B framework contains an equal number of six-connected and four-connected nodes and has the point symbol $(4^4.5^3.6^7.7)(5^2.6^4)$. Therefore, the long symbol for the complete H-bonded structure is $F7_6$ […]

Spn-VI and Stz-I agree in the complete set of H-bond interactions between their respective type-A molecules, which result in a nov net (Figure 4a). The H-bond interactions between type-B molecules which generate the hcb net (Figure 4b) are also the same in Spn-VI and Stz-I. Therefore, the separate H-bonded A and B nets of Spn-VI have the same symbols as their counterparts in Stz-I (Table 1) […] These results are consistent with the previously reported 3D packing similarity of Spn-VI and Stz-I [44], which also implies a similar mode of interpenetration of the nov-type framework by hcb layers.
This relationship was confirmed by an XPac comparison, which gave a dissimilarity index of x = 12.7 and distance parameter of d = 0.66 Å (for details, see section 4.2 of the Additional file 1), consistent with geometric deviations due to the relatively large difference in molecular shape between Stz and Spn.
A fundamental difference between Stz-I and Spn-VI concerns the H2'•••A4 link between the hcb and nov nets in Stz-I (with H•••N and N•••N distances of 2.29 and 3.22 Å, respectively, between A and B molecules of the same handedness; see Additional file 1: Table S4), which is absent from Spn-VI (Figure 5). Instead, the shortest intermolecular contact of the aniline H2' site in Spn-VI is of the H2'•••A3 type and significantly longer […] The absence of the weak H2'•••A4 connection in Spn-VI may carry a penalty in stabilisation energy but may permit the larger Spn molecules to adopt the same 3D packing arrangement as those of Stz. The interpenetration of the H-bonded framework of A molecules by the layers of B molecules in Spn-VI (Figure 6d) […] $6^3$-hcb]). For completeness, the graphical and symbolic representations and connectivity tables for the four other known polymorphs of Spn are given in Figure 6, Table 1 and Figure 2, respectively, and details of the H-bonded structures and the assignment of H and A sites are given in Additional file 1.
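A point symbol can be checked against the connectivity of its node by using a general property of point symbols: a k-connected node has k(k − 1)/2 angles, so the exponents must add up to that number. The sketch below applies this check to point symbols written in a plain-text form with "^" for exponents; that notation and the function names are assumptions made for this illustration.

```python
import math


def angle_count(point_symbol):
    """Sum the exponents of a point symbol such as "4^4.5^3.6^7.7"."""
    total = 0
    for term in point_symbol.split("."):
        _ring_size, _, exponent = term.partition("^")
        total += int(exponent) if exponent else 1  # a bare term counts once
    return total


def coordination(point_symbol):
    """Return k such that k(k - 1)/2 equals the number of angles."""
    angles = angle_count(point_symbol)
    root = math.isqrt(1 + 8 * angles)
    if root * root != 1 + 8 * angles:
        raise ValueError("exponents do not correspond to any k(k - 1)/2")
    return (1 + root) // 2


# The two node types of the combined A + B framework discussed above:
k_a = coordination("4^4.5^3.6^7.7")  # 15 angles
k_b = coordination("5^2.6^4")        # 6 angles

# The hxl layer net:
k_hxl = coordination("3^6.4^6.5^3")  # 15 angles
```

The two node types give k = 6 and k = 4, in line with the description of the framework as containing six-connected and four-connected nodes, and the hxl symbol gives k = 6.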

Relationships between the Stz polymorphs IV, V and III
The topology graphs and associated chemical and symmetry information for each of Stz-IV, Stz-V and Stz-III in Figure 3a, b and c immediately reveal the following relationships: 1. An sql net is formed in each case. Note that the three nets are drawn with their actual geometry and in matching orientations when strictly the depiction of the correct connectivity between the nodes would be sufficient, for example in a standard square grid. Thus, the correct relationships between the H-bonded structures Stz-III, Stz-V and Stz-IV can be established readily with the proposed method. By contrast, it would be very difficult if not impossible to deduce these relationships from the conventional graph-set analysis of the corresponding three HBSs provided in section 5 of the Additional file 1.
The information obtained from the topology graphs is consistent with, and complementary to, the results of a previous packing analysis [38] showing that Stz-III has a molecular bilayer in common with each of Stz-IV and Stz-V. These two types of double layer are just stacks of the H-bonded ladder fragments within the sql net which Stz-III has in common with Stz-IV and Stz-V (Figure 3a, b and c). Accordingly, Stz-IV and Stz-V have a molecular monolayer in common. This is a stack of simple chain fragments which is based on a two-point connection and forms part of their respective HBS.
In the connectivity table for Stz-III (Figure 2 […]

Figure 7 shows an alternative version of the connectivity tables of Figure 2, in which symmetry elements are replaced by symbols for handedness relations. These still reflect similarities between HBSs, albeit on a lower level. For example, the configuration of plus and minus symbols in the tables for Stz-III, -IV and -V also reflects their complex relationships discussed above. Likewise, matching entries in the tables for Stz-I and Spn-VI reflect the similarity of their HBSs. The alternative connectivity table for Stz-IV contains exclusively plus symbols, indicating that its HBS consists of homochiral molecules. On the other hand, the absence of plus signs in the tables for Spn-II and Spn-IV indicates that all H-bonds in these polymorphs connect molecules of the opposite handedness.
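The reading of the handedness tables described above amounts to a simple classification of the plus and minus entries. The following sketch shows that logic; the function name, the returned labels and the second example list are hypothetical and do not reproduce the entries of any actual table.

```python
def handedness_pattern(entries):
    """Classify the handedness entries of a connectivity table.

    entries -- iterable of table symbols; only "+" (same handedness) and
               "-" (opposite handedness) are evaluated, anything else
               (for example "S" or "x") is ignored.
    """
    symbols = [e for e in entries if e in ("+", "-")]
    if not symbols:
        raise ValueError("no handedness symbols found")
    if all(s == "+" for s in symbols):
        return "homochiral"             # every H-bond joins like-handed molecules
    if all(s == "-" for s in symbols):
        return "opposite handedness only"
    return "mixed"


# Three entries, all "+", as stated for the table of Stz-IV.
stz_iv_pattern = handedness_pattern(["+", "+", "+"])

# Hypothetical table without any "+" entry.
other_pattern = handedness_pattern(["-", "-", "S"])
```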

Comparison of the HBSs in polymorphs of Stz and Spn
The topology graphs of the separate nov and hcb nets of Spn-VI (not shown) are in complete agreement with those of Stz-I. The very close relationship between Stz-I and Spn-VI, which is consistent with an earlier packing comparison, is also reflected in their connectivity tables and HBS symbols (Table 1 and Figure 2). Four- (Stz-III, -IV, -V), five- (Spn-II, -III, -IV, -V) or six-connected (Stz-II) nets are formed, with the exception of Stz-I (4,6-connected) and Spn-VI (3,5-connected). There are four framework structures (Stz-I, Spn-III, -IV, -VI) and six layer structures. Overall, the connectivity tables in Figure 2 indicate that Spn has a general preference for the formation of D1•••A1 interactions (four forms) which in all cases but one (Spn-V) […]

Conclusions
The objective of comparing different HBSs and identifying relationships between them has led to a graphical solution which combines established concepts (i.e. the interpretation of an HBS as a net, determination and classification of topology) with specific characteristics of HBSs (a link is defined by one or more H-bonds, all of which possess a chemical identity as well as directionality; a homomolecular link is associated with a handedness relationship/symmetry operation; differentiation between nodes that are topologically equivalent but crystallographically distinct). By comparison, only selected information about an HBS can be deduced from the proposed HBS symbol (its topology and specific characteristics of nodes) and connectivity table (the chemical identity of all H-bonds) representation. The former is intended as a general HBS descriptor in printed texts while the latter facilitates the comparison of the connections present in different HBSs that are based on matching H-bond donor and acceptor functional groups.
Ultimately, the usefulness of the proposed methodologies will have to be tested by applying them to other sets of crystal structures, and this will also provide pointers to necessary adjustments of their setup. The examples in this report demonstrate that HBS analysis and the identification of packing similarity based on geometrical methods are complementary. We intend to explore this topic further with an analysis of the more than 100 solvate structures of sulfathiazole.

Experimental
Crystal structure data
Crystal structure data from the Cambridge Structural Database [45] were used throughout (for details, see Additional file 1: Table S1). However, in the case of Spn-IV and Spn-V the HBS analysis was carried out with recalculated idealised positions of the NH$_2$ hydrogen atoms, and in the case of Spn-IV the NH hydrogen atom was also recalculated (for details, see sections 3.5 and 3.6 of the Additional file 1). Details of the H-bonds defining the HBSs are collected in Additional file 1: Tables S4-S13.

Determination, classification and visualisation of topology
The topologies of HBSs were determined and classified with the programs ADS and IsoTest of the TOPOS package [31] in the manner described by Baburin & Blatov [32]. The topology graphs for HBSs (Figures 3, 4 and 6) are based on nets drawn with the IsoCryst program of the TOPOS package [31].