Version 1.0

 

By M. Haranczyk

Email: maharan@chem.univ.gda.pl

 

 

 

1.Introduction

 

ConGENER is a software package for the combinatorial generation and characterization of molecular libraries containing substitution isomers, so-called congeners. The molecules are built by substituting hydrogen atoms in the parent molecule with a combination of desired atoms. The ConGENER program is described in Sections 2-5. The scripts to run QM calculations are described in Section 6.

 

ConGENER was developed by Maciej Haranczyk at the University of Gdansk as part of a PhD project. It is now available free of charge under the GNU GPL license. The ConGENER package is under constant development. If you need help or have suggestions or comments, please feel free to contact the author (email address provided above).

 

If you publish any results obtained with ConGENER, we kindly ask you to cite the following publication:

 

M. Haranczyk, T. Puzyn and P. Sadowski, ConGENER – A Tool for Modeling of the Congeneric Sets of Environmental Pollutants. Submitted to QSAR & Comb. Sci.

 

The author is not responsible for any damage caused by this software. Use at your own risk.

 


You can download the latest version here.

2. Installation and execution

 

ConGENER is written in standard C++. To use it, you have to download the source code and compile it. To compile under any common Linux/Unix distribution, just type:

g++ congener.cpp -o congener

 

To use the program, execute the "congener" binary. For example:

./congener

 

The program runs in interactive mode, asking the user a number of questions about the geometry of the parent molecule and the isomer space to be expanded. However, to facilitate running multiple jobs or restarting, all answers can be combined into one input file. To use this non-interactive mode and read the answers from the input file, use:

./congener < input_file
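
When several jobs have to be run, the redirection shown above can be scripted. The following is a minimal Python sketch and not part of the ConGENER package. The function names are hypothetical, and it assumes that all answer files sit in one directory and end in .input, as the bundled examples do.

```python
import subprocess
from pathlib import Path


def run_congener(input_file, command=("./congener",)):
    """Run one job; equivalent to `./congener < input_file` in a shell."""
    with open(input_file) as answers:
        done = subprocess.run(list(command), stdin=answers,
                              capture_output=True, text=True)
    return done.returncode, done.stdout


def run_all(directory, command=("./congener",)):
    """Run every *.input file in `directory`, one after another.

    Hypothetical helper: returns {file name: (exit code, program output)}.
    """
    results = {}
    for path in sorted(Path(directory).glob("*.input")):
        results[path.name] = run_congener(path, command)
    return results
```

Calling run_all(".") from the directory that holds the compiled binary and the answer files would then process the jobs sequentially, so a failed job can be identified by its exit code and restarted on its own.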

How to prepare the input data for ConGENER is described in the following sections of this manual.

 

3. Preparation of parent molecule geometry

 

ConGENER works with molecules represented in internal coordinates in Gaussian format. These Z-matrices should use text strings as variables (bond lengths and angles), and the values should be provided at the end of the file in two blocks (Variables/Constants). An example of a proper Z-matrix in Gaussian format is provided below:

C
C 1 r1
C 1 r2 2 v1
H 3 r3 2 v2 1 t1

r1 1.45
r2 1.45
v1 120.00
r3 1.08
v2 109.0
t1 180.00

 

To convert such a Z-matrix into a ConGENER input file, each block of data (molecule specification, variables and constants) has to be terminated with 0 (zero). Hydrogen atoms that are to be substituted with the desired atom during congener generation have to be marked with * (star). The star has to be placed directly before the H symbol, with no space or any other character in between. An example of a proper input file is provided below:

C
C 1 r1
C 1 r2 2 v1
*H 3 r3 2 v2 1 t1
0
r1 1.45
r2 1.45
v1 120.00
r3 1.08
v2 109.0
t1 180.00
0
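
The conversion described above is mechanical, so it can be sketched in a few lines of Python. This is an illustration only and not part of ConGENER. The function name to_congener_input is hypothetical, and the sketch assumes a Gaussian Z-matrix with exactly one blank line between the molecule specification and a single variables block (no constants).

```python
import re


def to_congener_input(gaussian_zmat, hydrogens):
    """Convert a Gaussian Z-matrix into ConGENER input text.

    `hydrogens` holds the 1-based atom numbers of the H atoms to mark with '*'.
    """
    blocks = re.split(r"\n\s*\n", gaussian_zmat.strip())
    if len(blocks) != 2:
        raise ValueError("expected an atom block and one variables block")
    atoms = [line.strip() for line in blocks[0].splitlines()]
    variables = [line.strip() for line in blocks[1].splitlines()]
    for number in hydrogens:
        if number <= 3:
            raise ValueError("hydrogens cannot be in the first three lines")
        if atoms[number - 1].split()[0] != "H":
            raise ValueError(f"atom {number} is not a hydrogen")
        atoms[number - 1] = "*" + atoms[number - 1]
    # Each block is terminated with a line containing 0.
    return "\n".join(atoms + ["0"] + variables + ["0"]) + "\n"


EXAMPLE = """C
C 1 r1
C 1 r2 2 v1
H 3 r3 2 v2 1 t1

r1 1.45
r2 1.45
v1 120.00
r3 1.08
v2 109.0
t1 180.00
"""

print(to_congener_input(EXAMPLE, hydrogens=[4]))
```

Applied to the Gaussian example from this section with atom 4 marked, the sketch reproduces the input file shown above.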

 

 

REMARK:

* In the current version, ConGENER does not work with z-matrices containing constants. (There should be only one block, with variables.)

* Hydrogen atoms cannot be placed in the first three lines of the z-matrix.

 

REMARK 2:

* The order of hydrogen atoms in the z-matrix is very important, because ConGENER relies on this order when detecting equivalent isomers and generating proper names. When building the z-matrix of the parent molecule: 1) the first hydrogen atom added to the molecule should be the one with the lowest position number; 2) when adding subsequent hydrogen atoms, assume that the molecule is flat and that you are moving along the edge of the molecule in a clockwise (or counterclockwise) direction.

 

Example:

 

For dibenzofuran, the proper ordering of hydrogen atoms in the z-matrix should be:

1,2,3,4,6,7,8,9 or 9,8,7,6,4,3,2,1

For biphenyl:

2,3,4,5,6,6’,5’,4’,3’,2’ or 2’,3’,4’,5’,6’,6,5,4,3,2

 

During execution, ConGENER will generate all possible congeners. However, when two congeners are identical, for example 1,2-dichlorobiphenylene and 7,8-dichlorobiphenylene, only the first one will be saved, as it uses the hydrogen sites of higher priority (those provided first) in the z-matrix.

 

 

4. Questions asked by the program during execution

 

Name of the input file ?

Provide the name of the file containing the molecule specification, as described in Section 3.

 

Number of hydrogen atoms to be substituted ?

Obvious.

 

Symbol of atom to be substituted with ?

Symbol of the atom that will be used to substitute hydrogen atoms, for example Br or Cl.

 

String to name the variable related to the substituent atom bond length

For example, if this string is set to "clx", then when a hydrogen whose bond length is defined by the variable hc10 is substituted with another atom, the new bond length variable will be named clxhc10.

 

Change of bond length ?

Value (in Angstroms) which will be added to the bond length of the hydrogen when it is substituted with the desired atom.
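
Taken together, the last three answers describe how one hydrogen line of the z-matrix is rewritten. The Python sketch below illustrates this under stated assumptions; it is not ConGENER's own code. The function name substitute is hypothetical, the 0.65 Angstrom increment is an arbitrary illustrative value, and the sketch assumes that the renamed variable is added next to the existing ones rather than replacing them.

```python
def substitute(atom_line, variables, symbol, prefix, delta):
    """Replace one starred hydrogen with `symbol` and lengthen its bond.

    atom_line: z-matrix line of a marked hydrogen, e.g. '*H 3 r3 2 v2 1 t1'
    variables: dict mapping variable names to values
    prefix:    string prepended to the bond length variable name
    delta:     change of bond length in Angstroms
    """
    fields = atom_line.split()
    if fields[0] != "*H":
        raise ValueError("only hydrogens marked with * can be substituted")
    old_name = fields[2]              # third field is the bond length variable
    new_name = prefix + old_name      # e.g. 'clx' + 'hc10' -> 'clxhc10'
    fields[0] = symbol
    fields[2] = new_name
    new_variables = dict(variables)   # leave the caller's dict untouched
    new_variables[new_name] = round(variables[old_name] + delta, 4)
    return " ".join(fields), new_variables


line, values = substitute("*H 3 r3 2 v2 1 t1",
                          {"r3": 1.08, "v2": 109.0, "t1": 180.0},
                          symbol="Cl", prefix="clx", delta=0.65)
print(line)
print(values["clxr3"])
```

Giving the substituted bond its own variable name keeps it distinct from the remaining C-H bonds, so each can be optimized independently later on.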

 

Number of equal numbering schemes ?

Answering this question allows duplicates to be removed. For example, for biphenylene there are 4 numbering schemes:

1,2,3,4,5,6,7,8 or 5,6,7,8,1,2,3,4 or 8,7,6,5,4,3,2,1 or 4,3,2,1,8,7,6,5

 

Number of numbering schemes in clockwise direction ?

As in the example above, there are two such numbering schemes: the first starts with 1, the other with 5.

 

Analogous questions about counterclockwise direction…
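
How equivalent numbering schemes lead to duplicate removal can be illustrated with a short Python sketch. It is not ConGENER's actual algorithm. It assumes that two substitution patterns are duplicates when one turns into the other under any of the equivalent numbering schemes, and that the pattern using the earliest hydrogen sites in the z-matrix is the one kept. The function name unique_congeners is hypothetical.

```python
from itertools import combinations

# The four equivalent numbering schemes of biphenylene listed above.
BIPHENYLENE_SCHEMES = [
    "1,2,3,4,5,6,7,8".split(","),
    "5,6,7,8,1,2,3,4".split(","),
    "8,7,6,5,4,3,2,1".split(","),
    "4,3,2,1,8,7,6,5".split(","),
]


def unique_congeners(schemes, n_substituents):
    """Return the position labels of all non-equivalent substitution patterns."""
    reference = schemes[0]
    site_of = {label: site for site, label in enumerate(reference)}
    seen, kept = set(), []
    # combinations() yields patterns in z-matrix order, so the first member
    # of each group of equivalent patterns uses the earliest hydrogen sites.
    for pattern in combinations(range(len(reference)), n_substituents):
        images = [tuple(sorted(site_of[scheme[site]] for site in pattern))
                  for scheme in schemes]
        canonical = min(images)
        if canonical not in seen:
            seen.add(canonical)
            kept.append(tuple(reference[site] for site in pattern))
    return kept


dichloro = unique_congeners(BIPHENYLENE_SCHEMES, 2)
print(len(dichloro), dichloro)
```

For two substituents the sketch keeps the 1,2 pattern and discards the equivalent 7,8 pattern, in line with the dichlorobiphenylene example in Section 3.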

 

Number of single bonds allowing for rotation of two fragments ? (in current version only 0 or 1)

 

(if answer to previous question is 1)

Enter last atom in first fragment

For example, in biphenyl the first fragment is 2’,3’,4’,5’,6’ and the second is 6,5,4,3,2, so the answer is 5 (the 5th atom is in the 6’ position).

 

Enter part of filename used for congeners

 

For example, bph for biphenyl. All congeners will be saved to bph0.zmat, bph1.zmat, etc.

 

Display names of congeners ?

 

Answer yes or no.

 

If answer is yes:

 

Enter name of the molecule ?

E.g. biphenylene

 

Enter substituent name ?

e.g. bromo

 

Enter names of positions in order as hydrogen atoms are defined

E.g. for biphenyl 2,3,4,5,6,2’,3’,4’,5’,6’

 

Enter words to describe number of substituents :

Enter as many words as the maximum number of substituents, one per line. Example: mono, di, tri…
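
The four naming answers fit together as sketched below in Python. The exact name format that ConGENER prints is an assumption here, modelled on names such as 1,2-dichlorobiphenylene used in Section 3. The function congener_name is hypothetical.

```python
def congener_name(positions, substituent, molecule, count_words):
    """Build a congener name from the answers to the naming questions.

    positions:   position names of the substituted sites, e.g. ['1', '2']
    count_words: words for 1, 2, 3... substituents, e.g. ['mono', 'di', 'tri']
    """
    if not 1 <= len(positions) <= len(count_words):
        raise ValueError("no word supplied for this number of substituents")
    word = count_words[len(positions) - 1]
    return f"{','.join(positions)}-{word}{substituent}{molecule}"


print(congener_name(["1", "2"], "bromo", "biphenylene", ["mono", "di", "tri"]))
```

The count words are looked up by the number of substituted positions, which is why one word has to be supplied for every possible count up to the maximum.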

 

5. Examples

 

A few examples are provided together with the program. Example input files are in the *.input files, and example input geometries are in the *.zmat files.

 

To generate congeners of biphenylene using the provided input files, type:

 

./congener < biphenylene.input

 

 

 

6. Scripts

 

The ConGENER package is supplemented with a set of scripts for performing QM calculations with the Mopac2007 program. The scripts can be easily modified to work with any other program. To perform calculations for a set of congeners, execute the scripts as follows:

  • Conv – converts all .zmat files into Mopac input files and creates a "run" script to perform the Mopac geometry optimizations sequentially
  • Cutgeom – extracts the last geometries from the Mopac geometry optimizations
  • Analyze – checks the Mopac output files to make sure each geometry optimization converged. If not, it prepares new input files and a new "rerun" script to execute Mopac
  • Proceed – once all geometry optimizations are completed, this script runs the calculation of descriptors; each descriptor is calculated in a different directory
  • Extractresults inputfile – extracts the results and presents them in tabular form. The input file should be a log file from the ConGENER program (it contains geometry filenames and basic information about the congeners, e.g. the number of substituents and IUPAC names). Note that the table with results will be saved in the same directory as inputfile.

 

Note: Before running the scripts, make sure they are in the same directory as the .zmat files generated with ConGENER (or set up the correct PATH).