Universität Hamburg
    Hamburg University - Chemistry - TMC - Stribeck - nonlin
POLYMER PHYSICS   IDF Analysis by Nonlinear Regression

Overview

Interface distribution functions (IDFs) are increasingly computed from small-angle X-ray scattering data (SAXS, USAXS) and interpreted, but their quantitative analysis by fitting of models is advancing rather slowly. Here fitting programs for different models are presented and described.

The basic models

Using the presented programs, domain thickness distributions can be extracted from the IDFs. Each distribution is characterized by its position (center of gravity), its width and its skewness. The model is, in general, based on stacking statistics. Two different kinds of stacking statistics are common:

  1. Stacking of domain thicknesses ("Stacking model")
  2. Stacking of distances between crystallites ("distorted paracrystalline lattice model").

As I have shown, each of these models can be unified with a third one ("homogeneous long period distribution").

The Programs

Requirements concerning computer and data

I offer executable programs for Linux and MS-DOS. The DOS version runs in the DOS boxes of Linux, Windows95, Windows98 and Windows Me. Under Windows95 and Windows98 the programs are quite slow, or can only be started after COMMAND.COM has been replaced (the programs work under the shell 4DOS).

The data supplied must represent the IDF and must be written in ASCII format. Here is an extract from a data file (h-6.dat):

Ap8.8dz1.05Fl595smx.43 60512-6 idf
 1.40000E+0000  2.28419E+0000
 1.50000E+0000  2.38694E+0000
 1.60000E+0000  2.50201E+0000
 1.70000E+0000  2.62754E+0000
 1.80000E+0000  2.76114E+0000
 1.90000E+0000  2.89997E+0000
 2.00000E+0000  3.04085E+0000
 ...
The first line is an arbitrary comment, which is repeated in the output of the regression program. The following lines contain the curve data. The first column contains the x-values (in nanometers), the second column the values of the IDF g1(x). The x-values must lie on an equidistant grid. Negative x-values are not allowed. In order to accelerate the fitting process, every second point of the curve is ignored. The resulting grid, when extrapolated, must contain the value x=0. No more than 512 points are allowed. A typical file contains approx. 200 points.
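The rules above can be checked before a fit is started. The following is a minimal sketch, not part of the distributed programs; the names read_idf and check_grid are hypothetical. It assumes that the thinned grid keeps the first data point, so that the first x-value has to be an integer multiple of twice the grid spacing. The tolerances are arbitrary choices that allow for the six printed digits.

```python
MAX_POINTS = 512  # limit stated in the text


def check_grid(xs, rel_tol=1e-4):
    """Raise ValueError if the x-grid violates the rules stated above."""
    if len(xs) < 2:
        raise ValueError("need at least two points")
    if len(xs) > MAX_POINTS:
        raise ValueError("more than %d points" % MAX_POINTS)
    if xs[0] < 0:
        raise ValueError("negative x-values are not allowed")
    step = (xs[-1] - xs[0]) / (len(xs) - 1)
    if step <= 0:
        raise ValueError("x-values must increase")
    for i, x in enumerate(xs):
        if abs(x - (xs[0] + i * step)) > rel_tol * step:
            raise ValueError("x-values are not equidistant")
    # Assumption: every second point is dropped starting after the first
    # one, so the thinned grid has spacing 2*step and starts at xs[0].
    ratio = xs[0] / (2.0 * step)
    if abs(ratio - round(ratio)) > 1e-3:
        raise ValueError("extrapolated thinned grid does not contain x=0")


def read_idf(lines):
    """Parse a data file given as an iterable of text lines.

    Returns (comment, xs, ys); the first line is the free comment.
    """
    lines = list(lines)
    if not lines:
        raise ValueError("empty data file")
    comment = lines[0].strip()
    xs, ys = [], []
    for line in lines[1:]:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError("expected two columns: %r" % line)
        xs.append(float(fields[0]))
        ys.append(float(fields[1]))
    check_grid(xs)
    return comment, xs, ys
```

A file could be checked with `read_idf(open("h-6.dat"))`; the extract shown above passes, because 1.4 is a multiple of 0.2.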

Be careful not to supply IDF values which are too small or too big. If the x-values range from 0 to 100, the values of g1(x) should cover an interval of approximately the same order of magnitude. Poorly conditioned input data will not result in reasonable output.
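One way to follow this advice is to multiply the IDF values by a constant factor before writing the data file. The sketch below picks a power of ten that brings the span of the y-values to the order of magnitude of the span of the x-values. Both the function name suggest_y_scale and the restriction to powers of ten are assumptions made for illustration; the programs themselves do not prescribe a particular factor.

```python
import math


def suggest_y_scale(xs, ys):
    """Return a power of ten by which the y-values could be multiplied.

    The factor is chosen so that the y-span gets roughly the same order
    of magnitude as the x-span (an illustrative rule, not a prescription).
    """
    x_span = max(xs) - min(xs)
    y_span = max(ys) - min(ys)
    if x_span <= 0 or y_span <= 0:
        raise ValueError("x and y must each cover a nonzero interval")
    exponent = round(math.log10(x_span / y_span))
    return 10.0 ** exponent
```

Fitted weights scale with the same factor, so it has to be remembered when the results are interpreted.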

My program topas generates suitable data files when the command #write is issued.

Regression procedure

The programs read the data and parameter files. Then the initial simplex vertex is generated in parameter space and subsequently moved. Every improvement of the residual sum of squares (RSS) is reported on screen in the terminal window. If no improvement is achieved, only the counter is advanced.
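How the programs build and move the simplex internally is not documented here. As a rough illustration only, the sketch below shows one common way to set up a starting simplex from starting values and a percentage (the parameter file section describes such a percentage), together with the residual sum of squares that is being minimized. The function names and the one-parameter-per-vertex construction are assumptions.

```python
def initial_simplex(start, percent):
    """Build n+1 vertices from n starting values.

    Vertex 0 is the starting point itself; vertex i+1 has parameter i
    enlarged by `percent` percent. A starting value of exactly zero would
    not be displaced by this rule and would need separate handling.
    """
    vertices = [list(start)]
    for i, value in enumerate(start):
        vertex = list(start)
        vertex[i] = value * (1.0 + percent / 100.0)
        vertices.append(vertex)
    return vertices


def rss(model, params, xs, ys):
    """Residual sum of squares between the data and model(x, params)."""
    return sum((y - model(x, params)) ** 2 for x, y in zip(xs, ys))
```

A simplex search then evaluates rss at every vertex and replaces the worst vertex step by step; only steps that lower the best RSS would be reported as improvements.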

If the program terminates regularly, the old parameter file is renamed. Under MS-DOS it gets the suffix .bak. Under Linux a second suffix .bak is appended. After this a new parameter file is written. It receives the values found, but an "annealed" Simplex vertex. Thus it is quite frequently possible to simply start the program again without editing the parameter file. The program is restarted repeatedly until there is no more improvement.
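The two renaming conventions can be illustrated with a small helper. This is a sketch with a hypothetical function name; it assumes that "gets the suffix .bak" under MS-DOS means that the existing suffix is replaced.

```python
import os


def backup_name(par_file, dos=False):
    """Return the backup file name for a parameter file.

    dos=True:  the suffix is replaced  (h-6.par -> h-6.bak), assumed here
    dos=False: a second suffix is added (h-6.par -> h-6.par.bak)
    """
    if dos:
        root, _suffix = os.path.splitext(par_file)
        return root + ".bak"
    return par_file + ".bak"
```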

After each regular program run the two files fit.dat and err.dat are written. Utilizing suitable programs, additional information on the quality of the fit can be extracted from these ASCII files.

Examples for program calls:
mr_stac h-6
The files h-6.par and h-6.dat are read and iterated. Output on fit quality is sparse. Under Linux, please read the file CON.
mr_stac h-6 h-6.out
A complete protocol (with graphics) concerning the fit quality is placed in the ASCII file h-6.out. The file can be sent directly to an HP LaserJet printer.
mr_stac h-6 lpt1
Under MS-DOS the output is sent directly to an HP LaserJet connected to lpt1. Under Linux the file lpt1 is generated.

In the complete protocol the asymptotic confidence interval is reported with each parameter value. Moreover, the parameter correlation matrix is reported. If the matrix contains values > 0.96, a reduction of the number of parameters in the model should be considered. But perhaps the starting values were simply poor. If no confidence intervals are reported, the input data are poor, unacceptable or poorly conditioned. Multiplication of the y-values by a factor may help to improve the condition of the matrix.
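Once the correlation matrix has been copied from the protocol, the 0.96 rule is easy to check. The sketch below assumes that the matrix is available as a list of rows and that strongly negative correlations should be flagged as well, which the text does not state explicitly; the function name is hypothetical.

```python
def flag_correlated(corr, names, threshold=0.96):
    """List parameter pairs whose correlation exceeds the threshold.

    corr  : square correlation matrix as a list of rows
    names : parameter names in matrix order
    Uses the absolute value (an assumption), so -0.97 is flagged too.
    """
    flagged = []
    n = len(names)
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle, diagonal skipped
            if abs(corr[i][j]) > threshold:
                flagged.append((names[i], names[j], corr[i][j]))
    return flagged
```

Each flagged pair is a candidate for fixing or removing one of the two parameters.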

The parameter file

Each of the regression programs requires a parameter file. The program is started with the parameter values from this file.

Name: It is convenient to name the parameter file <exper>.par if the data are supplied in <exper>.dat. The data file h-6.dat, e.g., should be accompanied by a parameter file h-6.par, which might look as follows:

#10000 10
 7 0.4  6  0.3 0.3 0.3
 8 4    7  0.3 0.3 0.3
 1.000000E-004  1.000000E-004  1.000000E-004  1.000000E-004  1.000000E-004
 1.000000E-004  1.000000E-004  1.000000E-004  1.000000E-004  1.000000E-004
 1.000000E-004  1.000000E-004  1.000000E-006

This is a parameter file for the program mr_2stac, which fits two stacks of domains to the IDF. Syntax:

If the first character on the first line is a "#", the parameter file is a simplified one.

After this, two numbers have to be supplied on the first line. The first number is the maximum iteration count. The second number determines how far the initial Simplex vertex is blown up. It is given in percent of the starting parameter values that follow.

The following numbers may be distributed arbitrarily over different lines. First the starting values for the model parameters are given (in the order in which they are defined in the model function). Thus "7" designates the starting value for the weight of the first stack. The following two numbers are the two average layer thicknesses (0.4 nm and 6 nm). Thereafter the three relative widths of the generating Gaussians are given. In almost all cases it is a good choice to start with values of 0.3. The first relative width describes the heterogeneity of the stack across the irradiated volume (or the skewness of the distribution functions, respectively). The two remaining widths are related to the generating Gaussians of the two layer thickness distributions.

Then a similar parameter set is given for the second stack. The parameter file is completed by a list of constraints. Here, e.g., it is specified that the minimum is considered found when the parameter variation is confined to the 5th decimal and when the variation of the residual sum of squares (last number: 1.000000E-6) is confined to the 7th decimal.
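The syntax described above can be read back with a few lines of code. The following sketch is not the parser used by the programs and its name is hypothetical. It assumes, as the example file suggests, that n starting values are followed by n parameter constraints and one final constraint for the residual sum of squares.

```python
def parse_simplified_par(text):
    """Parse a simplified parameter file (first character '#').

    Returns a dict with the maximum iteration count, the blow-up
    percentage, the starting values and the constraints.
    """
    lines = text.splitlines()
    if not lines or not lines[0].startswith("#"):
        raise ValueError("not a simplified parameter file")
    header = lines[0][1:].split()
    if len(header) != 2:
        raise ValueError("first line must hold two numbers")
    # Line breaks after the first line carry no meaning.
    numbers = [float(tok) for line in lines[1:] for tok in line.split()]
    # Assumed layout: n starting values, n constraints, 1 RSS constraint.
    if len(numbers) < 3 or len(numbers) % 2 == 0:
        raise ValueError("expected 2*n + 1 numbers after the first line")
    n = (len(numbers) - 1) // 2
    return {
        "max_iter": int(header[0]),
        "percent": float(header[1]),
        "start": numbers[:n],
        "param_tol": numbers[n:2 * n],
        "rss_tol": numbers[-1],
    }
```

For the mr_2stac example this yields 12 starting values, 12 parameter constraints and the RSS constraint 1.0E-6.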

Program control keys

Two keys can be used to stop the running program:

/
stops the program with an error. The parameters found are not written to the parameter file.
#
finishes the program in the same way as if a minimum had been found. The present parameter file becomes the backup file. The current values replace the parameter file. A new, enlarged Simplex vertex is defined in the parameter file. Thus a repeated start of the program blows up the Simplex, resembling a forced step of "simulated annealing".

Download

Uncorrelated domains (Lamellae)

Name      Parameters            Versions
mr_solo   W, d1, sgH, sg1/d1    MS-DOS, Linux

One stack

Name      Parameters                        Versions
mr_stac   W, d1, d2, sgH, sg1/d1, sg2/d2    MS-DOS, Linux

Two stacks

Name       Parameters                                                           Versions
mr_2stac   W1, d1, d2, sgH, sg1/d1, sg2/d2, W2, d1, d2, sgH, sg1/d1, sg2/d2     MS-DOS, Linux

One stack plus uncorrelated lamellae

Name      Parameters                                            Versions
mr_st1p   W1, d1, d2, sgH, sg1/d1, sg2/d2, W2, d1, sg1/d1       MS-DOS, Linux

Here it was assumed that the thicknesses of the uncorrelated lamellae vary according to a (symmetric) Gaussian function. Thus a parameter sgH is missing in the corresponding parameter set.

The most important helper program is extrafit (Linux, DOS). It extracts the model fit found from the file "fit.dat" generated in the latest run of a fitting program and writes it into an ASCII file.

Last modification March 5, 2001 by Norbert Stribeck