# @(#)HISTORY	2.1.8.3
Changes as of 10/11/99
   -- versions: TPCH 1.2.0a, TPCR 1.1.0a
   -- Correction to segmented updates that was causing an extra file to be
	  generated
   -- Porting changes for DigUnix
Changes as of 08/28/99
   -- versions: TPCH 1.2.0, TPCR 1.1.0
   -- reduced parameter substitution range for Q18
   -- added new option to specify location of dists file (-b)
   -- added DBGEN option to suppress all output (-q)
Changes as of 08/16/99
   -- versions: TPCH 1.1.0a, TPCR 1.0.1e
   -- prevent "reuse" of original data in update files
   -- correction to lint target in makefile.suite
   -- removal of vestigial l_partkey predicate from 21.sql
   -- reorder lineitem/order join in q5
   -- removal of table aliases from 2.sql
   -- randomize seeding of qgen RNG to close bug 52
   -- correct possible round off error in segmented update files
   -- corrected soft copy answer set for Q22
   -- corrected precision of answer set for Q19
Changes as of 07/08/99
   -- versions: TPCH 1.1.0, TPCR 1.0.1
   -- WORKLOAD must be set to either TPCH or TPCR in the makefile
   -- unneeded reference to part table removed from q21 template
Changes as of 06/04/99
   -- version 1.0.1d
   -- Restarted version numbering to match specification revisions for
	  TPC-H and TPC-R
   -- Corrected answer set for Q13
   -- Corrected parameter substitutions for Q16, Q17, Q19, Q20, Q21, Q22
   -- Corrected RNG initialization in qgen.c
   -- added adhoc.c adhoc.h to code base to support randomized data sets;
	  currently disabled
   -- replaced calls to UnifInt() row_stop with call to NthElement()
   -- Corrected a problem that caused small negative money values to print as 
      a positive value
   -- Simplification of PR_xxx macros
   -- QGEN building correct parameter logs again
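
The negative-money entry above is easiest to see with values held as
integer cents: -5 cents has a whole part of 0, so a formatter that prints
only magnitudes shows "0.05" and loses the sign. A minimal sketch of a
sign-safe formatter follows. fmt_money() is a hypothetical helper written
for illustration, not the PR_MONEY macro from the dbgen source, and the
integer-cents representation is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: format a money value held as integer cents.
 * The sign is emitted separately so that values between -1.00 and 0.00
 * (for example -5 cents) print as "-0.05" rather than "0.05". */
static int fmt_money(char *buf, size_t len, long cents)
{
    assert(buf != NULL && len > 0);
    const char *sign = (cents < 0) ? "-" : "";
    long mag = labs(cents);   /* magnitude; LONG_MIN is out of scope here */
    return snprintf(buf, len, "%s%ld.%02ld", sign, mag / 100, mag % 100);
}
```

Calling fmt_money(buf, sizeof buf, -5) fills buf with "-0.05"; the return
value is the number of characters written, as with snprintf().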

******************
* NOTE NOTE NOTE *
******************
Below this line the file refers to TPC-D which was retired in favor of 
TPC-H and TPC-R. Since the new specifications are numbered from 1.0.0,
the program version was reset.
******************
* NOTE NOTE NOTE *
******************

Changes as of 01/05/99
   -- version 2.0.1
   -- added 1999 to the copyright notice
   -- corrected C++ compilation problem
   -- sub-select phrasing corrected in Q4, Q21, Q22
   -- added support for segmenting update files (contributed by Larry Kemp, HP)
Changes as of 12/08/98
   -- version 2.0.0
   -- removed permute.h from clean target in makefile
Changes as of 11/17/98
   -- version 2.0.0 Alpha 8
   -- corrected o_custkey overrun bug
   -- removed upper bound on -C command option
   -- added static permute.h to distribution to match the specification
Changes as of 10/23/98
   -- version 2.0.0 Alpha 7
   -- removed references to DSS_SEED and SEED_TAG
   -- minor query template cleanup
   -- V2 answer sets added
   -- correction to hd_sparse for SF > 300
   -- added static declaration to row types in gen_tbl to fix update problem
   -- permuted params to Q22
Changes as of 5/19/98
   -- version 2.0.0 Alpha6b
   -- removed trailing apostrophe from dists.dss nouns for Tandem loader
   -- corrected mk_sparse() problem with alpha6
   -- added 64b support for NCR/Metaware
   -- corrected revision problem with 2.0.0.6
Changes as of 5/7/98
   -- version 2.0.0 Alpha6
   -- corrected generation of parent/child tables in parallel
   -- renamed ORDER table to ORDERS table
   -- revision of DBGEN synced with revision of 2.0 specification
   -- portability changes to process termination provided by John Matzka
   -- portability changes for Watcom C provided by Andrew Eisenberg
   -- indentation of specifications/templates now matches
   -- queries now include a consistent header format
Changes as of 4/28/98
   -- version 2.0.0 Alpha5
   -- NO RELEASE OF ALPHA 5 ; skipped to sync spec/DBGEN revision levels
Changes as of 4/6/98
   -- version 2.0.0 Alpha4
   -- corrected parallel table generation
   -- minor corrections to query templates
   -- portability changes for HP
Changes as of 3/24/98
   -- version 2.0.0 Alpha3
   -- include substitution parameters for Q22
   -- correct substitution parameters for Q16 under AIX
   -- include permute.h until unix/NT makefile fix
   -- correct orderkey generation
Changes as of 3/20/98
   -- version 2.0.0 Alpha2
   -- correct runtime malloc error from bad INIT_HUGE macro
   -- improve pseudo text distribution in comments
   -- fix problem with parallelism of data gen
   -- re-enable generation of parent/child tables
   -- remove recombination code for parallel flat files
Changes as of 3/11/98
   -- version 2.0.0 Alpha1
   -- removed the TIME table
   -- removed the need for seed files
   -- made 1GB the validation database size
   -- add pseudo text support in comments
   -- correct character selection in a_rnd()
   -- correct population of P_NAME
   -- removed unclaimed variants
   -- added new queries 18-22, replaced Q13
Changes as of 2/6/98
   -- version 1.3.1
   -- Revised 64 bit support to clean up bcd2_bin() and mk_sparse()
   -- Add 64b support for NT
Changes as of 12/31/97
   -- version 1.3.0
   -- support for seed generation > 1TB (data gen still to be tested)
   -- rework of 64b support
   -- added bcd support for subtraction, comparison, modulo
   -- added 1998 to the copyright notice
   -- clarified comments in dists.dss
   -- corrected substitution problem in Q11
   -- standardized fopen() error messages with OPEN_CHECK()
   -- introduced PATH_SEP in config.h to allow changes in path separators
Changes as of 12/15/96
   -- version 1.2.0
   -- corrected typos in queries 8a, 8c, 8d, 11a, 12F and 14F, 17a
   -- added variant 15c
   -- defined MAX_SCALE and MIN_SCALE; issued error messages for SF > 1000
         since implementation is incomplete
   -- seed file generation can now be resumed with dbgen -R <n> ...
   -- corrected slight compile bug under Solaris 2.5.1
   -- documented compile problems under SunOS
Changes as of 8/1/96
   -- version 1.1.0D
   -- included new variants for queries 8 and 15
   -- re-introduced answer sets in the source tree
Changes as of 5/1/96
   -- version 1.1.0C
   -- unified version numbering of DBGEN and QGEN
   -- updated BUGS list
   -- removed FAQ from soft appendix; web site will keep the current 
           version of the FAQ
   -- added 1996 to the copyright notice
   -- corrected bug in PR_DATE macro; NO CHANGE TO DATA SET
   -- properly initialize param values for cleaner logging
   -- adjusted output format of Q11 param to allow scaling to 1TB
   -- corrected typos in variant 14c
   -- corrected data type for YEAR in variant 8c
   -- corrected typos in variant 10a
   -- added variant 8d
Changes as of 1/23/96
   -- qgen version 1.1.0B
   -- include support for ANSI semantics
   -- improved patch for seed sensitivity
Changes as of 1/23/96
   -- updated BUGS list
   -- dbgen version 1.1.0A
   -- patch to limit BCD2 fields to 12 characters for columnar output
   -- qgen version 1.1.0A
   -- patch to fix the "unknown flag" problem
   -- patch to fix the seed sensitivity problem
Changes as of 12/19/95
   -- updated BUGS list
   -- dbgen version 1.1.0
   -- upped default value of MAX_CHILDREN to 1000
   -- corrected naming of detail tables in incremental load
   -- corrected range delete output
   -- forced delete files to truncate existing files
   -- removed fixed size tables from seed generation
   -- corrected overflow problem with large scale seed generation
   -- allow date generation as MM-DD-YY based on config.h #define
   -- correct truncation problem with columnar output in PR_VSTR()
   -- added support for Windows NT
   -- added PLATFORM macro to makefile, removed platform defines from
           config.h
   -- removed MAX_CHILDREN define from config.h (set to 1000 in dss.h)
   -- qgen version 1.1.0
   -- correct SET_OUTPUT macro to TDAT
   -- use %ld in output for q17; portability
   -- add support for SQLSERVER database dialect
   -- add support for SYBASE database dialect
   -- adjust parameter ranges for Q1, Q3, Q6
   -- add -T/-t option to usage summary
   -- added support for Windows NT
Changes as of 09/01/95
   -- qgen version 1.0.1
   -- formalized version numbering 
   -- -p now generates correct query permutations
   -- added separate version number for qgen
   -- corrected Q3 substitution problem
   -- updated permissible range for Q10 
   -- corrected rowcount_dflt and the MAX row indicator (-1)
   -- expanded param logging to include all possible parameters
   -- allowed qgen's -d option to be used at all scale factors
   -- made parameter substitution permutation-independent
   -- added qgen support for END_TRAN (-E) and DFLT_NUM (-N)
   -- correct handling of :n directive
   -- added more complete explanation of QGEN to README
   -- rename of random to rndm, for portability
   -- dbgen version 1.0.1
   -- formalized version numbering 
   -- inclusion of SF=1 seed file
   -- correct typo in usage() update example
   -- patch to driver.c to allow correct updates
   -- documentation change to README to clarify seed/stage/update
           interaction
   -- corrected minor glitch in "open failed" error msg in print.c
   -- added missing line continuation to makefile.suite
   -- seed files are now based on scale factor and number of generators
   -- seed files now hold seeds for one "step" of a given build
   -- clean up of parallel load routines
   -- inclusion of faster seed generation routines from Susanne Englert
   -- removed the -E(xisting) option
   -- assure proper scaling of O_CUSTKEY
   -- corrected default update percentage
   -- proper handling of child tables with '-O f'
   -- removed seed files from the distribution
   -- modified rpb_routine() to limit contribution of partkey in 
           retailprice
   -- added '-S(tep)' option to allow multi-stage loads
   -- roll in of 32 bit speed_seed routines from Dick Shelton
   -- miscellaneous typo corrections in the documentation
   -- cleanup of usage output
Changes as of 05/08/95
   -- version 1.0
   -- add Teradata defines to tpcd.h for QGEN
   -- add :c to query templates for database CONNECT syntax
   -- add examples of DBGEN and QGEN usage to README
   -- add -T option to qgen to allow time table usage
   -- query template names only require .sql suffix, rest is arbitrary
Changes as of 03/13/95
   -- version 9.1
   -- surround DBNAME with ifndef in config.h
   -- remove -DDBNAME from makefile.suite
   -- sync varchar handling with 9.1 draft
Changes as of 02/21/95
   -- version 9.0a
   -- fixed bug in qgen that incorrectly included rnd.h
   -- included revised DDL with changes for char/varchar and l_quantity
   -- updated DBGEN help message to include new single table options for
           order/lineitem and part/partsupp
   -- included handling for multi-set seed files TPCDSEED.xxx
   -- generated seeds up through 400GB; headed to 1TB!
   -- ANSI lint cleanup; more needed
   -- UF2 now defaults to key lists; use "-O r" to generate key ranges
           also note, this routine does NOT use the BCD2_* 
           routines. As a result, it WILL fail if the keys being deleted 
           exceed 32 bits. Since this would require ~660 update iterations, 
           this seems an acceptable oversight
Changes as of 01/19/95
   -- version 9.0
   -- allowed command line seeding of RNG for QGEN
   -- order and number of params in QGEN now matches 
         presentation in spec
   -- fixed bug in time table format of O_ORDERDATE
   -- changed l_QUANTITY to FLOAT in dss.ddl
   -- reworked QGEN options to be more useful
   -- allowed creation of sparse keys beyond 32 bits (for 1TB)
   -- removed unused '#ifdef' and associated code
   -- allowed independent generation of master/detail tables 
           (eg, order/lineitem)
Changes as of 12/06/94
   -- version 8.6
   -- fixed renaming of flat files for child tables
   -- various documentation fixes
   -- added naming convention section to Porting.Notes
   -- added -DIBM flag to config.h
   -- synced up QGEN with draft 8.1
Changes as of 10/25/94
   -- version 8.5a
   -- corrected bug in columnar output of pr_supp
   -- added pr_drange to generate a list of order keys to be 
           deleted instead of generating SQL
   -- added '-O d' to generate range delete as SQL
   -- updated default values for QGEN to sync with spec 8.1
   -- corrected MK_SPARSE to reflect groups of 8
   -- corrected a bug in o_orderstatus
   -- regenerated seed files for SF in [1,10]
   -- ANSI cleanup (primarily function declarations)
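
The MK_SPARSE entry above concerns the sparse order keys introduced in
version 8.0. A minimal sketch of the idea follows, assuming that keys are
populated in groups of 8 and that two extra bits are inserted above each
group, so that 8 keys out of every 32 are used. The constants and the name
sparse_key() are illustrative assumptions, not the macro from the dbgen
source.

```c
#include <assert.h>

#define KEEP_BITS 3   /* assumed: 2^3 = 8 consecutive keys per group */
#define GAP_BITS  2   /* assumed: each group is spread over 8 * 2^2 = 32 keys */

/* Map a dense, zero-based index onto a sparse key space. The low
 * KEEP_BITS bits are preserved; the remaining bits are shifted up by
 * GAP_BITS, leaving a run of unused keys after every group. */
static long sparse_key(long dense)
{
    assert(dense >= 0);
    long low  = dense & ((1L << KEEP_BITS) - 1);
    long high = dense >> KEEP_BITS;
    return (high << (KEEP_BITS + GAP_BITS)) | low;
}
```

Under these assumptions dense indexes 0 through 7 map to keys 0 through 7,
and indexes 8 through 15 map to keys 32 through 39.
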
Changes as of 10/11/94
   -- version 8.5
   -- remove deletes/inserts to other than order/lineitem
   -- increased cardinality for part.type part.container
   -- '-r' argument is now integer; percentage in basis points
   -- initial roll-in of new update scheme
   -- added BBB comments to supplier table
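
For the '-r' entry above: a basis point is one hundredth of a percent, so
100 basis points equal 1%. A minimal sketch of the conversion follows.
update_rows() is a hypothetical helper for illustration, and how dbgen
itself applies the value is not shown in this file.

```c
#include <assert.h>

/* Hypothetical helper: number of rows covered by an update percentage
 * given in basis points (1 basis point = 0.01%). Integer math only;
 * long long keeps rowcount * basis_points from overflowing. */
static long long update_rows(long long rowcount, long basis_points)
{
    assert(rowcount >= 0 && basis_points >= 0);
    return rowcount * basis_points / 10000;
}
```

With a base of 1,500,000 rows, 100 basis points select 15,000 rows and 10
basis points select 1,500 rows.
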
Changes as of 9/27/94
   -- version 8.4
   -- all money calculations now use integer math. This should 
           bring everyone's data sets into exact agreement.
Changes as of 9/21/94
   -- version 8.3b
   -- fixed handling of MAX_STREAM
   -- added floor function to RPRICE bridge
   -- misc lint cleanup (type fixes, new prototypes, etc.)
   -- MONEY format becomes lf for DOS
   -- further cleanup of PR_VSTR and its length argument
   -- change to parameter generation for Q6 to allow for float 
           discount
Changes as of 9/15/94
   -- version 8.3a
   -- isolated MONEY format for Unisys (Lf) using DOS
   -- made sure all arguments to MAKE_MONEY were doubles
   -- rolled in NEW_PTEXT to allow Berni to experiment
Changes as of 9/12/94
   -- version 8.3
   -- added -T n and -T r to usage to match getopt() and README
   -- changed PR_MONEY to remove leading blanks
   -- included revised DDL from Berni
   -- included some MVS portability fixes in re malloc.h
   -- cleaned up error messages in qgen and made #define ofp usage
           universal
   -- additional DOS portability changes
   -- added {c,a}len to provide specific length for columnar 
           output of varchar
   -- added PR_VSTR to handle varchar printing under MVS
   -- fixed bit masking in a_rnd and cleaned up prototype match 
           with V_STR
   -- PR_MONEY now uses %Lf
   -- added revised pseudo text under NEW_PTEXT ifdef for 
           experiments
Changes as of 9/09/94
   -- version 8.2
   -- l_discount and l_tax are now fractional (per teleconference)
   -- money calculations moved to scaled integer math to clean up 
           answer sets
   -- changed PR_FLT() to PR_MONEY to clarify usage
   -- portability changes for SYBASE: dbname --> db_name
           STATUS --> DBGEN_STATUS
   -- added nations2 to dists.dss to handle qgen needs for now
   -- reintroduced #ifndef DOS
   -- reintroduced U2200 define to control kill_load()
   -- broke out nation and region separately in -T option
   -- updated dss.ddl based on mail from Berni
Changes as of 8/31/94
   -- version 8.1
   -- scaling for clerks needed to be 1000 (was 100)
   -- added qgen parameter for scale
   -- changed qgen parameter from s)tream to p)ermutation
   -- synced qgen parameter values with 8.0 spec
   -- corrected duplications in dists.dss
Changes as of 8/24/94
   -- version 8.0
   -- added sparse keys to lineitem/order
   -- added varchar generation for comments/addresses
   -- added variable lineitems/orders
   -- removed ifdef for normalized code_tables
   -- included code for parameter generation and template->EQT 
           routines
   -- updated README and Porting.Notes to reflect QGEN
   -- included DDL and RI examples from Berni
Changes as of 6/15/94
   -- version 7.0b (numbers now match spec revision)
   -- rework of code tables to properly map nation/region; when 
           compiled with -DCODE_TABLES distributions are taken from 
           code.dss and two additional fields are generated for 
           customers and suppliers, [cs]_ncode and [cs]_rcode, 
           immediately following [cs]_region
   -- replaced ifdef's around DEAD_DATA with opposites. DEAD_DATA 
           is now the default
   -- worked through code to see that it conformed to 7.0 
           specification
   -- adjusted scale factors/rowcounts for 1 GB == sf1
   -- brought help message in line with current code
   -- fixed order per customer at 10
   -- make suppkey scalable in lineitem/partsupp
Changes as of 4/25/94
   -- version 1.5
   -- added the customers with no orders; Compile with -DDEAD_DATA 
           to activate the change.
   -- added the code table for nation and region; 
           Compile with -DCODE_TABLES to activate the change.
Changes as of 3/17/94
   -- version 1.41
   -- completed implementation of JULIAN_DAY after talks with Berni
   -- misc cleanup in usage/README files
   -- removed all tabs and capped line length at 75
   -- added -n option to allow naming of inline-loaded database
Changes as of 3/16/94
   -- version 1.4
   -- prototyped julian day/month for query re-write work. Compile 
           with -DJULIAN_DAY to enable
   -- removed gen_times() from driver.c
   -- added VMS ifdef to config.h to clean up fork/signal issues
   -- added ICL ifdef to config.h to clean up getopt() issues
   -- changed header file references to config.h from machine.h
Changes as of 3/2/94
   -- version 1.31
   -- corrected format of C_NAME to match S_NAME and O_CLERK
   -- re-allowed fractional scale factors < 1 (updates not 
           contiguous)
   -- added DSS_CONFIG environment variable
   -- reworked read_dist() to look for DSS_DIST in DSS_CONFIG
   -- updated the README file
Changes as of 2/16/94
   -- version 1.3
   -- added command line options for parallel load and data set 
           expansion
   -- changed dists.dss delimiter to | for portability
   -- limited scale factors to integer values
   -- added command line option for seed file generation
   -- added all seed files to distribution for SFs 1 - 10
   -- moved machine.h to config.h and added MAX_CHILDREN define
   -- added 'f' flag to options to allow renaming of output files
   -- added generation of SQL delete statements to match updates
           (Note: updates are still single-threaded; -C is cleared 
           by -U)
   -- corrected field sizing in dsstypes.h typedefs to match v 6.4
   -- update percentage default set to 1%
Changes as of 12/3/93
   -- version 1.2
   -- added command line option to adjust update percentage
   -- fixed update generation for proper primary key ordering
   -- renamed UUSR/PRC to RUSSIA/CHINA in dists.dss
   -- cleaned up phone number generation to be consistent regardless
            of order of evaluation
   -- adjusted size of lineitem comment to bring data in line with 
           100 MB == SF=1
Changes as of 10/15/93
   -- added command line option for update data creation
   -- miscellaneous porting and cleanup changes
   -- reworked table generation to allow reuse for updates
   -- added comment field to tdefs structure
   -- added load_state and store_state to sync data gen and 
           update gen
Changes as of 7/26/93
   -- combined loader and header stubs in load_stubs.c
   -- separated Revision History (this file) from README
   -- simplified makefile
   -- removed redundancies from colors distribution
   -- added getopt() for portability
   -- created Porting.Notes
   -- adjusted scaling rules
   -- added help option to the command line
Changes as of 2/26/93
   -- combined all typedefs in one header: dsstypes.h
   -- combined flat file generation in print.ec
   -- combined typedef population in build.ec
   -- added -P to control rowcnt scaling (P for percentage)
   -- added -D option for Direct data generation and added 
           appropriate hooks in tdefs[] structure
   -- added -F option for flat file generation
   -- reused -T option (use -P 0.1 to build test size database)
           now accepts suboptions c,o,p,s for single table builds.
   -- dropped -M option (scaling is now by rowcount)
   -- added -O option for optional controls. Currently defined:
           -O t -- generate optional time table and join fields in 
                   order/lineitem
           -O h -- generate headers for flat file output
           -O m -- generate fixed column-length output
   -- removed dynamic memory allocation, redundant calls to 
           UnifInt, etc to improve performance
Changes as of 1/12/92
   -- julian() changed to handle orders->orderdate correctly
   -- rflag distributions corrected in dists.dss
   -- sea, gold removed from color distribution to clean up substring 
      problems
   -- part->number and supplier-> adjusted for 1-based indexing
   -- time->day changed to be day of month, not day of year
   -- t.week changed to be week in year, not day of week
Changes as of 11/18/92
   -- checked line length and tab for transmission
   -- another chapter in the portability wars. added #include 
      "machine.h" to dss.h (which is included by everyone else). Any 
      machine particular porting changes should go here.
   -- fixed fixed-field formats to prevent double printing
   -- expanded PR_FLT formats to %010.2
Changes as of 10/21/92
   -- added fixed format and column header handling; users of headers 
      will have to define the header functions to be called in 
      int (*tdefs.header)()
Changes as of 10/09/92:
   -- added ansi prototypes and recompiled with gcc -ansi. users may 
      need to change the CC definition in the makefile and the contents 
      of CFLAGS to reflect their particular ansi compiler.
   -- replaced all int references with long
   -- replaced all float references with double
   -- found and fixed odate/julian problem TS mentioned in 10/09 phone 
      call

Changes as of 9/09/92:
   -- Park/Miller random number generator included
   -- clerk scaling changed to 100 * scale
   -- parts.name always built from 5 selections from colors set
   -- test scaling changed to ~60MB (TEST_SCALING == 10)
   -- logarithmic scaling removed
   -- mfgcost removed and retail/supplier cost bounds adjusted
   -- agg_str memory leak fixed
   -- independent RNG streams on a per column basis
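
The Park/Miller entry and the per-column streams entry above can be
sketched together. The generator below is the published "minimal standard"
form (multiplier 16807, modulus 2^31 - 1) computed with Schrage's method so
that intermediate values fit in a 32-bit signed long. The column names, the
seed values and the function names are assumptions made for illustration
and are not taken from the dbgen source.

```c
#include <assert.h>

#define PM_A 16807L        /* multiplier */
#define PM_M 2147483647L   /* modulus, 2^31 - 1 */
#define PM_Q 127773L       /* PM_M / PM_A */
#define PM_R 2836L         /* PM_M % PM_A */

/* Advance one stream: seed = (PM_A * seed) mod PM_M, using Schrage's
 * decomposition to avoid overflow. Returns the new seed. */
static long pm_next(long *seed)
{
    assert(*seed > 0 && *seed < PM_M);
    long hi = *seed / PM_Q;
    long lo = *seed % PM_Q;
    long t  = PM_A * lo - PM_R * hi;
    *seed = (t > 0) ? t : t + PM_M;
    return *seed;
}

/* Hypothetical per-column streams: each column owns its own seed, so
 * drawing a value for one column never disturbs another column. */
enum { COL_NAME, COL_PRICE, COL_COUNT };
static long col_seed[COL_COUNT] = { 1L, 2L };

static long col_next(int col)
{
    assert(col >= 0 && col < COL_COUNT);
    return pm_next(&col_seed[col]);
}
```

Keeping one seed per column means a change to how one column is populated
leaves the values generated for every other column unchanged.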

This is the revised data generator for DSS. 

The rewrite tried to accomplish four things: (1) identify and isolate 
all the implicit assumptions about limits, bounds, ranges, distributions,
etc.; (2) standardize the way any given table was generated/printed to 
ease understanding and maintenance; (3) bring the generator in line with 
the current work of the committee and the excellent spec that Indira put 
together; (4) provide an easy way to adjust distributions and string 
contents, and to facilitate experimentation to get a better idea of the 
impact of data population changes.

The files included are:

driver.c       ------- main and the calling routines for the generators
dist.c         ------- should really be named dss_util.c; misc routines
customer.c     ------- generation and print routines for customer table
orders.c       -------            ""             ""      order table
parts.c        -------            ""             ""      parts/partsupp 
suppliers.c    -------            ""             ""      suppliers table
time.c         -------            ""             ""      time table
customer.h     ------- associated header files; contain structure 
                       definitions
dss.h                  dss.h holds the large number of assumptions and
orders.h               values that have been used as IFDEFs.
parts.h  
suppliers.h
time.h   
dists.dss   ------- string selections and weights; used to build 
                    distributions

Running make will create an executable (using the compiler flags in 
CFLAGS, the ld flags in LDFLAGS and the libraries in LIBS [-O, -s, 
and -lm by default]) which will create flat files suitable for dbload.