Release notes for DBGEN and QGEN
These notes are taken from the History file which is distributed with the TPC-D soft appendix.
- Changes as of 5/11/00
- versions: TPCH 1.3.0, TPCR 1.3.0
- Corrected update range rollover after 1000 update segements
- Corrected problem in permute.c causing invalid substitutions in Q16
- Changes as of 10/11/99
- versions: TPCH 1.2.0a, TPCR 1.1.0a
- Corrected range setting of segmented updates that was causing extra file to be generated
- Porting corrections for DigUnix
- Changes as of 08/30/99
- versions: TPCH 1.2.0, TPCR 1.1.0
- reduced parameter substitution range for Q18
- added new option to specify location of dists file (-b)
- added DBGEN option to suppress all output (-q)
- Changes as of 08/16/99
- versions: TPCH 1.1.0a, TPCR 1.0.1e
- prevent "reuse" of original data in update files
- correction to lint target in makefile.suite
- removal of vestigal l_partkey predicate from 21.sql
- reorder lineitem/order join in q5
- removal of table aliases from 2.sql
- randomize seeding of qgen RNG to close bug 52
- correct possible round off error in segmented update files
- corrected soft copy answer set for Q22
- corrected percision of answer set for Q19
- Changes as of 07/08/99
- versions: TPCH 1.1.0, TPCR 1.0.1
- WORKLOAD must be set to either TPCH or TPCR in the makefile
- unneeded reference to part table removed from q21 template
- Changes as of 06/04/99
- version 1.0.1d
- Restarted version numbering to match specification revisions for
TPC-H and TPC-R
- Corrected answer set for for Q13
- Corrected parameter substitutions for Q16, Q17, Q19, Q20, Q21, Q22
- Corrected RNG initialization in qgen.c
- added adhoc.c adhoc.h to code base to support randomized data sets;
currently disabled
- replaced calls to UnifInt() row_stop with call to NthElement()
- Corrected a problem that caused small negative money values to
print as a positive value
- Simplification of PR_xxx macros
- QGEN building correct parameter logs again
******************
* NOTE NOTE NOTE *
******************
Below this line the file refers to TPC-D which was retired in favor of
TPC-H and TPC-R. Since the new speicifications are numbered from 1.0.0
the program version was reset.
******************
* NOTE NOTE NOTE *
******************
- Changes as of 01/05/99
- version 2.0.1
- added 1999 to the copyright notice
- corrected C++ compilation problem
- sub-select phrasing corrected in Q4, Q21, Q22
- added support for segmenting update files (contributed by Larry Kemp, HP)
- Changes as of 12/08/98
- version 2.0.0
- Removed permute.h from clean target in makefile.suite
- Changes as of 11/17/98
- version 2.0.0 Alpha 8
- corrected o_custkey overrun bug
- removed upper bound on -C command option
- added static permute.h to distribution to match the specification
- Changes as of 10/23/98
- version 2.0.0 Alpha 7
- removed references to DSS_SEED and SEED_TAG
- minor query template cleanup
- V2 answer sets added
- correction to hd_sparse for SF > 300
- added static declaration to row types in gen_tbl to fix update problem
- permuted params to Q22
- Changes as of 5/20/98
- version 2.0.0 Alpha6b
- removed trailing apostrophe from dists.dss nouns for Tandem loader
- corrected mk_sparse() problem with alpha6
- added 64b support for NCR/Metaware
- corrected generation of parent/child tables in parallel
- renamed ORDER table to ORDERS table
- revision of DBGEN synced with revision of 2.0 specification
- portability changes to process termination provided by John Matzka
- portability changes for Watcom C provided by Andrew Eisenberg
- standardized query template format
- queries now include a consistant header format
- Changes as of 4/28/98
- version 2.0.0 Alpha5
- NO RELEASE OF ALPHA 5 ; skipped to sync spec/DBGEN revision levels
- Changes as of 7 April 98
- version 2.0.0 Alpha4
- Query template corrections for Q9, Q12, Q15a, Q22
- Parallel generation of parent/child tables fixed
- Minor corrections to dists.dss
- Portability changes for HPUX
- Changes as of 3/24/98
- version 2.0.0 Alpha3
- include substitution parameters for Q22
- correct substitution parameters for Q16 under AIX
- include permute.h until unix/NT makefile fix
- correct orderkey generation
- Changes as of 3/20/98
- version 2.0.0 Alpha2
- correct runtime malloc error from bad INIT_HUGE macro
- improve pseudo text distribution in comments
- fix problem with parallelism of data gen
- re-enable generation of parent/child tables
- remove recombinaton code for parallel flat files
- Changes as of 3/11/98
- version 2.0.0 Alpha1
- removed the TIME table
- removed the need for seed files
- made 1GB the validation database size
- add pseudo text support in comments
- correct character selection in a_rnd()
- correct population of P_NAME
- removed unclaimed variants
- added new queries 18-22, replaced Q13
- Changes as of 2/6/98
- version 1.3.1
- Revised 64 bit support to clean up bcd2_bin()and mk_sparse()
- Add 64b support for NT
- Changes as of 12/31/97
- version 1.3.0
- support for seed generation > 1TB (data gen still to be tested)
- rework of 64b support
- added bcd support for subtraction, comparison, modulo
- added 1998 to the copyright notice
- clarified comments in dists.dss
- corrected substitution problem in Q11
- standardized fopen() error messages with OPEN_CHECK()
- introduced PATH_SEP in config.h to allow changes in path separators
- Changes as of 12/15/96
- version 1.2.0
- corrected typos in queries 8a, 8c, 8d, 11a, 12F and 14F, 17a
- added variant 15c
- defined MAX_SCALE and MIN_SCALE; issued error messages for SF >& 1000
since implementation is incomplete
- seed file generation can now be resumed with dbgen -R < n > ...
- corrected slight compile bug under Solaris 2.5.1
- documented compile problems under SunOS
- Changes as of 8/1/96
- version 1.1.0D
- included new variants for queries 8 and 15
- re-introduced answer sets in the source tree
- Changes as of 5/1/96
- version 1.1.0C
- unified version numbering of DBGEN and QGEN
- updated BUGS list
- removed FAQ from soft appendix; web site will keep the current
version of the FAQ
- added 1996 to the copyright notice
- corrected bug in PR_DATE macro; NO CHANGE TO DATA SET
- properly initialize param values for cleaner logging
- adjusted output format of Q11 partam to allow scaling to 1TB
- corrected typos in variant 14c
- corrected data type for YEAR in variant 8c
- corrected typos in variant 10a
- added variant 8d
- Changes as of 1/23/96
- qgen version 1.1.0B
- include support for ANSI semantics
- improved patch for seed sensetivity
- Changes as of 1/23/96
- updated BUGS list
- dbgen version 1.1.0A
- patch to limit BCD2 fields to 12 characters for columnar output
- qgen version 1.1.0A
- patch to fix the "unknown flag" problem
- patch to fix the seed sensetivity problem
- Changes as of 12/19/95
- updated BUGS list
- dbgen version 1.1.0
- upped default value of MAX_CHILDREN to 1000
- corrected naming of detail tables in incremental load
- corrected range delete output
- forced delete files to truncate existing files
- removed fixed size tables from seed generation
- corrected overflow problem with large scale seed generation
- allow date generation as MM-DD-YY based on config.h #define
- correct truncation problem with columnar output in PR_VSTR()
- added support for Windows NT
- added PLATFORM macro to makefile, removed platform defines from
config.h
- removed MAX_CHILDREN define from config.h (set to 1000 in dss.h)
- qgen version 1.1.0
- correct SET_OUTPUT macro to TDAT
- use %ld in output for q17; portability
- add support for SQLSERVER database dialect
- add support for SYBASE database dialect
- adjust parameter ranges for Q1, Q3, Q6
- add -T/-t option to usage summary
- added support for Windows NT
- Changes as of 09/01/95
- qgen version 1.0.1
- formalized version numbering
- -p now generates correct query permutations
- added separate verion number for qgen
- corrected Q3 substitution problem
- updated permissible range for Q10
- corrected rowcount_dflt and the MAX row indicator (-1)
- expanded param logging to include all possible parameters
- allowed qgen's -d option to be used at all scale factors
- made parameter substitution permutation-independent
- added qgen suppport for END_TRAN (-E) and DFLT_NUM (-N)
- correct handling of :n directive
- added more complete explanation of QGEN to README
- rename of random to rndm, for portability
- dbgen version 1.0.1
- formalized version numbering
- inclusion of SF=1 seed file
- correct typo in usage() update example
- patch to driver.c to allow correct updates
- documentation change to README to clarify seed/stage/update
intereaction
- corrected minor glitch in "open failed" error msg in print.c
- added missing line continuation to makefile.suite
- seed files are now based on scale factor and number of generators
- seed files now hold seeds for one "step" of a given build
- clean up of parallel load routines
- inclusion of faster seed generation routines from Susanne Englert
- removed the -E(xisting) option
- assure proper scaling of O_CUSTKEY
- corrected default update percentage
- proper handling of child tables with '-O f'
- removed seed files from the distribution
- modified rpb_routine() to limit contribution of partkey in
retailprice
- added '-S(tep)' option to allow multi-stage loads
- roll in of 32 bit speed_seed routines from Dick Shelton
- miscelaneous typo corrections in the documentation
- cleanup of usage output
- Changes as of 05/08/95
- version 1.0
- add Teradata defines to tpcd.h for QGEN
- add :c to query templates for database CONNECT syntax
- add examples of DBGEN and QGEN usage to README
- add -T option to qgen to allow time table usage
- query template names only requre .sql suffix, rest is arbitrary
- Changes as of 03/13/95
- version 9.1
- surround DBNAME with ifndef in config.h
- remove -DDBNAME from makefile.suite
- sync varchar handling with 9.1 draft
- Changes as of 02/21/95
- version 9.0a
- fixed bug in qgen that incorrectly included rnd.h
- included revised DDL with Changes for char/varchar and l_quantity
- updated DBGEN help message to include new single table options for
order/lineitem and part/partsupp
- included handling for multi-set seed files TPCDSEED.xxx
- generated seeds up through 400GB; headed to 1TB!
- ANSI lint cleanup; more needed
- UF2 now defaults to key lists; use "-O r" to generate key ranges
also note, this routine this routine does NOT use the BCD2_*
routines. As a result, it WILL fail if the keys being deleted
exceed 32 bits. Since this would require ~660 update iterations,
this seems an acceptable oversight
- Changes as of 01/19/95
- version 9.0
- allowed command line seeding of RNG for QGEN
- order and number of params in QGEN now matches
presentation in spec
- fixed bug in time table format of O_ORDERDATE
- changed l_QUANTITY to FLOAT in dss.ddl
- reworked QGEN options to be more useful
- allowed creation of sparse keys beyond 32 bits (for 1TB)
- removed unused '#ifdef' and associated code
- allowed independent generation of master/detail tables
(eg, order/lineitem)
- Changes as of 12/06/94
- version 8.6
- fixed renaming of flat files for child tables
- various documentation fixes
- added naming convention section to Porting.Notes
- added -DIBM flag to config.h
- synced up QGEN with draft 8.1
- Changes as of 10/25/94
- version 8.5a
- corrected bug in columnar output of pr_supp
- added pr_drange to generate a list of order keys to be
deleted instead of generating SQL
- added '-O d' to generate range delete as SQL
- updated default values for QGEN to sync with spec 8.1
- corrected MK_SPARSE to reflect groups of 8
- corrected a bug in o_orderstatus
- regenerated seed files for SF in [1,10]
- ANSI cleanup (primarily function declarations)
- Changes as of 10/11/94
- version 8.5
- remove deletes/inserts to other than order/lineitem
- increased cardinality for part.type part.container
- '-r' argument is now integer; percentage in basis points
- initial roll-in of new update scheme
- added BBB comments to supplier table
- Changes as of 9/27/94
- version 8.4
- all money calculations now use integer math. This should
bring everyone's data sets into exact aggreement.
- Changes as of 9/21/94
- version 8.3b
- fixed handling of MAX_STREAM
- added floor function to RPRICE bridge
- misc lint cleanup (type fixes, new prototypes, etc.)
- MONEY format becomes lf for DOS
- further cleanup of PR_VSTR and its length argument
- change to parameter generation for Q6 to allow for float
discount
- Changes as of 9/15/94
- version 8.3a
- isolated MONEY format for Unisys (Lf) using DOS
- make sure all arguments to MAKE_MONEY were double's
- rolled in NEW_PTEXT to allow Berni to experiment
- Changes as of 9/12/94
- version 8.3
- added -T n and -T r to usage to match getopt() and README
- changed PR_MONEY to remove leading blanks
- included revised DDL from Berni
- included some MVS portability fixes in re malloc.h
- cleaned up error messages in qgen and made #define ofp usage
universal
- additional DOS portability changes
- added {c,a}len to provide specific length for columnar
output of varchar
- added PR_VSTR to handle varchar printing under MVS
- fixed bit masking in a_rnd and cleaned up prototype match
with V_STR
- PR_MONEY now used %Lf
- added revised pseudo text under NEW_PTEXT ifdef for
experiments
- Changes as of 9/09/94
- version 8.2
- l_discount and l_tax are now fractional (per teleconference)
- money calculations moved to scaled integer math to clean up
answer sets
- changed PR_FLT() to PR_MONEY to clarify usage
- portability changes for SYBASE: dbname
- > db_name
STATUS
- > DBGEN_STATUS
- added nations2 to dists.dss to handle qgen needs for now
- reintroduced #ifndef DOS
- reintroduced U2200 define to control kill_load()
- broke out nation and region separately in -T option
- updated dss.ddl based on mail from Berni
- Changes as of 8/31/94
- version 8.1
- scaling for clerks needed to be 1000 (was 100)
- added qgen parameter for scale
- changed qgen parameter from s)tream to p)ermutation
- synced qgen paramter values with 8.0 spec
- corrected duplications in dists.dss
- Changes as of 8/24/94
- version 8.0
- added sparse keys to lineitem/order
- added varchar generation for comments/addresses
- added variable lineitems/orders
- removed ifdef for normalized code_tables
- included code for parameter generation and template->EQT
routines
- updated README and Porting.Notes to reflect QGEN
- included DDL and RI examples from Berni
- Changes as of 6/15/94
- version 7.0b (numbers now match spec revsion)
- rework of code tables to properly map nation/region; when
compiled with -DCODE_TABLES distributions are taken from
code.dss and two additional fields are generated for
customers and suppliers, [cs]_ncode and [cs]_rcode,
immediately following [cs]_region
- replaced ifdef's around DEAD_DATA with opposites. DEAD_DATA
is now the default
- worked through code to see that it conformed to 7.0
specification
- adjusted scale factors/rowcounts for 1 GB == sf1
- brought help message in line with current code
- fixed order per customer at 10
- make suppkey scalable in lineitem/partsupp
- Changes as of 4/25/94
- version 1.5
- added the customers with no orders; Compile with -DDEAD_DATA
to activate the change.
- added the code table for nation and region;
Compile with -DCODE_TABLES to activate the change.
- Changes as of 3/17/94
- version 1.41
- completed implementation of JULIAN_DAY after talks with Berni
- misc cleanup in usage/README files
- removed all tabs and capped line length at 75
- added -n option to allowing naming of inline-loaded database
- Changes as of 3/16/94
- version 1.4
- prottyped julian day/month for query re-write work. Compile
with -DJULIAN_DAY to enable
- removed gen_times() from driver.c
- added VMS ifdef to config.h to clean up fork/signal issues
- added ICL ifdef to config.h to clean up getopt() issues
- changed header file references to config.h from machine.h
- Changes as of 3/2/94
- version 1.31
- corrected format of C_NAME to match S_NAME and O_CLERK
- re-allowed fractional scale factors < 1 (updates not
contiguous)
- added DSS_CONFIG environemnt variable
- reworked read_dist() to look for DSS_DIST in DSS_CONFIG
- updated the README file
- Changes as of 2/16/94
- version 1.3
- added command line options for parallel load and data set
expansion
- changed dists.dss delimiter to | for portability
- limited scale factors to integer values
- added command line option for seed file generation
- added all seed files to distribution for SFs 1 - 10
- moved machine.h to config.h and added MAX_CHILDREN define
- added 'f' flag to options to allow renaming of output files
- added generation of SQL delete statements to match updates
(Note: updates are still single-threaded; -C is cleared
by -U)
- corrected field sizing in dsstypes.h typedefs to match v 6.4
- update percentage default set to 1%
- Changes as of 12/3/93
- version 1.2
- added command line option to adjust update percentage
- fixed update gneration for proper primary key ordering
- renamed UUSR/PRC to RUSSIA/CHINA in dists.dss
- cleaned up phone number generation to be consistant regard-
less of order of evaluation
- adjusted size of lineitem comment to bring data in line with
100 MB == SF=1
- Changes as of 10/15/93
- added command line option for update data creation
- miscelaneous porting and cleanup changes
- reworked table generation to allow reuse for updates
- added comment field to tdefs structure
- added load_state and store_state to sync data gen and
update gen
- Changes as of 7/26/93
- combined loader and header stubs in load_stubs.c
- separated Revision History (this file) from README
- simplified makefile
- removed redundancies from colors distribution
- added getopt() for portability
- created Porting.Notes
- adjusted scaling rules
- added help option to the command line
- Changes as of 2/26/93
- combined all typedefs in one header: dsstypes.h
- combined flat file generation in print.ec
- combined typedef population in build.ec
- added -P to control rowcnt scaling (P for percentage)
- added -D option for Direct data generation and added
appropriate hooks in tdefs[] structure
- added -F option for flat file generation
- reused -T option (use -P 0.1 to build test size database)
now accepts suboptions c,o,p,s for single table builds.
- dropped -M option (scaling is now by rowcount)
- added -O option for optional controls. Currently defined:
-O t
- generate optional time table a join fields in
order/lineitem
-O h
- generate headers for flat file output
-O m
- generate fixed column-length output
- removed dynamic memory allocation, redundant calls to
UnifInt, etc to improve performance
- Changes as of 1/12/92
- julian() changed to handle orders -> orderdate correctly
- rflag distributions corrected in dists.dss
- sea, gold removed from color distribution to clean up substring
problems
- part-> number and supplier-> adjusted for 1-based indexing
- time-> day changed to be day of month, not day of year
- t.week changed to be week in year, not day of week
- Changes as of 11/18/92
- checked line length and tab for transmission
- another chapter in the portability wars. added #include
"machine.h" to dss.h (which is included by everyone else). Any
machine particular porting changes should go here.
- fixed fixed-field formats to prevent double printing
- expanded PR_FLT formats to %010.2
- Changes as of 10/21/92
- added fixed format and column header handling; users of headers
will have to define the header functions to be called in
int (*tdefs.header)()
- Changes as of 10/09/92:
- added ansi prototypes and recompiled with gcc -ansi. users may
need to change the CC definition in the makefile and the contents
of CFLAGS to reflect their particular ansi compiler.
- replaced all int references with long
- replaced all float references with double
- found and fixed odate/julian problem TS mentioned in 10/09 phone
call
- Changes as of 9/09/92:
- Park/Miller random number generator included
- clerk scaling changed to 100 * scale
- parts.name always built from 5 selections from colors set
- test scaling changed to ~60MB (TEST_SCALING == 10)
- logarithmic scaling removed
- mfgcost removed and retail/supplier cost bounds adjusted
- agg_str memory leak fixed
- independent RNG streams on a per column basis
This is the revised data generator for DSS.
The rewrite tried to accomplish three things:
- identify and isolate
all the implicit assumptions about limits, bounds, ranges, distributions, etc.;
- standardize the way any given table was generated/
printed to ease understanding and maintenance;
- bring the generator
in line with the current work of the committee and the excellent spec
the Indira put together;
- provide an easy way to adjust distributions, string contents and to facilitate experimentation to get a
better idea of the impact of data population changes.
The files included are:
- driver.c
- main and the calling routines for the generator
- dist.c
- should really be named dss_util.c; misc routines
- customer.c
- generation and print routines for customer table
- orders.c
- "" "" order table
- parts.c
- "" "" parts/partsupp
- suppliers.c
- "" "" suppliers table
- time.c
- "" "" time table
- customer.h
- associate header files; contain structure
definitions
- dss.h
- dss.h holds the large number of assumptions and
- orders.h
- values that have been used as IFDEFs.
- parts.h
-
- suppliers.h
-
- time.h
-
- dists.dss
- string selections and weights; used to build
distributions
Running make will create an executable (using the compiler flags in
CFLAGS, the ld flags in LDFLAGS and the libraries in LIBS [-O, -s,
and -lm by default]) which will create flat files suitable for dbload.