[Net 2000 Ltd. Home][Data Masker Home][Data Masker Manual][Data Masker FAQ]
Scrambling Data in the HR Database
The Data Masker software contains a set of masking rules for the sample HR schema supplied with the Oracle database. You are encouraged to add your own masking rules to this masking set. The first section of this page discusses the existing rules (as supplied with the installation) and the second part provides some hints of additional rules which might be added. An ER diagram for the HR schema is available.
Masking Rules in the Sample HR Masking Set
The Existing Rules
- If you wish to explore each rule further, load the masking set into the Data Masker software and double click on the rule with the mouse to launch it into editing mode. You can then explore the rules configuration in detail.
- Rule 01-0001
- This is a Rule Controller. Notice how the other rules are dependent on it. A Rule Controller defines the database on which the dependent rules will act. If you change the login information in the Rule Controller its rules will operate on the new database. It is up to you to ensure that the Rule Controller points at an appropriate database. All other types of masking rule must have a parent Rule Controller and every masking set must contain at least one Rule Controller.
- Rule 01-0002
- A Trigger Manager Rule on the HR schema. If update triggers are enabled on the table during masking operations, the effect can be a dramatic slow down in execution speed. In this example it was determined that it was not necessary for the triggers to fire during the masking process as they update history and security tables which will not be required in the masked test system. Rule 01-0002 disables the triggers and its companion, Trigger Manager Rule 99-0003, will re-enable the triggers again after the masking rules execute.
- Rule 01-0004
- A Substitution Rule on the EMPLOYEES table. This rule is a good example of how to configure a rule to simultaneously mask two columns in a single pass. In this example the FIRST_NAME and LAST_NAME columns are masked with the Male+Female First Names and Random Surnames datasets. Be sure to double click on this rule to see how the rule is configured.
- Rules 02-0005 and xx-0006
- These two Substitution Rules mask the PHONE_NUMBER column in the EMPLOYEES table. The first rule, 02-0005 just uses the User Defined Text to place a value of <Masked> in the column. Be sure to note the WHERE Clause on this rule. It only substitutes the text value where the PHONE_NUMBER is not already null. As a general principle, it is a bad idea to let a masking rule add data where there was none before.
Important Point:Rule xx-0006 uses a complex WHERE Clause to mask over the PHONE_NUMBER values set by rule 02-0005 and set them to values appropriate to employees located in the UK (example: 01527 559 131). Note carefully the usage of the DMO1 alias in the WHERE Clause in conjunction with the Alphanumeric Text dataset to produce the desired output value. Have a look at the ER diagram for the HR schema to see why such a complex WHERE statement was necessary - the country code is not located in the EMPLOYEES table and an intermediate join table was required.
Also note that rule xx-0006 is dependent on rule 02-0005 and will not execute until rule 02-0005 completes. If you do not understand the concept of rule dependencies it is highly recommended that you read the Rule Blocks and Dependencies help page and also view the Using Rule Blocks and Dependencies tutorial to better understand how to explicitly control the execution order of the masking rules. [local] [internet]. It would also be useful to review the discussion of rules 0005 and 0006 in the Scrambling Data in the Scott Schema help page.
- Rule 03-0007
- A Substitution Rule on the EMPLOYEES table. This rule uses the Random Email Addresses dataset to provide the substitution data which masks the EMAIL column.
- Rule 99-0003
- A companion Trigger Manager Rule for rule 01-0002. This rule re-enables the triggers again after the other masking rules complete. Note that this rule is in rule block 99. It is only the higher rule block value that causes it to execute after the other masking rules. If the rule block was changed to a value lower than other masking rules it would execute before those rules. If you do not understand the concept of rule blocks it is highly recommended that you read the Rule Blocks and Dependencies help page and also view the Using Rule Blocks and Dependencies tutorial to better understand how to explicitly control the execution order of the masking rules. [local] [internet].
- Below are some suggestions for additional rules which can be added to the HR database sample masking set. An ER diagram for HR is available to assist you in your analysis of the database structure. If you do add new masking rules, it is a good idea to save the modified masking set under a new name - that way if you upgrade the Data Masker software you will not overwrite your enhancements.
- Use a Substitution Rule and the User Specified Numbers dataset to set each value in the COMMMISSON_PCT column to a fixed sum of 0.15. Place a WHERE Clause on the rule so that values that are already NULL are not affected.
- A Substitution Rule would be appropriate on the SALARY column. Use the Number Variance dataset to adjust each value in the SALARY column to a value +/- 20% of the existing value. Place a WHERE Clause on the rule so that values that are already NULL are not affected.
- Use rule xx-0006 as an example and create a Substitution Rule on the PHONE_NUMBER column which creates appropriate North American style phone numbers for employees in the USA and Canada. Use the format (nnn) nnn nnnn. Use the pre-defined North American Telephone Number dataset or build it from the Alphanumeric Text or Formatted Numbers datasets.
- Rule 03-0007 masks the EMAIL column with a value from the Random Email Addresses dataset. A look at the original data indicates that the existing format was formatted like UPPER(<Initial><6DigitsLastName>)@companyname.com where "companyname" is hard coded text. Delete or disable rule 03-007 and use a Row-Internal rule to build an email address in the previous format for each non null email address. You will need to use Oracle SQL functions such as UPPER() and SUBSTR() to do this. Review the Row-Internal Sync Rule help page and Creating Row-Internal Synchronization Rule tutorial [local] [internet] if you need assistance with this. Also review the discussion of rule 0006 in the Scrambling Data in the Scott Schema help page for a similar example. Make sure this rule is configured (with a rule block or dependency) so that it executes after rule 01-0003.
- Convert the Row-Internal rule in the above example to replace the hard coded "companyname" text in the replacement data to a random word from the Dictionary Words dataset. This will build a wide variety of realistic looking email addresses.
- Try using the WHERE Clause techniques outlined above in rules 02-0005 and xx-0006 to mask the POSTAL_CODE field. There are specific replacement datasets for US zip codes and Canadian and UK postcodes. Set all other countries to a generic text field of 999999 to ensure they get masked. Take care not to enter new values if the existing postal_code is a null value.
- Examine the other columns in the LOCATIONS table and make decisions about which ones contain sensitive information and implement suitable Substitution Rules or Shuffle Rules for them. Hint, there are pre-defined datasets for street addresses and town names.
- Read about User Defined DataSets, make some of your own department names, and use them in a Substitution Rule on the DEPARTMENT_NAME column.
[Net 2000 Ltd. Home][Data Masker Home][Data Masker Manual][Data Masker FAQ]