MS Access Forum / New Users / November 2009
How would I remove all data from northwinds template.
|
|
Thread rating:  |
coffeeman - 28 Oct 2009 06:43 GMT How would I remove all data from northwinds template.
BruceM - 28 Oct 2009 12:32 GMT You could make a new blank database. Import the objects from Northwinds (File >> Get External Data >> Import). Navigate to Northwinds. Click the Options button, and select to import tables as Definition Only. Import the rest of the objects (queries, forms, etc.) too, if you want to have the same interface as in Northwinds.
>How would I remove all data from northwinds template. KenSheridan - 31 Oct 2009 19:32 GMT However, do note that the Customers table in Northwind is not properly normalized. The following is a simple summary of basic normalization, and asa an example includes a brief explanation of the flaws in the Customers table in Northwind:
"Normalization is the process of eliminating redundancy from a database, and involves decomposing a table into several related tables. In a relational database each table represents an entity type, e.g. Contacts, Companies, Cities, States etc. and each column in a table represents an attribute type of the entity type, e.g. ContactID, FirstName and LastName might be attribute types of Contacts and hence columns of a Contacts table. Its important that each attribute type must be specific to the entity type, so that each 'fact' is stored once only. In the jargon its said that the attribute type is 'functionally dependent' solely on the whole of the primary key of a table.
To relate tables a 'referencing' table will have a foreign key column which makes the link to the 'referenced' table, e.g. a Contacts table might have a CompanyID column as a foreign key, while a Companies table has a CompanyID column as its primary key. Consequently no data other than the CompanyID needs to be stored in a row in the Contacts table for all the company information for that contact to be known; its available via the relationship and can be returned in a query joining the two tables on the CompanyID columns.
Similarly the Companies table might have a CityID column and the Cities table a StateID column. If its an international database the States (or more generically Regions) table would have a CountryID referencing the primary key of a Countries table. So via the relationships, simply by entering (in reality this would be selected from a list of Companies in a combo box, not typed in) a CompanyID in the Contacts table the location of the contact's company is also known. Redundancy, and therefore repetitive data entry is avoided.
To see how a database is made up of related tables take a look at the sample Northwind database. Its not completely normalized in fact (deliberately so for the sake of simplicity) but illustrates the main principles of how tables representing entity types relate to each other. An example of its lack of proper normalization can be found in its Customers table. You'll see that this has City, Region and Country columns so we are told numerous times that São Paulo is in SP region (as is Resende) and that SP region is in Brazil. Not only does this require repetitive data entry, but more importantly it opens up the risk of inconsistent data, e.g. it would be perfectly possible to put São Paulo in California in one row and California in Ireland! Proper normalization as I described above would prevent this as the fact that São Paulo is in SP region would be stored only once in the database as would the fact that SP region is in Brazil and California is in the USA.
An example of what at first sight might seem to be redundancy, but in fact is not, can also be found in Northwind. The Products table and the OrderDetails table both have UnitPrice columns. It might be thought that the unit price of a product could always be looked up from the Products table, so its unnecessary in Order Details. However, the unit price of a product will change over time, but each order needs to retain the price in force at the time the order was created. Consequently a UnitPrice column is needed in both tables; that in products holds the current price and is used to get the value for that in Order Details (code in the ProductID control's AfterUpdate event procedure in the Order Details Subform does this), which then remains static when the current price (in products) changes. In each case UnitPrice is functionally dependent on the key of the table, so there is no redundancy"
By all means use Northwind as the basis for your application, but be aware of these flaws and their possible consequences. It would not be a difficult job to decompose the Customers table by creating separate Cities, Regions and Countries tables, but this would be best done before entering any data. You'd then need to amend as necessary any forms, queries or reports which use the Customers table.
For more on normalization see:
http://www.datamodel.org/NormalizationRules.html
Ken Sheridan Stafford, England
>How would I remove all data from northwinds template. BruceM - 02 Nov 2009 18:00 GMT The link is not available. There is a notice telling the owner to call a number. Perhaps domain name registration has expired or some such thing.
>However, do note that the Customers table in Northwind is not properly >normalized. The following is a simple summary of basic normalization, and [quoted text clipped - 71 lines] > >>How would I remove all data from northwinds template. KenSheridan - 02 Nov 2009 18:42 GMT So I see. It was there yesterday! I never post a link without checking first. There's always Wiki:
http://en.wikipedia.org/wiki/Database_normalization
or just Google 'normalization'; there's no shortage of sites!
Ken Sheridan Stafford, England
>The link is not available. There is a notice telling the owner to call a >number. Perhaps domain name registration has expired or some such thing. [quoted text clipped - 4 lines] >> >>>How would I remove all data from northwinds template. John_G - 03 Nov 2009 02:38 GMT Hi Ken -
I know this is a bit off-topic for the thread, but I disagree with you that the Northwind Customers table is not properly normalized, at least with respect to City, Region and Country. All of those are needed in that table to properly describe an address. You are assuming that there is ony ONE São Paulo (or any other city name) in the world, in which case you would be right. But there are probably many São Paulo's, so each São Paulo address must specify which São Paulo it is. An example closer to home, if someone says to me "I'm going to London" I have to ask them "London where - Canada or England?". Similarly, in Canada we can have the same city name in two or more provinces (Region). Now, I suppose that the combination of City + Region could be unique, but there is no guarantee of that, and indeed no requirement.
I think we have to leave the Customers table as it is.
Cheers!
John
>... An example of its lack of >proper normalization can be found in its Customers table. You'll see that [quoted text clipped - 6 lines] >Paulo is in SP region would be stored only once in the database as would the >fact that SP region is in Brazil and California is in the USA.
 Signature John Goddard E-Mail: jrgoddard AT cyberus DOT ca
John W. Vinson - 03 Nov 2009 05:04 GMT > Now, I suppose that the combination of City + Region >could be unique, but there is no guarantee of that, and indeed no requirement. There are two places named Los Alamos in New Mexico.
 Signature
John W. Vinson [MVP]
KenSheridan - 03 Nov 2009 12:26 GMT John:
Thanks for the comments, but I have to differ I'm afraid. I don't think that there can be any question that Northwind's Customers table is not adequately normalized.
Taking a couple of rows at random Great Lakes Food Market is in Eugene which is in Oregon which is in the USA. Hungry Coyote Import Store is in Elgin which is in Oregon which is in the USA. So we are told twice that Oregon is in the USA. That is redundancy and leaves the door wide open to inconsistent data. Its perfectly possible to update one of these by changing USA to UK, both incorrectly and inconsistently. Country is determined by Region, which is a non-key column, so there is a transitive functional dependency and the table is not normalized to Third Normal Form, the definition (from Date) of which is:
'Third Normal Form: A relvar is in 3NF if and only if its in 2NF and every non-key attribute is non-transitively dependent on the primary key.'
Very loosely speaking a table is the equivalent of the term relation, a table definition is the equivalent of the term relvar (relation variable), a column (aka field) is the equivalent of the term attribute, and a row (aka record) is the equivalent of the term tuple in the formal language of the relational model.
The table needs decomposing into Cities, Regions and Countries. As regards city names these can be legitimately duplicated, so the table should have a numeric CityID column referencing a surrogate CityID primary key of Cities. While 'regions' have distinct values in the USA (states), here (for addressing purposes the county is used as the regional unit here) and I assume in Canada, I'm not sure if this is true worldwide, so I'd play safe and use a surrogate RegionID key, but if the database covers only a single country a natural key could be used. Country names do have distinct values so a natural key would be fine.
For ways of handling this type of hierarchical data by means of correlated combo boxes while maintaining a normalized structure see my demo at:
http://community.netscape.com/n/pfx/forum.aspx?nav=libraryMessages&tsn=1&tid=236 26&webtag=ws-msdevapps
which uses the local administrative area of Civil Paris, District and County in my area.
If one wished to be relationally purist to an absurd level one could of course use multiple keys, but that would mean the key of a Cities table in Northwind would be made up of City, Region and Country columns, and Regions would have a key of Region and Country. This would allow the enforcement of referential integrity and avoid the potential inconsistencies to which Northwind is currently wide open, but it would be very cumbersome. In any event Northwind has no such tables. Anyone using Northwind as a model, who doesn't wish to decompose the Customers table would nevertheless be well advised to create and fill such tables, which is a trivial task, and create enforced relationships.
I actually came across an unusual example of how easily incorrect and inconsistent data can be entered only a few days ago, not in a database as such, but in the 1911 census returns. My wife, who's a professional genealogist, had printed off a copy of the census return for my great- grandfather's house as an example for her students. He'd entered the birthplace of my great-grandmother (the only non-Irish member of my family in that generation) as Cumberland, Scotland, whereas it is in fact in England (not far from the border, its true, but England all the same). Now if only he'd been using a database with well normalized tables, he'd have got it right! At a more prosaic level I once found in one database three versions of my own name as author of technical papers in my own field of work as a result of poor normalization.
Ken Sheridan Stafford, England
>Hi Ken - > [quoted text clipped - 21 lines] >>Paulo is in SP region would be stored only once in the database as would the >>fact that SP region is in Brazil and California is in the USA. John_G - 03 Nov 2009 14:12 GMT Hi Ken --
Sorry, I still don't agree. You are making assumptions that are not valid.
>Taking a couple of rows at random Great Lakes Food Market is in Eugene which >is in Oregon which is in the USA. Hungry Coyote Import Store is in Elgin >which is in Oregon which is in the USA. So we are told twice that Oregon is >in the USA. Not true. We only know that THAT Oregon is in the USA. The might be another "Oregon" region somewhere else.
>That is redundancy and leaves the door wide open to inconsistent >data. Its perfectly possible to update one of these by changing USA to UK, >both incorrectly and inconsistently. Only because you know that there is no Oregon in the UK. What if there were? It's a perfectly valid option (from a human point of view), but there is no data rule that prevents it.
>Country is determined by Region No, it's not. On what basis can you make this statement? First, you would have to know every region (whatever that is) in every country, and second, there would have to be a world-wide enforced rule (even more than a convention) that no country could duplicate a region name.
>city names these can be legitimately duplicated, so the table should have a >numeric CityID column referencing a surrogate CityID primary key of Cities. Which for a large database could number in the thousands. Would you really expect your poor users to deal with that mess?
>... He'd entered the >birthplace of my great-grandmother (the only non-Irish member of my family in >that generation) as Cumberland, Scotland, whereas it is in fact in England >(not far from the border, its true, but England all the same). Now if only >he'd been using a database with well normalized tables, he'd have got it >right! Again, only because you yourself happen to know that. What if there was a Cumberland in Scotland as well? Could easily be, and no data rules are violated - from a database theory viewpoint. For that particular application, yes, but that's not relevant.
Do you see what I'm getting at here? What might be true for a given application (e.g. only one country) might lend itself to your techniques, but from a general theoretical viewpoint it is not correct.
John
 Signature John Goddard E-Mail: jrgoddard AT cyberus DOT ca
KenSheridan - 03 Nov 2009 20:00 GMT John:
I think we'll have to agree to disagree on this. It seems to me your understanding of what Functional Dependency means is the problem. You are treating it as though it’s the individual attribute value which is the determinant, whereas it is in fact the attribute type. Its possible to determine that a table is not in 3NF irrespective of the values. This obviously has to be the case because there is no knowing what values may need to be inserted in the future.
Its not that 'Oregon' determines 'USA', but that the attribute type Region determines the attribute type Country, i.e. for every value of Region the value of Country is known; in Northwind this can't be assumed to be so hence the redundancy and the resulting risk of bad data where Region is a non-key column. Whether there is one Oregon worldwide or 6 is immaterial. Normalization to 3NF does not concern itself with the data per se, only the attribute types; apart from trivial examples like days of the week or months of the year the total possible legitimate values of a type are at any time unknown, but what is possible to say is whether the values must be distinct or not. This is why the table needs decomposing by projection on City, Region and Country, and the 'natural' key City needs replacing with a distinctly valued surrogate key so that, allowing for n cities of the same name, it does determine RegionID (the result of that projection is not normalized to 3NF of course, so needs further decomposition by projection on Region and Country in the same way).
Your argument is that because of the theoretical possibility of n Oregon's the table is normalized to 3NF. That this argument is based on a flawed understanding of what Functional Dependency means is easily demonstrated by a hypothetical update. Lets assume that in the row for the customer in Eugene is changed to UK. You argue that there may well be a region Oregon in the UK with a City Eugene, which is theoretically possible, but is not in fact the case, so the table does now contain inconsistent and incorrect data in the Region and Country columns in that we are told that there are Oregon regions in both countries, which is untrue.
Lets assume that there really is an Oregon region in the UK, but it does not contain a city Eugene. In this scenario the same update results in inconsistent and incorrect data in that we are told that the Oregon region in the UK contains a city Eugene, which is also untrue in our hypothetical scenario.
Also of course there is the possibility of a simple typo such as Orogon. Of the three versions of my name I mentioned, two were correct variations, one was incorrect, clearly resulting from a typo.
Decomposing the table not only results in a set of tables all in 3NF (and beyond in fact) by virtue of the non-key columns being functionally dependent on 'the key, the whole key and nothing but the key, so help me Codd', but also, and this is the real object of normalization, removes the possibility of the inconsistent and incorrect data which can result from redundancy. Customers references CityID in Cities, which in turn references RegionID in Regions, which in turn references Country in Countries. By using surrogate keys the 4 Staffords in the USA can be distinguished from the original one where I live. But its not possible to select a non-existent Stafford, Staffordshire, USA, as no row for such exists in Cities. A user could of course insert a new row of Staffordshire, USA in Regions and a new row for Stafford, Staffordshire in Cities (with a different CityID to the real Stafford, Staffordshire) but that would take a really perverse user ( I think we can assume that the US government is not going to create a new state of Staffordshire), whereas Stafford, Staffordshire, USA in the non-normalized Customers table could be very easily entered as a result of a simple typo or a moment's inattention by the user, or even possibly as a result of a level of geographical ignorance such as that exhibited by my great-grandfather.
Somehow I don't think you'll be convinced of the error of your ways, and I certainly won't be. So while these debates are entertaining any continuance would, I think be further pointless time wasting. The OP and any lurkers can judge for themselves.
Ken Sheridan Stafford, England
>Hi Ken -- > [quoted text clipped - 46 lines] > >John
|
|
|