MS Access Forum / Queries / November 2005
Crosstab Pivot column problem
|
|
Thread rating:  |
Jt3mitch1 - 18 Nov 2005 18:47 GMT I've been racking my brains to figure out the best way to approach my problem.
I have test results I need to organize:
ID date labnumber(unique) result 1 10/01/01 1 15.0 1 10/01/01 2 7.0 1 11/01/01 3 0.0 1 11/01/01 4 0.0 1 11/01/01 5 7.0 2 12/01/02 6 15.0 2 12/01/02 7 1.0 2 01/01/03 8 1.0 2 01/01/03 9 2.5
I'm trying to use a crosstab query to convert data to:
ID date rank1 rank2 rank3 1 10/01/01 7.0 15.0 1 11/01/01 0.0 0.0 7.0 2 12/01/02 1.0 15.0 2 01/01/03 1.0 2.5
I'll have up to 40 results for each ID. Too many different results in the database to use results as column headings. I need to figure out how to rank the results and have this rank as the column heading.
I'm bad at VBA, so after checking through newsgroups, I saw a post by Steve Dassin that looked promising. This is a general idea of what I'm trying to adjust to meet my needs:
TRANSFORM Max(result) AS data SELECT ID, date, DCount("*","query1","(" & "[ID]='"&ID&"'" &")" & "AND" &"(" & "[result]<"&result & "OR" & "(" & "[result]="&result & "AND" & "[labnumber]<"&labnumber & ")" & ")" ) + 1 AS alias, Count(data) AS cnt, FROM query1 GROUP BY ID, date, PIVOT DCount("*","query1","(" & "[ID]='"&ID&"'" &")" & "AND" &"(" & "[result]<"&result & "OR" & "(" & "[result]="&result & "AND" & "[labnumber]<"&labnumber & ")" & ")" ) + 1;
Thanks for any assistance,
JT
Tom Ellison - 19 Nov 2005 00:03 GMT Dear JT:
Since you only want the top 2 unique values of [result] for each ID, you can eliminate duplications:
SELECT DISTINCT ID, [date], result FROM query1
Make this query2.
Building up from this, you only care about the top 2 for each ID:
SELECT ID, [date], result FROM query2 Q WHERE (SELECT COUNT(*) FROM query2 QA WHERE QA.ID = Q.ID AND QA.result > Q.result) < 2
I'm expecting this to give you:
ID date result 1 10/01/01 15.0 1 10/01/01 7.0 2 12/01/02 15.0 2 01/01/03 2.5
Make this query3.
Next, we can add a rank column and then crosstab this, or we can just subquery it (replacing the crosstab).
However, before continuing, please consider what you'd want from this data:
ID date labnumber(unique) result 1 10/01/01 1 15.0 1 10/01/01 2 7.0 1 11/01/01 3 0.0 1 11/01/01 4 0.0 1 11/02/01 5 7.0
All I've done is change the date of the last row. Now, when you report the result = 7 in the Rank2 column, what date do you want to show for it?
Finally, ignoring the above problem for the moment, you can produce your final result:
SELECT DISTINCT ID, date, (SELECT MAX(result) FROM query3 Q3A WHERE Q3A.ID = Q3.ID GROUP BY ID) Rank1, (SELECT MIN(result) FROM query3 Q3A WHERE Q3A.ID = Q3.ID GROUP BY ID) Rank2 FROM query3 Q3
Please let me know if this helps, and whether I can be of any further assistance.
Tom Ellison
> I've been racking my brains to figure out the best way to approach my > problem. [quoted text clipped - 50 lines] > > JT Jt3mitch1 - 19 Nov 2005 06:40 GMT Tom, Thanks for taking a look at my problem.
I'm basically trying to convert rows in a query (specifically results field) to columns (based on id by date as rows). My attempt at using crosstabs includes a lack of access knowledge.
I'll want all the results, not just the min and max to be listed in the columns (1stresult, 2ndresult, 3rdresult....40thresult). I thought I could use rank as the column headings, but now realize that having more than one result with the same value breaks the rank idea. Each row in my query1 is unique by labnumber
Id Date Labnum Result 1 10/01/01 1 15.0 1 10/01/01 2 7.0 1 11/01/01 3 0.0 1 11/01/01 4 0.0 1 11/01/01 5 7.0 2 12/01/02 6 15.0 2 12/01/02 7 1.0 2 01/01/03 8 1.0 2 01/01/03 9 2.5
For Query2, I originally thought I could rank the results for each id and date from lowest to hightest (up to 40 results for an ID on any given date):
ID date labnum result Rank 1 10/1/2001 2 7 1 1 10/1/2001 1 15 2 1 11/1/2001 4 0 1 1 11/1/2001 3 0 2 1 11/1/2001 5 7 3 <3 results for ID#1 on 11/1/2001 2 12/1/2002 7 1 1 2 12/1/2002 6 15 2 2 1/1/2003 8 1 1 2 1/1/2003 9 2 2
Query3 would convert the rank values to column headings and I'd list the count of results per ID and date:
ID date Count Rank1 Rank2 Rank3 up to . . Rank 40 1 10/1/2001 2 7 15 1 11/1/2001 3 0 0 7 2 12/1/2002 2 1 15 2 1/1/2003 2 1 2
My ultimate goal was to get the count of results for each id on a given date. Then I would identify which result was the 90th percentile based on the count. 90th percentile="rank" & roundup(0.9*count)
I'm sure this is the long way around, but I couldn't figure out how to get 90th percentile (for each id by date) straight from query1.
I don't know if I've made it more clear or more complicated.
Thanks again,
JT
>"Tom Ellison" wrote: > Dear JT: [quoted text clipped - 57 lines] > > Tom Ellison Tom Ellison - 20 Nov 2005 03:58 GMT Dear JT:
It is not necessarily the case that you cannot rank within multiple values. If the values are A, B, B, and C, the rank can be 1, 2, 2, 4. Your could also ignore duplicates, resulting in values A, B, and C having ranks 1, 2, and 3. Either is readily possible.
Are you wanting to find the 90th percentile of all the results in a set which is a single Id, or a combination of Id and date? Do you want the query to simply provide this result, or do you want up to 40 Ranks displayed?
I don't see part of what you show as your final (Query 3) results. I understand the ID, date, and Count columns. I understand the 7 being in the Rank1 column. But what are the two zeros just before that?
I'm guessing a solution might be to have a 2 column "Crosstab Cell." In this "cell" you could see both the Result value and a count of how many of those there are. When there are multiple Results at the same rank, you would see this information in that "cell". While this cannot be done in a crosstab (it limits you to a single value in each cell), it can be done using another technique.
If this is desirable, then it must be the case that you do not want to ignore duplicates, and you want to group by ID and date together.
So, there are several avenues along which we can proceed.
- ignore duplicates or not - show all ranks or just return the 90th percentile
Please let me know how to proceed. Please also create a new sample data set that can be used to illustrate and test this more complex set of requirements.
Tom Ellison
> Tom, > Thanks for taking a look at my problem. [quoted text clipped - 125 lines] >> >> Tom Ellison Jt3mitch1 - 21 Nov 2005 04:50 GMT Tom,
My use of the term "rank" probably confuses the issue. I was using it to identify the results listed in ascending order (for each id and date combination), with the lowest at the lowest "rank1" and the highest at the highest "rankXX", while including duplicates. I got the 0 results from ID#1 on 11/01/01 and had the first 0 listed as rank 1 and the second 0 listed as rank2.
Because I already have the results individually printed in a report (sorted by ID and date), I could drop the ranking issue and focus on just returning the 90th percentile (for each id and date combination) from the original query.
Each row in my query1 is unique by labnumber
Id_____Date_____ Labnum_____Result 1_____10/01/01_____1_____12.0 1_____10/01/01_____2_____15.0 1_____10/01/01_____3_____10.0 1_____10/01/01_____4_____9.0 1_____10/01/01_____5_____6.0 1_____10/01/01_____9_____8.0 1_____10/01/01_____10_____4.0 1_____10/01/01_____11_____3.0 1_____10/01/01_____12_____0.0 1_____10/01/01_____13_____0.0 1_____11/01/01_____14_____22.0 1_____11/01/01_____15_____17.0 1_____11/01/01_____16_____14.0 1_____11/01/01_____17_____16.0 1_____11/01/01_____18_____10.0 1_____11/01/01_____19_____9.0 1_____11/01/01_____20_____9.0 1_____11/01/01_____30_____8.0 1_____11/01/01_____31_____4.0 1_____11/01/01_____32_____0.0 1_____11/01/01_____33_____0.0 2_____01/01/03_____34_____20.0 2_____01/01/03_____35_____18.5 2_____01/01/03_____36_____17.5 2_____01/01/03_____38_____16.5 2_____01/01/03_____39_____5.5 2_____01/01/03_____40_____3.5 2_____01/01/03_____42_____4.5 2_____01/01/03_____43_____2.5 2_____01/01/03_____44_____1.0 2_____01/01/03_____45_____1.0
The final query would include a count of results for each id/date combination. The 90th percentile result would be a specific result (roundup(0.9*count)) starting from lowest to highest, ascending. I round it because I may have odd result counts.
ID_____date______count_____90th percentile result 1_____10/01/01_____10_____12.0 <-roundup (0.9*10)=9th result 1_____11/01/01_____11_____17.0 <-roundup (0.9*11)=10th result 2_____01/01/03_____10_____18.5 <-roundup (0.9*10)=9th result
I imagine I would have query2 give the count and calculate which result would be the 90th percentile.
SELECT query1.ID, query1.date, round(0.9*Count([result])) AS 90th FROM query1 GROUP BY query1.ID, query1.date;
A query3 would then pull the 90th percentile result from the original query1. All I need is to figure out how to query the "9th highest" or "10th highest" result (based on query2 for each Id/date combination). I think I'll have to rank them as you suggested with multiple values ranked (eg. A,B,B,C..) as (1,2,2,4..)
Getting close, thanks again, JT
> Dear JT: > [quoted text clipped - 32 lines] > > Tom Ellison Tom Ellison - 21 Nov 2005 05:28 GMT Dear JT:
I believe it will be possible to calculate the 90th percentile rank number as you suggest. Now there is a problem. Let's say there are 12 results. The 90th percentile rank would then be the 11th highest. However, there may not be an 11th highest. If there is a two way tie for 10th place, then there will be no 11th place. However, there WILL be 2 10th place results. Now, while it is possible to return just the Result value of the 10th place result in this case, it will not be possible to return the Date and Labnum of this.
It would be possible to use Labnum as a secondary ranking value. Being unique, there would then never be any ties.
I'm not really quite up to the task at the moment, but if you'll carefully consider this and reply I'll try to tackle it in the morning when I'm fresh. Please specifically answer whether you need only the Result value, or do you need Date and/or Labnum. If you look at this carefully, choosing between two tied rows with the same Result value would seem to be a rather arbitrary thing to do. So, given that you have duplicate Result values, showing which Labnum and Date may not be realistic, since you must arbitrarily choose one when there is a tie.
Tom Ellison
> Tom, > [quoted text clipped - 120 lines] >> >> Tom Ellison Jt3mitch1 - 21 Nov 2005 21:16 GMT Tom,
If the 90th percentile was the 11th result and we had two results ranked 10th, I would return the 10th ranked value.
I include labnumber in the query1 only to have a unique value for each record. The labnumbers won't be in the same order as the results, so I'm not sure they could be used for a secondary rank. In the end, I wouldn't return the labnumber, I only need the 90thpercentile value for each id/date combination.
Thanks again for the extra effort, JT
> Dear JT: > [quoted text clipped - 20 lines] > > Tom Ellison Tom Ellison - 21 Nov 2005 22:43 GMT Dear JT:
Since you don't require the query to return any other specific information from the row that is the 90th percentile, then we don't have to worry about the possibility that there could be more than one row that is tied for the same value or Result.
Now, if you rank all the rows in each group by ascending values for Result you can then get the maximum rank value that is less than or equal to the total number of rows in the group multiplied by 0.9. Once you have this rank value, you can obtain the row(s) that rank that way and return the Result value in that set of rows. Since all the rows with the same rank have the same Result value, you can use the MIN(Result) or the MAX(Result) of all these, since those will all be the same.
OK, that's probably easier said than done. So, here goes an attempt at coding all that.
I'm going to make this in pieces, each of which needs to be a saved query. This allows you to test each piece and make sure it works correctly. It also avoids problems where the Jet database really has very limited capabilities to perform subqueries.
Query3:
SELECT ID, [date], labnumber, result, (SELECT COUNT(*) + 1 AS Rank FROM query1 Q1A WHERE Q1A.ID = Q1.ID AND Q1A.[date] = Q1.[date] AND Q1A.result < Q1.result) AS Rank FROM query1 Q1 GROUP BY ID, [date]
This should rank the rows within each ID/date group. This rank value will be used to find the 90th percentile.
Query4:
SELECT ID, [date], MAX(Rank) AS Ninetieth, FROM query3 Q3 WHERE Rank <= 0.9 * (SELECT COUNT(*) FROM query3 Q3A WHERE Q3A.ID = Q3.ID AND Q3A.[date] = Q3.[date]) GROUP BY ID, [date]
One note on this on: if there is only one row for an ID/date combination, then this cannot work. The Rank of that one row would be 1, and there are then no rows whose rank is less than or equal to 0.9 * 1.
Query5:
SELECT ID, [date], MIN(result) AS result FROM Query3 Q3 INNER JOIN Query4 Q4 ON Q4.ID= Q3.ID AND Q4.[date] = Q3.[date] AND Q4.Ninetieth = Q3.Rank GROUP BY ID, [date]
Perhaps, if I've done this carefully enough, this may work and be what you need. Please test each query and see what it does, confirming its results.
Just remember, I'm creating progressive program steps here without being able to test any of them.
Please let me know if this helped and if I can be of any other assistance.
Tom Ellison
> ID date labnumber(unique) result > 1 10/01/01 1 15.0 [quoted text clipped - 6 lines] > 2 01/01/03 8 1.0 > 2 01/01/03 9 2.5 Tom Ellison
> Tom, > [quoted text clipped - 47 lines] >> >> Tom Ellison Jt3mitch1 - 22 Nov 2005 18:51 GMT Tom, Query3 worked fine. I'm having problems with Query4.
Query3: SELECT Q1.id, Q1.date, Q1.labnumber, Q1.result, (SELECT COUNT(*) + 1 FROM query1 Q1A WHERE Q1A.ID = Q1.ID AND Q1A.[date] = Q1.[date] AND Q1A.result < Q1.result) AS Rank FROM query1 AS Q1;
Query4: SELECT Q3.id, Q3.date, Max(Q3.Rank) AS Ninetieth FROM query3 AS Q3 WHERE (((Q3.Rank)<=0.9*(SELECT COUNT(*) FROM query3 Q3A WHERE Q3A.ID = Q3.ID AND Q3A.[date] = Q3.[date]))) GROUP BY Q3.id, Q3.date;
It looks like query4 should work like expected. Access 2000 doesn't seem to like the WHERE criteria. I'm able to see design and SQL view, but can't run the query. Nothing happens when I try to open or run the query. When I close query4 and try to close another query or table, it gives me a "this action will reset the current code" error message and I have to close the objects in design view. I tried creating a test database with only 1 table and the 3 queries, but had the same problem.
Thanks again, JT
> Dear JT: > [quoted text clipped - 67 lines] > > Tom Ellison Tom Ellison - 22 Nov 2005 20:25 GMT Dear JT:
I'm looking at the code, edited slightly for my convenience, below:
SELECT Q3.id, Q3.date, Max(Q3.Rank) AS Ninetieth FROM query3 AS Q3 WHERE Q3.Rank <= 0.9* (SELECT COUNT(*) FROM query3 Q3A WHERE Q3A.ID = Q3.ID AND Q3A.[date] = Q3.[date]) GROUP BY Q3.id, Q3.date;
You say this "doesn't seem to like the WHERE criteria."
Try filtering it to only a single ID.
SELECT Q3.id, Q3.date, Max(Q3.Rank) AS Ninetieth FROM query3 AS Q3 WHERE Q3.Rank <= 0.9* (SELECT COUNT(*) FROM query3 Q3A WHERE Q3A.ID = Q3.ID AND Q3A.[date] = Q3.[date]) WHERE id = 111 (put your actual number in here) GROUP BY Q3.id, Q3.date;
Try letting it run overnight, just to see.
It doesn't seem to me there's something wrong. However, Access Jet has long been very limited. I switched to MSDE because of this. One of my best decisions ever. I can practically guarantee you you would not have such problems with MSDE.
Tom Ellison
> Tom, > Query3 worked fine. I'm having problems with Query4. [quoted text clipped - 110 lines] >> >> Tom Ellison Jt3mitch1 - 22 Nov 2005 21:31 GMT Tom, I was running the queries on a table of 31 records, so I thought it shouldn't need to run long. Because we didn't see anything wrong with the SQL, I decided to jump to another computer and run it there. It worked magnificently. I'm not sure why it blows up on my computer, I'll probably try and reinstall Access and hope that fixes the problem.
Thanks for all the help, JT
> Dear JT: > [quoted text clipped - 31 lines] > > Tom Ellison Tom Ellison - 22 Nov 2005 22:25 GMT Dear JT:
Only "magnificently"? Well, I guess that will just have to do.
Tom Ellison
> Tom, > I was running the queries on a table of 31 records, so I thought it [quoted text clipped - 42 lines] >> >> Tom Ellison
|
|
|