Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion GroupsFormsForms ProgrammingQueriesModules / DAO / VBAReports / PrintingMacrosDatabase DesignSecurityConversionImporting / LinkingSQL Server / ADPMultiuser / NetworkingReplicationSetup / ConfigurationDeveloper ToolkitsActiveX ControlsNew UsersGeneral 1General 2
Access DirectoryToolsTutorialsUser Groups
Related Topics
SQL ServerOther DB ProductsMS OfficeMore Topics ...

MS Access Forum / Modules / DAO / VBA / May 2007

Tip: Looking for answers? Try searching our database.

Parse and merge data

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
JonoB - 06 May 2007 13:00 GMT
Hi all,

Need some tips to help me approach this problem. I would consider myself
advanced in Access and VBA, so just looking for conceptual pointers.

In essence, I have a number of text files, that needs to be parsed and
merged. Each text file will contain around 70,000 records. For now, lets
assume that I have only 2 text files that will need merging.

I have managed to succesfully parse the files and sort each line in the text
file into a database record that will contain 3 fields; a time stamp field
(converted to a long number as time is measured in thousandth's of seconds),
a Datasource field (i.e. who actually generated the text file), and a string
field (containing a text string of up to 255 characters).

Basically, what I would like to do is merge the data from the two files, in
such a way that I am only left with one unique set of records; i.e. any
duplicates in the second datasource that are identical to the first
datasource will be deleted, and any records that are in the second datasource
but not in the first will be added. I will therefore end up with a "superset"
of data.

The primary difficulty in the above is the time stamp field in each
datasource is dependant on the user's own PC clock. Therefore, there may be
time differences between records that would otherwise be identical. These
time differences are never more than 30 seconds. Moreover, the time
difference is not exactly the same for each matched record. A SELECT DISTINCT
query would have done the job for me, other than the fact that the time stamp
field prevents this from happening. So I though of finding out the difference
in times between the two recordsources, and as it turns out the difference
may vary every so slightly through the datasources, so some records may be
out by 0.15 of a second and others by 0.14 and others by 0.13, for example.

So, what I have done up until now is to create a loop through the recordset,
but as expected, this takes a longgggg time (note that records are not yet
deleted/added; they are just marked as being matched in the following code).

I would love to hear if there are better ways of approaching this problem.
Many thanks for any help that you can provide.

Dim CurrTime As Long

Dim varArray

CurrTime = Time()
'Recordset to retrieve the combat log
   Set cnn = CurrentProject.Connection
   Set rs1 = New ADODB.Recordset
   
   strSql1 = "Select ParseID, lngTime, strCharName, strMsg, lngParseID from
tblParsedData ORDER BY lngTime ASC"
   
   rs1.Open strSql1, cnn, adOpenKeyset, adLockOptimistic
   
   
   varArray = rs1.GetRows
   lngRecordCount1 = rs1.RecordCount
   
   lngRecordCount1 = rs1.RecordCount - 1

   strDataSource1 = "Leandra"
   strDataSource2 = "Spuge"
   
   For i = 0 To lngRecordCount1

   
       If Not IsNull(varArray(4, i)) Then 'Looks to see if we have matched
it yet
           'Do nothing as its aleady been matched, so move onto next i
       Else
           If varArray(2, i) = strDataSource1 Then
           strMsg = varArray(3, i)
           lngParseID = varArray(0, i)
           lngTime1 = varArray(1, i)
               For j = lngCurrRecord To lngRecordCount1
                   If (varArray(1, j) - lngTime1) > 30000 Then '>30
seconds, stop looking for a match
                       Exit For
                   Else
                       If IsNull(varArray(4, j)) Then 'Looks to see if we
have matched it yet
                           If varArray(2, j) = strDataSource2 Then
                               If varArray(3, j) = strMsg Then
                                   varArray(4, j) = lngParseID
                                   varArray(4, i) = varArray(0, j)
                                   lngCurrRecord = j
                                   Exit For
                               End If
                           End If
                       End If
                   End If
               Next
           End If
           
       End If

   
   Next i
       
   Debug.Print Time() - CurrTime
John Spencer - 06 May 2007 15:17 GMT
I might try truncating the "time" field to the nearest second (or other
appropriate interval).  Int(LngTime/1000) as NewTime

Then you could either import ALL the records and use distinct in the query

SELECT Distinct Int(LngTime/1000) as CalcLongTime, ...

Or do the truncation during the Append

'====================================================
 John Spencer
 Access MVP 2002-2005, 2007
 Center for Health Program Development and Management
 University of Maryland Baltimore County
'====================================================

> Hi all,
>
[quoted text clipped - 97 lines]
>     Debug.Print Time() - CurrTime
>    
JonoB - 06 May 2007 15:54 GMT
Hi John,

Thanks very much for replying.

Obviously we are thinking along the same lines, because I have been playing
around with the *exact* methodology before I read your post. Weird.

It does give me very good results, but the issue remains that its still not
100% accurate because the time differences are not always the same. So,
whilst most records are correctly removed by the SELECT DISCTINCT, some
remain because they are out by a fraction of a second. So, for matching
records, it will be out by 1 second.

Not quite sure how to get around this....and I am not sure if its actually
possible.

Thanks again.

> I might try truncating the "time" field to the nearest second (or other
> appropriate interval).  Int(LngTime/1000) as NewTime
[quoted text clipped - 113 lines]
> >     Debug.Print Time() - CurrTime
> >    
JonoB - 06 May 2007 17:05 GMT
OK, so what I have done is looped through the first 100 or so records, to
obtain an average time differential, and then applied that as a time offset.

It seems pretty accurate.

Thanks again.

> Hi John,
>
[quoted text clipped - 131 lines]
> > >     Debug.Print Time() - CurrTime
> > >    
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.