C# - Amazon S3, Syncing, Modified Date vs. Uploaded Date
We're using the AWS SDK for .NET, and I'm trying to pinpoint where we seem to be having a sync problem with our consumer applications. We have a push service that generates changeset files that are uploaded to S3, and our consumer applications are supposed to download these files and apply them in order to sync up to the correct state, which is not happening.
There are conflicting views on what the correct datestamps are and where they are represented. Our consumers were written to look at the S3 file's "LastModified" field to sort the downloaded files for processing, and I no longer know what this field represents. At first I thought it represented the modified/created date of the file I uploaded, but (as seen here) it actually represents a new datestamp of when the file was uploaded, and the same link seems to imply that when a file is downloaded it reverts to the old datestamp (but I cannot confirm this).
We're using this snippet of code to pull the files:
// List of the latest changesets since the last successful full update.
Amazon.S3.AmazonS3Client client = ...;
List<Amazon.S3.Model.S3Object> listObjects = client.GetFullObjectList(
    this.settings.GetS3ListObjectsRequest(this.settings.S3ChangesetSubBucket),
    Amazon.S3.AmazonS3Client.DateComparisonType.GreaterThan,
    lastModifiedDate,
    Amazon.S3.AmazonS3Client.StringTokenComparisonType.MustContainAll,
    this.settings.RequiredChangesetPathTokens);
We then sort by the S3Object's LastModified (which I think is where our assumption is wrong):
foreach (Amazon.S3.Model.S3Object obj in listObjects)
{
    if (DateTime.Parse(obj.LastModified) > lastModifiedDate)
    {
        // It's a new file; use insertion sort to put the file in an ordered list
        // based on LastModified.
    }
}
Am I correct in assuming that we should be doing more to preserve our own datestamps as needed, such as using custom headers/metadata on the objects to put the correct datestamps on the files, or putting them in the filename itself?
Edit
Perhaps this question can help answer my problem: if my service has two files to upload to S3 and goes through the process of doing that, am I guaranteed that these files show up in S3 in the order they were uploaded (via LastModified), or does S3 do some amount of asynchronous processing that could lead to the files showing up in the list of S3 objects out of order? I'm worried about a case where, for example, my service uploads files A then B, B shows up first in S3, my consumers get and process B, then A shows up, and my consumers may or may not get A and incorrectly process it, thinking it's newer when it's not.
Edit 2
It was as I and the person below suspected: we had race conditions when trying to apply changesets in order while blindly relying on S3's datestamps. As an addendum, we ended up making two fixes to try to address the problem, which might be useful to others as well:
Firstly, to address the race condition between when our uploads finish and the modified dates reported by S3, we decided to make our queries look one second into the past from the last modified date we read from a pulled file in S3. In examining this fix we saw another problem in S3 that wasn't apparent before, namely that S3 does not preserve milliseconds on timestamps, but rather rounds them up to the next second for all timestamps. Looking back in time by one second circumvented this.
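A minimal sketch of the one-second look-back described above, assuming the last processed LastModified value is held as a DateTime. The ChangesetWindow class and its method names are hypothetical and not part of the AWS SDK; the result would be passed wherever the listing filter expects its "greater than" date.

```csharp
using System;

public static class ChangesetWindow
{
    // Hypothetical helper: drops any sub-second part so the value is compared
    // at the same whole-second resolution the post observed on S3 timestamps.
    public static DateTime TruncateToSecond(DateTime value)
    {
        long wholeSecondTicks = value.Ticks - (value.Ticks % TimeSpan.TicksPerSecond);
        return new DateTime(wholeSecondTicks, value.Kind);
    }

    // Cutoff for the next listing: one second before the last LastModified
    // value that was processed, so files stamped in the same second as the
    // previous batch are still returned.
    public static DateTime QueryCutoff(DateTime lastProcessedModified)
    {
        return TruncateToSecond(lastProcessedModified).AddSeconds(-1);
    }
}
```

Because the window overlaps the previous one, the same files can come back more than once, so this only works together with some form of duplicate filtering.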
Secondly, since we were looking back in time, we would have the problem of downloading the same file multiple times if there weren't any new changeset files to download, so we added a filename buffer for the files we saw in our last request, skipped any files we had already seen, and refreshed the buffer when we saw new files.
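The filename buffer could look roughly like the sketch below. SeenKeyBuffer is a hypothetical name, and the refresh rule (replace the buffer with the current listing only when something new appeared) is one reading of the description above, not the original implementation.

```csharp
using System;
using System.Collections.Generic;

public sealed class SeenKeyBuffer
{
    // Keys returned by the most recent listing that contained new files.
    private HashSet<string> seen = new HashSet<string>(StringComparer.Ordinal);

    // Returns only the keys that were not in the buffer, in listing order.
    public List<string> FilterNew(IEnumerable<string> listedKeys)
    {
        var current = new HashSet<string>(StringComparer.Ordinal);
        var fresh = new List<string>();

        foreach (string key in listedKeys)
        {
            current.Add(key);
            if (!seen.Contains(key))
            {
                fresh.Add(key);
            }
        }

        // Refresh the buffer only when new files appeared; otherwise the
        // listing window has not moved and the old buffer is still valid.
        if (fresh.Count > 0)
        {
            seen = current;
        }

        return fresh;
    }
}
```

Each polling pass would hand the listed keys to FilterNew and download only what it returns.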
Hope this helps.
When listing objects in an S3 bucket, the API response received from S3 will always return them in alphabetical order by key.
The S3 API does not allow you to filter or sort objects based on the LastModified value. Any such filtering or sorting is done exclusively in the client libraries that you use to connect to S3.
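Since the filtering and sorting happen on the client, a rough sketch of that step might look like this. ListedObject is a hypothetical stand-in for the SDK's S3Object so that the example runs without the AWS SDK, and the tie-break on key is an added assumption for objects whose timestamps fall in the same second.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an S3 listing entry (key plus parsed LastModified).
public sealed class ListedObject
{
    public string Key { get; set; } = string.Empty;
    public DateTime LastModified { get; set; }
}

public static class ChangesetOrdering
{
    // Keeps only objects modified after the cutoff and orders them oldest
    // first; ties on LastModified are broken by key (ordinal comparison).
    public static List<ListedObject> NewerThan(IEnumerable<ListedObject> listed, DateTime cutoff)
    {
        return listed
            .Where(o => o.LastModified > cutoff)
            .OrderBy(o => o.LastModified)
            .ThenBy(o => o.Key, StringComparer.Ordinal)
            .ToList();
    }
}
```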
http://docs.aws.amazon.com/amazons3/latest/api/restbucketget.html
As for the accuracy of the LastModified value and its possible use to sort the list of objects based on the time they were uploaded: to my knowledge, the LastModified value is set to the time the upload finishes (when the server returns a 200 OK response) and not the time the upload was started.
This means that if you start upload A that's 100 MB in size and a second later you start upload B that's only 1 KB in size, in the end the last modified timestamp for A will be after the last modified timestamp for B.
If you need to preserve the time your upload was started, it's best to use a custom metadata header with your original PUT request.
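As an illustration of the custom metadata idea, the sketch below formats the upload start time as a round-trippable string and parses it back on the consumer side. The header name "x-amz-meta-upload-started" and the UploadStartStamp class are assumptions made for this example (S3 user-defined metadata uses the "x-amz-meta-" prefix); attaching the value to the actual PUT request through the SDK is left out.

```csharp
using System;
using System.Globalization;

public static class UploadStartStamp
{
    // Hypothetical metadata name chosen for this example.
    public const string HeaderName = "x-amz-meta-upload-started";

    // Producer side: serialize the start time in UTC using the round-trip
    // ("o") format, which keeps sub-second precision.
    public static string Format(DateTime started)
    {
        return started.ToUniversalTime().ToString("o", CultureInfo.InvariantCulture);
    }

    // Consumer side: parse the metadata value back into a UTC DateTime.
    public static DateTime Parse(string headerValue)
    {
        return DateTime.Parse(headerValue, CultureInfo.InvariantCulture, DateTimeStyles.RoundtripKind);
    }
}
```

Consumers could then sort on the parsed value instead of LastModified, which also avoids the loss of milliseconds mentioned in the question.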