r - Extracting binary data from a mixed data file -


I am trying to read binary data from a mixed data file (ASCII and binary) using R. The data file is constructed in a pseudo-XML format. My idea was to use the scan function to read the specific lines and then convert the binary to numerical values, but I can't seem to do this in R. I have a Python script that does this, but I would like to do the job in R; the Python script is below. The binary section within the data file is enclosed by the start and end tags <BinData> and </BinData>.

The data file is in a proprietary format containing spectroscopic data; a link to an example data file is included below. To quote the user manual:

Data of BinData elements are written as a binary array of bytes. Each 8 bytes of the binary array represent 1 double-precision floating-point value. Therefore the size of the binary array is NumberOfPoints * 8 bytes. For two-dimensional arrays, the data layout follows the row-major form used by SafeArrays. This means that moving to the next array element increments the last index. For example, if a two-dimensional array (e.g. Data(i,j)) is written in such a one-dimensional binary byte array form, moving to the next 8-byte element of the binary array increments the last index of the original two-dimensional array (i.e. Data(i,j+1)). After the last element of the binary array, the combination of carriage return and linefeed characters (ANSI characters 13 and 10) is written.
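The layout described in the manual quote can be illustrated with a small Python sketch. It is not taken from the manual: the function name decode_bindata and the rows/cols parameters are hypothetical, and little-endian byte order is an assumption (the quote does not state the byte order).

```python
import struct


def decode_bindata(payload, rows, cols):
    """Decode rows*cols doubles stored row-major, optionally followed by CR LF.

    Assumption: little-endian doubles ("<d"); the manual quote does not say.
    """
    n = rows * cols
    expected = n * 8  # NumberOfPoints * 8 bytes
    if len(payload) < expected:
        raise ValueError("payload is shorter than rows * cols * 8 bytes")
    trailer = payload[expected:]
    if trailer not in (b"", b"\r\n"):
        raise ValueError("unexpected bytes after the last double")
    flat = struct.unpack("<%dd" % n, payload[:expected])
    # Row-major: consecutive 8-byte elements advance the last index (j).
    return [list(flat[r * cols:(r + 1) * cols]) for r in range(rows)]


# Synthetic 2 x 3 array, written the way the quote describes.
example = struct.pack("<6d", 1.0, 2.0, 3.0, 4.0, 5.0, 6.0) + b"\r\n"
grid = decode_bindata(example, rows=2, cols=3)
print(grid)
```

Here grid[0][1] comes from the second 8-byte element of the byte array, which matches the "moving to the next element increments the last index" rule in the quote.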

Thanks for any suggestions in advance!

Link to an example data file:

https://docs.google.com/file/d/0b5f27d7b1emfqwg0qvrhuwuwdk0/edit?usp=sharing

Python script:

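    import sys, struct, csv

    # read the whole file as bytes
    with open(sys.argv[1], 'rb') as f:
        t = f.read()

    # header: everything up to and including "<BinData>\r\n"
    i = t.find(b"<BinData>") + len(b"<BinData>") + 2  # add \r\n line end
    header = t[:i]

    # binary section: everything before the "\r\n</BinData>" that closes it
    t = t[i:]
    i = t.find(b"\r\n</BinData>")
    bin = t[:i]

    # each 8 bytes is one double (native byte order, as in the original script)
    doubles = []
    for k in range(len(bin) // 8):
        doubles.append(struct.unpack('d', bin[k*8:(k+1)*8])[0])

    # footer: everything after the \r\n that ends the binary section
    footer = t[i+2:]

    with open("output.csv", 'w', newline='') as myfile:
        wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
        wr.writerow(doubles)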

I wrote the pack package to make this easier. You still have to search for the start/end of the binary data though.

    b <- readBin("120713b01.ols", "raw", 4000)  # reads at most 4000 bytes

    # raw version of the start of the BinData tag
    beg.raw <- charToRaw("<BinData>\r\n")
    # only take the first match, in case the binary data randomly contains "<BinData>\r\n"
    beg.loc <- grepRaw(beg.raw, b, fixed = TRUE)[1] + length(beg.raw)

    # convert the header to text; beg.loc is the first data byte, so stop one byte before it
    header <- scan(text = rawToChar(b[1:(beg.loc - 1)]), what = "", sep = "\n")
    # search for "<Number of Points>" tags and calculate the total number of points
    numpts <- prod(as.numeric(header[grep("<Number of Points>", header) + 1]))

    library(pack)
    data <- unlist(unpack(rep("d", numpts), b[beg.loc:length(b)]))
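For comparison with the Python script in the question, the same idea (take the first start tag, then use the point count from the header instead of searching for the end tag) might look roughly like this in Python. This is only a sketch under assumptions: the function name extract_doubles is hypothetical, the header layout (a "<Number of Points>" line followed by the value on the next line) is inferred from the R code above, little-endian doubles are assumed, and the sample bytes are synthetic rather than taken from the real file.

```python
import struct

START = b"<BinData>\r\n"  # assumed start tag, including its line ending


def extract_doubles(blob):
    """Return the doubles that follow the first <BinData> tag.

    The length comes from the header's point count, so an end tag that
    happens to occur inside the binary payload cannot cut the data short.
    """
    start = blob.find(START)  # first match only
    if start < 0:
        raise ValueError("no <BinData> tag found")
    header_lines = blob[:start].decode("ascii", errors="replace").splitlines()
    num_points = 1
    found = False
    for idx, line in enumerate(header_lines[:-1]):
        if "<Number of Points>" in line:
            # multiply, so that one tag per dimension gives the total count
            num_points *= int(float(header_lines[idx + 1].strip()))
            found = True
    if not found:
        raise ValueError("no <Number of Points> tag in header")
    begin = start + len(START)
    end = begin + 8 * num_points
    if end > len(blob):
        raise ValueError("file is shorter than the header promises")
    return list(struct.unpack("<%dd" % num_points, blob[begin:end]))


# Synthetic stand-in for a data file (hypothetical header layout).
values = [0.5, 1.25, -3.0]
sample = (b"<Header>\r\n<Number of Points>\r\n3\r\n</Header>\r\n"
          + START + struct.pack("<3d", *values) + b"\r\n</BinData>\r\n")
result = extract_doubles(sample)
print(result)
```

Reading a fixed number of bytes also avoids the loop-variable reuse in the original script, where the index of the end tag could be overwritten before the footer was sliced.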
