Parsing Strings...
  
TerdRatchett
- 10th February 2009, 03:55
Hi Everyone,
I have written a big chunk of code and it is working well, but I am stuck on something having to do with strings.
I have received a string from a serial device which contains two target strings separated by a comma. The data looks like this:
1432,765
The length may vary, like this:
22567,8345
so the comma is key as a locator. With what little documentation there is in the user's manual, I can't figure out how to parse the string for these two numbers.
Thanks in advance!
TR
mackrackit
- 10th February 2009, 04:14
Look in the manual under SERIN2.
Look for the part something like:
WAIT STR\n followed by optional end character...for the first part and then use something like WAIT[,] to get the last part.
Jerson
- 10th February 2009, 10:16
Can you clarify what you mean when you say "I have received a string from a serial device"? Does it mean it has been stored somewhere and you have to parse it from there? SERIN2 will not do that for you.
TerdRatchett
- 11th February 2009, 04:02
I appreciate all of your help!
I had written the code using Hserin to capture the raw data from my serial port into a single string of [10]. It works so now I need to separate the two values. So... 
14575,354 would end up as two longs,
one with 14575 and another with 354. Either number could be as large as 99999 or as small as 1.
Thanks in advance,
TR
mackrackit
- 11th February 2009, 04:13
HSERIN uses the same syntax as SERIN2, so do the "separation" as the data is coming in.
Jerson
- 11th February 2009, 10:57
Ok, since you have the string to be parsed, you will need a parser too! I don't have anything ready at hand, but I'll give it a shot anyway. UNTESTED CODE FOLLOWS
' assume the string is TOPARSE
' it will return you 2 numbers,  one from before the comma, one from after the comma
Number1 var Long
Number2 var Long
Cntr       var byte      ' counter to index the string
ParseNumber:
 Number1 = 0              ' start with 0 in both numbers
 Number2 = 0
 'collect Number1
 for Cntr = 0 to 9         ' you said your string is 10 places long (indexes 0 to 9)
    if TOPARSE[Cntr] <> "," then
         Number1=Number1*10                             ' x10 to make place for the new digit
         Number1 = Number1+TOPARSE[Cntr]-"0"      ' I'm assuming this is an ASCII string ("0" in double quotes, a single quote starts a comment)
    else
         goto GetNum2     ' collect the remainder as number2
    endif
 next
 return
GetNum2:
 for Cntr=Cntr+1 to 9
    if TOPARSE[Cntr] <> "," then
         Number2=Number2*10
         Number2 = Number2+TOPARSE[Cntr]-"0"      ' I'm assuming this is an ASCII string
    else
         return                ' because, you said there are only 2 numbers ;)
    endif
 next
 return
You could modify this code to do a repetitive parse.  Every time you call it, it would return you a number, but I'll leave that to you.
Good luck.
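To make the x10 step above concrete, here is a small trace of how the digits accumulate. It is only a sketch: it assumes the same Number1 variable as in the routine and hard-codes the characters "1", "4", "5" instead of reading them from the string.

```basic
' Sketch only: traces the accumulate step for the text "145".
' "1" - "0" is 49 - 48, so each ASCII digit becomes its value 0 to 9.
Number1 VAR LONG

Number1 = 0
Number1 = (Number1 * 10) + ("1" - "0")   ' 0 * 10 + 1  = 1
Number1 = (Number1 * 10) + ("4" - "0")   ' 1 * 10 + 4  = 14
Number1 = (Number1 * 10) + ("5" - "0")   ' 14 * 10 + 5 = 145
```

Each pass shifts what has been collected so far one decimal place to the left and adds the new digit, which is why the routine works for any number of digits up to the comma.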
mackrackit
- 11th February 2009, 15:06
I still think separating the numbers on the way in is the way to go.  Although Jerson's code is pretty slick.
Here is what I was thinking.  Just tested it using SERIN2 as the chip I have on the bench is not set up for HSERIN right now.
The first WAIT is to keep garbage out.
N1	VAR	LONG
N2	VAR	LONG
LOOP:
SERIN2 PORTD.2, 16416, [WAIT("X"),DEC N1,WAIT(","),DEC N2]
GOTO DISPLAY
DISPLAY:
Serout2 PORTC.6, 16416, [ DEC N1, $d, $a]
Serout2 PORTC.6, 16416, [ DEC N2, $d, $a]
GOTO LOOP
If you need the data in an array just modify the above with STR...
TerdRatchett
- 11th February 2009, 18:41
Both are great suggestions, which I appreciate!!! I'm pondering....
TerdRatchett
- 13th February 2009, 05:08
I've tried dozens of ideas along those lines and no solution works....
Like:
hserIN 65535, oops, [STR smallp\5, WAIT(","), STR largep\5\13]
The problem here is that the string size I expect to receive is variable, but STR forces you to put in the string length. If the string is shorter, the function doesn't return anything, yet it doesn't time out. I'm ending up with the comma in my first string, even though I'm only looking for it as a separator, and after I receive the first serial data, every time thereafter there is a non-ASCII character in the first byte of smallp.
If the data was always the same length it would be a breeze...
mackrackit
- 13th February 2009, 05:43
From the manual:
STR followed by a byte array variable, count and optional ending
character will receive a string of characters. The string length is
determined by the count or when the optional character is
encountered in the input.
[STR smallp\10\","]
See if that will get the first part correctly.
Does the data coming in have a qualifying character before the data you want? If so then the first example I gave will work. If it does not have a qualifying character then leave that part out.
TerdRatchett
- 13th February 2009, 15:25
I appreciate your help! Very confused :(
hserIN 65535, oops, [STR smallp\10\","]
The first time through I get the correct first part of the string. The device was sent 777,53 and smallp received 777.
The second time and every time thereafter the data is skewed. For example if the next packet that comes in is 766,59 smallp receives a 53766. The 53 from the last packet sent....
Charles Linquis
- 13th February 2009, 16:07
It would seem to be a lot easier to capture the whole sequence, then later scan the array for any commas and deal with it that way, especially since when you are using SERIN you have very little processing time available between characters.
TerdRatchett
- 13th February 2009, 16:19
I think that's a great suggestion and I agree. I have been unable to successfully parse it after the fact.
TerdRatchett
- 13th February 2009, 16:45
and to make matters worse I am unable to consistently and correctly receive serial packets.
TerdRatchett
- 13th February 2009, 17:30
Well, I got the serial function working reliably by doing this:
        hserIN 65535, oops, [STR smallp\12\10]
rather than this:
        hserIN 65535, oops, [STR smallp\12\13]
Apparently the linefeed was the last character sent, so by terminating on a carriage return, the linefeed that followed was polluting the serial stream for the next packet.
TerdRatchett
- 13th February 2009, 18:22
So now that I'm just looking at a parsing issue I've changed the serial in code to:
        hserIN 65535, oops, [STR smallp\5, WAIT(","), STR largep\5\10]   
It doesn't work as expected. When receiving 1480,95...
STR smallp\5 actually captures 5 characters including the comma even though it's the "wait for" character. I can't make smallp\4 because sometimes the data received will be 5 characters. It could also be 3 or 2 or 1 characters.
Luckyborg
- 13th February 2009, 18:27
Why don't you try it this way
hserIN 65535, oops, [STR smallp\5\","]
hserIN 65535, oops, [STR largep\5\13]
Then you have the values in 2 streams, each left justified.
David
Archangel
- 13th February 2009, 18:37
Wow, second post and you are already helping . . . WELCOME TO THE FORUM DAVID!
TerdRatchett
- 13th February 2009, 18:44
Hi David,
That would almost work. The problem is that neither string is guaranteed to be exactly 5 characters. So if the sent data was 1499,136 I would actually get "1499," with the comma included. Likewise, if the data is only 3 characters, like 210,152, I would actually get "210,1".
confusing!!
Luckyborg
- 13th February 2009, 19:24
That is what the optional end character is good for. By setting the end character to a ",", the command will take up to, but not necessarily, 5 characters, stopping when it gets the "," or, in the second line, the line feed. I put a 13 where it should be a 10; as you mentioned earlier, the line feed comes after the carriage return. So, assuming you reset the stream to all 0s before each update, if you sent the stream
1499,136(CR)(LF)
then 
hserIN 65535, oops, [STR smallp\5\","]
would give you an array with "1"-"4"-"9"-"9"-0 in smallp and 
hserIN 65535, oops, [STR largep\5\10]
would give you "1"-"3"-"6"-13-0 in largep
You would just then ignore the 13 or re zero it when you do your parsing
Thanks for the welcome Joe, I've been in the forum reading all week, mostly older threads.  I'm finally sort of caught up on what is still relevant.
TerdRatchett
- 13th February 2009, 20:34
That was the way I read the manual as well. I thought that as soon as the function saw a character meeting the WAIT criteria it would terminate collecting any more characters. In fact what I'm seeing is that it continues to gather all 5 characters, so it collects the comma as well if the string is less than 5!
The manual says the string length is determined by the count OR when the optional character is encountered. Unfortunately that sentence can be interpreted two ways.
mackrackit
- 14th February 2009, 15:54
Hi, 
Played with this some more...
Without a qualifier, for some reason the buffer keeps the last data sent and returns it with the next. I guess you do not have a qualifier at the beginning of the string, so use the LF or CR at the end.
This is what I am sending
123456789,987654321
and I have my terminal set to terminate the line with a CR LF.
The working code
###############
'18F6680   02/14/09  INFEED PARSE TEST BAUD 9600
    DEFINE OSC 20
    @ __CONFIG    _CONFIG1H, _OSC_HS_1H
    @ __CONFIG    _CONFIG2H, _WDT_ON_2H & _WDTPS_128_2H
    @ __CONFIG    _CONFIG4L, _LVP_OFF_4L
    DEFINE LCD_DREG     PORTG
    DEFINE LCD_DBIT     0
    DEFINE LCD_RSREG    PORTE
    DEFINE LCD_RSBIT    0
    DEFINE LCD_EREG     PORTE
    DEFINE LCD_EBIT     1
    DEFINE LCD_BITS     4
    DEFINE LCD_LINES    4
    DEFINE LCD_COMMANDUS    3000
    DEFINE LCD_DATAUS   150
    '###############################################
    PAUSE 100 : LCDOUT $FE,1,"TEST"
    N1 VAR LONG:N2 VAR LONG
    START: N1 = 0 : N2 = 0
    HIGH PORTG.4 :PAUSE 250:LOW PORTG.4
    RCSTA.4 = 0 : RCSTA.4 = 1
    'CHANGE LINE FEED AND CARRIAGE RETURN AS REQUIRED
    RCSTA=$90:TXSTA=$24:SPBRG=129:HSERIN[WAIT($a),WAIT($d),DEC N1,WAIT(","),DEC N2]
    LCDOUT $FE,1,DEC N1 : LCDOUT $FE,$C0,DEC N2 : GOTO START
TerdRatchett
- 15th February 2009, 03:38
Hi Dave,
I think your latest solution might have worked, but I had already got it working by parsing it after the fact, based on Jerson's parse routine. I really appreciate everyone's input. Here is what the working code looks like:
        hserIN 65535, oops, [STR inputdata\12\10] 'get the serial data   
        gosub ParseNumber   'split the raw data into 2 numbers
ParseNumber:        ' collect both pieces of the data in sep arrays
        for Cntr = 0 to 10        
        if inputdata[Cntr] <> "," then  ' Not the comma yet, so still in the first number
            smallp[Cntr] = inputdata[Cntr]
        else
            
            Cntr2 = Cntr    'marks where the comma was
            'goto earlyexit
            goto GetNum2     ' Now get the second number
        endif
        next
earlyexit:        
        return
GetNum2:
        CNTR = Cntr + 1     'get past the comma
        for Cntr = Cntr to 10
        if inputdata[Cntr] < "0" then
            RETURN
        else 
        If inputdata[Cntr] > "9" then
            RETURN
        ELSE    
            largep[Cntr-Cntr2-1] = inputdata[Cntr]  ' It's a valid number
        endif
        ENDIF
        next
        
        return
I had to modify the array pointer for the second number by subtracting the point at which the comma was found on the first pass from its pointer. In other words, the pointer for the input data can't be used directly to point to the second number.
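A possible variation, not tested on hardware: the same scan can build the two LONG values directly instead of copying digits into smallp and largep. This is a sketch under assumptions. It takes inputdata to be the 12-byte array filled by the HSERIN line above, and it assumes the program is compiled in long mode so that LONG is available. The names ParsePair, Idx, FieldNo and InChar are made up for illustration.

```basic
' Sketch only: assumes inputdata holds ASCII digits, one comma,
' more digits, then a CR or some other non-digit character.
inputdata VAR BYTE[12]     ' raw packet, filled by HSERIN elsewhere
Number1   VAR LONG         ' value before the comma
Number2   VAR LONG         ' value after the comma
Idx       VAR BYTE         ' index into inputdata
FieldNo   VAR BYTE         ' 0 = before the comma, 1 = after it
InChar    VAR BYTE         ' character being examined

ParsePair:
    Number1 = 0 : Number2 = 0 : FieldNo = 0   ' clear results from the last packet
    FOR Idx = 0 TO 11
        InChar = inputdata[Idx]
        IF (InChar >= "0") AND (InChar <= "9") THEN
            IF FieldNo = 0 THEN
                Number1 = (Number1 * 10) + (InChar - "0")
            ELSE
                Number2 = (Number2 * 10) + (InChar - "0")
            ENDIF
        ELSE
            IF (InChar = ",") AND (FieldNo = 0) THEN
                FieldNo = 1        ' first comma: switch to the second number
            ELSE
                RETURN             ' CR, zero or anything else ends the parse
            ENDIF
        ENDIF
    NEXT Idx
    RETURN
```

Called with GOSUB ParsePair after the HSERIN line, a packet such as 1480,95 followed by a CR would leave 1480 in Number1 and 95 in Number2. Because the scan stops at the first non-digit after the second number, anything later in the array is ignored, and no second index like Cntr2 is needed.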
Archangel
- 15th February 2009, 04:13
Proof Again: There is more than 1 right way to do almost everything.
 