[1/2] docparse: Replace \t with space

Message ID 20210121130033.20764-1-pvorel@suse.cz
State New

Commit Message

Petr Vorel Jan. 21, 2021, 1 p.m. UTC
to avoid constant failures because tabs are forbidden in JSON.

Signed-off-by: Petr Vorel <pvorel@suse.cz>
---
Hi,

Currently required for "Convert CAN tests to new LTP API" patchset
https://patchwork.ozlabs.org/project/ltp/patch/20210120143723.26483-5-rpalethorpe@suse.com/
https://patchwork.ozlabs.org/project/ltp/patch/20210120143723.26483-6-rpalethorpe@suse.com/

Kind regards,
Petr

 docparse/data_storage.h | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Petr Vorel Jan. 21, 2021, 1:05 p.m. UTC | #1
Hi,

...
>  	size_t size = sizeof(struct data_node_string) + strlen(string) + 1;
>  	struct data_node *node = malloc(size);
> +	char *ix = node->string.val;

>  	if (!node)
>  		return NULL;
> @@ -61,6 +62,9 @@ static inline struct data_node *data_node_string(const char *string)
>  	node->type = DATA_STRING;
>  	strcpy(node->string.val, string);

Probably better to comment this:
/* tabs are not allowed in JSON */
> +	while ((ix = strchr(ix, '\t')))
> +		*ix++ = ' ';

Kind regards,
Petr
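
For illustration, the tab replacement with the suggested comment could be pulled into a small helper. This is a minimal sketch and not part of the patch; the name replace_tabs() is hypothetical. Taking the string as an argument also means the pointer is only used after the caller has checked its allocation, whereas the patch initialises ix from node before the !node check.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: replace every tab in a
 * NUL-terminated string with a space, in place.
 */
static void replace_tabs(char *s)
{
	assert(s != NULL);

	/* tabs are not allowed in JSON */
	while ((s = strchr(s, '\t')))
		*s++ = ' ';
}
```

In data_node_string() this would be called on node->string.val right after the strcpy().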
Richard Palethorpe Jan. 21, 2021, 2:47 p.m. UTC | #2
Hello,

Petr Vorel <pvorel@suse.cz> writes:

> to avoid constant failures because tabs are forbidden in JSON.
>
> Signed-off-by: Petr Vorel <pvorel@suse.cz>
> ---
> Hi,
>
> Currently required for "Convert CAN tests to new LTP API" patchset
> https://patchwork.ozlabs.org/project/ltp/patch/20210120143723.26483-5-rpalethorpe@suse.com/
> https://patchwork.ozlabs.org/project/ltp/patch/20210120143723.26483-6-rpalethorpe@suse.com/
>
> Kind regards,
> Petr
>
>  docparse/data_storage.h | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/docparse/data_storage.h b/docparse/data_storage.h
> index ef420c08f..99c2514b7 100644
> --- a/docparse/data_storage.h
> +++ b/docparse/data_storage.h
> @@ -54,6 +54,7 @@ static inline struct data_node *data_node_string(const char *string)
>  {
>  	size_t size = sizeof(struct data_node_string) + strlen(string) + 1;
>  	struct data_node *node = malloc(size);
> +	char *ix = node->string.val;
>  
>  	if (!node)
>  		return NULL;
> @@ -61,6 +62,9 @@ static inline struct data_node *data_node_string(const char *string)
>  	node->type = DATA_STRING;
>  	strcpy(node->string.val, string);
>  
> +	while ((ix = strchr(ix, '\t')))
> +		*ix++ = ' ';


JQ says "control characters from U+0000 through U+001F must be
escaped". So I expect it is only a matter of time until some other
control character is used.

Perhaps we should escape all control characters into the \uXXXX
hexadecimal form?

http://www.json.org/json-en.html

> +
>  	return node;
>  }
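
A rough sketch of the \uXXXX idea above, assuming a standalone helper rather than the real docparse code. The name json_escape_ctrl() and the choice to return a new allocation are assumptions made for the example. It only handles bytes 0x00 to 0x1F; quotes and backslashes would still need their own escaping.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a newly allocated copy of src in which
 * every byte in the range 0x00-0x1F is written as \u00XX.
 * The caller frees the result. Returns NULL if malloc() fails.
 */
static char *json_escape_ctrl(const char *src)
{
	assert(src != NULL);

	/* Worst case: every byte expands to six characters. */
	char *dst = malloc(strlen(src) * 6 + 1);
	char *out = dst;

	if (!dst)
		return NULL;

	for (; *src; src++) {
		unsigned char c = (unsigned char)*src;

		if (c < 0x20)
			out += snprintf(out, 7, "\\u%04x", (unsigned)c);
		else
			*out++ = (char)c;
	}
	*out = '\0';

	return dst;
}
```

Unlike the in-place tab replacement, this changes the string length, so the node size in data_node_string() would have to be computed from the escaped string.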
Cyril Hrubis Jan. 21, 2021, 2:50 p.m. UTC | #3
Hi!
> JQ says "control characters from U+0000 through U+001F must be
> escaped". So I expect it is only a matter of time until some other
> control character is used.
> 
> Perhaps we should escape all control characters into the \uXXXX
> hexadecimal form?

Or fail the compilation if we get one of these into the parser?

There is no point in having them in the metadata anyways.
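
Failing instead of escaping could look roughly like the sketch below. The function name check_no_ctrl(), its arguments and the message format are all hypothetical; how the parser would propagate the error into a failed build is left open.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical check: return 0 if s contains no control characters
 * (bytes 0x00-0x1F). Otherwise print a diagnostic naming the source
 * file to stderr and return 1.
 */
static int check_no_ctrl(const char *fname, const char *s)
{
	size_t i;

	assert(fname != NULL && s != NULL);

	for (i = 0; s[i]; i++) {
		unsigned char c = (unsigned char)s[i];

		if (c < 0x20) {
			fprintf(stderr,
				"%s: control character 0x%02x at offset %zu\n",
				fname, (unsigned)c, i);
			return 1;
		}
	}

	return 0;
}
```

The caller would exit with a non-zero status when this returns 1, so that the build stops at the offending file.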
Petr Vorel Jan. 21, 2021, 3:43 p.m. UTC | #4
Hi,

> > JQ says "control characters from U+0000 through U+001F must be
> > escaped". So I expect it is only a matter of time until some other
> > control character is used.
+1

> > Perhaps we should escape all control characters into the \uXXXX
> > hexadecimal form?

> Or fail the compilation if we get one of these into the parser?
We do fail already, but it's a bit hidden now.
I don't know why build continues for so long.

Also there would have to be a striking error message.
But docparser is not mandatory (people might have it disabled), thus mostly it
will be us, the maintainers, who are going to fix these whitespace issues :(.

> There is no point in having them in the metadata anyways.
That would be solved by replacing some reasonable subset. So far it was the tab
character.

Kind regards,
Petr
Richard Palethorpe Jan. 21, 2021, 3:59 p.m. UTC | #5
Hello,

Petr Vorel <pvorel@suse.cz> writes:

> Hi,
>
>> > JQ says "control characters from U+0000 through U+001F must be
>> > escaped". So I expect it is only a matter of time until some other
>> > control character is used.
> +1
>
>> > Perhaps we should escape all control characters into the \uXXXX
>> > hexadecimal form?
>
>> Or fail the compilation if we get one of these into the parser?
> We do fail already, but it's a bit hidden now.
> I don't know why build continues for so long.
>
> Also there would have to be a striking error message.
> But docparser is not mandatory (people might have it disabled), thus mostly it
> will be us, the maintainers, who are going to fix these whitespace issues :(.
>
>> There is no point in having them in the metadata anyways.
> That would be solved by replacing some reasonable subset. So far it was the tab
> character.
>
> Kind regards,
> Petr

I suppose actually I could just escape the tab in the C string. But as
Petr says, docparse is not mandatory so anything which can pass C
compilation, but fails docparse is likely to create trouble.

It would be possible to force running docparse and do some validation
on the JSON, as this would not require any more dependencies. In fact it
would be nice to run docparse to produce just the JSON without having to
install asciidoc[tor].

The Makefile doesn't seem to allow this. Although it is quite easy to
compile docparse without it.
Petr Vorel Jan. 21, 2021, 4:20 p.m. UTC | #6
> Hello,

> Petr Vorel <pvorel@suse.cz> writes:

> > Hi,

> >> > JQ says "control characters from U+0000 through U+001F must be
> >> > escaped". So I expect it is only a matter of time until some other
> >> > control character is used.
> > +1

> >> > Perhaps we should escape all control characters into the \uXXXX
> >> > hexadecimal form?

> >> Or fail the compilation if we get one of these into the parser?
> > We do fail already, but it's a bit hidden now.
> > I don't know why build continues for so long.

> > Also there would have to be a striking error message.
> > But docparser is not mandatory (people might have it disabled), thus mostly it
> > will be us, the maintainers, who are going to fix these whitespace issues :(.

> >> There is no point in having them in the metadata anyways.
> > That would be solved by replacing some reasonable subset. So far it was the tab
> > character.

> > Kind regards,
> > Petr

> I suppose actually I could just escape the tab in the C string. But as
"escape the tab in the C string": that's what is being done in this patch.

> Petr says, docparse is not mandatory so anything which can pass C
> compilation, but fails docparse is likely to create trouble.
+1.

> It would be possible to force running docparse and do some validation
> on the JSON, as this would not require any more dependencies. In fact it
> would be nice to run docparse to produce just the JSON without having to
> install asciidoc[tor].

> The Makefile doesn't seem to allow this. Although it is quite easy to
> compile docparse without it.
That'd be easy to change.

But there is a Perl package dependency. If possible I'd allow people to compile
LTP without bothering with CPAN (mainly due to these embedded build source distros,
e.g. Buildroot, Yocto).

Generally I'd decouple requirements for C source code content and JSON.
\t in C source will be printed as tab, that's ok.
That's why I changed the formatting in docparse/data_storage.h.
I wonder how many escape strings we want to use. Maybe \n (if that's not already
handled).

I wonder if we can do some string validation in git commit or push hooks.
Although not sure if it's a good idea.

Kind regards,
Petr
Cyril Hrubis Jan. 21, 2021, 4:28 p.m. UTC | #7
Hi!
> > The Makefile doesn't seem to allow this. Although it is quite easy to
> > compile docparse without it.
> That'd be easy to change.
> 
> But there is a Perl package dependency. If possible I'd allow people to compile
> LTP without bothering with CPAN (mainly due to these embedded build source distros,
> e.g. Buildroot, Yocto).

I guess that we can easily catch any non-ascii characters in the C
docparse tool, no perl needed.
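
One way such a check might look in plain C is sketched below. Reading "non-ascii" broadly as any byte outside printable ASCII (0x20 to 0x7E) is an assumption of this example, and first_bad_byte() is a hypothetical name, not an existing docparse function.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical scan: return a pointer to the first byte of s that is
 * outside printable ASCII (0x20-0x7E), or NULL if every byte is fine.
 * This catches both control characters and bytes with the high bit set.
 */
static const char *first_bad_byte(const char *s)
{
	assert(s != NULL);

	for (; *s; s++) {
		unsigned char c = (unsigned char)*s;

		if (c < 0x20 || c > 0x7e)
			return s;
	}

	return NULL;
}
```

Returning the position rather than a flag lets the caller decide whether to replace the byte, escape it, or abort with an error.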

Patch

diff --git a/docparse/data_storage.h b/docparse/data_storage.h
index ef420c08f..99c2514b7 100644
--- a/docparse/data_storage.h
+++ b/docparse/data_storage.h
@@ -54,6 +54,7 @@  static inline struct data_node *data_node_string(const char *string)
 {
 	size_t size = sizeof(struct data_node_string) + strlen(string) + 1;
 	struct data_node *node = malloc(size);
+	char *ix = node->string.val;
 
 	if (!node)
 		return NULL;
@@ -61,6 +62,9 @@  static inline struct data_node *data_node_string(const char *string)
 	node->type = DATA_STRING;
 	strcpy(node->string.val, string);
 
+	while ((ix = strchr(ix, '\t')))
+		*ix++ = ' ';
+
 	return node;
 }