
dtc: Add python source code output

Message ID 1225958144.25986.9.camel@localhost (mailing list archive)
State Not Applicable, archived

Commit Message

Michael Ellerman Nov. 6, 2008, 7:55 a.m. UTC
This commit adds an output format, which produces python
code. When run, the python produces a data structure that
can then be inspected in order to do various things.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
---

I'm not sure if this is generally useful (or sane) but it was for me so
I thought I'd post it.

I have a dts that I want to use to configure a simulator, and this
seemed like the nicest way to get there. dtc spits out the pythonised
device tree, and then I have a 10 line python script that does the
configuring.

cheers


 Makefile.dtc |    1 +
 dtc.c        |    3 +
 dtc.h        |    1 +
 python.c     |  129 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 134 insertions(+), 0 deletions(-)
 create mode 100644 python.c

Comments

David Gibson Nov. 7, 2008, 2:31 a.m. UTC | #1
On Thu, Nov 06, 2008 at 06:55:44PM +1100, Michael Ellerman wrote:
> This commit adds an output format, which produces python
> code. When run, the python produces a data structure that
> can then be inspected in order to do various things.
> 
> Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
> ---
> 
> I'm not sure if this is generally useful (or sane) but it was for me so
> I thought I'd post it.

Hrm, well the idea of language source output seems reasonable.  But
the actual data structure emitted, and the method of construction in
Python both seem a bit odd to me.

> I have a dts that I want to use to configure a simulator, and this
> seemed like the nicest way to get there. dtc spits out the pythonised
> device tree, and then I have a 10 line python script that does the
> configuring.

[snip]
> diff --git a/python.c b/python.c
> new file mode 100644
> index 0000000..8ad0433
> --- /dev/null
> +++ b/python.c

AFAICT this is based roughly on the output side of treesource.c.  It
would be kind of nice if the two could be combined, with the same
basic structure looping over the device tree, and different emitters
for either python or dts source.  This would be similar to what we do
in flattree.c to emit either binary or asm versions of the flat tree.
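
To make the "one walker, several emitters" idea concrete, here is a
rough sketch in Python rather than C.  Everything in it (walk(),
DtsEmitter, PyEmitter, the tuple layout of a node) is made up for
illustration and is not dtc's actual structure; property values are
kept as raw bytes so neither emitter has to guess a type.

```python
# Hypothetical sketch: a single tree walk driving interchangeable emitters.
# A node is (name, [(prop_name, value_bytes), ...], [child_nodes]).

class DtsEmitter:
    """Emits dts-like source, with every value as a [hex bytes] string."""
    def begin_node(self, name, depth):
        return "%s%s {" % ("    " * depth, name or "/")

    def prop(self, name, value, depth):
        pad = "    " * depth
        if not value:
            return "%s%s;" % (pad, name)
        return "%s%s = [%s];" % (pad, name, " ".join("%02x" % b for b in value))

    def end_node(self, depth):
        return "%s};" % ("    " * depth)


class PyEmitter:
    """Emits one nested Python literal; repr() does the escaping."""
    def begin_node(self, name, depth):
        return "%s{'name': %r, 'items': [" % ("    " * depth, name)

    def prop(self, name, value, depth):
        return "%s('prop', %r, %r)," % ("    " * depth, name, value)

    def end_node(self, depth):
        # Nested nodes are list elements, so they need a trailing comma.
        return "%s]}%s" % ("    " * depth, "," if depth else "")


def walk(node, emitter, depth=0):
    """The shared traversal: the emitter alone decides the output syntax."""
    name, props, children = node
    lines = [emitter.begin_node(name, depth)]
    for pname, value in props:
        lines.append(emitter.prop(pname, value, depth + 1))
    for child in children:
        lines.extend(walk(child, emitter, depth + 1))
    lines.append(emitter.end_node(depth))
    return lines
```

Calling "\n".join(walk(tree, DtsEmitter())) or the same with
PyEmitter() gives the two output formats from the same loop.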

> @@ -0,0 +1,129 @@
> +/*
> + * (C) Copyright David Gibson <dwg@au1.ibm.com>, IBM Corporation.  2005.
> + * (C) Copyright Michael Ellerman, IBM Corporation.  2008.
> + *
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License as
> + * published by the Free Software Foundation; either version 2 of the
> + * License, or (at your option) any later version.
> + *
> + *  This program is distributed in the hope that it will be useful,
> + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
> + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + *  General Public License for more details.
> + *
> + *  You should have received a copy of the GNU General Public License
> + *  along with this program; if not, write to the Free Software
> + *  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307
> + *                                                                   USA
> + */
> +
> +#include "dtc.h"
> +#include "srcpos.h"
> +
> +
> +static void write_propval_cells(FILE *f, struct property *prop)
> +{
> +	cell_t *cp = (cell_t *)prop->val.val;
> +	int i;
> +
> +	fprintf(f, "    p = Property('%s', [", prop->name);
> +
> +	for (i = 0; i < prop->val.len / sizeof(cell_t); i++)
> +		fprintf(f, "0x%x, ", fdt32_to_cpu(cp[i]));
> +
> +	fprintf(f, "])\n");
> +}
> +
> +static int isstring(char c)
> +{
> +	return (isprint(c)
> +		|| (c == '\0')
> +		|| strchr("\a\b\t\n\v\f\r", c));
> +}
> +
> +static void write_property(FILE *f, struct property *prop)
> +{
> +	const char *p = prop->val.val;
> +	int i, strtype, len = prop->val.len;
> +
> +	if (len == 0) {
> +		fprintf(f, "    p = Property('%s', None)\n", prop->name);
> +		goto out;
> +	}
> +
> +	strtype = 1;
> +	for (i = 0; i < len; i++) {
> +		if (!isstring(p[i])) {
> +			strtype = 0;
> +			break;
> +		}
> +	}
> +
> +	if (strtype)
> +		fprintf(f, "    p = Property('%s', '%s')\n", prop->name,
> +			prop->val.val);

This isn't correct.  The property value could contain \0s or other
control characters which won't be preserved properly if emitted
directly into the python source.  You'd need to escape the string, as
write_propval_string() does in treesource.c.
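
As a rough illustration of the escaping problem (in Python, not the C
the patch would need), the sketch below compares pasting the bytes
into the source unescaped with letting repr() escape them.  The
helper names and the stand-in Property are hypothetical.  Note that in
C the '%s' conversion would additionally stop at the first NUL.

```python
def naive_line(name, value):
    # Mirrors the patch's "'%s'" approach: bytes pasted in unescaped.
    return "p = Property('%s', '%s')" % (name, value.decode("latin-1"))


def escaped_line(name, value):
    # repr() escapes quotes, backslashes, NULs and control characters.
    return "p = Property(%r, %r)" % (name, value)


def roundtrip(line):
    # Run one generated line with a stand-in Property that records its args.
    scope = {"Property": lambda name, value: (name, value)}
    exec(line, scope)
    return scope["p"]
```

With a value such as b"it's a\nlabel\0", the naive line is not valid
Python at all, while the escaped line evaluates back to the original
bytes.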

Uh.. there's also an interesting ambiguity here.  In OF and flat
trees, strings are NUL-terminated and the final '\0' is included as
part of the property length.  Python strings are not NUL-terminated,
they're bytestrings that know their own length.  I think making sure
all the conversions correctly preserve the presence or absence of a
terminal NUL requires a bit more care here.
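
A small Python sketch of the ambiguity, with made-up helper names:
stripping the terminator makes two different property values
indistinguishable, whereas a conversion that insists on the
terminator can be reversed exactly.

```python
def lossy(value):
    # Dropping trailing NULs: b"okay\0" and b"okay" now look the same.
    return value.rstrip(b"\0").decode("ascii")


def to_strings(value):
    """Split a NUL-terminated string list; refuse anything else."""
    if not value.endswith(b"\0"):
        raise ValueError("not a NUL-terminated string list")
    return [s.decode("ascii") for s in value[:-1].split(b"\0")]


def from_strings(strings):
    """Inverse of to_strings(): put every terminator back."""
    return b"".join(s.encode("ascii") + b"\0" for s in strings)
```

Here from_strings(to_strings(v)) == v holds for every v that
to_strings() accepts, which is the byte-for-byte property the output
format would need.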

> +	else if (len == 4)
> +		fprintf(f, "    p = Property('%s', 0x%x)\n", prop->name,
> +			fdt32_to_cpu(*(const cell_t *)p));

There's a propval_cell() function in livetree.c you can use to
simplify this.

> +	else
> +		write_propval_cells(f, prop);

Uh.. this branch could be called in the case where prop is not a
string but also has a length that is not a multiple of 4, which
write_propval_cells() won't correctly deal with.
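
To show what goes wrong, here is the same integer-division loop
transcribed into Python, next to one possible alternative that falls
back to raw bytes.  Both function names are hypothetical.

```python
import struct


def cells_like_patch(value):
    # len // 4 rounds down, so 1-3 trailing bytes are silently dropped.
    count = len(value) // 4
    return list(struct.unpack(">%dI" % count, value[:count * 4]))


def cells_or_bytes(value):
    # Only claim "cells" when the length really is a multiple of 4.
    if len(value) % 4 == 0:
        return ("cells", list(struct.unpack(">%dI" % (len(value) // 4), value)))
    return ("bytes", value)
```

For a 6-byte value, cells_like_patch() returns a single cell and the
last two bytes are gone, while cells_or_bytes() keeps all six.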

These branches also result in the value having different Python types
depending on the context.  That's not necessarily a bad thing, but
since which Python type is chosen depends on a heuristic only, it
certainly needs some care.  You certainly need to be certain that you
can always deduce the exact, byte-for-byte correct version of the
property value from whatever you put into the Python data structure.
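
For instance, here is a Python transcription of the heuristic as I
read it from the patch (guess_like_patch and STRINGISH are names
invented for this sketch).  Two properties that are both a single
cell come out as different Python types.

```python
import struct

# Bytes the patch's isstring() accepts: printable ASCII, NUL, and \a..\r.
STRINGISH = set(range(0x20, 0x7f)) | {0} | set(range(0x07, 0x0e))


def guess_like_patch(value):
    if not value:
        return None
    if all(b in STRINGISH for b in value):
        # C's %s stops at the first NUL, so anything after it is lost.
        return value.split(b"\0")[0].decode("ascii")
    if len(value) == 4:
        return struct.unpack(">I", value)[0]
    count = len(value) // 4
    return list(struct.unpack(">%dI" % count, value[:count * 4]))
```

A cell of 0x00000001 is reported as the integer 1, but a cell of
0x61626300 is reported as the string 'abc', and the consumer has no
way to tell which was intended.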

> +	
> +out:
> +	fprintf(f, "    n.properties.append(p)\n");

So, emitting Python procedural code to build up the data structure,
rather than a great big Python literal that the Python parser will
just turn into the right thing seems a bit of a roundabout way of
doing this.
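
Something along these lines is what I have in mind; the dict layout
below is only one hypothetical shape for the literal, not a proposed
format.  The generated file would contain just the literal, and the
consumer would load it with ast.literal_eval(), which parses literals
without executing arbitrary code.

```python
import ast

# Stand-in for what dtc might write out: a single nested literal.
GENERATED = r"""
{
    'name': '',
    'props': {
        'model': b'acme,board\x00',
        '#address-cells': b'\x00\x00\x00\x01',
    },
    'children': [
        {'name': 'cpus', 'props': {}, 'children': []},
    ],
}
"""

# The Python parser builds the whole tree in one step.
tree = ast.literal_eval(GENERATED.strip())


def find_child(node, name):
    """Return the first direct child called `name`, or None."""
    for child in node['children']:
        if child['name'] == name:
            return child
    return None
```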

> +}
> +
> +static void write_tree_source_node(FILE *f, struct node *tree, int level)
> +{
> +	char name[MAX_NODENAME_LEN + 1] = "root";

Why not just have the root node's name be the empty string, as we do
in the flat tree?

> +	struct property *prop;
> +	struct node *child;
> +
> +	if (tree->name && (*tree->name))
> +		strncpy(name, tree->name, MAX_NODENAME_LEN);
> +
> +	fprintf(f, "    n = Node('%s', parents[-1])\n", name);
> +
> +	if (level > 0)
> +		fprintf(f, "    parents[-1].children.append(n)\n");
> +	else
> +		fprintf(f, "    root = n\n");
> +
> +	for_each_property(tree, prop)
> +		write_property(f, prop);
> +
> +	fprintf(f, "    parents.append(n)\n");
> +
> +	for_each_child(tree, child) {
> +		write_tree_source_node(f, child, level + 1);
> +	}
> +
> +	fprintf(f, "    parents.pop()\n");
> +}
> +
> +
> +static char *header = "#!/usr/bin/python\n\
> +\n\
> +class Node(object):\n\
> +    def __init__(self, name, parent, unitaddr=None):\n\

The unitaddr parameter is never used afaict.

> +        self.__dict__.update(locals())\n\
> +        self.children = []\n\
> +        self.properties = []\n\
> +\n\
> +class Property(object):\n\
> +    def __init__(self, name, value):\n\
> +        self.__dict__.update(locals())\n\
> +";
> +
> +void dt_to_python(FILE *f, struct boot_info *bi, int version)
> +{
> +	fprintf(f, "%s\n", header);
> +	fprintf(f, "def generate_tree():\n");
> +	fprintf(f, "    parents = [None]\n");
> +
> +	write_tree_source_node(f, bi->dt, 0);
> +
> +	fprintf(f, "    root.version = %d\n", version);

Since you're not emitting a flat tree, the version is not relevant
here, and should not be a parameter (again, like dt_to_source()).

> +	fprintf(f, "    return root\n");
> +}

Milton Miller Nov. 10, 2008, 4:11 p.m. UTC | #2
On 2008-11-07 at 02:31:40, David Gibson wrote:
> On Thu, Nov 06, 2008 at 06:55:44PM +1100, Michael Ellerman wrote:
>> This commit adds an output format, which produces python
>> code. When run, the python produces a data structure that
>> can then be inspected in order to do various things.
...
>> I'm not sure if this is generally useful (or sane) but it was for me 
>> so
>> I thought I'd post it.
>
> Hrm, well the idea of language source output seems reasonable.  But
> the actual data structure emitted, and the method of construction in
> Python both seem a bit odd to me.
>
>> I have a dts that I want to use to configure a simulator, and this
>> seemed like the nicest way to get there. dtc spits out the pythonised
>> device tree, and then I have a 10 line python script that does the
>> configuring.

[snip]
> These branches also result in the value having different Python types
> depending on the context.  That's not necessarily a bad thing, but
> since which Python type is chosen depends on a heuristic only, it
> certainly needs some care.  You certainly need to be certain that you
> can always deduce the exact, byte-for-byte correct version of the
> property value from whatever you put into the Python data structure.
>>  +
>>  +out:
>>  +     fprintf(f, "    n.properties.append(p)\n");
>
> So, emitting Python procedural code to build up the data structure,
> rather than a great big Python literal that the Python parser will
> just turn into the right thing seems a bit of a roundabout way of
> doing this.

I would think so too.   I haven't looked at the output, only at David's
comments.  If the data structure is ambiguous, then I do think more 
thought is needed.

Have you considered just parsing the flat tree binary?   Either 
creating a python binding to libfdt or even just parsing the dtb 
directly?
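
To give an idea of how little is involved in parsing the dtb
directly, here is a minimal read-only sketch in Python.  It assumes a
version 16 or 17 blob, ignores the memory reservation map, and does
no bounds checking; parse_dtb() and the dict layout are names made up
for this sketch.  Property values are left as raw bytes.

```python
import struct

FDT_MAGIC = 0xd00dfeed
FDT_BEGIN_NODE, FDT_END_NODE, FDT_PROP, FDT_NOP, FDT_END = 1, 2, 3, 4, 9


def align4(n):
    return (n + 3) & ~3


def parse_dtb(blob):
    """Return the root node as {'name', 'props', 'children'}."""
    magic, _total, off_struct, off_strings = struct.unpack_from(">4I", blob, 0)
    if magic != FDT_MAGIC:
        raise ValueError("bad magic %#x" % magic)
    pos = off_struct
    # A sentinel parent lets the root be handled like any other node.
    stack = [{"name": None, "props": {}, "children": []}]
    while True:
        (token,) = struct.unpack_from(">I", blob, pos)
        pos += 4
        if token == FDT_BEGIN_NODE:
            end = blob.index(b"\0", pos)
            node = {"name": blob[pos:end].decode("ascii"),
                    "props": {}, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
            pos = align4(end + 1)          # name is padded to 4 bytes
        elif token == FDT_END_NODE:
            stack.pop()
        elif token == FDT_PROP:
            length, nameoff = struct.unpack_from(">2I", blob, pos)
            pos += 8
            nstart = off_strings + nameoff
            name = blob[nstart:blob.index(b"\0", nstart)].decode("ascii")
            stack[-1]["props"][name] = blob[pos:pos + length]
            pos = align4(pos + length)     # value is padded to 4 bytes
        elif token == FDT_NOP:
            continue
        elif token == FDT_END:
            return stack[0]["children"][0]
        else:
            raise ValueError("unknown token %#x" % token)
```

Usage would be along the lines of parse_dtb(open(path, "rb").read()),
followed by whatever explicit decoding each property needs.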

I have written perl code to parse a dtb and query it for nodes and
properties, and it wasn't too bad.  I need to look at a bug report by
another user and comment it, then I should seek the okays to post it.
It is currently read-only and iterative callback based (like the
kernel's early-scan-flat-tree stuff), but I plan to create a tree for
querying, editing, and re-flattening.  Perl strings are counted-length
binary blobs, so property contents are interpreted with pack and
unpack.  The library has been used to search a dtb to build a list of
cpu instances and memory blocks, and it has been used to query the
properties of a known node in the tree.

milton

Jimi Xenidis Nov. 10, 2008, 5 p.m. UTC | #3
On Nov 10, 2008, at 10:11 AM, Milton Miller wrote:

> On 2008-11-07 at 02:31:40, David Gibson wrote:
>> On Thu, Nov 06, 2008 at 06:55:44PM +1100, Michael Ellerman wrote:
>>> This commit adds an output format, which produces python
>>> code. When run, the python produces a data structure that
>>> can then be inspected in order to do various things.
> ...
>>> I'm not sure if this is generally useful (or sane) but it was for  
>>> me so
>>> I thought I'd post it.
>>
>> Hrm, well the idea of language source output seems reasonable.  But
>> the actual data structure emitted, and the method of construction in
>> Python both seem a bit odd to me.
>>
>>> I have a dts that I want to use to configure a simulator, and this
>>> seemed like the nicest way to get there. dtc spits out the  
>>> pythonised
>>> device tree, and then I have a 10 line python script that does the
>>> configuring.
>
> [snip]
>> These branches also result in the value having different Python types
>> depending on the context.  That's not necessarily a bad thing, but
>> since which Python type is chosen depends on a heuristic only, it
>> certainly needs some care.  You certainly need to be certain that you
>> can always deduce the exact, byte-for-byte correct version of the
>> property value from whatever you put into the Python data structure.
>>> +
>>> +out:
>>> +     fprintf(f, "    n.properties.append(p)\n");
>>
>> So, emitting Python procedural code to build up the data structure,
>> rather than a great big Python literal that the Python parser will
>> just turn into the right thing seems a bit of a roundabout way of
>> doing this.
>
> I would think so too.   I haven't looked at the output, only at  
> David's comments.  If the data structure is ambiguous, then I do
> think more thought is needed.

There is value in the DTC (optionally) emitting a python library and  
then having the DTC result use it.
It would allow for python to easily, at runtime, be able to modify the  
contents and not have to inline-edit, emit, compile a DTS.

BTW: it would also be nice if the python library could dump the dts (or
even dtb).

>
>
> Have you considered just parsing the flat tree binary?   Either  
> creating a python binding to libfdt or even just parsing the dtb  
> directly?
>
> I have written perl code to parse a dtb and query it for nodes and
> properties, and it wasn't too bad.  I need to look at a bug report by
> another user and comment it, then I should seek the okays to post it.
> It is currently read-only and iterative callback based (like the
> kernel's early-scan-flat-tree stuff), but I plan to create a
> tree for querying, editing, and re-flattening.  Perl strings are
> counted-length binary blobs, so property contents are interpreted
> with pack and unpack.  The library has been used to search a dtb to
> build a list of cpu instances and memory blocks, and it has been
> used to query the properties of a known node in the tree.
>
> milton

Milton Miller Nov. 11, 2008, 3:54 p.m. UTC | #4
On Nov 10, 2008, at 11:00 AM, Jimi Xenidis wrote:
> On Nov 10, 2008, at 10:11 AM, Milton Miller wrote:
>> On 2008-11-07 at 02:31:40, David Gibson wrote:
>>> On Thu, Nov 06, 2008 at 06:55:44PM +1100, Michael Ellerman wrote:
>>>> This commit adds an output format, which produces python
>>>> code. When run, the python produces a data structure that
>>>> can then be inspected in order to do various things.
>> ...
>>>> I'm not sure if this is generally useful (or sane) but it was for 
>>>> me so
>>>> I thought I'd post it.
>>>
>>> Hrm, well the idea of language source output seems reasonable.  But
>>> the actual data structure emitted, and the method of construction in
>>> Python both seem a bit odd to me.
>>>
>>>> I have a dts that I want to use to configure a simulator, and this
>>>> seemed like the nicest way to get there. dtc spits out the 
>>>> pythonised
>>>> device tree, and then I have a 10 line python script that does the
>>>> configuring.
>>
>> [snip]
>>> These branches also result in the value having different Python types
>>> depending on the context.  That's not necessarily a bad thing, but
>>> since which Python type is chosen depends on a heuristic only, it
>>> certainly needs some care.  You certainly need to be certain that you
>>> can always deduce the exact, byte-for-byte correct version of the
>>> property value from whatever you put into the Python data structure.
...
>>> So, emitting Python procedural code to build up the data structure,
>>> rather than a great big Python literal that the Python parser will
>>> just turn into the right thing seems a bit of a roundabout way of
>>> doing this.
>>
>> I would think so too.   I haven't looked at the output, only at 
>> David's comments.  If the data structure is ambiguous, then I do think
>> more thought is needed.
>
> There is value in the DTC (optionally) emitting a python library and 
> then having the DTC result use it.

I'm not sure what you are trying to say here, Jimi.   Are you asking 
that dtc emit dtlib.py?  And then have it parse the python later?

> It would allow for python to easily, at runtime, be able to modify the 
> contents and not have to inline-edit, emit, compile a DTS.

Are you saying you want to modify a device tree in some python-specific 
syntax, and just dump it and have dtc understand that format so we 
don't have to translate to a dts?

Admittedly this is not the impression I got when I interrogated you
over chat.  But it's still how I'm parsing this email.

> BTW: it would also be nice if the python library could dump the dts
> (or even dtb)

Ok so you want to see the standard output too.

>> Have you considered just parsing the flat tree binary?   Either 
>> creating a python binding to libfdt or even just parsing the dtb 
>> directly?

I know that just parsing the dtb in python (and even changing and
emitting a changed dtb) will be easier than teaching dtc to read
something that looks like python code, because I have written the perl
and have dabbled in others' python code (but I don't plan on writing
the python version).


Based on my experience with parsing dtb in perl, I think handling
property conversion in python, where one can explicitly request type
conversion by how one intends to use the property value, is preferable
to emitting the data structure in another language and relying on
heuristics to guess the right type based on its value.  I'm saying
let's add decode_string / decode_int (direct translations to pack and
unpack, or just call them explicitly) to interpret the properties,
rather than expect a translated python string but get a byte array
because it had some special character, or worse, expect an integer or
byte array but get a string because its value happened to look like a
string.  Doing these heuristics when creating a dts is ok because the
result will still compile correctly back to a dtb; it just makes it
harder for the human to read, not the machine to parse.  But expecting
another language environment to use the result without having
encode/decode available is likely to lead to data-dependent bugs.
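
Roughly, and only as a sketch of the idea (I am not writing the real
thing), the helpers could look like this in python.  The function
names follow the decode_string / decode_int suggestion above; the
exact checks and the matching encode_* helpers are my own
assumptions.

```python
import struct


def decode_int(value):
    """Interpret a property as one big-endian 32-bit cell."""
    if len(value) != 4:
        raise ValueError("expected 4 bytes, got %d" % len(value))
    return struct.unpack(">I", value)[0]


def decode_string(value):
    """Interpret a property as a single NUL-terminated string."""
    if not value.endswith(b"\0") or b"\0" in value[:-1]:
        raise ValueError("not a single NUL-terminated string")
    return value[:-1].decode("ascii")


def encode_int(number):
    return struct.pack(">I", number)


def encode_string(text):
    return text.encode("ascii") + b"\0"
```

The same four bytes b"abc\0" decode to 'abc' or to 0x61626300
depending on which function the caller picks, so the type comes from
the caller's intent rather than from the value.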

So then the question becomes: what is the value of emitting a python
tree structure natively for python to read, versus decoding the dtb
and building the tree in python?

milton

Patch

diff --git a/Makefile.dtc b/Makefile.dtc
index bece49b..92164de 100644
--- a/Makefile.dtc
+++ b/Makefile.dtc
@@ -12,6 +12,7 @@  DTC_SRCS = \
 	livetree.c \
 	srcpos.c \
 	treesource.c \
+	python.c \
 	util.c
 
 DTC_GEN_SRCS = dtc-lexer.lex.c dtc-parser.tab.c
diff --git a/dtc.c b/dtc.c
index 84bee2d..496aebf 100644
--- a/dtc.c
+++ b/dtc.c
@@ -92,6 +92,7 @@  static void  __attribute__ ((noreturn)) usage(void)
 	fprintf(stderr, "\t\t\tdts - device tree source text\n");
 	fprintf(stderr, "\t\t\tdtb - device tree blob\n");
 	fprintf(stderr, "\t\t\tasm - assembler source\n");
+	fprintf(stderr, "\t\t\tpy  - python source\n");
 	fprintf(stderr, "\t-V <output version>\n");
 	fprintf(stderr, "\t\tBlob version to produce, defaults to %d (relevant for dtb\n\t\tand asm output only)\n", DEFAULT_FDT_VERSION);
 	fprintf(stderr, "\t-R <number>\n");
@@ -219,6 +220,8 @@  int main(int argc, char *argv[])
 		dt_to_blob(outf, bi, outversion);
 	} else if (streq(outform, "asm")) {
 		dt_to_asm(outf, bi, outversion);
+	} else if (streq(outform, "py")) {
+		dt_to_python(outf, bi, outversion);
 	} else if (streq(outform, "null")) {
 		/* do nothing */
 	} else {
diff --git a/dtc.h b/dtc.h
index 5cb9f58..45252fe 100644
--- a/dtc.h
+++ b/dtc.h
@@ -237,6 +237,7 @@  void process_checks(int force, struct boot_info *bi);
 
 void dt_to_blob(FILE *f, struct boot_info *bi, int version);
 void dt_to_asm(FILE *f, struct boot_info *bi, int version);
+void dt_to_python(FILE *f, struct boot_info *bi, int version);
 
 struct boot_info *dt_from_blob(const char *fname);
 
diff --git a/python.c b/python.c
new file mode 100644
index 0000000..8ad0433
--- /dev/null
+++ b/python.c
@@ -0,0 +1,129 @@ 
+/*
+ * (C) Copyright David Gibson <dwg@au1.ibm.com>, IBM Corporation.  2005.
+ * (C) Copyright Michael Ellerman, IBM Corporation.  2008.
+ *
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License, or (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ *  General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, write to the Free Software
+ *  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307
+ *                                                                   USA
+ */
+
+#include "dtc.h"
+#include "srcpos.h"
+
+
+static void write_propval_cells(FILE *f, struct property *prop)
+{
+	cell_t *cp = (cell_t *)prop->val.val;
+	int i;
+
+	fprintf(f, "    p = Property('%s', [", prop->name);
+
+	for (i = 0; i < prop->val.len / sizeof(cell_t); i++)
+		fprintf(f, "0x%x, ", fdt32_to_cpu(cp[i]));
+
+	fprintf(f, "])\n");
+}
+
+static int isstring(char c)
+{
+	return (isprint(c)
+		|| (c == '\0')
+		|| strchr("\a\b\t\n\v\f\r", c));
+}
+
+static void write_property(FILE *f, struct property *prop)
+{
+	const char *p = prop->val.val;
+	int i, strtype, len = prop->val.len;
+
+	if (len == 0) {
+		fprintf(f, "    p = Property('%s', None)\n", prop->name);
+		goto out;
+	}
+
+	strtype = 1;
+	for (i = 0; i < len; i++) {
+		if (!isstring(p[i])) {
+			strtype = 0;
+			break;
+		}
+	}
+
+	if (strtype)
+		fprintf(f, "    p = Property('%s', '%s')\n", prop->name,
+			prop->val.val);
+	else if (len == 4)
+		fprintf(f, "    p = Property('%s', 0x%x)\n", prop->name,
+			fdt32_to_cpu(*(const cell_t *)p));
+	else
+		write_propval_cells(f, prop);
+	
+out:
+	fprintf(f, "    n.properties.append(p)\n");
+}
+
+static void write_tree_source_node(FILE *f, struct node *tree, int level)
+{
+	char name[MAX_NODENAME_LEN + 1] = "root";
+	struct property *prop;
+	struct node *child;
+
+	if (tree->name && (*tree->name))
+		strncpy(name, tree->name, MAX_NODENAME_LEN);
+
+	fprintf(f, "    n = Node('%s', parents[-1])\n", name);
+
+	if (level > 0)
+		fprintf(f, "    parents[-1].children.append(n)\n");
+	else
+		fprintf(f, "    root = n\n");
+
+	for_each_property(tree, prop)
+		write_property(f, prop);
+
+	fprintf(f, "    parents.append(n)\n");
+
+	for_each_child(tree, child) {
+		write_tree_source_node(f, child, level + 1);
+	}
+
+	fprintf(f, "    parents.pop()\n");
+}
+
+
+static char *header = "#!/usr/bin/python\n\
+\n\
+class Node(object):\n\
+    def __init__(self, name, parent, unitaddr=None):\n\
+        self.__dict__.update(locals())\n\
+        self.children = []\n\
+        self.properties = []\n\
+\n\
+class Property(object):\n\
+    def __init__(self, name, value):\n\
+        self.__dict__.update(locals())\n\
+";
+
+void dt_to_python(FILE *f, struct boot_info *bi, int version)
+{
+	fprintf(f, "%s\n", header);
+	fprintf(f, "def generate_tree():\n");
+	fprintf(f, "    parents = [None]\n");
+
+	write_tree_source_node(f, bi->dt, 0);
+
+	fprintf(f, "    root.version = %d\n", version);
+	fprintf(f, "    return root\n");
+}