[cgiapp] file uploads and encodings

Nicholas Bamber nicholas at periapt.co.uk
Tue Oct 5 12:21:56 EDT 2010


Todd,

You may find Test::CGI::Multipart useful for testing code in this
situation. I wrote it because I found testing file uploads so
difficult. However, I have to admit that it has not yet been exercised
against the sort of situation you describe, and that is exactly the sort
of situation likely to break it. But assuming we can iron out any bugs
of that sort, you should be able to replicate all sorts of situations.

Nicholas


cgiapp-request at lists.erlbaum.net wrote:
> Today's Topics:
>
>    1. file uploads and encodings (Todd Ross)
>    2. Re: file uploads and encodings (?ohn say??r)
>    3. Re: file uploads and encodings (Michael Peters)
>    4. Re: file uploads and encodings (Joshua Miller)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 4 Oct 2010 13:34:25 -0700 (PDT)
> From: Todd Ross <tar.lists at yahoo.com>
> Subject: [cgiapp] file uploads and encodings
> To: CGI Application Listserv <cgiapp at lists.erlbaum.net>
> Message-ID: <164630.64169.qm at web45908.mail.sp1.yahoo.com>
> Content-Type: text/plain; charset=us-ascii
>
> Hello,
>
> I think I have an impossible problem.  Or at least, it looks dire from where I'm 
> sitting.
>
> I support a website that accepts file uploads.  I accept uploads of all types 
> from text/plain (csv) to image/jpeg to application/pdf; it's currently 
> unconstrained.  The file upload happens over a very typical setup of:
>
> <form enctype="multipart/form-data" method="post">
>     <input type="file" name="my_file">
> </form>
>
> using CGI.pm for the form processing on the server.
>
> Most file uploads are routed elsewhere for processing.  One of our targets is a 
> COBOL application on z/OS and we need to perform some platform conversion.  
> Namely, we need to convert text/plain files to EBCDIC.
>
> In order to convert _to_ EBCDIC, I need to know what I'm converting _from_.  And 
> therein lies my impossible problem; how does one determine the encoding of a 
> file upload?  The browser does provide some information in the form of the file 
> name and the mime type but neither would indicate whether the (text/plain) file 
> was encoded with ISO-8859-1 or UTF-8 or something else entirely.
>
> These are uploads from a variety of clients running on a variety of platforms, 
> the details of which are largely unknown to me.  Consequently, I'm reluctant to 
> assume any particular character encoding.
>
> I can't imagine a character encoding field (or prompt) being effective.  My 
> users are business users, not computer specialists.  They might be responsible 
> for uploading the file, but they probably aren't responsible for creating it in 
> the first place.
>
> Thoughts?
>
> Thanks,
>
> Todd
>
>
>
>       
>
> ------------------------------
>
> Message: 2
> Date: Mon, 04 Oct 2010 17:13:05 -0400
> From: ?ohn say??r <jsaylor at liaison-intl.com>
> Subject: Re: [cgiapp] file uploads and encodings
> To: CGI Application <cgiapp at lists.erlbaum.net>
> Message-ID: <1286226785.6953.12.camel at saylor-linux>
> Content-Type: text/plain; charset="UTF-8"
>
> hola
>
> On Mon, 2010-10-04 at 13:34 -0700, Todd Ross wrote:
>   
>> In order to convert _to_ EBCDIC, I need to know what I'm converting _from_.  And 
>> therein lies my impossible problem; how does one determine the encoding of a 
>> file upload?  The browser does provide some information in the form of the file 
>> name and the mime type but neither would indicate whether the (text/plain) file 
>> was encoded with ISO-8859-1 or UTF-8 or something else entirely.
>>     
>
> i think you have to do this programmatically by examining the characters
> in the file. there may be libraries to do this already somewhere, but i
> have exerted no effort to find them.
>
> as you mention, you can't count on users, and they can [and will] upload
> just about anything.
>
> good luck!
>
>   
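
For what it's worth, the "examine the characters in the file" idea quoted
above can be sketched roughly as follows. This is only a heuristic under
stated assumptions, not a definitive answer: the function names
guess_encoding and to_ebcdic are made up for illustration, the sketch is in
Python rather than Perl, and cp037 is an assumed EBCDIC code page (the
thread does not say which one the z/OS application expects). It checks for
a byte order mark, then tries ASCII and UTF-8, and finally falls back to
ISO-8859-1, which accepts any byte sequence and is therefore a guess rather
than a detection.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Return a best-guess Python codec name for the uploaded bytes."""
    # A byte order mark is the strongest hint available.
    # (UTF-32 is not handled in this sketch.)
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # Strict decoders fail loudly, so try the narrowest candidates first.
    for candidate in ("ascii", "utf-8"):
        try:
            data.decode(candidate)
            return candidate
        except UnicodeDecodeError:
            continue
    # Assumption: anything else is treated as ISO-8859-1. Every byte
    # sequence decodes under it, so this cannot rule out Windows-1252
    # or any other single-byte encoding.
    return "iso-8859-1"


def to_ebcdic(data: bytes, target: str = "cp037") -> bytes:
    """Decode with the guessed encoding and re-encode as EBCDIC.

    The target code page is an assumption; characters it cannot
    represent are replaced with '?'.
    """
    text = data.decode(guess_encoding(data))
    return text.encode(target, errors="replace")
```

Non-ASCII bytes that happen to form valid UTF-8 are rare in real
ISO-8859-1 text, which is why UTF-8 is tried before the fallback. Files
in other single-byte encodings will still be silently misread, so the
result probably deserves a manual review step for anything that was not
plain ASCII.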


