Re: [intel-vaapi-media] [Intel-gfx] [RFC 2/2] drm/i915: Select engines via class and instance in execbuffer2
by Tvrtko Ursulin
On 18/04/2017 22:10, Chris Wilson wrote:
> On Tue, Apr 18, 2017 at 05:56:15PM +0100, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
>>
>> Building on top of the previous patch which exported the concept
>> of engine classes and instances, we can also use this instead of
>> the current awkward engine selection uAPI.
>>
>> This is primarily interesting for the VCS engine selection which
>> is a) currently done via a disjoint set of flags, and b) the
>> current I915_EXEC_BSD flag has different semantics depending on
>> the underlying hardware, which is bad.
>>
>> The idea proposed here is to reserve 16 bits of flags to pass in
>> the engine class and instance (8 bits each), plus a new flag
>> named I915_EXEC_CLASS_INSTANCE to tell the kernel this new engine
>> selection API is in use.
>>
>> The new uAPI also removes access to the weak VCS engine
>> balancing as currently existing in the driver.
>>
>> Example usage to send a command to VCS0:
>>
>> eb.flags = i915_execbuffer2_engine(DRM_I915_ENGINE_CLASS_VIDEO_DECODE, 0);
>>
>> Or to send a command to VCS1:
>>
>> eb.flags = i915_execbuffer2_engine(DRM_I915_ENGINE_CLASS_VIDEO_DECODE, 1);
>
> To save a bit of space, we can use the ring selector as a class selector
> if bit18 is set, with bits 19-27 as the instance. That limits us to 64
> classes - hopefully not a problem for the near future. At least I might
> have sold you on a flexible execbuf3 by then.
I was considering re-using those bits, yes. I was weighing the pros of
keeping it completely separate, but I suppose there is not much value
in that. So I can re-use the ring selector just as well and have a
smaller impact on the number of bits left over.
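For illustration, here is a minimal sketch of what the packing helper
from the commit message could look like, assuming the proposed 8+8 bit
layout. The flag value and shifts below are placeholders rather than
settled uAPI, and Chris's variant would instead reuse the existing ring
selector bits as the class when bit18 is set:

#define I915_EXEC_CLASS_INSTANCE	(1ULL << 18)	/* placeholder bit */
#define I915_EXEC_CLASS_SHIFT		19		/* placeholder */
#define I915_EXEC_INSTANCE_SHIFT	27		/* placeholder */

static inline __u64
i915_execbuffer2_engine(__u8 engine_class, __u8 engine_instance)
{
	/* Tag the flags as class/instance based and pack 8 bits each. */
	return I915_EXEC_CLASS_INSTANCE |
	       ((__u64)engine_class << I915_EXEC_CLASS_SHIFT) |
	       ((__u64)engine_instance << I915_EXEC_INSTANCE_SHIFT);
}

Usage is then exactly as in the commit message, e.g.
eb.flags = i915_execbuffer2_engine(DRM_I915_ENGINE_CLASS_VIDEO_DECODE, 1);
to target VCS1.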
> (As a digression, some cryptic notes for an implementation I did over Easter:
> /*
> * Execbuf3!
> *
> * ringbuffer
> * - per context
> * - per engine
We have this already, so I guess I am missing something.
> * - PAGE_SIZE ctl [ro head, rw tail] + user pot
> * - kthread [i915/$ctx-$engine] (optional?)
No idea what these two are. :)
> * - assumes NO_RELOC-esque awareness
Ok ok NO_RELOC. :)
> *
> * SYNC flags [wait/signal], handle [semaphore/fence]
Sync fences in/out just as today, but probably more?
> *
> * BIND handle, offset [user provided]
> * ALLOC[32,64] handle, flags, *offset [kernel provided, need RELOC]
> * RELOC[32,64] handle, target_handle, offset, delta
> * CLEAR flags, handle
> * UNBIND handle
Explicit VMA management? Maybe a separate ioctl would be better?
> *
> * BATCH flags, handle, offset
> * [or SVM flags, address]
> * PIN flags (MAY_RELOC), count, handle[count]
> * FENCE flags, count, handle[count]
> * SUBMIT handle [fence/NULL with error]
> */
No idea again. :)
> At the moment it is just trying to do execbuf2, but more compactly and
> with fewer ioctls. But one of the main selling points is that we can
> extend the information passed around more freely than execbuf2.)
I have nothing against a better eb, since I trust you know much better
how needed it is and when. But I don't know how long it will take to
get there. This class/instance idea could be implemented quickly to
solve the sore point of VCS/VCS2 engine selection. But yeah, it is
another uABI to keep in that case.
Regards,
Tvrtko
Re: [intel-vaapi-media] [Intel-gfx] [Mesa-dev] [RFC 1/2] drm/i915: Engine discovery uAPI
by Tvrtko Ursulin
On 19/04/2017 06:22, Kenneth Graunke wrote:
> On Tuesday, April 18, 2017 9:56:14 AM PDT Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
>>
>> Engine discovery uAPI allows userspace to probe for engine
>> configuration and features without needing to maintain the
>> internal PCI id based database.
>
> I don't understand why I would want to query the existence of engines
> from the kernel. As a userspace driver developer, I have to know how to
> program the specific generation of hardware I'm on. I better know what
> engines that GPU has, or else there's no way I can competently program it.
>
> In Mesa, we recently imported libdrm and deleted all the engine checks
> (a460e1eb51406e5ca54abda42112bfb8523ff046). All generations have an
> RCS, Gen6+ has a separate BLT, and we don't need to use the others.
> It's completely statically determinable with a simple check. Runtime
> checks make sense for optional things...but not "is there a 3D engine?".
>
> Plus, even if you add this to the kernel, we still support back to 3.6
> (and ChromeOS needs us to continue supporting 3.8), so we won't be able
> to magically use the new uABI - we'd need to support both. Which, if
> the point is to delete code...we'd actually have /more/ code for a few
> years. Or, we could not use it...but then nobody would be testing it,
> and if a bug creeps in...that pushes it back more years still.
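For reference, the static check Kenneth describes amounts to roughly
the following sketch; the struct and field names are approximate, not
Mesa's exact code:

/* Engine presence derived purely from the hardware generation:
 * every generation has a render engine (RCS), while Gen6+ adds a
 * separate blitter (BLT). No kernel query needed. */
static inline bool
has_separate_blt(const struct gen_device_info *devinfo)
{
	return devinfo->gen >= 6;
}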
Okay, the argument of more code for a while is, I suppose, always true
with this type of work. But in general the idea is to consolidate the
queries and avoid (maybe only partially) duplicated PCI ID databases
across components.
I suspect that today you do some device discovery via libpciaccess (or
something similar) and some via i915 GET_PARAMs and so on. So the idea
is to consolidate all of that and do it via i915. And since another
argument you raise is that you have to know what the GPU looks like to
be able to competently program it, who knows that better than the
kernel driver?
But I think the main part of the argument is: why collect and derive
this information from various sources when it could perhaps come from
just one?
Maybe the exact idea is not so interesting for Mesa, which would not
surprise me at all since it was born from libva and the BSD engine
usage there. In which case, perhaps Mesa could become more interested
if the proposal exported some other data to userspace?
I don't think it is critical to find something like that for Mesa, but
it may be interesting. I think Ben mentioned at one point that he had
some ideas in this area, or something similar was discussed in the
past; I forget the exact details now.
So for now, if there is nothing along the lines described so far which
would be interesting for Mesa, please just keep an eye on this. That
way, if some other component is interested and we end up implementing
something, it will at least not hinder you, even if we cannot find
anything useful for you in here.
Regards,
Tvrtko
Re: [intel-vaapi-media] [Intel-gfx] [RFC 1/2] drm/i915: Engine discovery uAPI
by Tvrtko Ursulin
On 18/04/2017 21:13, Chris Wilson wrote:
> On Tue, Apr 18, 2017 at 05:56:14PM +0100, Tvrtko Ursulin wrote:
>> +enum drm_i915_gem_engine_class {
>> + DRM_I915_ENGINE_CLASS_OTHER = 0,
>> + DRM_I915_ENGINE_CLASS_RENDER = 1,
>> + DRM_I915_ENGINE_CLASS_COPY = 2,
>> + DRM_I915_ENGINE_CLASS_VIDEO_DECODE = 3,
>> + DRM_I915_ENGINE_CLASS_VIDEO_ENHANCE = 4,
>> + DRM_I915_ENGINE_CLASS_MAX /* non-ABI */
>> +};
>> +
>> +struct drm_i915_engine_info {
>> + /** Engine instance number. */
>> + __u32 instance;
>> + __u32 rsvd;
>> +
>> + /** Engine specific info. */
>> +#define DRM_I915_ENGINE_HAS_HEVC BIT(0)
>> + __u64 info;
>> +};
>
> So the main question is how we can extend this in the future, keeping
> forwards/backwards compat.
>
> I think if we put a query version into info, then the kernel supplies
> an array matching that version (or reports the most recent version
> supported if the request is too modern).
>
> The kernel has to keep all the old struct variants and exporters
> indefinitely.
Versioning sounds good to me.
> Another alternative would be an ENGINE_GETPARAM where we just have a
> switch of all possible questions. Maybe better as a CONTEXT_GETPARAM if
> we start thinking about allowing CONTEXT_SETPARAM to fine tune
> individual clients.
This idea I did not get - what is the switch of all possible questions?
Do you mean a new ioctl like ENGINE_GETPARAM which would return a list
of queries supported by CONTEXT_GETPARAM? That would effectively be a
dispatcher-in-a-dispatcher kind of thing?
>> +struct drm_i915_gem_engine_info {
>> + /** in: Engine class to probe (enum drm_i915_gem_engine_class). */
>> + __u32 engine_class;
>
> __u32 [in/out] version ? (see above)
>
>> +
>> + /** out: Actual number of hardware engines. */
>> + __u32 num_engines;
>> +
>> + /**
>> + * in: Number of struct drm_i915_engine_info entries in the provided
>> + * info array.
>> + */
>> + __u32 info_size;
>
> This is superfluous given num_engines. The standard two-pass approach:
> discovery of the size, followed by allocation and the final query.
This is also fine. I was on the fence myself about whether to condense
it to one field in the first posting.
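Condensed that way, the usual two-pass flow from userspace would look
roughly like this. The ioctl number and the results pointer are
illustrative only, not settled uAPI; just engine_class, num_engines and
the struct names come from the RFC, and a version field could ride
along per the versioning discussed above:

#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <xf86drm.h>

static int query_vcs_engines(int fd, struct drm_i915_engine_info **out,
			     __u32 *count)
{
	struct drm_i915_gem_engine_info args = {
		.engine_class = DRM_I915_ENGINE_CLASS_VIDEO_DECODE,
	};
	struct drm_i915_engine_info *engines;

	/* Pass 1: no buffer attached, the kernel only fills num_engines. */
	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_ENGINE_INFO, &args)) /* hypothetical ioctl */
		return -errno;

	/* Pass 2: allocate and fetch the actual array. */
	engines = calloc(args.num_engines, sizeof(*engines));
	if (!engines)
		return -ENOMEM;
	args.info_ptr = (__u64)(uintptr_t)engines; /* hypothetical field */
	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_ENGINE_INFO, &args)) {
		free(engines);
		return -errno;
	}

	*out = engines;
	*count = args.num_engines;
	return 0;
}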
Regards,
Tvrtko
Re: [intel-vaapi-media] high GPU resources with VAAPI
by Sreerenj
Hi,
On 04/13/2017 10:25 AM, Sreerenj wrote:
>
>
>
> On 04/13/2017 06:42 AM, Benjamin Dreshaj wrote:
>> Hi Sree,
>>
>> We found the issue. I think it's a bug in GStreamer:
>> it is doubling the frame rate for interlaced content. My source
>> is 25 FPS and the output after VAAPI encode comes out at 50 FPS,
>> and that is causing the high load.
>
> gstreamer-vaapi does deinterlacing by default; you should disable it
> manually if you don't require the deinterlacing.
> Note that the vaapi-intel-driver does not support interlaced encoding :)
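> For example, keeping the rest of your pipeline unchanged, something
> along these lines should do it (property names as found in
> gstreamer-vaapi; please double-check them against your version):
>
>   ... ! vaapih264dec ! vaapipostproc width=1280 height=720 \
>         deinterlace-mode=disabled ! \
>       vaapih264enc rate-control=2 bitrate=1700 tune=low-power ! ...
>
> With deinterlace-mode=disabled the output stays at the source's 25
> FPS, and tune=low-power moves VME onto the fixed-function block
> mentioned further down, which should also bring render-busy down.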
>
>>
>> I will report it to GStreamer now; you can close this issue on your
>> end or ignore it.
>>
>> Thanks for your response
>>
>> Best Regards
>> Ben
>>
>>
>>
>>
>> On Wed, Apr 12, 2017 at 2:37 PM, Sreerenj
>> <sreerenj.balachandran(a)intel.com> wrote:
>>
>> Hi Benjamin,
>>
>> If I understood the intel-gpu-top parameters correctly, the
>> "render-busy" figure includes the shared VME block.
>>
>> If you set "tune=low-power", the value you see in
>> "render-busy" should go down significantly (because there is a
>> fixed-function block for doing VME).
>>
>> The MSDK driver you use might be using the low-power mode by
>> default; I don't know :)
>>
>> It would also be good to test with and without scaling or whatever
>> other postprocessing is in use.
>>
>>
>>
>> On 04/10/2017 03:03 PM, Benjamin Dreshaj wrote:
>>>
>>> Writing to this email list because I was told by GStreamer-VAAPI
>>> devs that this is a driver issue; it has nothing to do with
>>> GStreamer.
>>> When transcoding a live stream with GStreamer-VAAPI, a lot more
>>> GPU resources are used than when using GStreamer-MSDK.
>>>
>>> Live Transcode VAAPI: GPU 27%
>>> Live Transcode MSDK: GPU 6%
>>>
>>> VAAPI seems to be using 4 times more GPU than MSDK.
>>>
>>> Please see the attached image; I took a screenshot of both servers.
>>>
>>> Environment:
>>>
>>> 2x identical servers with Xeon(R) CPU E3-1245 v5 SKYLAKE
>>>
>>> Server 1 with VAAPI: Ubuntu 16.04
>>> vainfo: Driver version: Intel i965 driver for Intel(R) Skylake -
>>> 1.8.2.pre1 (1.7.3-372-g2f0a844)
>>>
>>> Gstreamer-VAAPI Pipeline:
>>> gst-launch-1.0 souphttpsrc \
>>> location="http://localhost:80/oranews_HD/mpegts" is-live=true ! \
>>> tsdemux name=demux ! queue max-size-buffers=1200 max-size-bytes=0 \
>>> max-size-time=0 ! \
>>> h264parse ! vaapih264dec ! vaapipostproc width=1280 height=720 ! \
>>> vaapih264enc rate-control=2 bitrate=1700 ! h264parse ! \
>>> flvmux streamable=true name=mux ! rtmpsink \
>>> location="rtmp://localhost:1935/pushrtmp/vappi_outs live=1" \
>>> demux. ! queue max-size-buffers=1200 max-size-bytes=0 \
>>> max-size-time=0 ! \
>>> mpegaudioparse ! queue ! avdec_mp2float plc=true ! audioconvert \
>>> ! queue ! voaacenc bitrate=128000 ! mux.
>>>
>>>
>>>
>>> Server 2 with MSDK: CentOS 7.2
>>> Media Server Studio 2017
>>> vainfo: VA-API version: 0.99 (libva 1.67.0.pre1)
>>>
>>> Gstreamer-MSDK Pipeline:
>>> gst-launch-1.0 souphttpsrc \
>>> location="http://localhost:80/oranews_HD/mpegts" is-live=true ! \
>>> tsdemux name=demux ! queue max-size-buffers=1200 max-size-bytes=0 \
>>> max-size-time=0 \
>>> ! h264parse ! mfxh264dec ! mfxvpp width=1280 height=720 ! \
>>> mfxh264enc rate-control=1 bitrate=1700 ! flvmux streamable=true \
>>> name=mux ! rtmpsink \
>>> location="rtmp://localhost:1935/pushrtmp/mfx_out live=1" demux. \
>>> ! queue max-size-buffers=1200 max-size-bytes=0 max-size-time=0 \
>>> ! mpegaudioparse ! queue ! avdec_mp2float plc=true ! \
>>> audioconvert ! queue ! voaacenc bitrate=128000 ! mux.
--
Thanks
Sree
high GPU resources with VAAPI
by Benjamin Dreshaj
[image: Inline image 1]
Writing to this email list because I was told by GStreamer-VAAPI devs
that this is a driver issue; it has nothing to do with GStreamer.
When transcoding a live stream with GStreamer-VAAPI, a lot more GPU
resources are used than when using GStreamer-MSDK.
Live Transcode VAAPI: GPU 27%
Live Transcode MSDK: GPU 6%
VAAPI seems to be using 4 times more GPU than MSDK.
Please see the attached image; I took a screenshot of both servers.
Environment:
2x identical servers with Xeon(R) CPU E3-1245 v5 SKYLAKE
Server 1 with VAAPI: Ubuntu 16.04
vainfo: Driver version: Intel i965 driver for Intel(R) Skylake - 1.8.2.pre1
(1.7.3-372-g2f0a844)
Gstreamer-VAAPI Pipeline:
gst-launch-1.0 souphttpsrc location="http://localhost:80/oranews_HD/mpegts" \
is-live=true ! tsdemux name=demux ! queue max-size-buffers=1200 \
max-size-bytes=0 max-size-time=0 ! \
h264parse ! vaapih264dec ! vaapipostproc width=1280 height=720 ! \
vaapih264enc rate-control=2 bitrate=1700 ! h264parse ! \
flvmux streamable=true name=mux ! rtmpsink \
location="rtmp://localhost:1935/pushrtmp/vappi_outs live=1" demux. ! queue \
max-size-buffers=1200 max-size-bytes=0 max-size-time=0 ! \
mpegaudioparse ! queue ! avdec_mp2float plc=true ! audioconvert ! queue ! \
voaacenc bitrate=128000 ! mux.
Server 2 with MSDK: CentOS 7.2
Media Server Studio 2017
vainfo: VA-API version: 0.99 (libva 1.67.0.pre1)
Gstreamer-MSDK Pipeline:
gst-launch-1.0 souphttpsrc location="http://localhost:80/oranews_HD/mpegts" \
is-live=true ! tsdemux name=demux ! queue max-size-buffers=1200 \
max-size-bytes=0 max-size-time=0 \
! h264parse ! mfxh264dec ! mfxvpp width=1280 height=720 ! mfxh264enc \
rate-control=1 bitrate=1700 ! flvmux streamable=true name=mux ! rtmpsink \
location="rtmp://localhost:1935/pushrtmp/mfx_out live=1" demux. \
! queue max-size-buffers=1200 max-size-bytes=0 max-size-time=0 ! \
mpegaudioparse ! queue ! avdec_mp2float plc=true ! audioconvert ! queue ! \
voaacenc bitrate=128000 ! mux.
--
Benjamin DRESHAJ - CTO
SetPlex LLC.
Address. 2320 Arthur Ave - Bronx, NY 10458 - USA
Tel. ++1 718.213.4282 - Fax. ++1 718.701.4407
Mobile. ++1 646.283.3439 - Mail. ben(a)ftamarket.com
Web. www.setplex.com - www.tvalb.com - www.italotv.com